faithless1
SED Output Column Before and After Match
Hi, I have a large file with this type of data (tab-delimited):
aaron ross who is he dating
aaron spelling dating
aaron t jeffries dating profile
aarp dating service
a basic guide to dating edwin m knowles china
abbreviation for ipa in dating
abbreviations gl dating
I'm looking for a way to output the column before and the column after a match (i.e. match "dating"). When "dating" is the first or the last column, I'd like to output the 2 columns after or before it instead.
Example of desired output:
is he dating
aaron spelling dating
jeffries dating profile
aarp dating service
to dating edwin
iipa in dating
abbreviations gl dating
Your example output is correct, although your description was a bit murky and I can understand why aPresence didn't follow it. Your intention is to always output exactly three columns: the leftmost three if the match is in the leftmost column, the rightmost three if the match is in the rightmost column, and otherwise the matched column together with its left and right neighbours. I have attached a Python program which generates the exact output you describe given the exact sample input you describe.
A couple of notes: the program accepts input from a file called "columnmatch.txt" and writes the output to "columnmatch.out". It could easily be modified to accommodate other input and output schemes. Also, the match I use is rather primitive: just a case-insensitive, whole-word match on the column value. For more sophisticated matching, there are rich regular-expression searching features available in the Python "re" library. Just say "import re" in the program and do a little Googling to find out its capabilities.
columnmatch.py
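The attachment itself is not reproduced in this thread. As a rough illustration only (not the original columnmatch.py), here is a minimal sketch of the three-column rule described above. The function name `three_columns` is hypothetical, and the sketch assumes every whitespace-separated word is its own column, that only the first match per line matters, and that output columns are joined with single spaces:

```python
def three_columns(line, word="dating"):
    """Return a three-column window around the first column equal to `word`.

    Hypothetical helper for illustration; returns None if no column matches.
    """
    cols = line.split()  # assumption: any run of whitespace separates columns
    # Case-insensitive, whole-column comparison, as described in the post.
    hits = [i for i, col in enumerate(cols) if col.lower() == word.lower()]
    if not hits:
        return None
    i = hits[0]
    # Start one column left of the match, then clamp so the window
    # never runs off either end of the row.
    start = max(0, min(i - 1, len(cols) - 3))
    return " ".join(cols[start:start + 3])


sample = [
    "aaron ross who is he dating",
    "a basic guide to dating edwin m knowles china",
    "abbreviation for ipa in dating",
]
for row in sample:
    print(three_columns(row))
```

Clamping the start index covers the first-column and last-column cases without separate branches. If the real data has columns that contain spaces, `line.split("\t")` would be the safer choice.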
This should work:
perl -ane 'chomp; if( $F[0] eq "dating" ){print "dating @F[1,2]\n"} elsif( $F[-1] eq "dating" ){print "@F[-3,-2] dating\n"} else{foreach $i(0..$#F){ if( $F[$i] eq "dating" ){print "$F[$i-1] dating $F[$i+1]\n"; last}}}' input_file
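Here's a less terse version:
use strict;
while( <> ) {
    chomp;
    my @F = split;    # split the line on whitespace into columns
    if( $F[0] eq "dating" ) {
        # match is the first column: print it plus the next two
        print "dating @F[1,2]\n";
    } elsif( $F[-1] eq "dating" ) {
        # match is the last column: print the two before it plus the match
        print "@F[-3,-2] dating\n";
    } else {
        # match is somewhere in the middle: print its left and right neighbours
        foreach my $i (0..$#F) {
            if( $F[$i] eq "dating" ) {
                print "$F[$i-1] dating $F[$i+1]\n";
                last;
            }
        }
    }
}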
I think the correct output you are looking for is this (your example is incorrect):
is he dating
aaron spelling dating
jeffries dating profile
aarp dating service
to dating edwin
ipa in dating
abbreviations gl dating
I diff'd the requested output and my output and they look correct... except for the iipa typo on line 6 and the leading space on line 5.
ASKER CERTIFIED SOLUTION
ASKER
Thanks!!
Hi faithless1,
Out of curiosity, what made you choose shaleesh's solution over the Perl solution I offered earlier?
I believe mine produced the expected output.
Please let me know.
Thanks.
Faithless1 clearly wanted a SED solution. I was the first one to figure out what he was really asking for, and my solution was the first to create the correct output, but it was in Python. It would have been preferable had faithless1 not categorized this question in the Perl and Python zones, which wasted the efforts of a number of Perl and Python experts.
Thanks Richard.
test5.in:
aaron ross who is he dating
aaron spelling dating
aaron t jeffries dating profile
aarp dating service
a basic guide to dating edwin m knowles china
abbreviation for ipa in dating
abbreviations gl dating
sample output:
is he dating
aaron spelling dating
t jeffries dating profile
aarp dating service
guide to dating edwin m
ipa in dating
abbreviations gl dating
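Note that this sample output differs from the three-column output requested at the top of the thread: it looks consistent with keeping up to two columns on each side of the match, trimmed at the ends of the row. That reading is an inference from the listing, not something stated in the thread. A minimal Python sketch under that assumption (the function name `context_window` and its parameters are hypothetical):

```python
def context_window(line, word="dating", before=2, after=2):
    """Return the matched column with up to `before`/`after` neighbours.

    Hypothetical helper for illustration; returns None if no column matches.
    """
    cols = line.split()  # assumption: any run of whitespace separates columns
    for i, col in enumerate(cols):
        if col.lower() == word.lower():
            # Slicing past the end of a list is safe in Python, so only
            # the left edge needs clamping.
            return " ".join(cols[max(0, i - before):i + after + 1])
    return None


print(context_window("aaron t jeffries dating profile"))
print(context_window("a basic guide to dating edwin m knowles china"))
```

Calling it with `before=1, after=1` gives the plain neighbour-on-each-side behaviour for matches in the middle of a row, but it does not pad to three columns at the edges.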