Solved

SED Output Column Before and After Match

Posted on 2010-08-31
Last Modified: 2013-12-26
Hi, I have a large file with this type of data (tab delimited):

aaron ross who is he dating
aaron spelling dating
aaron t jeffries dating profile
aarp dating service
a basic guide to dating edwin m knowles china
abbreviation for ipa in dating
abbreviations gl dating


I'm looking for a way to output the column before and the column after a match (i.e. the match is "dating"). When "dating" is the first or the last column, I'm looking to output the 2 columns after or before it instead.

Example of desired output:

is he dating
aaron spelling dating
jeffries dating profile
aarp dating service
 to dating edwin
iipa in dating
abbreviations gl dating
Question by:faithless1
11 Comments
 
LVL 6

Expert Comment

by:apresence
ID: 33574158
Your example desired output is incorrect. I know you asked for sed, but attached is a Perl script that will accomplish what you are asking for. It assumes that test5.in contains the tab-delimited input data you specified in the example.

test5.in:
aaron ross who is he dating
aaron spelling dating
aaron t jeffries dating profile
aarp dating service
a basic guide to dating edwin m knowles china
abbreviation for ipa in dating
abbreviations gl dating

sample output:
is he dating
aaron spelling dating
t jeffries dating profile
aarp dating service
guide to dating edwin m
ipa in dating
abbreviations gl dating
cat test5.in | perl -ne 'print "$1\n" if /(([^\s]+\s){0,2}dating(\s[^\s]+){0,2})/'
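For readers who want to experiment with the pattern outside Perl, here is a minimal sketch of the same regular expression using Python's `re` module. The function name `context` is a hypothetical helper introduced for illustration, and `\S` is used as shorthand for `[^\s]`. Like the one-liner, it keeps up to two words on each side of the first "dating", which is why its output differs from the three-column output requested in the question.

```python
import re

# Same pattern as the Perl one-liner: up to two whitespace-separated
# words before "dating" and up to two after it.
PATTERN = re.compile(r"((\S+\s){0,2}dating(\s\S+){0,2})")


def context(line):
    """Return the first match with up to two neighbours per side, or None."""
    m = PATTERN.search(line)
    return m.group(1) if m else None


if __name__ == "__main__":
    for row in ("aaron ross who is he dating",
                "a basic guide to dating edwin m knowles china"):
        print(context(row))
```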

 
LVL 5

Expert Comment

by:-Richard-
ID: 33577360
Your example output is correct, although your description was a bit murky and I can understand why apresence didn't understand. Your intention is to always output exactly three columns: the leftmost three columns if the match is leftmost, the rightmost three columns if the match is rightmost, and otherwise the left and right adjacent columns along with the matched column. I have attached a Python program which generates the exact output you describe given the exact sample input you describe.

A couple of notes: the program accepts input in a file called "columnmatch.txt" and writes the output to "columnmatch.out". It could easily be modified to accommodate other input and output schemes. Also, the match I use is rather primitive, just a case-insensitive, whole-word match on the column value. For more sophisticated matching, there are rich regular-expression searching features available in the Python "re" library. Just say "import re" in the program and do a little Googling to find out its capabilities.

columnmatch.py
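The attachment itself is not reproduced in this thread. The following is a minimal sketch of the rule described above, not the original columnmatch.py: it centres a three-column window on the first column that equals the target (case-insensitive, whole column) and clamps the window to the row boundaries. The names `three_columns`, `target`, and `sep` are hypothetical, and joining the output with a single space is an assumption based on the sample output.

```python
def three_columns(line, target="dating", sep=" "):
    """Return the three-column window around the first column equal to target.

    Returns None when no column matches. Rows with fewer than three
    columns are returned whole.
    """
    cols = line.split()  # splits on tabs or spaces
    lowered = [c.lower() for c in cols]
    if target.lower() not in lowered:
        return None
    i = lowered.index(target.lower())
    # Start one column left of the match, then clamp so the window
    # never runs past either end of the row.
    start = max(0, min(i - 1, len(cols) - 3))
    return sep.join(cols[start:start + 3])


if __name__ == "__main__":
    for row in ("aaron ross who is he dating",
                "aaron t jeffries dating profile",
                "a basic guide to dating edwin m knowles china"):
        print(three_columns(row))
```

To process a file, the function could be applied to each line of columnmatch.txt and the non-None results written to columnmatch.out.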
 
LVL 10

Expert Comment

by:jeromee
ID: 33577479
This should work:

perl -ane 'chomp; if( $F[0] eq "dating" ){print "dating @F[1,2]\n"} elsif( $F[-1] eq "dating" ){print "@F[-3,-2] dating\n"} else{foreach $i(0..$#F){ if( $F[$i] eq "dating" ){print "$F[$i-1] dating $F[$i+1]\n"; last}}}' input_file
 
LVL 10

Expert Comment

by:jeromee
ID: 33577560
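Here's a less terse version:

use strict;
while ( <> ) {
    chomp;
    my @F = split;    # whitespace-separated columns of the current line
    if ( $F[0] eq "dating" ) {
        # match is the first column: print it with the two columns after it
        print "dating @F[1,2]\n";
    } elsif ( $F[-1] eq "dating" ) {
        # match is the last column: print it with the two columns before it
        print "@F[-3,-2] dating\n";
    } else {
        # match is in the middle: print one column on each side of it
        foreach my $i ( 0 .. $#F ) {
            if ( $F[$i] eq "dating" ) {
                print "$F[$i-1] dating $F[$i+1]\n";
                last;
            }
        }
    }
}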

 
LVL 6

Expert Comment

by:apresence
ID: 33581581
I think the correct output you are looking for is this (your example is incorrect):


is he dating
aaron spelling dating
jeffries dating profile
aarp dating service
to dating edwin
ipa in dating
abbreviations gl dating

 
LVL 10

Expert Comment

by:jeromee
ID: 33581838
I diff'd the requested output and my output and they look correct... except for the iipa typo on line 6 and the leading space on line 5.

 
LVL 3

Accepted Solution

by:shaleesh (earned 2000 total points)
ID: 33597695
I hope this helps:



$ cat exam
aaron ross who is he dating
aaron spelling dating
aaron t jeffries dating profile
aarp dating service
a basic guide to dating edwin m knowles china
abbreviation for ipa in dating
abbreviations gl dating
$ cat exam |sed 's/.*\( \([a-z]*\) \([a-z]*\) dating\)/\1/'|sed 's/^[ ]*//'|sed 's/\( dating \([a-z]*\) \([a-z]*\)\).*/\1/'|sed 's/\(\([a-z]*\) dating \([a-z]*\)\).*/\1/'| sed 's/^[ ]*//'|sed 's/.*\( \([a-z]*\) dating \([a-z]*\)\)/\1/'|sed 's/^[ ]*//'
is he dating
aaron spelling dating
jeffries dating profile
aarp dating service
to dating edwin
ipa in dating
abbreviations gl dating
$
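To see what each stage of the pipeline above contributes, here is a rough translation into Python's `re.sub`, offered as a sketch rather than an exact model of sed. It assumes space-separated lowercase words as in the sample, and it treats the fifth stage as a leading-space strip like the second and seventh. `STAGES` and `trace` are hypothetical names introduced for illustration.

```python
import re

# One (pattern, replacement) pair per sed stage. sed's \( \) groups
# are written as ( ) in Python.
STAGES = [
    # two words before "dating": drop everything ahead of them
    (r".*( ([a-z]*) ([a-z]*) dating)", r"\1"),
    (r"^[ ]*", ""),                      # strip leading spaces
    # two words after "dating": drop everything behind them
    (r"( dating ([a-z]*) ([a-z]*)).*", r"\1"),
    # a word on each side: drop everything behind the following word
    (r"(([a-z]*) dating ([a-z]*)).*", r"\1"),
    (r"^[ ]*", ""),                      # strip leading spaces (assumed)
    # a word on each side: drop everything ahead of the preceding word
    (r".*( ([a-z]*) dating ([a-z]*))", r"\1"),
    (r"^[ ]*", ""),                      # strip leading spaces
]


def trace(line):
    """Return the input followed by the line as it looks after each stage."""
    steps = [line]
    for pattern, repl in STAGES:
        # count=1 mirrors sed's s/// without the g flag
        line = re.sub(pattern, repl, line, count=1)
        steps.append(line)
    return steps


if __name__ == "__main__":
    for step in trace("a basic guide to dating edwin m knowles china"):
        print(repr(step))
```

In this translation every pattern requires a space before "dating", so a row that starts with "dating" passes through all stages unchanged. The sample data contains no such row.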
 

Author Closing Comment

by:faithless1
ID: 33607348
Thanks!!
 
LVL 10

Expert Comment

by:jeromee
ID: 33607670
Hi faithless1,
Out of curiosity, what made you choose shaleesh's solution over the Perl solution I offered earlier?
I believe mine produced the expected output.
Please let me know.

Thanks.
 
LVL 5

Expert Comment

by:-Richard-
ID: 33616799
faithless1 clearly wanted a sed solution. I was the first one to figure out what he was really asking for, and my solution was the first to create the correct output, but it was in Python. It would have been preferable had faithless1 not categorized this question in the Perl and Python zones, which wasted the efforts of a number of Perl and Python experts.
 
LVL 10

Expert Comment

by:jeromee
ID: 33618899
Thanks Richard.
