remove a combination of patterns from a file

Posted on 2016-08-11
Last Modified: 2016-08-12
Using sed, I need to remove from a file any strings that do not contain @british.com or @ba.com.
Question by:sunny82
 
LVL 5

Expert Comment

by:Abhimanyu Suri
ID: 41753096
Do you need a sed-based solution only, or will grep work for you as well? Either way,

let's say you have a file

com.txt

jhfuierbv@british.com
huihwegbfuewv@ba.com
hvd;ivhav@hfrihv.com
fewuivguhuier@nhvn.com
erbv@british.com
egbfuewv@ba.com

egrep "@british.com|@ba.com" com.txt > com_new.txt

jhfuierbv@british.com
huihwegbfuewv@ba.com
erbv@british.com
egbfuewv@ba.com

sed -n -e  '/@british.com/p' -e '/@ba.com/p' com.txt > com_new.txt


jhfuierbv@british.com
huihwegbfuewv@ba.com
erbv@british.com
egbfuewv@ba.com
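
One thing to watch with the sed form above: with two separate 'p' commands, a line that contains both domains would be printed twice. A minimal sketch of a delete-based variant that prints each kept line once (the sample data and variable names here are made up for illustration, and the dots are escaped so they only match a literal '.'):

```shell
# Hypothetical sample file; the third line contains both domains.
sample=$(mktemp)
printf '%s\n' 'jhfuierbv@british.com' 'hvd;ivhav@hfrihv.com' \
              'erbv@british.com egbfuewv@ba.com' > "$sample"

# Lines matching either pattern branch to the end of the script and are
# printed once; every other line reaches the final 'd' and is deleted.
kept=$(sed -e '/@british\.com/b' -e '/@ba\.com/b' -e 'd' "$sample")
printf '%s\n' "$kept"
rm -f "$sample"
```

Like the commands above, this still keeps or drops whole lines, not individual addresses.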
 

Author Comment

by:sunny82
ID: 41753125
It's not working if the file is tab-delimited.

Say, like this ->

erbv@british.com         huihwegbfuewv@ba.com
hvd     ivhav@hfrihv.com
fewuivguhuier@nhvn.com  erbv@british.com
erbv@british.com        egbfuewv@abc.com

It should only show ->
erbv@british.com         huihwegbfuewv@ba.com
erbv@british.com
erbv@british.com

Either egrep or sed will do.
 
LVL 5

Expert Comment

by:Abhimanyu Suri
ID: 41753148
Please try

egrep -o -e "\w*@british.com" -e "\w*@ba.com" com.txt

In case \w doesn't work for you, please try

grep -oh -e "[[:alpha:]]*@british.com" -e "[[:alpha:]]*@ba.com" com.txt
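
To see what -o does with tab-delimited input, here is a small sketch run against a copy of the sample data from the question. The temp file and variable names are only for illustration, and it assumes a grep that supports -o (GNU and BSD grep do). It uses a bracket expression instead of \w and escapes the dots, which goes slightly beyond the commands above:

```shell
# Rebuild the tab-delimited sample from the question in a temp file.
sample=$(mktemp)
printf 'erbv@british.com\thuihwegbfuewv@ba.com\n'   >  "$sample"
printf 'hvd\tivhav@hfrihv.com\n'                    >> "$sample"
printf 'fewuivguhuier@nhvn.com\terbv@british.com\n' >> "$sample"
printf 'erbv@british.com\tegbfuewv@abc.com\n'       >> "$sample"

# -o prints each match on its own line instead of the whole input line.
matches=$(grep -oE '[[:alnum:]_]*@(british|ba)\.com' "$sample")
printf '%s\n' "$matches"
rm -f "$sample"
```

Note that each matching address comes out on its own line, so the two addresses on the first input line are printed as two lines.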

 
LVL 12

Expert Comment

by:tel2
ID: 41753155
Will you accept a Perl solution, Sunny?
Type:
    perl -v
from your command line, and if you don't get an error message, then Perl is probably installed.

Hi Abhimanyu   8)
 
LVL 5

Expert Comment

by:Abhimanyu Suri
ID: 41753164
Hey tel2, just replied to you on the other one :D

And Sunny, you should go with Perl if your environment allows :)
 
LVL 12

Expert Comment

by:tel2
ID: 41753176
Good to meet you again, Abhimanyu.   8)

Looking at your 1st solution:
    egrep -o -e "\w*@british.com" -e "\w*@ba.com" com.txt
It will (incorrectly) match things like:
    abc@britishxcom.uk
because '.' matches any character.
Also, '\w' will match '_' (which is good), but won't match things like '-' or '.' which are valid in email addresses.
I think this would solve all the above problems:
    egrep -o "(\w|\.|-)*@(british|ba)\.com"
but if it needs to not match when there's nothing valid before the '@' (e.g. an address which is just '@ba.com'), then you might have to do something like this:
    egrep -o "(\w|\.|-)(\w|\.|-)*@(british|ba)\.com"
I expect that still wouldn't meet the RFC for all valid email addresses, but it should cater for all those the user is likely to run into.

And your 2nd solution:
    grep -oh -e "[[:alpha:]]*@british.com" -e "[[:alpha:]]*@ba.com" com.txt
This also has the above-mentioned problems, and won't even match '_' in addresses.
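
A quick way to see the unescaped-dot problem for yourself, as a rough sketch (the variable names are made up, and a bracket expression stands in for \w so it does not depend on GNU extensions):

```shell
# Test line: an address that should NOT match.
line='abc@britishxcom.uk'

# Unescaped '.' matches any character, so this wrongly finds a match.
loose=$(printf '%s\n' "$line" | grep -oE '[[:alnum:]_]*@british.com' || true)

# Escaped '\.' only matches a literal dot, so nothing is found.
strict=$(printf '%s\n' "$line" | grep -oE '[[:alnum:]_]*@british\.com' || true)

printf 'loose:  [%s]\nstrict: [%s]\n' "$loose" "$strict"
```

The '|| true' is only there so that grep's "no match" exit status does not abort a script running under 'set -e'.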
 
LVL 5

Expert Comment

by:Abhimanyu Suri
ID: 41753184
The missing \. was an oversight.
\w: I was not aware that it doesn't match '.' or '-'.

Thanks for the corrections.
Apologies for the brevity, I'm commuting :)

Cheers to Experts Exchange; I probably would never have known which characters \w matches. Being an Oracle DBA, I just deal with ORA error codes ;)
 
LVL 12

Expert Comment

by:tel2
ID: 41753190
Glad to be of service, Abhimanyu.

Thanks for introducing me to grep's "-o" switch.
 
LVL 5

Expert Comment

by:Abhimanyu Suri
ID: 41753207
Another thought, for the very last solution, i.e.

grep -oh -e "[[:alpha:]]*@british.com" -e "[[:alpha:]]*@ba.com" com.txt

How about replacing alpha with alnum?

That should be able to handle alphanumeric values, though I'm not sure.
 
LVL 12

Expert Comment

by:tel2
ID: 41753210
Yes, [[:alnum:]] will match alphanumerics (which brings it closer to '\w'), but it still won't handle '_', '-' & '.'.
 

Author Comment

by:sunny82
ID: 41753229
Thank you. I will try the solutions tomorrow and let you know. As always, many, many thanks.
 

Author Comment

by:sunny82
ID: 41753230
@tel2, yes, any Perl solutions for this would be much appreciated as well.
 
LVL 12

Expert Comment

by:tel2
ID: 41753246
Hi Sunny,

Before I try Perl, what problems are you having with Abhimanyu's grep solutions, which I adjusted in one of my posts above? I.e.:
    egrep -o "(\w|\.|-)*@(british|ba)\.com"
but if it needs to not match if there's nothing valid before the '@' (e.g. an address which is just '@ba.com'), then you might have to do something like this:
    egrep -o "(\w|\.|-)(\w|\.|-)*@(british|ba)\.com"

And can you give us some more realistic sample data, please.  Feel free to hack the email addresses to preserve privacy.
 
LVL 12

Accepted Solution

by:
tel2 earned 2000 total points
ID: 41753258
Hi again Sunny,

I've just realised that this solution:
    egrep -o "(\w|\.|-)(\w|\.|-)*@(british|ba)\.com" com.txt
will fail if it finds something like:
    me@ba.com.au
because it will match it and print:
    me@ba.com

I've just written this Perl solution which avoids that problem:
    perl -ne 'print "$1\n" while(/([\w.-]+\@(british|ba)\.com)[^.\w]/g)' com.txt
but I expect there will be some (hopefully rare) cases it won't handle, hence my request for better test data.
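
If Perl isn't an option, the same 'me@ba.com.au' problem can probably be avoided with grep alone. This is only a sketch, and it assumes the addresses are separated by spaces or tabs with nothing else stuck to them; the sample data and variable names are made up:

```shell
# Hypothetical sample: one address that must be rejected (ba.com.au),
# two that should be kept, and one from an unrelated domain.
sample=$(mktemp)
printf 'me@ba.com.au\tjo-e.x@ba.com\n' >  "$sample"
printf 'a_b@british.com x@abc.com\n'   >> "$sample"

# Put every whitespace-separated field on its own line, then keep only
# fields that are, in their entirety, a british.com or ba.com address.
exact=$(tr -s ' \t' '\n\n' < "$sample" |
        grep -E '^[[:alnum:]_.-]+@(british|ba)\.com$' || true)
printf '%s\n' "$exact"
rm -f "$sample"
```

Anchoring with '^' and '$' means a field must match as a whole, so anything trailing after '.com' rules it out.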

A more robust solution could probably be written using a Perl module like Email::Address (it is supposed to comply with RFC 2822).  Create a script called matching_addrs.pl (or whatever), and put this in it:
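#!/usr/bin/perl
use strict;
use warnings;

use Email::Address;

# Read each input line (from files named on the command line, or STDIN).
while (<>)
{
        # Parse every email address found on the line.
        my @all_addrs = Email::Address->parse($_);
        # Keep only those ending in @british.com or @ba.com.
        my @matching_addrs = grep(/\@(british|ba)\.com$/, @all_addrs);
        print join("\n", @matching_addrs) . "\n" if @matching_addrs;
}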

Now make it executable:
    chmod u+x matching_addrs.pl
and run it, feeding your address file to it as STDIN:
    ./matching_addrs.pl <address.list
If you get an error it may be because the Email::Address module is not yet installed.

If you want to be able to match upper case characters too (e.g. 'abc@BA.COM'), then let me know and I can easily adjust any of my solutions to handle that.
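
For the grep variants, case-insensitive matching is likely just a matter of adding -i. A minimal sketch with made-up input:

```shell
# -i makes the match case-insensitive; -o still prints only the match.
upper=$(printf 'abc@BA.COM\txyz@Other.com\n' |
        grep -oiE '[[:alnum:]_.-]+@(british|ba)\.com' || true)
printf '%s\n' "$upper"
```

The matched text is printed as it appears in the input, so the upper case is preserved in the output.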
 

Author Comment

by:sunny82
ID: 41754449
Both solutions, Abhimanyu's and tel2's, are working just fine. Thanks a lot.