• Status: Solved

remove a combination of patterns from a file

I need to remove from a file, using the sed command, any strings that do not contain @british.com or @ba.com.
Asked by: sunny82

Abhimanyu Suri Commented:
Do you need a sed-only solution, or will grep work for you as well? Either way,

let's say you have a file

com.txt

jhfuierbv@british.com
huihwegbfuewv@ba.com
hvd;ivhav@hfrihv.com
fewuivguhuier@nhvn.com
erbv@british.com
egbfuewv@ba.com

egrep "@british.com|@ba.com" com.txt > com_new.txt

jhfuierbv@british.com
huihwegbfuewv@ba.com
erbv@british.com
egbfuewv@ba.com

sed -n -e  '/@british.com/p' -e '/@ba.com/p' com.txt > com_new.txt


jhfuierbv@british.com
huihwegbfuewv@ba.com
erbv@british.com
egbfuewv@ba.com
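
Since the question asked for sed specifically, here is a minimal sketch of the same line-level filter written as a delete rather than a print. It assumes a sed that accepts -E for extended regular expressions (GNU and BSD sed both do), and it escapes the dot so that it only matches a literal '.'. The temporary file simply recreates com.txt from above.

```shell
# Recreate the sample com.txt from the post in a temporary file.
sample=$(mktemp)
printf '%s\n' \
  'jhfuierbv@british.com' \
  'huihwegbfuewv@ba.com' \
  'hvd;ivhav@hfrihv.com' \
  'fewuivguhuier@nhvn.com' \
  'erbv@british.com' \
  'egbfuewv@ba.com' > "$sample"

# Delete (d) every line that does NOT (!) contain one of the two domains.
kept=$(sed -E '/@(british|ba)\.com/!d' "$sample")

printf '%s\n' "$kept"
rm -f "$sample"
```

Like the commands above, this keeps or drops whole lines, so it only suits files with one address per line.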

sunny82 (Author) Commented:
It's not working if the file is tab-delimited.

Say like this ->

erbv@british.com         huihwegbfuewv@ba.com
hvd     ivhav@hfrihv.com
fewuivguhuier@nhvn.com  erbv@british.com
erbv@british.com        egbfuewv@abc.com

It should only show ->
erbv@british.com         huihwegbfuewv@ba.com
erbv@british.com
erbv@british.com

Either egrep or sed will do.

Abhimanyu Suri Commented:
Please try

egrep -o -e "\w*@british.com" -e "\w*@ba.com" com.txt

In case \w doesn't work for you, please try

grep -oh -e "[[:alpha:]]*@british.com" -e "[[:alpha:]]*@ba.com" com.txt
 
tel2 Commented:
Will you accept a Perl solution, Sunny?
Type:
    perl -v
from your command line, and if you don't get an error message, then Perl is probably installed.

Hi Abhimanyu   8)

Abhimanyu Suri Commented:
Hey tel2, I just replied to you on the other one :D

And Sunny, you should go with Perl if your environment allows :)

tel2 Commented:
Good to meet you again, Abhimanyu.   8)

Looking at your 1st solution:
    egrep -o -e "\w*@british.com" -e "\w*@ba.com" com.txt
It will (incorrectly) match things like:
    abc@britishxcom.uk
because '.' matches any character.
Also, '\w' will match '_' (which is good), but won't match things like '-' or '.' which are valid in email addresses.
I think this would solve all the above problems:
    egrep -o "(\w|\.|-)*@(british|ba)\.com"
but if it needs to not match when there's nothing valid before the '@' (e.g. an address which is just '@ba.com'), then you might have to do something like this:
    egrep -o "(\w|\.|-)(\w|\.|-)*@(british|ba)\.com"
I expect that still wouldn't meet the RFC for all valid email addresses, but would hopefully cater for all those which will be run into by the user.

And your 2nd solution:
    grep -oh -e "[[:alpha:]]*@british.com" -e "[[:alpha:]]*@ba.com" com.txt
This also has the above mentioned problems, but won't even match '_' in addresses.
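
The two points above (the unescaped '.' and the empty local part) can be checked with a small sketch. It assumes a grep that supports -o, -E and \w (GNU grep does), and the three sample lines are made up for illustration. The `+` form is just shorthand for the doubled group in the last command above.

```shell
# Hypothetical sample data, one address per line.
sample=$(mktemp)
printf '%s\n' 'abc@britishxcom.uk' '@ba.com' 'me@ba.com' > "$sample"

# Unescaped '.' matches any character, so 'britishxcom' gives a false hit.
unescaped=$(grep -oE '\w*@british.com' "$sample")

# Escaped dot, but '*' still accepts an empty local part ('@ba.com').
star=$(grep -oE '(\w|\.|-)*@(british|ba)\.com' "$sample")

# '+' requires at least one valid character before the '@'.
plus=$(grep -oE '(\w|\.|-)+@(british|ba)\.com' "$sample")

printf 'unescaped: %s\n' "$unescaped"
printf 'star:\n%s\n' "$star"
printf 'plus: %s\n' "$plus"
rm -f "$sample"
```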

Abhimanyu Suri Commented:
The missing \. was a miss on my part.
As for \w, I was not aware that it doesn't match '.' or '-'.

Thanks for the corrections.
Apologies for the brevity, I'm commuting :)

Cheers to Experts Exchange; I would probably never have learned which characters \w matches. Being an Oracle DBA, I just deal with ORA error codes ;)

tel2 Commented:
Glad to be of service, Abhimanyu.

Thanks for introducing me to grep's "-o" switch.

Abhimanyu Suri Commented:
Another thought on the very last solution, i.e.

grep -oh -e "[[:alpha:]]*@british.com" -e "[[:alpha:]]*@ba.com" com.txt

How about replacing alpha with alnum?

That should be able to handle alphanumeric values, though I'm not sure.

tel2 Commented:
Yes, [[:alnum:]] will match alphanumerics (which brings it closer to '\w'), but it still won't handle '_', '-' and '.'.
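
To make the difference concrete, here is a rough sketch comparing the three character classes on made-up addresses. It assumes a grep with -o and -E. The last bracket expression, which adds '_', '.' and '-' to [:alnum:], is one possible way to cover those characters, not something posted earlier in the thread.

```shell
# Hypothetical sample data, one address per line.
sample=$(mktemp)
printf '%s\n' 'first.last@ba.com' 'a_b-c@british.com' 'user99@ba.com' > "$sample"

# Letters only: the local part is cut at the first digit, '.', '_' or '-'.
alpha=$(grep -oE '[[:alpha:]]*@(british|ba)\.com' "$sample")

# Letters and digits: digits survive, but '.', '_' and '-' still cut.
alnum=$(grep -oE '[[:alnum:]]*@(british|ba)\.com' "$sample")

# Letters, digits, '_', '.' and '-' ('-' is last, so it is literal).
full=$(grep -oE '[[:alnum:]_.-]+@(british|ba)\.com' "$sample")

printf 'alpha:\n%s\n\nalnum:\n%s\n\nfull:\n%s\n' "$alpha" "$alnum" "$full"
rm -f "$sample"
```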

sunny82 (Author) Commented:
Thank you. I will try the solutions tomorrow and let you know. As always, many, many thanks.

sunny82 (Author) Commented:
@tel2, yes, any Perl solutions for this would be much appreciated too.

tel2 Commented:
Hi Sunny,

Before I try Perl, what problems are you having with Abhimanyu's grep solutions, which I adjusted in one of my posts above? That is:
    egrep -o "(\w|\.|-)*@(british|ba)\.com"
but if it needs to not match if there's nothing valid before the '@' (e.g. an address which is just '@ba.com'), then you might have to do something like this:
    egrep -o "(\w|\.|-)(\w|\.|-)*@(british|ba)\.com"

And can you give us some more realistic sample data, please.  Feel free to hack the email addresses to preserve privacy.

tel2 Commented:
Hi again Sunny,

I've just realised that this solution:
    egrep -o "(\w|\.|-)(\w|\.|-)*@(british|ba)\.com" com.txt
will fail if it finds something like:
    me@ba.com.au
because it will match it and print:
    me@ba.com

I've just written this Perl solution which avoids that problem:
    perl -ne 'print "$1\n" while(/([\w.-]+\@(british|ba)\.com)[^.\w]/g)' com.txt
but I expect there will be some (hopefully rare) cases it won't handle, hence my request for better test data.
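
For anyone without Perl, a shell-only sketch of the same idea: split each line into whitespace-separated tokens, then keep only tokens that match the pattern in full. It assumes addresses are separated by spaces or tabs, as in the tab-delimited sample earlier, and the data below is made up. Because grep -x must match the whole token, 'me@ba.com.au' is rejected rather than truncated.

```shell
# Hypothetical tab-delimited sample, including the 'me@ba.com.au' case.
sample=$(mktemp)
printf 'erbv@british.com\tme@ba.com.au\n'  >  "$sample"
printf 'hvd\tivhav@hfrihv.com\n'           >> "$sample"
printf 'x@nhvn.com\tfirst.last@ba.com\n'   >> "$sample"

# Turn spaces and tabs into newlines (-s squeezes runs), then keep only
# tokens that match the pattern from start to end (-x).
tokens=$(tr -s ' \t' '\n\n' < "$sample" \
  | grep -xE '[[:alnum:]_.-]+@(british|ba)\.com')

printf '%s\n' "$tokens"
rm -f "$sample"
```

This prints one address per line, so two matching addresses on the same input line are not kept together. Adding -i to grep should also accept upper-case domains such as 'abc@BA.COM'.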

A more perfect solution could probably be written using a Perl module like Email::Address (it is supposed to comply with RFC 2822).  Create a script called matching_addrs.pl (or whatever), and put this in it:
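#!/usr/bin/perl
use strict;
use warnings;

use Email::Address;

# Read lines from STDIN (or from any files named on the command line).
while (my $line = <>)
{
        # Parse every address found on the current line.
        my @all_addrs = Email::Address->parse($line);
        # Keep only addresses that end in @british.com or @ba.com.
        my @matching_addrs = grep { $_->address =~ /\@(british|ba)\.com$/ } @all_addrs;
        print join("\n", @matching_addrs) . "\n" if @matching_addrs;
}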


Now make it executable:
    chmod u+x matching_addrs.pl
and run it, feeding your address file to it as STDIN:
    ./matching_addrs.pl <address.list
If you get an error it may be because the Email::Address module is not yet installed.

If you want to be able to match upper case characters too (e.g. 'abc@BA.COM'), then let me know and I can easily adjust any of my solutions to handle that.

sunny82 (Author) Commented:
Both solutions, Abhimanyu's and tel2's, are working just fine. Thanks a lot.