Solved

remove a combination of patterns from a file

Posted on 2016-08-11
Last Modified: 2016-08-12
I need to remove from a file, using the sed command, any strings that do not contain @british.com or @ba.com.
Question by:sunny82
15 Comments
 
LVL 4

Expert Comment

by:Abhimanyu Suri
ID: 41753096
Do you need a sed-only solution, or will grep work for you as well? Either way:

Let's say you have a file:

com.txt

jhfuierbv@british.com
huihwegbfuewv@ba.com
hvd;ivhav@hfrihv.com
fewuivguhuier@nhvn.com
erbv@british.com
egbfuewv@ba.com

egrep "@british.com|@ba.com" com.txt > com_new.txt

jhfuierbv@british.com
huihwegbfuewv@ba.com
erbv@british.com
egbfuewv@ba.com

sed -n -e  '/@british.com/p' -e '/@ba.com/p' com.txt > com_new.txt


jhfuierbv@british.com
huihwegbfuewv@ba.com
erbv@british.com
egbfuewv@ba.com
 

Author Comment

by:sunny82
ID: 41753125
It's not working if the file is tab-delimited.

For example:

erbv@british.com         huihwegbfuewv@ba.com
hvd     ivhav@hfrihv.com
fewuivguhuier@nhvn.com  erbv@british.com
erbv@british.com        egbfuewv@abc.com

It should show only:
erbv@british.com         huihwegbfuewv@ba.com
erbv@british.com
erbv@british.com

Either egrep or sed will do.
 
LVL 4

Expert Comment

by:Abhimanyu Suri
ID: 41753148
Please try

egrep -o -e "\w*@british.com" -e "\w*@ba.com" com.txt

In case \w doesn't work for you, please try

grep -oh -e "[[:alpha:]]*@british.com" -e "[[:alpha:]]*@ba.com" com.txt
 
LVL 11

Expert Comment

by:tel2
ID: 41753155
Will you accept a Perl solution, Sunny?
Type:
    perl -v
from your command line, and if you don't get an error message, then Perl is probably installed.

Hi Abhimanyu   8)
 
LVL 4

Expert Comment

by:Abhimanyu Suri
ID: 41753164
Hey tel2, I just replied to you on the other one :D

And Sunny, you should go with Perl if your environment allows :)
 
LVL 11

Expert Comment

by:tel2
ID: 41753176
Good to meet you again, Abhimanyu.   8)

Looking at your 1st solution:
    egrep -o -e "\w*@british.com" -e "\w*@ba.com" com.txt
It will (incorrectly) match things like:
    abc@britishxcom.uk
because '.' matches any character.
Also, '\w' will match '_' (which is good), but won't match things like '-' or '.' which are valid in email addresses.
I think this would solve all the above problems:
    egrep -o "(\w|\.|-)*@(british|ba)\.com"
but if it needs to not match when there's nothing valid before the '@' (e.g. an address which is just '@ba.com'), then you might have to do something like this:
    egrep -o "(\w|\.|-)(\w|\.|-)*@(british|ba)\.com"
I expect that still wouldn't meet the RFC for all valid email addresses, but it should hopefully cover all those the user will run into.

And your 2nd solution:
    grep -oh -e "[[:alpha:]]*@british.com" -e "[[:alpha:]]*@ba.com" com.txt
This also has the above-mentioned problems, and it won't even match '_' in addresses.
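
To see the difference between the unescaped and the escaped dot, here is a minimal sketch. The sample addresses and the function names loose_match and strict_match are made up for illustration, and it assumes a grep that supports -o (such as GNU or BSD grep). It uses a POSIX bracket expression instead of \w so that '.' and '-' are allowed before the '@':

```shell
#!/bin/sh
# Hypothetical sample data: one address that should NOT match, one that should.
sample='abc@britishxcom.uk
joe.bloggs@british.com'

# Unescaped dot: '.' matches any character, so "britishxcom" slips through,
# and without '.' in the character class "joe." is cut off the second address.
loose_match() {
    printf '%s\n' "$sample" | grep -E -o '[[:alnum:]_]*@british.com'
}

# Escaped dot, plus a bracket expression that also allows '.' and '-'.
strict_match() {
    printf '%s\n' "$sample" | grep -E -o '[[:alnum:]_.-]+@(british|ba)\.com'
}

loose_match
strict_match
```

The strict pattern still shares the limits described above: it is not a full RFC check.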
 
LVL 4

Expert Comment

by:Abhimanyu Suri
ID: 41753184
The \. was a miss.
\w: I was not aware that it doesn't match '.' or '-'.

Thanks for the corrections.
Apologies for the brevity; I'm commuting :)

Cheers to Experts Exchange. I would probably never have known which characters \w matches; being an Oracle DBA, I just deal with ORA error codes ;)
 
LVL 11

Expert Comment

by:tel2
ID: 41753190
Glad to be of service, Abhimanyu.

Thanks for introducing me to grep's "-o" switch.
 
LVL 4

Expert Comment

by:Abhimanyu Suri
ID: 41753207
Another thought on the very last solution, i.e.

grep -oh -e "[[:alpha:]]*@british.com" -e "[[:alpha:]]*@ba.com" com.txt

How about replacing alpha with alnum?

That should be able to handle alphanumeric values, though I'm not sure.
 
LVL 11

Expert Comment

by:tel2
ID: 41753210
Yes, [[:alnum:]] will match alphanumerics (which brings it closer to '\w'), but it still won't handle '_', '-' & '.'.
 

Author Comment

by:sunny82
ID: 41753229
Thank you. I will try the solutions tomorrow and let you know. As always, many, many thanks.
 

Author Comment

by:sunny82
ID: 41753230
@tel2, yes, any Perl solution for this would also be much appreciated.
 
LVL 11

Expert Comment

by:tel2
ID: 41753246
Hi Sunny,

Before I try Perl, what problems are you having with Abhimanyu's grep solutions, which I adjusted in one of my posts above? That is:
    egrep -o "(\w|\.|-)*@(british|ba)\.com"
but if it needs to not match if there's nothing valid before the '@' (e.g. an address which is just '@ba.com'), then you might have to do something like this:
    egrep -o "(\w|\.|-)(\w|\.|-)*@(british|ba)\.com"

And can you give us some more realistic sample data, please?  Feel free to hack the email addresses to preserve privacy.
 
LVL 11

Accepted Solution

by:
tel2 earned 500 total points
ID: 41753258
Hi again Sunny,

I've just realised that this solution:
    egrep -o "(\w|\.|-)(\w|\.|-)*@(british|ba)\.com" com.txt
will fail if it finds something like:
    me@ba.com.au
because it will match it and print:
    me@ba.com

I've just written this Perl solution which avoids that problem:
    perl -ne 'print "$1\n" while(/([\w.-]+\@(british|ba)\.com)[^.\w]/g)' com.txt
but I expect there will be some (hopefully rare) cases it won't handle, hence my request for better test data.
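
If the addresses are always separated by whitespace (tabs, spaces or newlines, as in the sample data earlier in this thread), the 'me@ba.com.au' problem can probably also be avoided without Perl, by putting each field on its own line and anchoring the pattern at both ends. This is only a sketch under that assumption; the function name matching_addrs and the sample data are hypothetical:

```shell
#!/bin/sh
# Print only whole whitespace-separated fields that are @british.com or
# @ba.com addresses. Reads the address list from STDIN.
matching_addrs() {
    # tr puts every field on its own line; the ^...$ anchors then reject
    # anything with extra text after ".com" (e.g. "me@ba.com.au").
    tr -s '[:space:]' '\n' |
        grep -E '^[[:alnum:]_.-]+@(british|ba)\.com$'
}

# Hypothetical tab-delimited sample, similar to the data posted above.
printf 'erbv@british.com\thuihwegbfuewv@ba.com\nme@ba.com.au\tfoo@abc.com\n' |
    matching_addrs
```

It could be run against a real file with something like: matching_addrs <com.txt. Note that each matching address is printed on its own line, and addresses embedded in other text (e.g. inside angle brackets) would be missed.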

A more robust solution could probably be written using a Perl module like Email::Address (it is supposed to comply with RFC 2822).  Create a script called matching_addrs.pl (or whatever), and put this in it:
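#!/usr/bin/perl

use strict;
use warnings;
use Email::Address;

# Read each input line (from files named on the command line, or STDIN).
while (my $line = <>)
{
        # Parse every email address found on the line.
        my @all_addrs = Email::Address->parse($line);
        # Keep only the addresses ending in @british.com or @ba.com.
        my @matching_addrs = grep { /\@(british|ba)\.com$/ } @all_addrs;
        print join("\n", @matching_addrs) . "\n" if @matching_addrs;
}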


Now make it executable:
    chmod u+x matching_addrs.pl
and run it, feeding your address file to it as STDIN:
    ./matching_addrs.pl <address.list
If you get an error it may be because the Email::Address module is not yet installed.

If you want to be able to match upper-case characters too (e.g. 'abc@BA.COM'), then let me know and I can easily adjust any of my solutions to handle that.
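
For the grep-based patterns in this thread, upper-case addresses can be picked up with grep's standard -i option. A minimal sketch follows; the function name matching_addrs_nocase and the sample line are made up, and -o assumes GNU or BSD grep. It keeps the limitation already mentioned (it would still print 'me@ba.com' for 'me@ba.com.au'):

```shell
#!/bin/sh
# Extract @british.com / @ba.com addresses regardless of letter case.
# Reads from STDIN; -i makes the whole pattern case-insensitive.
matching_addrs_nocase() {
    grep -E -o -i '[[:alnum:]_.-]+@(british|ba)\.com'
}

# Hypothetical tab-delimited sample line.
printf 'abc@BA.COM\tJoe@British.com\tx@other.com\n' | matching_addrs_nocase
```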
 

Author Comment

by:sunny82
ID: 41754449
Both solutions, Abhimanyu's and tel2's, are working just fine. Thanks a lot.
