sunny82

How to remove all email addresses from a comma-delimited file except those from the Bloomberg domain

I have a comma-separated file containing lots of email addresses. I want to delete all email addresses except those from the @bloomberg.net domain. How can I do this in Unix?
Gerwin Jansen

grep "@bloomberg.net" original.csv > onlybloomberg.csv
(this gets you the bloomberg mail addresses in a new file)

If you want to delete the other addresses, move the file:

mv onlybloomberg.csv original.csv
(careful: original csv file gets overwritten)

The above assumes you have one address per line in the csv file.
sunny82

ASKER

So my file is like this-

1, abc@yahoo.com, pqr@gmail.com, abc@bloomberg.net,
2, pqr@bloomberg.net, abc@gmail.com, abc@hotmail.com, mno@yahoo.com
3,.................

I want to have-

1, abc@bloomberg.net,
2,pqr@bloomberg.net,
3,...............
So basically, remove only the non-Bloomberg email addresses but keep everything else.

How can I do this?
You need to use perl, sed, or awk. I prefer perl ...
Test on a copy first, even though this command creates a backup file.
perl -pi.bal -e 's/([a-zA-Z0-9.-]+\@[^bloomberg\.com])//g;' filename

If that works, you'll be left with many commas that you would need to clear using

sed -pi.baksed -e 's/\,[,]+/\,/g' filename
This command may need to be run several times, until the only commas left to remove are those at the start and end of each line.
...
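
For the comma cleanup step, a single pass can be enough if the pattern matches a whole run of commas, including any spaces between them. A minimal sketch, assuming the leftovers look like ",,," or ", , ,"; collapse_commas is a hypothetical helper name, and it writes to standard output instead of editing the file in place:

```shell
# collapse_commas: hypothetical helper; reads lines on stdin.
# Replaces each run of commas (optionally separated by spaces) with one comma.
collapse_commas() {
  sed 's/,[ ,]*,/,/g'
}

# Example line as it might look after the unwanted addresses were deleted.
printf '%s\n' '1, , , abc@bloomberg.net,' | collapse_commas
```

This leaves single commas untouched, so it should be safe to run on a line that is already clean.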
sunny82

ASKER

Not working. The file has the same output as the original.
Can you try:
sed 's/ [a-z0-9.-]*@bloomberg.net,//g' yourfile.txt


Do you get the output you would expect?
sunny82

ASKER

I tried ->

sed 's/ [a-z0-9.-]*@bloomberg.net,//g' test.csv > test1.csv

test.csv and test1.csv have the same output.

test.csv
--------------
1,abc@bloomberg.net,abc,
2,abc@yahoo.com,pqr@bloomberg.net,
3,abc,pqr@bloomberg.net,mnr@gmail.com,

Desired output--
---------------------------------
1,abc@bloomberg.net,
2,pqr@bloomberg.net,
3,pqr@bloomberg.net,
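
One way to get from this test.csv to the desired output is to let awk split on commas and rebuild each line from the first field plus any @bloomberg.net fields. This is only a sketch under the assumptions that the first field is always the line number and that fields contain no quoted commas; keep_bloomberg is a hypothetical helper name:

```shell
# keep_bloomberg: hypothetical helper; reads CSV lines on stdin.
keep_bloomberg() {
  awk -F',' '{
    out = $1 ","                       # always keep the leading line number
    for (n = 2; n <= NF; n++) {        # look at every remaining field
      f = $n
      gsub(/^[ \t]+|[ \t]+$/, "", f)   # trim surrounding spaces
      if (f ~ /@bloomberg\.net$/)
        out = out f ","
    }
    print out
  }'
}

printf '%s\n' \
  '1,abc@bloomberg.net,abc,' \
  '2,abc@yahoo.com,pqr@bloomberg.net,' \
  '3,abc,pqr@bloomberg.net,mnr@gmail.com,' | keep_bloomberg
```

If a line holds more than one @bloomberg.net address, this version keeps them all on the same output line.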
sunny82

ASKER

Something like this worked for me in Perl, but I'm still thinking about how to do it as a Unix one-liner ->
------------------------------------------------------------------------

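use strict;
use warnings;

# Input and output paths are taken from the command line.
my ($input_file, $output_file) = @ARGV;
die "Usage: $0 input_file output_file\n" unless defined $output_file;

open(my $in,  '<', $input_file)  or die "Can't open $input_file: $!\n";
open(my $out, '>', $output_file) or die "Can't open $output_file: $!\n";

while (my $line = <$in>) {
    chomp $line;
    my @results = split /,/, $line;
    # Field 0 is the line number; print it once per @bloomberg.net address.
    foreach my $i (1 .. $#results) {
        if ($results[$i] =~ /\@bloomberg\.net/) {
            print $out join(',', $results[0], $results[$i]), "\n";
        }
    }
}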
I totally misread your question. Can you try this:
awk '{printf "%s ", $1} { for (n=1 ; n<=NF; n++) if ($n ~ /bloomberg.net/ ) printf "%s ", $n } { printf "\n" }' test.csv


One liner :)
sunny82

ASKER

I tried the awk command as well ->

awk '{printf "%s ", $1} { for (n=1 ; n<=NF; n++) if ($n ~ /bloomberg.net/ ) printf "%s ", $n } { printf "\n" }' test.csv > test1.csv

Output of test1.csv
--------------------------

1,abc@bloomberg.net,abc, 1,abc@bloomberg.net,abc,
2,abc@yahoo.com,pqr@bloomberg.net, 2,abc@yahoo.com,pqr@bloomberg.net,
3,abc,pqr@bloomberg.net,mnr@gmail.com, 3,abc,pqr@bloomberg.net,mnr@gmail.com,

original file test.csv
--------------------------------------
1,abc@bloomberg.net,abc,
2,abc@yahoo.com,pqr@bloomberg.net,
3,abc,pqr@bloomberg.net,mnr@gmail.com,
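
The doubled output is consistent with awk's default field splitting. Without -F, awk splits on whitespace, so a line with no spaces is a single field: $1 is the whole line, and it is printed once by the first printf and again by the loop because it contains bloomberg.net. A quick way to check how many fields awk sees, using a hypothetical helper named count_fields:

```shell
# count_fields: hypothetical helper; prints the number of fields awk sees per line.
# An optional first argument is passed to awk as the field separator.
count_fields() {
  if [ -n "${1:-}" ]; then
    awk -F"$1" '{ print NF }'
  else
    awk '{ print NF }'
  fi
}

line='2,abc@yahoo.com,pqr@bloomberg.net,'
printf '%s\n' "$line" | count_fields        # default splitting on whitespace: prints 1
printf '%s\n' "$line" | count_fields ','    # splitting on commas: prints 4
```

The count with -F',' is 4 and not 3 because the trailing comma produces an empty last field.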
Are the email addresses all on a single line, or are they spread across multiple lines?
My negative pattern match focused on the domain and ...... likely failed because of that.
Must it be a single line?
Going the route you have, either print each match as you do now, or use push to add each Bloomberg address onto an array and join them all at the end.
The sample input you've shown so far always has just a single @bloomberg.net address per line. Is there ever a line with more than one such address?

It seems that all you want to do is copy each @bloomberg.net address to a new output line, ignoring everything else. If there are two such addresses on one line, would you want them both on the same line in the output?
It is working on my end, what Unix OS and awk version do you have?

The awk I created will print the first field (containing the number) and then go over every field, match the requested mail address and print it if found.
SOLUTION
arnold
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER CERTIFIED SOLUTION
sunny82

ASKER

Thank you both; the Perl and awk solutions now work fine. Yes, there can be multiple Bloomberg addresses on one line, but in the output we can print them on separate lines.
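
Since the accepted solutions are not shown here, the following is only a sketch of how the stated behavior (one output line per Bloomberg address, each prefixed with the line number) could be done as an awk one-liner. It is not the accepted answer. bloomberg_per_line is a hypothetical helper name, and the sample line with two Bloomberg addresses is made up for illustration:

```shell
# bloomberg_per_line: hypothetical helper; reads CSV lines on stdin.
# Prints "<first field>,<address>" once for every @bloomberg.net address.
bloomberg_per_line() {
  awk -F',' '{
    for (n = 2; n <= NF; n++) {
      f = $n
      gsub(/^[ \t]+|[ \t]+$/, "", f)   # trim surrounding spaces
      if (f ~ /@bloomberg\.net$/)
        print $1 "," f
    }
  }'
}

printf '%s\n' \
  '1,abc@bloomberg.net,xyz@bloomberg.net,abc,' \
  '2,abc@yahoo.com,pqr@bloomberg.net,' | bloomberg_per_line
```

Like the Perl script earlier in the thread, this prints no trailing comma and skips lines that have no Bloomberg address.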