
How to remove all email addresses from a comma-delimited file except those from the Bloomberg domain

I have a comma-separated file containing lots of email addresses. I want to delete all email addresses except those from the @bloomberg.net domain. How can I do this in Unix?

Gerwin Jansen

grep "@bloomberg.net" original.csv > onlybloomberg.csv
(this writes every line that contains a Bloomberg address to a new file)

If you want to remove the other addresses from the original, overwrite it with the new file:

mv onlybloomberg.csv original.csv
(careful: the original CSV file gets overwritten)

The above assumes you have one address per line in the csv file.
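If you want only the addresses themselves rather than whole lines, `grep -o` prints just the matching part of each line, one match per output line. A minimal sketch, assuming addresses contain no spaces or commas; `bloomberg_addresses` is a hypothetical helper name, and `-o` is a GNU/BSD grep extension rather than POSIX:

```shell
# Hypothetical helper: print each @bloomberg.net address on its own line.
# Assumes an address is a run of characters without spaces or commas.
bloomberg_addresses() {
    grep -o '[^, ]*@bloomberg\.net' "$@"
}

# Usage: bloomberg_addresses original.csv > onlybloomberg.csv
```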

sunny82

ASKER

My file looks like this:

1, abc@yahoo.com, pqr@gmail.com, abc@bloomberg.net,
2, pqr@bloomberg.net, abc@gmail.com, abc@hotmail.com, mno@yahoo.com
3,.................

I want this:

1, abc@bloomberg.net,
2,pqr@bloomberg.net,
3,...............
So basically, remove only the non-Bloomberg email addresses and keep everything else.

How can I do this?

arnold

You need to use perl, sed or awk; I prefer perl. .......
Test first on a copy, even though this creates a backup file on each run:
perl -pi.bal -e 's/([a-zA-Z0-9.-]+\@[^bloomberg\.com])//g;' filename

If that works, you'll be left with many stray commas, which you can clear using:

perl -pi.baksed -e 's/,,+/,/g' filename
You may need to run this several times, until only the commas at the front and end of each line remain to be removed.
...

sunny82

ASKER

Not working. The file has the same content as the original.

Gerwin Jansen

Can you try:

sed 's/ [a-z0-9.-]*@bloomberg.net,//g' yourfile.txt


Do you get the output you would expect?

sunny82

ASKER

I tried:

sed 's/ [a-z0-9.-]*@bloomberg.net,//g' test.csv > test1.csv

test.csv and test1.csv have the same content.

test.csv
--------------
1,abc@bloomberg.net,abc,
2,abc@yahoo.com,pqr@bloomberg.net,
3,abc,pqr@bloomberg.net,mnr@gmail.com,

Desired output
---------------------------------
1,abc@bloomberg.net,
2,pqr@bloomberg.net,
3,pqr@bloomberg.net,

sunny82

ASKER

Something like this worked for me in Perl, but I'm still thinking about how to do it as a Unix one-liner:
------------------------------------------------------------------------

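use strict;
use warnings;

# Usage: perl script.pl input.csv output.csv
my ($input_file, $output_file) = @ARGV;

open(my $in,  '<', $input_file)  or die "Can't open $input_file: $!\n";
open(my $out, '>', $output_file) or die "Can't open $output_file: $!\n";

while (my $line = <$in>) {
    chomp $line;
    my @results = split /,/, $line;
    # Field 0 holds the record number; pair it with each Bloomberg address
    foreach my $i (1 .. $#results) {
        if ($results[$i] =~ /\@bloomberg\.net/) {
            print $out join(',', $results[0], $results[$i]), "\n";
        }
    }
}

close($in);
close($out);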

Gerwin Jansen

I totally misread your question. Can you try this:

awk '{printf "%s ", $1} { for (n=1 ; n<=NF; n++) if ($n ~ /bloomberg.net/ ) printf "%s ", $n } { printf "\n" }' test.csv


One liner :)

sunny82

ASKER

I also tried the awk command:

awk '{printf "%s ", $1} { for (n=1 ; n<=NF; n++) if ($n ~ /bloomberg.net/ ) printf "%s ", $n } { printf "\n" }' test.csv > test1.csv

Output of test1.csv
--------------------------

1,abc@bloomberg.net,abc, 1,abc@bloomberg.net,abc,
2,abc@yahoo.com,pqr@bloomberg.net, 2,abc@yahoo.com,pqr@bloomberg.net,
3,abc,pqr@bloomberg.net,mnr@gmail.com, 3,abc,pqr@bloomberg.net,mnr@gmail.com,

original file test.csv
--------------------------------------
1,abc@bloomberg.net,abc,
2,abc@yahoo.com,pqr@bloomberg.net,
3,abc,pqr@bloomberg.net,mnr@gmail.com,

arnold

Are the email addresses all on a single line, or are they spread across multiple lines?
My negative pattern match focused on the domain and ...... likely failed because of that.
Must it be a single line?
Going the route you have, either output each match directly, or use push to add each Bloomberg address onto an array and join them all at the end.

Member_2_276102

The sample input you've shown so far always has just a single @bloomberg.net address per line. Is there ever a line with more than one such address?

It seems that all you want to do is copy each @bloomberg.net address to a new output line, ignoring everything else. If there are two such addresses on one line, would you want them both on the same line in the output?

Gerwin Jansen

It is working on my end, what Unix OS and awk version do you have?

The awk I created will print the first field (containing the number) and then go over every field, match the requested mail address and print it if found.
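A likely reason the awk one-liner behaved differently for the asker: awk splits fields on whitespace by default, so a line such as `1,abc@bloomberg.net,abc,` with no spaces is a single field. That field is printed once as `$1` and again because it matches, which fits the doubled output shown earlier; with spaces after the commas (as in the first sample), `$1` is just `1,`. A sketch with `-F,` that prints the trailing comma shown in the desired output; `bloomberg_fields` is a hypothetical name:

```shell
# Hypothetical helper: -F, makes awk split fields on commas, so $1 is the
# record number and $2..$NF are the individual entries.
bloomberg_fields() {
    awk -F, '{
        out = $1                                  # record number
        for (n = 2; n <= NF; n++)                 # start after the number
            if ($n ~ /@bloomberg\.net/) out = out "," $n
        print out ","                             # trailing comma as in the desired output
    }' "$@"
}

# Usage: bloomberg_fields test.csv > test1.csv
```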

SOLUTION

arnold


ASKER CERTIFIED SOLUTION

Gerwin Jansen


sunny82

ASKER

Thank you, both the Perl and awk solutions now work fine. Yes, there can be multiple Bloomberg addresses on one line, but in the output we can print them on separate lines.
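Since several Bloomberg addresses can share a line and one per output line is acceptable, an awk sketch along the same lines could print one `number,address,` line per match. It assumes comma-separated input, and `bloomberg_split` is a hypothetical name:

```shell
# Hypothetical helper: one output line per Bloomberg address, each prefixed
# with the record number from field 1.
bloomberg_split() {
    awk -F, '{
        for (n = 2; n <= NF; n++)
            if ($n ~ /@bloomberg\.net/) print $1 "," $n ","
    }' "$@"
}

# Usage: bloomberg_split test.csv > test1.csv
```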