sunny82
asked on
How to remove all email addresses from a comma-delimited file except those from the Bloomberg domain
I have a comma-separated file containing lots of email addresses. I want to delete all email addresses except those from the @bloomberg.net domain. How can I do this in Unix?
ASKER
So my file is like this-
1, abc@yahoo.com, pqr@gmail.com, abc@bloomberg.net,
2, pqr@bloomberg.net, abc@gmail.com, abc@hotmail.com, mno@yahoo.com
3,.................
I want to have-
1, abc@bloomberg.net,
2,pqr@bloomberg.net,
3,...............
So basically, remove all non-Bloomberg email addresses but keep everything else.
How can I do this?
You need to use perl, sed, or awk; I prefer perl.
Test first on a copy, even though this creates a backup file on each run.
perl -pi.bal -e 's/([a-zA-Z0-9.-]+\@[^bloomberg\.com])//g;' filename
If that works, you'll be left with many commas that you would need to clear using
sed -pi.baksed -e 's/\,[,]+/\,/g' filename
This command will need to be run several times, until the only commas left to remove are the ones at the front and the end.
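Repeated passes may not be needed for the comma cleanup: one extended-regex substitution can squeeze each run of commas in a single pass. A minimal sketch, assuming a sed that accepts -E (GNU and BSD sed both do); the function name collapse_commas is made up for illustration and is not part of the original answer:

```shell
# collapse_commas: hypothetical helper, reads stdin and writes stdout.
# Replaces every run of two or more commas (optionally separated by
# spaces) with a single comma.
collapse_commas() {
    sed -E 's/,( *,)+/,/g'
}

printf '1, , , abc@bloomberg.net,\n' | collapse_commas
# prints: 1, abc@bloomberg.net,
```

A single trailing comma is left alone, which matches the desired output shown in the question.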
...
ASKER
Not working. The file has the same content as the original.
Can you try:
sed 's/ [a-z0-9.-]*@bloomberg.net,//g' yourfile.txt
Do you get the output you would expect?
ASKER
I tried ->
sed 's/ [a-z0-9.-]*@bloomberg.net, //g' test.csv > test1.csv
test.csv and test1.csv have the same content.
test.csv
--------------
1,abc@bloomberg.net,abc,
2,abc@yahoo.com,pqr@bloomberg.net,
3,abc,pqr@bloomberg.net,mnr@gmail.com,
Desired output--
--------------------------
1,abc@bloomberg.net,
2,pqr@bloomberg.net,
3,pqr@bloomberg.net,
ASKER
Something like this worked for me in Perl, but I'm still thinking about how to do it as a Unix one-liner:
--------------------------
open (IN, "<$input_file") or die "Can't open $input_file\n";
open (OUT, ">$output_file") or die "Can't open $output_file\n";
while (<IN>) {
    chomp($_);
    # Split the line on commas; element 0 is the row number.
    @results=split(/\,/,$_);
    foreach my $i (1 .. $#results) {
        # Write the row number plus each @bloomberg.net address on its own line.
        if ($results[$i] =~ /\@bloomberg\.net/) {
            print OUT join (',',$results[0],$results[$i]), "\n";
        }
    }
}
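The Perl loop above can be approximated from the shell with awk, provided awk is told to split on commas. This is a rough sketch, not the accepted solution (which is not shown in this thread). The function name bloomberg_rows is invented here, and trimming the spaces around each field is an added assumption that the Perl version does not make:

```shell
# bloomberg_rows: hypothetical helper. For every comma-separated field
# after the first that ends in @bloomberg.net, prints "<first field>,<address>"
# on its own line. Reads the named files, or stdin when none are given.
# Usage: bloomberg_rows test.csv > test1.csv
bloomberg_rows() {
    awk -F',' '{
        for (n = 2; n <= NF; n++) {
            field = $n
            gsub(/^[ \t]+|[ \t]+$/, "", field)   # trim padding around the address
            if (field ~ /@bloomberg\.net$/)
                print $1 "," field
        }
    }' "$@"
}
```

A line holding two Bloomberg addresses produces two output lines, each starting with the same row number.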
I totally misread your question. Can you try this:
awk '{printf "%s ", $1} { for (n=1 ; n<=NF; n++) if ($n ~ /bloomberg.net/ ) printf "%s ", $n } { printf "\n" }' test.csv
One liner :)
ASKER
I also tried the awk command ->
awk '{printf "%s ", $1} { for (n=1 ; n<=NF; n++) if ($n ~ /bloomberg.net/ ) printf "%s ", $n } { printf "\n" }' test.csv > test1.csv
Output of test1.csv
--------------------------
1,abc@bloomberg.net,abc, 1,abc@bloomberg.net,abc,
2,abc@yahoo.com,pqr@bloomberg.net, 2,abc@yahoo.com,pqr@bloomberg.net,
3,abc,pqr@bloomberg.net,mnr@gmail.com, 3,abc,pqr@bloomberg.net,mnr@gmail.com,
original file test.csv
--------------------------
1,abc@bloomberg.net,abc,
2,abc@yahoo.com,pqr@bloomberg.net,
3,abc,pqr@bloomberg.net,mnr@gmail.com,
Are the email addresses all on a single line, or are they spread across multiple lines?
My negative pattern match focused on the domain and ...... likely failed because of that.
Must it be a single line?
Going the route you have, either output each match directly or use push to add the Bloomberg addresses onto an array and join them all at the end.
The sample input you've shown so far always has just a single @bloomberg.net address per line. Is there ever a line with more than one such address?
It seems that all you want to do is copy each @bloomberg.net address to a new output line, ignoring everything else. If there are two such addresses on one line, would you want them both on the same line in the output?
It is working on my end. What Unix OS and awk version do you have?
The awk I created will print the first field (containing the number) and then go over every field, match the requested mail address and print it if found.
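The field loop described here only works if awk splits each line at the commas. By default awk splits on whitespace, so on a file with no spaces the whole line becomes the first field, which appears consistent with the duplicated lines reported above. A minimal sketch with the separator set explicitly; the name keep_bloomberg and the trimming step are assumptions added for illustration:

```shell
# keep_bloomberg: hypothetical helper. Prints the first field followed by
# every @bloomberg.net field from the same line, each followed by a comma.
# Reads the named files, or stdin when none are given.
# Usage: keep_bloomberg test.csv > test1.csv
keep_bloomberg() {
    awk -F',' '{
        out = $1 ","
        for (n = 2; n <= NF; n++) {
            field = $n
            gsub(/^[ \t]+|[ \t]+$/, "", field)   # trim padding around the address
            if (field ~ /@bloomberg\.net$/)
                out = out field ","
        }
        print out
    }' "$@"
}
```

The loop starts at field 2 so the row number is not tested against the pattern, and lines with no Bloomberg address still print their row number.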
ASKER
Thank you both. The perl and awk solutions now work fine. Yes, there can be multiple Bloomberg addresses in one line, but in the output we can print them on separate lines.
(this gets you the bloomberg mail addresses in a new file)
If you want to delete the other addresses, move the file:
mv onlybloomberg.csv original.csv
(careful: original csv file gets overwritten)
The above assumes you have one address per line in the csv file.
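The command that produced onlybloomberg.csv is not shown in this comment. One way to get the same effect, sketched here under the same one-address-per-line assumption, is to filter with grep into a temporary file and keep a backup before overwriting. The function name filter_in_place and the .bak and .tmp suffixes are hypothetical choices, not part of the original answer:

```shell
# filter_in_place: hypothetical helper. Keeps only the lines of the given
# file that contain an @bloomberg.net address. The untouched original is
# saved as <file>.bak before the file is replaced.
filter_in_place() {
    file=$1
    cp -- "$file" "$file.bak" || return 1
    grep '@bloomberg\.net' "$file.bak" > "$file.tmp"
    # grep exits 1 when nothing matches, which still leaves a valid empty
    # result; any higher status is a real error, so discard the temp file.
    [ $? -le 1 ] || { rm -f -- "$file.tmp"; return 1; }
    mv -- "$file.tmp" "$file"
}

# Usage: filter_in_place original.csv
```

Keeping the .bak copy makes the overwrite reversible, which addresses the warning above about the original file being replaced.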