chrismdoyle
asked on
Dump email addresses out of flat files using Regex and sed
I have a folder full of CSV and text files that I need to pull all of the email addresses out of. These files are stored on a Linux server, so I'd like to use the command line or a script to do the following:
For each file in a folder (whether it's comma delimited, tab delimited, or | delimited)
Search each line for the email address, and send all of those addresses to a new file
Example:
File1.csv:
chris@gmail.com, Chris, Last name, phone
1234, chris@gmail.com, Chris, Last Name, phone
output to File1-emails.csv:
chris@gmail.com
chris@gmail.com
File2.csv:
"chris@gmail.com", Chris, Last name, phone
output to File2-emails.csv:
chris@gmail.com
File3.txt:
chris@gmail.com|chris|last name| phone
output to File3-emails.txt:
chris@gmail.com
The point here is that I'd like the script to work on files of different formats.
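The steps above can be sketched roughly as follows. This sketch swaps sed for `grep -o`, which prints only the matched text; `-o` is a GNU/BSD extension rather than POSIX, so its availability on the server is an assumption. The function name `extract_emails`, the simplified address pattern in `EMAIL_RE`, and the rule for skipping previously generated `-emails` files are illustrative choices, not something taken from the question.

```shell
# Sketch: pull email addresses out of every file in a folder.
# Assumes a grep that supports -o (GNU and BSD grep do).
# The pattern is a deliberately simple approximation of an address.
EMAIL_RE='[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}'

extract_emails() {
    dir=$1
    for f in "$dir"/*; do
        [ -f "$f" ] || continue                 # regular files only
        name=${f##*/}                           # strip the directory part
        case $name in
            *-emails|*-emails.*) continue ;;    # skip output from earlier runs
        esac
        case $name in
            *.*) stem=${name%.*}; ext=.${name##*.} ;;
            *)   stem=$name;      ext= ;;
        esac
        # -o prints each match on its own line, whatever the delimiter is.
        # A file with no addresses makes grep exit non-zero; that is not an error here.
        grep -oE "$EMAIL_RE" "$f" > "$dir/$stem-emails$ext" || true
    done
}
```

Running `extract_emails /path/to/folder` on the three sample files should leave `File1-emails.csv`, `File2-emails.csv` and `File3-emails.txt` next to the originals, each holding one address per line.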
ASKER
Very close ^ What about the output on a file like this:
KIFFANY|VANZANT|38122|_lil _sexy131@h otmail.com |
ASKER CERTIFIED SOLUTION
ASKER
Just an update: the above solution is not complete yet. As it turns out, it does not capture the rest of the domain name after the first dot, so chris@gmail.com became chris@gmail.
Any help would be appreciated. Could you post the regular expression both inside the sed command and on its own?
I thought I had a . inside of [-A-Za-z0-9_.], but I don't see it now.
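The effect of the missing `.` can be seen with a pair of deliberately simplified sed substitutions. These are illustrative patterns only, not the members-only solution itself: the function names `domain_without_dot` and `domain_with_dot` are made up here, and both patterns assume the address is the first field on the line.

```shell
# Sketch: why a missing '.' in the bracket expression truncates the domain.
# Both patterns capture everything up to '@' plus the run of "domain" characters.

domain_without_dot() {
    # '.' is not in the class, so the capture stops at the first dot
    sed -n 's/^\([^@]*@[-A-Za-z0-9_]*\).*/\1/p'
}

domain_with_dot() {
    # '.' is in the class (literal inside brackets), so the capture
    # runs through the whole domain
    sed -n 's/^\([^@]*@[-A-Za-z0-9_.]*\).*/\1/p'
}

line='chris@gmail.com, Chris, Last name, phone'
printf '%s\n' "$line" | domain_without_dot   # prints chris@gmail
printf '%s\n' "$line" | domain_with_dot      # prints chris@gmail.com
```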
how does
/\(.*["|, ]\)*\([^"|, ]*@[-A-Za-z0-9_.]*\).*/
work for you?
d' file1.csv > File1-emails.csv
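For reference, here is one way the expression quoted above could sit inside a complete sed command. This is a sketch rather than the exact command from the thread: it uses `sed -n` with the `p` flag so that only lines where the substitution succeeded are printed, and the function name `emails_via_sed` is hypothetical.

```shell
# Sketch: the regex from the thread used in a sed substitution.
#   \(.*["|, ]\)*                  group 1: leading text ending in a delimiter
#   \([^"|, ]*@[-A-Za-z0-9_.]*\)   group 2: the token containing '@'
#   .*                             the rest of the line
# -n plus the p flag prints only lines where a substitution happened,
# so lines without an '@' produce no output.
emails_via_sed() {
    sed -n 's/\(.*["|, ]\)*\([^"|, ]*@[-A-Za-z0-9_.]*\).*/\2/p' "$@"
}

# Each of the four sample lines should print chris@gmail.com
emails_via_sed <<'EOF'
chris@gmail.com, Chris, Last name, phone
1234, chris@gmail.com, Chris, Last Name, phone
"chris@gmail.com", Chris, Last name, phone
chris@gmail.com|chris|last name| phone
EOF
```

To write to a file, redirect the output, for example `emails_via_sed file1.csv > File1-emails.csv`. Two limits are worth keeping in mind: the substitution replaces the whole line with a single capture, so a line containing several addresses yields only one of them, and an address broken by stray spaces (as in the `_sexy131@h otmail.com` sample) comes back as just the fragment around the `@`.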