KETTANEH

asked on

What is the easiest way to grab emails using a Linux server (application name)?

Hello,

I have a Linux VPS and I want to use it as below:

I have a list of URLs (e.g. 10,000) and I need an application to extract (collect) any email addresses found at these URLs.

Any recommended application or approach?


thanks
Steven Vona

You can use regular expressions with grep.

For example, if you wanted to find all email addresses in a file named urls.txt, you could run the following command:

grep -E -o --color "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" urls.txt
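For instance, if the file contained a line with an address in it (the address below is made up purely for illustration), only the matching text would be printed:

$ cat urls.txt
Contact webmaster@example.com or visit http://example.com/about
$ grep -E -o --color "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" urls.txt
webmaster@example.com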

KETTANEH

ASKER

Thanks savone for the response.

Will it search inside the text file, or will it search inside the URLs listed in that file?
It will search all the text inside the text file.  The URLs are part of the text inside the file.

URLs are usually websites like http://google.com.  There shouldn't be any email addresses in URLs unless it's a mailto URL, in which case YES it will find all the email addresses.

If you post the file, it may help me understand what you are trying to do.

okay, I will explain more.

I have a file called grook.txt
this file contains:
"
http://www.grook.net/programming/sport-stopwatch
http://www.grook.net/forum/civil-engineering/construction/construction-hand-tools
http://www.grook.net/forum/security/unified-threat-management-comparison-cyberoam
http://www.grook.net/forum/electrical-formulas
"

I want a way to download all these links and get the emails from them.
I think we have to use wget to get the contents of these URLs and scan them using grep.

note: grook.net is my website :)


Thanks
I just wrote a script to do what you want; unfortunately, there are no emails on those pages.

Here is how I set the script up.

First, create a working directory:
mkdir /tmp/sites

Then change into that directory:
cd /tmp/sites

Now create a file with the URLs, one per line, and call it urls.txt:
vi urls.txt

Now create another file for the script called get_emails.sh with the following contents:

#!/bin/bash
# Download every URL listed in the file passed as the first argument,
# then search everything downloaded for email addresses.
for i in $(cat "$1")
do
    wget "$i"
done
# -o prints only the matched text, -r searches all the downloaded files
grep -E -or --color "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" *


Then run the script passing the urls.txt file as an argument:
./get_emails.sh urls.txt

No emails found :(

I just looked over this site, http://www.grook.net/programming/sport-stopwatch, and found that there is no email address on that page.
You can also do this on one line, like so:

for i in $(cat urls.txt); do wget "$i"; done; grep -E -or --color "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" *
hi

just a silly problem

whenever I try to run the .sh file as below:
./get_email.sh urls


-bash: ./get_email.sh: Permission denied
You have to set the permissions to make it executable. So change to the directory where the script is and run the following command as root.

chmod +x get_emails.sh
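You can confirm the change with ls -l; the execute bits should appear after running chmod (the owner, size, and date below are placeholders for illustration):

$ ls -l get_emails.sh
-rw-r--r-- 1 root root 160 Jan  1 12:00 get_emails.sh
$ chmod +x get_emails.sh
$ ls -l get_emails.sh
-rwxr-xr-x 1 root root 160 Jan  1 12:00 get_emails.sh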

I really appreciate your cooperation :)

I will try and report back.
perfect solution .. thanks a lot

just another small point


Is there any way to open more than one session at the same time?
I am not sure what you mean. Can you explain a little?

okay...

currently, I connect to the first URL .. download .. disconnect
then
connect to the second URL .. download .. disconnect

and so on ....


is there any way to connect to several URLs (e.g. 5) at the same time, instead of one by one?
ASKER CERTIFIED SOLUTION
Steven Vona
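One common way to fetch several URLs at the same time (a minimal sketch of the general idea, not necessarily the exact certified answer above) is to send each wget to the background and wait for them all before running grep:

#!/bin/bash
# Sketch only: download the URLs from the file passed as the first argument
# in parallel by backgrounding each wget, then grep the results as before.
while read -r url
do
    wget -q "$url" &      # start this download in the background
done < "$1"
wait                      # wait until all background downloads finish

grep -E -or --color "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" *

An alternative that caps the number of simultaneous downloads is xargs, e.g. xargs -n 1 -P 5 wget -q < urls.txt to fetch five URLs at a time.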
Sending the command to the background will not speed up the process. Anyway, thanks a lot, Savone... you did a great job helping me.


thanks again :)
thanks a lot