KETTANEH
asked on
what is the easiest way to grab emails using a linux server (application name)
Hello,
I have a Linux OS VPS and I want to use it as follows:
I have a list of URLs (e.g. 10,000) and I need an application to extract (collect) any email addresses found at these URLs.
Any recommended application or approach?
thanks
ASKER
Thanks savone for the response.
Will it search inside the text file itself, or inside the pages at the URLs listed in that file?
It will search all the text inside the text file. The URLs are part of the text inside the file.
URLs are usually websites like http://google.com. There shouldn't be any email addresses in URLs unless it's a mailto URL, in which case YES it will find all the email addresses.
If you post the file it may help understand what you are trying to do.
ASKER
okay, I will explain more.
I have a file called grook.txt
this file contains:
"
http://www.grook.net/programming/sport-stopwatch
http://www.grook.net/forum/civil-engineering/construction/construction-hand-tools
http://www.grook.net/forum/security/unified-threat-management-comparison-cyberoam
http://www.grook.net/forum/electrical-formulas
"
I want a way to download all these links and extract the email addresses from them.
I think we have to use wget to fetch the contents of these URLs and then scan them with grep.
note: grook.net is my website :)
Thanks
I just wrote a script to do what you want, unfortunately there are no emails on those pages.
Here is how I set the script up.
first create a working directory:
mkdir /tmp/sites
then change into that directory:
cd /tmp/sites
Now create a file with the URLs, one per line, and call it urls.txt:
vi urls.txt
Now create another file for the script called get_emails.sh with the following contents:
#!/bin/bash
# Download every URL listed in the file passed as the first argument.
for i in `cat $1`
do
wget "$i"
done
# Then search everything in the current directory for email addresses.
grep -E -or --color "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" *
Then run the script passing the urls.txt file as an argument:
./get_emails.sh urls.txt
No emails found :(
I just looked over this site: http://www.grook.net/programming/sport-stopwatch
and found there is no email address on that page.
you can also do this on one line, like so:
for i in `cat urls.txt`; do wget "$i"; done; grep -E -or --color "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" *
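As a quick sanity check, the same regex can be tried against a small sample file before running it over real downloads. The file name and addresses below are purely illustrative:

```shell
# Create a throwaway sample file containing two email addresses (illustrative).
printf 'Contact admin@example.com or sales@example.org for help.\n' > /tmp/sample.txt

# The same ERE used above; -o prints only the matching parts, one per line.
grep -E -o "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" /tmp/sample.txt
# admin@example.com
# sales@example.org
```

Note that `\b` word boundaries are a GNU grep extension to ERE; on other grep implementations the pattern may need adjusting.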
ASKER
hi
just silly problem
whenever I try to run the .sh file as below:
./get_email.sh urls
-bash: ./get_email.sh: Permission denied
You have to set the permissions to make it executable. So change to the directory where the script is and run the following command as root.
chmod +x get_emails.sh
ASKER
I really appreciate your cooperation :)
i will try and report back
ASKER
perfect solution .. thanks a lot
just another small point
Is there anyway to open more than one session at the same time ??
I am not sure what you mean. Can you explain a little?
ASKER
okay...
currently, I connect to the first URL .. download .. disconnect
then
connect to the second URL .. download .. disconnect
and so on ....
Is there any way to connect to several URLs (e.g. 5) at the same time, instead of one by one?
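One common way to fetch several URLs concurrently is `xargs -P`, which keeps up to N commands running at once (the `-P` flag is a GNU/BSD extension, not POSIX). The sketch below is hedged: it substitutes `echo` for `wget` so it runs without network access, and the file names and URLs are placeholders; swapping `echo` for `wget -q` would perform the real downloads:

```shell
# Illustrative list of URLs, one per line (placeholder addresses).
printf 'http://example.com/a\nhttp://example.com/b\nhttp://example.com/c\n' > /tmp/urls_demo.txt

# Run up to 5 fetches in parallel. `echo` stands in for `wget -q` here
# so the sketch works offline; swap it for wget to actually download.
xargs -P 5 -n 1 echo < /tmp/urls_demo.txt > /tmp/fetched.log

# One output line per URL processed.
wc -l < /tmp/fetched.log
```

Whether this actually speeds things up depends on where the bottleneck is: for many small pages on one server, a handful of parallel connections usually helps; if bandwidth is saturated, it will not.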
ASKER CERTIFIED SOLUTION
ASKER
Sending the command to the background did not speed up the process. Anyway, thanks a lot Savone, you did a great job helping me.
thanks again :)
ASKER
thanks a lot
For example, if you wanted to find all email addresses in a file named urls.txt, you could run the following command:
grep -E -o --color "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" urls.txt