What is the easiest way to grab emails using a Linux server? (application name)

Hello,

I have a Linux VPS and I want to use it as below:

I have a list of URLs (e.g. 10,000) and I need an application to extract (collect) any email addresses in these URLs.

Any recommended application or approach?


thanks
KETTANEHAsked:
savoneCommented:
You can use regular expressions with grep.

For example, if you wanted to find all email addresses in a file named urls.txt, you could run the following command:

grep -E -o --color "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" urls.txt
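As a quick sanity check (using a throwaway test file and a made-up address, just for illustration), the pattern picks plain addresses out of ordinary text:

echo "contact us at info@example.com for details" > test.txt
grep -E -o --color "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" test.txt

That should print just info@example.com.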

KETTANEHAuthor Commented:
Thanks savone for the response.

Will it search inside the text file, or will it search inside the URLs listed in that file?
savoneCommented:
It will search all the text inside the text file.  The URLs are part of the text inside the file.

URLs are usually websites like http://google.com.  There shouldn't be any email addresses in URLs unless it's a mailto URL, in which case YES it will find all the email addresses.
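For example (a made-up mailto link, just to illustrate the point), the same pattern will pull the address out of a mailto URL:

echo "http://example.com/contact mailto:sales@example.com" > mailto_test.txt
grep -E -o "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" mailto_test.txt

That should print sales@example.com and nothing else, since the plain http URL contains no @.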

If you post the file, it may help me understand what you are trying to do.

KETTANEHAuthor Commented:
okay, I will explain more.

I have a file called grook.txt
this file contains:
"
http://www.grook.net/programming/sport-stopwatch
http://www.grook.net/forum/civil-engineering/construction/construction-hand-tools
http://www.grook.net/forum/security/unified-threat-management-comparison-cyberoam
http://www.grook.net/forum/electrical-formulas
"

I want a way to download all these links and get the email addresses from them.
I think we have to use wget to get the contents of these URLs and scan them using grep.

note: grook.net is my website :)


Thanks
savoneCommented:
I just wrote a script to do what you want; unfortunately, there are no emails on those pages.

Here is how I set the script up.

First, create a working directory:
mkdir /tmp/sites
Then change into that directory:
cd /tmp/sites

Now create a file with the URLs, one per line, and call it urls.txt:
vi urls.txt

Now create another file for the script called get_emails.sh with the following contents:

#!/bin/bash

# Download each URL listed in the file passed as the first argument
for i in `cat $1`
do
wget $i
done

# Search everything in the current directory for email addresses
grep -E -or --color "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" *


Then run the script passing the urls.txt file as an argument:
./get_emails.sh urls.txt

No emails found :(

I just looked over this site: http://www.grook.net/programming/sport-stopwatch

and found there is no email address on that page.
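For what it's worth, here is a slightly more defensive sketch of the same script (quoted variables, wget -q to keep the output quiet, and a while read loop instead of backticks); the overall approach is exactly the same, so treat it as optional:

#!/bin/bash

# Usage: ./get_emails.sh urls.txt
# Read the URL list line by line and download each page quietly
while read -r url
do
    wget -q "$url"
done < "$1"

# Pull out anything that looks like an email address and remove duplicates
grep -E -oh "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" * | sort -u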
savoneCommented:
You can also do this on one line, like so:

for i in `cat urls.txt`; do wget $i; done; grep -E -or --color "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" *
KETTANEHAuthor Commented:
hi

just a silly problem

whenever I try to run the .sh file as below:
./get_email.sh urls


-bash: ./get_email.sh: Permission denied
savoneCommented:
You have to set the permissions to make it executable. So change to the directory where the script is and run the following command as root.

chmod +x get_emails.sh
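If you want to double-check that the change took effect, ls -l shows the mode bits before and after (the exact values depend on your umask, but you should see x appear):

ls -l get_emails.sh
chmod +x get_emails.sh
ls -l get_emails.sh

Something like -rw-r--r-- before and -rwxr-xr-x after.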

KETTANEHAuthor Commented:
I really appreciate your cooperation :)

I will try and report back
KETTANEHAuthor Commented:
perfect solution .. thanks a lot

just another small point


Is there any way to open more than one session at the same time?
savoneCommented:
I am not sure what you mean. Can you explain a little?

KETTANEHAuthor Commented:
okay...

currently, I connect to the first URL .. download .. disconnect
then
connect to the second URL .. download .. disconnect

and so on ....


is there any way to connect to (e.g. 5) URLs at the same time, instead of one by one?
savoneCommented:
Yes there is.  You can simply background the wget processes. Notice I added an & after the wget statement, and a wait so the grep only runs once all of the downloads have finished.

#!/bin/bash

# Launch each download in the background so they run in parallel
for i in `cat $1`
do
wget $i &
done

# Wait for all the background downloads to finish before searching for addresses
wait

grep -E -or --color "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" *

I am not great at programming but this should work.
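If you would rather cap how many downloads run at once (say 5) instead of launching them all at the same time, one alternative, assuming GNU xargs is available, is to let xargs -P manage the parallelism:

# download at most 5 URLs in parallel, then grep the results as before
xargs -n 1 -P 5 wget -q < urls.txt
grep -E -or --color "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" *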

KETTANEHAuthor Commented:
sending the command to the background will not speed up the process .. anyway, thanks a lot savone... you did a great job in helping me

thanks again :)
KETTANEHAuthor Commented:
thanks a lot