What is the easiest way to grab emails using a Linux server (application name)?

Hello,

I have a Linux VPS and I want to use it as described below:

I have a list of URLs (e.g. 10,000) and I need an application to extract (collect) any email addresses found at these URLs.

Any recommended application or approach?


thanks
KETTANEH Asked:
savone Commented:
Yes there is.  You can simply background the wget process. Notice I added an & sign after the wget statement.

#!/bin/bash

# Download every URL listed in the file passed as the first argument,
# backgrounding each wget (the &) so the downloads run in parallel.
for i in `cat "$1"`
do
wget "$i" &
done
# Wait for all the background downloads to finish before grepping the files.
wait
grep -E -or --color "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" *

I am not great at programming but this should work.
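
If firing off every download at once is too much, another option (just a sketch, assuming your URL list is in urls.txt and that five downloads at a time is enough) is to let xargs cap how many wget processes run in parallel:

#!/bin/bash
# Sketch: fetch at most 5 URLs in parallel, then grep everything downloaded.
# Assumes the list of URLs is in urls.txt, one per line (adjust the name to match yours).
xargs -n 1 -P 5 wget -q < urls.txt
grep -E -or --color "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" *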

savone Commented:
You can use regular expressions with grep.

For example, if you wanted to find all email addresses in a file named urls.txt, you could run the following command:

grep -E -o --color "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" urls.txt
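
As a quick sanity check (support@example.com is just a made-up address), you can pipe a test string through the same pattern; the -o option makes grep print only the matched address, so this prints support@example.com:

echo "please contact support@example.com for help" | grep -E -o "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b"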

KETTANEH (Author) Commented:
Thanks savone for the response.

Will it search inside the text file itself, or will it search the pages at the URLs listed in that file?
savone Commented:
It will search all the text inside the text file.  The URLs are part of the text inside the file.

URLs are usually websites like http://google.com.  There shouldn't be any email addresses in URLs unless it's a mailto URL, in which case YES it will find all the email addresses.
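
For example (info@example.com is just a made-up address), a line containing a mailto link would produce a match:

echo "mailto:info@example.com" | grep -E -o "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b"

This prints info@example.com, because the pattern matches the address part after the mailto: prefix.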

If you post the file, it may help me understand what you are trying to do.

KETTANEH (Author) Commented:
okay, I will explain more.

I have a file called grook.txt
this file contains:
"
http://www.grook.net/programming/sport-stopwatch
http://www.grook.net/forum/civil-engineering/construction/construction-hand-tools
http://www.grook.net/forum/security/unified-threat-management-comparison-cyberoam
http://www.grook.net/forum/electrical-formulas
"

I want a way to download all these links and extract the email addresses from them.
I think we have to use wget to get the contents of these URLs and then scan them with grep.

note: grook.net is my website :)


Thanks
savone Commented:
I just wrote a script to do what you want; unfortunately, there are no emails on those pages.

Here is how I set the script up.

First, create a working directory:
mkdir /tmp/sites
Then change into that directory:
cd /tmp/sites

Now create a file with the URLs, one per line, and call it urls.txt:
vi urls.txt

Now create another file for the script called get_emails.sh with the following contents:

#!/bin/bash

# Download each URL listed in the file passed as the first argument.
for i in `cat "$1"`
do
wget "$i"
done
# Search every downloaded file for email addresses.
grep -E -or --color "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" *


Then run the script passing the urls.txt file as an argument:
./get_emails.sh urls.txt
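
If any addresses are found, the -r option makes grep prefix each match with the file it came from, so the output would look roughly like this (webmaster@example.com is just a made-up address):

index.html:webmaster@example.com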

No emails found :(

I just looked over this site: http://www.grook.net/programming/sport-stopwatch

and found there is no email address on that page.
savone Commented:
You can also do this on one line, like so:

for i in `cat urls.txt`; do wget $i; done; grep -E -or --color "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" *
KETTANEH (Author) Commented:
hi

just a silly problem

whenever I try to run the .sh file as below:
./get_email.sh urls


-bash: ./get_email.sh: Permission denied
savone Commented:
You have to set the permissions to make it executable. So change to the directory where the script is and run the following command as root.

chmod +x get_emails.sh
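
To confirm it worked, list the file again:

ls -l get_emails.sh

The permission string at the start of the listing should now include x (for example -rwxr-xr-x), which means the script is executable.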

KETTANEH (Author) Commented:
I really appreciate your cooperation :)

I will try and report back.
KETTANEH (Author) Commented:
perfect solution .. thanks a lot

just another small point


Is there any way to open more than one session at the same time?
savone Commented:
I am not sure what you mean. Can you explain a little?

KETTANEH (Author) Commented:
okay...

currently, I connect to the first URL .. download .. disconnect
then
connect to the second URL .. download .. disconnect

and so on ....


is there any way to connect to (e.g.) 5 URLs at the same time, instead of one by one?
KETTANEH (Author) Commented:
Sending the command to the background will not speed up the process .. anyway, thanks a lot Savone... you did a great job helping me.


thanks again :)
KETTANEH (Author) Commented:
thanks a lot