
How to extract email addresses from a Dreamweaver document using Regular Expression

Last Modified: 2012-05-11
Hi there,
I have an HTML document containing email addresses that I would like to extract, or at least highlight all in one go so I can copy and paste them.

I'm not clear on RegEx, but I have this pattern, which finds the emails one by one; that isn't very helpful.


\w+@[\w.-]+|\{(?:\w+, *)+\w+\}@[\w.-]+

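For anyone who wants to try this pattern outside Dreamweaver, here is a minimal PHP sketch (PHP being the language used later in this thread) that applies the same expression with `preg_match_all()`, which collects every match in one pass instead of one at a time. The sample text is invented for illustration. It also shows what the second alternative is for (brace-grouped addresses such as `{john, jane}@example.org`) and that a dotted user name like `john.doe` is only partly matched.

```php
<?php
// The pattern from the question: either word-characters@domain, or a
// brace-grouped list of user names sharing one domain, e.g. {a, b}@example.org.
$pattern = '/\w+@[\w.-]+|\{(?:\w+, *)+\w+\}@[\w.-]+/';

// Hypothetical sample text, for illustration only.
$sample = "Contact sales@example.com or {john, jane}@example.org today\n"
        . "Support: john.doe@example.net";

// preg_match_all() finds every match at once; $matches[0] holds the full matches.
preg_match_all($pattern, $sample, $matches);

foreach ($matches[0] as $address) {
    echo $address, "\n";
}
```

This should print `sales@example.com`, `{john, jane}@example.org` and `doe@example.net`, one per line; note that only the `doe` part of `john.doe` is captured.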


Top Expert 2012

Commented:
It seems to make sense. What exactly doesn't it match?

You can also look at:
http://fightingforalostcause.net/misc/2006/compare-email-regex.php
http://www.regular-expressions.info/email.html

Author

Commented:
johanntagle, thanks for the reply. I want to be able to do one of two things (either will do):
delete all non-email-address text, or
copy (and eventually paste) the email addresses from the document.

The RegEx finds the email addresses one by one, and I want to avoid copying and pasting each address individually.

Thanks for your help.
Top Expert 2012

Commented:
I could probably do it in a Perl script, but it looks like I don't have to bother. There's a lot of email-address extraction software out there.

Following the top results of http://www.google.com.ph/search?q=extract+email+addresses+from+a+file I found:

http://www.sharewareconnection.com/email-extractor-free-edition.htm
http://www.a1soft.com/emextcom.htm
http://www.brothersoft.com/extract-email-addresses-in-multiple-file-49006.html



Author

Commented:
johanntagle, unfortunately I can't install and run software at work (or I can, but it's not an easy process). However, I do have Dreamweaver installed.
CERTIFIED EXPERT
Most Valuable Expert 2011
Top Expert 2015

Commented:
>> How to extract email addresses from a Dreamweaver document using Regular Expression
Are you actually trying to do the extract from within Dreamweaver, or are you using some other tool?

Author

Commented:
kaufmed, I am trying to do the extraction from within Dreamweaver. As I said above, I would like to be able to highlight all the email addresses in one go so I can copy and paste them somewhere else, or to delete all non-email text so that only the email addresses remain.
CERTIFIED EXPERT
Most Valuable Expert 2011
Top Expert 2015

Commented:
Silly question, but are you using the "Find All" button? I don't have DW, but I can see the button in the image on this page, assuming your version is the same.

Author

Commented:
kaufmed, yes, I am using the Find feature.

Author

Commented:
kaufmed, maybe I need to find all non-email text and replace it with nothing, so that only the emails remain?
CERTIFIED EXPERT
Most Valuable Expert 2011
Top Expert 2015

Commented:
>> kaufmed, yes, am using the Find feature.
"Find" means find one occurrence; "Find All" means find every occurrence.


>> kaufmed, maybe I need to find and replace all non-email text with empty space that way I will remain with just emails?
I doubt DW find/replace supports the regex constructs you would need to do this.
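For comparison, here is roughly what such constructs look like in PHP's PCRE engine. This is a minimal sketch, not something confirmed to work in Dreamweaver's Find and Replace: it reduces a text to just its addresses using a lazy quantifier (`*?`), alternation, and a capture group reused in the replacement. The sample text is invented, and the address pattern is the simple "no whitespace or @" heuristic rather than a full validator. On large documents the lazy scan over address-free stretches can be slow.

```php
<?php
// Hypothetical sample document text, for illustration only.
$data = "Call 4040 or write to name@domain.com today.\n"
      . "Other contact: name@domain.co.uk (UK office)\n"
      . "No address on this line.";

// Alternative 1: lazily skip the shortest run of text before the next address,
//                capturing the address in group 1.
// Alternative 2: once no address is left, match the remaining text (group 1 empty).
$pattern = '/[\s\S]*?([^\s@]+@[^\s@]+)|[\s\S]+/';

// Keep only the captured address, one per line; the address-free tail becomes "\n",
// which trim() then removes.
$onlyEmails = trim(preg_replace($pattern, '$1' . "\n", $data));

echo $onlyEmails, "\n";
```

With this sample the result is just the two addresses, one per line.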
CERTIFIED EXPERT
Most Valuable Expert 2011
Top Expert 2015

Commented:
Can you attach a screenshot of your find/replace dialog box?

Author

Commented:
kaufmed, my screen is the same as the one you posted. When I do Find All, it lists the matches in a small dialog box, but it doesn't actually highlight them in the document the way it does for a single Find.
Jason C. Levine
CERTIFIED EXPERT

Commented:
Dreamweaver is the wrong tool for this job. What you can do with DW has already been stated: do a Find with Reyes to get an email, copy and paste it into a new document, then do Find Next to repeat the process; or attempt to find everything except the emails and replace it with nothing, the mere thought of which gives me a migraine.

You would be far better off convincing your IT folks to let you install one of the freeware email extractors, or installing Perl or PHP locally and using a quick script to parse the document and extract the emails to a new file.
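As a rough illustration of the "quick script" route, here is a minimal PHP sketch, assuming PHP is installed locally. It reads a saved HTML file, collects the unique email-like strings, and writes them one per line to a new file. The function name `extract_emails()` and the file names `input.html` and `emails.txt` are hypothetical, and the pattern is a loose heuristic rather than a full address validator.

```php
<?php
/**
 * Return the unique email-like strings in $text, in order of first appearance.
 * Heuristic: a run of characters that are not whitespace, '@', HTML/quote
 * delimiters, ':' , ',' , ';' or parentheses, then '@', then a domain that
 * ends in a dot followed by two or more letters.
 */
function extract_emails(string $text): array
{
    $pattern = '/[^\s@<>"\':,;()]+@[^\s@<>"\':,;()]+\.[a-z]{2,}/i';
    preg_match_all($pattern, $text, $matches);
    // array_unique() keeps the first occurrence; array_values() re-indexes from 0.
    return array_values(array_unique($matches[0]));
}

// Hypothetical file names; the file step only runs from the command line
// and only when the input file actually exists.
$input  = $argv[1] ?? 'input.html';
$output = $argv[2] ?? 'emails.txt';

if (PHP_SAPI === 'cli' && is_readable($input)) {
    $emails = extract_emails(file_get_contents($input));
    file_put_contents($output, implode(PHP_EOL, $emails) . PHP_EOL);
    echo count($emails), " address(es) written to $output\n";
}
```

Usage would be something like `php extract_emails.php input.html emails.txt` (script name hypothetical). Excluding `:` and the quote/angle-bracket characters keeps `mailto:` links and surrounding markup out of the matches.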
Jason C. Levine
CERTIFIED EXPERT

Commented:
"Reyes"

Stupid autocorrect...

Meant to be regex

Author

Commented:
Would you approve of this online tool?
http://email-extractor.pavucina.com/en/

CERTIFIED EXPERT
Most Valuable Expert 2011
Top Expert 2015

Commented:
>> But it won't actually highlight them on the document as it would for a single Find.
I see. Without having DW, I'm afraid I won't be of much help then.

I do see that you put PHP as one of the zones for this question. If you wanted to, you could use the following PHP to get the addresses for you. I changed the pattern a bit because I think it can be more succinct for this purpose. Besides, your pattern will only find addresses whose user ID is a single run of word characters (e.g. johndoe); it will not fully match user IDs separated with dots or hyphens (e.g. john.doe).
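<?php
	// Read the saved document into one string (adjust the path as needed).
	$data = file_get_contents('\test.txt');
	if ($data === false)
	{
		die("Could not read the file.\n");
	}

	// A run of characters that are not whitespace or @, a single @, then another
	// such run. \s (rather than a literal space) keeps a match from spilling
	// across line breaks and tabs.
	preg_match_all('/[^\s@]+@[^\s@]+/', $data, $matches);

	// $matches[0] holds every full match, in document order.
	foreach ($matches[0] as $address)
	{
		echo $address . "<br />\n";
	}
?>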


Jason C. Levine
CERTIFIED EXPERT

Commented:
>> Would you approve of this online tool

It will work, but most of these online "tools" also keep a copy of every email address fed to them and then sell the collected addresses to spammers.
Top Expert 2012

Commented:
Are the email addresses spread around the file without any structure/format?  Or does it look something like:
<SOME STRUCTURED TEXT HERE IN A KNOWN FORMAT>email1@address.com<MORE STRUCTURED TEXT HERE>
<SOME STRUCTURED TEXT HERE IN A KNOWN FORMAT>email2@address.com<MORE STRUCTURED TEXT HERE>
<SOME STRUCTURED TEXT HERE IN A KNOWN FORMAT>email3@address.com<MORE STRUCTURED TEXT HERE>
<SOME STRUCTURED TEXT HERE IN A KNOWN FORMAT>email4@address.com<MORE STRUCTURED TEXT HERE>

If so, then maybe you can modify your find and replace so that your find query is something like

<REG EXP FOR STRUCTURED TEXT>(\w+@[\w.-]+|\{(?:\w+, *)+\w+\}@[\w.-]+)<REG EXP FOR MORE TEXT>

and then your replace string is just
$1
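The same capture-and-replace idea can be tried in PHP with `preg_replace()`, where `$1` in the replacement refers to the first parenthesised group. This is a minimal sketch: the `<tr><td>` and `</td></tr>` wrapper stands in for the "known format" and is purely hypothetical, so substitute whatever structure actually surrounds the addresses. The `m` modifier makes `^` and `$` match at the start and end of each line.

```php
<?php
// Hypothetical structured input: every line wraps one address in the same markup.
$data = "<tr><td>email1@address.com</td></tr>\n"
      . "<tr><td>email2@address.com</td></tr>\n";

// Group 1 captures the address (the pattern from the question); the rest of the
// line is matched only so that it gets replaced away.
$pattern = '/^<tr><td>(\w+@[\w.-]+|\{(?:\w+, *)+\w+\}@[\w.-]+)<\/td><\/tr>$/m';

// '$1' is in single quotes, so PHP passes it through literally to preg_replace().
$result = preg_replace($pattern, '$1', $data);

echo $result;
```

With this input, each line is reduced to just its address.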

If not, then as Jason1178 said, I don't think any find-and-replace tool, whether in Dreamweaver or another editor, can do this; you need an extraction tool. You need to ask your management to let you install one of the extraction tools, or a scripting tool on which you can develop an extraction script, so that you can do your job.

Author

Commented:
kaufmed, your PHP code is very helpful but needs a slight modification. Some of the lines look like this:
name@domain.com www.domain.com +44
4040 name@domain-name.com Someword
name@domain.co.uk word
CERTIFIED EXPERT
Most Valuable Expert 2011
Top Expert 2015

Commented:
I'll check it out when I get to work, but it should pick those up, because we are looking for any string of characters that are not a space or @, followed by a single @, followed by any string of characters that are not a space or @. Unless...


you are using the pattern you originally posted. If that is the case, then read why I changed it in my comment above (http:#35434098).
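One more thing worth checking with multi-line input like the sample lines quoted earlier: a character class that excludes only the literal space, `[^ @]`, treats a newline as an ordinary character, so a match can run from the end of one line into the next. The small PHP comparison below is an illustration only, using those three sample lines joined with newlines; `[^\s@]` stops at any whitespace, including line breaks and tabs.

```php
<?php
// The three sample lines from the thread, joined with newlines.
$data = "name@domain.com www.domain.com +44\n"
      . "4040 name@domain-name.com Someword\n"
      . "name@domain.co.uk word";

// Excludes only the space character, so "\n" can be swallowed into a match.
preg_match_all('/[^ @]+@[^ @]+/', $data, $spaceOnly);

// Excludes all whitespace (space, tab, newline, ...), so matches stay on one line.
preg_match_all('/[^\s@]+@[^\s@]+/', $data, $anyWhitespace);

print_r($spaceOnly[0]);
print_r($anyWhitespace[0]);
```

With the space-only class, the third match comes back as `Someword`, a line break, then `name@domain.co.uk`; with `\s` it is just `name@domain.co.uk`.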
Jason C. Levine
CERTIFIED EXPERT

Commented:
This question has been classified as abandoned and is closed as part of the Cleanup Program. See the recommendation for more details.