mars16


How to extract email addresses from a Dreamweaver document using Regular Expression

Hi there,
I have an HTML document containing email addresses that I would like to extract, or at least highlight, in one go so I can copy and paste them.

I'm not clear on RegEx, but I have this line of code that finds the emails one by one, which is not very helpful.


\w+@[\w.-]+|\{(?:\w+, *)+\w+\}@[\w.-]+


johanntagle

It seems to make sense. What exactly doesn't it match?

You can also look at:
http://fightingforalostcause.net/misc/2006/compare-email-regex.php
http://www.regular-expressions.info/email.html
mars16

ASKER

johanntagle, thanks for the reply. I want to be able to do one of two things (either will do):
Delete all non-email-address text
or copy (and eventually paste) the email addresses from the document

The RegEx line finds the email addresses one by one, and I want to avoid copying and pasting each address individually.

Thanks for your help.
I could probably do it in a Perl script, but it looks like I don't have to bother. There is a lot of email-address-extraction software out there.

Following the top results of http://www.google.com.ph/search?q=extract+email+addresses+from+a+file I found:

http://www.sharewareconnection.com/email-extractor-free-edition.htm
http://www.a1soft.com/emextcom.htm
http://www.brothersoft.com/extract-email-addresses-in-multiple-file-49006.html



mars16

ASKER

johanntagle, unfortunately I can't install and run software at work (or I can, but it is not an easy process). However, I do have Dreamweaver installed.
kaufmed
>> How to extract email addresses from a Dreamweaver document using Regular Expression
Are you actually trying to do the extract from within Dreamweaver, or are you using some other tool?
mars16

ASKER

kaufmed, I am trying to do the extraction from within Dreamweaver. As I said above, I would like to highlight the email addresses in one go so I can copy and paste them somewhere else, or delete all non-email-address text so that I am left with just the email addresses.
Silly question, but are you using the "Find All" button? I don't have DW, but I can see the button in the image on this page, assuming your version is the same.
mars16

ASKER

kaufmed, yes, I am using the Find feature.
mars16

ASKER

kaufmed, maybe I need to find and replace all non-email text with empty space, so that I am left with just the emails?
>> kaufmed, yes, I am using the Find feature.
...  "Find" means find one...   "Find All" means find every occurrence.


>> kaufmed, maybe I need to find and replace all non-email text with empty space, so that I am left with just the emails?
I doubt DW find/replace supports the regex constructs you would need to do this.
Can you attach a screenshot of your find/replace dialog box?
mars16

ASKER

kaufmed, my screen is the same as the one you posted. When I do Find All, it lists the results in a small dialog box. But it won't actually highlight them on the document as it would for a single Find.
Dreamweaver is the wrong tool for this job. What you can do with DW has been stated already: do a Find with Reyes to get the email, copy/paste it to a new document, and then do Find Next to repeat the process; or attempt to find everything but an email and replace it with null space, the mere thought of which gives me a migraine.

You would be far better off convincing your IT folks to let you install one of the freeware email extractors, or installing Perl or PHP locally and using a quick script to parse the document and extract the emails to a new file.
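
For what it's worth, a "quick script" of that kind could look roughly like the sketch below. It is only an illustration: the function name `extract_emails` and the file names in the usage comment are made up, and the pattern is a deliberately loose matcher for email-like strings, not a full address validator.

```php
<?php
// Hypothetical helper: returns the unique email-like strings found in $html.
function extract_emails(string $html): array
{
    // Loose pattern: a user id made of word characters, dots, plus signs or
    // hyphens, then "@", then a domain containing at least one dot.
    preg_match_all('/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/', $html, $matches);

    // $matches[0] holds the full matches; drop repeats and reindex.
    return array_values(array_unique($matches[0]));
}

// Usage (file names are placeholders, adjust to your own):
// $emails = extract_emails(file_get_contents('page.html'));
// file_put_contents('emails.txt', implode(PHP_EOL, $emails) . PHP_EOL);
```

Because the pattern runs over the raw markup, an address that appears both in a mailto: link and in the visible text is found twice, which is why the result is de-duplicated before it is written out.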
"Reyes"

Stupid autocorrect...

Meant to be regex
mars16

ASKER

Would you approve of this online tool
http://email-extractor.pavucina.com/en/

>> But it won't actually highlight them on the document as it would for a single Find.
I see. Without having DW, I'm afraid I won't be of much help then.

I do see that you put PHP as one of the zones for this question. If you wanted to, you could use the following PHP to get the addresses for you. I changed the pattern a bit because I think, for this purpose, it can be more succinct. Besides, your pattern will only find addresses that consist of a single alphanumeric user id (e.g. johndoe) and will not find user ids that are separated with dots or hyphens (e.g. john.doe).
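<?php
	// Read the whole document into a string (adjust the path as needed).
	$data = file_get_contents('\test.txt');

	// Match a run of characters that are neither a space nor "@", a single "@", then another such run.
	preg_match_all('/[^ @]+@[^ @]+/', $data, $matches);

	// $matches[0] holds the full-pattern matches.
	foreach ($matches[0] as $address)
	{
		echo $address . "<br />\n";
	}
?>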


>> Would you approve of this online tool

It will work, but most of these online "tools" also keep a copy of all email addresses fed to them and then sell the collected addresses to spammers.
Are the email addresses spread around the file without any structure/format?  Or does it look something like:
<SOME STRUCTURED TEXT HERE IN A KNOWN FORMAT>email1@address.com<MORE STRUCTURED TEXT HERE>
<SOME STRUCTURED TEXT HERE IN A KNOWN FORMAT>email2@address.com<MORE STRUCTURED TEXT HERE>
<SOME STRUCTURED TEXT HERE IN A KNOWN FORMAT>email3@address.com<MORE STRUCTURED TEXT HERE>
<SOME STRUCTURED TEXT HERE IN A KNOWN FORMAT>email4@address.com<MORE STRUCTURED TEXT HERE>

If so then maybe you can modify your find and replace so that your find query is something like

<REG EXP FOR STRUCTURED TEXT>(\w+@[\w.-]+|\{(?:\w+, *)+\w+\}@[\w.-]+)<REG EXP FOR MORE TEXT>

and then your replace string is just
$1

If not, as Jason1178 said, I don't think any find+replace tool, whether in Dreamweaver or another editor, can do this. You need an extraction tool. Ask your management to let you install one of the extraction tools, or a scripting tool in which you can develop an extraction script, on your machine so that you can do your job.
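
To illustrate the capture-group idea for the structured case, here is a small PHP sketch. The `<li class="contact">` wrapper is a made-up structure standing in for whatever the real document uses, and the group contains only the first alternative of the posted pattern to keep the example short.

```php
<?php
// Hypothetical structure: every line wraps the address in <li class="contact">...</li>.
$input = <<<'HTML'
<li class="contact">email1@address.com</li>
<li class="contact">email2@address.com</li>
HTML;

// Group 1 is the address; the structured text around it is matched but not kept.
$pattern = '/<li class="contact">(\w+@[\w.-]+)<\/li>/';

// "$1" in the replacement string refers to the first capture group.
$output = preg_replace($pattern, '$1', $input);

echo $output, PHP_EOL;
```

The same find pattern and `$1` replacement should carry over to any find-and-replace dialog that supports regular-expression back-references, but only when the surrounding text really is regular enough to describe.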

mars16

ASKER

kaufmed, your PHP code is very helpful but needs a slight modification. Some of the lines look like this:
name@domain.com www.domain.com +44
4040 name@domain-name.com Someword
name@domain.co.uk word
I'll check it out when I get to work, but it should pick those up because we are looking for any string of characters that are not a space or @, followed by a single @, followed by any string of characters that are not a space or @. Unless...


you are using the pattern you originally posted. If that be the case, then read why I changed it in the comment above ( http:#35434098 )
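
As a rough check, the patterns from this thread can be run side by side against the sample lines posted above. This is a sketch only: the `john.doe@example.com` line is added here for illustration, and the third pattern, which uses `\s` instead of a literal space, is an assumption on my part rather than something posted in the thread.

```php
<?php
// Sample lines from the question, plus one dotted user id added for illustration.
$data = "name@domain.com www.domain.com +44\n"
      . "4040 name@domain-name.com Someword\n"
      . "name@domain.co.uk word\n"
      . "john.doe@example.com";

// First alternative of the original pattern: the user id is \w+ only,
// so a dotted user id is cut down to its last part.
preg_match_all('/\w+@[\w.-]+/', $data, $original);

// Pattern from the PHP snippet above: excludes only a literal space and "@",
// so a line break does not end a match.
preg_match_all('/[^ @]+@[^ @]+/', $data, $posted);

// Assumed variant: \s excludes any whitespace, including line breaks.
preg_match_all('/[^\s@]+@[^\s@]+/', $data, $revised);

print_r($original[0]);
print_r($posted[0]);
print_r($revised[0]);
```

With multi-line input like this, the literal-space version can glue the last word of one line onto the address that starts the next line, which may be the "slight modification" being asked about. Swapping the space for `\s` avoids that while still keeping dotted user ids whole.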
ASKER CERTIFIED SOLUTION
kaufmed
This question has been classified as abandoned and is closed as part of the Cleanup Program. See the recommendation for more details.