  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 594

How to extract email addresses from a Dreamweaver document using Regular Expression

Hi there,
I have an HTML document containing email addresses that I would like to extract, or at least highlight, in one go so I can copy and paste them.

I'm not clear on regex, but I have this pattern, which can find the emails one by one. That is not very helpful.


\w+@[\w.-]+|\{(?:\w+, *)+\w+\}@[\w.-]+

Asked by: mars16

johanntagle Commented:
It seems to make sense. What exactly doesn't it match?

You can also look at:
http://fightingforalostcause.net/misc/2006/compare-email-regex.php
http://www.regular-expressions.info/email.html
 
mars16 (Author) Commented:
johanntagle, thanks for the reply. I want to be able to do one of two things (either will do):
Delete all non-email-address text
or copy (and eventually paste) the email addresses from the document

The regex finds the email addresses one by one, and I want to avoid copying and pasting each address individually.

Thanks for your help.
 
johanntagle Commented:
I could probably do it in a Perl script, but it looks like I don't have to bother. There is a lot of email address extraction software out there.

Following the top results of http://www.google.com.ph/search?q=extract+email+addresses+from+a+file I found:

http://www.sharewareconnection.com/email-extractor-free-edition.htm
http://www.a1soft.com/emextcom.htm
http://www.brothersoft.com/extract-email-addresses-in-multiple-file-49006.html



mars16 (Author) Commented:
johanntagle, unfortunately I can't install and run software at work (or rather, I can, but it is not an easy process). However, I do have Dreamweaver installed.
 
käµfm³d 👽 Commented:
>> How to extract email addresses from a Dreamweaver document using Regular Expression
Are you actually trying to do the extraction from within Dreamweaver, or are you using some other tool?
 
mars16 (Author) Commented:
kaufmed, I am trying to do the extraction from within Dreamweaver. As I said above, I would like to be able to highlight the email addresses in one go so I can copy and paste them somewhere else, or delete all non-email-address text so that only the email addresses remain.
 
käµfm³d 👽 Commented:
Silly question, but are you using the "Find All" button? I don't have DW, but I can see the button in the image on this page, assuming your version is the same.
 
mars16 (Author) Commented:
kaufmed, yes, I am using the Find feature.
 
mars16 (Author) Commented:
kaufmed, maybe I need to find all non-email text and replace it with empty space, so that I am left with just the emails?
 
käµfm³d 👽 Commented:
>> kaufmed, yes, am using the Find feature.
"Find" finds one occurrence; "Find All" finds every occurrence.

>> kaufmed, maybe I need to find and replace all non-email text with empty space that way I will remain with just emails?
I doubt DW's find/replace supports the regex constructs you would need to do this.
 
käµfm³d 👽 Commented:
Can you attach a screenshot of your find/replace dialog box?
 
mars16 (Author) Commented:
kaufmed, my screen is the same as the one you posted. When I do Find All, it lists the matches in a small dialog box, but it doesn't highlight them in the document as it does for a single Find.
 
Jason C. Levine (No one) Commented:
Dreamweaver is the wrong tool for this job. What you can do with DW has been stated already: do a Find with Reyes to get the email, copy/paste it to a new document, and then do Find Next to repeat the process; or attempt to find everything but an email and replace it with null space, the mere thought of which gives me a migraine.

You would be far better off convincing your IT folks to let you install one of the freeware email extractors, or installing Perl or PHP locally and using a quick script to parse the document and extract the emails to a new file.
 
Jason C. Levine (No one) Commented:
"Reyes"

Stupid autocorrect...

That was meant to be "regex".
 
mars16 (Author) Commented:
Would you approve of this online tool?
http://email-extractor.pavucina.com/en/
 
käµfm³d 👽 Commented:
>> But it won't actually highlight them on the document as it would for a single Find.
I see. Without having DW, I'm afraid I won't be of much help then.

I do see that you listed PHP as one of the zones for this question. If you want, you could use the following PHP to get the addresses. I changed the pattern a bit because I think that, for this purpose, it can be more succinct. Besides, your pattern only matches user IDs made up of a single run of word characters (e.g. johndoe); for user IDs separated with dots or hyphens (e.g. john.doe) it picks up only the part after the last separator, not the whole address.
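<?php
	// Read the whole document into one string. The path is kept as posted;
	// in a single-quoted string the backslash is literal.
	$data = file_get_contents('\test.txt');

	// One or more non-space, non-@ characters, a single @, then the same again.
	preg_match_all('/[^ @]+@[^ @]+/', $data, $matches);

	// $matches[0] holds the full-pattern matches, one entry per address found.
	foreach ($matches[0] as $address)
	{
		// Escape the match, since it can contain markup characters such as "<".
		echo htmlspecialchars($address) . "<br />\n";
	}
?>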
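
To make the difference between the two patterns concrete, here is a minimal sketch. The sample sentence and the helper name find_addresses are made up for illustration and do not come from the asker's document; only preg_match_all is standard PHP.

```php
<?php
// Hypothetical helper: return every full-pattern match of $pattern in $text.
function find_addresses(string $pattern, string $text): array
{
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

// Made-up sample text, not taken from the asker's document.
$sample = 'Contact john.doe@example.com or johndoe@example.com today';

// The pattern from the question, unchanged.
$original = find_addresses('/\w+@[\w.-]+|\{(?:\w+, *)+\w+\}@[\w.-]+/', $sample);

// The relaxed pattern from this comment.
$relaxed = find_addresses('/[^ @]+@[^ @]+/', $sample);

// The original pattern cuts the dotted address down to "doe@example.com".
print_r($original);

// The relaxed pattern keeps "john.doe@example.com" whole.
print_r($relaxed);
```

Both patterns find the undotted address; they differ only on the address whose user ID contains a dot.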

 
Jason C. Levine (No one) Commented:
>> Would you approve of this online tool

It will work, but most of these online "tools" also keep a copy of every email address fed to them and then sell the collected addresses to spammers.
 
johanntagle Commented:
Are the email addresses spread around the file without any structure/format?  Or does it look something like:
<SOME STRUCTURED TEXT HERE IN A KNOWN FORMAT>email1@address.com<MORE STRUCTURED TEXT HERE>
<SOME STRUCTURED TEXT HERE IN A KNOWN FORMAT>email2@address.com<MORE STRUCTURED TEXT HERE>
<SOME STRUCTURED TEXT HERE IN A KNOWN FORMAT>email3@address.com<MORE STRUCTURED TEXT HERE>
<SOME STRUCTURED TEXT HERE IN A KNOWN FORMAT>email4@address.com<MORE STRUCTURED TEXT HERE>

If so then maybe you can modify your find and replace so that your find query is something like

<REG EXP FOR STRUCTURED TEXT>(\w+@[\w.-]+|\{(?:\w+, *)+\w+\}@[\w.-]+)<REG EXP FOR MORE TEXT>

and then your replace string is just
$1

If not, as Jason1178 said, I don't think any find-and-replace tool, whether in Dreamweaver or another editor, can do this. You need an extraction tool. Ask your management to let you install either one of the extraction tools or a scripting language in which you can write an extraction script, so that you can do your job.
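
For the structured case, the find-and-replace idea can be sketched in PHP with preg_replace. The surrounding markup here (a table cell with class "email") is purely an assumption for illustration, since the real structure of the document was never posted. The capture group is the pattern from the question.

```php
<?php
// Assumed structure (hypothetical): each address sits inside <td class="email">...</td>.
$html = '<tr><td class="email">email1@address.com</td></tr>' . "\n"
      . '<tr><td class="email">email2@address.com</td></tr>';

// Known prefix, captured address, known suffix. The "m" flag makes ^ and $
// work per line, so each line is handled on its own.
$pattern = '/^.*<td class="email">(\w+@[\w.-]+|\{(?:\w+, *)+\w+\}@[\w.-]+)<\/td>.*$/m';

// Replace each whole line with just the captured address ($1).
$result = preg_replace($pattern, '$1', $html);

echo $result, "\n";
```

Lines that do not fit the assumed structure are left untouched, so this only helps when every address is wrapped the same way.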

 
mars16 (Author) Commented:
kaufmed, your PHP code is very helpful but needs a slight modification. Some of the lines look like this:
name@domain.com www.domain.com +44
4040 name@domain-name.com Someword
name@domain.co.uk word
 
käµfm³d 👽 Commented:
I'll check it out when I get to work, but it should pick those up, because we are looking for any string of characters that are not a space or @, followed by a single @, followed by any string of characters that are not a space or @. Unless...

you are using the pattern you originally posted. If that is the case, then read why I changed it in the comment above ( http:#35434098 )
 
käµfm³d 👽 Commented:
Ah. Simple correction on my part. Changing the spaces in the aforementioned pattern to "\s" should produce cleaner results:
[^\s@]+@[^\s@]+
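
A small sketch of why the change matters, using the three sample lines from the asker's comment. It assumes those lines are separated by plain newlines, which the thread does not confirm.

```php
<?php
// The three sample lines from the asker's comment, joined with newlines
// (the line separator is an assumption).
$data = "name@domain.com www.domain.com +44\n"
      . "4040 name@domain-name.com Someword\n"
      . "name@domain.co.uk word";

preg_match_all('/[^ @]+@[^ @]+/', $data, $spaceOnly);
preg_match_all('/[^\s@]+@[^\s@]+/', $data, $anyWhitespace);

// With [^ @] a newline counts as part of an address, so the last match
// starts on the previous line: "Someword\nname@domain.co.uk".
var_dump($spaceOnly[0]);

// With [^\s@] every match stops at any whitespace, including line breaks.
var_dump($anyWhitespace[0]);
```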




Example
<html>
<body>
<h1>some testing</h1>
<div id="column-middle page-content"><p>some stuff inside first div</p> <h1>some title inside the div </h1><p> khjk lh</p></div>
<h1>My First Heading</h1> </div> more text
<p>My first paragraph.</p>
<div id="anding-left page-content">some stuff john.doe@example.com inside second div </div>
<div id="column-middle page-content"><p>some stuff inside first div</p> <h1>some other title inside the div </h1><p> khjk lh</p></div>
<p>content=23 john.doe@unknown.com april 2011 </p>

jane.doe@megaplex.com
</bod>

kaufmed, your php code is very helpful but needs slight modification. Some of the lines looklike this:
name@domain.com www.domain.com +44

4040 name@domain-name.com Someword

name@domain.co.uk word
</html>



Code
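<html>
	<body>
<?php
	// Read the whole document into one string. The path is kept as posted;
	// in a single-quoted string the backslash is literal.
	$data = file_get_contents('\test.txt');

	// One or more non-whitespace, non-@ characters, a single @, then the same again.
	preg_match_all('/[^\s@]+@[^\s@]+/', $data, $matches);

	// $matches[0] holds the full-pattern matches, one entry per address found.
	foreach ($matches[0] as $address)
	{
		// Escape the match, since it can contain markup characters such as "<".
		echo htmlspecialchars($address) . "<br />\n";
	}
?>
	</body>
</html>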
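
If addresses sit directly against tags or sentence punctuation, the pattern above will include those neighbouring characters in the match. One possible refinement is sketched below. The helper name extract_addresses, the extra excluded characters, the trailing-punctuation trim, and the sample markup are all assumptions added for illustration and were not part of the thread. It assumes PHP 7.4 or later for the arrow function.

```php
<?php
// Hypothetical helper, not from the thread: extract, tidy, and de-duplicate.
function extract_addresses(string $html): array
{
    // Same idea as [^\s@]+@[^\s@]+, but also stop at < > " ' so that tags
    // and attribute quotes are not swallowed (an assumption about the input).
    preg_match_all('/[^\s@<>"\']+@[^\s@<>"\']+/', $html, $matches);

    // Drop trailing sentence punctuation such as "." or ",".
    $trimmed = array_map(fn ($a) => rtrim($a, '.,;:'), $matches[0]);

    // Remove duplicates and reindex the result from 0.
    return array_values(array_unique($trimmed));
}

// Made-up sample markup.
$sample = '<p>Mail john.doe@example.com.</p><td>jane.doe@megaplex.com</td>'
        . '<p>john.doe@example.com</p>';

print_r(extract_addresses($sample));
```

This is still a loose match rather than a validator, so the output is worth a quick look before use.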



Results
 Screenshot
 
Jason C. Levine (No one) Commented:
This question has been classified as abandoned and is closed as part of the Cleanup Program. See the recommendation for more details.