Status: Solved

Extract emails from a text file

Hi experts,

I need to extract the email addresses from a large .txt file and write them to another .txt file. Please help.

Thanks
Asked by: gloriaewold41
 
m4nd4li4 commented:
Do you mean, extract email addresses from the .txt file?
 
gloriaewold41 (Author) commented:
Yes, from a large file.
 
gloriaewold41 (Author) commented:
Come on, guys, I really need this script. Doesn't anybody know how to do this?

 
Erdinç Güngör Çorbacı (PHP Development Team Leader) commented:
There are many ways of doing this.

Try the code and the sample .txt file I've attached (put them in the same folder).

Note that the first email in the sample is not extracted, because it contains the character "ç".
Attached files: test.php, myfile.txt
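The "ç" remark comes from PHP's built-in validator: by default, FILTER_VALIDATE_EMAIL accepts only ASCII characters. A minimal sketch to check this yourself; the sample addresses are made up for illustration, and FILTER_FLAG_EMAIL_UNICODE is an assumption about your setup, since it needs PHP 7.1 or later and only relaxes the part before the @:

```php
<?php
// Hypothetical sample addresses, for illustration only
$samples = array('corbaci@example.com', 'çorbacı@example.com');

foreach ($samples as $address) {
    // Default validation is ASCII-only, so "ç" and "ı" make the address invalid
    $ascii_ok = filter_var($address, FILTER_VALIDATE_EMAIL) !== false;

    // PHP 7.1+: allow Unicode characters in the local part (before the @)
    $unicode_ok = filter_var($address, FILTER_VALIDATE_EMAIL, FILTER_FLAG_EMAIL_UNICODE) !== false;

    echo $address, ': default=', $ascii_ok ? 'valid' : 'invalid',
         ', unicode flag=', $unicode_ok ? 'valid' : 'invalid', PHP_EOL;
}
```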
 
Erdinç Güngör Çorbacı (PHP Development Team Leader) commented:
If you need each email to be added only once (no duplicates), use this code instead:

<?php

$file_path_and_name = "myfile.txt"; // SET AS YOU WISH
$emails = array();  // every distinct piece seen so far (not only emails)
$output = '';       // validated, de-duplicated emails, one per line

if (file_exists($file_path_and_name)) {
    if (is_file($file_path_and_name)) {

        $lines = file($file_path_and_name);
        foreach ($lines as $the_line) {
            // I assume the email addresses are separated from other text by whitespace;
            // splitting on \s+ also keeps the line break off the last word of each line
            $pieces = preg_split('/\s+/', $the_line, -1, PREG_SPLIT_NO_EMPTY);
            foreach ($pieces as $piece) {
                if (!in_array($piece, $emails)) {
                    $emails[] = $piece;
                    if (filter_var($piece, FILTER_VALIDATE_EMAIL)) {
                        $output .= $piece . PHP_EOL;
                    }
                }
            }
        }

        // Write the result to a new file: "myfile.txt" -> "myfile-output.txt"
        $output_file_name = substr($file_path_and_name, 0, -4) . "-output" . substr($file_path_and_name, -4);
        $file_work = fopen($output_file_name, 'w') or die("failed to write");
        fwrite($file_work, $output);
        fclose($file_work);
    }
}

// Show the result on screen as well
echo $output;
?>
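A design note on the de-duplication step: in_array() scans the whole $emails array for every word, which gets slow on a large file. A minimal sketch of the same idea using array keys as a set, so each lookup is an isset() call; the function name extract_unique_emails is hypothetical, and addresses are compared case-sensitively, as in the code above:

```php
<?php
/**
 * Hypothetical helper: return each valid email found in $lines once,
 * in order of first appearance.
 */
function extract_unique_emails(array $lines)
{
    $seen = array(); // email => true; isset() lookups avoid a linear scan

    foreach ($lines as $line) {
        // Split on any run of whitespace, dropping empty pieces
        $pieces = preg_split('/\s+/', $line, -1, PREG_SPLIT_NO_EMPTY);
        foreach ($pieces as $piece) {
            if (!isset($seen[$piece]) && filter_var($piece, FILTER_VALIDATE_EMAIL) !== false) {
                $seen[$piece] = true;
            }
        }
    }

    return array_keys($seen);
}

// Usage: file() keeps line endings, which preg_split() discards anyway
// $emails = extract_unique_emails(file('myfile.txt'));
// file_put_contents('myfile-output.txt', implode(PHP_EOL, $emails) . PHP_EOL);
```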


 
Ray Paseur commented:
See http://www.laprbass.com/RAY_scrape_emails.php

<?php // RAY_scrape_emails.php
error_reporting(E_ALL);
echo "<pre>";

// SOME TEST DATA
$txt = 'This is a test message containing Ray@NationalPres.org and RPaseur@NatPresCh.org in clear text and <a href="mailto:RPaseur@NationalPres.org">Ray@NPC</a> in a link.';

// A REGEX
$rgx = '/[A-Z0-9_-][A-Z0-9._-]*@[A-Z0-9][A-Z0-9-]*\.+[A-Z]{2,6}/i';

// THE PROCESSING
preg_match_all($rgx, $txt, $mat);

// VISUALIZE THE INPUT AND THE RESULT
echo htmlentities($txt);

foreach ($mat[0] as $email)
{
    echo PHP_EOL . $email;
}


Use your judgement about the regular expression: I wrote it a long time ago, and the internet is full of regular expressions to match email addresses. Most of them are wrong in one way or another, and this one may be deficient, too. But it seems to work fairly well for my needs.

HTH, ~Ray
 
gloriaewold41 (Author) commented:
Ray, your code works, but it skips emails like this one: email@nlmusd.k12.ca.us
I don't know what to replace the $rgx line with. Anyway, your answer is the solution; I haven't accepted it yet because I don't know whether you could still add another answer afterwards. If you know a good email filter, please tell me. Thanks
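The address email@nlmusd.k12.ca.us is skipped because Ray's pattern allows only one label between the @ and the final dot, and the part after that dot must be letters only, while "k12" contains digits. A minimal sketch of a variant that accepts any number of dot-separated domain labels; this pattern is an assumption of mine, not Ray's, and like every email regex it will still miss or over-match some edge cases:

```php
<?php
// Assumed pattern (a variant, not the one from Ray's post):
//   local part : letters, digits and . _ % + -
//   domain     : one or more labels, each followed by a dot ("nlmusd.", "k12.", "ca.")
//   TLD        : two or more letters ("us")
$rgx = '/[A-Z0-9._%+-]+@(?:[A-Z0-9](?:[A-Z0-9-]*[A-Z0-9])?\.)+[A-Z]{2,}/i';

$txt = 'Contact email@nlmusd.k12.ca.us or Ray@NationalPres.org today.';

preg_match_all($rgx, $txt, $mat);

foreach ($mat[0] as $email) {
    echo $email, PHP_EOL; // email@nlmusd.k12.ca.us, then Ray@NationalPres.org
}
```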
 
Erdinç Güngör Çorbacı (PHP Development Team Leader) commented:
gloriaewold41: Email validation is a real headache in our business. I have also used many home-made, regex-based functions for this, and at first I assumed they worked correctly, but later I realized none of them is perfect. So if your PHP version is 5.2 or later, using PHP's built-in filter (filter_var() with FILTER_VALIDATE_EMAIL) is better. (It is also based on a regular expression, but I think it has been tested on more samples.)

There is a long discussion of this here:
http://stackoverflow.com/questions/201323/using-a-regular-expression-to-validate-an-email-address
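One way to use the built-in filter on text that isn't neatly separated by spaces (an assumption on my part, not something from the posts): use a deliberately loose regex only to find candidate strings around each @, then let filter_var() decide which candidates are valid. A minimal sketch; the function name find_emails is hypothetical:

```php
<?php
/**
 * Hypothetical helper: find candidates with a loose pattern, then keep
 * only those that PHP's FILTER_VALIDATE_EMAIL accepts (PHP 5.2+).
 */
function find_emails($text)
{
    // Loose candidate pattern: runs of non-space, non-delimiter characters around an "@"
    preg_match_all('/[^\s<>()"\':;,=]+@[^\s<>()"\':;,=]+/', $text, $mat);

    $valid = array();
    foreach ($mat[0] as $candidate) {
        // Strip trailing sentence punctuation, as in "write to x@y.org."
        $candidate = rtrim($candidate, '.');
        if (filter_var($candidate, FILTER_VALIDATE_EMAIL) !== false) {
            $valid[] = $candidate;
        }
    }
    return $valid;
}

print_r(find_emails('Mail <a href="mailto:info@example.com">us</a> or sales@example.co.uk.'));
```

The regex only has to be generous enough not to miss anything; the strict decision stays with the built-in filter.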


Would you please tell me whether you have tested my solution? If your problem with it is the requirement for spaces around the emails, that can easily be solved.

Like this:
<?php

$file_path_and_name = "myfile.txt"; // SET AS YOU WISH
$emails = array();  // every distinct piece seen so far (not only emails)
$output = '';       // validated, de-duplicated emails, one per line

// Characters replaced by a space before splitting; add yours as you wish
$clean_these = array();
$clean_these[] = '/</';
$clean_these[] = '/>/';
$clean_these[] = '/:/';
$clean_these[] = '/;/';
$clean_these[] = '/=/';

if (file_exists($file_path_and_name)) {
    if (is_file($file_path_and_name)) {

        $lines = file($file_path_and_name);
        foreach ($lines as $the_line) {
            $the_line = preg_replace($clean_these, " ", $the_line);
            // Split on any whitespace; this also keeps the line break off the last word
            $pieces = preg_split('/\s+/', $the_line, -1, PREG_SPLIT_NO_EMPTY);
            foreach ($pieces as $piece) {
                if (!in_array($piece, $emails)) {
                    $emails[] = $piece;
                    if (filter_var($piece, FILTER_VALIDATE_EMAIL)) {
                        $output .= $piece . PHP_EOL;
                    }
                }
            }
        }

        // Write the result to a new file: "myfile.txt" -> "myfile-output.txt"
        $output_file_name = substr($file_path_and_name, 0, -4) . "-output" . substr($file_path_and_name, -4);
        $file_work = fopen($output_file_name, 'w') or die("failed to write");
        fwrite($file_work, $output);
        fclose($file_work);
    }
}

// Show the result on screen as well
echo $output;
?>



This has been tested with many kinds of email addresses and many text files.
It produces a new file with the scraped emails.
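The two steps in the code above (replace the listed characters with spaces, then split on whitespace) could also be collapsed into a single preg_split() call whose character class lists every separator. A minimal sketch; the function name split_tokens is hypothetical, and the separator set simply mirrors $clean_these plus whitespace:

```php
<?php
// Hypothetical helper. Assumed separator set: whitespace plus < > : ; =
// (the characters cleaned above); add more inside the brackets as you wish.
function split_tokens($line)
{
    return preg_split('/[\s<>:;=]+/', $line, -1, PREG_SPLIT_NO_EMPTY);
}

$tokens = split_tokens("<a href=mailto:info@example.com>Mail us</a>\n");
// $tokens: 'a', 'href', 'mailto', 'info@example.com', 'Mail', 'us', '/a'
```

Each token can then go through the same in_array() and filter_var() checks as before.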
 
Ray Paseur commented:
Have a look at this article. Email (and web URL) detection and validation with PHP regular expressions is an inconsistent and quirky thing. I do not have all of the answers. My best hope is to get you close and help you understand how to test your way to success with your own code!

http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/A_7830-A-Quick-Tour-of-Test-Driven-Development.html

HTH, ~Ray
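In the spirit of the test-driven approach in that article, a minimal sketch of a table of sample texts mapped to the emails each should yield. The pattern here is an assumed variant that allows several dot-separated domain labels; replace $rgx with whichever pattern you are testing:

```php
<?php
// Pattern under test (an assumption; swap in your own)
$rgx = '/[A-Z0-9._%+-]+@(?:[A-Z0-9](?:[A-Z0-9-]*[A-Z0-9])?\.)+[A-Z]{2,}/i';

// Sample text => emails we expect to extract
$cases = array(
    'Write to email@nlmusd.k12.ca.us today' => array('email@nlmusd.k12.ca.us'),
    'Two: a@example.com, b@example.org.'    => array('a@example.com', 'b@example.org'),
    'No address here, just an @ sign'        => array(),
);

$failures = 0;
foreach ($cases as $text => $expected) {
    preg_match_all($rgx, $text, $mat);
    if ($mat[0] !== $expected) {
        $failures++;
        echo 'FAIL: ', $text, ' => ', implode(', ', $mat[0]), PHP_EOL;
    }
}
echo $failures === 0 ? 'All tests passed' : "$failures test(s) failed", PHP_EOL;
```

Whenever an address is reported as skipped, add it as a new case first, then adjust the pattern until every case passes.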
 
gloriaewold41 (Author) commented:
erdincgc, your code doesn't work at all: no output and no errors in the browser.
 
Erdinç Güngör Çorbacı (PHP Development Team Leader) commented:
Are you working locally? It seems you are running the PHP pages on a remote server. The folder and files you uploaded to the server must be writable, because my code sample produces a new file with the scraped (and de-duplicated) emails, not just screen output. Also, your server's PHP settings apparently don't display file write permission errors.

Please add this right after the opening <?php tag to see warnings:

error_reporting(E_ALL);
ini_set('display_errors', '1');



 
Some possible reasons for the problem:

- Your PHP version is older than 5.2.
- The folder you put these files in is not writable.

If you can't change the folder permissions, you can create myfile-output.txt yourself and make that file writable.

I tested it on a very big .txt file with different types of email addresses and successfully got a list of emails in a separate file. If your PHP version is 5.2 or later, there shouldn't be a problem.
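For the permissions case, a minimal sketch that checks is_writable() before writing, so a permission problem produces a message instead of a silent failure. write_output is a hypothetical helper name:

```php
<?php
/**
 * Hypothetical helper: write $content to $path.
 * Returns null on success, or an error message string on failure.
 */
function write_output($path, $content)
{
    $dir = dirname($path);

    // An existing file must itself be writable; a new file needs a writable folder
    if (file_exists($path) ? !is_writable($path) : !is_writable($dir)) {
        return "Cannot write to $path: check the permissions of the file or of $dir";
    }
    // file_put_contents() returns the number of bytes written, or false on failure
    if (file_put_contents($path, $content) === false) {
        return "Writing $path failed";
    }
    return null;
}

// Usage, with $output built as in the posted code:
// $error = write_output('myfile-output.txt', $output);
// echo $error === null ? 'Output written' : $error;
```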
 
gloriaewold41 (Author) commented:
erdincgc, my PHP works; other scripts produce output and create files. I've added the line you suggested, and it still doesn't work.
 
Erdinç Güngör Çorbacı (PHP Development Team Leader) commented:
Are you sure you put both files in the same folder (as I indicated in my first comment)?

My sample PHP page needs a myfile.txt in the same folder, containing the text with the emails to be extracted. As you can see in the PHP code, there are two if clauses that check that the file exists. If the file isn't there, there will be no output. So please make sure that file is next to the .php file, or change the value of $file_path_and_name to match your file.
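Because the posted code silently does nothing when the input file is missing, a minimal sketch that builds the path from the script's own folder with __DIR__ (PHP 5.3+), rather than relying on the server's current working directory, and reports the problem explicitly. resolve_input is a hypothetical helper name:

```php
<?php
/**
 * Hypothetical helper: build the input path relative to this script's
 * folder and report clearly when the file is missing or unreadable.
 * Returns array(path, null) on success or array(null, message) on failure.
 */
function resolve_input($file_name, $base_dir = __DIR__)
{
    $path = $base_dir . DIRECTORY_SEPARATOR . $file_name;

    if (!is_file($path)) {
        return array(null, "Input file not found: $path");
    }
    if (!is_readable($path)) {
        return array(null, "Input file is not readable: $path");
    }
    return array($path, null);
}

list($path, $error) = resolve_input('myfile.txt');
echo $error === null ? "Reading $path" : $error, PHP_EOL;
```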
 
gloriaewold41 (Author) commented:
Yes, I put the file in the same folder, and it did not work.
 
gloriaewold41 (Author) commented:
Thanks.