PHP Regular Expression

Posted on 2009-06-30
Last Modified: 2013-11-13
I have a string containing data in HTML format; this data is actually emails. I have a class for converting HTML to text. Before converting it to text, I need to eliminate a few things, for which I need regular expressions. I am using PHP.

I need to eliminate the following:
1: All newlines should be replaced with a dot (full stop) "." sign.
2: Forwarded and replied emails have Date, From, Subject and To fields. I need each such complete block removed from the string.
3: All lines like "--- On Sat, 6/27/09, aqeel chishti <chishtiaaa@hotmail.com> wrote:" and "Date: Saturday, June 27, 2009, 10:05 PM" should be eliminated.
4: Footer advertisements should be eliminated.

All four of these may repeat several times in one string, so every occurrence should be eliminated. Below is an example of the data in the string.


Date: Tue, 30 Jun 2009 01:35:02 -0700
From: cyber_searchnaz@yahoo.com
Subject: Fw: Read it if YOU LOVE
To: farah_cyber@yahoo.com; saqib_ali1234@yahoo.com; sshanshah@hotmail.com; shariq.cybersearch@yahoo.com; naveedm@yahoo.com; osp_786@hotmail.com



--- On Sat, 6/27/09, aqeel chishti <chishtiaaa@hotmail.com> wrote:


    From: aqeel chishti <chishtiaaa@hotmail.com>
    Subject: Read it if YOU LOVE
    To: "Aamir Jamil" <jamilaamir@hotmail.com>, "ADEEL CHISHTI" <caddesigner007@hotmail.com>, "ADIL JAMIL" <adil_dds@hotmail.com>, "adnan mubeen" <eamubeen1@gmail.com>, anzarulhaq@hotmail.com, "ASIF ZAFAR" <mirzaasifzafar@mail.com>, "atif muhammad" <atif165@hotmail.com>, "BILAL CHISHTI" <slowwhisperer@hotmail.com>, "Fazal Bhai London" <khan_fazal@hotmail.com>, "IMRAN HADDI TRAVEL" <immimubijee@hotmail.com>, "Jibran Chishti" <jibran.a.chishti@hotmail.com>, "Kamran Latif" <mklsohawa@hotmail.com>, "MOHD NAEEM" <mohdnaeem2007@hotmail.com>, "munawar agha" <agha61@hotmail.com>, "NADEEM PRIVATE ANWAR" <anwer.nadeem@piac.aero>, "Najam Imtiaz" <najjo4@yahoo.com>, "NASIR ALI" <nasirali_pk@hotmail.com>, nasirhasan4u@hotmail.com, "RAHEEL MEMON" <rapid007@hotmail.com>, "RASHID BERLAS" <rashidbarlas_b@hotmail.com>, "sadaf naz" <cyber_searchnaz@yahoo.com>, "SAQIB SALEEM" <saqibkhawaja@hotmail.com>, "shafqat bukhari" <sasbukhari@yahoo.com>, "SHAHID HABIB SHAHID HABIB" <shahid54600@yahoo.com>, "Shazad Khan Bank Alhabib" <skhan@bankalhabib.com>, "SOHAIL IMRAN" <dreamnetzone@yahoo.com>, "WASEEM PIA KHAN" <waseem.khan@piac.aero>, "tabish hasan" <aliali925@hotmail.com>, "ZULFI JEDDAH" <zulfiqarali@jamjoompharma.com>, "zeshan pia zeshan pia" <lookphilogymist@hotmail.com>
    Date: Saturday, June 27, 2009, 10:05 PM




        IF YOU ARE A TRUE
         
         READ THIS!
         
         
         
            * These days Wives dont look after their Husbands

            * Girls go round without being covered
            * Children no longer respect their parents or any orders
            * The Rich do not look after the poor, they also don't give gifts or money and they also do not fulfil zakaat. he also said to


        Thank you very much for your time
         

        Invite your mail contacts to join your friends list with Windows Live Spaces. It's easy! Try it!
        See all the ways you can stay connected to friends and family
        See all the ways you can stay connected to friends and family
        What can you do with the new Windows Live? Find out
        See all the ways you can stay connected to friends and family
        check out the rest of the Windows Live". More than mailWindows Live" goes way beyond your inbox. More than messages
        Invite your mail contacts to join your friends list with Windows Live Spaces. It's easy! Try it!
        Microsoft brings you a new way to search the web. Try Bing" now




    Microsoft brings you a new way to search the web. Try Bing" now
    check out the rest of the Windows Live". More than mailWindows Live" goes way beyond your inbox. More than messages


    check out the rest of the Windows Live". More than mailWindows Live" goes way beyond your inbox. More than messages



Question by:Naveed_Manzoor

Expert Comment

by:thinkingman2
ID: 24751033
Check this one:
$toBeDeleted = array(
	"~^(Date|From|Subject|To|Other-Headers|More-Headers):.*$~",
	"^.*Microsoft brings you a new way to.*$",
	"^.*Yahoo is your friend.*$",
	"^.*partial snippet of other footer texts here.*$",
	"^.*more partial snippet of other footer texts here.*$",
);
 
//stuff you want to delete goes in the above array
$messages = preg_replace($toBeDeleted,'',$messages);
 
//replace one or more newlines of either mac, windows or unix with a '.'
//if you don't want a run of several newlines replaced by a single dot, get rid of the + in the expression below
//do this last because the above expression will generate lots of empty lines
$messages = preg_replace("~[\r\n]+~",'.',$messages);
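To make the effect of the `+` concrete, here is a small sketch comparing the newline pattern with and without it. The sample string is hypothetical and mixes Windows (`\r\n`), Unix (`\n`) and old-Mac (`\r`) line endings:

```php
<?php
// Hypothetical sample text with mixed line endings and one blank line.
$text = "line one\r\nline two\n\nline three\rline four";

// With "+": every run of CR/LF characters collapses into a single dot.
$collapsed = preg_replace('~[\r\n]+~', '.', $text);

// Without "+": every individual CR or LF becomes its own dot,
// so a Windows "\r\n" or a blank line produces "..".
$perChar = preg_replace('~[\r\n]~', '.', $text);

echo $collapsed, "\n"; // line one.line two.line three.line four
echo $perChar, "\n";   // line one..line two..line three.line four
```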


Author Comment

by:Naveed_Manzoor
ID: 24764973
I used this code and got this error:

Warning: preg_replace() [function.preg-replace]: No ending delimiter '^' found in C:\xampp\htdocs\imap\test.php on line 133.


      $toBeDeleted = array(
                  "~^(Date|From|Subject|To|Other-Headers|More-Headers):.*$~",
                  "^.*Microsoft brings you a new way to.*$",
                  "^.*Yahoo is your friend.*$",
                  "^.*partial snippet of other footer texts here.*$",
                  "^.*more partial snippet of other footer texts here.*$",
      );
$msgBody = preg_replace($toBeDeleted,'',$msgBody);
$msgBody = preg_replace("~[\r\n]+~",'.',$msgBody);
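The warning comes from how PHP's PCRE functions parse a pattern: the first non-whitespace character is taken as the delimiter. In `"^.*Microsoft brings you a new way to.*$"` that character is `^`, so PHP looks for a closing `^`, finds none, and reports `No ending delimiter '^' found`. Only the first array entry, which is wrapped in `~`, is valid. A minimal sketch reproducing both cases (the sample line is hypothetical; `@` is used only to keep the demo output clean):

```php
<?php
// Hypothetical line of footer text.
$line = "Yahoo is your friend";

// Wrapped in "~" delimiters, the expression compiles and matches.
$ok = preg_match('~^.*Yahoo is your friend.*$~', $line);

// Without delimiters, "^" is read as the opening delimiter; PHP finds no
// closing "^", emits the warning and preg_match() returns false.
$bad = @preg_match('^.*Yahoo is your friend.*$', $line);

var_dump($ok);  // int(1)
var_dump($bad); // bool(false)
```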

Expert Comment

by:thinkingman2
ID: 24767122
Sorry, I left out the delimiters in the array above; here's a corrected snippet:
$toBeDeleted = array(
        "~^(Date|From|Subject|To|Other-Headers|More-Headers):.*$~",
        "~^.*Microsoft brings you a new way to.*$~",
        "~^.*Yahoo is your friend.*$~",
        "~^.*partial snippet of other footer texts here.*$~",
        "~^.*more partial snippet of other footer texts here.*$~",
);
 
//stuff you want to delete goes in the above array
$messages = preg_replace($toBeDeleted,'',$messages);
 
//replace one or more newlines of either mac, windows or unix with a '.'
//if you don't want a run of several newlines replaced by a single dot, get rid of the + in the expression below
//do this last because the above expression will generate lots of empty lines
$messages = preg_replace("~[\r\n]+~",'.',$messages);
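One caveat when adding real footer text to this array: phrases copied from the sample, such as "What can you do with the new Windows Live? Find out", contain regex metacharacters. Unescaped, `Live?` means "Liv" followed by an optional "e", so the literal question mark in the text no longer matches. A sketch that builds the patterns with `preg_quote()` instead of writing them by hand; it assumes PHP 7.4+ for the arrow function, uses a hypothetical `$footerPhrases` list, and adds the `m` modifier so that `^` and `$` match at each line rather than only at the ends of the whole string:

```php
<?php
// Hypothetical list of footer phrases copied from the sample message.
$footerPhrases = [
    'Microsoft brings you a new way to search the web',
    'What can you do with the new Windows Live? Find out',
    'See all the ways you can stay connected to friends and family',
];

// preg_quote() escapes metacharacters such as "?" and "."; passing "~" as the
// second argument also escapes the delimiter, so each phrase matches literally.
$patterns = array_map(
    fn(string $p): string => '~^.*' . preg_quote($p, '~') . '.*$~m',
    $footerPhrases
);

$body  = "Real content\nWhat can you do with the new Windows Live? Find out\nMore content";
$clean = preg_replace($patterns, '', $body);

echo json_encode($clean), "\n"; // "Real content\n\nMore content"
```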


Author Comment

by:Naveed_Manzoor
ID: 24774068
Check the attached code snippet. It's not working...
<?php
$msgBody = 'Date: Tue, 30 Jun 2009 01:35:02 -0700
From: cyber_searchnaz@yahoo.com
Subject: Fw: Read it if YOU LOVE
To: farah_cyber@yahoo.com; saqib_ali1234@yahoo.com; sshanshah@hotmail.com; shariq.cybersearch@yahoo.com; naveedm@yahoo.com; osp_786@hotmail.com
 
 
 
--- On Sat, 6/27/09, aqeel chishti <chishtiaaa@hotmail.com> wrote:
 
 
    From: aqeel chishti <chishtiaaa@hotmail.com>
    Subject: Read it if YOU LOVE
    To: "Aamir Jamil" <jamilaamir@hotmail.com>, "ADEEL CHISHTI" <caddesigner007@hotmail.com>, "ADIL JAMIL" <adil_dds@hotmail.com>, "adnan mubeen" <eamubeen1@gmail.com>, anzarulhaq@hotmail.com, "ASIF ZAFAR" <mirzaasifzafar@mail.com>, "atif muhammad" <atif165@hotmail.com>
    Date: Saturday, June 27, 2009, 10:05 PM
 
 
 
 
        This is the only content we need to display.
';
 
$toBeDeleted = array(
        "~^(Date|From|Subject|To|Other-Headers|More-Headers):.*$~",
        "~^.*Microsoft brings you a new way to.*$~",
        "~^.*Yahoo is your friend.*$~",
        "~^.*partial snippet of other footer texts here.*$~",
        "~^.*more partial snippet of other footer texts here.*$~",
);
$msgBody = preg_replace($toBeDeleted,'',$msgBody);
$msgBody = preg_replace("~[\r\n]+~",'.',$msgBody);
echo $msgBody; 
?>
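A likely reason this test prints the headers unchanged: none of the patterns use the `m` (multiline) modifier. Without it, `^` and `$` anchor only at the start and end of the whole string, and because `.` does not match a newline by default, `^(Date|...):.*$` can only match when the header line is the entire message. A minimal sketch on a hypothetical two-line string showing the difference:

```php
<?php
// Hypothetical two-line sample: a header line followed by body text.
$msg = "Subject: Fw: Read it\nThis is the body.";

// Without "m": "^" matches at the very start, ".*" stops at the "\n",
// and "$" cannot match there because more text follows, so nothing is removed.
$withoutM = preg_replace('~^(Date|From|Subject|To):.*$~', '', $msg);

// With "m": "^" and "$" also match at each line start/end,
// so the Subject line is removed and the body is kept.
$withM = preg_replace('~^(Date|From|Subject|To):.*$~m', '', $msg);

var_dump($withoutM === $msg);   // bool(true)
echo json_encode($withM), "\n"; // "\nThis is the body."
```

Note also that in this test the forwarded headers are indented by four spaces, so even with `m` a pattern beginning `^(Date|From|...)` would skip them; leading whitespace has to be allowed for explicitly.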

Expert Comment

by:thinkingman2
ID: 24774214
Strange... no reason for it not to work. Will re-examine.

Expert Comment

by:thinkingman2
ID: 24774507
I played around with it because it's a script I will use. I don't have time to finish it now, but if you finish it, please post your findings:
$toBeDeleted = array(
        "~[\n\r]*[\s>*](Date|From|Subject|To|Cc|Bcc|More-Headers):[^\r\n]+($|[\r\n])~i",
        "~^--+.+$~",
        "~^.*Microsoft brings you a new way to.*$~",
        "~^.*Yahoo is your friend.*$~",
        "~^.*partial snippet of other footer texts here.*$~",
        "~^.*more partial snippet of other footer texts here.*$~",
);
$msgBody = preg_replace("~[\r]+~","\n\n\n\n",$msgBody);
$msgBody = preg_replace($toBeDeleted,'!!!"""!!!',$msgBody);
//$msgBody = preg_replace("~[\r\n]+~",'.',$msgBody);
$msgBody = preg_replace("~[\r\n]+[\s^\r\n]+[\r\n]+~","\r",$msgBody);
$msgBody = preg_replace("~[\r\n]+[\s^\r\n]+[\r\n]+~",'.',$msgBody);
echo "<pre>{$msgBody}</pre>"; 
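One way to finish the header part of this, sketched under a few assumptions: English header labels, attribution lines in the `--- On <date>, <name> wrote:` form shown in the question, and header lines that may be indented or prefixed with `>`. Each pattern uses `m` so `^` and `$` work per line, and `[ \t]` rather than `\s` so a match cannot run across a line break. The sample text is hypothetical, modelled on the question:

```php
<?php
// Hypothetical sample modelled on the forwarded block in the question:
// an attribution line followed by indented header lines.
$msg = "Intro\n"
     . "--- On Sat, 6/27/09, aqeel chishti <chishtiaaa@hotmail.com> wrote:\n"
     . "    From: aqeel chishti <chishtiaaa@hotmail.com>\n"
     . "    Subject: Read it if YOU LOVE\n"
     . "    Date: Saturday, June 27, 2009, 10:05 PM\n"
     . "Body text";

$patterns = [
    // "--- On <date>, <name> wrote:" attribution lines (requirement 3).
    '~^[ \t>]*-*[ \t]*On .+ wrote:[ \t]*$~mi',
    // Header lines, optionally indented or ">"-quoted (requirement 2).
    '~^[ \t>]*(Date|From|Subject|To|Cc|Bcc):.*$~mi',
];

// Matched lines become empty; their line breaks are left in place.
$clean = preg_replace($patterns, '', $msg);
echo json_encode($clean), "\n"; // "Intro\n\n\n\n\nBody text"
```

Collapsing the leftover empty lines and converting newlines to dots stays a separate, final step, as the earlier comment in this thread suggests. Be aware that the `i` flag and broad label list also remove any body line that happens to start with, for example, "To:", and that headers folded onto continuation lines (common for long `To:` lists) are not handled.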


Accepted Solution

by:Naveed_Manzoor
ID: 25378730
It's still not working as per my requirements... have you got time to finish it?
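For completeness, a sketch that combines the pieces discussed in this thread into one hypothetical helper, `cleanEmail()`. It assumes PHP 7.4+, plain-text input, English header labels and a caller-supplied list of footer phrases; it is not the thread's accepted code, and the patterns will likely need tuning against real messages:

```php
<?php
/**
 * Hypothetical helper combining the steps discussed in this thread.
 * Assumes plain-text input, English header labels and a caller-supplied
 * list of footer phrases. Requires PHP 7.4+ (arrow function).
 */
function cleanEmail(string $body, array $footerPhrases): string
{
    $patterns = [
        // Requirement 3: "--- On <date>, <name> wrote:" attribution lines.
        '~^[ \t>]*-*[ \t]*On .+ wrote:[ \t]*$~mi',
        // Requirement 2: Date/From/Subject/To/Cc/Bcc header lines, indented or not.
        '~^[ \t>]*(Date|From|Subject|To|Cc|Bcc):.*$~mi',
    ];
    // Requirement 4: any line containing a known footer phrase, matched literally.
    foreach ($footerPhrases as $phrase) {
        $patterns[] = '~^.*' . preg_quote($phrase, '~') . '.*$~m';
    }
    $body = preg_replace($patterns, '', $body);

    // Split on any line ending (\R covers \r\n, \n and \r), trim each line
    // and drop the lines that are now empty.
    $lines = array_map('trim', preg_split('~\R~', $body));
    $lines = array_filter($lines, fn(string $line): bool => $line !== '');

    // Requirement 1: join the remaining lines with a full stop.
    return implode('.', $lines);
}

$sample = "Date: Tue, 30 Jun 2009 01:35:02 -0700\n"
        . "From: cyber_searchnaz@yahoo.com\n"
        . "\n"
        . "--- On Sat, 6/27/09, aqeel chishti <chishtiaaa@hotmail.com> wrote:\n"
        . "    Subject: Read it if YOU LOVE\n"
        . "    Date: Saturday, June 27, 2009, 10:05 PM\n"
        . "\n"
        . "        IF YOU ARE A TRUE\n"
        . "        READ THIS!\n"
        . "    Microsoft brings you a new way to search the web. Try Bing now\n";

echo cleanEmail($sample, ['Microsoft brings you a new way to search the web']), "\n";
// IF YOU ARE A TRUE.READ THIS!
```

Converting newlines last, after dropping whitespace-only lines, avoids the runs of dots that the removed lines would otherwise leave behind.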