Sqlspider

How to use Preg_match to get string that continues on next line

Hello all again.

I was given help on a topic earlier that gave the following:

if (preg_match("/(~TRN\*[^\*]+\*[^\*]+)/", file_get_contents($oldfile), $matches))
{
    // Match found! Renaming file!
    $newfile = str_replace(array(".txt","*"), array("_","-"), $oldfile.$matches[0]).".txt";
    echo "Renaming file from {$oldfile} to {$newfile}..." . "<br /> <br />";
}
else
{
    // No match found!
}
} // closes the enclosing foreach loop (not shown here)


Thank you again, gr8gonzo, for the help. The code works great until it comes across a string that wraps onto the next line.
Example:
89245*000087726*01*052000113*DA*18790741*20120312~T
RN*1*1QG70794930
*1411289245*000087726

I experimented with the regular expression but could not get the pattern passed to preg_match to work when the line wraps or contains a carriage return. I also tried
echo substr($str,124,11) . "<br /> <br />"; and although that gave me what I was looking for in the case above, it failed for other files such as this one:

NON************20120312~T
RN*1*1QG70795851
*1411289245*  

I could not find any sites that explicitly showed how to use the regular expressions.  

Thanks for any help with this.
Shaun McNicholas

If you are looking for the regular expression syntax for a carriage return, I believe it's \r.
So replace the line that builds the new file name with this:

$newfile = str_replace(array(".txt","*","\r","\n"),array("_","-","",""),$oldfile.$matches[0]).".txt";

That will replace any carriage returns with nothing (or delete them)
And \n for New Line gets replaced with nothing as well.
Sqlspider (ASKER)

OK, I understand your comment, but the problem is with preg_match. For example,
if I take the text
89245*000087726*01*052000113*DA*18790741*20120312~T
RN*1*1QG70794930*1411289245*000087726

preg_match ("/(~TRN\*[^\*]+\*[^\*]+)/") will not grab ~T and then pick up the rest of the pattern.

If the text is the following:
89245*000087726*01*052000113*DA*18790741*20120312
~TRN*1*1QG70794930*1411289245*000087726

Preg_match ("/(~TRN\*[^\*]+\*[^\*]+)/") captures the pattern just fine.
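
The two cases above can be reproduced with a short script. This is a sketch using the sample text from this post, where `\n` stands in for wherever the file actually wraps (the real files may use `\r\n`):

```php
<?php
$pattern = "/(~TRN\*[^\*]+\*[^\*]+)/";

// Line break inside the "~TRN" token versus line break before it.
$wrapped = "20120312~T\nRN*1*1QG70794930*1411289245*000087726";
$intact  = "20120312\n~TRN*1*1QG70794930*1411289245*000087726";

// The literal "~TRN" in the pattern cannot span the break after "~T".
$wrappedResult = preg_match($pattern, $wrapped, $m1);
$intactResult  = preg_match($pattern, $intact, $m2);

var_dump($wrappedResult, $intactResult, $m2[0]);
```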

Well then, just run a replace on the contents of $oldfile before doing the preg_match check,
like this:

$fileContents = file_get_contents($oldfile);
$fileContents = str_replace(array("\r","\n"),array("","");

if (preg_match("/(~TRN\*[^\*]+\*[^\*]+)/", $fileContents, $matches))
{
    // Match found! Renaming file!
    $newfile = str_replace(array(".txt","*"), array("_","-"), $oldfile.$matches[0]).".txt";
    echo "Renaming file from {$oldfile} to {$newfile}..." . "<br /> <br />";
}
else
{
    // No match found!
}
} // closes the enclosing foreach loop (not shown here)

Ok I tried that and I get this

Parse error: syntax error, unexpected ';'
for this section of the code
$fileContents = str_replace(array("\r","\n"),array("","");

Yes I do like the way you put it but not sure why it is not working.

Sorry about that, it's missing a closing ) in the line!


$fileContents = str_replace(array("\r","\n"),array("",""));
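
Note that `str_replace` takes the string to search as its third argument, which the corrected line above still leaves out. A minimal sketch of the full strip-then-match step follows. It assumes the files use Windows-style `\r\n` line breaks, and it uses a literal sample string from the question in place of `file_get_contents($oldfile)`:

```php
<?php
// Stand-in for file_get_contents($oldfile); sample text from the question,
// with assumed "\r\n" line breaks where the lines wrap.
$fileContents = "20120312~T\r\nRN*1*1QG70794930\r\n*1411289245*000087726";

// Third argument is the subject string; the result has no line breaks left.
$fileContents = str_replace(array("\r", "\n"), "", $fileContents);

// The original, unmodified pattern now sees "~TRN" as one contiguous token.
if (preg_match("/(~TRN\*[^\*]+\*[^\*]+)/", $fileContents, $matches)) {
    echo $matches[0], "\n";
} else {
    echo "No match found!\n";
}
```

Removing the line breaks before matching keeps the regular expression simple, at the cost of reading the whole file into memory first.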

OK, I tried what maestropsm suggested, but maybe I am missing something. I was successful with the following until I hit another issue.

// Get an array of all files in c:\splitting that end in .txt
$txtfiles = glob("c:\splitting\*.txt");

// Loop through the files
foreach ($txtfiles as $oldfile)
{
    // Skip files that already look like they've been processed
    if (strpos($oldfile, "_~")) continue;

    // Otherwise, process the rest of the files
    $fileContents = file_get_contents($oldfile);

    if (preg_match("/((~TR|N)\*[^\*]+\*[^\*]+)/", $fileContents, $matches))
    {
        // Match found! Renaming file!
        $newfile = str_replace(array(".txt","*"), array("_","_"), $oldfile.$matches[1]).".txt";
        echo "Renaming file from {$oldfile} to {$newfile}..." . "<br /> <br />";
        rename($oldfile, $newfile);
    }
    else
    {
        echo 'No match found!' . "<br /> <br />";
    }
}


The change I made was in the preg_match call:   if(preg_match("/((~TR|N)\*[^\*]+\*[^\*]+)/",$fileContents,$matches))

This seemed to fix a few issues but not all.

Again in the txt file I am trying to capture the pattern that matches ~TRN*1*1Q54896*.

Please note that the *1Q54896* portion can vary.  I was successful in getting the echo to show on the screen properly but when I do the rename($oldfile,$newfile) I get the following on a few files

Warning: rename(c:\splitting\ERA_uhc_batch 1 to 22_20120313_013022_S_ER073560_16.txt,c:\splitting\ERA_uhc_batch 1 to 22_20120313_013022_S_ER073560_16_N_1 _1QG80786469.txt) [function.rename]: The filename, directory name, or volume label syntax is incorrect. (code: 123)


I am uncertain what happened here.
Any help on this issue, please.
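
One likely cause, sketched below: the alternation `(~TR|N)` can match the letter `N` on its own, and `[^\*]+` matches line-break characters as well, so the captured text can carry a line break into the new file name, which Windows rejects. The sample string is a hypothetical reconstruction based on the file name in the warning, and the `\r\n` line break is an assumption:

```php
<?php
// Hypothetical input reconstructed from the warning above: the segment
// wraps after "~TRN*1" with an assumed Windows line break.
$fileContents = "20120313~TRN*1\r\n*1QG80786469*1411289245";

preg_match("/((~TR|N)\*[^\*]+\*[^\*]+)/", $fileContents, $matches);

// "~TR" is never directly followed by "*", so the "N" branch wins and the
// "~TR" prefix is lost; [^\*]+ then swallows the line break too.
$captured = $matches[1];
$hasBreak = preg_match('/[\r\n]/', $captured);

// Stripping the line break along with "*" gives a name rename() can use.
$clean = str_replace(array("\r", "\n", "*"), array("", "", "_"), $captured);

var_dump($captured, $hasBreak, $clean);
```

Stripping `\r` and `\n` from the captured text, or from the file contents before matching, avoids the invalid file name.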

I'm not the world's biggest regular expression expert, but I think this might do it.

if(preg_match("/((~TRN|~TR\rN|~TR\nN|~T\rRN|~T\nRN|~\rTRN|~\nTRN)\*[^\*]+\*[^\*]+)/",$fileContents,$matches))

Replace your regular expression with the above.

The first group uses | as an "or". I believe it will match ~TRN whether it is intact or split by a single carriage return or newline between its characters, and then continue through the remainder of the expression.
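
The same idea can be written more compactly by allowing an optional run of line-break characters between the letters. This is a sketch rather than a tested drop-in: the sample strings are adapted from the question, and the `\r\n` and `\n` line breaks in them are assumptions. Unlike the alternation above, it also tolerates a two-character `\r\n` break:

```php
<?php
// [\r\n]* permits zero or more line-break characters between the letters.
$pattern = '/~[\r\n]*T[\r\n]*R[\r\n]*N\*[^*]+\*[^*]+/';

$samples = array(
    "20120312~T\r\nRN*1*1QG70794930*1411289245", // Windows break after "~T"
    "20120312~TR\nN*1*1QG70794930*1411289245",   // Unix break after "~TR"
    "20120312\r\n~TRN*1*1QG70794930*1411289245", // segment starts a new line
);

$results = array();
foreach ($samples as $text) {
    if (preg_match($pattern, $text, $m)) {
        // Remove any line break that was captured inside the match.
        $results[] = str_replace(array("\r", "\n"), "", $m[0]);
    }
}
print_r($results);
```

A line break that falls later in the segment, for example inside the reference number, would still end up in the match, which is why the captured text is cleaned before use.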

North2Alaska
I'm no Regex expert but doesn't the '~' need to be escaped?  '\~'
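
As far as I know, `~` has no special meaning in PCRE, so it only needs escaping when it is also used as the pattern delimiter. A small sketch to check this, using a sample string from the question:

```php
<?php
$text = "20120312~TRN*1*1QG70794930*1411289245";

// With "/" as the delimiter, "~" is an ordinary character.
$plain = preg_match('/~TRN\*/', $text);

// Escaping it anyway is harmless: "\~" still means a literal "~".
$escaped = preg_match('/\~TRN\*/', $text);

// With "~" as the delimiter, a literal "~" inside the pattern must be escaped.
$asDelimiter = preg_match('~\~TRN\*~', $text);

var_dump($plain, $escaped, $asDelimiter);
```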
ASKER CERTIFIED SOLUTION
Ray Paseur
This solution is only available to members.

Thank you all for the help. Ray_Paseur's solution is what worked. I did have to do a bit more fine-tuning, but the str_replace example you gave is what I used to make the text appear on its own line. Once I did that, I used preg_match without the added regex and it worked.

Again, you all are tops in my book. I am trying more and more to get as good as you all are!
Thanks for the points and thanks for using EE!  This was a great question.  All the best, ~Ray