• Status: Solved
  • Priority: Medium
  • Security: Public

Find and Replace text in String

Hi there,
I've got a string in a variable,

$text = "This is some sample ##00012## so it's just stuff bla bla. There could be more ##010101## and so on";

What I want to do is something like this:
•      First, find all instances of ##.....## in $text and extract what's between and store all of them in an array, say $inputArray
•      I'll then look up the values of $inputArray in the database individually, then return results and store in another array, called $outputArray (I can do this)
•      Replace these instances of ##.....## with corresponding data from $outputArray and store in $outputText

I hope this doesn’t sound too confusing. I don’t know where to start. Appreciate your help. Thanks.
Asked:
skylabel
 
gr8gonzoConsultantCommented:
Just to get you started:

1. Use preg_match_all to match all instances of ##....## and get them all into an array, like:
if(preg_match_all("/##([0-9]+)##/",$text,$inputArray))
{
  // Matches were found: $inputArray[0] holds the full ##...## tokens,
  // $inputArray[1] holds the values between the markers.
}

2. Don't look up the values individually if you can minimize the number of queries by combining them into one query. For example, instead of looking up ##010101## and ##00012## with two queries, do:

SELECT something FROM yourTable WHERE idString IN ('010101','00012');

3. Use str_replace with arrays, like:

$outputText = str_replace(  
   array("##010101##","##00012##"),
   array("Oh one oh one oh one", "Zero zero zero twelve"),
   $text);
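Putting the three steps above together, a minimal sketch might look like the following. The lookup_many() function is a hypothetical stand-in for the single IN (...) query, and its replacement strings are just the sample values from step 3.

```php
<?php
// Hypothetical stand-in for one "WHERE idString IN (...)" query:
// takes a list of ids and returns an id => replacement map.
function lookup_many(array $ids): array
{
    $fakeTable = [
        '00012'  => 'Zero zero zero twelve',
        '010101' => 'Oh one oh one oh one',
    ];
    return array_intersect_key($fakeTable, array_flip($ids));
}

$text = "This is some sample ##00012## so it's just stuff bla bla. There could be more ##010101## and so on";
$outputText = $text;

// Step 1: collect the ids between the ## markers (capture group 1).
if (preg_match_all('/##([0-9]+)##/', $text, $matches)) {
    $inputArray = array_unique($matches[1]);

    // Step 2: one batched lookup instead of one query per id.
    $outputArray = lookup_many($inputArray);

    // Step 3: rebuild the full ##id## tokens and replace them all at once.
    $search = [];
    foreach (array_keys($outputArray) as $id) {
        $search[] = '##' . $id . '##';
    }
    $outputText = str_replace($search, array_values($outputArray), $text);
}

echo $outputText, "\n";
```

Ids that the lookup does not return are simply left in the text, which makes missing data easy to spot.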
 
Ray PaseurCommented:
Probably you would start with a regular expression.  I'll show you how to do that.
 
ukerandiCommented:
You can use the str_replace function to strip the # characters:
echo str_replace("#", "", $text);
 
Ray PaseurCommented:
http://www.laprbass.com/RAY_temp_skylabel.php
<?php // RAY_temp_skylabel.php
error_reporting(E_ALL);

// TEST DATA FROM THE POST AT EE
$text = "This is some sample ##00012## so it's just stuff bla bla. There could be more ##010101## and so on";

// CONSTRUCT A REGULAR EXPRESSION
$regex
= '/'                      // REGEX DELIMITER
. '(##)'                   // GROUP #1 OF EXACTLY TWO HASHMARKS
. '(.*?)'                  // GROUP #2 OF ANYTHING OR NOTHING, UNGREEDY - WE WANT THIS PART
. '(##)'                   // GROUP #3 OF EXACTLY TWO HASHMARKS
. '/'                      // REGEX DELIMITER
;
// MAKE THE MATCH
preg_match_all($regex, $text, $match);

// SHOW ONLY THE THINGS YOU WANT - IN GROUP 2
echo "<pre>";
var_dump($match[2]);

// SHOW HOW TO GET THE VALUES FROM THE ARRAY
foreach ($match[2] as $string)
{
    echo "<br/>$string";
}

 
DerokorianCommented:
<?php

$outputArray = array('00012'=>'New Text','010101'=>'Replaced Text');
$text = "This is some sample ##00012## so it's just stuff bla bla. There could be more ##010101## and so on";

$pattern = '/##([0-9]+)##/';

preg_match_all($pattern,$text,$matches);

foreach( $matches[1] as $index ) {
   $text = str_replace('##'.$index.'##',$outputArray[$index],$text);
}

echo $text;

HTH
 
Ray PaseurCommented:
Regarding this part, "I'll then look up the values of $inputArray in the database individually, then return results and store in another array, called $outputArray (I can do this)" -- you may want some way to coordinate the two arrays.  If your values from the $text string are unique, you can use them as keys to the elements of $outputArray.  If they are not unique you might want to have an object that holds each of the values from $inputArray and its corresponding value from $outputArray.  You can make an array of these simple objects and that will let you handle the replacements in the text string.  Note the "count" argument of str_replace(), linked below.  In the case that there may be non-unique data, you might want to do the replacements one at a time in the order given by the original $text.
http://php.net/manual/en/function.str-replace.php
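As a rough illustration of keeping the two arrays coordinated, the sketch below keys the lookup results by the value found in the text and reads back the "count" argument. The sample text and the replacement values are made up for the example.

```php
<?php
// Hypothetical text in which one value appears twice.
$text = "Item ##00012## and again ##00012##, then ##010101##";

// Hypothetical lookup results, keyed by the value found in the text.
$outputArray = ['00012' => 'apple', '010101' => 'pear'];

// Build a token => replacement map so each key travels with its value.
$map = [];
foreach ($outputArray as $key => $value) {
    $map['##' . $key . '##'] = $value;
}

// str_replace() reports the total number of replacements in its fourth argument.
$outputText = str_replace(array_keys($map), array_values($map), $text, $count);

echo $outputText, "\n"; // Item apple and again apple, then pear
echo $count, "\n";      // 3
```

If the replacements really do need to happen one at a time, preg_replace() accepts a limit argument that can be set to 1.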
 
Cornelia YoderArtistCommented:
A far easier way would be to use explode().

http://us2.php.net/manual/en/function.explode.php

$text = "This is some sample ##00012## so it's just stuff bla bla. There could be more ##010101## and so on";

$inputArray = explode('##',$text);

This makes an array of all the strings between all instances of ##.

You can process that array however you want, then use implode() to put it back together.

$outputText = implode('##',$outputArray);

http://us2.php.net/manual/en/function.implode.php
 
Ray PaseurCommented:
Here is how explode() works:
http://www.laprbass.com/RAY_temp_yodercm.php
Outputs:
Array
(
    [0] => This is some sample
    [1] => 00012
    [2] =>  so it's just stuff bla bla. There could be more
    [3] => 010101
    [4] =>  and so on
)
<?php // RAY_temp_yodercm.php
error_reporting(E_ALL);
echo "<pre>";

$text = "This is some sample ##00012## so it's just stuff bla bla. There could be more ##010101## and so on";

$inputArray = explode('##',$text);

// WHAT HAVE WE GOT?
print_r($inputArray);

 
ob2sCommented:
Hi,

See if this will do what you need.  You'll need to replace the contents of the function db_lookup() with real code to look up the value in your database and return the results.  I've provided a fake "db lookup" so that you have a "working" example to test.

<?php
error_reporting(E_ALL);

function db_lookup($keyval)
{
	// stub -- replace with real db lookup code
	$fake_db = array( '00012' => 'stuff', '010101' => 'flotsam and jetsam' );

	return isset($fake_db[$keyval]) ? $fake_db[$keyval] : '';
}

$text = "This is some sample ##00012## so it's just stuff bla bla. There could be more ##010101## and so on";
$regex = "/##(\d+)##/";

preg_match_all($regex, $text, $match);
foreach ($match[1] as $string)
{
	$text  = str_replace("##$string##", db_lookup($string), $text);
}

echo "<pre>$text</pre>";
?>
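A variation that nobody in the thread proposed, offered only as a sketch: preg_replace_callback() can do the match, the lookup, and the replacement in one pass. The lookup_value() function is a hypothetical stub that mirrors the fake db_lookup() above.

```php
<?php
// Stub lookup mirroring the fake db_lookup() above; replace with a real query.
function lookup_value(string $key): string
{
    $fakeDb = ['00012' => 'stuff', '010101' => 'flotsam and jetsam'];
    return $fakeDb[$key] ?? '';
}

$text = "This is some sample ##00012## so it's just stuff bla bla. There could be more ##010101## and so on";

// The callback receives each match; $m[1] is the text between the ## markers.
$outputText = preg_replace_callback(
    '/##(\d+)##/',
    function (array $m): string {
        return lookup_value($m[1]);
    },
    $text
);

echo $outputText, "\n";
```

This still performs one lookup per match, so it suits cheap lookups better than one database query per value.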

 
Cornelia YoderArtistCommented:
Good grief!  Why would you want to use (or advise someone to use) regex for something this simple that can be done in 3 lines of code using explode() and implode()?
 
DerokorianCommented:
And how would you do it in 3 lines of code with explode and implode? 1 line to explode 1 to implode so you're saying you can process each entry in the resulting array, substituting where appropriate with only one line of code? I do declare maybe you could show this line of code!
 
Cornelia YoderArtistCommented:
$array = explode("##",$text);
$outputArray = processingthearray($array);
$outputText = implode("##",$outputArray);

The asker said he could do the array processing himself.  This is how to do what he asked about -- take care of the ## delimiters.
 
ob2sCommented:
As I read the OP, the values are "wrapped" by a pair of ## characters, not separated by a single ## delimiter.
 
Cornelia YoderArtistCommented:
Yes, ob2s, exactly.  The explode separates all the segments that are delimited by the ##.

So after the explode, the array for

"This is some sample ##00012## so it's just stuff bla bla. There could be more ##010101## and so on";

will contain these 5 values, namely:

This is some sample
00012
 so it's just stuff bla bla. There could be more
010101
 and so on
 
Cornelia YoderArtistCommented:
I used a simple string reverse on the segments of the text (in place of the "look up the values of $inputArray in the database individually, then return results and store in another array") as an example of the array processing.  This code produces the output shown below:

elpmas emos si sihT##21000## erom eb dluoc erehT .alb alb ffuts tsuj s'ti os ##101010##no os dna

Each segment is individually processed (reversed in my example), and then reassembled into a single string with the ## delimiters in the same places.  I believe this is what the asker wants.

Isn't this a LOT easier than regex?

<?php

function processarray($inarray)
{
   $outarray = array();
   // Use < rather than <= so the loop stops at the last element;
   // <= reads one element past the end and appends a stray "##".
   for ($i = 0; $i < count($inarray); $i++)
   {
      $outarray[$i] = strrev($inarray[$i]);
   }

   return $outarray;
}

$text = "This is some sample ##00012## so it's just stuff bla bla. There could be more ##010101## and so on";

$array = explode('##',$text);

$outputArray = processarray($array);

$outputText = implode('##',$outputArray);

echo $outputText;

?>

 
ob2sCommented:
@yodercm
Isn't this a LOT easier than regex?
No, not "easier". Fewer characters typed, yes. And, yes, you saved one whole regex. But the array you'd pass to processarray() not only contains the values in between the left ## and the right ## "brackets" (what the OP wanted), it also contains virtually everything from the original string, which is a potentially unnecessary waste of memory. If the string contains 8MB of text, then the input AND output arrays will each contain ~8MB as well. This isn't necessary for this task.

In addition, your processarray() function now has to be aware that it should only process every second array element (indexes 1, 3, 5, ...).  Otherwise, it'll do an extra n+1 unnecessary db lookups for the elements in between, which contain text that wasn't encapsulated by a ##..## wrapper. That is a big performance hit.

Perhaps I misunderstood the OP, but as I said before, I think of the ## pairs as "brackets" that envelop values to be looked up in the db.  The text not in one of these envelopes should remain unchanged.  Essentially, the OP is doing something like template variable substitution.  @skylabel please correct me if I misunderstood.

So, if done yodercm's way, the code might look something like this:
<?php
function processarray($array)
{
   for ($i=1;$i<count($array);$i+=2)
   {
      $array[$i] = strrev($array[$i]);  // replace with db lookup
   }
   return $array;
}

$text = "This is some sample ##00012## so it's just stuff bla bla. There could be more ##010101## and so on";

$array = explode('##',$text);

$outputArray = processarray($array);

$outputText = implode('##',$outputArray);

echo $outputText;
?>

It works, and 20+ years ago I would have thought it was a cool hack.  Now, however, it looks like unnecessarily increasing complexity in processarray() to justify using implode/explode. Without knowing skylabel's performance requirements, I generally prefer the solution that I posted earlier, rather than the one above, as the earlier solution requires fewer passes over the data (implode and explode iterate over arrays), eliminates the need for an output array entirely, and is more readable and maintainable, IMHO.
 
Cornelia YoderArtistCommented:
Well, if what you infer is true, and only the numbers in his string need to be processed, then yes, you just do the processing on even numbered elements, using $i+=2, but that is a trivial change.

The real important fact is that using explode/implode is SO MUCH easier than trying to sort out some regex string when you don't need to.  Regex is the most miserable and easy to screw up thing there is in all of php.

There is absolutely NOTHING readable, understandable, or maintainable about something like your line of code:
if(preg_match_all("/##([0-1]+)##/",$text,$inputArray))

It reminds me of the early days of programming when someone would lay a line of code on me and say "I'll bet you can't figure out what this does".  I shudder at the memory ....
 
DerokorianCommented:
Regex will give you just the numbers encapsulated by pound symbols. With implode/explode, what happens when there is an arbitrary double pound that is not part of an encapsulation? Then $i+=2 no longer works, meaning that in reality you still have to check and test every part of the array, not just the ones you HOPE have the information you desire.

As far as how readable that line is, it's perfectly legible to someone who has taken the time to learn and understand regex (which is a very powerful tool).

If you have trouble understanding regex, I suggest you look at Ray's posts regarding it. You can break the pattern down bit by bit and comment each part individually so as to make it easier to understand when looking at it later. I find this technique exceedingly helpful in both developing regex and maintaining very complicated regex. For example you could define the pattern as such:

$pattern = '/'   // Opening delimiter
   .'##'         // 2 literal pound symbols
   .'('          // opening subset
   .'[0-9]+'     // one or more digits
   .')'          // closing subset
   .'##'         // 2 literal pound symbols
   .'/';         // closing delimiter
preg_match_all($pattern,$text,$inputArray);   // preg_match() alone would stop at the first match
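Another way to comment a pattern piece by piece, sketched here as an option rather than anything recommended in the thread, is PCRE's x modifier. It lets the comments live inside the pattern itself. Because an unescaped # starts a comment in that mode, the literal hash marks have to be escaped, and the ~ delimiter is an arbitrary choice for this example.

```php
<?php
$text = "This is some sample ##00012## so it's just stuff bla bla. There could be more ##010101## and so on";

// With the x modifier, whitespace in the pattern is ignored and an unescaped #
// starts a comment that runs to the end of the line.
$pattern = '~
    \#\#        # two literal hash marks
    ([0-9]+)    # group 1: one or more digits
    \#\#        # two literal hash marks
~x';

preg_match_all($pattern, $text, $inputArray);

// Group 1 holds the values between the markers.
print_r($inputArray[1]);
```

One caveat: the delimiter character must not appear inside the comments, or the pattern will end early.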

 
gr8gonzoConsultantCommented:
@yodercm - You're entitled to your opinion, but saying regexes are miserable and that there is nothing readable, understandable, or maintainable about them?

You could go to a foreign country and probably use hand gestures to get around instead of trying to learn the foreign language, but just because it's temporarily easier doesn't mean it's better.

Regexes have a purpose, and this is a very good example of when to use a regex instead of explode/implode. I regularly use both approaches in different projects because they are appropriate in different ways. In this case, none of us (maybe not even the author) can know for certain what the original text might contain, and that presents a problem that regexes are better at handling.

For example, what if the original text originates (even in part) from some random user who puts in a visual line separator like "##############" ? If you were to explode that on "##", you would end up with unreliable results in your array, meaning that you would need more code to test each element before you could reliably make use of it. I started writing out a whole paragraph of potential chain reactions of problems that could cascade from making assumptions about the values, but the short of the story is that with regexes, you're matching only things that follow a very specific mask, which will produce far fewer false positives.

So offer up your explode() suggestion by all means - it's good to have options and alternatives, but don't bash other people's code just because you're not comfortable with using regexes. When you know them well, regular expressions are fantastic, extremely powerful tools.
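The separator-line scenario can be tried out with a small sketch. The input string below is invented for the demonstration and uses a line of 14 hash marks.

```php
<?php
// Hypothetical input containing a user-typed separator line of 14 hash marks.
$text = "Intro ##123##\n##############\nOutro ##456##";

// explode() splits on every "##", so the separator yields a run of empty
// segments and moves the later value off the odd positions.
$segments = explode('##', $text);

// The regex only matches digits wrapped in a pair of ## markers.
preg_match_all('/##([0-9]+)##/', $text, $matches);

print_r($segments);    // 12 segments; "456" ends up at index 10
print_r($matches[1]);  // just "123" and "456"
```

With this input, a loop that steps through indexes 1, 3, 5, ... finds 123 but misses 456, while the regex result is unaffected by the separator.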
 
Cornelia YoderArtistCommented:
I know them very well, thank you, and that's why I avoid them whenever possible.

Sure, if (as you say) the asker wants to learn a new foreign language to do such a simple task, it's up to him.  But if this problem is as simple as stated, regex is way more than it needs.

By the way, I haven't bashed anyone's code.  I only bash regex as a solution to something that doesn't need it.   Derokorian was the only one to make direct sarcastic bashing statements, not me.

For me, the simple to understand, quick to code, and easy to maintain/modify later counts a lot more than some complex "fantastic, extremely powerful tools" that really aren't needed for a simple task.
 
gr8gonzoConsultantCommented:
By the way...

"It reminds me of the early days of programming when someone would lay a line of code on me and say "I'll bet you can't figure out what this does".  I shudder at the memory ...."

Good programmers know that not all code is going to be easily-readable in English, which is why it's good programming practice to comment your code.

if(preg_match_all("/##([0-9]+)##/",$text,$inputArray))
{
}

...might seem cryptic to new programmers, but if you comment your code like the following:

// Find all numbers between ## ... ##. Example: ##123456##
if(preg_match_all("/##([0-9]+)##/",$text,$inputArray))
{
}

...then you're going to know what that line of code does without having to mentally process it. We're not working in assembly, nor are we likely dealing with a stump-the-programmer situation.
 
Cornelia YoderArtistCommented:
Good programmers make sure that all their code IS easily readable in English.  Lazy programmers use comments instead (which by the way, in my 35 years of professional programming, are usually out of date long before the code itself is).

That said, I do strongly recommend extensive commenting as shown by Derokorian if you choose to use any regex functions.  They are impossible to make easily readable in English.

Now, you guys can keep bashing me for as long as you want.  I stand by my simple explode/implode 3 lines of easily understandable code solution.  Maybe you will convince skylabel to learn that foreign language after all.
 
ob2sCommented:
I understand your feelings about general regex complexity, but my regex expression is reasonably concise:
$regex = "/##(\d+)##/";

Further, my regex is confined to one line of code, but by using explode/implode, you require changes in another part of the code (your processarray() function). My concern is that these two sections of code now need to be in sync for the life of the code. I like to avoid that kind of thing when I can.

That said, I realize that people have different opinions about what is complex and what is "easy".  Let's see how skylabel feels.  :)

Cheers,
Fred

P.S.  I have my own "shudder memories" of working on a compiler where the previous developer thought using literal constants EVERYWHERE was "easier".  What should have been a one line code change turned into a days long effort to find every instance of "74" in the code (there were thousands), then figure out if it represented the byte length of the on-disk structure that I was trying to enlarge to 80 bytes, or not. Ah, the bad old days... ;-)
 
Cornelia YoderArtistCommented:
Or the days when you had to define every variable.

DECLARE FOUR VALUE(4);

Then later the value needed to change to 5, so the change was made

DECLARE FOUR VALUE(5);

Of course everywhere that variable was used in the code still said FOUR, because it was too hard to find all the instances of FOUR in the code and change them to FIVE.

Or the comments .....

I = I+1;   //Add 1 to I.

Shuddering uncontrollably now ...
 
gr8gonzoConsultantCommented:
@yodercm - It's only simple if you make a lot of assumptions. You still have to check the values before you can use them, which is more code. You also have to write code to deal with the values that are NOT lookups, which is more code. You also have to create a new array before you can implode it, which is more memory. You are also processing one lookup at a time, which could be far slower than invoking the regex engine in the first place. Also, since you're dealing with one array element at a time, you lose a lot of efficiency (one database query per lookup value, plus inability to deal with duplicates without even MORE code).

Take a look at the two approaches:

<?php

$text = "Attention: ##123##,
Issue number ##456## has been raised!";

// Option 1:
$array = explode('##',$text);
$outputArray = processarray($array);
$outputText = implode('##',$outputArray);
function processarray($inarray)
{
   $outarray=array();
   for ($i=0;$i<count($inarray);$i++)
   {
      if(is_numeric($inarray[$i]))
      {
         // Fetch value from database for one lookup value
         $rs = mysql_query("SELECT value,lookupID FROM lookup WHERE lookupID = " . $inarray[$i]);
         $row = mysql_fetch_assoc($rs);
         $outarray[$i] = $row["value"];
      }
      else
      {
         $outarray[$i] = $inarray[$i];
      }
   }

   return $outarray;
}



// Option 2:
if(preg_match_all("/##([0-9]+)##/",$text,$inputArray))
{
	// Fetch all matched values from database
	$rs = mysql_query("SELECT value,lookupID FROM lookup WHERE lookupID IN (".implode(",",$inputArray[1]).")");
	while($row = mysql_fetch_assoc($rs))
	{
		$text = str_replace("##".$row["lookupID"]."##",$row["value"],$text);
	}
}

?>

The second approach is not only smaller in terms of amount of code, but it is more flexible (you could easily add one line to eliminate dupes from the array), and all the syntax/masking is already handled by the regex. It also combines everything into one single database query, which is going to be far faster than processing things individually, since it costs one round trip to the database instead of one per value.

Yes, explode() may FEEL simpler at first, but when you take into consideration all the extra code you have to write to juggle all the other pieces, regexes become far simpler.
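The mysql_* functions used above were removed in PHP 7, so here is a hedged sketch of Option 2 using PDO with bound parameters. An in-memory SQLite table stands in for the real database purely so the example is self-contained (this assumes the pdo_sqlite extension is available). The table and column names follow the queries above, and the stored values are invented.

```php
<?php
// Assumption: an in-memory SQLite table stands in for the real lookup table.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE lookup (lookupID TEXT PRIMARY KEY, value TEXT)');
$pdo->exec("INSERT INTO lookup VALUES ('123', 'Alice'), ('456', 'Printer jam')");

$text = "Attention: ##123##,\nIssue number ##456## has been raised!";

if (preg_match_all('/##([0-9]+)##/', $text, $inputArray)) {
    // De-duplicate, then build one "?" placeholder per id.
    $ids = array_values(array_unique($inputArray[1]));
    $placeholders = implode(',', array_fill(0, count($ids), '?'));

    // One query for all ids; the values are bound, not concatenated into the SQL.
    $stmt = $pdo->prepare("SELECT lookupID, value FROM lookup WHERE lookupID IN ($placeholders)");
    $stmt->execute($ids);

    // Map each full ##id## token to its replacement.
    $map = [];
    foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
        $map['##' . $row['lookupID'] . '##'] = $row['value'];
    }

    // strtr() replaces every token in a single pass over the text.
    $text = strtr($text, $map);
}

echo $text, "\n";
```

Storing lookupID as text keeps ids with leading zeros, such as 00012 from the original question, intact.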
 
Cornelia YoderArtistCommented:
What you say would be true if skylabel has misstated his problem.  However, if what he wrote is correct, then all of that extraneous stuff you have added to make this method look bad is totally unnecessary and serves only to make a simple solution look complex when it isn't.

THREE simple, understandable lines of code is all it takes.  Skylabel already stated that he has the processing for the individual elements, that part of the code is not a factor in this decision.  He asked for an array to process, and that is exactly what explode gives him.

Just because you "love" the "fantastic, powerful tool" does not make it right for this job.

When you have a "fantastic powerful hammer" that you love, everything looks like a nail.

I'm beginning to resent all the attacks on my solution.  If you need points that badly, I'll ask skylabel to award them to you, but I sincerely hope he isn't confused by all these attempts to make a very simple solution look complex, just so he will learn your "fantastic powerful" new language for one little simple task.


Yes, explode() DOES feel simpler, because it IS.


 
skylabelAuthor Commented:
Thanks all, all these work well, but I've opted for the one that's easiest to understand and direct (to me)
 
gr8gonzoConsultantCommented:
@yodercm - I honestly don't care about points. I'm here to educate and help. If I see someone going down a path that is eventually going to lead them into performance problems, then I call it out and show them a better path.

Contrary to what you might think, I almost always use explode and implode in my programming solutions because I like to avoid the overhead of the regex engine when I am confident that explode/implode will handle all the scenarios, so please do not incorrectly assume that I'm in love with some powerful hammer and that I see every problem as a nail to be solved with regexes.

You keep bringing up "3 lines" as if that were the reason it was simpler. It's easy to reduce my code for Option 2 down to 2 lines, actually:

if(preg_match_all("/##([0-9]+)##/",$text,$matches))
  array_walk($matches,"processarray");

On top of being less code, it would only be processing positive matches, and not every element in the array, which makes it that much more efficient. However, LESS code isn't always SIMPLER code, much less BETTER code, and there was a reason I didn't suggest that approach (and still don't).

The reason I presented my approach was primarily because it was the most efficient in not only reducing array processing but also (and most importantly) in reducing the database impact, which is the most frequent performance problem I see here on EE. It's very common to see people on here not caring about doing multiple database calls because everything feels quick in development mode, and then they complain later that their database is crawling, and we see bad code that is now replicated multiple times throughout their system and will take months to rework. All of that can be prevented by just suggesting the proper approach up front.

I think an important part of our roles here on EE is to make sure that people also get an understanding of what is a good approach in the long term. If someone is posting code that has some serious SQL injection or other security problem in it, then I bring up that as a potential problem for them to address, rather than ignoring it because that's the way they're already doing things.