Solved

preg_match() problem in php

Posted on 2012-04-06
Last Modified: 2012-04-07
I have not learned regexp or how to write patterns yet. I am reading articles and tuts but so far can't find one to do my particular need. I have a string like so:

$str = "This is my string and it could be up to 200 characters long [But it never will be]";

My problem is I need to return the text inside the square brackets using preg_match(). I can do it with a function very easily but my boss wants it done with preg_match() as he thinks this will be quicker. Here is a simple function that works.

function strip_filename($string) {
    if ($string != '') {
        $sub_string = substr($string, strpos($string, '[') + 1, strlen($string));
        $result = substr($sub_string, 0, (strlen($sub_string) - 1));
        return $result;
    }
}

That gets me the text between the brackets. In this case, every string to be searched will end in a closing ']', but it would be safer to assume that other characters might follow the last ']', which my function as written would not handle. Any help would be appreciated.

Just to be clear, I do not want help to fix my function, it's not broken. I want help to write a regular expression to find the text between two characters those being a '[' and a ']'

Thanks in advance
Question by:Mark Brady
9 Comments
 
LVL 31

Expert Comment

by:Frosty555
Hi Elvin66,

This is what you're looking for:

$str = "This is my string and it could be up to 200 characters long [But it never will be]";
$result = preg_match_all( "/(\[)(.*)(\])/",   $str )

This will return an array of results. The value result[2] will contain the text in between the square brackets.

Test it out here and you'll see what I mean:
http://www.functions-online.com/preg_match.html

If you have multiple square bracket pairs and you want all of the data, you can use preg_match_all() with the same parameters. The result that is returned is a little different. Again, test it out here:

http://www.functions-online.com/preg_match_all.html
 
LVL 31

Assisted Solution

by:Frosty555
Frosty555 earned 167 total points
Regular expressions are a bit tricky to wrap your head around, but here's the basic idea of what the pattern I used is doing:

The pattern:
/(\[)(.*)(\])/

Can be broken down into these parts

/   (\[)  (.*)  (\])   /
1    2     3     4     5


#1 and #5 are the pattern delimiters. They could have been almost any non-alphanumeric character (other than a backslash or whitespace). I used a forward slash in this case.
#2 is the opening square bracket, escaped with a backslash
#3 means "zero or more of any character" (by default the dot does not match a newline)
#4 is the closing square bracket, escaped with a backslash

preg_match() stores the parts of the string that match those groups in the array passed as its third argument. Calling that array result, result[0] is the whole match, result[1] is the opening bracket, result[2] is the inner contents, and result[3] is the closing bracket.
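
To see the group numbering in practice, here is a minimal sketch. It assumes the captures are collected through the third argument of preg_match(); the variable names $found and $result are chosen only for illustration.

```php
<?php
// Sample string from the question.
$str = "This is my string and it could be up to 200 characters long [But it never will be]";

// preg_match() returns 1 on a match (0 if none) and fills $result with the captures.
$found = preg_match('/(\[)(.*)(\])/', $str, $result);

if ($found === 1) {
    echo $result[0], PHP_EOL; // whole match: [But it never will be]
    echo $result[1], PHP_EOL; // group 1: [
    echo $result[2], PHP_EOL; // group 2: But it never will be
    echo $result[3], PHP_EOL; // group 3: ]
}
```

Index 0 always holds the full match, so the numbered groups start at 1.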
 
LVL 74

Assisted Solution

by:käµfm³d 👽
käµfm³d   👽 earned 166 total points
...and Frosty555's pattern will suit you just fine... right up to the point where your string contains multiple brackets  : \

The "problem" with Frosty555's pattern is the dot-star. This construct is greedy: it tries to consume as much as possible before it declares a match. For a string containing multiple brackets (as in our scenario here), the opening bracket matches, the dot-star consumes everything up to the end of the string, and the regex engine then backtracks until it finds a closing bracket. Because it backs up from the end, the bracket it finds is the last one in the string, not the first. The engine then declares success.

For example, given this string:

This is my string and it could be up to 200 characters long [But it never will be], but then again, [it might].

...the value in result[2] will be:

But it never will be], but then again, [it might

Probably not what you were looking for  : \

There are a couple of solutions. You can opt for the non-greedy version of dot-star:

.*?

This will cause the regex engine to try to match as little as possible before declaring a match. Alternatively, instead of using any form of dot-star, you could match any character that is not a closing bracket:

[^\]]*

The effect, in either case, is that given the same sample string I used above, the value in result[2] would now be:

But it never will be

...and:

it might
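
The three variants can be compared side by side. This is a minimal sketch using preg_match_all() on the sample string above; the variable names $greedy, $lazy and $negated are just for illustration.

```php
<?php
$str = "This is my string and it could be up to 200 characters long "
     . "[But it never will be], but then again, [it might].";

// Greedy: .* runs to the end of the string and backs up to the LAST "]".
preg_match_all('/\[(.*)\]/', $str, $greedy);

// Lazy: .*? stops at the first "]" that lets the pattern match.
preg_match_all('/\[(.*?)\]/', $str, $lazy);

// Negated class: [^\]]* can never cross a "]" at all.
preg_match_all('/\[([^\]]*)\]/', $str, $negated);

print_r($greedy[1]);  // one element: "But it never will be], but then again, [it might"
print_r($lazy[1]);    // two elements: "But it never will be", "it might"
print_r($negated[1]); // the same two elements as the lazy version
```

One difference worth knowing: the negated class also matches newlines, while the dot does not unless the s modifier is set.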

I highly recommend the following site for learning regex:  www.regular-expressions.info . The site is especially helpful if you use regex in more than one language.
 
LVL 108

Expert Comment

by:Ray Paseur
I'd like to step back from the question a little bit and suggest that your boss is well-intentioned, and learning REGEX has a certain value, but in terms of "quicker" the emphasis is misplaced.  Try this... Take your function and use microtime() to time it - just get the difference between the before and after times.  Then take a regular expression algorithm and use microtime() to time it.  Compare the times.  Then buy your boss a beer and discuss the value of spending a day of your life to optimize a process that completes in microseconds.  

If you really want to make things quicker, find all of the MySQL queries with SELECT * and change them to select only the required columns!  Just a thought, ~Ray
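
A minimal sketch of that timing comparison follows. The helper name time_it(), the 10,000-iteration count and the inlined substring logic are assumptions made for illustration, and the numbers it prints will differ from machine to machine.

```php
<?php
// Hypothetical helper: run $fn $n times and return the elapsed time in milliseconds.
function time_it(callable $fn, int $n = 10000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $n; $i++) {
        $fn();
    }
    return (microtime(true) - $start) * 1000;
}

$str = "This is my string and it could be up to 200 characters long [But it never will be]";

// Substring approach: same idea as the function in the question,
// valid only when the string ends in "]".
$substrMs = time_it(function () use ($str) {
    $sub = substr($str, strpos($str, '[') + 1);
    return substr($sub, 0, -1);
});

// Regular expression approach.
$regexMs = time_it(function () use ($str) {
    preg_match('/\[(.*?)\]/', $str, $m);
    return $m[1];
});

printf("substr: %.3f ms, regex: %.3f ms%s", $substrMs, $regexMs, PHP_EOL);
```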
 
LVL 20

Author Comment

by:Mark Brady
Haha well thought out Ray and I couldn't agree more. I just moved from New Zealand to Sunny Florida and have just started a new job over here. Funny thing, this was on my first day. I was given some data to extract from xml files on the server so I wrote the code to load each one and get the tag they needed, and this question arose after he saw my function to extract the text between the brackets. They apparently like minimal coding in their scripts and that is fine but like you say, let's see how fast both methods really are. I will do exactly that. Thank you all for your input. I will present him with the preg_match_all option and let him decide but food for thought!

I must admit I have danced around the regex issue (not learned it when I should have) because unlike almost all other php it was not easy to learn. I guess I need to actually do some work and learn it now :) Thanks a bunch guys.
 
LVL 20

Author Comment

by:Mark Brady
Maybe I'm missing something here. When I run this code...

$str = "This is my string and it could be up to 200 characters long [But it never will be]";
$result = preg_match_all('/(\[)(.*)(\])/', $str);
echo $result[2];

Which is what you gave me to test, I get a blank screen. Nothing in var_dump() either and nothing in the $result variable. Is there a typo in there somewhere?
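
For anyone hitting the same blank screen, a likely cause is that preg_match_all() returns the number of full matches (an integer, or false on error), not the matched text, so $result[2] has nothing to index. The captures are written to the third argument instead. A minimal sketch, where $count and $matches are names chosen for illustration:

```php
<?php
$str = "This is my string and it could be up to 200 characters long [But it never will be]";

// The return value is a count; the captured text lands in $matches.
$count = preg_match_all('/(\[)(.*)(\])/', $str, $matches);

var_dump($count);         // int(1)
var_dump($matches[2][0]); // string(20) "But it never will be"
```

With the default flag, $matches is indexed by group first and by match second, so $matches[2][0] is group 2 of the first match.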
 
LVL 108

Accepted Solution

by:
Ray Paseur earned 167 total points
See http://www.laprbass.com/RAY_temp_elvin66.php
<?php // RAY_temp_elvin66.php
error_reporting(E_ALL);
echo "<pre>";

// TEST DATA FROM THE POST AT EE
$str = "This is my string and it could be up to 200 characters long [But it never will be]";

// A REGULAR EXPRESSION TO ISOLATE DATA INSIDE BRACKETS
$rgx
= '#'        // REGEX DELIMITER
. '\['       // ESCAPED BRACKET
. '(.*?)'    // GROUP OF ANYTHING
. '\]'       // ESCAPED BRACKET
. '#'        // REGEX DELIMITER
;

// USE THE REGULAR EXPRESSION
preg_match_all($rgx, $str, $mat);

// THE ANSWER IS HERE
var_dump($mat[1][0]);


// ANOTHER TEST DATA STRING
$str = "This is my string [and it could be] up to 200 characters long [But it never will be]";
preg_match_all($rgx, $str, $mat);

// SHOW EVERYTHING THE REGEX CREATED
var_dump($mat);
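
Since the question asked for preg_match() specifically and noted that text may follow the closing bracket, a single-result variant might look like the sketch below. The function name between_brackets() is hypothetical, and the negated character class is swapped in for the lazy dot-star; neither comes from the script above.

```php
<?php
// Hypothetical helper: return the text inside the first [...] pair, or null if there is none.
function between_brackets($string)
{
    // [^\]]* matches any run of characters that are not "]".
    if (preg_match('/\[([^\]]*)\]/', $string, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

var_dump(between_brackets("long [But it never will be] trailing text")); // string(20) "But it never will be"
var_dump(between_brackets("no brackets here"));                          // NULL
```

Returning null when nothing matches lets the caller tell "no brackets" apart from an empty pair such as "[]", which yields an empty string.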

 
LVL 108

Expert Comment

by:Ray Paseur
On my server, the substring function is faster than the regular expression.  That said, 10,000 iterations of either one complete in a few milliseconds.  The moving parts of this script start around line 150.
See http://www.laprbass.com/RAY_temp_elvin66_2.php
REGEX STOPPED 24.167
SUBSTR STOPPED 17.054
<?php // RAY_temp_elvin66_2.php
error_reporting(E_ALL);


// DEMONSTRATE A SCRIPT TIMER FOR ALL OR PART OF A SCRIPT PHP 5+
// MAN PAGE http://php.net/manual/en/function.microtime.php


class StopWatch
{
    protected $a; // START TIME
    protected $s; // STATUS - IF RUNNING
    protected $z; // STOP TIME

    public function __construct()
    {
        $this->a = array();
        $this->s = array();
        $this->z = array();
    }

    // A METHOD TO REMOVE A TIMER
    public function reset($name='TIMER')
    {
        // RESET ALL TIMERS
        if ($name == 'TIMER')
        {
            $this->__construct();
        }
        else
        {
            unset($this->a[$name]);
            unset($this->s[$name]);
            unset($this->z[$name]);
        }
    }

    // A METHOD TO CAPTURE THE START TIME
    public function start($name='TIMER')
    {
        $this->a[$name] = microtime(TRUE);
        $this->z[$name] = $this->a[$name];
        $this->s[$name] = 'RUNNING';
    }

    // A METHOD TO CAPTURE THE END TIME
    public function stop($name='TIMER')
    {
        $ret = NULL;

        // STOP ALL THE TIMERS
        if ($name == 'TIMER')
        {
            foreach ($this->a as $name => $start_time)
            {
                // IF THIS TIMER IS STILL RUNNING, STOP IT
                if ($this->s[$name])
                {
                    $this->s[$name] = FALSE;
                    $this->z[$name] = microtime(TRUE);
                }
            }
        }

        // STOP ONLY ONE OF THE TIMERS
        else
        {
            if ($this->s[$name])
            {
                $this->s[$name] = FALSE;
                $this->z[$name] = microtime(TRUE);
            }
            else
            {
                $ret .= "ERROR: CALL TO STOP() METHOD FOR '$name' IS NOT RUNNING";
            }
        }

        // RETURN AN ERROR MESSAGE, IF ANY
        return $ret;
    }

    // A METHOD TO READ OUT THE TIMER(S)
    public function readout($name='TIMER', $dec=3, $m=1000, $eol=PHP_EOL)
    {
        $str = NULL;

        // GET READOUTS FOR ALL THE TIMERS
        if ($name == 'TIMER')
        {
            foreach ($this->a as $name => $start_time)
            {
                $str .= $name;

                // IF THIS TIMER IS STILL RUNNING UPDATE THE END TIME
                if ($this->s[$name])
                {
                    $this->z[$name] = microtime(TRUE);
                    $str .= " RUNNING ";
                }
                else
                {
                    $str .= " STOPPED ";
                }

                // RETURN A DISPLAY STRING
                $lapse_time = $this->z[$name] - $start_time;
                $lapse_msec = $lapse_time * $m;
                $lapse_echo = number_format($lapse_msec, $dec);
                $str .= " $lapse_echo";
                $str .= $eol;
            }
            return $str;
        }

        // GET A READOUT FOR ONLY ONE TIMER
        else
        {
            $str .= $name;

            // IF THIS TIMER IS STILL RUNNING, UPDATE THE END TIME
            if ($this->s[$name])
            {
                $this->z[$name] = microtime(TRUE);
                $str .= " RUNNING ";
            }
            else
            {
                $str .= " STOPPED ";
            }


            // RETURN A DISPLAY STRING
            $lapse_time = $this->z[$name] - $this->a[$name];
            $lapse_msec = $lapse_time * $m;
            $lapse_echo = number_format($lapse_msec, $dec);
            $str .= " $lapse_echo";
            $str .= $eol;
            return $str;
        }
    }
}



// INSTANTIATE THE STOPWATCH OBJECT
$sw  = new StopWatch;

// DEFINE THE SUBSTR FUNCTION
function strip_filename($string)
{
    if ($string != '')
    {
        $sub_string = substr($string, strpos($string, '[')+1, strlen($string));
        $result = substr($sub_string, 0, (strlen($sub_string)-1));
        return $result;
    }
}

// DEFINE A REGULAR EXPRESSION TO ISOLATE DATA INSIDE BRACKETS
$rgx
= '#'        // REGEX DELIMITER
. '\['       // ESCAPED BRACKET
. '(.*?)'    // GROUP OF ANYTHING
. '\]'       // ESCAPED BRACKET
. '#'        // REGEX DELIMITER
;


// DO A MEANINGFUL AMOUNT OF WORK TO MAKE THE TIMINGS USEFUL
$cnt = 10000;
$sw->start('REGEX');
while ($cnt)
{
    $cnt--;

    // TEST DATA FROM THE POST AT EE
    $str = "This is my string and it could be up to 200 characters long [But it never will be]";

    // USE THE REGULAR EXPRESSION
    preg_match_all($rgx, $str, $mat);
}
$sw->stop('REGEX');


// DO A MEANINGFUL AMOUNT OF WORK TO MAKE THE TIMINGS USEFUL
$cnt = 10000;
$sw->start('SUBSTR');
while ($cnt)
{
    $cnt--;

    // TEST DATA FROM THE POST AT EE
    $str = "This is my string and it could be up to 200 characters long [But it never will be]";

    // USE THE SUBSTRING FUNCTION
    $mat = strip_filename($str);
}
$sw->stop('SUBSTR');


// SHOW THE TIMERS
echo $sw->readout();

Best to all, over-and-out, ~Ray
 
LVL 20

Author Closing Comment

by:Mark Brady
Thanks to all for shedding some light on regex for me. Some really good information in this post so I thank you all. I'm certainly going to point out that substr() seems to be faster, so perhaps leaving my function as it is, which works fine, is the better solution for this one. Oh well, it's off to study regular expressions. Yummy!