How do I adjust this preg match statement (use multiple times on same url)

Experts,

I modified the preg match statement on the previous question to use with a URL.  It works with one preg match, but when I attempt to use two preg match statements on one URL, the first result appears twice. (I realize that there is probably a better way but I am just getting into preg match / regex type coding.)

The result of the second preg match should be:

 <div class="active_content blacksm">
 <br />
 &bull; Warranty: Lifetime<br />
 &bull; Color: Black<br />
 &bull; Length: 0.5m<br />
 &bull; Mfr: Cables To Go<br />
 &bull; Weight: 0.590lbs<br />
 </div>


Thanks for your help!!!

<?php
$url = file_get_contents('http://www.cproducts.com/product.asp?cat_id=2030&sku=40294');
preg_match('%<div style="float:left; clear:both;">(.*?)</div>%s',$url,$extA);  // this one works
preg_match('%<div class="active_content blacksm">(.*?)</div>%s',$url,$extB);   // this one repeats the same as the first, however it is different

echo $extA[1];
echo "<BR>";
echo $extB[1];
?>


rlb1Asked:

Ray PaseurCommented:
Hey, rlb1.  We can't see what was in the previous question.  Give us some more information please.  Show us the input you have and the desired output you want.  REGEX is often a good tool, but there may be other creative ideas, too.

If you're like me, you might find that it's best to work with a REGEX cheat sheet at your left hand.  I like this one:
http://www.addedbytes.com/cheat-sheets/regular-expressions-cheat-sheet/

rlb1Author Commented:
Ray,
Thanks!  Here is the previous code:  (I just took the code you provided me with and adjusted it a little.)

<?php // RAY_temp_rlb1.php
error_reporting(E_ALL);

$strtest = '<a title="some title here" href="some url here">';
preg_match('%<a title=[\"]some title here[\"] href=([^`]*?)[\"]>%',$strtest,$extA);
// Intended result: some url here (as written, the capture also includes the opening quote)
echo $extA[1];
 
Thanks for your help!!
Randy
Ray PaseurCommented:
This REGEX string says this:

1. Find a string starting with href=
2. Followed by a quote OR apostrophe OR any other single character, and capture this into group 1
3. Followed by one or more of any character, matched lazily (as few as possible), and capture this into group 2
4. Followed by a quote OR apostrophe OR (escaped) right wicket, and capture this into group 3
5. End REGEX and make it case-insensitive.

Please post back if you have any questions.  Best, ~Ray
<?php // RAY_temp_rlb1.php
error_reporting(E_ALL);
echo "<pre>" . PHP_EOL;

// TEST DATA
$arr = array
( '<a title="some title here" href="some url here">THING</a>'
, '<a title="some title here" href=urli>'
, "<a title='some title here' href='some url '>"
)
;

// A REGULAR EXPRESSION
$rgx = <<<REGEX
#href=("|'|.)(.+?)("|'|\>)#i
REGEX;

// GET THE HREF= STRING - WHETHER BOUNDED BY QUOTES OR NOT
foreach ($arr as $str)
{
    $new = preg_match($rgx, $str, $ext);

    // THESE ARE EQUAL IF THE URL WAS WRAPPED WITH A QUOTE OR APOSTROPHE DELIMITER
    if ($ext[1] == $ext[3])
    {
        $url = $ext[2];
    }
    else
    {
        $url = $ext[1] . $ext[2];
    }
    var_dump($url);
}



rlb1Author Commented:
Ray,
I am a little lost.  I have worked on this for a few hours and I cannot figure this out.  Regex is tough!! and I am still getting my hands around arrays.

I am trying to obtain the data within these tags from a URL
 
$url = file_get_contents('http://www.cproducts.com/product.asp?cat_id=2030&sku=40294');


%<div style="float:left; clear:both;">(.*?)</div>%s         (Description)

%<div class="active_content blacksm"><br />(.*?)</div>%s      (Specs)

I am also trying to get only the "40294a.jpg" out of this line.

<img src="/product-images/40294/50/40294a.jpg" style="border:solid 1px #D6D6D6; border-collapse:separate;" />


If you can give me some assistance coding this, I can better figure this out...  Thank You!!!
Ray PaseurCommented:
Get this book and work through the examples.  It will not make you a pro, but it is very readable and has great examples.  It will give you some foundation in PHP and all of your questions will be easier to frame when you post them here at EE.
http://www.sitepoint.com/books/phpmysql4/

I'll take a look at that URL in a moment...
Ray PaseurCommented:
Prints: 40294a.jpg
<?php // RAY_temp_rlb1.php
error_reporting(E_ALL);
echo "<pre>" . PHP_EOL;

// STATED GOAL: I am also trying to get only the "40294a.jpg" out of this line.

// TEST DATA
$tag = <<<TAG
<img src="/product-images/40294/50/40294a.jpg" style="border:solid 1px #D6D6D6; border-collapse:separate;" />
TAG;

// A REGULAR EXPRESSION
$rgx = <<<REGEX
#src=("|'|.)(.+?)("|'| |\>)#i
REGEX;

// GET THE STRING - WHETHER BOUNDED BY QUOTES OR NOT
$new = preg_match($rgx, $tag, $ext);

// THESE ARE EQUAL IF THE URL WAS WRAPPED WITH A QUOTE OR APOSTROPHE DELIMITER
if ($ext[1] == $ext[3])
{
    $url = $ext[2];
}
else
{
    $url = $ext[1] . $ext[2];
}

// ACTIVATE THIS TO SEE THE ISOLATED URL (FILE PATH)
// var_dump($url);

// GET THE FILE NAME FROM THE FILE PATH
// end() TAKES ITS ARGUMENT BY REFERENCE, SO PUT THE ARRAY IN A VARIABLE FIRST
$parts = explode('/', $url);
$fnm = end($parts);
echo $fnm;
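
As an alternative to splitting the path on slashes, PHP's built-in basename() returns the last component of a path directly. The sketch below is a hedged illustration, not part of the original answer; the second path with a `?v=2` query string is an assumed example, handled by passing the string through parse_url() first.

```php
<?php
// File path taken from the img tag discussed in the thread.
$src = '/product-images/40294/50/40294a.jpg';
$fnm = basename($src);
echo $fnm, PHP_EOL; // 40294a.jpg

// ASSUMPTION: a src that carries a query string (not present in the thread).
// parse_url() isolates the path so the query string does not end up in the name.
$with_query = '/product-images/40294/50/40294a.jpg?v=2';
$fnm_clean  = basename(parse_url($with_query, PHP_URL_PATH));
echo $fnm_clean, PHP_EOL; // 40294a.jpg
```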


Ray PaseurCommented:
When I tried the URL posted above, I got this output:

CREATIVE PRODUCTS

We're sorry, but we were unable to locate the file you requested.

Let's try this instead.  Visit the page you want us to scrape data from.  Use "View source" and copy the HTML.  Post that in the code snippet, and we can work with the posted data.

Regarding this, "Regex is tough!!" -- Yep.  It's a language made up almost entirely of punctuation, and it creates rules that interact in complex ways.  There are entire books about regular expressions.  It easily forms a semester of an engineering curriculum, so don't be surprised if it takes a while to master.  Most software developers never master regular expressions, and many who think they have mastered regex publish expressions that are full of holes and errors.  For example, take the regex I posted above:

#href=("|'|.)(.+?)("|'|\>)#i

It is wrong because the third group (terminator) should also contain an "or" condition for the blank, like this:

#href=("|'|.)(.+?)("|'| |\>)#i

That is because a blank terminates an unquoted URL: a valid URL has to be URL-encoded, and the encoding turns any blanks into plus signs (or %20).
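
The difference between the two patterns above shows up on an unquoted href that is followed by another attribute. The following is a minimal sketch under stated assumptions: the helper name extract_href() and the test tag are hypothetical, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not from the thread): apply one of the href patterns
// and rebuild the URL the same way the earlier snippet does.
function extract_href(string $rgx, string $tag): ?string
{
    if (preg_match($rgx, $tag, $ext) !== 1) {
        return null; // pattern did not match
    }
    // Groups 1 and 3 are equal when the value was wrapped in matching quotes.
    return ($ext[1] === $ext[3]) ? $ext[2] : $ext[1] . $ext[2];
}

$without_blank = '#href=("|\'|.)(.+?)("|\'|\>)#i';
$with_blank    = '#href=("|\'|.)(.+?)("|\'| |\>)#i';

// ASSUMED test data: an unquoted URL followed by a second attribute.
$tag = '<a href=urli title=x>';

var_dump(extract_href($without_blank, $tag)); // string(12) "urli title=x"
var_dump(extract_href($with_blank, $tag));    // string(4) "urli"
```

Note that the blank terminator relies on the URL containing no literal blanks. A quoted value with an unencoded space would be cut short at the first blank.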

I often find that I can use some combination of strpos(), substr(), and explode() to get the strings I want, and I can get those right faster than I can write the regular expressions.  There is no extra credit for using regex.  The reward is working code, gotten as fast and accurately as possible.  Just a thought...
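
As a rough illustration of the strpos() / substr() idea, here is one way it might look. This is only a sketch: the helper name between() and the sample markup are assumptions made for the example, and it returns the text up to the first closing tag, so nested div elements would need more care.

```php
<?php
// Hypothetical helper: return the text between $open and the next $close,
// or null when either marker is missing.
function between(string $haystack, string $open, string $close): ?string
{
    $start = strpos($haystack, $open);
    if ($start === false) {
        return null;
    }
    $start += strlen($open);            // skip past the opening marker
    $end = strpos($haystack, $close, $start);
    if ($end === false) {
        return null;
    }
    return substr($haystack, $start, $end - $start);
}

// ASSUMED sample markup, shortened from the specs block in the question.
$html  = '<div class="active_content blacksm"><br />&bull; Color: Black<br /></div>';
$specs = between($html, '<div class="active_content blacksm">', '</div>');
echo $specs; // <br />&bull; Color: Black<br />
```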
rlb1Author Commented:
Ray,  
What am I missing here on this array?   I have attempted several things here and cannot get it to work.

Thanks for your help!

Randy


<?php
$url = file_get_contents('http://www.cablestogo.com/product.asp?cat_id=2030&sku=40294');
/*
//preg_match('%<div style="float:left; clear:both;">(.*?)</div>%s','%<div style="float:left; clear:both;">(.*?)</div>%s',$url,$extA);
preg_match('%<div style="float:left; clear:both;">(.*?)</div>%s',$url,$extA);

//echo $extA[0];
echo $extA[1];
echo $extA[2];
*/

preg_match("/(%<div style="float:left; clear:both;">(.*?)</div>%s|%<div style="float:left; clear:both;">(.*?)</div>%s)/i", $line, $url);
echo $line[1];
echo $line[2];


?>


rlb1Author Commented:
Ray, Got it to work with hours of manipulation!!  Thanks for your help!!
<?php
$data=file_get_contents('http://www.c.com/product.asp?cat_id=2030&sku=40294');
preg_match('%<div style="float:left; clear:both;">
(.*?)</div>%s',$data,$matches);

$a=$matches[1];
echo $a."<br />";


preg_match('%<div class="active_content blacksm">
	             <br />
(.*?)</div>%s',$data,$matches2);

$b=$matches2[1];
echo $b."<br />";

?>
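
The working version above depends on the exact line breaks and indentation in the page source, so it may stop matching if the site's whitespace changes. A possibly more tolerant variant is sketched below, using \s* in place of the literal whitespace and checking the return value of preg_match() before reading the capture. The helper name first_group() and the sample markup are assumptions for illustration, since the live page could not be fetched.

```php
<?php
// ASSUMED sample markup standing in for the downloaded page.
$data = <<<HTML
<div style="float:left; clear:both;">
Short description text</div>
<div class="active_content blacksm">
	             <br />
 &bull; Warranty: Lifetime<br />
 &bull; Color: Black<br />
</div>
HTML;

// Hypothetical helper: first capture group, trimmed, or null when nothing matched.
function first_group(string $rgx, string $subject): ?string
{
    return preg_match($rgx, $subject, $m) === 1 ? trim($m[1]) : null;
}

// \s* absorbs any run of spaces, tabs, and newlines around the content.
$description = first_group('%<div style="float:left; clear:both;">\s*(.*?)\s*</div>%s', $data);
$specs       = first_group('%<div class="active_content blacksm">\s*(?:<br\s*/?>)?\s*(.*?)\s*</div>%s', $data);

echo ($description ?? 'description not found'), "<br />", PHP_EOL;
echo ($specs ?? 'specs not found'), "<br />", PHP_EOL;
```

Returning null on a failed match makes it easier to tell "the pattern did not match" apart from "the pattern matched the wrong block", which is useful when two preg_match() calls appear to produce the same output.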


rlb1Author Commented:
Thank you Ray!!
Ray PaseurCommented:
Congratulations!