abenbow

can preg_match help?

ok

this is part of the source of a page (view source)

      <span class="categorieBlockTitel">Categorieën</span>
                  <a href="/categorie/337487">Aangemelde sportwagens</a><br>
                  <a href="/categorie/337679">Daniel den Hoed kliniek</a><br>
                  <a href="/categorie/338998">Deelnemers over hun auto.</a><br>

                  <a href="/categorie/340186">Foto's vorige edities</a><br>
                  <a href="/categorie/337570">MM2006 nieuws</a><br>
                  <a href="/categorie/340270">Passagiers over de MM</a><br>
                  <a href="/categorie/337484">SPONSOR OF DONEER!</a><br>
      </div>      


can preg_match get me

                  <a href="/categorie/337487">Aangemelde sportwagens</a><br>
                  <a href="/categorie/337679">Daniel den Hoed kliniek</a><br>
                  <a href="/categorie/338998">Deelnemers over hun auto.</a><br>

                  <a href="/categorie/340186">Foto's vorige edities</a><br>
                  <a href="/categorie/337570">MM2006 nieuws</a><br>
                  <a href="/categorie/340270">Passagiers over de MM</a><br>
                  <a href="/categorie/337484">SPONSOR OF DONEER!</a><br>

and display it on a page?

TIA


Avatar of BogoJoker
BogoJoker

<?php
$str = '<span class="categorieBlockTitel">Categorieën</span>
               <a href="/categorie/337487">Aangemelde sportwagens</a><br>
               <a href="/categorie/337679">Daniel den Hoed kliniek</a><br>
               <a href="/categorie/338998">Deelnemers over hun auto.</a><br>

               <a href="/categorie/340186">Foto\'s vorige edities</a><br>
               <a href="/categorie/337570">MM2006 nieuws</a><br>
               <a href="/categorie/340270">Passagiers over de MM</a><br>
               <a href="/categorie/337484">SPONSOR OF DONEER!</a><br>
     </div>';

preg_match('/Categorieën<\/span>(.*?)<\/div>/', $str, $matches);
$answer = htmlentities($matches[1]);
print $answer;
?>

A few questions:
1) Will that span always have the same name: Categorieën?  If not I can edit that part out a little.
2) I displayed the actual html text (not the links) at the end using htmlentities.  If you actually wanted those links to appear, just remove the htmlentities(), but leave $answer = $matches[1];

Not tested but it looks good to me.  Just get the information into $str then run that preg_match().  Enjoy,
Joe P
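
To illustrate point 2 above, here is a minimal sketch of the difference between printing the captured fragment escaped and printing it raw. The one-line $fragment is a shortened stand-in for $matches[1], and the explicit ENT_QUOTES and 'UTF-8' arguments are my assumption, not part of the original snippet.

```php
<?php
// Shortened stand-in for $matches[1] (assumption: one link from the sample markup).
$fragment = '<a href="/categorie/337570">MM2006 nieuws</a><br>';

// Escaped: the browser shows the markup itself as text.
$escaped = htmlentities($fragment, ENT_QUOTES, 'UTF-8');
print $escaped . "\n";

// Raw: the browser renders a clickable link.
print $fragment . "\n";
```

Use the escaped form to inspect what was captured, and the raw form once the links should actually appear on the page.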
Avatar of abenbow

ASKER

ah

ok, my bad

this bit

              <a href="/categorie/337487">Aangemelde sportwagens</a><br>
               <a href="/categorie/337679">Daniel den Hoed kliniek</a><br>
               <a href="/categorie/338998">Deelnemers over hun auto.</a><br>

               <a href="/categorie/340186">Foto's vorige edities</a><br>
               <a href="/categorie/337570">MM2006 nieuws</a><br>
               <a href="/categorie/340270">Passagiers over de MM</a><br>
               <a href="/categorie/337484">SPONSOR OF DONEER!</a><br>

is actually dynamic so I don't necessarily know what will be there. Can I still grab it?

Avatar of abenbow

ASKER

doh!

ok. having a thick moment.

do i just replace
$str = '<span class="categorieBlockTitel">Categorieën</span>
               <a href="/categorie/337487">Aangemelde sportwagens</a><br>
               <a href="/categorie/337679">Daniel den Hoed kliniek</a><br>
               <a href="/categorie/338998">Deelnemers over hun auto.</a><br>

               <a href="/categorie/340186">Foto\'s vorige edities</a><br>
               <a href="/categorie/337570">MM2006 nieuws</a><br>
               <a href="/categorie/340270">Passagiers over de MM</a><br>
               <a href="/categorie/337484">SPONSOR OF DONEER!</a><br>
     </div>';

with

$str = 'http://www.whatevermyurlis.com';

??

TIA
sure, but it would be this:
$str = file_get_contents('http://www.whatevermyurlis.com');
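
file_get_contents() returns false when the fetch fails, so it may be worth checking the result before running the pattern on it. A minimal sketch, assuming PHP 7.1 or later and allow_url_fopen enabled for remote URLs; fetch_source() is a hypothetical helper name, not a built-in:

```php
<?php
// Hypothetical helper: returns the source as a string, or null if it could not be read.
function fetch_source(string $location): ?string
{
    // @ suppresses the warning; failure is reported through the return value instead.
    $contents = @file_get_contents($location);
    return $contents === false ? null : $contents;
}

// Usage (the URL is the placeholder from this thread):
// $str = fetch_source('http://www.whatevermyurlis.com');
// if ($str === null) {
//     die('Could not fetch the page.');
// }
```

With the check in place, a failed download gives a clear message rather than an empty result further down.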
Avatar of abenbow

ASKER

ok

<?php
$str = file_get_contents('http://maartenmemorial.web-log.nl');

preg_match('/Categorieën<\/span>(.*?)<\/div>/', $str, $matches);
$answer = htmlentities($matches[1]);
print $answer;
?>

gives me a blank page.

ASKER CERTIFIED SOLUTION
Avatar of BogoJoker
BogoJoker

This solution is only available to members.

The period will "match any character except newline (by default)". You can alter that behavior with the "s" modifier:

"If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This modifier is equivalent to Perl's /s modifier. A negative class such as [^a] always matches a newline character, independent of the setting of this modifier."

i.e., preg_match('/pattern/s' ...);

Or:

preg_match('/Categorieën<\/span>(.*?)<\/div>/s', $str, $matches);
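
A small sketch of the difference, using a shortened multi-line sample in place of the real page (the sample string and variable names are mine, for illustration only). It also checks the return value of preg_match() before touching the captured group, which avoids printing nothing when there is no match.

```php
<?php
// Shortened sample: the text between </span> and </div> spans several lines.
$str = "Categorieën</span>\n  <a href=\"/categorie/337570\">MM2006 nieuws</a><br>\n</div>";

// Without "s": the dot stops at the first newline, so the pattern cannot reach </div>.
$withoutS = preg_match('/Categorieën<\/span>(.*?)<\/div>/', $str, $m1);

// With "s": the dot also matches newlines, so the capture spans the lines.
$withS = preg_match('/Categorieën<\/span>(.*?)<\/div>/s', $str, $m2);

if ($withS === 1) {
    print $m2[1];
} else {
    print 'No match found.';
}
```

Here $withoutS comes back as 0 and $withS as 1, which matches the blank page seen with the original pattern.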
Avatar of abenbow

ASKER

PERFECT!!

Thanks

:)
Also note that the links won't work, because they are relative to the server you got them from!
Here is a fix:
<?php
$str = file_get_contents('http://maartenmemorial.web-log.nl');
$str = str_replace("\n", "", $str);
$str = str_replace("\r", "", $str);
preg_match('/Categorieën<\/span>(.*?)<\/div>/', $str, $matches);
$answer = $matches[1];
$answer = preg_replace('/href="(.*?)"/', 'href="http://maartenmemorial.web-log.nl$1"', $answer);
print $answer;
?>
You should probably ask the site for permission to do this as well.
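
As a variation on the fix above, here is a sketch that prefixes only root-relative links (those starting with a single "/") and leaves absolute ones alone. absolutize_links() is a hypothetical helper, and using preg_replace_callback() instead of preg_replace() is my choice so the base URL is never interpreted as a replacement pattern. It assumes PHP 7.0 or later and double-quoted href attributes.

```php
<?php
// Hypothetical helper: prepend $base to every href that starts with a single "/".
function absolutize_links(string $html, string $base): string
{
    $result = preg_replace_callback(
        '/href="(\/(?!\/)[^"]*)"/',
        function (array $m) use ($base): string {
            // $m[1] is the root-relative path, e.g. /categorie/337487
            return 'href="' . rtrim($base, '/') . $m[1] . '"';
        },
        $html
    );

    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $html;
}

$answer = '<a href="/categorie/337487">Aangemelde sportwagens</a><br>';
print absolutize_links($answer, 'http://maartenmemorial.web-log.nl');
```

Links that already carry a scheme, such as href="http://...", do not match the pattern and pass through unchanged.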