Match all links that contain specified text?

I'm trying to crawl a forum, but I don't know how to match all the links that contain the text "showthread.php". Can anyone tell me how to do that with preg_match, please? Thank you very much.
phpdotnet Asked:
 
MasonWolf Commented:
In the future, please tell us what you want when you first ask the question. Always remember to include what you have and what you want. If you had been more specific, you could have had your answer last night.

Here's a function that can get what you're looking for:

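function getURLvars($html)
{
       $final_vars = array(); // always return an array, even when nothing matches
       // Match every <a> tag whose attributes contain showthread.php (dot escaped)
       preg_match_all('@<a [^>]+showthread\.php[^>]*>@', $html, $matches);
       foreach($matches[0] AS $match)
      {
            // Grab the query string, from "?" up to the closing quote; skip links without one
            if (!preg_match('@\?[^\"]+\"@',$match,$vars)) continue;
            $php_vars = array(); // reset for each link
            $vars = explode('&',$vars[0]);
            foreach($vars AS $var)
            {
                  // Strip the leftover delimiters, then split into name and value
                  $var = explode('=',str_replace(array('?','"','&'),array('','',''),$var));
                  $php_vars[$var[0]] = isset($var[1]) ? $var[1] : '';
            }
            $final_vars[] = $php_vars;
      }
        return $final_vars;
}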

Now, depending on how the initial html looks, you may need to first use:
"$html = str_replace('&amp;','&',$html);"
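
As a rough sketch of another way to handle the `&amp;` issue: PHP's built-in html_entity_decode() and parse_url() can decode a raw href and separate the file name, the query string, and any `#fragment` (which otherwise ends up glued to the last value, as in `p=4640078#post4640078`). The helper name splitHref() is hypothetical and not part of the answer above, and the sample hrefs are taken from the output posted later in this thread.

```php
<?php
// Split a raw href from the page source into file name, query string, and fragment.
// splitHref() is a hypothetical helper name used only for this illustration.
function splitHref($href)
{
    // Decode HTML entities such as &amp; back to plain characters.
    $url = html_entity_decode($href);

    // parse_url() also accepts relative URLs; it returns false on badly malformed input.
    $parts = parse_url($url);
    if ($parts === false) {
        return null;
    }

    return array(
        'file'     => isset($parts['path']) ? basename($parts['path']) : '',
        'query'    => isset($parts['query']) ? $parts['query'] : '',
        'fragment' => isset($parts['fragment']) ? $parts['fragment'] : '',
    );
}

print_r(splitHref('showthread.php?goto=newpost&amp;t=624761'));
print_r(splitHref('showthread.php?p=4640078#post4640078'));
```

The 'query' element can then be split into name/value pairs without the fragment getting in the way.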
 
b0lsc0tt (IT Manager) Commented:
Can you show us some of the HTML or text you are "crawling"?

bol
 
MasonWolf Commented:
The code below should get the entire "a" tag. Is that what you need?

preg_match_all('@<a [^>]+showthread\.php[^>]*>@', $html, $matches);
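
If only the URL is wanted rather than the whole tag, a capture group around the href value is one option. The sketch below is a minimal example under some assumptions: the sample $html string is made up for illustration, and the pattern only handles double-quoted href attributes.

```php
<?php
// Hypothetical sample input; the real page markup may differ.
$html = '<a href="showthread.php?goto=newpost&amp;t=624761" title="x">New</a> '
      . '<a href="index.php">Home</a> '
      . '<a href="showthread.php?p=4640078#post4640078">Last</a>';

// Capture only the href value of links whose URL contains showthread.php.
// The dot is escaped so that it matches a literal "." only.
preg_match_all('@<a\s[^>]*href="([^"]*showthread\.php[^"]*)"@i', $html, $matches);

// $matches[0] holds the full matches; $matches[1] holds the captured href values.
print_r($matches[1]);
```

Here $matches[1] contains one entry per matching link, and the index.php link is skipped.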
 
phpdotnet (Author) Commented:
Thanks, but when I tried it with http://www.webhostingtalk.com/forumdisplay.php?f=68, the result array is:

Array
(
    [0] => Array
        (
            [0] => <a href="showthread.php?goto=newpost&t=624761" title="Go to first unread post in thread 'Record Breaking Downtime by Newista (Some Logic Inc)'">
            [1] => <a href="showthread.php?p=4640078#post4640078">
            [2] => <a href="showthread.php?goto=newpost&t=623727" title="Go to first unread post in thread 'Understanding and Verifying Uptime Guarantees'">
            [3] => <a href="showthread.php?p=4637807#post4637807">
...

I think it would be difficult, but is there any way for me to echo the file name in each match (e.g. in this array it is showthread.php) and to get the variables after the file name as an array where the keys are the request variables (p, goto, t, ...) and the values are the requested strings? Something like this:

showthread.php?goto=newpost&t=624761

Array
(
      [0] => Array (
                  [0] => Array (
                        [0] => "showthread.php"
                        [1] => Array (
                              [goto] => "newpost"
                              [t] => "624761"
                        )
            )
      )
)
 
Bernard S. (CTO) Commented:
If you take the rightmost part, between the ? and the first ", and put it in $temp_query,
then you can get the array of parameters with

$temp_param1=array(); //reset it to be safe
$temp_param1=explode($temp_query, '&');
// we now need to explode each of these elements
$temp_param2=array(); // MUST be reset
foreach ($temp_param1 as $pair) {
  $temp_param2[]=explode($temp_param1[$pair], '=');
};
//the array $temp_param2 now holds all the required elements and values

Note: there is probably a way to combine the two explode calls into a single one, but I cannot experiment just now.
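
One built-in that may fit here is parse_str(), which splits a query string on "&" and "=" in a single call. A minimal sketch, assuming $temp_query already holds only the part after the "?":

```php
<?php
// Assumed input: the query string without the leading "?".
$temp_query = 'goto=newpost&t=624761';

// parse_str() splits on "&" and "=" and URL-decodes names and values.
// The second argument receives the result as an associative array.
parse_str($temp_query, $params);

print_r($params);
```

$params is then keyed by parameter name, with 'goto' mapped to 'newpost' and 't' mapped to '624761'.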
 
phpdotnet (Author) Commented:
Thanks, but when I tried your code, the output is:

Warning: explode() [function.explode]: Empty delimiter. in C:\xampp\htdocs\1.php on line 14

where line 14 is:

$temp_param2[]=explode($temp_param1[$pair],'=');

and $temp_param1 is:

Array
(
    [0] => &
)

So I changed the code to:

$a = preg_replace('/showthread\.php\?(.*)/','$1','showthread.php?goto=newpost&t=624761');
$b = explode('&',$a);
foreach ($b as $c) {
    $d = explode('=',$c);
    print_r($d);
}

And the result is:

Array
(
    [0] => goto
    [1] => newpost
)
Array
(
    [0] => t
    [1] => 624761
)

But is there any way for me to turn this into a single array like this:

Array
(
    [goto] => "newpost"
    [t] => "624761"
)
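
One way to get that shape, sketched as a small change to the loop above, is to use the first element of each pair as the array key. The variable name $result is introduced here only for illustration.

```php
<?php
// Same starting point as the code above (with the dot in the pattern escaped).
$a = preg_replace('/showthread\.php\?(.*)/', '$1', 'showthread.php?goto=newpost&t=624761');

$result = array();
foreach (explode('&', $a) as $c) {
    // A limit of 2 keeps any "=" inside the value intact.
    $d = explode('=', $c, 2);
    // Use the name as the key; a parameter without "=" gets an empty string.
    $result[$d[0]] = isset($d[1]) ? $d[1] : '';
}

print_r($result);
```

The loop fills $result with one entry per parameter, keyed by name, instead of printing a separate array for each pair.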
 
Bernard S. (CTO) Commented:
It seems that MasonWolf's code is exactly what you were looking for.

MasonWolf,
Just from looking at your code, I think I will get a lot of use out of it.
 
MasonWolf Commented:
fibo,

Always appreciate a compliment! Hope it helps you down the road.
 
phpdotnet (Author) Commented:
I'm sorry for being so careless. MasonWolf's answer is great. Thank you very much.