Solved

How do you extract all the URLs in a paragraph of text and then check whether they are valid?

Posted on 2011-03-09
Last Modified: 2012-08-14
Greetings, and thanks in advance! I have a database field titled DetailedDescription. This text field contains several links to products and other categories on the website, most internal but a few external. A few of them have become broken over time, and instead of clicking on every link to find the mistakes, or waiting for Google to find them, I would like to build an action page that shows all the broken links in that field. Hope that makes sense. I have it working with the preg_match function, but it only finds the first occurrence. I am guessing I need to use preg_match_all, but I can't figure out how to make it work from there. If there is a better way, please let me know. Thanks again.
<?php

$DetailedDescription = $row_rsCategoryLinks['DetailedDescription'];

$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";


$text = $DetailedDescription;

$text = str_replace('target="_blank"','', $text);
$text = str_replace("target='_blank'","", $text);
$text = str_replace(")","", $text);
$text = str_replace(";","", $text); 
$text = str_replace("'s","s", $text); 

if (preg_match($reg_exUrl, $text, $url)) {

    $url_checker = $url[0];

    $findme = "'"; // Checking to see if single or double quotes are used

    $pos = strpos($url_checker, $findme);

    if ($pos === false) {
        $url_checker = substr($url_checker, 0, strpos($url_checker, "\""));
    } else {
        $url_checker = substr($url_checker, 0, strpos($url_checker, "'"));
    }

    if ($url_checker != '') {

        $handle = @fopen($url_checker, 'r');

        if ($handle !== false) {
            // Do nothing because no bad URL
        } else {
            // Show bad URL and link to edit page
            echo "<a href=\"javascript:poptastic('categoryview.php?CID=" . $CID . "');\">" . $CID . " CID</a>   -    ";
            echo $url_checker;
            echo "<hr>";
        }

    } else {
        // No URLs returned
    }

}
?>

Question by:Fullsource
LVL 34

Accepted Solution

by:
Beverley Portlock earned 2000 total points
ID: 35083445
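Append the 's' modifier to your regex pattern, like so (strictly speaking, 's' only makes '.' match newlines, and this pattern contains no unescaped '.', so the switch to preg_match_all is the change that matters):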

"/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/s"

and then use preg_match_all instead of preg_match. The returned array is structured slightly differently, so you'll need to adjust your code to accommodate that. Add the line below to see the new structure and whether it returns what you want:

if(preg_match($reg_exUrl, $text, $url)) {
    echo "<pre>";    print_r($url);    echo "<pre>";    //  Add this line


Expert Comment

by:Beverley Portlock
ID: 35083458
Silly me - that extra should have a closing PRE tag like so


if(preg_match($reg_exUrl, $text, $url)) {
    echo "<pre>";    print_r($url);    echo "</pre>";    //  Add this line


Expert Comment

by:Beverley Portlock
ID: 35083470
I'm going for a coffee... this is not my day.....

:-(

if(preg_match_all($reg_exUrl, $text, $url)) {
    echo "<pre>";    print_r($url);    echo "</pre>";    //  Add this line
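
For reference, here is a minimal sketch of walking the result. With preg_match_all's default PREG_PATTERN_ORDER flag, $url[0] holds every full match and $url[1] holds the first capture group (the scheme) for each match. The sample text and the strcspn-based trimming are assumptions for illustration, not part of the original code. Because \S* in the pattern runs past the closing quote of an href, each match is cut at the first quote character, if there is one.

```php
<?php
// Same pattern as in the question.
$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";

// Hypothetical sample text standing in for the DetailedDescription field.
$text = 'See <a href="http://www.example.com/widgets">widgets</a> and '
      . "<a href='https://example.org/'>home</a>.";

$urls = [];
if (preg_match_all($reg_exUrl, $text, $url)) {
    // $url[0] is the list of full matches, one entry per URL found.
    foreach ($url[0] as $match) {
        // Length of the leading run that contains no quote character;
        // equals the full length when the match has no quote at all.
        $cut = strcspn($match, "\"'");
        $urls[] = substr($match, 0, $cut);
    }
}

print_r($urls);
```

Unlike the substr/strpos combination in the question, strcspn leaves a match untouched when it contains no quote, so plain-text URLs are not reduced to an empty string.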
Author Comment

by:Fullsource
ID: 35083706
Awesome, thanks! I will play around with the results right now and see if I can extract the info I need to check the URLs. I feel points coming your way, and I hope that coffee does you right.
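
For the checking step mentioned above, one possible shape is sketched below, assuming PHP 7 or later. The function names find_broken_urls and url_responds are hypothetical, and url_responds swaps the question's fopen test for get_headers, which assumes allow_url_fopen is enabled and is only meaningful for http and https URLs. The checker is passed in as a callback so the demo at the bottom can use a stub and run without network access.

```php
<?php
/**
 * Return the URLs that the $isReachable callback reports as broken.
 * $isReachable is a hypothetical callback taking a URL and returning bool.
 */
function find_broken_urls(array $urls, callable $isReachable): array
{
    $broken = [];
    // array_unique avoids requesting the same URL twice.
    foreach (array_unique($urls) as $candidate) {
        if (!$isReachable($candidate)) {
            $broken[] = $candidate;
        }
    }
    return $broken;
}

/**
 * One possible checker: fetch the response headers and treat a
 * 2xx or 3xx status on the first status line as reachable.
 */
function url_responds(string $candidate): bool
{
    $headers = @get_headers($candidate);
    if ($headers === false) {
        return false; // e.g. DNS failure or refused connection
    }
    // The first header line looks like "HTTP/1.1 200 OK".
    return preg_match('#^HTTP/\S+\s+[23]\d\d\b#', $headers[0]) === 1;
}

// Demo with a stub checker so the example runs without network access.
$stub = function (string $candidate): bool {
    return $candidate !== 'http://www.example.com/missing';
};
$broken = find_broken_urls(
    ['http://www.example.com/', 'http://www.example.com/missing'],
    $stub
);

print_r($broken);
```

On the real action page the call would be find_broken_urls($urls, 'url_responds'), where $urls is the list extracted with preg_match_all.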

Expert Comment

by:Beverley Portlock
ID: 35083915
"...hope that coffee does you right..."

That coffee's history! It's the next one that counts...
