Solved

How do you extract all the URLs in a paragraph of text and then check whether they are valid?

Posted on 2011-03-09
Last Modified: 2012-08-14
Greetings and thanks in advance! I have a field in a database titled DetailedDescription. In this text field are several links to products and other categories on the website, most internal, but a few external. A few of them have become broken over time, and instead of clicking on every link to find the mistakes, or waiting for Google to find them, I would like to build an action page that shows all the broken links in that field. Hope that makes sense. I have it working with the preg_match function, but it only finds the first occurrence. I am guessing I need to use preg_match_all, but I can't figure out how to make it work from there. If there is a better way, please let me know. Thanks again.
<?php

$DetailedDescription = $row_rsCategoryLinks['DetailedDescription'];

$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";


$text = $DetailedDescription;

$text = str_replace('target="_blank"','', $text);
$text = str_replace("target='_blank'","", $text);
$text = str_replace(")","", $text);
$text = str_replace(";","", $text); 
$text = str_replace("'s","s", $text); 

if(preg_match($reg_exUrl, $text, $url)) {

$url_checker = $url[0] ;

$findme = "'"; //CHECKING TO SEE IF SINGLE OR DOUBLE QUOTES ARE USED

$pos = strpos($url_checker, $findme);

if ($pos === false) {
   
  $url_checker = substr($url_checker, 0, strpos($url_checker, "\""));
  
} else {

    $url_checker = substr($url_checker, 0, strpos($url_checker, "'"));   
}   

if($url_checker!=''){ 

	 $handle = @fopen($url_checker,'r');
   
      if($handle !== false){
     	//DO NOTHING BECAUSE NO BAD URL
      }
  
		  else{
			//SHOW BAD URL AND LINK TO EDIT PAGE
			  echo "<a href=\"javascript:poptastic('categoryview.php?CID=" . $CID . "');\">" . $CID . " CID</a>   -    " ;		
			  echo $url_checker; 
			  echo "<hr>";
		  }

	}
	else {
		   // NO URLS RETURNED
	}

}
?>

Question by:Fullsource

Accepted Solution

by:
Beverley Portlock earned 500 total points
ID: 35083445
Make your regex pattern work on multi-line data by appending an 's' like so

"/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/s"

and then use preg_match_all instead of preg_match. The array structure returned will be slightly different, so you'll need to make changes to accommodate that. Add the line below to see the new structure and whether it returns what you want:

if(preg_match($reg_exUrl, $text, $url)) {
    echo "<pre>";    print_r($url);    echo "<pre>";    //  Add this line


Expert Comment

by:Beverley Portlock
ID: 35083458
Silly me - that extra line should have a closing PRE tag, like so:


if(preg_match($reg_exUrl, $text, $url)) {
    echo "<pre>";    print_r($url);    echo "</pre>";    //  Add this line


Expert Comment

by:Beverley Portlock
ID: 35083470
I'm going for a coffee... this is not my day.....

:-(

if(preg_match_all($reg_exUrl, $text, $url)) {
    echo "<pre>";    print_r($url);    echo "</pre>";    //  Add this line
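
For anyone unsure what the new array looks like, here is a minimal sketch, assuming HTML-style links in the text. The sample $text and the $urls variable are made up for illustration; only $reg_exUrl comes from the question. With the default flags, preg_match_all puts every full match in index 0 and one sub-array per capture group after that. Because \S* runs on past the closing quote of an href, the sketch trims each match at the first quote or angle bracket instead of stripping characters from the whole text beforehand. The 's' modifier is left off here because it only changes how an unescaped '.' matches, and this pattern does not contain one; adding it is harmless.

```php
<?php
// Hypothetical sample text standing in for the DetailedDescription field.
$text = 'See <a href="http://www.example.com/products/1">one</a> and '
      . "<a href='https://example.org/cat/2'>two</a>.";

// Pattern taken unchanged from the question.
$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";

$urls = array();
if (preg_match_all($reg_exUrl, $text, $matches)) {
    // Default order (PREG_PATTERN_ORDER):
    //   $matches[0] = every full match
    //   $matches[1] = the scheme group for each match
    //   $matches[2] = the path group for each match
    foreach ($matches[0] as $candidate) {
        // Cut the match at the first quote or angle bracket, whichever comes first.
        $urls[] = substr($candidate, 0, strcspn($candidate, "\"'<>"));
    }
}

print_r($urls);
```

Each entry in $urls can then go through whatever check you already have, in place of the single $url[0] from preg_match.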

Author Comment

by:Fullsource
ID: 35083706
Awesome, thanks! I will play around with the results right now and see if I can extract the info I need to check the URLs. I feel points coming your way, and I hope that coffee does you right.
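
One possible way to do the checking step, sketched under a few assumptions: the function names final_status, url_is_reachable and find_broken_urls are invented for this example, a status below 400 is treated as "working", and only http/https links are covered. It uses get_headers rather than the fopen call from the question; as far as I know it depends on allow_url_fopen in the same way. get_headers follows redirects and returns the header lines of every response in the chain, so the sketch looks at the last status line rather than the first.

```php
<?php
// Hypothetical helper: pull the last HTTP status code out of a header list.
// Returns null when no status line is present.
function final_status(array $headers)
{
    $status = null;
    foreach ($headers as $line) {
        // Status lines look like "HTTP/1.1 200 OK".
        if (preg_match('/^HTTP\/\S+\s+(\d{3})/', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Hypothetical helper: true when the URL answers with a status below 400.
function url_is_reachable($url)
{
    // get_headers() returns false when the request fails outright.
    $headers = @get_headers($url);
    if ($headers === false) {
        return false;
    }
    $status = final_status($headers);
    return $status !== null && $status < 400;
}

// Hypothetical helper: keep only the URLs that the checker rejects.
// The checker is a parameter so it can be swapped out or stubbed.
function find_broken_urls(array $urls, $checker = 'url_is_reachable')
{
    $broken = array();
    foreach (array_unique($urls) as $url) {
        if (!call_user_func($checker, $url)) {
            $broken[] = $url;
        }
    }
    return $broken;
}
```

Calling find_broken_urls($urls) with the list of extracted URLs returns just the ones to echo next to the CID edit link. Checking each distinct URL once also avoids repeat requests when the same link appears several times in a description.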

Expert Comment

by:Beverley Portlock
ID: 35083915
"...hope that coffee does you right...."

That coffee's history! It's the next one that counts...
