Solved

How do you extract all the URLs in a paragraph of text and then check whether they are valid?

Posted on 2011-03-09
Last Modified: 2012-08-14
Greetings and thanks in advance! I have a field in a database titled DetailedDescription. This text field contains several links to products and other categories on the website, most internal but a few external. A few of them have become broken over time, and instead of clicking on every link to find the mistakes, or waiting for Google to find them, I would like to build an action page that shows all the broken links in that field. Hope that makes sense. I have it working with the preg_match function, but it only finds the first occurrence. I am guessing I need to use preg_match_all, but I can't figure out how to make it work from there. If there is a better way, please let me know. Thanks again.
<?php

$DetailedDescription = $row_rsCategoryLinks['DetailedDescription'];

$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";


$text = $DetailedDescription;

$text = str_replace('target="_blank"','', $text);
$text = str_replace("target='_blank'","", $text);
$text = str_replace(")","", $text);
$text = str_replace(";","", $text); 
$text = str_replace("'s","s", $text); 

if (preg_match($reg_exUrl, $text, $url)) {

    $url_checker = $url[0];

    $findme = "'"; // CHECKING TO SEE IF SINGLE OR DOUBLE QUOTES ARE USED

    $pos = strpos($url_checker, $findme);

    if ($pos === false) {
        $url_checker = substr($url_checker, 0, strpos($url_checker, "\""));
    } else {
        $url_checker = substr($url_checker, 0, strpos($url_checker, "'"));
    }

    if ($url_checker != '') {

        $handle = @fopen($url_checker, 'r');

        if ($handle !== false) {
            // DO NOTHING BECAUSE NO BAD URL
        } else {
            // SHOW BAD URL AND LINK TO EDIT PAGE
            echo "<a href=\"javascript:poptastic('categoryview.php?CID=" . $CID . "');\">" . $CID . " CID</a>   -    ";
            echo $url_checker;
            echo "<hr>";
        }

    } else {
        // NO URLS RETURNED
    }

}
?>

Question by:Fullsource

Accepted Solution

by:
Beverley Portlock earned 500 total points
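You can append the 's' modifier (PCRE_DOTALL, which lets '.' match newlines) to your regex pattern, although every dot in this particular pattern is escaped, so it should not change what matches: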

"/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/s"

and then use preg_match_all instead of preg_match. The array structure returned will be slightly different, so you'll need to make changes to accommodate that. Do this (below) to see the new structure and whether it returns what you want:

if(preg_match($reg_exUrl, $text, $url)) {
    echo "<pre>";    print_r($url);    echo "<pre>";    //  Add this line
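
For reference, here is a minimal sketch of the array shape preg_match_all returns with its default PREG_PATTERN_ORDER flag. The sample text and the example.com / example.org URLs are made up for illustration; only the pattern comes from the question.

```php
<?php
// Pattern copied from the question; the sample text below is hypothetical.
$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
$text = 'See http://example.com/a and https://example.org/b for details.';

// Returns the number of full matches found (0 if none).
$count = preg_match_all($reg_exUrl, $text, $url);

// With the default PREG_PATTERN_ORDER flag:
//   $url[0] holds every full match,
//   $url[1] holds the first capture group (the scheme) of each match,
//   $url[2] holds the second capture group (the path) of each match.
foreach ($url[0] as $i => $fullMatch) {
    echo $fullMatch . ' (scheme: ' . $url[1][$i] . ")\n";
}
```

So where preg_match put a single URL in $url[0], preg_match_all puts an array of URLs there, and the checking code has to loop over it.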

Expert Comment

by:Beverley Portlock
Silly me - that extra line should have a closing PRE tag, like so:


if(preg_match($reg_exUrl, $text, $url)) {
    echo "<pre>";    print_r($url);    echo "</pre>";    //  Add this line

Expert Comment

by:Beverley Portlock
I'm going for a coffee... this is not my day.....

:-(

if(preg_match_all($reg_exUrl, $text, $url)) {
    echo "<pre>";    print_r($url);    echo "</pre>";    //  Add this line
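
Once the structure looks right, one possible way to check every URL is to loop over $url[0]. The sketch below is only an illustration under some assumptions: find_broken_links and url_opens are hypothetical names that do not appear in the original code, the checker is passed in as a callable so the loop can be tried without network access, and url_opens simply mirrors the @fopen test from the question (which needs allow_url_fopen enabled).

```php
<?php
/**
 * Return every URL found in $text that $isReachable reports as broken.
 * Hypothetical helper, not part of the original thread.
 */
function find_broken_links(string $text, callable $isReachable): array
{
    // Pattern copied from the question.
    $reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
    $broken = [];

    if (preg_match_all($reg_exUrl, $text, $url)) {
        // $url[0] holds all full matches; skip exact duplicates.
        foreach (array_unique($url[0]) as $candidate) {
            // \S* runs on to the next whitespace, so cut the match at the
            // first quote or angle bracket (end of the href attribute).
            $candidate = preg_split('/["\'<>]/', $candidate)[0];
            if ($candidate !== '' && !$isReachable($candidate)) {
                $broken[] = $candidate;
            }
        }
    }
    return $broken;
}

// One possible checker, mirroring the @fopen test in the question.
function url_opens(string $url): bool
{
    $handle = @fopen($url, 'r');
    if ($handle === false) {
        return false;
    }
    fclose($handle);
    return true;
}

// Usage: print_r(find_broken_links($DetailedDescription, 'url_opens'));
```

Cutting at the first quote replaces the str_replace and strpos clean-up from the question; whether that is enough depends on how the links are written in your data.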
Author Comment

by:Fullsource
Awesome, thanks! I will play around with the results right now and see if I can extract the info I need to check the URLs. I feel points coming your way, and I hope that coffee does you right.
Expert Comment

by:Beverley Portlock
"...hope that coffee does you right....

That coffee's history! It's the next one that counts...
