Solved

How do you extract all the url's in a paragaph of text and then check to see if they are valid or not?

Posted on 2011-03-09
5
232 Views
Last Modified: 2012-08-14
Greetings and thanks in advance! I have a field in a database title DetailedDescription, in the this text field are several links to products and other categories on the website, most internal, but a few external.... Anyway a few of them have become broken overtime, and instead of clicking on every link to find the mistakes or even wait for goolge to find them i would like to build an action page that shows all the broken links in that field. Hope that makes sense. I have it working with the preg_match function but it is only finding the first occurrence, i am guessing i need to use preg_match_all but i can't seem to figure out how to make it work from there. If there is a better way please let me know. Thanks again.
<?php

$DetailedDescription = $row_rsCategoryLinks['DetailedDescription'];

$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";


$text = $DetailedDescription;

$text = str_replace('target="_blank"','', $text);
$text = str_replace("target='_blank'","", $text);
$text = str_replace(")","", $text);
$text = str_replace(";","", $text); 
$text = str_replace("'s","s", $text); 

if(preg_match($reg_exUrl, $text, $url)) {

$url_checker = $url[0] ;

$findme = "'"; //CHECKING TO SEE IF SINGLE OR DOUBLE QUOTES ARE USED

$pos = strpos($url_checker, $findme);

if ($pos === false) {
   
  $url_checker = substr($url_checker, 0, strpos($url_checker, "\""));
  
} else {

    $url_checker = substr($url_checker, 0, strpos($url_checker, "'"));   
}   

if($url_checker!=''){ 

	 $handle = @fopen($url_checker,'r');
   
      if($handle !== false){
     	//DO NOTHING BECAUSE NO BAD URL
      }
  
		  else{
			//SHOW BAD URL AND LINK TO EDIT PAGE
			  echo "<a href=\"javascript:poptastic('categoryview.php?CID=" . $CID . "');\">" . $CID . " CID</a>   -    " ;		
			  echo $url_checker; 
			  echo "<hr>";
		  }

	}
	else {
		   // NO URLS RETURNED
	}

}
?>

Open in new window

0
Comment
Question by:Fullsource
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
5 Comments
 
LVL 34

Accepted Solution

by:
Beverley Portlock earned 500 total points
ID: 35083445
Make your regex pattern work on multi-line data by appending an 's' like so

"/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/s"

and then use preg_match_all instead of preg_match. The array structure returned will be slightly different so you'll need to make changes to accommodate that. Do this (below) to see the new structure and if it returns what you want

if(preg_match($reg_exUrl, $text, $url)) {
    echo "<pre>";    print_r($url);    echo "<pre>";    //  Add this line

0
 
LVL 34

Expert Comment

by:Beverley Portlock
ID: 35083458
Silly me - that extra should have a closing PRE tag like so


if(preg_match($reg_exUrl, $text, $url)) {
    echo "<pre>";    print_r($url);    echo "</pre>";    //  Add this line

0
 
LVL 34

Expert Comment

by:Beverley Portlock
ID: 35083470
I'm going for a coffee... this is not my day.....

:-(

if(preg_match_all($reg_exUrl, $text, $url)) {
    echo "<pre>";    print_r($url);    echo "</pre>";    //  Add this line
0
 

Author Comment

by:Fullsource
ID: 35083706
Awesome thanks! I will play around with the results right now, and see if i can extract the right info i need to check the urls.... I feel points coming your way and i hope that coffee does you right.
0
 
LVL 34

Expert Comment

by:Beverley Portlock
ID: 35083915
"...hope that coffee does you right....

That coffee's history! It's the next one that counts...

0

Featured Post

Creating Instructional Tutorials  

For Any Use & On Any Platform

Contextual Guidance at the moment of need helps your employees/users adopt software o& achieve even the most complex tasks instantly. Boost knowledge retention, software adoption & employee engagement with easy solution.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Introduction This article is intended for those who are new to PHP error handling (https://www.experts-exchange.com/articles/11769/And-by-the-way-I-am-New-to-PHP.html).  It addresses one of the most common problems that plague beginning PHP develop…
There are times when I have encountered the need to decompress a response from a PHP request. This is how it's done, but you must have control of the request and you can set the Accept-Encoding header.
The viewer will learn the benefit of using external CSS files and the relationship between class and ID selectors. Create your external css file by saving it as style.css then set up your style tags: (CODE) Reference the nav tag and set your prop…
The viewer will learn how to create a basic form using some HTML5 and PHP for later processing. Set up your basic HTML file. Open your form tag and set the method and action attributes.: (CODE) Set up your first few inputs one for the name and …

632 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question