Solved

How do you extract all the URLs in a paragraph of text and then check whether they are valid?

Posted on 2011-03-09
Last Modified: 2012-08-14
Greetings, and thanks in advance! I have a field in a database titled DetailedDescription. This text field contains several links to products and other categories on the website, mostly internal but a few external. Some of them have become broken over time, and instead of clicking on every link to find the mistakes, or waiting for Google to find them, I would like to build an action page that shows all the broken links in that field. Hope that makes sense. I have it working with the preg_match function, but it only finds the first occurrence. I am guessing I need to use preg_match_all, but I can't figure out how to make it work from there. If there is a better way, please let me know. Thanks again.
<?php

// DetailedDescription comes from the current row of the rsCategoryLinks
// recordset; $CID (the category ID) is set elsewhere on the page.
$DetailedDescription = $row_rsCategoryLinks['DetailedDescription'];

$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";

$text = $DetailedDescription;

// Remove attributes and characters that would otherwise end up in a match
$text = str_replace('target="_blank"', '', $text);
$text = str_replace("target='_blank'", '', $text);
$text = str_replace(")", "", $text);
$text = str_replace(";", "", $text);
$text = str_replace("'s", "s", $text);

// NOTE: preg_match() stops at the FIRST URL in $text
if (preg_match($reg_exUrl, $text, $url)) {

    $url_checker = $url[0];

    // \S* also swallows the closing quote of the href and whatever follows
    // it, so cut the match at the first single or double quote. strcspn()
    // returns the full length when there is no quote, so a bare URL is kept
    // (the old strpos()/substr() version returned '' in that case).
    $url_checker = substr($url_checker, 0, strcspn($url_checker, "\"'"));

    if ($url_checker != '') {

        $handle = @fopen($url_checker, 'r');

        if ($handle !== false) {
            // URL opened, so it is not broken; release the handle
            fclose($handle);
        } else {
            // SHOW BAD URL AND LINK TO EDIT PAGE
            echo "<a href=\"javascript:poptastic('categoryview.php?CID=" . $CID . "');\">" . $CID . " CID</a>   -    ";
            echo htmlspecialchars($url_checker);
            echo "<hr>";
        }
    }
}
?>
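A side note on the checking step, sketched under assumptions: the code above treats a link as broken when fopen() fails, which for http URLs (with allow_url_fopen enabled) also happens on 4xx/5xx responses. If you would rather see the actual status code, get_headers() returns the response header lines, with one status line per response when redirects are followed. The helper names final_status() and is_broken_link() are hypothetical, and the sketch assumes http/https links only.

```php
<?php
// Hypothetical helper: given the header lines returned by get_headers(),
// return the status code of the LAST response (i.e. after any redirects).
// Returns 0 if no status line is found.
function final_status(array $headers): int
{
    $status = 0;
    foreach ($headers as $line) {
        // Status lines look like "HTTP/1.1 404 Not Found"
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Hypothetical helper: true if the URL could not be fetched at all, or the
// final response was a 4xx/5xx. Assumes an http(s) URL and allow_url_fopen.
function is_broken_link(string $url): bool
{
    $headers = @get_headers($url);
    if ($headers === false) {
        return true; // DNS failure, refused connection, timeout, ...
    }
    $status = final_status($headers);
    return $status === 0 || $status >= 400;
}
```

Treating status 0 (no status line found) as broken is a judgement call here, not a rule.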

Question by:Fullsource
5 Comments
 

Accepted Solution

by:
Beverley Portlock earned 500 total points
ID: 35083445
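You can append an 's' modifier to your regex pattern, like so

"/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/s"

although strictly the 's' modifier only lets '.' match newlines, and this pattern has no unescaped '.' outside a character class, so it is harmless but not required: the pattern already finds URLs anywhere in multi-line text. The real fix is to use preg_match_all instead of preg_match, because preg_match stops at the first match. The array structure returned will be slightly different, so you'll need to make changes to accommodate that. Do this (below) to see the new structure and check whether it returns what you want: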

if(preg_match($reg_exUrl, $text, $url)) {
    echo "<pre>";    print_r($url);    echo "<pre>";    //  Add this line


Expert Comment

by:Beverley Portlock
ID: 35083458
Silly me: that extra line should have a closing PRE tag, like so


if(preg_match($reg_exUrl, $text, $url)) {
    echo "<pre>";    print_r($url);    echo "</pre>";    //  Add this line


Expert Comment

by:Beverley Portlock
ID: 35083470
I'm going for a coffee... this is not my day.....

:-(

if(preg_match_all($reg_exUrl, $text, $url)) {
    echo "<pre>";    print_r($url);    echo "</pre>";    //  Add this line
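To make the "slightly different" array structure concrete, here is a small self-contained sketch; the sample text is made up, and the variable names $first and $all are just for illustration.

```php
<?php
// Same pattern as in the question.
$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";

// Hypothetical sample text containing two URLs.
$text = "See http://example.com/a and https://example.org/b for details.";

// preg_match() stops at the first match: $first[0] is one string,
// $first[1] and $first[2] are that match's capture groups.
preg_match($reg_exUrl, $text, $first);

// preg_match_all() returns the number of matches. With the default
// PREG_PATTERN_ORDER, $all[0] is an array of every full match,
// $all[1] an array of every scheme, $all[2] an array of every path.
$count = preg_match_all($reg_exUrl, $text, $all);

echo $first[0], "\n";   // http://example.com/a
echo $count, "\n";      // 2

// Loop over every full match instead of using a single string.
foreach ($all[0] as $match) {
    echo $match, "\n";
}
```

So with preg_match_all the checking code from the question would run inside a foreach over $url[0], rather than on $url[0] as a single string.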
 

Author Comment

by:Fullsource
ID: 35083706
Awesome, thanks! I will play around with the results right now and see if I can extract the info I need to check the URLs... I feel points coming your way, and I hope that coffee does you right.
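Along those lines, a minimal sketch of turning the preg_match_all results into a clean list of URLs, assuming the links sit inside quoted href attributes as in the question. extract_urls() is a hypothetical name; it uses strcspn() to cut each match at the first quote (so a bare URL with no quote is kept whole) and array_unique() so a link that appears twice is only checked once.

```php
<?php
// Hypothetical helper: return every distinct URL found in a block of HTML/text.
function extract_urls(string $text): array
{
    // Same pattern as in the question.
    $reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
    $urls = [];

    if (preg_match_all($reg_exUrl, $text, $matches)) {
        // $matches[0] holds every full match (PREG_PATTERN_ORDER)
        foreach ($matches[0] as $match) {
            // \S* runs on past the closing quote of an href ("...">Link</a>),
            // so keep only the part before the first ' or "
            $urls[] = substr($match, 0, strcspn($match, "\"'"));
        }
    }

    // The same link may appear more than once; reindex after de-duplicating
    return array_values(array_unique($urls));
}

// Hypothetical sample: two distinct links, one of them repeated
$html = '<a href="http://example.com/p1">One</a> <a href=\'https://example.com/p2\'>Two</a> <a href="http://example.com/p1">Again</a>';
print_r(extract_urls($html));
```

Each URL in the returned array can then be passed to whatever check you settle on, for example the fopen() test from the question.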

Expert Comment

by:Beverley Portlock
ID: 35083915
"...hope that coffee does you right....

That coffee's history! It's the next one that counts...
