php link checker

Posted on 2003-03-14
Last Modified: 2006-11-17
I need to run a link checker, and I have about 2000 links to check. How would I check so many links?
I am trying to check whether the links are active (i.e., not returning a 404 error or something like that). I think the fopen() function is used for that, but I'm not sure...


Any help is appreciated.
Question by:noobie
LVL 15

Expert Comment

by:VGR
ID: 8141873
Easy:
1) loop over your X links; say $links[$i] is the current one
2) access the URI $links[$i] and check the result for a 404 or any other error
3) remember $i if the URI is bad, otherwise do nothing (NOP)
4) repeat

Something like this:
<?php
// inits
$bad=array();
// loop through $links[] (filled in beforehand by you; $links[] starts at index 0)
for ($i=0;$i<count($links);$i++) {
  // try to access that link
  $isgood=CheckURI($links[$i]);
  // memorize result
  if (! $isgood) $bad[]=$i;
}
// display bad links
$badlinks=count($bad);
for ($i=0;$i<$badlinks;$i++) echo "bad link '".$links[$bad[$i]]."' (index=$i)<BR>";
// done

function CheckURI($parurl) {
  // inits
  $result=TRUE;
  // try to get URI
  $filename = "$parurl";
  $tobec=TRUE;
  $fd = @fopen ($filename, "r");
  if ($fd) { // if page found
    while ((!feof ($fd))and($tobec)) {
      $ligne= fgets($fd, 4096);
      if (!(strpos($ligne,'[404] Not Found')===false)) $tobec=FALSE; // stop as soon as this is encountered
      $contents []=$ligne;
    } // while (blocking read)
    fclose ($fd);
    if ($tobec) { // file entirely read OK (note that we could stop after the first X lines; the '404' message won't be on the 345th line...)
      // nothing, result is TRUE already
      // this block is in case you want to log something like "last date the URI was found OK"
    } else { // we stopped before the end: 404 found
      $result=FALSE;
    }
  } else { // page not found (fopen() failed)
    $result=FALSE;
  } // if page found or not
  return $result;
} // CheckURI Boolean Function
?>
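A variation on the same loop, sketched under some assumptions: instead of searching the page body for the text '[404] Not Found' (which depends on how each server words its error page), it reads the HTTP status code with PHP's get_headers(). The function names parseFinalStatus(), checkUriStatus() and isGoodStatus() and the 10-second timeout are illustrative choices, not part of the answer above; passing a context to get_headers() needs PHP 7.1 or later.

```php
<?php
// Hypothetical helper: return the status code of the *last* response in a
// get_headers() result. When redirects are followed, get_headers() returns the
// headers of every response in order, each block starting with an "HTTP/..." line.
function parseFinalStatus(array $headers): ?int
{
    $status = null;
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Hypothetical checker: returns the final status code, or null when the host
// could not be reached at all (DNS failure, refused connection, timeout).
function checkUriStatus(string $url, float $timeout = 10.0): ?int
{
    // A timeout keeps one dead server from stalling a run over ~2000 links.
    $context = stream_context_create(['http' => ['timeout' => $timeout]]);
    $headers = @get_headers($url, false, $context);
    if ($headers === false) {
        return null;
    }
    return parseFinalStatus($headers);
}

// Treat 2xx (and any 3xx that was not followed) as good; 4xx, 5xx and
// unreachable hosts as bad.
function isGoodStatus(?int $status): bool
{
    return $status !== null && $status >= 200 && $status < 400;
}
```

In the loop, isGoodStatus(checkUriStatus($links[$i])) could stand in for CheckURI($links[$i]); unlike a body search, a 404 page with unusual wording is still reported as bad.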

Author Comment

by:noobie
ID: 8143103
So how would this script work?
What do I have to do? Create a data file?
LVL 2

Expert Comment

by:Hatemben
ID: 8143353
Are your links in a database or a text file?

Author Comment

by:noobie
ID: 8143462
Well, the links are in this format:
filename.php?go=Download&id=1
...
filename.php?go=Download&id=9999

First, the IDs skip numbers.
Second, I want to generate the links (all of the IDs are in a database).
Third, I want to check whether they are active (i.e., whether they return 404 errors).

Thanks a lot.
Anyone who helps me complete this gets 500 points.
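A minimal sketch of the second step, generating the links from the database IDs. The table name downloads, the column id, the PDO connection and the base URL http://www.mydomain.com/filename.php are all assumptions (hypothetical); only the go=Download&id=N query format comes from the post. Since the IDs are read from the database rather than counted from 1 to 9999, gaps in the numbering do not matter.

```php
<?php
// Hypothetical: fetch every ID from an assumed `downloads` table via PDO.
// fetchAll(PDO::FETCH_COLUMN) returns the values of the first column only.
function fetchDownloadIds(PDO $pdo): array
{
    return $pdo->query('SELECT id FROM downloads ORDER BY id')
               ->fetchAll(PDO::FETCH_COLUMN);
}

// Build one "filename.php?go=Download&id=N" URL per ID.
function buildDownloadLinks(string $baseUrl, array $ids): array
{
    $links = [];
    foreach ($ids as $id) {
        // http_build_query() URL-encodes each value and joins the pairs with '&'
        $links[] = $baseUrl . '?' . http_build_query(['go' => 'Download', 'id' => $id], '', '&');
    }
    return $links;
}
```

Usage, assuming an open PDO connection $pdo: $links = buildDownloadLinks('http://www.mydomain.com/filename.php', fetchDownloadIds($pdo)); gives an array indexed from 0 that can be fed to the checking loop.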
LVL 15

Expert Comment

by:VGR
ID: 8143470
Just do this at the beginning of the script (not tested, by the way):

$links=array();
$links[]='http://www.netscape.com';
$links[]='http://www.badlink.zob';
$links[]='http://www.experts-exchange.com';

and you'll see...

You just have to get your links into an array called $links (how surprising :/ ) and test the script... :/
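If the links live in a plain text file (one of the options Hatemben asked about), here is a minimal sketch for filling $links, assuming one URL per line; the function name loadLinks() and the file name links.txt are hypothetical.

```php
<?php
// Read one URL per line, dropping line endings, blank lines and surrounding
// spaces. Returns an empty array if the file cannot be read.
function loadLinks(string $path): array
{
    $lines = @file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return [];
    }
    $links = [];
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line !== '') {
            $links[] = $line;
        }
    }
    return $links;
}
```

Usage: $links = loadLinks('links.txt'); the result is indexed from 0, like the $links[] lines above.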

Author Comment

by:noobie
ID: 8143481
Wait, so I have to do:
$links=array();
$links[]='http://www.mydomain.com';

?
And it will list all of the links on the site? (There are many pages, for example filename.php?page=1-20.)
LVL 1

Expert Comment

by:Morph007x2b
ID: 8144199
LVL 15

Expert Comment

by:VGR
ID: 8144220
Well noobie, you wrote "i need to run a link checker, and i have about 2000 links to check, how would i check so many links?" so I assumed you already had this list of links :/

Don't you?

Call this list $links[] and my code will become crystal clear ;-)

In a word: yes, do

<?php
$links=array();
$links[]='http://www.netscape.com';
$links[]='http://www.badlink.zob';
$links[]='http://www.experts-exchange.com';

// inits
$bad=array();
// loop through $links[] (filled in above; $links[] starts at index 0)
for ($i=0;$i<count($links);$i++) {
  // try to access that link
  $isgood=CheckURI($links[$i]);
  // memorize result
  if (! $isgood) $bad[]=$i;
}
// display bad links
$badlinks=count($bad);
for ($i=0;$i<$badlinks;$i++) echo "bad link '".$links[$bad[$i]]."' (index=$i)<BR>";
// done

function CheckURI($parurl) {
  // inits
  $result=TRUE;
  // try to get URI
  $filename = "$parurl";
  $tobec=TRUE;
  $fd = @fopen ($filename, "r");
  if ($fd) { // if page found
    while ((!feof ($fd))and($tobec)) {
      $ligne= fgets($fd, 4096);
      if (!(strpos($ligne,'[404] Not Found')===false)) $tobec=FALSE; // stop as soon as this is encountered
      $contents []=$ligne;
    } // while (blocking read)
    fclose ($fd);
    if ($tobec) { // file entirely read OK (note that we could stop after the first X lines; the '404' message won't be on the 345th line...)
      // nothing, result is TRUE already
      // this block is in case you want to log something like "last date the URI was found OK"
    } else { // we stopped before the end: 404 found
      $result=FALSE;
    }
  } else { // page not found (fopen() failed)
    $result=FALSE;
  } // if page found or not
  return $result;
} // CheckURI Boolean Function
?>

I don't guarantee it's free of typos or errors, but it's at least 85% of what you'll need in the end.
LVL 15

Accepted Solution

by:
VGR earned 2000 total points
ID: 8144259
OK, I TESTED IT AND IT WORKS

I had some typos and minor errors (things forgotten).

So now the code is:
<?php
$links=array();
$links[1]='http://www.netscape.com';
$links[2]='http://www.badlink.zob';
$links[3]='http://www.experts-exchange.com';

//test
$DEBUGTEST=1;
if ($DEBUGTEST==1) echo count($links)." links in input<BR>";
//
// inits
$badlinks=0;
$bad=array();
// loop through $links[] (filled in beforehand by you, indexed from 1 here)
for ($i=1;$i<=count($links);$i++) {
  // try to access that link
  $isgood=CheckURI($links[$i]);
  if ($DEBUGTEST==1) echo "link $i '".$links[$i]."' is ".(($isgood)?'OK':'KO')."<BR>";
  // memorize result
  if (! $isgood) $bad[]=$i;
}
// display bad links
$badlinks=count($bad);
//test
if ($DEBUGTEST==1) echo "$badlinks bad links found<BR>";
//
for ($i=0;$i<$badlinks;$i++) echo "bad link '".$links[$bad[$i]]."' (index=$i)<BR>";
// done

function CheckURI($parurl) {
  // inits
  $result=TRUE;
  // try to get URI
  $filename = "$parurl";
  $tobec=TRUE;
  $fd = @fopen ($filename, "r");
  if ($fd) { // if page found
    while ((!feof ($fd))and($tobec)) {
      $ligne= fgets($fd, 4096);
      if (!(strpos($ligne,'[404] Not Found')===false)) $tobec=FALSE; // stop as soon as this is encountered
      $contents []=$ligne;
    } // while (blocking read)
    fclose ($fd);
    if ($tobec) { // file entirely read OK (note that we could stop after the first X lines; the '404' message won't be on the 345th line...)
      // nothing, result is TRUE already
      // this block is in case you want to log something like "last date the URI was found OK"
    } else { // we stopped before the end: 404 found
      $result=FALSE;
    }
  } else { // page not found (fopen() failed)
    $result=FALSE;
  } // if page found or not
  return $result;
} // CheckURI Boolean Function
?>

and it produces (correctly):
3 links in input
link 1 'http://www.netscape.com' is OK
link 2 'http://www.badlink.zob' is KO
link 3 'http://www.experts-exchange.com' is OK
1 bad links found
bad link 'http://www.badlink.zob' (index=0)

Just set $DEBUGTEST=0 and the script will print only the list of bad links.

Author Comment

by:noobie
ID: 8144365
The script works, but I want to check all of the links associated with the site...
If I put in yahoo.com, I want it to check its entire site map! All of the pages it links to, and all of the pages those linked pages link to.

Later.
LVL 15

Expert Comment

by:VGR
ID: 8144630
That's not at all what your original question was about...

... anyway, it's feasible (with the same CheckURI calls), but only after reading the page and running CheckURI on every link found in it.

I'll let you build this loop, given it's a different question. I even suggest you ask a new question, because I fairly answered your original one.

I would do this:
-for each URL in the original list of sites
-check it using the technique above, BUT
-modify CheckURI so that it recursively checks all URIs encountered in the page currently being checked
-provide an external "maximum depth" constant to stop the recursion
-parse the $contents[] array for tags (A HREF, IMG, FORM ACTION=, etc.), build a local array from them, then loop through it and call the same function again recursively; it's a lot of work
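For the parsing step in the last item, a sketch that uses PHP's DOM extension instead of scanning the raw $contents[] lines by hand (a swapped-in technique; the function name extractLinks() is hypothetical). The page text could be passed in as implode('', $contents). It returns each distinct href/src/action value in document order; relative URLs such as /logo.png are returned as-is and would still need to be resolved against the page URL before being checked.

```php
<?php
// Collect the URLs in <a href>, <img src> and <form action> attributes.
function extractLinks(string $html): array
{
    if (trim($html) === '') {
        return [];                        // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Silence libxml warnings about the malformed HTML common on real pages.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $targets = ['a' => 'href', 'img' => 'src', 'form' => 'action'];
    $found = [];
    foreach ($targets as $tag => $attr) {
        foreach ($doc->getElementsByTagName($tag) as $el) {
            $value = trim($el->getAttribute($attr));
            // Skip empty attributes and duplicates, keep first-seen order.
            if ($value !== '' && !in_array($value, $found, true)) {
                $found[] = $value;
            }
        }
    }
    return $found;
}
```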

Feasible, but time-consuming if you go deeper than the first level (i.e., checking the sites and their immediate links, not the links of the linked pages).
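The plan above, sketched as a depth-limited recursive function (the name crawl() and its parameters are hypothetical). To keep it testable it does no network I/O itself: the caller passes $getLinks, which returns the links found on a page (for example by fetching it and collecting its A HREF, IMG and FORM ACTION targets), and $isGood, which returns true or false for a URL (for example a wrapper around CheckURI). A visited list, not part of the original plan, stops pages that link to each other from being checked twice. Depth 0 checks only the starting URL; depth 1 matches the "first level" described above.

```php
<?php
// Returns the broken URLs reachable from $url within $maxDepth link hops.
function crawl(string $url, int $maxDepth, callable $getLinks, callable $isGood,
               array &$visited = [], int $depth = 0): array
{
    if (isset($visited[$url])) {
        return [];                       // already checked: avoid loops
    }
    $visited[$url] = true;

    if (!$isGood($url)) {
        return [$url];                   // broken: report it, don't descend
    }
    if ($depth >= $maxDepth) {
        return [];                       // "maximum depth" reached: stop recursing
    }
    $bad = [];
    foreach ($getLinks($url) as $link) {
        $bad = array_merge($bad, crawl($link, $maxDepth, $getLinks, $isGood, $visited, $depth + 1));
    }
    return $bad;
}
```

With 2000 starting links, each extra level multiplies the number of requests, which is why the cost grows so quickly beyond the first level.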
LVL 1

Expert Comment

by:Morph007x2b
ID: 8144634
You could try one of those free Link Harvesters :) Search Google: http://www.google.com/search?q=Link+Harvestor
LVL 33

Expert Comment

by:snoyes_jw
ID: 11934666
No comment has been added to this question in more than 21 days, so it is now classified as abandoned.

I will leave the following recommendation for this question in the Cleanup topic area:
    Accept: VGR {http:#8144259}

Any objections should be posted here in the next 4 days. After that time, the question will be closed.

snoyes_jw
EE Cleanup Volunteer