• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 413

Title Tag Grabber

For a while now I've been trying to develop a "simple" title tag grabber/analyzer; however, I'm totally lost.

What I had in mind was a script that takes a short list of URLs from a textarea box and outputs the title tag for each of those URLs.

In the future, this script will analyze and then display a comment/suggestion under each title tag, aiding my visitors in writing better title tags for their web pages.

Thank you for your help.
Asked: spyderx
2 Solutions
 
Diablo84 commented:
The following is an example of extracting the title from a specified URL:

$url = "http://www.google.com";
$content = file_get_contents($url);

preg_match("/<title>(.*)<\/title>/i",$content,$out);

echo "Title of $url is: ".$out[1];

If you wanted to do the same using text inputs, the easiest way would be to use an HTML array. Each input would look something like this:

<input type="text" name="urls[]">

Notice the brackets after the name; this is what makes it an array.

Server-side, you can then loop through each one using:

foreach ($_POST['urls'] as $var) {
 //$var will be used to reference each submitted url
 //you can now extract the title from each url and output it
}
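To tie those two pieces together, here is a minimal sketch (the extractTitle helper name is my own, not from the thread) that separates fetching from parsing, uses a non-greedy match, and adds the /s modifier so a title spanning multiple lines still matches:

```php
<?php
// Sketch: pull the title out of an HTML string.
// (.*?) is non-greedy and /s lets the match span newlines.
function extractTitle($html) {
    if (preg_match('/<title>(.*?)<\/title>/is', $html, $out)) {
        return trim($out[1]);
    }
    return null; // no title tag found
}

// With the urls[] inputs above, each submitted URL can be
// fetched and passed through the helper:
if (!empty($_POST['urls'])) {
    foreach ($_POST['urls'] as $url) {
        $content = file_get_contents($url);
        echo "Title of $url is: " . extractTitle($content) . "<br>\n";
    }
}
?>
```

Keeping the parsing in its own function also makes it easy to unit-test without hitting the network.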

Diablo84
 
Diablo84 commented:
Just noticed you mentioned using a textarea; this is more or less the same process (although a little more inconvenient): you need to explode on the newline character to separate the URLs. Below is an example:

<?php
if (array_key_exists('urls',$_POST)) {
 $arr = explode("\n",$_POST['urls']);
 foreach ($arr as $url) {
  if (!empty($url)) {
   $content = file_get_contents(trim($url));
   if (preg_match("/<title>(.*)<\/title>/i",$content,$out)) {
    echo "Title: ".$out[1]."<br>\n";
   }
  }
 }
}
?>

<form method="post" action="<?php echo $_SERVER['PHP_SELF']; ?>">
<textarea name="urls" cols="50" rows="10">
http://www.google.com
http://www.experts-exchange.com
</textarea><br>
<input type="submit" name="submit">
</form>
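One detail worth noting, as an aside of my own: browsers submit textarea newlines as \r\n, so splitting with preg_split and PCRE's \R escape (which matches any newline style) keeps stray carriage returns out of the URLs:

```php
<?php
// Sketch: split textarea input on any newline style (\r\n, \r or \n)
// using PCRE's \R escape; PREG_SPLIT_NO_EMPTY drops blank lines.
$input = "http://www.google.com\r\nhttp://www.experts-exchange.com\n";
$urls = preg_split('/\R/', $input, -1, PREG_SPLIT_NO_EMPTY);
?>
```

The trim() call in the script above covers the same problem, but splitting cleanly means the rest of the loop never sees the \r at all.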

Note that it might be a better idea to use fread rather than file_get_contents; otherwise you are reading the entire contents of each URL into a string, which could be very slow. If you replaced:

   $content = file_get_contents(trim($url));

With:

   $handle = fopen(trim($url),'r');
   $content = fread($handle,500);
   fclose($handle);

You then only read the first 500 bytes of each URL into a string. This of course depends on the title tag being present within the first 500 characters of the file; as that is normally the case, it is probably the better option.
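As a further hedge against slow hosts (my own addition; the fetchHead name and the 5-second default are assumptions, not from the thread), the partial read can be wrapped with a stream-context timeout so one unresponsive URL does not stall the whole loop:

```php
<?php
// Sketch: read at most $bytes from a URL, giving up after $timeout
// seconds. Returns the data read, or false if the URL cannot be opened.
function fetchHead($url, $bytes = 500, $timeout = 5) {
    $context = stream_context_create(array(
        'http' => array('timeout' => $timeout)
    ));
    $handle = @fopen(trim($url), 'r', false, $context);
    if (!$handle) {
        return false; // could not open the URL
    }
    $content = fread($handle, $bytes);
    fclose($handle);
    return $content;
}
?>
```
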

Diablo84
 
spyderx (Author) commented:
Thank you so much! It works perfectly!

One last question...

How can I modify this script so that a user doesn't have to type the http:// part of each address? That is, http:// would be automatically added to the beginning of each address (without actually being present in the textarea).

For example, he/she could simply type in:

www.mysite.com -or- cool.mysite.com

Thanks again.
 
German_Rumm commented:
Hi spyderx,

Add a check for http://:

if (!empty($url)) {
    if (!preg_match('!^http://!i', $url)) {
        $url = 'http://'.$url;
    }

Diablo84,
BTW, it's not a very efficient way: file_get_contents() gets the whole page, which can be quite large.

It would be much more efficient to read the remote files line by line until you find the </title> tag.
<?php
    $content = '';
    $fp = fopen($url, 'r');
    while (!feof($fp)) {
        $line = fgets($fp, 1024);
        $file .= $line;
        if (strpos($line, '</title>') !== false) break;
    }
    fclose($fp);
?>

---
German Rumm.
 
German_Rumm commented:
German_Rumm,

Oops, I made an error: $file should be $content in my second snippet.
 
Diablo84 commented:
German_Rumm,

>>  file_get_contents() gets whole page, which can be quite large.

Yes, as I said. I don't think any method used to read data from multiple websites is going to be particularly efficient; it might be a good time to do some benchmarking, I think.

spyderx,

after:

if (!empty($url)) {

you would add:

if (strpos($url,'http://') === false) $url = "http://$url";

No need to use a regex for this part.

Diablo84
 
German_Rumm commented:
Diablo84,

What if a user enters a URL like this:
    www.somesite.com/siteinfo.php?url=http://www.someothersite.com

strpos() will find 'http://' and will not add anything. This will result in an error.

I know that it's unlikely, but I like to be prepared for everything :-)
 
Diablo84 commented:
Then just use something like:

if (substr($url,0,7) != 'http://') $url = "http://$url";

instead.
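Another option (a sketch of my own, not from the thread; the addScheme name is an assumption) is to check the scheme with parse_url() instead of searching the string. This leaves query strings containing http:// alone and avoids double-prefixing https:// URLs:

```php
<?php
// Sketch: prepend http:// only when the URL has no scheme,
// as reported by parse_url() rather than a substring search.
function addScheme($url) {
    $url = trim($url);
    if (!parse_url($url, PHP_URL_SCHEME)) {
        $url = 'http://' . $url;
    }
    return $url;
}
?>
```
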
 
spyderx (Author) commented:
It works great! Thank you both for your help!
