Get image URLs from Google Images

Hi, I want to build a special tool for my customers. As a first step, this tool has to get the URLs of the first page of Google Images results.
Example:
1- If I search for "sun", the Google Images URL is "http://images.google.pt/images?gbv=2&hl=pt-PT&q=sun".
2- This result shows 18 images, and each image has a URL like this: "http://images.google.pt/imgres?imgurl=http://arte.vital.zip.net/images/sun.gif&imgrefurl=http://ec-mello.zip.net/&usg=__TXRPHyn0eNAl9frBatKr6xkWxYI=&h=459&w=652&sz=159&hl=pt-PT&start=1&tbnid=ef6f14gPW3U0UM:&tbnh=97&tbnw=138&prev=/images%3Fq%3Dsun%26gbv%3D2%26hl%3Dpt-PT".

3- I just want this part of each URL (the original location of each image):
=======================================
http://arte.vital.zip.net/images/sun.gif
=======================================

I tried with the code you see in the snippet below, but I need the regular expression.
Does somebody have a good solution, or the regular expression?

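<?php
$texto = "sun"; // search term
$url = "http://images.google.pt/images?gbv=2&hl=pt-PT&q=" . urlencode($texto);
$file = @file_get_contents($url);
// "REGULAR EXPRESSION" is a placeholder for the pattern I am asking for
preg_match_all("REGULAR EXPRESSION", $file, $matches);
?>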

google-images.gif

Ray Paseur

No REGEX needed. Try this...

Best regards, ~Ray

<?php // RAY_temp.php
$url = "http://images.google.pt/imgres?imgurl=http://arte.vital.zip.net/images/sun.gif&imgrefurl=http://ec-mello.zip.net/&usg=__TXRPHyn0eNAl9frBatKr6xkWxYI=&h=459&w=652&sz=159&hl=pt-PT&start=1&tbnid=ef6f14gPW3U0UM:&tbnh=97&tbnw=138&prev=/images%3Fq%3Dsun%26gbv%3D2%26hl%3Dpt-PT";
 
$array = explode('=', $url);
$part_url = $array[1];
 
$array = explode('&', $part_url);
$desired_url = $array[0];
 
echo $desired_url;
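
As an alternative to splitting on '=' and '&', PHP's built-in parse_url() and parse_str() can pull the imgurl parameter out by name, which should keep working if the parameters ever appear in a different order. A minimal sketch; the helper name extract_imgurl is made up for illustration, and the sample URL is a shortened version of the one above:

```php
<?php
// Hypothetical helper: return the value of the "imgurl" query parameter, or null if absent.
function extract_imgurl(string $url): ?string
{
    // Take only the part after "?" in the URL.
    $query = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null;
    }

    // Split the query string into a name => value array (values are URL-decoded).
    parse_str($query, $params);

    return isset($params['imgurl']) && is_string($params['imgurl'])
        ? $params['imgurl']
        : null;
}

$url = "http://images.google.pt/imgres?imgurl=http://arte.vital.zip.net/images/sun.gif&imgrefurl=http://ec-mello.zip.net/&h=459&w=652";
echo extract_imgurl($url), "\n";
```

Looking the parameter up by name avoids relying on imgurl being the first thing after the first '=' sign.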

Pedro Chagas

ASKER

I appreciate your help, Ray, but before I can use your solution I need to get all the URLs from the search, in this case 'sun'.
If you do a search in Google Images, the result is 18 images by default. I need to get all of those URLs first, and then apply your solution to each one.
How do I get all the URLs first?

Pedro Chagas

ASKER

I tried with the code below, and I'm almost there.
Please view the attached .txt file, which shows the array I get from that regular expression. It has too many lines; I just want the lines that end in .jpg, .gif, etc.

$texto = "sun";
$url = "http://images.google.pt/images?gbv=2&hl=pt-PT&q=$texto";
$file = @file_get_contents($url);
preg_match_all("#(http:\/\/|https:\/\/)([^64])([^\s<>\.]+)\.([^google])([^\s\n<>\"\']+)#sm", $file, $matches);
echo '<pre>';
print_r($matches);
echo '</pre>';

google-urls.txt

Ray Paseur

Have you contacted Google to see if they have an API for this? You may be writing a program that is a lot of work and also violates their terms of service. It's worth checking with them.

I ran the code posted in the snippet above and you can see what I got for output. It looks like it would be easy to extract the information from the first array in $matches, but not if Google won't permit it!

Best regards, ~Ray
google-forbidden.png
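
The 403 response mentioned above is easy to miss in the original snippet, because the @ in front of file_get_contents() hides the warning. For http:// URLs, PHP fills the local variable $http_response_header with the response header lines after the call, so the status can be checked before any regex runs. A minimal sketch, using a hard-coded header array in place of a real request; the helper name http_status_from_headers is made up for illustration:

```php
<?php
// Hypothetical helper: pull the numeric status out of lines such as "HTTP/1.1 403 Forbidden".
function http_status_from_headers(array $headers): ?int
{
    $status = null;
    foreach ($headers as $line) {
        // After redirects there can be several status lines; keep the last one.
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Stand-in for $http_response_header after a blocked request (illustrative values).
$headers = ['HTTP/1.1 403 Forbidden', 'Content-Type: text/html'];

$code = http_status_from_headers($headers);
if ($code !== 200) {
    echo "Request failed, HTTP status: ", $code ?? 'unknown', "\n";
}
```

In a real script the array would come from $http_response_header right after the file_get_contents() call, and the regex step would be skipped whenever the status is not 200.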

Pedro Chagas

ASKER

I don't want any images from Google. What I want from Google is the URLs of the images they present for my search, in this case 'sun'.
I don't want to violate Google's Terms of Service, ever; Google is a big friend.

I just want a regex to get the URLs that end in .jpg, .gif, like this:
[1] => http://arte.vital.zip.net/images/sun.gif
or
[17] => http://i1.trekearth.com/photos/6840/20070707_sun_raise-a.jpg
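
One way to keep only the entries that end in an image extension is preg_grep(), which filters an array by a pattern. A minimal sketch, assuming the candidate URLs are already in an array; the sample values below are illustrative stand-ins for $matches[0]:

```php
<?php
// Illustrative stand-in for $matches[0]: a mix of wanted and unwanted entries.
$candidates = [
    'http://images.google.pt/images?q=sun',
    'http://arte.vital.zip.net/images/sun.gif',
    'http://i1.trekearth.com/photos/6840/20070707_sun_raise-a.jpg',
    'http://',
];

// Keep only entries ending in .gif, .jpg, .jpeg or .png (case-insensitive).
// preg_grep() preserves the original keys, so array_values() renumbers from 0.
$images = array_values(preg_grep('/\.(gif|jpe?g|png)$/i', $candidates));

print_r($images);
```

The extension list in the pattern is an assumption; other formats can be added inside the parentheses.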

Ray Paseur

Understood. From what you posted it looks like $matches[0] contains an array with what you want. I cannot get this array because of the 403 response, but I can show you a script that will extract the good stuff from it. Back in a moment with some code you can adapt easily... ~Ray

ASKER CERTIFIED SOLUTION

Ray Paseur

This solution is only available to members.

Pedro Chagas

ASKER

Do I have to replace $array with $matches?

SOLUTION

Ray Paseur

This solution is only available to members.

Pedro Chagas

ASKER

Hi, your solution works, and I wrote the script you see in the code snippet below.
I think, though I don't know the code for it, that it could be made better if we change this line:
preg_match_all("#(http:\/\/|https:\/\/)([^64])([^\s<>\.]+)\.([^google])([^\s\n<>\"\']+)#sm", $file, $matches);
This line gives me some absurd results, like:
......
[6] => http://
[7] => http://
[8] => http://
[9] => http://
[10] => http://
..............
or
..............
[22] => f
[23] => f
[24] => f
[25] => f
[26] => f
..............
To finish this script I just need a new regex to get all the URLs with preg_match_all; afterwards I use the eregi function to pick out jpg, gif, etc.

If that is not possible, tell me so I can close this case, because I already have a solution when I use $matches[0]: that way I can get all the URLs that end in an image extension.

I only reopened this case to try to make it better.
How can I make this line better?
preg_match_all("#(http:\/\/|https:\/\/)([^64])([^\s<>\.]+)\.([^google])([^\s\n<>\"\']+)#sm", $file, $matches);

$texto = "allfreephoto";
$url = "http://images.google.pt/images?gbv=2&hl=pt-PT&q=$texto";
$file = @file_get_contents($url);
preg_match_all("#(http:\/\/|https:\/\/)([^64])([^\s<>\.]+)\.([^google])([^\s\n<>\"\']+)#sm", $file, $matches);
echo '<pre>';
print_r($matches);
echo '</pre>';
echo "------------------------------------------------";
 
 
// ITERATE OVER THE ARRAY AND DISCARD THE PARTS THAT WE DO NOT WANT
foreach ($matches[0] as $pointer => $url_thing)
{
	if (strpos($url_thing, '&imgrefurl=') === FALSE)
	{
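		// Note: eregi() was deprecated in PHP 5.3 and removed in PHP 7.0;
		// preg_match('/\.gif$/i', $url_thing) is the equivalent on newer versions.
		if (eregi('\.GIF$', $url_thing)) continue;
		if (eregi('\.JPG$', $url_thing)) continue;
		if (eregi('\.PNG$', $url_thing)) continue;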
	}
	unset ($matches[0][$pointer]);
}
//var_dump($matches[0]);
echo '<pre>';
print_r($matches[0]);
echo '</pre>';
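
On the question of improving the pattern: `[^64]` and `[^google]` are character classes, so each one excludes a single character (a 6 or a 4; any of g, o, l, e) rather than the number 64 or the word "google". Each parenthesised group also adds its own sub-array to $matches, which is most likely where the lists of "http://" and "f" come from. A simpler route may be to capture the imgurl parameter directly. A minimal sketch, assuming the page still contains links of the imgres?imgurl=... form shown earlier in the thread (Google's markup may differ); the helper name and the HTML fragment are made up for illustration:

```php
<?php
// Hypothetical helper: collect the image URLs found in "imgurl=" parameters.
function extract_image_urls(string $html): array
{
    // Capture everything after "imgurl=" up to the next "&", quote, bracket or whitespace.
    preg_match_all('#imgurl=([^&"\'\s<>]+)#i', $html, $matches);

    // $matches[1] holds only the captured group; decode any %XX escapes.
    $urls = array_map('rawurldecode', $matches[1]);

    // Keep entries that end in a common image extension (case-insensitive).
    $urls = preg_grep('/\.(gif|jpe?g|png)$/i', $urls);

    // Drop duplicates and renumber the keys from 0.
    return array_values(array_unique($urls));
}

// Illustrative fragment standing in for the fetched results page.
$html = '<a href="/imgres?imgurl=http://arte.vital.zip.net/images/sun.gif&amp;imgrefurl=http://ec-mello.zip.net/&amp;h=459">'
      . '<a href="/imgres?imgurl=http://i1.trekearth.com/photos/6840/20070707_sun_raise-a.jpg&amp;imgrefurl=http://example.com/">';

print_r(extract_image_urls($html));
```

With a single capturing group there is only one sub-array to read, and the extension filter uses preg_grep() instead of eregi(), so the clean-up loop is no longer needed.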

SOLUTION

Ray Paseur

This solution is only available to members.

Ray Paseur

Thanks for the points. Glad I could help, ~Ray