Pedro Chagas

Get image URLs from Google Images

Hi, I want to build a special tool for my customers. As a first step, the tool has to collect the URLs of the first page of Google Images results.
Example:
1- If I search for "sun", the Google Images URL is "http://images.google.pt/images?gbv=2&hl=pt-PT&q=sun".
2- That page shows 18 images, and each image has a URL like this: "http://images.google.pt/imgres?imgurl=http://arte.vital.zip.net/images/sun.gif&imgrefurl=http://ec-mello.zip.net/&usg=__TXRPHyn0eNAl9frBatKr6xkWxYI=&h=459&w=652&sz=159&hl=pt-PT&start=1&tbnid=ef6f14gPW3U0UM:&tbnh=97&tbnw=138&prev=/images%3Fq%3Dsun%26gbv%3D2%26hl%3Dpt-PT".

3- I only want this part of each URL (the original location of the image):
=======================================
http://arte.vital.zip.net/images/sun.gif
=======================================

I tried the code below, but I still need the regular expression.
Does anybody have a good solution, or the regular expression?
<?php
$texto = "sun";
$url = "http://images.google.pt/images?gbv=2&hl=pt-PT&q=$texto";
$file = @file_get_contents($url);
preg_match_all("REGULAR EXPRESSION", $file, $matches);
?>


google-images.gif
Ray Paseur

No REGEX needed.  Try this...

Best regards, ~Ray


<?php // RAY_temp.php
$url = "http://images.google.pt/imgres?imgurl=http://arte.vital.zip.net/images/sun.gif&imgrefurl=http://ec-mello.zip.net/&usg=__TXRPHyn0eNAl9frBatKr6xkWxYI=&h=459&w=652&sz=159&hl=pt-PT&start=1&tbnid=ef6f14gPW3U0UM:&tbnh=97&tbnw=138&prev=/images%3Fq%3Dsun%26gbv%3D2%26hl%3Dpt-PT";
 
$array = explode('=', $url);
$part_url = $array[1];
 
$array = explode('&', $part_url);
$desired_url = $array[0];
 
echo $desired_url;
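
As a possible alternative to the two explode() calls, PHP's own URL functions can pick out the `imgurl` parameter by name, so the result does not depend on `imgurl` being the first parameter in the link. This is only a sketch: `extract_imgurl()` is a hypothetical helper name, not something from the thread, and it assumes the link carries an `imgurl` parameter as in the example above.

```php
<?php
// Hypothetical helper (not from the thread): return the imgurl parameter
// of a Google "imgres" link, or null when the link has no such parameter.
function extract_imgurl(string $link): ?string
{
    $query = parse_url($link, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null; // no query string, or the URL could not be parsed
    }
    parse_str($query, $params); // splits on "&" and URL-decodes the values
    if (isset($params['imgurl']) && is_string($params['imgurl'])) {
        return $params['imgurl'];
    }
    return null;
}

$link = "http://images.google.pt/imgres?imgurl=http://arte.vital.zip.net/images/sun.gif&imgrefurl=http://ec-mello.zip.net/&usg=__TXRPHyn0eNAl9frBatKr6xkWxYI=&h=459&w=652&sz=159&hl=pt-PT&start=1&tbnid=ef6f14gPW3U0UM:&tbnh=97&tbnw=138&prev=/images%3Fq%3Dsun%26gbv%3D2%26hl%3Dpt-PT";

echo extract_imgurl($link), "\n";
```

Because parse_str() decodes percent-escapes, this should also cope with links where the image address itself is URL-encoded.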


Pedro Chagas

ASKER

I appreciate your help, Ray, but before I can use your solution I need to get all the URLs from the search, in this case 'sun'.
A Google Images search returns 18 images by default. I first need to get all of those URLs, and then apply your solution to each one.
How do I get all the URLs first?
I tried the code below and I'm almost there.
Please see the attached .txt file, which shows the array I get from that regular expression. It has too many lines; I only want the lines that end in .jpg, .gif, etc.
$texto = "sun";
$url = "http://images.google.pt/images?gbv=2&hl=pt-PT&q=$texto";
$file = @file_get_contents($url);
preg_match_all("#(http:\/\/|https:\/\/)([^64])([^\s<>\.]+)\.([^google])([^\s\n<>\"\']+)#sm", $file, $matches);
echo '<pre>';
print_r($matches);
echo '</pre>';


google-urls.txt
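
One likely reason the pattern above matches too much: `[^64]` and `[^google]` are negated character classes, so each matches a single character that is not one of the listed characters. They do not exclude the strings "64" or "google". A narrower approach is to anchor the pattern on the `imgurl=` parameter itself. The sketch below assumes the downloaded page contains `imgres?imgurl=...` links like the one quoted in the question; the `$html` sample is made up for illustration and the real markup may differ.

```php
<?php
// Hypothetical sample standing in for the downloaded page ($file in the thread).
$html = '<a href="/imgres?imgurl=http://arte.vital.zip.net/images/sun.gif&imgrefurl=http://ec-mello.zip.net/&h=459">'
      . '<a href="/imgres?imgurl=http://i1.trekearth.com/photos/6840/20070707_sun_raise-a.jpg&amp;imgrefurl=http://example.com/">';

// Capture everything after "imgurl=" up to the next "&", quote, or whitespace.
preg_match_all('#imgurl=(https?://[^&"\'\s]+)#i', $html, $matches);

// $matches[1] holds only the captured image addresses; drop duplicates.
$image_urls = array_values(array_unique($matches[1]));

print_r($image_urls);
```

With this shape of pattern there is a single capture group, so `$matches[1]` is already the list of original image locations and no second pass is needed to cut the link apart.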
Have you contacted Google to see if they have an API for this?  You may be writing a program that is a lot of work and also violates their terms of service.  It's worth checking with them.

I ran the code posted in the snippet above and you can see what I got for output.  It looks like it would be easy to extract the information from the first array in $matches, but not if Google won't permit it!

Best regards, ~Ray
google-forbidden.png
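
Since the scripts in this thread call file_get_contents() with the `@` operator, a refused request such as the 403 mentioned above fails silently and the regex simply runs on an empty result. A minimal sketch of checking the status first is shown below. `http_status()` is a hypothetical helper, and the header array is a canned example so the code runs without contacting Google; in a real script the array would come from PHP's `$http_response_header` variable after the request.

```php
<?php
// Hypothetical helper (not from the thread): read the numeric status code
// from the first header line, e.g. "HTTP/1.1 403 Forbidden" gives 403.
function http_status(array $headers): ?int
{
    if (isset($headers[0]) && preg_match('#^HTTP/\S+\s+(\d{3})#', $headers[0], $m)) {
        return (int) $m[1];
    }
    return null; // no recognisable status line
}

// Canned headers for illustration only. After a real
// file_get_contents('http://...') call, pass $http_response_header instead.
$headers = ['HTTP/1.1 403 Forbidden', 'Content-Type: text/html'];

$status = http_status($headers);
if ($status !== 200) {
    echo "Request failed with HTTP status $status\n";
}
```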
I don't want any images from Google. What I want from Google is the URLs of the images it shows for my search, in this case 'sun'.
I never want to violate Google's Terms of Service; Google is a big friend.

I just want a regex to get the URLs that end in .jpg, .gif, etc., like this:
[1] => http://arte.vital.zip.net/images/sun.gif
or
[17] => http://i1.trekearth.com/photos/6840/20070707_sun_raise-a.jpg
Understood. From what you posted it looks like $matches[0] contains an array with what you want.  I cannot get this array because of the 403 response, but I can show you a script that will extract the good stuff from it.  Back in a moment with some code you can adapt easily... ~Ray
ASKER CERTIFIED SOLUTION
Ray Paseur

This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Do I have to replace $array with $matches?
SOLUTION
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Hi, your solution works, and I wrote the script you see in the code snippet.
I think it could be made better by changing this line, but I don't know how:
preg_match_all("#(http:\/\/|https:\/\/)([^64])([^\s<>\.]+)\.([^google])([^\s\n<>\"\']+)#sm", $file, $matches);
This line gives me some absurd results, like:
......
            [6] => http://
            [7] => http://
            [8] => http://
            [9] => http://
            [10] => http://
..............
or
..............
            [22] => f
            [23] => f
            [24] => f
            [25] => f
            [26] => f
..............
To finish this script I just need a new regex that gets all the URLs with preg_match_all; afterwards I use the eregi function to pick out jpg, gif, etc.

If that is not possible, tell me and I will close this question, because I already have a solution when I use $matches[0]: that way I can get all the URLs that end in an image extension.

I only reopened this question to try to make it better.
How can I improve this line ====preg_match_all("#(http:\/\/|https:\/\/)([^64])([^\s<>\.]+)\.([^google])([^\s\n<>\"\']+)#sm", $file, $matches);=======?
$texto = "allfreephoto";
$url = "http://images.google.pt/images?gbv=2&hl=pt-PT&q=$texto";
$file = @file_get_contents($url);
preg_match_all("#(http:\/\/|https:\/\/)([^64])([^\s<>\.]+)\.([^google])([^\s\n<>\"\']+)#sm", $file, $matches);
echo '<pre>';
print_r($matches);
echo '</pre>';
echo "------------------------------------------------";
 
 
// ITERATE OVER THE ARRAY AND DISCARD THE PARTS THAT WE DO NOT WANT
foreach ($matches[0] as $pointer => $url_thing)
{
	if (strpos($url_thing, '&imgrefurl=') === FALSE)
	{
		// preg_match() with the i flag replaces eregi(), which was removed in PHP 7
		if (preg_match('/\.(gif|jpg|png)$/i', $url_thing)) continue;
	}
	unset ($matches[0][$pointer]);
}
//var_dump($matches[0]);
echo '<pre>';
print_r($matches[0]);
echo '</pre>';
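
On the "absurd" entries: print_r($matches) prints one sub-array per parenthesised group as well as the full matches in $matches[0]. The rows of `http://` are the first group (the scheme), and the rows of single letters such as `f` are the one character matched by `([^google])`. Only $matches[0] needs to be looked at. The filtering loop can also be condensed with preg_grep(). The sketch below uses a made-up `$candidates` array in place of $matches[0]; the Google "about" address in it is purely illustrative.

```php
<?php
// Hypothetical input standing in for $matches[0] from the thread.
$candidates = [
    'http://arte.vital.zip.net/images/sun.gif',
    'http://images.google.pt/imgres?imgurl=http://arte.vital.zip.net/images/sun.gif&imgrefurl=http://ec-mello.zip.net/',
    'http://www.google.pt/intl/pt-PT/about.html',
    'http://i1.trekearth.com/photos/6840/20070707_sun_raise-a.jpg',
];

// Keep entries that end in a known image extension (case-insensitive).
$images = preg_grep('#\.(?:gif|jpe?g|png)$#i', $candidates);

// Drop wrapped Google links, the same test as the strpos() check in the loop.
$images = array_filter($images, function ($u) {
    return strpos($u, '&imgrefurl=') === false;
});

// Re-index from 0 so the result prints as a plain list.
$images = array_values($images);

print_r($images);
```

The `(?:...)` group is non-capturing, which keeps extra sub-arrays out of the result if the same pattern is later reused with preg_match_all().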


SOLUTION
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Thanks for the points.  Glad I could help, ~Ray