Pedro Chagas asked:

Get image URLs from Google Images

Hi, I want to build a special tool for my customers. As a first step, the tool has to collect the URLs of the first results from Google Images.
Example:
1- If I search for "sun", the Google Images URL is "http://images.google.pt/images?gbv=2&hl=pt-PT&q=sun".
2- This results page shows 18 images, and each image has a URL like this: "http://images.google.pt/imgres?imgurl=http://arte.vital.zip.net/images/sun.gif&imgrefurl=http://ec-mello.zip.net/&usg=__TXRPHyn0eNAl9frBatKr6xkWxYI=&h=459&w=652&sz=159&hl=pt-PT&start=1&tbnid=ef6f14gPW3U0UM:&tbnh=97&tbnw=138&prev=/images%3Fq%3Dsun%26gbv%3D2%26hl%3Dpt-PT".

3- I only want this part of each URL (the original location of each image):
=======================================
http://arte.vital.zip.net/images/sun.gif
=======================================

I tried the code below, but I still need the regular expression.
Does anybody have a good solution, or the regular expression itself?
<?php
$texto = "sun";
$url = "http://images.google.pt/images?gbv=2&hl=pt-PT&q=$texto";
$file = @file_get_contents($url);
preg_match_all("REGULAR EXPRESSION", $file, $matches);
?>

google-images.gif
Tags: Regular Expressions, PHP

8/22/2022 - Mon
Ray Paseur

No REGEX needed.  Try this...

Best regards, ~Ray


<?php // RAY_temp.php
$url = "http://images.google.pt/imgres?imgurl=http://arte.vital.zip.net/images/sun.gif&imgrefurl=http://ec-mello.zip.net/&usg=__TXRPHyn0eNAl9frBatKr6xkWxYI=&h=459&w=652&sz=159&hl=pt-PT&start=1&tbnid=ef6f14gPW3U0UM:&tbnh=97&tbnw=138&prev=/images%3Fq%3Dsun%26gbv%3D2%26hl%3Dpt-PT";
 
$array = explode('=', $url);
$part_url = $array[1];
 
$array = explode('&', $part_url);
$desired_url = $array[0];
 
echo $desired_url;
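
An alternative worth considering, offered here only as a sketch and not as part of the code above: PHP's built-in parse_url() and parse_str() can read the imgurl parameter by name instead of by position. This assumes the link keeps the imgres?imgurl=... shape shown in the question; the variable names $query and $params are illustrative.

```php
<?php
// Same sample link as in the question.
$url = "http://images.google.pt/imgres?imgurl=http://arte.vital.zip.net/images/sun.gif&imgrefurl=http://ec-mello.zip.net/&usg=__TXRPHyn0eNAl9frBatKr6xkWxYI=&h=459&w=652&sz=159&hl=pt-PT&start=1&tbnid=ef6f14gPW3U0UM:&tbnh=97&tbnw=138&prev=/images%3Fq%3Dsun%26gbv%3D2%26hl%3Dpt-PT";

// Take everything after the "?" ...
$query = parse_url($url, PHP_URL_QUERY);

// ... and split it into name => value pairs (values are URL-decoded).
parse_str((string) $query, $params);

// Read the parameter by name; null if the link has no imgurl at all.
$desired_url = isset($params['imgurl']) ? $params['imgurl'] : null;

echo $desired_url;
```

Reading the parameter by name should keep working if the parameters appear in a different order, which the positional explode() approach does not guarantee.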

ASKER
Pedro Chagas

I appreciate your help, Ray, but before I can use your solution I need to get all the URLs from the search, in this case 'sun'.
A Google Images search returns 18 images by default. I first need to get all of their URLs, and then I can apply your solution to each one.
How do I get all the URLs first?
ASKER
Pedro Chagas

I tried the code below, and I'm almost there.
Please see the attached .txt file, which shows the array I get from this regular expression. It returns too many lines; I only want the lines that end in .jpg, .gif, etc.
$texto = "sun";
$url = "http://images.google.pt/images?gbv=2&hl=pt-PT&q=$texto";
$file = @file_get_contents($url);
preg_match_all("#(http:\/\/|https:\/\/)([^64])([^\s<>\.]+)\.([^google])([^\s\n<>\"\']+)#sm", $file, $matches);
echo '<pre>';
print_r($matches);
echo '</pre>';

google-urls.txt
Ray Paseur

Have you contacted Google to see if they have an API for this?  You may be writing a program that is a lot of work and also violates their terms of service.  It's worth checking with them.

I ran the code posted in the snippet above and you can see what I got for output.  It looks like it would be easy to extract the information from the first array in $matches, but not if Google won't permit it!

Best regards, ~Ray
google-forbidden.png
ASKER
Pedro Chagas

I don't want any images from Google. What I want is the URLs of the images Google presents for my search, in this case 'sun'.
I never want to violate Google's Terms of Service; Google is a big friend.

I just want a regex that gets the URLs ending in .jpg, .gif, etc., like this:
[1] => http://arte.vital.zip.net/images/sun.gif
or
[17] => http://i1.trekearth.com/photos/6840/20070707_sun_raise-a.jpg
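
One possible shape for such a regex, as a sketch only: anchor on the imgurl= parameter and capture its value when it ends in an image extension. It assumes the results page still contains imgres?imgurl=... links like the one in the question. The $html string is a hypothetical stand-in for the downloaded page, and the second link's imgrefurl is made up for illustration.

```php
<?php
// Hypothetical sample standing in for the HTML of the results page.
$html = '<a href="http://images.google.pt/imgres?imgurl=http://arte.vital.zip.net/images/sun.gif&imgrefurl=http://ec-mello.zip.net/&h=459&w=652">'
      . '<a href="http://images.google.pt/imgres?imgurl=http://i1.trekearth.com/photos/6840/20070707_sun_raise-a.jpg&imgrefurl=http://example.com/&h=100">';

// Capture the value of imgurl= only when it ends in .jpg/.jpeg/.gif/.png.
// The lookahead requires the extension to be followed by a delimiter or the end of input.
$pattern = '#imgurl=(https?://[^&"\'\s<>]+\.(?:jpe?g|gif|png))(?=[&"\'\s<>]|$)#i';
preg_match_all($pattern, $html, $matches);

// $matches[1] holds the captured image URLs; drop duplicates and reindex.
$image_urls = array_values(array_unique($matches[1]));

echo '<pre>';
print_r($image_urls);
echo '</pre>';
```

Because the character class stops at the first &, the same pattern should also cope with pages that write the separator as &amp;.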
Ray Paseur

Understood. From what you posted it looks like $matches[0] contains an array with what you want.  I cannot get this array because of the 403 response, but I can show you a script that will extract the good stuff from it.  Back in a moment with some code you can adapt easily... ~Ray
ASKER CERTIFIED SOLUTION
Ray Paseur

Log in or sign up to see answer
ASKER
Pedro Chagas

Do I have to replace $array with $matches?
SOLUTION
Log in to continue reading
ASKER
Pedro Chagas

Hi, your solution works, and I wrote the script you see in the code snippet.
I think the script could be improved by changing this line, but I don't know how:
preg_match_all("#(http:\/\/|https:\/\/)([^64])([^\s<>\.]+)\.([^google])([^\s\n<>\"\']+)#sm", $file, $matches);
This line gives me some absurd results, like:
......
            [6] => http://
            [7] => http://
            [8] => http://
            [9] => http://
            [10] => http://
..............
or
..............
            [22] => f
            [23] => f
            [24] => f
            [25] => f
            [26] => f
..............
To finish this script, I just need a new regex that gets all the URLs with preg_match_all; afterwards I filter them by extension (jpg, gif, etc.).

If that isn't possible, tell me so I can close this case, because I already have a working solution: when I use $matches[0], I can get all the URLs that end in an image extension.

I only reopened this case to try to improve it.
How can I improve this line?
preg_match_all("#(http:\/\/|https:\/\/)([^64])([^\s<>\.]+)\.([^google])([^\s\n<>\"\']+)#sm", $file, $matches);
$texto = "allfreephoto";
$url = "http://images.google.pt/images?gbv=2&hl=pt-PT&q=$texto";
$file = @file_get_contents($url);
preg_match_all("#(http:\/\/|https:\/\/)([^64])([^\s<>\.]+)\.([^google])([^\s\n<>\"\']+)#sm", $file, $matches);
echo '<pre>';
print_r($matches);
echo '</pre>';
echo "------------------------------------------------";
 
 
// ITERATE OVER THE ARRAY AND DISCARD THE PARTS THAT WE DO NOT WANT
foreach ($matches[0] as $pointer => $url_thing)
{
	if (strpos($url_thing, '&imgrefurl=') === FALSE)
	{
		// eregi() was removed in PHP 7; preg_match() with the i flag is the case-insensitive equivalent
		if (preg_match('/\.(gif|jpg|png)$/i', $url_thing)) continue;
	}
	unset ($matches[0][$pointer]);
}
//var_dump($matches[0]);
echo '<pre>';
print_r($matches[0]);
echo '</pre>';
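
A likely reason for the odd "http://" and "f" entries, sketched below under the assumption that they come from $matches[1], $matches[2] and so on: every pair of parentheses in the pattern is a capturing group, and print_r($matches) prints each group as its own sub-array. Note also that [^64] and [^google] are character classes, so each matches one single character that is not in the listed set; they do not exclude the number 64 or the word "google". The sample $text and the variable names $m and $n are illustrative only.

```php
<?php
// Hypothetical sample text; the real input would be the downloaded page.
$text = 'see http://arte.vital.zip.net/images/sun.gif and https://example.com/a.png';

// Capturing groups: $m[0] holds the full URLs, $m[1] only the scheme of each match.
preg_match_all('#(https?://)([^\s<>"\']+)#', $text, $m);

// Non-capturing groups (?:...): only $n[0] is produced, with the same full URLs.
preg_match_all('#(?:https?://)(?:[^\s<>"\']+)#', $text, $n);

// [^google] matches exactly one character that is not g, o, l or e.
$rejects_g = preg_match('#^[^google]$#', 'g'); // 0: "g" is in the excluded set
$accepts_x = preg_match('#^[^google]$#', 'x'); // 1: "x" is not

echo '<pre>';
print_r($m[1]);      // the scheme pieces that showed up as noise
print_r(count($n));  // number of sub-arrays without capturing groups
echo '</pre>';
```

Either switching to (?:...) groups or reading only $matches[0], as the script above already does, should remove the noise.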

SOLUTION
Log in to continue reading
Ray Paseur

Thanks for the points.  Glad I could help, ~Ray