Pedro Chagas
asked on
Get image URLs from Google Images
Hi, I want to build a special tool for my customers. As a first step, the tool has to get the URLs of all the first-page results from Google Images.
Example:
1- If I search for "sun", the Google Images URL is "http://images.google.pt/images?gbv=2&hl=pt-PT&q=sun".
2- This result shows 18 images, and each image has a URL like this: "http://images.google.pt/imgres?imgurl=http://arte.vital.zip.net/images/sun.gif&imgrefurl=http://ec-mello.zip.net/&usg=__TXRPHyn0eNAl9frBatKr6xkWxYI=&h=459&w=652&sz=159&hl=pt-PT&start=1&tbnid=ef6f14gPW3U0UM:&tbnh=97&tbnw=138&prev=/images%3Fq%3Dsun%26gbv%3D2%26hl%3Dpt-PT".
3- I just want this part of each URL (the original location of each image):
==========================
http://arte.vital.zip.net/images/sun.gif
==========================
I tried with the code below, but I still need the regular expression.
Does somebody have a good solution, or the regular expression?
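<?php
$texto = "sun"; // search term
// urlencode() keeps multi-word or accented search terms valid in the query string
$url = "http://images.google.pt/images?gbv=2&hl=pt-PT&q=" . urlencode($texto);
// @ suppresses warnings, so $file is FALSE if the request fails
$file = @file_get_contents($url);
// "REGULAR EXPRESSION" is a placeholder for the pattern being asked for
preg_match_all("REGULAR EXPRESSION", $file, $matches);
?>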
google-images.gif
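The extraction described in step 3 can be sketched as follows. This is only a minimal illustration, assuming the result page contains links shaped like the imgres URL quoted in the question; the `$html` string is a hypothetical stand-in for the fetched page, and the real markup may differ.

```php
<?php
// Hypothetical sample of the markup; the real page may be structured differently.
$html = '<a href="http://images.google.pt/imgres?imgurl=http://arte.vital.zip.net/images/sun.gif'
      . '&imgrefurl=http://ec-mello.zip.net/&h=459&w=652">sun</a>';

// Capture everything between "imgurl=" and the next "&", quote, or whitespace.
preg_match_all('#imgurl=([^&"\'\s]+)#i', $html, $matches);

// $matches[1] holds the captured group; decode it in case the value is percent-encoded.
$imageUrls = array_map('rawurldecode', $matches[1]);

print_r($imageUrls);
```

Reading the captured group from `$matches[1]` rather than `$matches[0]` leaves out the literal `imgurl=` prefix.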
ASKER
I appreciate your help, Ray, but before I can use your solution I need to get all the URLs from the search, in this case 'sun'.
If you do a search in Google Images, the result shows 18 images by default. I need to get all of those URLs first, and then apply your solution to them.
How do I get all the URLs first?
ASKER
I tried with the code below, and I'm almost there.
Please see the attached .txt file, which shows the array I get from that regular expression. It has too many lines; I just want the lines that end in .jpg, .gif, etc.
$texto = "sun";
$url = "http://images.google.pt/images?gbv=2&hl=pt-PT&q=$texto";
$file = @file_get_contents($url);
preg_match_all("#(http:\/\/|https:\/\/)([^64])([^\s<>\.]+)\.([^google])([^\s\n<>\"\']+)#sm", $file, $matches);
echo '<pre>';
print_r($matches);
echo '</pre>';
google-urls.txt
Have you contacted Google to see if they have an API for this? You may be writing a program that is a lot of work and also violates their terms of service. It's worth checking with them.
I ran the code posted in the snippet above and you can see what I got for output. It looks like it would be easy to extract the information from the first array in $matches, but not if Google won't permit it!
Best regards, ~Ray
google-forbidden.png
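Because the snippet calls file_get_contents() with `@`, a refused request goes unnoticed and `$file` is simply FALSE. A minimal sketch of checking the status line first: `http_status_from_headers()` is a hypothetical helper (not part of PHP), and it assumes a header list shaped like the `$http_response_header` array that PHP fills after an HTTP request. The sample header lists are made up for illustration.

```php
<?php
// Hypothetical helper: pull the numeric status code out of a header list
// shaped like the $http_response_header array PHP fills after an HTTP request.
function http_status_from_headers(array $headers): int
{
    if (isset($headers[0]) && preg_match('#^HTTP/\S+\s+(\d{3})#', $headers[0], $m)) {
        return (int) $m[1];
    }
    return 0; // no recognisable status line
}

// Sample header lists (made up for illustration).
$forbidden = ['HTTP/1.1 403 Forbidden', 'Content-Type: text/html'];
$ok        = ['HTTP/1.1 200 OK', 'Content-Type: text/html'];

echo http_status_from_headers($forbidden), "\n";
echo http_status_from_headers($ok), "\n";
```

In the real script, the header list from the request would be passed in right after the file_get_contents() call, and the regex step skipped when the code is not 200.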
ASKER
I don't want any images from Google. What I want from Google is the URLs of the images it presents for my search, in this case 'sun'.
I never want to violate Google's Terms of Service; Google is a big friend.
I just want a regex to get the URLs that end in .jpg, .gif, and so on, like this:
[1] => http://arte.vital.zip.net/images/sun.gif
or
[17] => http://i1.trekearth.com/photos/6840/20070707_sun_raise-a.jpg
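One way to keep only the entries that end in an image extension is preg_grep(), sketched below. The `$candidates` array is a hypothetical stand-in for whatever list of URLs the earlier matching step produced, and the set of extensions is an assumption.

```php
<?php
// Hypothetical list of candidate URLs, standing in for the matched results.
$candidates = [
    'http://arte.vital.zip.net/images/sun.gif',
    'http://ec-mello.zip.net/',
    'http://i1.trekearth.com/photos/6840/20070707_sun_raise-a.jpg',
    'http://images.google.pt/images?q=sun',
];

// Keep entries whose last characters are .gif, .jpg, .jpeg or .png (case-insensitive).
// array_values() renumbers the keys, because preg_grep() preserves the original ones.
$imageUrls = array_values(preg_grep('/\.(?:gif|jpe?g|png)$/i', $candidates));

print_r($imageUrls);
```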
Understood. From what you posted it looks like $matches[0] contains an array with what you want. I cannot get this array because of the 403 response, but I can show you a script that will extract the good stuff from it. Back in a moment with some code you can adapt easily... ~Ray
ASKER
Do I have to replace $array with $matches?
ASKER
Hi, your solution works, and I wrote the script you see in the code snippet below.
I think (though I don't know how) that it could be improved by changing this line:
preg_match_all("#(http:\/\/|https:\/\/)([^64])([^\s<>\.]+)\.([^google])([^\s\n<>\"\']+)#sm", $file, $matches);
This line gives me some absurd results, like:
......
[6] => http://
[7] => http://
[8] => http://
[9] => http://
[10] => http://
..............
or
..............
[22] => f
[23] => f
[24] => f
[25] => f
[26] => f
..............
To finish this script I just need a new regex that gets all the URLs with preg_match_all; afterwards I use the eregi function to pick out jpg, gif, etc.
If that is not possible, tell me and I will close this case, because I already have a solution when I use $matches[0]: that way I can get all the URLs that end in an image extension.
I only reopened this case to try to make it better.
What can I do to make this line ====preg_match_all("#(http:\/\/|https:\/\/)([^64])([^\s<>\.]+)\.([^google])([^\s\n<>\"\']+)#sm", $file, $matches);======= better?
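A possible explanation for the odd entries: preg_match_all() stores full matches in `$matches[0]` and adds one more sub-array for every pair of parentheses, so the repeated "http://" values and single letters are the contents of those groups. Also, `[^64]` and `[^google]` are character classes: each matches one character that is not in the set, not the absence of the text "64" or "google". The sketch below is one alternative under stated assumptions: the `$file` fragment is a hypothetical sample, and the pattern is an illustration rather than a tested replacement for the live page.

```php
<?php
// Hypothetical fragment standing in for the fetched page.
$file = '<a href="http://images.google.pt/imgres?imgurl=http://arte.vital.zip.net/images/sun.gif'
      . '&imgrefurl=http://ec-mello.zip.net/&h=459">';

// (?![^/\s]*google) is a negative lookahead: it rejects URLs whose host contains "google".
// The pattern has no capturing groups, so only $matches[0] is filled.
// "&" is excluded from the URL characters so each embedded URL is matched on its own.
preg_match_all('#https?://(?![^/\s]*google)[^\s<>"\'&]+#i', $file, $matches);

print_r($matches[0]);
```

The extension filter can then run over `$matches[0]` alone, since there are no per-group arrays to skip.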
$texto = "allfreephoto";
$url = "http://images.google.pt/images?gbv=2&hl=pt-PT&q=$texto";
$file = @file_get_contents($url);
preg_match_all("#(http:\/\/|https:\/\/)([^64])([^\s<>\.]+)\.([^google])([^\s\n<>\"\']+)#sm", $file, $matches);
echo '<pre>';
print_r($matches);
echo '</pre>';
echo "------------------------------------------------";
// ITERATE OVER THE ARRAY AND DISCARD THE PARTS THAT WE DO NOT WANT
foreach ($matches[0] as $pointer => $url_thing)
{
    if (strpos($url_thing, '&imgrefurl=') === FALSE)
    {
        // KEEP ENTRIES THAT END IN AN IMAGE EXTENSION
        // (preg_match WITH THE /i FLAG REPLACES THE eregi CALLS; eregi WAS REMOVED IN PHP 7)
        if (preg_match('/\.(gif|jpg|png)$/i', $url_thing)) continue;
    }
    unset ($matches[0][$pointer]);
}
//var_dump($matches[0]);
echo '<pre>';
print_r($matches[0]);
echo '</pre>';
Thanks for the points. Glad I could help, ~Ray
Best regards, ~Ray