1ns4nity
PHP file_get_contents() not reading entire page
Hi. I'm trying to write a web page parser that extracts weather data from a web site. The URL is: http://www.accuweather.com/adcbin/public/inthbh_local.asp? partner=accuweather&metric
The problem is that file_get_contents(URL) does not read the entire web page. I have also tried fopen(), and it still always gets only half of the page. Does anyone know why this is happening?
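One quick way to check whether PHP really receives only half of the page is to compare the number of bytes received with the Content-Length the server announces. This is a minimal sketch that assumes the server sends that header (it may not, for example with chunked transfer encoding); content_length() and fetch_and_report() are hypothetical helper names. If the two numbers match, PHP got the whole response, and the "missing" part is more likely something the page loads separately.

```php
<?php
// Sketch: fetch a URL with file_get_contents() and compare the number of
// bytes actually received with the Content-Length the server announced.
// content_length() and fetch_and_report() are hypothetical helper names.

/**
 * Return the Content-Length of the final response in a list of raw HTTP
 * header lines, or null if that response did not send one.
 */
function content_length(array $headers): ?int
{
    $length = null;
    foreach ($headers as $line) {
        if (stripos($line, 'HTTP/') === 0) {
            // A new status line starts a new response (e.g. after a redirect).
            $length = null;
        } elseif (preg_match('/^Content-Length:\s*(\d+)/i', $line, $m)) {
            $length = (int) $m[1];
        }
    }
    return $length;
}

function fetch_and_report(string $url): void
{
    $body = file_get_contents($url);
    if ($body === false) {
        echo "Request failed\n";
        return;
    }
    // The http:// wrapper fills $http_response_header in this local scope.
    $expected = isset($http_response_header) ? content_length($http_response_header) : null;
    printf(
        "Received %d bytes; server announced %s\n",
        strlen($body),
        $expected === null ? 'no length' : $expected . ' bytes'
    );
}

// Usage (needs allow_url_fopen and network access):
// fetch_and_report('http://www.example.com/');
```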
Hi
Are you sure you are not reading the whole page? The URL you posted gives me a permission denied error when it tries to load the forecast data: I can load the page, but the forecast does not display.
If you have to fetch weather that way, why not use MSN? That way you only fetch the raw data feed and build a template to display the results.
Download this example:
The download contains a simple example script that displays a weather forecast, plus you get the latest complete database of weather locations from around the world.
http://zip.ya-right.net/wdb1.rar
Here is what the example looks like:
http://zip.ya-right.net/weather.php
And an example of how I use it in my mail system, so users can look up the weather while reading their mail:
http://mail.ya-right.net/example.html
Fataqui
ASKER
How did you get a permission denied error? Was that when PHP tried to load the page, or when you viewed it in your browser? When viewed in a browser, the weather data comes out fine; in PHP, however, it somehow gets cut off. I already set the user-agent string so that PHP identifies itself as IE, but that didn't work.
The MSN weather feed you use seems to be a good source, but I've already successfully parsed AccuWeather's 5-day forecast. I just want to be able to get the hourly forecast too, and I'm wondering why PHP can't retrieve the data.
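For reference, this is one way to send a browser-like User-Agent with file_get_contents(): pass a stream context with the http wrapper's user_agent option (ini_set('user_agent', '...') sets it globally instead). This is a sketch; the header values are only examples and browser_context() is a hypothetical helper name.

```php
<?php
// Sketch: send a browser-like User-Agent and a few other request headers
// with file_get_contents() via a stream context. The header values are
// examples only; browser_context() is a hypothetical helper name.

/**
 * @return resource a stream context for the http:// wrapper
 */
function browser_context(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'method'     => 'GET',
            'user_agent' => $userAgent,
            'header'     => "Accept: text/html\r\nAccept-Language: en\r\n",
            'timeout'    => 10, // seconds
        ],
    ]);
}

$ctx = browser_context('Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)');

// Usage (needs allow_url_fopen and network access):
// $html = file_get_contents('http://www.example.com/', false, $ctx);
```

If the page still comes back incomplete with a browser-like User-Agent, the difference is probably something else the browser sends, such as cookies.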
Hi all,
$result = file_get_contents("http://www.accuweather.com/adcbin/public/inthbh_local.asp?%20partner=accuweather&metric=1&whend=1&whent=8");
echo $result;
This works perfectly for me.
What is your PHP version?
Valoodev
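To answer the version question, a script like the following prints the PHP version together with the php.ini settings that affect reading URLs. This is only a sketch; url_fetch_settings() is a hypothetical helper name, and the settings listed are the ones most relevant to this thread, not an exhaustive list.

```php
<?php
// Sketch: report the PHP version and the php.ini settings that affect reading
// URLs with file_get_contents()/fopen(). url_fetch_settings() is a
// hypothetical helper name.

function url_fetch_settings(): array
{
    return [
        'php_version'            => PHP_VERSION,
        // URL wrappers such as http:// only work when this is enabled.
        'allow_url_fopen'        => (bool) ini_get('allow_url_fopen'),
        // Default User-Agent for the http:// wrapper (empty string if unset).
        'user_agent'             => (string) ini_get('user_agent'),
        // Default timeout, in seconds, for socket-based streams.
        'default_socket_timeout' => (int) ini_get('default_socket_timeout'),
    ];
}

print_r(url_fetch_settings());
```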
ASKER
I'm using the latest version of PHP. Are you able to retrieve the hourly weather data as well?
The only ways you could read only part of the file are either that there is a \0 character somewhere in the file itself (maybe on purpose: I'm not sure the destination site wants you to 'pump' the data out), or that there is a maximum buffer size for the function you are using.
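Both causes can be checked. file_get_contents() is documented as binary-safe, so a NUL byte should not shorten the returned string itself, though it may still cut output short further down the line; and it only stops early on size if you pass its optional length argument. This sketch looks for NUL bytes in a downloaded page; first_nul_offset() and strip_nul_bytes() are hypothetical helper names.

```php
<?php
// Sketch: check a downloaded page for NUL bytes. first_nul_offset() and
// strip_nul_bytes() are hypothetical helper names.

// Position of the first "\0" byte, or null if there is none.
function first_nul_offset(string $body): ?int
{
    $pos = strpos($body, "\0");
    return $pos === false ? null : $pos;
}

// Copy of the page with every NUL byte removed.
function strip_nul_bytes(string $body): string
{
    return str_replace("\0", '', $body);
}

// Usage with a fetched page:
// $html = file_get_contents('http://www.example.com/');
// var_dump(strlen($html), first_nul_offset($html));
```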
Remember that loading the whole page at once is not the way a parser would normally work:
you should use fopen() and fgets() to read the lines one by one.
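A minimal sketch of the line-by-line approach, assuming you only want the lines that contain some marker string. The '<td' marker is a placeholder for whatever the forecast table actually uses, and lines_containing() is a hypothetical helper name; it works on any stream handle, remote or local.

```php
<?php
// Sketch: read a page line by line with fgets() instead of loading it all at
// once, keeping only lines that contain a marker string (keyed by line
// number). lines_containing() is a hypothetical helper name.

/**
 * @param resource $handle an open, readable stream
 */
function lines_containing($handle, string $marker): array
{
    $matches = [];
    $lineNo = 0;
    while (($line = fgets($handle)) !== false) {
        $lineNo++;
        if (strpos($line, $marker) !== false) {
            $matches[$lineNo] = rtrim($line, "\r\n");
        }
    }
    return $matches;
}

// Usage against a remote page (needs allow_url_fopen):
// $fp = fopen('http://www.example.com/', 'r');
// if ($fp !== false) {
//     print_r(lines_containing($fp, '<td'));
//     fclose($fp);
// }
```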
Another (simpler) possibility is that the page displays the forecasts through a frame, iframe, object... basically any tag with a src or location property. In that case, the URL in that tag is the one you need to pump the data from, not the original URL.
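To test the frame theory, you can list the URLs a page pulls in through such tags and fetch those instead. This sketch uses PHP's DOM extension and only looks at src/data attributes of frame, iframe, embed and object; that tag list is an assumption, and content loaded by JavaScript (for example via location) will not show up. embedded_sources() is a hypothetical helper name.

```php
<?php
// Sketch: list the URLs a page pulls in through frame, iframe, embed or
// object tags. The tag/attribute list is an assumption and may need
// extending. embedded_sources() is a hypothetical helper name.

function embedded_sources(string $html): array
{
    if ($html === '') {
        return [];
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep libxml parse warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    $attrs = ['frame' => 'src', 'iframe' => 'src', 'embed' => 'src', 'object' => 'data'];
    foreach ($attrs as $tag => $attr) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            $value = $node->getAttribute($attr);
            if ($value !== '') {
                $urls[] = $value;
            }
        }
    }
    return $urls;
}

// Usage:
// print_r(embedded_sources(file_get_contents('http://www.example.com/')));
```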
By the way, the site does not display properly for me in IE6, NN7 or Firefox.
This may be because I block ads at the firewall level and some scripts on the page depend on them, which is nonsense.
The URL you gave does not specify a city, so the site will not display any forecast anyway.
It might work using the location information stored in your browser when you try it, but definitely not from PHP.
The city, even when you browse and specify one, is not stored in the URL.
Chances are very high that they store it on purpose, either directly on the server, in cookies, or in a session.
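A browser stores the cookies a server sets and sends them back automatically; file_get_contents() does not. This sketch collects the name/value pairs from the Set-Cookie lines of a response (for example the $http_response_header array); it ignores path, domain and expires, so it is not a full cookie jar. cookies_from_headers() is a hypothetical helper name.

```php
<?php
// Sketch: collect name => value pairs from the Set-Cookie lines of an HTTP
// response. Attributes such as path, domain and expires are ignored.
// cookies_from_headers() is a hypothetical helper name.

function cookies_from_headers(array $headers): array
{
    $cookies = [];
    foreach ($headers as $line) {
        if (stripos($line, 'Set-Cookie:') !== 0) {
            continue;
        }
        // Keep only the leading name=value pair, dropping "; path=/" etc.
        $rest = substr($line, strlen('Set-Cookie:'));
        $pair = trim(explode(';', $rest, 2)[0]);
        $eq = strpos($pair, '=');
        if ($eq === false) {
            continue;
        }
        $cookies[trim(substr($pair, 0, $eq))] = trim(substr($pair, $eq + 1));
    }
    return $cookies;
}

// Usage after a fetch:
// $html = file_get_contents('http://www.example.com/');
// print_r(cookies_from_headers($http_response_header));
```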
COOKIES
---loading---
adc1="|||||"
adc2="||||"
adc3="|||"
partner="accuweather"
adc6="4|"
sesstime="1082627973437"
adc8="4|300730"
adc9="34|1|300730"
---choose city : show form---
ASPSESSIONIDASABSQCQ="PANBFMKDLOKLBLBMKAGCOLFB"
adc9="34|1|300730|38|1|300730"
---validation---
adc5="LFPB|EU|FR|PARIS|48.97|2.45|1"
adc9="34|1|300730|38|1|300730|0"
Look at the adc5 cookie...
If you need more help, I need evidence that they agree.
In that case, they will probably let you either hotlink their site or use their database.
A note on cookies: Opera is the browser that gives the best real-time information on cookies, and its cookie manager has some very useful features, such as "accept but delete when closing Opera", "accept but discard changes" and "accept for this server only".
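The adc5 value in the cookie log appears to be a pipe-separated record describing the chosen location (it starts with LFPB|EU|FR|PARIS). This sketch splits such a value into fields; the field names are guesses from that single sample, not anything the site documents, and parse_adc5() is a hypothetical helper name.

```php
<?php
// Sketch: split a pipe-separated cookie value such as adc5 into named fields.
// The names are GUESSES from a single sample value; extra fields get generic
// names (field7, field8, ...). parse_adc5() is a hypothetical helper name.

function parse_adc5(string $value): array
{
    $guessedNames = ['station', 'region', 'country', 'city', 'lat', 'lon', 'flag'];
    $fields = [];
    foreach (explode('|', $value) as $i => $part) {
        $name = isset($guessedNames[$i]) ? $guessedNames[$i] : 'field' . $i;
        $fields[$name] = $part;
    }
    return $fields;
}

print_r(parse_adc5('LFPB|EU|FR|PARIS|48.97|2.45|1'));
```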
ASKER
skullnobrains: I think you found the problem. I never noticed that the city data was not stored in the URL.
I guess the site works like this: a cookie is stored in your browser when you reach the page with the forecast for the next 7 days. On that page, the city/country is present in the URL. When you then click to view the hourly forecast, the site remembers which city/country you are viewing through that cookie. I guess there is no way I can get PHP to overcome this, right?
You can, if you set the cookie yourself, with the server set to their server.
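When PHP itself makes the request, no browser cookie store is involved: "setting the cookie yourself" amounts to sending it to their server as a Cookie request header. A minimal sketch, assuming adc5 is the cookie that matters (the site may also expect the session cookie or others from the log above); the cookie value and URL are placeholders, and context_with_cookies() is a hypothetical helper name.

```php
<?php
// Sketch: send cookies from PHP as a Cookie request header. The adc5 value
// below is a PLACEHOLDER, not a real location string, and
// context_with_cookies() is a hypothetical helper name.

/**
 * @param array<string,string> $cookies name => value pairs
 * @return resource stream context for the http:// wrapper
 */
function context_with_cookies(array $cookies)
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return stream_context_create([
        'http' => [
            'method' => 'GET',
            'header' => 'Cookie: ' . implode('; ', $pairs) . "\r\n",
        ],
    ]);
}

$ctx = context_with_cookies(['adc5' => 'PLACEHOLDER|VALUE']);

// Usage (the URL is a placeholder for the hourly-forecast page):
// $html = file_get_contents('http://www.example.com/hourly', false, $ctx);
```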
As I said before:
<< Look at the adc5 cookie...
If you need more help, I need evidence that they agree.
In that case, they will probably let you either hotlink their site or use their database.
>>
ASKER
Oh, so I can spoof a cookie? Interesting... I doubt they would even reply to such a request. I'm just doing this for personal convenience.
Set-Cookie: <name>=<value>[; <name>=<value>]...
[; expires=<date>][; domain=<domain_name>]
[; path=<some_path>][; secure][; httponly]
That is the RFC syntax of the Set-Cookie header a server sends. When PHP makes the request itself, you send the value back to their server as a plain Cookie: <name>=<value> request header.
You don't spoof or steal anything, you just set the cookie the regular way. (I'm no black-hat ;)
For personal use, the simplest option is to set the cookie manually in your browser (i.e. go to their site once).
For clients, you must set the cookie before you call file_get_contents() or an equivalent function.
The source of the page where you choose cities on their site probably contains the complete list of supported cities.
Again, you MUST
- have their agreement, as hotlinking is costly in bandwidth (and may lead to prosecution in some countries)
- leave their banners and site name visible on yours.
By the way, I'd be eager to see the working result if you can post a link sometime.
PS: if you have a hard time, use Opera for debugging (real-time information on cookies, plus options to reject or restrict them).
ASKER
Actually, that doesn't really make sense. If the cookie is only stored by the browser, how is the weather server supposed to retrieve it when it is PHP, not the browser, that is requesting the page?
Either you don't request the page from PHP at all but simply include it in an HTML element;
in that case you can work out some JavaScript to remove the unnecessary code.
Or you can try a few options for retrieving the page from PHP,
feeding it either the cookie value itself or $_COOKIE['adc5'].
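A sketch of the $_COOKIE option, which also addresses the question above: a browser only sends your server the cookies stored for your own domain, so the weather site's adc5 cookie never reaches your script. It must first have been stored under your domain (for example by your own setcookie() call), and then your script forwards it upstream. upstream_cookie_header() is a hypothetical helper name, and the URL is a placeholder.

```php
<?php
// Sketch: forward cookies your own visitor sent you (from $_COOKIE) to the
// upstream request. Only cookies stored for YOUR domain arrive in $_COOKIE.
// upstream_cookie_header() is a hypothetical helper name.

function upstream_cookie_header(array $incoming, array $names): ?string
{
    $pairs = [];
    foreach ($names as $name) {
        // $_COOKIE entries can be arrays (name[]=...), so accept strings only.
        if (isset($incoming[$name]) && is_string($incoming[$name])) {
            $pairs[] = $name . '=' . $incoming[$name];
        }
    }
    return $pairs ? 'Cookie: ' . implode('; ', $pairs) : null;
}

// In a request handler:
// $header = upstream_cookie_header($_COOKIE, ['adc5']);
// if ($header !== null) {
//     $ctx = stream_context_create(['http' => ['header' => $header . "\r\n"]]);
//     $html = file_get_contents('http://www.example.com/hourly', false, $ctx);
// }
```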
<< Oh, so I can spoof a cookie? Interesting... I doubt they would even reply to such a request. I'm just doing this for personal convenience. >>
If they agree, they'll let you have a look at their code, and it will be much easier to work it out.
I really believe that you are doing it for personal convenience, actually.
I'm beginning to be ashamed of explaining such things.
This is my last post on the thread, unless I learn more about the whereabouts and you provide a link.
ASKER CERTIFIED SOLUTION
ASKER
Thanks for the help, skullnobrains. I'll try it out when I have time. Nice to learn something new about PHP.
<?php
// Get a file into an array. In this example we'll go through HTTP to get
// the HTML source of a URL.
$lines = file('http://www.example.com/');
// Loop through our array, show HTML source as HTML source; and line numbers too.
foreach ($lines as $line_num => $line) {
echo "Line #<b>{$line_num}</b> : " . htmlspecialchars($line) . "<br />\n";
}
// Another example, let's get a web page into a string. See also file_get_contents().
$html = implode('', file('http://www.example.com/'));
?>