Problem getting page with CURL and PHP

Thread7 asked
Last Modified: 2013-12-13
I had written some PHP code to periodically scrape a URL, and it was working fine. Then the site must have changed something, and now it doesn't work. The page loads fine in Firefox, but I get a 400 Bad Request through CURL. It seems like I've tried every CURLOPT_* setting with no success. I'm thinking that if I can send exactly the same request headers as Firefox, I should be fine. But how do I do that?
CURL seems to add a few extra items without my telling it to.
Lately I've been setting my own header with pretty much the same items as Firefox like this:
$header = array(
    "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language: en-us,en;q=0.5",
    "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7",
    "Keep-Alive: 300",
    "Connection: keep-alive",
    "Cache-Control: max-age=0",
    "Accept-Encoding: gzip,deflate"
);
*** The working Firefox header is basically this:
Host: www.example.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv: Gecko/2009042316 Firefox/3.0.10 GTB7.0 (.NET CLR 3.5.30729)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: noscript=1; userid=1550521915; xsession=d9c73c024e99af04581a30521d3558ba; datrval=1276442132-05e4a9265e4ac217a93748a73720f4becd56decd0c7d576d04eb8
Cache-Control: max-age=0

There is a login that I run through CURL before requesting the page I want to scrape, and some of those cookies come from it. But I'm pretty confident that the user and session cookies are not the problem. When I look at the request header returned by curl_getinfo, I see a few differences and figure one of them is the problem.
*** The non-working CURL header I am sending is this:
POST /datadirectory/viewinfo.php HTTP/1.0
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv: Gecko/2009042316 Firefox/3.0.10 GTB7.0
(.NET CLR 3.5.30729)
Host: www.example.com
Cookie: xsession=d9c73c024e99af04581a30521d3558ba; userid=1550521915; noscript=1; datrval=1276442132-05e4a9265e4ac217a93748a73720f4becd56decd0c7d576d04eb8
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cache-Control: max-age=0
Accept-Encoding: gzip,deflate
Content-Length: 0
Content-Type: application/x-www-form-urlencoded
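Comparing the two header blocks by eye is error-prone, so a small helper can list the differences mechanically. The following is a minimal sketch, not part of the original code: `parseHeaderBlock` and `diffHeaders` are hypothetical names, and it assumes each header sits on a single line in `Name: value` form.

```php
<?php
/**
 * Parse a raw header block into [lowercased-name => value].
 * Lines that are not "Name: value" (such as the request line) are skipped.
 */
function parseHeaderBlock(string $raw): array
{
    $headers = [];
    foreach (preg_split('/\r\n|\n|\r/', $raw) as $line) {
        if (preg_match('/^([A-Za-z0-9-]+):\s*(.*)$/', trim($line), $m)) {
            $headers[strtolower($m[1])] = $m[2];
        }
    }
    return $headers;
}

/**
 * Report headers present on only one side, plus headers whose values differ.
 */
function diffHeaders(string $browserRaw, string $curlRaw): array
{
    $browser = parseHeaderBlock($browserRaw);
    $curl    = parseHeaderBlock($curlRaw);

    $changed = [];
    foreach (array_intersect_key($browser, $curl) as $name => $value) {
        if ($value !== $curl[$name]) {
            $changed[$name] = ['browser' => $value, 'curl' => $curl[$name]];
        }
    }

    return [
        'only_browser' => array_keys(array_diff_key($browser, $curl)),
        'only_curl'    => array_keys(array_diff_key($curl, $browser)),
        'changed'      => $changed,
    ];
}
```

Feeding it the Firefox block and the string returned by curl_getinfo would flag the extra Content-Length and Content-Type headers. Cookie values that differ only in order would show up under `changed`, so that entry needs a manual look.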

The differences that I think may be the cause are:
*** POST /datadirectory/viewinfo.php -- Huh? Why does CURL send this as a POST? The host plus this path is the URL I want: http://www.example.com/datadirectory/viewinfo.php

*** Content-Length: 0 -- Why am I sending it Content-Length: 0? I'd like to just leave this out since Firefox doesn't send it. But CURL is automatically adding it. Maybe that is saying the POST data length is 0?

*** Accept-Encoding: gzip,deflate. I set this manually in the CURLOPT_HTTPHEADER but if I leave it out I still have the problem.

*** I tried setting CURL to HTTP 1.0 and to HTTP 1.1; neither made a difference.
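One plausible reading of the POST line and the Content-Length: 0 header is that the handle still has CURLOPT_POST or CURLOPT_POSTFIELDS set, for example because it is reused from the login step. That is an assumption, not something the post confirms. The sketch below shows how a plain GET could be forced and how the outgoing request can be captured for inspection; `buildGetOptions` and `fetchPage` are hypothetical names, and the cookie file path is supplied by the caller.

```php
<?php
/**
 * Build cURL options for a plain GET request.
 * Assumption: the unexpected POST comes from CURLOPT_POST or
 * CURLOPT_POSTFIELDS being set earlier on the same handle.
 */
function buildGetOptions(string $url, array $headers, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_HTTPGET        => true,        // reset the request method to GET
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_HTTPHEADER     => $headers,    // custom headers, without Accept-Encoding
        CURLOPT_ENCODING       => '',          // let cURL send Accept-Encoding and decode the response
        CURLOPT_COOKIEFILE     => $cookieFile, // read cookies saved by the login request
        CURLOPT_COOKIEJAR      => $cookieFile, // write updated cookies back
        CURLINFO_HEADER_OUT    => true,        // record the request cURL actually sends
    ];
}

/**
 * Perform the request and return the status, the request as sent, and the body.
 */
function fetchPage(string $url, array $headers, string $cookieFile): array
{
    $ch = curl_init();
    curl_setopt_array($ch, buildGetOptions($url, $headers, $cookieFile));
    $body = curl_exec($ch);

    return [
        'status'  => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'request' => curl_getinfo($ch, CURLINFO_HEADER_OUT),
        'body'    => $body,
    ];
}
```

If the `request` entry then begins with GET and no longer carries Content-Length or Content-Type, the POST setting was the source of those lines. Whether that also clears the 400 response depends on the site.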

Any ideas??