Solved

PHP cURL: getting bad characters

Posted on 2010-11-16
Last Modified: 2012-05-10
I am trying to gather data from a French website using cURL.

I am getting:
Complément téléobjectif

but the site says:
Complément téléobjectif


Question by:rgb192
LVL 7

Expert Comment

by:printnix63
ID: 34151072
You will have to URL-encode them. These are double-byte characters, and URL/curl will only handle plain ASCII. Or, if it is a POST request, send the data as binary.

From the curl Manual:
-d/--data <data>
(HTTP) Sends the specified data in a POST request to the HTTP server, in the same way that a
browser does when a user has filled in an HTML form and presses the submit button. This will
cause curl to pass the data to the server using the content-type application/x-www-form-urlencoded.
Compare to -F/--form.
-d/--data is the same as --data-ascii. To post data purely binary, you should instead use the --data-binary
option. To URL-encode the value of a form field you may use --data-urlencode.
If any of these options is used more than once on the same command line, the data pieces specified
will be merged together with a separating &-symbol. Thus, using '-d name=daniel -d skill=lousy'
would generate a post chunk that looks like 'name=daniel&skill=lousy'.
If you start the data with the letter @, the rest should be a file name to read the data from, or - if
you want curl to read the data from stdin. The contents of the file must already be URL-encoded.
Multiple files can also be specified. Posting data from a file named 'foobar' would thus be done
with --data @foobar.
--data-binary <data>
(HTTP) This posts data exactly as specified with no extra processing whatsoever.
If you start the data with the letter @, the rest should be a filename. Data is posted in a similar
manner as --data-ascii does, except that newlines are preserved and conversions are never done.
If this option is used several times, the ones following the first will append data as described in
-d/--data.
--data-urlencode <data>
(HTTP) This posts data, similar to the other --data options with the exception that this performs
URL-encoding. (Added in 7.18.0)

You may have to check how this is implemented on the PHP side.
The character encoding is definitely not what curl expected.
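
On the PHP side, the URL-encoding advice above could look like the sketch below. It assumes the accented text is being sent to the server as a query parameter; the parameter name `q` and the sample value are illustrative only and do not come from the question. Note that this only affects data you send. It does not change how a response body is decoded.

```php
<?php
// Sample value, written with explicit UTF-8 byte escapes so the result
// does not depend on the encoding of this source file.
$term = "t\xC3\xA9l\xC3\xA9objectif"; // "téléobjectif" in UTF-8

// rawurlencode() percent-encodes every byte outside the unreserved ASCII set.
$encoded = rawurlencode($term);

// http_build_query() URL-encodes each value of the array.
// The key "q" is a hypothetical parameter name.
$query = http_build_query(array('q' => $term));

echo $encoded, PHP_EOL;
echo $query, PHP_EOL;
```

The resulting string can be appended to a URL after a `?` or passed to CURLOPT_POSTFIELDS for a form-style POST.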

Author Comment

by:rgb192
ID: 34151455

So what should I change this line to?
header("Content-type: text/plain");
LVL 109

Expert Comment

by:Ray Paseur
ID: 34155220
What is the URL of the web site?  I may be able to show you how to retrieve the letters correctly.  I do not think that the accented characters are double-byte.
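
A quick way to check the byte-width point, and to see how output like that in the question can arise, is sketched below. It assumes the iconv extension is available (it is enabled by default in most PHP builds). It shows that é is one byte in ISO-8859-1 and two bytes in UTF-8, and that decoding UTF-8 bytes as if they were ISO-8859-1 produces the same "Ã©" pattern the asker reported. Whether that is exactly what happened in the asker's script cannot be told from the thread.

```php
<?php
// "Complément" as UTF-8 bytes (e-acute is the two bytes C3 A9).
$utf8 = "Compl\xC3\xA9ment";

// The same word as ISO-8859-1 (e-acute is the single byte E9).
$latin1 = iconv('UTF-8', 'ISO-8859-1', $utf8);

echo strlen($latin1), PHP_EOL; // byte length in ISO-8859-1
echo strlen($utf8), PHP_EOL;   // byte length in UTF-8

// Deliberately misread the UTF-8 bytes as ISO-8859-1 and re-encode them.
// Each of the two bytes of e-acute becomes its own character.
$garbled = iconv('ISO-8859-1', 'UTF-8', $utf8);
echo $garbled, PHP_EOL;

// Reversing the wrong conversion recovers the original bytes.
$repaired = iconv('UTF-8', 'ISO-8859-1', $garbled);
var_dump($repaired === $utf8);
```

If a script shows this pattern, the usual cause is a mismatch between the encoding of the bytes and the charset the viewer was told to expect, or an extra conversion applied to text that was already UTF-8.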

Author Comment

by:rgb192
ID: 34157061
LVL 109

Expert Comment

by:Ray Paseur
ID: 34158462
CURL retrieves the data correctly, and I find things like Périphériques et Stockage in the text.  

Are you telling the browser that this is UTF-8?  Because Amazon is using this character set:
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1" />

<?php // RAY_temp_rgb192.php
error_reporting(E_ALL);

// A FUNCTION TO RUN A CURL-GET CLIENT CALL TO A FOREIGN SERVER
function my_curl($url, $timeout=2, $error_report=FALSE)
{
    $curl = curl_init();

    // HEADERS AND OPTIONS APPEAR TO BE A FIREFOX BROWSER REFERRED BY GOOGLE
    $header = array(); // INITIALIZE THE ARRAY BEFORE APPENDING TO IT
    $header[] = "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: "; // BROWSERS USUALLY LEAVE BLANK

    // SET THE CURL OPTIONS - SEE http://php.net/manual/en/function.curl-setopt.php
    curl_setopt( $curl, CURLOPT_URL,            $url  );
    curl_setopt( $curl, CURLOPT_USERAGENT,      'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'  );
    curl_setopt( $curl, CURLOPT_HTTPHEADER,     $header  );
    curl_setopt( $curl, CURLOPT_REFERER,        'http://www.google.com'  );
    curl_setopt( $curl, CURLOPT_ENCODING,       'gzip,deflate'  );
    curl_setopt( $curl, CURLOPT_AUTOREFERER,    TRUE  );
    curl_setopt( $curl, CURLOPT_RETURNTRANSFER, TRUE  );
    curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, TRUE  );
    curl_setopt( $curl, CURLOPT_TIMEOUT,        $timeout  );

    // RUN THE CURL REQUEST AND GET THE RESULTS
    $htm = curl_exec($curl);

    // ON FAILURE HANDLE ERROR MESSAGE
    if ($htm === FALSE)
    {
        if ($error_report)
        {
            $err = curl_errno($curl);
            $inf = curl_getinfo($curl);
            echo "CURL FAIL: $url TIMEOUT=$timeout, CURL_ERRNO=$err";
            var_dump($inf);
        }
        curl_close($curl);
        return FALSE;
    }

    // ON SUCCESS RETURN XML / HTML STRING
    curl_close($curl);
    return $htm;
}




// USAGE EXAMPLE - PUT YOUR FAVORITE URL HERE
$url = "http://www.amazon.fr/gp/product/B000V9D5LG";
$htm = my_curl($url, 5, TRUE);
if (!$htm) die("NO $url");


// SHOW WHAT WE GOT
echo "<pre>";
$htm = preg_replace('/ +/', ' ', $htm);
$htm = preg_replace('/\n+/', PHP_EOL, $htm);
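// NAME THE ENCODING EXPLICITLY: THE PAGE IS ISO-8859-1, BUT PHP 5.4+ DEFAULTS TO UTF-8
echo htmlentities($htm, ENT_QUOTES, 'ISO-8859-1');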
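
If the fetched page needs to be shown on a UTF-8 page or stored as UTF-8, one option is to read the charset the document declares and convert the string. The sketch below is an illustration under stated assumptions, not part of the original script: the helper names `detect_meta_charset()` and `to_utf8()` are hypothetical, the regular expression only handles the common `charset=...` form inside a meta tag, and the iconv extension is assumed to be available. It ignores the HTTP Content-Type response header, which a fuller version would check first.

```php
<?php
// HYPOTHETICAL HELPER: find a charset declared in the HTML, e.g.
// <meta http-equiv="content-type" content="text/html; charset=iso-8859-1" />
function detect_meta_charset($html, $default = 'ISO-8859-1')
{
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_\-]+)/i', $html, $m)) {
        return strtoupper($m[1]);
    }
    return $default; // NOTHING DECLARED: FALL BACK TO THE ASSUMED DEFAULT
}

// HYPOTHETICAL HELPER: convert the page to UTF-8 based on the declared charset
function to_utf8($html)
{
    $charset = detect_meta_charset($html);
    if ($charset === 'UTF-8') {
        return $html; // ALREADY UTF-8, NOTHING TO DO
    }
    // @ SUPPRESSES THE WARNING RAISED FOR AN UNKNOWN CHARSET NAME
    $converted = @iconv($charset, 'UTF-8', $html);
    return ($converted === false) ? $html : $converted;
}

// SAMPLE INPUT: THE META TAG FROM THE PAGE PLUS ISO-8859-1 BYTES (E9 = e-acute)
$sample = '<meta http-equiv="content-type" content="text/html; charset=iso-8859-1" />'
        . "P\xE9riph\xE9riques et Stockage";

echo to_utf8($sample), PHP_EOL;
```

With the string returned by `my_curl()`, the call would be `to_utf8($htm)`, and the result would then be passed to `htmlentities()` with 'UTF-8' as the encoding.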


Author Comment

by:rgb192
ID: 34176047
so if I change this line in my script from
header("Content-type: text/plain");

to

header("Content-type: charset=iso-8859-1");


it may work
LVL 109

Accepted Solution

by:
Ray Paseur earned 500 total points
ID: 34176627
It's easy enough to try.  I never send the Content-type header - I let my server figure it out for me.
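
For anyone who does want to send the header explicitly: in HTTP the charset is a parameter of the media type, so the header value needs both parts (for example `text/plain; charset=ISO-8859-1`) rather than a charset on its own. A minimal sketch follows. The helper name `content_type_header()` is hypothetical, and the charset shown assumes the bytes being output really are ISO-8859-1, as the meta tag on the Amazon page suggests.

```php
<?php
// HYPOTHETICAL HELPER: build a Content-Type header line with a charset parameter
function content_type_header($mediaType, $charset)
{
    return 'Content-Type: ' . $mediaType . '; charset=' . $charset;
}

$line = content_type_header('text/plain', 'ISO-8859-1');

// HEADERS CAN ONLY BE SENT BEFORE ANY OUTPUT HAS BEEN WRITTEN
if (!headers_sent()) {
    header($line);
}
```

If the script converts the fetched text to UTF-8 before printing it, the charset parameter would need to be `UTF-8` instead, so that the header matches the bytes actually sent.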

Author Closing Comment

by:rgb192
ID: 34220314
thanks