Solved

PHP fputcsv() utf-8 magento error question...

Posted on 2014-03-25
17
2,024 Views
Last Modified: 2016-03-02
I have a php script that uses fputcsv().

This is an example:

$i = 0; // row counter
foreach ($out as &$cur)
{

    if ($i == 0) {
       $cvsHeadings = array_keys($cur);  
      
       fprintf($fpo, chr(0xEF).chr(0xBB).chr(0xBF)); // writes file header for correct encoding.

       fputcsv($fpo, $cvsHeadings); // Write column headings to the file
       fputcsv($fpo, $cur); // first row will get missed out if we don't output it here
    } else {      

        // now we have the headings in just add each row
        fputcsv($fpo, $cur); // Write information to the file
    }

    $i++;
}

Open in new window


The CSV the code produces is used to import products on a magento website.

When we do the import about 30 out of 1900 product rows have this import error:
Skipping import row, required field "sku" is not defined.

after researching this error i found this page to be useful:
http://www.magentocommerce.com/boards%20/viewthread/220376/#t302277

One that webpage it says that the error is due to not having the correct UTF-8 encoding.

Now my real question is whats going on here as i thought i was forcing UTF-8 encoding with this line in my code:
fprintf($fpo, chr(0xEF).chr(0xBB).chr(0xBF)); // writes file header for correct encoding.

Does any one have any ideas?

Thanks.
0
Comment
Question by:helpchrisplz
  • 9
  • 8
17 Comments
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 39953365
Are you trying to write a Byte-Order mark?  It's not needed and often causes problems.

Can you show us a little of the input data that is causing the issue?

See also, for some background on UTF-8:
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/A_11880-Unicode-PHP-and-Character-Collisions.html
0
 
LVL 1

Author Comment

by:helpchrisplz
ID: 39953503
The error doesn't give the row that the error is found on.
The CSV is 1936 rows and it only errors on 30 of them. I would have to post all 1936 rows. Do you want that?

i can give you a link to the script with an print_r(); if that will help?

I think i just need to force all the text in the CSV to use UTF-8 so that the importer can interpret the row correctly.

unless there is another reason why it would error? not sure.
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 39953534
UTF-8 is self-evident.  If you want to give me a download link for the big CSV file, I will try to locate the UFT-8 errors with software and we can look at them.  Or you might try looking in the UTF-8 article for Detecting JSON Errors because I think that technology can be applied to finding the errors in the CSV.
0
 
LVL 1

Author Comment

by:helpchrisplz
ID: 39953573
here is a link to the download:

http://christophersowerby.com/csv/csvForceUTF-8.php

it will take a bit of time loading...
thanks.
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 39953617
Should I be looking at christophersowerby.com/csv/fixedSERVER.CSV to find the bad characters?
0
 
LVL 1

Author Comment

by:helpchrisplz
ID: 39953622
Hi yes that's the one that goes up to Magento.
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 39953689
Working on it now.  Please remove the BOM bytes from the front of the output document.  You don't want that.  UTF-8 is self-evident.  The byte-order mark is worse than useless unless the code at Magento cannot function without it (their documentation will tell you).

Please post back when I can get a copy of the CSV without the BOM, thanks.
0
 
LVL 1

Author Comment

by:helpchrisplz
ID: 39953743
ok i just commented out this line and reran the script so that the csv has been overwritten.
 //fprintf($fpo, chr(0xEF).chr(0xBB).chr(0xBF));

please re-download: christophersowerby.com/csv/fixedSERVER.CSV
0
How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

 
LVL 108

Expert Comment

by:Ray Paseur
ID: 39953773
I have to leave for a while now, and may be able to look at this again later.  The script doesn't quite do all I want it to do yet, but at least it identifies the locations of the bad UTF-8 characters.  I'll post back with more later.
http://www.iconoun.com/demo/temp_helpchrisplz.php

Scroll down to line 749 to see an example of what is being detected.
0
 
LVL 1

Author Comment

by:helpchrisplz
ID: 39953809
oh i see: BYTE:    618, CHR: na ORD: 133, HEX: 85, BIN: 10000101

and it has question marks in the above row.
so that means we just need to update them rows to fix it.

When you say it doesn't quite do all I want it to do yet. What are you planning?

Is it possible to get a code example of how you managed to output this?
This will help me debug this if the client adds further bad characters in future.

Thank you.
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 39954226
More to follow.  I cannot find a browser that can switch back and forth between ISO-8859-1 and UTF8 displays.  They seem to be able to display either one or the other, but not both in the same request.  So I can either show you the broken UTF-8 character or I can show you the original character.  Need a bit more experimentation, I think.

I'll post a code sample after I get it refined a bit more.
0
 
LVL 1

Author Comment

by:helpchrisplz
ID: 39954274
ok thanks
0
 
LVL 108

Accepted Solution

by:
Ray Paseur earned 500 total points
ID: 39954419
This is not "demonstration quality" code, but it gets the job done.  A few notes...

Line 14-15: Choose your output encoding.
Line 26-31: Try to fix any UTF-8 errors.
Line 34-46: See if the fix worked.
Line 49-end: Row-by-row and byte-by-byte verification of the encoding.

Line 45 should be changed to write the converted file back to your server so it can be used in your application.

http://www.iconoun.com/demo/temp_helpchrisplz.php

Please let me know if you still have questions after reading the article on character set collisions.

<?php // demo/temp_helpchrisplz.php
error_reporting(E_ALL);


// SEE http://www.experts-exchange.com/Web_Development/Internet_Marketing/Search_Engine_Optimization_SEO/Q_28396768.html
// REF http://php.net/manual/en/function.utf8-encode.php
// REF http://php.net/manual/en/function.ord.php
// REF http://php.net/manual/en/function.decbin.php
// REF http://www.asciitable.com/
// REF http://en.wikipedia.org/wiki/UTF-8


// TRY THE SCRIPT BOTH WITH AND WITHOUT THIS STATEMENT (FIRST PREVAILS)
echo '<meta charset="utf8" />';        // GARBLES NON-UTF-8
echo '<meta charset="iso-8859-1" />';  // GARBLES UTF-8

// PREFORMAT FOR EASY VISUALIZATION
echo '<pre>';


// THE TEST DATA MAY CONTAIN INVALID UTF-8 CHARACTERS
$url = 'http://christophersowerby.com/csv/fixedSERVER.CSV';
$txt = file($url, FILE_IGNORE_NEW_LINES);

// TRY TO FIX THE NON-UTF-8 CHARACTERS
foreach ($txt as $row => $str)
{
    // TRY TO FIX THE NON-UTF-8 CHARACTERS
    $str = utf8_encode($str);
    $txt[$row] = $str;
}

// TEST THE RESULTING DATA SET
$sig = array();
mb_internal_encoding("UTF-8");
foreach ($txt as $row => $str)
{
    // VERIFY UTF-8 CHARACTERS
    if (!mb_check_encoding($str)) $sig[$row] = $str;
}
if (!empty($sig)) var_dump($sig);
if ( empty($sig))
{
    echo "UTF-8 CONVERSION SUCCESSFUL" . PHP_EOL;
    // FILE PUT CONTENTS
}


// LOOK FOR UTF-8 CHARACTERS OR FAILURES
ob_start();
$sig = array();

// WITH EACH ROW OF THE CSV FILE
foreach ($txt as $row => $str)
{
    echo PHP_EOL . "ROW: " .str_pad($row, 4, ' ', STR_PAD_LEFT);
    echo ' ' . $str;

    // SKIP VALID UTF-8 STRINGS
    if (mb_check_encoding($str)) continue;

	// LOOK AT THE TEST STRING BYTE-BY-BYTE (_NOT_ THE SAME AS CHARACTER-BY-CHARACTER)
	$arr = str_split($str);
	foreach ($arr as $ptr => $chr)
	{
	    $ord = ord($chr);
	    $err = FALSE;

	    // PRINTABLE INFO FOR THE BYTE VALUES
	    $hex  = strtoupper(dechex($ord));
	    $bin  = decbin($ord);
	    $ptrp = str_pad($ptr, 3, ' ', STR_PAD_LEFT);
	    $ordp = str_pad($ord, 3, ' ', STR_PAD_LEFT);

	    // FLAG THIS CHARACTER IF THE BYTE-CODE IS GT 127
	    if ($ord > 127)
	    {
	        // echo PHP_EOL . " <b>BYTE: $ptrp</a>, CHR: $chr ORD: $ordp, HEX: $hex, BIN: $bin</b>";

	        // GET POINTERS TO THE NEXT CHARACTERS
	        $pp1 = $ptr + 1;
	        $pp2 = $ptr + 2;
	        $pp3 = $ptr + 3;

	        // IF A FOUR-BYTE UTF-8 CHARACTER, NEXT 3 BYTES MUST START WITH '10'
	        $sub = substr($bin, 0, 5);
	        if ($sub == '11110')
	        {
	            $chs = array();
	            $chs[$pp1] = $arr[$pp1];
	            $chs[$pp2] = $arr[$pp2];
	            $chs[$pp3] = $arr[$pp3];
	            foreach ($chs as $ppp => $nxt)
	            {
	                $cod = decbin(ord($nxt));
	                $cod = str_pad($cod, 8, '0', STR_PAD_LEFT);
	                $utf = substr($cod,0,2);
	                if ($utf !== '10')
	                {
	                    echo ", ERROR IN BYTE $ppp: $cod";
	                    $err = TRUE;
	                }
	            }
	        }

	        // IF A THREE-BYTE UTF-8 CHARACTER, NEXT 2 BYTES MUST START WITH '10'
	        $sub = substr($bin, 0, 4);
	        if ($sub == '1110')
	        {
	            $chs = array();
	            $chs[$pp1] = $arr[$pp1];
	            $chs[$pp2] = $arr[$pp2];
	            foreach ($chs as $ppp => $nxt)
	            {
	                $cod = decbin(ord($nxt));
	                $cod = str_pad($cod, 8, '0', STR_PAD_LEFT);
	                $utf = substr($cod,0,2);
	                if ($utf !== '10')
	                {
	                    echo ", ERROR IN BYTE $ppp: $cod";
	                    $err = TRUE;
	                }
	            }
	        }

	        // IF A TWO BYTE UTF-8 CHARACTER, NEXT 1 BYTE MUST START WITH '10'
	        $sub = substr($bin, 0, 3);
	        if ($sub == '110')
	        {
	            $chs = array();
	            $chs[$pp1] = $arr[$pp1];
	            foreach ($chs as $ppp => $nxt)
	            {
	                $cod = decbin(ord($nxt));
	                $cod = str_pad($cod, 8, '0', STR_PAD_LEFT);
	                $utf = substr($cod,0,2);
	                if ($utf !== '10')
	                {
	                    echo ", ERROR IN BYTE $ppp: $cod";
	                    $err = TRUE;
	                }
	            }
	        }
	    }
	    $sig[$row] = $row;
	}
}
$out = ob_get_clean();

// IF THERE WERE ANY CHARACTERS FLAGGED
if (!empty($sig))
{
    echo PHP_EOL . '<b>POSSIBLE UTF-8 ERRORS</b>';
    echo PHP_EOL;
    print_r($sig);
    echo PHP_EOL;
}

echo PHP_EOL . '<b>ENTIRE DATA SET FOLLOWS:</b>';
echo $out;

Open in new window

Best regards, ~Ray
0
 
LVL 1

Author Comment

by:helpchrisplz
ID: 39958901
Thanks ray.

i've implemented it like this:

// WITH EACH ROW OF THE CSV FILE
foreach ($txt as $row => $str)
{
   
///// i have commented out your formating
// echo PHP_EOL . "ROW: " .str_pad($row, 4, ' ', STR_PAD_LEFT);
   // echo ' ' . $str;

//// make a new line
    echo $str."\n";

Open in new window



then i put your $str back to the file.

$url = 'fixedSERVER.CSV';
file_put_contents($url, $out);

Open in new window

0
 
LVL 1

Author Closing Comment

by:helpchrisplz
ID: 39958904
thanks
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 39958931
Thanks for the points and thanks for using Experts-Exchange, ~Ray
0
 
LVL 1

Author Comment

by:helpchrisplz
ID: 39958946
i would give your more points if i could !

I'm sure there will be more questions down the line.... :) haha
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

A step by step SEO guide to creating content that drives traffic and maximizes page views by using the right tricks, tools and keyword analysis. You are the subject matter expert.   You have forgotten more about your area of expertise than most …
A/B testing is a simple and effective trick to get to know your audience, increase website conversions and make the most out of your online ad campaigns. It's widely available and doesn't need much tech knowledge to be executed, but the results it y…
The viewer will learn how to create and use a small PHP class to apply a watermark to an image. This video shows the viewer the setup for the PHP watermark as well as important coding language. Continue to Part 2 to learn the core code used in creat…
This Micro Tutorial will demonstrate how to add subdomains to your content reports. This can be very importing in having a site with multiple subdomains.

707 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

20 Experts available now in Live!

Get 1:1 Help Now