[Webinar] Streamline your web hosting managementRegister Today

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 2783
  • Last Modified:

PHP fputcsv() utf-8 magento error question...

I have a php script that uses fputcsv().

This is an example:

$i = 0; // row counter
foreach ($out as &$cur)
{

    if ($i == 0) {
       $cvsHeadings = array_keys($cur);  
      
       fprintf($fpo, chr(0xEF).chr(0xBB).chr(0xBF)); // writes file header for correct encoding.

       fputcsv($fpo, $cvsHeadings); // Write column headings to the file
       fputcsv($fpo, $cur); // first row will get missed out if we don't output it here
    } else {      

        // now we have the headings in just add each row
        fputcsv($fpo, $cur); // Write information to the file
    }

    $i++;
}

Open in new window


The CSV the code produces is used to import products on a magento website.

When we do the import about 30 out of 1900 product rows have this import error:
Skipping import row, required field "sku" is not defined.

after researching this error i found this page to be useful:
http://www.magentocommerce.com/boards%20/viewthread/220376/#t302277

One that webpage it says that the error is due to not having the correct UTF-8 encoding.

Now my real question is whats going on here as i thought i was forcing UTF-8 encoding with this line in my code:
fprintf($fpo, chr(0xEF).chr(0xBB).chr(0xBF)); // writes file header for correct encoding.

Does any one have any ideas?

Thanks.
0
helpchrisplz
Asked:
helpchrisplz
  • 9
  • 8
1 Solution
 
Ray PaseurCommented:
Are you trying to write a Byte-Order mark?  It's not needed and often causes problems.

Can you show us a little of the input data that is causing the issue?

See also, for some background on UTF-8:
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/A_11880-Unicode-PHP-and-Character-Collisions.html
0
 
helpchrisplzAuthor Commented:
The error doesn't give the row that the error is found on.
The CSV is 1936 rows and it only errors on 30 of them. I would have to post all 1936 rows. Do you want that?

i can give you a link to the script with an print_r(); if that will help?

I think i just need to force all the text in the CSV to use UTF-8 so that the importer can interpret the row correctly.

unless there is another reason why it would error? not sure.
0
 
Ray PaseurCommented:
UTF-8 is self-evident.  If you want to give me a download link for the big CSV file, I will try to locate the UFT-8 errors with software and we can look at them.  Or you might try looking in the UTF-8 article for Detecting JSON Errors because I think that technology can be applied to finding the errors in the CSV.
0
The 14th Annual Expert Award Winners

The results are in! Meet the top members of our 2017 Expert Awards. Congratulations to all who qualified!

 
helpchrisplzAuthor Commented:
here is a link to the download:

http://christophersowerby.com/csv/csvForceUTF-8.php

it will take a bit of time loading...
thanks.
0
 
Ray PaseurCommented:
Should I be looking at christophersowerby.com/csv/fixedSERVER.CSV to find the bad characters?
0
 
helpchrisplzAuthor Commented:
Hi yes that's the one that goes up to Magento.
0
 
Ray PaseurCommented:
Working on it now.  Please remove the BOM bytes from the front of the output document.  You don't want that.  UTF-8 is self-evident.  The byte-order mark is worse than useless unless the code at Magento cannot function without it (their documentation will tell you).

Please post back when I can get a copy of the CSV without the BOM, thanks.
0
 
helpchrisplzAuthor Commented:
ok i just commented out this line and reran the script so that the csv has been overwritten.
 //fprintf($fpo, chr(0xEF).chr(0xBB).chr(0xBF));

please re-download: christophersowerby.com/csv/fixedSERVER.CSV
0
 
Ray PaseurCommented:
I have to leave for a while now, and may be able to look at this again later.  The script doesn't quite do all I want it to do yet, but at least it identifies the locations of the bad UTF-8 characters.  I'll post back with more later.
http://www.iconoun.com/demo/temp_helpchrisplz.php

Scroll down to line 749 to see an example of what is being detected.
0
 
helpchrisplzAuthor Commented:
oh i see: BYTE:    618, CHR: na ORD: 133, HEX: 85, BIN: 10000101

and it has question marks in the above row.
so that means we just need to update them rows to fix it.

When you say it doesn't quite do all I want it to do yet. What are you planning?

Is it possible to get a code example of how you managed to output this?
This will help me debug this if the client adds further bad characters in future.

Thank you.
0
 
Ray PaseurCommented:
More to follow.  I cannot find a browser that can switch back and forth between ISO-8859-1 and UTF8 displays.  They seem to be able to display either one or the other, but not both in the same request.  So I can either show you the broken UTF-8 character or I can show you the original character.  Need a bit more experimentation, I think.

I'll post a code sample after I get it refined a bit more.
0
 
helpchrisplzAuthor Commented:
ok thanks
0
 
Ray PaseurCommented:
This is not "demonstration quality" code, but it gets the job done.  A few notes...

Line 14-15: Choose your output encoding.
Line 26-31: Try to fix any UTF-8 errors.
Line 34-46: See if the fix worked.
Line 49-end: Row-by-row and byte-by-byte verification of the encoding.

Line 45 should be changed to write the converted file back to your server so it can be used in your application.

http://www.iconoun.com/demo/temp_helpchrisplz.php

Please let me know if you still have questions after reading the article on character set collisions.

<?php // demo/temp_helpchrisplz.php
error_reporting(E_ALL);


// SEE http://www.experts-exchange.com/Web_Development/Internet_Marketing/Search_Engine_Optimization_SEO/Q_28396768.html
// REF http://php.net/manual/en/function.utf8-encode.php
// REF http://php.net/manual/en/function.ord.php
// REF http://php.net/manual/en/function.decbin.php
// REF http://www.asciitable.com/
// REF http://en.wikipedia.org/wiki/UTF-8


// TRY THE SCRIPT BOTH WITH AND WITHOUT THIS STATEMENT (FIRST PREVAILS)
echo '<meta charset="utf8" />';        // GARBLES NON-UTF-8
echo '<meta charset="iso-8859-1" />';  // GARBLES UTF-8

// PREFORMAT FOR EASY VISUALIZATION
echo '<pre>';


// THE TEST DATA MAY CONTAIN INVALID UTF-8 CHARACTERS
$url = 'http://christophersowerby.com/csv/fixedSERVER.CSV';
$txt = file($url, FILE_IGNORE_NEW_LINES);

// TRY TO FIX THE NON-UTF-8 CHARACTERS
foreach ($txt as $row => $str)
{
    // TRY TO FIX THE NON-UTF-8 CHARACTERS
    $str = utf8_encode($str);
    $txt[$row] = $str;
}

// TEST THE RESULTING DATA SET
$sig = array();
mb_internal_encoding("UTF-8");
foreach ($txt as $row => $str)
{
    // VERIFY UTF-8 CHARACTERS
    if (!mb_check_encoding($str)) $sig[$row] = $str;
}
if (!empty($sig)) var_dump($sig);
if ( empty($sig))
{
    echo "UTF-8 CONVERSION SUCCESSFUL" . PHP_EOL;
    // FILE PUT CONTENTS
}


// LOOK FOR UTF-8 CHARACTERS OR FAILURES
ob_start();
$sig = array();

// WITH EACH ROW OF THE CSV FILE
foreach ($txt as $row => $str)
{
    echo PHP_EOL . "ROW: " .str_pad($row, 4, ' ', STR_PAD_LEFT);
    echo ' ' . $str;

    // SKIP VALID UTF-8 STRINGS
    if (mb_check_encoding($str)) continue;

	// LOOK AT THE TEST STRING BYTE-BY-BYTE (_NOT_ THE SAME AS CHARACTER-BY-CHARACTER)
	$arr = str_split($str);
	foreach ($arr as $ptr => $chr)
	{
	    $ord = ord($chr);
	    $err = FALSE;

	    // PRINTABLE INFO FOR THE BYTE VALUES
	    $hex  = strtoupper(dechex($ord));
	    $bin  = decbin($ord);
	    $ptrp = str_pad($ptr, 3, ' ', STR_PAD_LEFT);
	    $ordp = str_pad($ord, 3, ' ', STR_PAD_LEFT);

	    // FLAG THIS CHARACTER IF THE BYTE-CODE IS GT 127
	    if ($ord > 127)
	    {
	        // echo PHP_EOL . " <b>BYTE: $ptrp</a>, CHR: $chr ORD: $ordp, HEX: $hex, BIN: $bin</b>";

	        // GET POINTERS TO THE NEXT CHARACTERS
	        $pp1 = $ptr + 1;
	        $pp2 = $ptr + 2;
	        $pp3 = $ptr + 3;

	        // IF A FOUR-BYTE UTF-8 CHARACTER, NEXT 3 BYTES MUST START WITH '10'
	        $sub = substr($bin, 0, 5);
	        if ($sub == '11110')
	        {
	            $chs = array();
	            $chs[$pp1] = $arr[$pp1];
	            $chs[$pp2] = $arr[$pp2];
	            $chs[$pp3] = $arr[$pp3];
	            foreach ($chs as $ppp => $nxt)
	            {
	                $cod = decbin(ord($nxt));
	                $cod = str_pad($cod, 8, '0', STR_PAD_LEFT);
	                $utf = substr($cod,0,2);
	                if ($utf !== '10')
	                {
	                    echo ", ERROR IN BYTE $ppp: $cod";
	                    $err = TRUE;
	                }
	            }
	        }

	        // IF A THREE-BYTE UTF-8 CHARACTER, NEXT 2 BYTES MUST START WITH '10'
	        $sub = substr($bin, 0, 4);
	        if ($sub == '1110')
	        {
	            $chs = array();
	            $chs[$pp1] = $arr[$pp1];
	            $chs[$pp2] = $arr[$pp2];
	            foreach ($chs as $ppp => $nxt)
	            {
	                $cod = decbin(ord($nxt));
	                $cod = str_pad($cod, 8, '0', STR_PAD_LEFT);
	                $utf = substr($cod,0,2);
	                if ($utf !== '10')
	                {
	                    echo ", ERROR IN BYTE $ppp: $cod";
	                    $err = TRUE;
	                }
	            }
	        }

	        // IF A TWO BYTE UTF-8 CHARACTER, NEXT 1 BYTE MUST START WITH '10'
	        $sub = substr($bin, 0, 3);
	        if ($sub == '110')
	        {
	            $chs = array();
	            $chs[$pp1] = $arr[$pp1];
	            foreach ($chs as $ppp => $nxt)
	            {
	                $cod = decbin(ord($nxt));
	                $cod = str_pad($cod, 8, '0', STR_PAD_LEFT);
	                $utf = substr($cod,0,2);
	                if ($utf !== '10')
	                {
	                    echo ", ERROR IN BYTE $ppp: $cod";
	                    $err = TRUE;
	                }
	            }
	        }
	    }
	    $sig[$row] = $row;
	}
}
$out = ob_get_clean();

// IF THERE WERE ANY CHARACTERS FLAGGED
if (!empty($sig))
{
    echo PHP_EOL . '<b>POSSIBLE UTF-8 ERRORS</b>';
    echo PHP_EOL;
    print_r($sig);
    echo PHP_EOL;
}

echo PHP_EOL . '<b>ENTIRE DATA SET FOLLOWS:</b>';
echo $out;

Open in new window

Best regards, ~Ray
0
 
helpchrisplzAuthor Commented:
Thanks ray.

i've implemented it like this:

// WITH EACH ROW OF THE CSV FILE
foreach ($txt as $row => $str)
{
   
///// i have commented out your formating
// echo PHP_EOL . "ROW: " .str_pad($row, 4, ' ', STR_PAD_LEFT);
   // echo ' ' . $str;

//// make a new line
    echo $str."\n";

Open in new window



then i put your $str back to the file.

$url = 'fixedSERVER.CSV';
file_put_contents($url, $out);

Open in new window

0
 
helpchrisplzAuthor Commented:
thanks
0
 
Ray PaseurCommented:
Thanks for the points and thanks for using Experts-Exchange, ~Ray
0
 
helpchrisplzAuthor Commented:
i would give your more points if i could !

I'm sure there will be more questions down the line.... :) haha
0

Featured Post

The 14th Annual Expert Award Winners

The results are in! Meet the top members of our 2017 Expert Awards. Congratulations to all who qualified!

  • 9
  • 8
Tackle projects and never again get stuck behind a technical roadblock.
Join Now