grblades
Problem reading UTF8 formatted file
I am trying to read a tab-separated (TSV) Unicode file and then insert the data into a MySQL database. This example is in Mandarin, but we will also be dealing with Turkish, Arabic, Japanese... the whole works. Reading the data straight out of the file (the first var_dump()) appears to show most of the Chinese characters. However, once the contents are read into the array using fgetcsv(), both the English and the Mandarin are garbled, with the <?> character interspersed, and the Chinese characters disappear completely. I have tried several techniques; the two latest are in the first four assignments of the while() loop, and they have no discernible effect on readability (although the garbage that is output does change). Text file attached.
$row = 0;
$importdata = Array();
// Help fgetcsv() to read in UTF8
setlocale( LC_ALL, 'en_US.UTF-8' );
$handle = fopen( $_FILES[ "uploadcsv" ][ "tmp_name" ], "r" );
var_dump( ( fread( $handle, 10000 ) ) );
while ( ( $data = fgetcsv( $handle, 10000, " " ) ) !== FALSE )
{
    $importdata[ $row ][ "englishlanguagename" ] =
        mb_convert_encoding( $data[ 0 ], "UTF-8", "auto" );
    $importdata[ $row ][ "nativelanguagename" ] =
        mb_convert_encoding( $data[ 1 ], "UTF-8", "auto" );
    $importdata[ $row ][ "englishcategoryname" ] =
        utf8_encode( $data[ 2 ] );
    $importdata[ $row ][ "nativecategoryname" ] =
        utf8_encode( $data[ 3 ] );
    $importdata[ $row ][ "englishsubcategoryname" ] = $data[ 4 ];
    $importdata[ $row ][ "nativesubcategoryname" ] = $data[ 5 ];
    $importdata[ $row ][ "englishphrase" ] = $data[ 6 ];
    $importdata[ $row ][ "nativephrase" ] = $data[ 7 ];
    $importdata[ $row ][ "audiofile" ] = $data[ 8 ];
    $row++;
}
fclose( $handle );
unlink( $_FILES[ "uploadcsv" ][ "tmp_name" ] );
echo "<pre>"; var_dump( $importdata ); echo "</pre>";
Mandarin-phrases-not-final.txt
output-html.log
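One thing that may be worth ruling out in the snippet above: fread() advances the file pointer, so the fgetcsv() loop that follows starts wherever the 10000-byte preview stopped, which can be in the middle of a row or even of a multi-byte character. Below is a minimal sketch of previewing the file and then parsing it from the start. It assumes a tab delimiter (the file is described as TSV), and the function name preview_and_parse is hypothetical, not part of PHP.

```php
<?php
// Hypothetical helper: take a preview of the stream, then parse it from byte 0.
// Assumption: fields are separated by tabs, as the file is described as TSV.
function preview_and_parse($handle, string $delimiter = "\t"): array
{
    // fread() moves the file pointer forward by the number of bytes read.
    $preview = fread($handle, 10000);

    // Without this rewind, fgetcsv() would continue from where fread() stopped.
    rewind($handle);

    $rows = [];
    // Length 0 means "no line length limit"; enclosure and escape are given explicitly.
    while (($data = fgetcsv($handle, 0, $delimiter, '"', '\\')) !== false) {
        $rows[] = $data;
    }

    return ['preview' => $preview, 'rows' => $rows];
}
```

Called with the handle returned by fopen(), this gives back the same raw preview that var_dump() showed, plus rows that are guaranteed to start at the first byte of the file.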
Please search EE for "read utf-8 file"; a lot of answers have already been given.
Are you still working on this?
ASKER
Yes. I am in the UK so posted only an hour before finishing work. Going to try a few things today and I will let you know.
Thanks
ASKER
Thanks for the link. However, I already have the mb_convert functions installed and am using them on lines 10 & 12 of the code above, and they appear to have no effect on the outcome. The var_dump on line 6 does appear to get most of the multi-byte characters right (in IE at least; Firefox shows something completely different!), but once the lines have been run through the fgetcsv() function they turn to gibberish, so my feeling is that this is where the error lies.
I don't believe the file is having any issues with UTF8, and the fread() function working almost correctly leads me to believe that the reading of the file isn't having any problems - so searching for read utf8 file really doesn't help me much - and yes, I have tried it.
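If the file turns out not to be UTF-8 after all, the symptoms would fit: text saved as "Unicode" by Windows tools is often UTF-16 with a byte order mark (BOM), and UTF-16 read as single bytes puts a stray byte between English letters. One approach is to check the BOM, convert the whole file to UTF-8, and only then split it into fields. The sketch below rests on that assumption and is not a confirmed diagnosis. The names to_utf8 and parse_tsv are hypothetical helpers, and the code relies on the mbstring extension already mentioned in this thread.

```php
<?php
// Hypothetical helper: normalise raw file bytes to UTF-8 based on a leading BOM.
// Assumption: the file starts with a BOM; without one the bytes are returned unchanged.
function to_utf8(string $raw): string
{
    if (strncmp($raw, "\xEF\xBB\xBF", 3) === 0) {
        return substr($raw, 3);  // already UTF-8: just strip the BOM
    }
    if (strncmp($raw, "\xFF\xFE", 2) === 0) {
        return mb_convert_encoding(substr($raw, 2), 'UTF-8', 'UTF-16LE');
    }
    if (strncmp($raw, "\xFE\xFF", 2) === 0) {
        return mb_convert_encoding(substr($raw, 2), 'UTF-8', 'UTF-16BE');
    }
    return $raw;
}

// Hypothetical helper: split UTF-8 text into rows of tab-separated fields.
function parse_tsv(string $utf8): array
{
    $rows = [];
    foreach (preg_split('/\r\n|\r|\n/', $utf8) as $line) {
        if ($line === '') {
            continue;  // skip blank lines, including the one after a trailing newline
        }
        $rows[] = str_getcsv($line, "\t", '"', '\\');
    }
    return $rows;
}
```

With the upload from the question this could be used as parse_tsv( to_utf8( file_get_contents( $_FILES[ "uploadcsv" ][ "tmp_name" ] ) ) ). Converting once up front also means the per-field mb_convert_encoding() and utf8_encode() calls are no longer needed; utf8_encode() in particular treats its input as ISO-8859-1, so applying it to text that is already UTF-8 encodes it a second time.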
ASKER CERTIFIED SOLUTION
https://www.experts-exchange.com/questions/23032508/How-to-read-in-UTF-8-encoded-text-file-using-file-get-contents.html