LOAD DATA INFILE error caused by foreign characters

I am importing from a CSV file into MySQL using LOAD DATA INFILE as follows:
LOAD DATA INFILE '#theCSVFile#' INTO TABLE tempTable FIELDS TERMINATED BY '|'

Open in new window

This has worked successfully before, but today I get an error:
Incorrect string value: '\xF6gberg' for column 'AuthorisedAssigneeFullName' at row 17
which is caused by the person's name
Högberg
containing a foreign character.

How do I fix the problem so names with foreign characters are processed normally?

For completeness, I created the CSV file by extracting data from an Excel spreadsheet under ColdFusion 9, using the following code:

<!---Read the xlsx file into a query---> 
<cfspreadsheet action="read" src="#sourcefile#" sheet="2" query="xlsxData" columns="2-48" rows="8-9999"> 
<!--- Write the contents of the query out to a new file--->
<cfset Delim = "|">
<cfset sb = createObject("java", "java.lang.StringBuffer")/><!--- creates empty StringBuffer --->
<cfset endLine = chr(10)><!--- set endline to {LF} chr(10)--->
<cfloop query="xlsxData" endrow="#xlsxData.recordCount#">
<cfset sb.append(COL_1).append(Delim).append(COL_2).append(Delim).append(COL_3).append(Delim).append(COL_4).append(Delim).append(COL_5).append(Delim).append(COL_6).append(Delim).append(COL_7).append(Delim).append(COL_8).append(Delim).append(COL_9).append(Delim).append(COL_10).append(Delim).append(COL_11).append(Delim).append(COL_12).append(Delim).append(COL_13).append(Delim).append(COL_14).append(Delim).append(COL_15).append(Delim).append(COL_16).append(Delim).append(COL_17).append(Delim).append(COL_18).append(Delim).append(COL_19).append(Delim).append(COL_20).append(Delim).append(COL_21).append(Delim).append(COL_22).append(Delim).append(COL_23).append(Delim).append(COL_24).append(Delim).append(COL_25).append(Delim).append(COL_26).append(Delim).append(COL_27).append(Delim).append(COL_28).append(Delim).append(COL_29).append(Delim).append(COL_30).append(Delim).append(COL_31).append(Delim).append(COL_32).append(Delim).append(COL_33).append(Delim).append(COL_34).append(Delim).append(COL_35).append(Delim).append(COL_36).append(Delim).append(COL_37).append(Delim).append(COL_38).append(Delim).append(COL_39).append(Delim).append(COL_40).append(Delim).append(COL_41).append(Delim).append(COL_42).append(Delim).append(COL_43).append(Delim).append(COL_44).append(Delim).append(COL_45).append(Delim).append(COL_46).append(Delim).append(COL_47)>
<!--- Add a newline unless it's the last record in the query --->
<cfif currentRow NEQ recordCount>
<cfset sb.append(endLine)>
</cfif>
</cfloop>
<cfset theCSVFile = "C:\\Documents and Settings\\All Users\\Application Data\\MySQL\\MySQL Server 5.1\\Data\\mytracker\\tempFile.csv">
<cffile action="write" file="#theCSVFile#" output="#sb.toString()#">

Open in new window

ShanghaiDAsked:
Who is Participating?
 
_agx_Connect With a Mentor Commented:
Change the charset of your cffile and it should work. Example:

<cfprocessingdirective pageEncoding="utf-8">
<cfsavecontent variable="sampleData">
Högberg      
Classical Greek: ¿a¿¿¿ ¿a¿e¿¿ d¿¿aµa¿¿
Greek (monotonic): ¿p¿¿¿ ¿a f¿¿ spasµ¿¿a ¿¿a¿¿¿ ¿¿¿¿¿ ¿a p¿¿¿ t¿p¿ta.
¿ ¿ ¿ ¿
¿ ¿ ¿ ¿
</cfsavecontent>
<cffile action="write" file="c:\test.txt" output="#sampleData#" charset="utf-8">

Don't know if you need to specify the encoding on your buffer too
ie
<cffile action="write" file="#theCSVFile#" output="#sb.toString("UTF8")#"  charset="utf-8">

0
 
objectsCommented:
you can specify the charset to use

LOAD DATA INFILE '#theCSVFile#' INTO TABLE tempTable  CHARACTER SET XXX  FIELDS TERMINATED BY '|'

and make sure the database is created with the appropriate charset
0
 
_agx_Commented:
I don't remember if you also need to enable unicode for the dsn. Try objects's suggestions first.  IF you get an error/or the characters don't insert correctly, then also add the unicode settings to your dsn
http://www.experts-exchange.com/Database/MySQL/Q_26874251.html#a35086595

1) Select  Data & Services > Datasources > (Your MySQL Datasource)
2) Click:   Show Advanced Settings
3) In "Connection Settings" enter:
useUnicode=true&characterEncoding=UTF-8

Open in new window

0
Upgrade your Question Security!

Your question, your audience. Choose who sees your identity—and your question—with question security.

 
ShanghaiDAuthor Commented:
Thank you both, but still getting the error message.  My table code is:
<!---Create Temporary table--->
<cfquery name="CreateTemporaryTable" datasource="mysqlTracker">
CREATE TABLE `temptable` (
`AuthorisedAssigneeFullName` varchar(499) DEFAULT NULL,
`Emailaddress` varchar(199) DEFAULT NULL,
`Nationality` varchar(499) DEFAULT NULL,
etc
etc
etc
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT; 
</cfquery>
				
<cfquery name="LoadDataInfile" datasource="mysqlTracker">
LOAD DATA INFILE '#theCSVFile#' INTO TABLE tempTable CHARACTER SET utf8 FIELDS TERMINATED BY '|' 
</cfquery>

Open in new window

and My dsn connection string is set as:
zeroDateTimeBehavior=convertToNull&useUnicode=true&characterEncoding=UTF-8

Open in new window

I'm  currently working aroudn the problem by loopign through the query object created when CF9 imported the Excel spreadhseet, but using LOAD DATA INFILE would be faster if we could get it to work.
0
 
objectsCommented:
is the data in the csv correct ie. is it being generated correctly?
0
 
ShanghaiDAuthor Commented:
Yes, it is (and everything works fine if there are no foreign characters in a person's name).

Here is the offending line in the CSV file  whcih is triggering the error:

J Högberg|||||||||1-Mar-10|OPEN|XY EA||||||||||||N/A|Commuter|||||N/A|N|||||||||||||||0|(

Open in new window

0
 
ShanghaiDAuthor Commented:
I'm going to have to abandon trying to get LOAD DATA INFILE to work on this and continue instead to use CFLOOP through each spreadsheet row because I have now also found another problem  with Chinese characters in some persons' names.  Unlike the Högberg character problem, the Chinese characters do not throw an error, but when I view the Mysql table, I see ?????? for the characters which have been inserted.  However, when I process the spreadsheet using the CFLOOP query, the Chinese characters (and also Högberg) come across correctly.  
So, although looping is slower than LOAD DATA INFILE,  at least I have a practical solution to my problem.  Thanks everyone for trying to help me.
0
 
_agx_Commented:
If you're seeing "??????" it means it's not being inserted with the correct encoding. So you get the "??" unknown characters instead.  I'm going to test it and see what I get.  What version of MySQL are you using?  
0
 
objectsCommented:
whats the encoding of the CSV file?
0
 
ShanghaiDAuthor Commented:
Server Version: 5.1.53-communityand Encoding: 65001 (UTF-8).
0
 
_agx_Commented:
Darn .. it mangled my utf8 samples ;-) But the important part is the <cffile ..> code anyway.
0
 
ShanghaiDAuthor Commented:
Success!    You have found my solution (both for Högberg and the Chinese characters)!
I was not explicitly setting the CSV encoding, and when I changed this in the CFFILE tag from:
<cffile action="write" file="#theCSVFile#" output="#sb.toString()#">

Open in new window

to:
<cffile action="write" file="#theCSVFile#" charset="utf-8" output="#sb.toString()#">

Open in new window

then everything works without a problem (using LOAD DATA INFILE).
Thanks for persevering to guide me to the solution!  You deserve the points!
0
 
_agx_Commented:
Great, glad it's working now :)
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.