Solved

LOAD DATA INFILE error caused by foreign characters

Posted on 2011-03-10
13
914 Views
Last Modified: 2012-06-27
I am importing from a CSV file into MySQL using LOAD DATA INFILE as follows:
LOAD DATA INFILE '#theCSVFile#' INTO TABLE tempTable FIELDS TERMINATED BY '|'

Open in new window

This has worked successfully before, but today I get an error:
Incorrect string value: '\xF6gberg' for column 'AuthorisedAssigneeFullName' at row 17
which is caused by the person's name
Högberg
containing a foreign character.

How do I fix the problem so names with foreign characters are processed normally?

For completeness, I created the CSV file by extracting data from an Excel spreadsheet under ColdFusion 9, using the following code:

<!---Read the xlsx file into a query---> 
<cfspreadsheet action="read" src="#sourcefile#" sheet="2" query="xlsxData" columns="2-48" rows="8-9999"> 
<!--- Write the contents of the query out to a new file--->
<cfset Delim = "|">
<cfset sb = createObject("java", "java.lang.StringBuffer")/><!--- creates empty StringBuffer --->
<cfset endLine = chr(10)><!--- set endline to {LF} chr(10)--->
<cfloop query="xlsxData" endrow="#xlsxData.recordCount#">
<cfset sb.append(COL_1).append(Delim).append(COL_2).append(Delim).append(COL_3).append(Delim).append(COL_4).append(Delim).append(COL_5).append(Delim).append(COL_6).append(Delim).append(COL_7).append(Delim).append(COL_8).append(Delim).append(COL_9).append(Delim).append(COL_10).append(Delim).append(COL_11).append(Delim).append(COL_12).append(Delim).append(COL_13).append(Delim).append(COL_14).append(Delim).append(COL_15).append(Delim).append(COL_16).append(Delim).append(COL_17).append(Delim).append(COL_18).append(Delim).append(COL_19).append(Delim).append(COL_20).append(Delim).append(COL_21).append(Delim).append(COL_22).append(Delim).append(COL_23).append(Delim).append(COL_24).append(Delim).append(COL_25).append(Delim).append(COL_26).append(Delim).append(COL_27).append(Delim).append(COL_28).append(Delim).append(COL_29).append(Delim).append(COL_30).append(Delim).append(COL_31).append(Delim).append(COL_32).append(Delim).append(COL_33).append(Delim).append(COL_34).append(Delim).append(COL_35).append(Delim).append(COL_36).append(Delim).append(COL_37).append(Delim).append(COL_38).append(Delim).append(COL_39).append(Delim).append(COL_40).append(Delim).append(COL_41).append(Delim).append(COL_42).append(Delim).append(COL_43).append(Delim).append(COL_44).append(Delim).append(COL_45).append(Delim).append(COL_46).append(Delim).append(COL_47)>
<!--- Add a newline unless it's the last record in the query --->
<cfif currentRow NEQ recordCount>
<cfset sb.append(endLine)>
</cfif>
</cfloop>
<cfset theCSVFile = "C:\\Documents and Settings\\All Users\\Application Data\\MySQL\\MySQL Server 5.1\\Data\\mytracker\\tempFile.csv">
<cffile action="write" file="#theCSVFile#" output="#sb.toString()#">

Open in new window

0
Comment
Question by:ShanghaiD
  • 5
  • 5
  • 3
13 Comments
 
LVL 92

Expert Comment

by:objects
ID: 35103415
you can specify the charset to use

LOAD DATA INFILE '#theCSVFile#' INTO TABLE tempTable  CHARACTER SET XXX  FIELDS TERMINATED BY '|'

and make sure the database is created with the appropriate charset
0
 
LVL 52

Expert Comment

by:_agx_
ID: 35104036
I don't remember if you also need to enable unicode for the dsn. Try objects's suggestions first.  IF you get an error/or the characters don't insert correctly, then also add the unicode settings to your dsn
http://www.experts-exchange.com/Database/MySQL/Q_26874251.html#a35086595

1) Select  Data & Services > Datasources > (Your MySQL Datasource)
2) Click:   Show Advanced Settings
3) In "Connection Settings" enter:
useUnicode=true&characterEncoding=UTF-8

Open in new window

0
 

Author Comment

by:ShanghaiD
ID: 35104254
Thank you both, but still getting the error message.  My table code is:
<!---Create Temporary table--->
<cfquery name="CreateTemporaryTable" datasource="mysqlTracker">
CREATE TABLE `temptable` (
`AuthorisedAssigneeFullName` varchar(499) DEFAULT NULL,
`Emailaddress` varchar(199) DEFAULT NULL,
`Nationality` varchar(499) DEFAULT NULL,
etc
etc
etc
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT; 
</cfquery>
				
<cfquery name="LoadDataInfile" datasource="mysqlTracker">
LOAD DATA INFILE '#theCSVFile#' INTO TABLE tempTable CHARACTER SET utf8 FIELDS TERMINATED BY '|' 
</cfquery>

Open in new window

and My dsn connection string is set as:
zeroDateTimeBehavior=convertToNull&useUnicode=true&characterEncoding=UTF-8

Open in new window

I'm  currently working aroudn the problem by loopign through the query object created when CF9 imported the Excel spreadhseet, but using LOAD DATA INFILE would be faster if we could get it to work.
0
 
LVL 92

Expert Comment

by:objects
ID: 35104338
is the data in the csv correct ie. is it being generated correctly?
0
 

Author Comment

by:ShanghaiD
ID: 35104458
Yes, it is (and everything works fine if there are no foreign characters in a person's name).

Here is the offending line in the CSV file  whcih is triggering the error:

J Högberg|||||||||1-Mar-10|OPEN|XY EA||||||||||||N/A|Commuter|||||N/A|N|||||||||||||||0|(

Open in new window

0
 

Author Comment

by:ShanghaiD
ID: 35104714
I'm going to have to abandon trying to get LOAD DATA INFILE to work on this and continue instead to use CFLOOP through each spreadsheet row because I have now also found another problem  with Chinese characters in some persons' names.  Unlike the Högberg character problem, the Chinese characters do not throw an error, but when I view the Mysql table, I see ?????? for the characters which have been inserted.  However, when I process the spreadsheet using the CFLOOP query, the Chinese characters (and also Högberg) come across correctly.  
So, although looping is slower than LOAD DATA INFILE,  at least I have a practical solution to my problem.  Thanks everyone for trying to help me.
0
How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

 
LVL 52

Expert Comment

by:_agx_
ID: 35104799
If you're seeing "??????" it means it's not being inserted with the correct encoding. So you get the "??" unknown characters instead.  I'm going to test it and see what I get.  What version of MySQL are you using?  
0
 
LVL 92

Expert Comment

by:objects
ID: 35104834
whats the encoding of the CSV file?
0
 

Author Comment

by:ShanghaiD
ID: 35104836
Server Version: 5.1.53-communityand Encoding: 65001 (UTF-8).
0
 
LVL 52

Accepted Solution

by:
_agx_ earned 500 total points
ID: 35104916
Change the charset of your cffile and it should work. Example:

<cfprocessingdirective pageEncoding="utf-8">
<cfsavecontent variable="sampleData">
Högberg      
Classical Greek: ¿a¿¿¿ ¿a¿e¿¿ d¿¿aµa¿¿
Greek (monotonic): ¿p¿¿¿ ¿a f¿¿ spasµ¿¿a ¿¿a¿¿¿ ¿¿¿¿¿ ¿a p¿¿¿ t¿p¿ta.
¿ ¿ ¿ ¿
¿ ¿ ¿ ¿
</cfsavecontent>
<cffile action="write" file="c:\test.txt" output="#sampleData#" charset="utf-8">

Don't know if you need to specify the encoding on your buffer too
ie
<cffile action="write" file="#theCSVFile#" output="#sb.toString("UTF8")#"  charset="utf-8">

0
 
LVL 52

Expert Comment

by:_agx_
ID: 35104923
Darn .. it mangled my utf8 samples ;-) But the important part is the <cffile ..> code anyway.
0
 

Author Comment

by:ShanghaiD
ID: 35104943
Success!    You have found my solution (both for Högberg and the Chinese characters)!
I was not explicitly setting the CSV encoding, and when I changed this in the CFFILE tag from:
<cffile action="write" file="#theCSVFile#" output="#sb.toString()#">

Open in new window

to:
<cffile action="write" file="#theCSVFile#" charset="utf-8" output="#sb.toString()#">

Open in new window

then everything works without a problem (using LOAD DATA INFILE).
Thanks for persevering to guide me to the solution!  You deserve the points!
0
 
LVL 52

Expert Comment

by:_agx_
ID: 35104982
Great, glad it's working now :)
0

Featured Post

What Is Threat Intelligence?

Threat intelligence is often discussed, but rarely understood. Starting with a precise definition, along with clear business goals, is essential.

Join & Write a Comment

Java Flight Recorder and Java Mission Control together create a complete tool chain to continuously collect low level and detailed runtime information enabling after-the-fact incident analysis. Java Flight Recorder is a profiling and event collectio…
Introduction This article is the last of three articles that explain why and how the Experts Exchange QA Team does test automation for our web site. This article covers our test design approach and then goes through a simple test case example, how …
Viewers will learn about if statements in Java and their use The if statement: The condition required to create an if statement: Variations of if statements: An example using if statements:
This tutorial covers a step-by-step guide to install VisualVM launcher in eclipse.

743 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now