European language characters when importing posts to Mysql

Posted on 2013-06-15
Medium Priority
Last Modified: 2013-07-01

My programmer is having trouble handling a csv file with data for wordpress post content.
His plugin imports from the csv and inserts into Wordpress to create posts out of the data file. Some of the columns in the data file contain characters such as ä and ö which appear in e.g. the names of businesses in Germany for example. His import seems to be choking on those rows.

So for example, we get this error in Apache_error.log:

[Sat Jun 15 21:27:17 2013] [client] WordPress database error You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 1 for query SELECT `image_id`, `userId`,`filename`, `uploaded` FROM `wp_agent_img` WHERE `userId` =  made by WP_List_Table->display, WP_List_Table->display_rows_or_placeholder, WP_Posts_List_Table->display_rows, WP_Posts_List_Table->_display_rows, WP_Posts_List_Table->single_row, do_action('manage_posts_custom_column'), call_user_func_array, gtre_custom_columns, agent_list, referer: http://localhost/wp-admin/admin.php?page=wp-csv-import

My questions then:

1. Can you help me make heads or tails out of the error message above? Is it likely to be caused by trying to import data with special characters or is it more likely to be something else?

2. What factors need to be lined up with each other across the environment of Wordpress, Mysql, Php, and possibly Apache to ensure that none of those settings are causing the issue with the special characters?

Thanks in advance!
Question by:bobinorlando
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
LVL 70

Assisted Solution

by:Jason C. Levine
Jason C. Levine earned 800 total points
ID: 39250857
Can you help me make heads or tails out of the error message above? Is it likely to be caused by trying to import data with special characters or is it more likely to be something else?

How big is the import file and did you take the extra step of removing all line breaks from the import file before importing it?

What factors need to be lined up with each other across the environment of Wordpress, Mysql, Php, and possibly Apache to ensure that none of those settings are causing the issue with the special characters?

Just making sure the MySQL is set to accept UTF-8 characters should be enough and I believe WordPress does that by default.  If you are really paranoid you can search and replace all of the umlaut characters with HTML entities (e.g. ö).
LVL 111

Assisted Solution

by:Ray Paseur
Ray Paseur earned 600 total points
ID: 39251243
Part of this may be unique to WordPress and with Jason1178 on the case you're getting the best help available.  

Part of this may be a character-encoding issue.  It shows up in the form of things like the "ñ" being rendered in the form of ñ The A-Tilde character is one of the "signature" characters of botched UTF-8 encoding (another common signature character is the A-Ring).  To get something of a background in character encoding, please read this article.  You may find, as I did, that you need to read it more than once to absorb the information -- it's concentrated material and very worthwhile knowledge!

You may also want to read this.

Here is what you may be dealing with... ASCII is a one-byte character set.  Since the maximum value of a byte is 256 decimal, it follows that ASCII is not capable of rendering more than 256 charcacters.  UTF-8 is a multi-byte character set.  It can have one, two, three or four bytes in each character.  UTF-8 is capable of rendering millions of characters.

All of the characters below code point 128 decimal are the same in UTF-8 and ASCII. This covers the American English alphabet, most western punctuation and all the Arabic numbers.  Up to code point 127 you can "get away" with mixing UTF-8 and ASCII.

"Special" characters, including umlauts, accents and tildes all work the same way, and in ASCII they are all above code point 127.  You can render the n-tilde in ASCII.  It resides at code point 241 decimal.  You can also render the n-tilde in UTF-8.  In UTF-8 it takes up two bytes, 195 and 177 decimal.  

You have to choose between ASCII and UTF-8; you cannot intermix them reliably if you ever want to use any of the accented characters.  The appropriate meta charset declaration will tell the browser which encoding you want to use.

The decimal number 128 is hexadecimal 80 and is binary 1000 0000.  It is this leftmost bit that signals the UTF-8 multi-byte characters.  If the browser expects UTF-8 and sees this bit it will look at one, two or three of the following bytes to determine what character to render.  If you use any ASCII character above code point 127 (hex 7F and binary 0111 1111), but you have told the browser that it's getting UTF-8 characters, a collision will occur and the output will not be rendered correctly.  In some instances (JSON comes to mind) only UTF-8 is supported and a character encoding error will cause data loss.  If you have told the browser that it's getting ASCII characters and you give it UTF-8 multi-byte characters the browser will render things like the A-tilde followed by one, two or three additional "goofy looking" characters.

Executive summary: Use consistent encoding throughout the application.  If at all possible, lean into the direction of UTF-8, since it is becoming the overarching standard.

You can usually use PHP functions strlen() and mb_strlen() to detect byte-count values and character-count values, respectively.  If these functions give different values, you may have a multi-byte character string in play.  My guess is that you will find you have UTF-8 encoding in the data base, but you're getting ASCII data from the CSV file.
LVL 34

Accepted Solution

Slick812 earned 600 total points
ID: 39251528
greetings  bobinorlando, for the error message, I deal with php error messages alot, and in yours the thing to look at is =
WHERE `userId` =

if you can look at that line of code and analyze the what comes after the "WHERE `userId` =", you may can trace the error, It is likely a variable such as -
WHERE `userId` = $getID
and this variable may suppose to be a NUMBER?, but the variable is empty, OR is a string with an improper   ( , ) " '  in it or other bad SQL syntax.  If this variable value comes out of the CSV , then the parser of the CSV may be erroring out, retuning false or something.
I really do not think the problem would be related to any text character values for other languages, at least as far as your SQL syntax error message.
LVL 111

Expert Comment

by:Ray Paseur
ID: 39291922
After seeing this question and a few others along the same lines, I wrote this article and provided some code examples to help move things forward.  Thanks for the points -- it's a great question and one that is becoming more and more frequent.

Best regards, ~Ray

Featured Post

Get your Disaster Recovery as a Service basics

Disaster Recovery as a Service is one go-to solution that revolutionizes DR planning. Implementing DRaaS could be an efficient process, easily accessible to non-DR experts. Learn about monitoring, testing, executing failovers and failbacks to ensure a "healthy" DR environment.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

3 proven steps to speed up Magento powered sites. The article focus is on optimizing time to first byte (TTFB), full page caching and configuring server for optimal performance.
In this blog post, we’ll look at how using thread_statistics can cause high memory usage.
The purpose of this video is to demonstrate how to set up an RSS Feed on a WordPress Website. This will be demonstrated using a Windows 8 PC. Feedburner will be used for this demonstration. Go to your WordPress login page. This will look like the…
The is a quite short video tutorial. In this video, I'm going to show you how to create self-host WordPress blog with free hosting service.
Suggested Courses

800 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question