Solved

European language characters when importing posts to Mysql

Posted on 2013-06-15
4
405 Views
Last Modified: 2013-07-01
Experts,

My programmer is having trouble handling a csv file with data for wordpress post content.
His plugin imports from the csv and inserts into Wordpress to create posts out of the data file. Some of the columns in the data file contain characters such as ä and ö which appear in e.g. the names of businesses in Germany for example. His import seems to be choking on those rows.

So for example, we get this error in Apache_error.log:

[Sat Jun 15 21:27:17 2013] [client 127.0.0.1] WordPress database error You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 1 for query SELECT `image_id`, `userId`,`filename`, `uploaded` FROM `wp_agent_img` WHERE `userId` =  made by WP_List_Table->display, WP_List_Table->display_rows_or_placeholder, WP_Posts_List_Table->display_rows, WP_Posts_List_Table->_display_rows, WP_Posts_List_Table->single_row, do_action('manage_posts_custom_column'), call_user_func_array, gtre_custom_columns, agent_list, referer: http://localhost/wp-admin/admin.php?page=wp-csv-import

My questions then:

1. Can you help me make heads or tails out of the error message above? Is it likely to be caused by trying to import data with special characters or is it more likely to be something else?

2. What factors need to be lined up with each other across the environment of Wordpress, Mysql, Php, and possibly Apache to ensure that none of those settings are causing the issue with the special characters?

Thanks in advance!
0
Comment
Question by:bobinorlando
  • 2
4 Comments
 
LVL 70

Assisted Solution

by:Jason C. Levine
Jason C. Levine earned 200 total points
Comment Utility
Can you help me make heads or tails out of the error message above? Is it likely to be caused by trying to import data with special characters or is it more likely to be something else?

How big is the import file and did you take the extra step of removing all line breaks from the import file before importing it?

What factors need to be lined up with each other across the environment of Wordpress, Mysql, Php, and possibly Apache to ensure that none of those settings are causing the issue with the special characters?

Just making sure the MySQL is set to accept UTF-8 characters should be enough and I believe WordPress does that by default.  If you are really paranoid you can search and replace all of the umlaut characters with HTML entities (e.g. ö).
0
 
LVL 108

Assisted Solution

by:Ray Paseur
Ray Paseur earned 150 total points
Comment Utility
Part of this may be unique to WordPress and with Jason1178 on the case you're getting the best help available.  

Part of this may be a character-encoding issue.  It shows up in the form of things like the "ñ" being rendered in the form of ñ The A-Tilde character is one of the "signature" characters of botched UTF-8 encoding (another common signature character is the A-Ring).  To get something of a background in character encoding, please read this article.  You may find, as I did, that you need to read it more than once to absorb the information -- it's concentrated material and very worthwhile knowledge!
http://www.joelonsoftware.com/articles/Unicode.html

You may also want to read this.
https://en.wikipedia.org/wiki/UTF-8

Here is what you may be dealing with... ASCII is a one-byte character set.  Since the maximum value of a byte is 256 decimal, it follows that ASCII is not capable of rendering more than 256 charcacters.  UTF-8 is a multi-byte character set.  It can have one, two, three or four bytes in each character.  UTF-8 is capable of rendering millions of characters.

All of the characters below code point 128 decimal are the same in UTF-8 and ASCII. This covers the American English alphabet, most western punctuation and all the Arabic numbers.  Up to code point 127 you can "get away" with mixing UTF-8 and ASCII.

"Special" characters, including umlauts, accents and tildes all work the same way, and in ASCII they are all above code point 127.  You can render the n-tilde in ASCII.  It resides at code point 241 decimal.  You can also render the n-tilde in UTF-8.  In UTF-8 it takes up two bytes, 195 and 177 decimal.  

You have to choose between ASCII and UTF-8; you cannot intermix them reliably if you ever want to use any of the accented characters.  The appropriate meta charset declaration will tell the browser which encoding you want to use.

The decimal number 128 is hexadecimal 80 and is binary 1000 0000.  It is this leftmost bit that signals the UTF-8 multi-byte characters.  If the browser expects UTF-8 and sees this bit it will look at one, two or three of the following bytes to determine what character to render.  If you use any ASCII character above code point 127 (hex 7F and binary 0111 1111), but you have told the browser that it's getting UTF-8 characters, a collision will occur and the output will not be rendered correctly.  In some instances (JSON comes to mind) only UTF-8 is supported and a character encoding error will cause data loss.  If you have told the browser that it's getting ASCII characters and you give it UTF-8 multi-byte characters the browser will render things like the A-tilde followed by one, two or three additional "goofy looking" characters.

Executive summary: Use consistent encoding throughout the application.  If at all possible, lean into the direction of UTF-8, since it is becoming the overarching standard.

You can usually use PHP functions strlen() and mb_strlen() to detect byte-count values and character-count values, respectively.  If these functions give different values, you may have a multi-byte character string in play.  My guess is that you will find you have UTF-8 encoding in the data base, but you're getting ASCII data from the CSV file.
0
 
LVL 33

Accepted Solution

by:
Slick812 earned 150 total points
Comment Utility
greetings  bobinorlando, for the error message, I deal with php error messages alot, and in yours the thing to look at is =
WHERE `userId` =

if you can look at that line of code and analyze the what comes after the "WHERE `userId` =", you may can trace the error, It is likely a variable such as -
WHERE `userId` = $getID
and this variable may suppose to be a NUMBER?, but the variable is empty, OR is a string with an improper   ( , ) " '  in it or other bad SQL syntax.  If this variable value comes out of the CSV , then the parser of the CSV may be erroring out, retuning false or something.
I really do not think the problem would be related to any text character values for other languages, at least as far as your SQL syntax error message.
0
 
LVL 108

Expert Comment

by:Ray Paseur
Comment Utility
After seeing this question and a few others along the same lines, I wrote this article and provided some code examples to help move things forward.  Thanks for the points -- it's a great question and one that is becoming more and more frequent.
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/A_11880-Unicode-PHP-and-Character-Collisions.html

Best regards, ~Ray
0

Featured Post

Highfive Gives IT Their Time Back

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

Suggested Solutions

Foreword (July, 2015) Since I first wrote this article, years ago, a great many more people have begun using the internet.  They are coming online from every part of the globe, learning, reading, shopping and spending money at an ever-increasing ra…
Load balancing is the method of dividing the total amount of work performed by one computer between two or more computers. Its aim is to get more work done in the same amount of time, ensuring that all the users get served faster.
The purpose of this video is to demonstrate how to reset a WordPress password if you are locked out and cannot reset the password. A typical use would be if you cannot access the email to which WordPress would send the password recovery email to…
The purpose of this video is to demonstrate how to set up the permalinks on a WordPress Website. This will be demonstrated using a Windows 8 PC. Go to your WordPress login page. This will look like the following: mywebsite.com/wp-login.php : Go t…

743 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

17 Experts available now in Live!

Get 1:1 Help Now