Want to win a PS4? Go Premium and enter to win our High-Tech Treats giveaway. Enter to Win

x
?
Solved

European language characters when importing posts to Mysql

Posted on 2013-06-15
4
Medium Priority
?
414 Views
Last Modified: 2013-07-01
Experts,

My programmer is having trouble handling a csv file with data for wordpress post content.
His plugin imports from the csv and inserts into Wordpress to create posts out of the data file. Some of the columns in the data file contain characters such as ä and ö which appear in e.g. the names of businesses in Germany for example. His import seems to be choking on those rows.

So for example, we get this error in Apache_error.log:

[Sat Jun 15 21:27:17 2013] [client 127.0.0.1] WordPress database error You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 1 for query SELECT `image_id`, `userId`,`filename`, `uploaded` FROM `wp_agent_img` WHERE `userId` =  made by WP_List_Table->display, WP_List_Table->display_rows_or_placeholder, WP_Posts_List_Table->display_rows, WP_Posts_List_Table->_display_rows, WP_Posts_List_Table->single_row, do_action('manage_posts_custom_column'), call_user_func_array, gtre_custom_columns, agent_list, referer: http://localhost/wp-admin/admin.php?page=wp-csv-import

My questions then:

1. Can you help me make heads or tails out of the error message above? Is it likely to be caused by trying to import data with special characters or is it more likely to be something else?

2. What factors need to be lined up with each other across the environment of Wordpress, Mysql, Php, and possibly Apache to ensure that none of those settings are causing the issue with the special characters?

Thanks in advance!
0
Comment
Question by:bobinorlando
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
4 Comments
 
LVL 70

Assisted Solution

by:Jason C. Levine
Jason C. Levine earned 800 total points
ID: 39250857
Can you help me make heads or tails out of the error message above? Is it likely to be caused by trying to import data with special characters or is it more likely to be something else?

How big is the import file and did you take the extra step of removing all line breaks from the import file before importing it?

What factors need to be lined up with each other across the environment of Wordpress, Mysql, Php, and possibly Apache to ensure that none of those settings are causing the issue with the special characters?

Just making sure the MySQL is set to accept UTF-8 characters should be enough and I believe WordPress does that by default.  If you are really paranoid you can search and replace all of the umlaut characters with HTML entities (e.g. ö).
0
 
LVL 111

Assisted Solution

by:Ray Paseur
Ray Paseur earned 600 total points
ID: 39251243
Part of this may be unique to WordPress and with Jason1178 on the case you're getting the best help available.  

Part of this may be a character-encoding issue.  It shows up in the form of things like the "ñ" being rendered in the form of ñ The A-Tilde character is one of the "signature" characters of botched UTF-8 encoding (another common signature character is the A-Ring).  To get something of a background in character encoding, please read this article.  You may find, as I did, that you need to read it more than once to absorb the information -- it's concentrated material and very worthwhile knowledge!
http://www.joelonsoftware.com/articles/Unicode.html

You may also want to read this.
https://en.wikipedia.org/wiki/UTF-8

Here is what you may be dealing with... ASCII is a one-byte character set.  Since the maximum value of a byte is 256 decimal, it follows that ASCII is not capable of rendering more than 256 charcacters.  UTF-8 is a multi-byte character set.  It can have one, two, three or four bytes in each character.  UTF-8 is capable of rendering millions of characters.

All of the characters below code point 128 decimal are the same in UTF-8 and ASCII. This covers the American English alphabet, most western punctuation and all the Arabic numbers.  Up to code point 127 you can "get away" with mixing UTF-8 and ASCII.

"Special" characters, including umlauts, accents and tildes all work the same way, and in ASCII they are all above code point 127.  You can render the n-tilde in ASCII.  It resides at code point 241 decimal.  You can also render the n-tilde in UTF-8.  In UTF-8 it takes up two bytes, 195 and 177 decimal.  

You have to choose between ASCII and UTF-8; you cannot intermix them reliably if you ever want to use any of the accented characters.  The appropriate meta charset declaration will tell the browser which encoding you want to use.

The decimal number 128 is hexadecimal 80 and is binary 1000 0000.  It is this leftmost bit that signals the UTF-8 multi-byte characters.  If the browser expects UTF-8 and sees this bit it will look at one, two or three of the following bytes to determine what character to render.  If you use any ASCII character above code point 127 (hex 7F and binary 0111 1111), but you have told the browser that it's getting UTF-8 characters, a collision will occur and the output will not be rendered correctly.  In some instances (JSON comes to mind) only UTF-8 is supported and a character encoding error will cause data loss.  If you have told the browser that it's getting ASCII characters and you give it UTF-8 multi-byte characters the browser will render things like the A-tilde followed by one, two or three additional "goofy looking" characters.

Executive summary: Use consistent encoding throughout the application.  If at all possible, lean into the direction of UTF-8, since it is becoming the overarching standard.

You can usually use PHP functions strlen() and mb_strlen() to detect byte-count values and character-count values, respectively.  If these functions give different values, you may have a multi-byte character string in play.  My guess is that you will find you have UTF-8 encoding in the data base, but you're getting ASCII data from the CSV file.
0
 
LVL 34

Accepted Solution

by:
Slick812 earned 600 total points
ID: 39251528
greetings  bobinorlando, for the error message, I deal with php error messages alot, and in yours the thing to look at is =
WHERE `userId` =

if you can look at that line of code and analyze the what comes after the "WHERE `userId` =", you may can trace the error, It is likely a variable such as -
WHERE `userId` = $getID
and this variable may suppose to be a NUMBER?, but the variable is empty, OR is a string with an improper   ( , ) " '  in it or other bad SQL syntax.  If this variable value comes out of the CSV , then the parser of the CSV may be erroring out, retuning false or something.
I really do not think the problem would be related to any text character values for other languages, at least as far as your SQL syntax error message.
0
 
LVL 111

Expert Comment

by:Ray Paseur
ID: 39291922
After seeing this question and a few others along the same lines, I wrote this article and provided some code examples to help move things forward.  Thanks for the points -- it's a great question and one that is becoming more and more frequent.
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/A_11880-Unicode-PHP-and-Character-Collisions.html

Best regards, ~Ray
0

Featured Post

NFR key for Veeam Backup for Microsoft Office 365

Veeam is happy to provide a free NFR license (for 1 year, up to 10 users). This license allows for the non‑production use of Veeam Backup for Microsoft Office 365 in your home lab without any feature limitations.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This article shows the steps required to install WordPress on Azure. Web Apps, Mobile Apps, API Apps, or Functions, in Azure all these run in an App Service plan. WordPress is no exception and requires an App Service Plan and Database to install
This article discusses how to implement server side field validation and display customized error messages to the client.
The purpose of this video is to demonstrate how to prevent comment spam on a WordPress Website. This will be demonstrated using a Windows 8 PC. Plugin Akismet will be used. Go to your WordPress login page. This will look like the following: myw…
In this video, Percona Solution Engineer Dimitri Vanoverbeke discusses why you want to use at least three nodes in a database cluster. To discuss how Percona Consulting can help with your design and architecture needs for your database and infras…
Suggested Courses

604 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question