Solved

Importing MySQL File with Special Characters

Posted on 2013-06-23
8
841 Views
Last Modified: 2013-06-24
I am trying to transfer a WordPress installation to a new server and am experiencing errors bringing over the pages and posts.  Specifically, any special characters will escape the *.sql file I am trying to upload, so the full post is not transferred into the new database.  

Here is an example where the "Welcome to" is copied into the new page, but when it hits the double quotes for “Alternative Considerations of Jonestown and Peoples Temple,” it stops importing the content, and moves on:
INSERT INTO `wp_posts` (`ID`, `post_author`, `post_date`, `post_date_gmt`, `post_content`, `post_title`, `post_excerpt`, `post_status`, `comment_status`, `ping_status`, `post_password`, `post_name`, `to_ping`, `pinged`, `post_modified`, `post_modified_gmt`, `post_content_filtered`, `post_parent`, `guid`, `menu_order`, `post_type`, `post_mime_type`, `comment_count`) VALUES
(7, 1, '2013-01-05 18:33:49', '2013-01-06 02:33:49', '[caption id="attachment_69" align="alignright" width="240"]<a href="http://jonestown.sdsu.edu/wordpress/?attachment_id=69" rel="attachment wp-att-69"><img class="size-full wp-image-69" alt="Children in Peoples Temple" src="http://jonestown.sdsu.edu/wordpress/wp-content/uploads/2013/01/snapshot_children.jpg" width="240" height="213" /></a> Children in Peoples Temple[/caption]\r\n\r\nWelcome to “Alternative Considerations of Jonestown and Peoples Temple,” sponsored by the Department of Religious Studies at San Diego State University. This website is designed to give personal and scholarly perspectives on a major event in the history of religion in America. Its primary purpose is to present information about Peoples Temple as accurately and objectively as possible. In an effort to be impartial, we offer many diverse views and opinions about the Temple and the events in Jonestown.\r\n\r\nWe hope that visitors to the site will come away with an understanding that the story of Jonestown did not start or end on 18 November 1978.\r\n\r\nWhat is unique about this website are three main features:\r\n<ol>\r\n      <li>Memorialization of those who died and those who survived the tragedy of 18 November 1978 in order to remember their lives and humanize their deaths.</li>\r\n      <li>Documentation of the numerous government investigations into Peoples Temple and Jonestown through materials released under the Freedom of Information Act.</li>\r\n      <li>Presentation of Peoples Temple and its members in their own words: through articles, tapes, letters, photographs and other items. These materials let readers make their own judgments about the group and its end.</li>\r\n</ol>\r\nTape transcripts, summaries, some primary source documents, and photographs not otherwise designated as copyrighted on this site are free and available to the public for use by crediting: The Jonestown Institute, <a title="http://jonestown.sdsu.edu" href="http://jonestown.sdsu.edu">http://jonestown.sdsu.edu</a>. Other items require the author’s permission for re-publication.\r\n<table style="width: 100%; background-color: #f2e4c9; box-shadow: 0 0 3px rgba(0, 0, 0, 0.1);" border="0" cellspacing="0" cellpadding="0">\r\n<tbody>\r\n<tr>\r\n<td style="width: 22%;" valign="top"><a href="mailto:remoore@mail.sdsu.edu">Rebecca Moore</a>\r\nSite Manager</td>\r\n<td style="width: 28%;" valign="top"><a href="mailto:fieldingmcgehee@yahoo.com">Fielding M. McGehee III</a>\r\nResearch Director</td>\r\n<td style="width: 22%;" valign="top"><a href="mailto:ekshaules@gmail.com"> Elizabeth Shaules</a>\r\nSite Designer</td>\r\n<td style="width: 28%;" valign="top"><a href="mailto:Rikke@Wettendorff.net"><span>Rikke Wettendorff</span></a>\r\n<span class="MsoNormal">Peoples Temple\r\nDiscussion Forum</span></td>\r\n</tr>\r\n</tbody>\r\n</table>', 'Welcome!', '', 'publish', 'open', 'open', '', 'welcome', '', '', '2013-01-11 21:48:00', '2013-01-12 05:48:00', '', 0, 'http://jonestown.sdsu.edu/wordpress/?page_id=7', 0, 'page', '', 0),

How can I get around this?  

Thanks in advance!
0
Comment
Question by:maeve100
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
8 Comments
 
LVL 15

Expert Comment

by:gplana
ID: 39269045
Have you identified which are the special characters that aren't recognized well? Maybe it's a problem of character encoding? See if your file is using ASCII and your database is using UTF-8, for example.
0
 
LVL 34

Expert Comment

by:Beverley Portlock
ID: 39269194
The SQL above works just fine on a Wordpress import. In my test the table type was UTF8_general_ci however the SQL did fail until I removed the trailing comma

', 0, 'page', '', 0),  

changed to

', 0, 'page', '', 0)

(Last comma removed)
0
 
LVL 15

Expert Comment

by:gplana
ID: 39269207
You should precede any single quote (') with a back-slash (\')

So this snipped:
               ', 0, 'page', '', 0),  
should be:
               \', 0, \'page\', \'\', 0),
0
Windows Server 2016: All you need to know

Learn about Hyper-V features that increase functionality and usability of Microsoft Windows Server 2016. Also, throughout this eBook, you’ll find some basic PowerShell examples that will help you leverage the scripts in your environments!

 
LVL 110

Accepted Solution

by:
Ray Paseur earned 400 total points
ID: 39269429
ASCII is a subset of UTF-8, and no UTF-8 multibyte character appears in ASCII, so by definition a collision cannot occur between these encodings.  However Windows-1252 defines characters in the "dead zones" of both UTF-8 and ISO-8859-x.  The quote mark immediately before "Alternative" is hex 93, decimal 147 and the trailing quote mark is hex 94, decimal code point 148.  These are invalid codes unless your entire process is using Windows-1252 encoding.  See these links:
http://www.laprbass.com/RAY_temp_maeve100.php?charset=windows-1252 WORKS
http://www.laprbass.com/RAY_temp_maeve100.php?charset=utf-8 FAILS
http://www.laprbass.com/RAY_temp_maeve100.php?charset=ISO-8859-1 WORKS IN SOME BROWSERS

PHP internal encoding, absent any deliberate changes, is ISO-8859-1.  So my conclusion is that there is a character-encoding error somewhere along the line.  You might want to get the background knowledge from this article.
http://www.joelonsoftware.com/articles/Unicode.html

Check the original installation to see what character set was used in each of the HTML, the PHP, the MySQL.  If you use those same encodings consistently in the new installation you should be OK.

If you want to experiment with different encodings, this script will help.

<?php // RAY_temp_maeve100.php
error_reporting(E_ALL);


// SEE http://www.joelonsoftware.com/articles/Unicode.html


// CHOOSE A CHARSET VALUE FROM THE URL ARGUMENT utf-8, windows-1252, iso-8859-1, iso-8859-15, etc.
$charset = isset($_GET['charset']) ? $_GET['charset'] : 'windows-1252';

// SET THE INTERNAL ENCODING
mb_internal_encoding($charset);


// START WITH HTML5 DOCTYPE
$html5 = <<<ENDHTML5
<!DOCTYPE html>
<html dir="ltr" lang="en-US">
<head>
<meta charset="$charset" />
<title>CHARACTER SET $charset</title>
</head>
<body>
<pre>
THE ORIGINAL CHARACTER SET IS <b>$charset</b>

Welcome to “Alternative Considerations of Jonestown and Peoples Temple,” sponsored by 

ENDHTML5;

echo $html5;

Open in new window

HTH, ~Ray
0
 
LVL 70

Assisted Solution

by:Jason C. Levine
Jason C. Levine earned 100 total points
ID: 39269573
The "fancy quotes" can cause problems on import.  I always suggest doing a search and replace on those from their normal state to the HTML entity version.
0
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 39269632
This will replace the angle-quotes with numeric entities.  Use "View Source" to see the results.

<?php // RAY_temp_maeve100.php
error_reporting(E_ALL);


// THE STRING WITH MICROSOFT CHARACTERS
$str = 'Welcome to “Alternative Considerations of Jonestown and Peoples Temple,” sponsored by';

// TRANSFORM THE QUOTES
$arr = array
( '“' => '&#147;'
, '”' => '&#148;'
)
;
$new = strtr($str, $arr);

// SHOW THE NEW VALUES
echo $new;

Open in new window

HTH, ~Ray
0
 

Author Closing Comment

by:maeve100
ID: 39272541
Unfortunately, I ended up having to comb the SQL files and replace all the special characters with their HTML counterpart.  I'm not sure why the import is so sensitive that it can't compensate for things like this - even an accented letter would break it.
0
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 39273034
Yeah, it's a nasty problem once the data base gets polluted!
0

Featured Post

Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This article discusses four methods for overlaying images in a container on a web page
When table data gets too large to manage or queries take too long to execute the solution is often to buy bigger hardware or assign more CPUs and memory resources to the machine to solve the problem. However, the best, cheapest and most effective so…
The viewer will learn how to look for a specific file type in a local or remote server directory using PHP.
This tutorial will teach you the core code needed to finalize the addition of a watermark to your image. The viewer will use a small PHP class to learn and create a watermark.

756 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question