Solved

Importing MySQL File with Special Characters

Posted on 2013-06-23
8
844 Views
Last Modified: 2013-06-24
I am trying to transfer a WordPress installation to a new server and am experiencing errors bringing over the pages and posts.  Specifically, any special characters will escape the *.sql file I am trying to upload, so the full post is not transferred into the new database.  

Here is an example where the "Welcome to" is copied into the new page, but when it hits the double quotes for “Alternative Considerations of Jonestown and Peoples Temple,” it stops importing the content, and moves on:
INSERT INTO `wp_posts` (`ID`, `post_author`, `post_date`, `post_date_gmt`, `post_content`, `post_title`, `post_excerpt`, `post_status`, `comment_status`, `ping_status`, `post_password`, `post_name`, `to_ping`, `pinged`, `post_modified`, `post_modified_gmt`, `post_content_filtered`, `post_parent`, `guid`, `menu_order`, `post_type`, `post_mime_type`, `comment_count`) VALUES
(7, 1, '2013-01-05 18:33:49', '2013-01-06 02:33:49', '[caption id="attachment_69" align="alignright" width="240"]<a href="http://jonestown.sdsu.edu/wordpress/?attachment_id=69" rel="attachment wp-att-69"><img class="size-full wp-image-69" alt="Children in Peoples Temple" src="http://jonestown.sdsu.edu/wordpress/wp-content/uploads/2013/01/snapshot_children.jpg" width="240" height="213" /></a> Children in Peoples Temple[/caption]\r\n\r\nWelcome to “Alternative Considerations of Jonestown and Peoples Temple,” sponsored by the Department of Religious Studies at San Diego State University. This website is designed to give personal and scholarly perspectives on a major event in the history of religion in America. Its primary purpose is to present information about Peoples Temple as accurately and objectively as possible. In an effort to be impartial, we offer many diverse views and opinions about the Temple and the events in Jonestown.\r\n\r\nWe hope that visitors to the site will come away with an understanding that the story of Jonestown did not start or end on 18 November 1978.\r\n\r\nWhat is unique about this website are three main features:\r\n<ol>\r\n      <li>Memorialization of those who died and those who survived the tragedy of 18 November 1978 in order to remember their lives and humanize their deaths.</li>\r\n      <li>Documentation of the numerous government investigations into Peoples Temple and Jonestown through materials released under the Freedom of Information Act.</li>\r\n      <li>Presentation of Peoples Temple and its members in their own words: through articles, tapes, letters, photographs and other items. These materials let readers make their own judgments about the group and its end.</li>\r\n</ol>\r\nTape transcripts, summaries, some primary source documents, and photographs not otherwise designated as copyrighted on this site are free and available to the public for use by crediting: The Jonestown Institute, <a title="http://jonestown.sdsu.edu" href="http://jonestown.sdsu.edu">http://jonestown.sdsu.edu</a>. Other items require the author’s permission for re-publication.\r\n<table style="width: 100%; background-color: #f2e4c9; box-shadow: 0 0 3px rgba(0, 0, 0, 0.1);" border="0" cellspacing="0" cellpadding="0">\r\n<tbody>\r\n<tr>\r\n<td style="width: 22%;" valign="top"><a href="mailto:remoore@mail.sdsu.edu">Rebecca Moore</a>\r\nSite Manager</td>\r\n<td style="width: 28%;" valign="top"><a href="mailto:fieldingmcgehee@yahoo.com">Fielding M. McGehee III</a>\r\nResearch Director</td>\r\n<td style="width: 22%;" valign="top"><a href="mailto:ekshaules@gmail.com"> Elizabeth Shaules</a>\r\nSite Designer</td>\r\n<td style="width: 28%;" valign="top"><a href="mailto:Rikke@Wettendorff.net"><span>Rikke Wettendorff</span></a>\r\n<span class="MsoNormal">Peoples Temple\r\nDiscussion Forum</span></td>\r\n</tr>\r\n</tbody>\r\n</table>', 'Welcome!', '', 'publish', 'open', 'open', '', 'welcome', '', '', '2013-01-11 21:48:00', '2013-01-12 05:48:00', '', 0, 'http://jonestown.sdsu.edu/wordpress/?page_id=7', 0, 'page', '', 0),

How can I get around this?  

Thanks in advance!
0
Comment
Question by:maeve100
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
8 Comments
 
LVL 15

Expert Comment

by:gplana
ID: 39269045
Have you identified which are the special characters that aren't recognized well? Maybe it's a problem of character encoding? See if your file is using ASCII and your database is using UTF-8, for example.
0
 
LVL 34

Expert Comment

by:Beverley Portlock
ID: 39269194
The SQL above works just fine on a Wordpress import. In my test the table type was UTF8_general_ci however the SQL did fail until I removed the trailing comma

', 0, 'page', '', 0),  

changed to

', 0, 'page', '', 0)

(Last comma removed)
0
 
LVL 15

Expert Comment

by:gplana
ID: 39269207
You should precede any single quote (') with a back-slash (\')

So this snipped:
               ', 0, 'page', '', 0),  
should be:
               \', 0, \'page\', \'\', 0),
0
Learn how to optimize MySQL for your business need

With the increasing importance of apps & networks in both business & personal interconnections, perfor. has become one of the key metrics of successful communication. This ebook is a hands-on business-case-driven guide to understanding MySQL query parameter tuning & database perf

 
LVL 110

Accepted Solution

by:
Ray Paseur earned 400 total points
ID: 39269429
ASCII is a subset of UTF-8, and no UTF-8 multibyte character appears in ASCII, so by definition a collision cannot occur between these encodings.  However Windows-1252 defines characters in the "dead zones" of both UTF-8 and ISO-8859-x.  The quote mark immediately before "Alternative" is hex 93, decimal 147 and the trailing quote mark is hex 94, decimal code point 148.  These are invalid codes unless your entire process is using Windows-1252 encoding.  See these links:
http://www.laprbass.com/RAY_temp_maeve100.php?charset=windows-1252 WORKS
http://www.laprbass.com/RAY_temp_maeve100.php?charset=utf-8 FAILS
http://www.laprbass.com/RAY_temp_maeve100.php?charset=ISO-8859-1 WORKS IN SOME BROWSERS

PHP internal encoding, absent any deliberate changes, is ISO-8859-1.  So my conclusion is that there is a character-encoding error somewhere along the line.  You might want to get the background knowledge from this article.
http://www.joelonsoftware.com/articles/Unicode.html

Check the original installation to see what character set was used in each of the HTML, the PHP, the MySQL.  If you use those same encodings consistently in the new installation you should be OK.

If you want to experiment with different encodings, this script will help.

<?php // RAY_temp_maeve100.php
error_reporting(E_ALL);


// SEE http://www.joelonsoftware.com/articles/Unicode.html


// CHOOSE A CHARSET VALUE FROM THE URL ARGUMENT utf-8, windows-1252, iso-8859-1, iso-8859-15, etc.
$charset = isset($_GET['charset']) ? $_GET['charset'] : 'windows-1252';

// SET THE INTERNAL ENCODING
mb_internal_encoding($charset);


// START WITH HTML5 DOCTYPE
$html5 = <<<ENDHTML5
<!DOCTYPE html>
<html dir="ltr" lang="en-US">
<head>
<meta charset="$charset" />
<title>CHARACTER SET $charset</title>
</head>
<body>
<pre>
THE ORIGINAL CHARACTER SET IS <b>$charset</b>

Welcome to “Alternative Considerations of Jonestown and Peoples Temple,” sponsored by 

ENDHTML5;

echo $html5;

Open in new window

HTH, ~Ray
0
 
LVL 70

Assisted Solution

by:Jason C. Levine
Jason C. Levine earned 100 total points
ID: 39269573
The "fancy quotes" can cause problems on import.  I always suggest doing a search and replace on those from their normal state to the HTML entity version.
0
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 39269632
This will replace the angle-quotes with numeric entities.  Use "View Source" to see the results.

<?php // RAY_temp_maeve100.php
error_reporting(E_ALL);


// THE STRING WITH MICROSOFT CHARACTERS
$str = 'Welcome to “Alternative Considerations of Jonestown and Peoples Temple,” sponsored by';

// TRANSFORM THE QUOTES
$arr = array
( '“' => '&#147;'
, '”' => '&#148;'
)
;
$new = strtr($str, $arr);

// SHOW THE NEW VALUES
echo $new;

Open in new window

HTH, ~Ray
0
 

Author Closing Comment

by:maeve100
ID: 39272541
Unfortunately, I ended up having to comb the SQL files and replace all the special characters with their HTML counterpart.  I'm not sure why the import is so sensitive that it can't compensate for things like this - even an accented letter would break it.
0
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 39273034
Yeah, it's a nasty problem once the data base gets polluted!
0

Featured Post

Get your Conversational Ransomware Defense e‑book

This e-book gives you an insight into the ransomware threat and reviews the fundamentals of top-notch ransomware preparedness and recovery. To help you protect yourself and your organization. The initial infection may be inevitable, so the best protection is to be fully prepared.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This article discusses four methods for overlaying images in a container on a web page
Introduction This article is intended for those who are new to PHP error handling (https://www.experts-exchange.com/articles/11769/And-by-the-way-I-am-New-to-PHP.html).  It addresses one of the most common problems that plague beginning PHP develop…
The purpose of this video is to demonstrate how to manually back up a WordPress Database. This will be demonstrated using a Windows 8 PC. The Host used will be IPage.com Log into your Hosting account. IPage will be used for demonstration : Locat…
The purpose of this video is to demonstrate how to Import and export files in WordPress. This will be demonstrated using a Windows 8 PC. Go to your WordPress login page. This will look like the following: mywebsite.com/wp-login.php : Click on Too…

635 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question