Solved

Importing MySQL File with Special Characters

Posted on 2013-06-23
8
836 Views
Last Modified: 2013-06-24
I am trying to transfer a WordPress installation to a new server and am experiencing errors bringing over the pages and posts.  Specifically, any special characters will escape the *.sql file I am trying to upload, so the full post is not transferred into the new database.  

Here is an example where the "Welcome to" is copied into the new page, but when it hits the double quotes for “Alternative Considerations of Jonestown and Peoples Temple,” it stops importing the content, and moves on:
INSERT INTO `wp_posts` (`ID`, `post_author`, `post_date`, `post_date_gmt`, `post_content`, `post_title`, `post_excerpt`, `post_status`, `comment_status`, `ping_status`, `post_password`, `post_name`, `to_ping`, `pinged`, `post_modified`, `post_modified_gmt`, `post_content_filtered`, `post_parent`, `guid`, `menu_order`, `post_type`, `post_mime_type`, `comment_count`) VALUES
(7, 1, '2013-01-05 18:33:49', '2013-01-06 02:33:49', '[caption id="attachment_69" align="alignright" width="240"]<a href="http://jonestown.sdsu.edu/wordpress/?attachment_id=69" rel="attachment wp-att-69"><img class="size-full wp-image-69" alt="Children in Peoples Temple" src="http://jonestown.sdsu.edu/wordpress/wp-content/uploads/2013/01/snapshot_children.jpg" width="240" height="213" /></a> Children in Peoples Temple[/caption]\r\n\r\nWelcome to “Alternative Considerations of Jonestown and Peoples Temple,” sponsored by the Department of Religious Studies at San Diego State University. This website is designed to give personal and scholarly perspectives on a major event in the history of religion in America. Its primary purpose is to present information about Peoples Temple as accurately and objectively as possible. In an effort to be impartial, we offer many diverse views and opinions about the Temple and the events in Jonestown.\r\n\r\nWe hope that visitors to the site will come away with an understanding that the story of Jonestown did not start or end on 18 November 1978.\r\n\r\nWhat is unique about this website are three main features:\r\n<ol>\r\n      <li>Memorialization of those who died and those who survived the tragedy of 18 November 1978 in order to remember their lives and humanize their deaths.</li>\r\n      <li>Documentation of the numerous government investigations into Peoples Temple and Jonestown through materials released under the Freedom of Information Act.</li>\r\n      <li>Presentation of Peoples Temple and its members in their own words: through articles, tapes, letters, photographs and other items. These materials let readers make their own judgments about the group and its end.</li>\r\n</ol>\r\nTape transcripts, summaries, some primary source documents, and photographs not otherwise designated as copyrighted on this site are free and available to the public for use by crediting: The Jonestown Institute, <a title="http://jonestown.sdsu.edu" href="http://jonestown.sdsu.edu">http://jonestown.sdsu.edu</a>. Other items require the author’s permission for re-publication.\r\n<table style="width: 100%; background-color: #f2e4c9; box-shadow: 0 0 3px rgba(0, 0, 0, 0.1);" border="0" cellspacing="0" cellpadding="0">\r\n<tbody>\r\n<tr>\r\n<td style="width: 22%;" valign="top"><a href="mailto:remoore@mail.sdsu.edu">Rebecca Moore</a>\r\nSite Manager</td>\r\n<td style="width: 28%;" valign="top"><a href="mailto:fieldingmcgehee@yahoo.com">Fielding M. McGehee III</a>\r\nResearch Director</td>\r\n<td style="width: 22%;" valign="top"><a href="mailto:ekshaules@gmail.com"> Elizabeth Shaules</a>\r\nSite Designer</td>\r\n<td style="width: 28%;" valign="top"><a href="mailto:Rikke@Wettendorff.net"><span>Rikke Wettendorff</span></a>\r\n<span class="MsoNormal">Peoples Temple\r\nDiscussion Forum</span></td>\r\n</tr>\r\n</tbody>\r\n</table>', 'Welcome!', '', 'publish', 'open', 'open', '', 'welcome', '', '', '2013-01-11 21:48:00', '2013-01-12 05:48:00', '', 0, 'http://jonestown.sdsu.edu/wordpress/?page_id=7', 0, 'page', '', 0),

How can I get around this?  

Thanks in advance!
0
Comment
Question by:maeve100
8 Comments
 
LVL 15

Expert Comment

by:gplana
Comment Utility
Have you identified which are the special characters that aren't recognized well? Maybe it's a problem of character encoding? See if your file is using ASCII and your database is using UTF-8, for example.
0
 
LVL 34

Expert Comment

by:Beverley Portlock
Comment Utility
The SQL above works just fine on a Wordpress import. In my test the table type was UTF8_general_ci however the SQL did fail until I removed the trailing comma

', 0, 'page', '', 0),  

changed to

', 0, 'page', '', 0)

(Last comma removed)
0
 
LVL 15

Expert Comment

by:gplana
Comment Utility
You should precede any single quote (') with a back-slash (\')

So this snipped:
               ', 0, 'page', '', 0),  
should be:
               \', 0, \'page\', \'\', 0),
0
 
LVL 108

Accepted Solution

by:
Ray Paseur earned 400 total points
Comment Utility
ASCII is a subset of UTF-8, and no UTF-8 multibyte character appears in ASCII, so by definition a collision cannot occur between these encodings.  However Windows-1252 defines characters in the "dead zones" of both UTF-8 and ISO-8859-x.  The quote mark immediately before "Alternative" is hex 93, decimal 147 and the trailing quote mark is hex 94, decimal code point 148.  These are invalid codes unless your entire process is using Windows-1252 encoding.  See these links:
http://www.laprbass.com/RAY_temp_maeve100.php?charset=windows-1252 WORKS
http://www.laprbass.com/RAY_temp_maeve100.php?charset=utf-8 FAILS
http://www.laprbass.com/RAY_temp_maeve100.php?charset=ISO-8859-1 WORKS IN SOME BROWSERS

PHP internal encoding, absent any deliberate changes, is ISO-8859-1.  So my conclusion is that there is a character-encoding error somewhere along the line.  You might want to get the background knowledge from this article.
http://www.joelonsoftware.com/articles/Unicode.html

Check the original installation to see what character set was used in each of the HTML, the PHP, the MySQL.  If you use those same encodings consistently in the new installation you should be OK.

If you want to experiment with different encodings, this script will help.

<?php // RAY_temp_maeve100.php
error_reporting(E_ALL);


// SEE http://www.joelonsoftware.com/articles/Unicode.html


// CHOOSE A CHARSET VALUE FROM THE URL ARGUMENT utf-8, windows-1252, iso-8859-1, iso-8859-15, etc.
$charset = isset($_GET['charset']) ? $_GET['charset'] : 'windows-1252';

// SET THE INTERNAL ENCODING
mb_internal_encoding($charset);


// START WITH HTML5 DOCTYPE
$html5 = <<<ENDHTML5
<!DOCTYPE html>
<html dir="ltr" lang="en-US">
<head>
<meta charset="$charset" />
<title>CHARACTER SET $charset</title>
</head>
<body>
<pre>
THE ORIGINAL CHARACTER SET IS <b>$charset</b>

Welcome to “Alternative Considerations of Jonestown and Peoples Temple,” sponsored by 

ENDHTML5;

echo $html5;

Open in new window

HTH, ~Ray
0
Enabling OSINT in Activity Based Intelligence

Activity based intelligence (ABI) requires access to all available sources of data. Recorded Future allows analysts to observe structured data on the open, deep, and dark web.

 
LVL 70

Assisted Solution

by:Jason C. Levine
Jason C. Levine earned 100 total points
Comment Utility
The "fancy quotes" can cause problems on import.  I always suggest doing a search and replace on those from their normal state to the HTML entity version.
0
 
LVL 108

Expert Comment

by:Ray Paseur
Comment Utility
This will replace the angle-quotes with numeric entities.  Use "View Source" to see the results.

<?php // RAY_temp_maeve100.php
error_reporting(E_ALL);


// THE STRING WITH MICROSOFT CHARACTERS
$str = 'Welcome to “Alternative Considerations of Jonestown and Peoples Temple,” sponsored by';

// TRANSFORM THE QUOTES
$arr = array
( '“' => '&#147;'
, '”' => '&#148;'
)
;
$new = strtr($str, $arr);

// SHOW THE NEW VALUES
echo $new;

Open in new window

HTH, ~Ray
0
 

Author Closing Comment

by:maeve100
Comment Utility
Unfortunately, I ended up having to comb the SQL files and replace all the special characters with their HTML counterpart.  I'm not sure why the import is so sensitive that it can't compensate for things like this - even an accented letter would break it.
0
 
LVL 108

Expert Comment

by:Ray Paseur
Comment Utility
Yeah, it's a nasty problem once the data base gets polluted!
0

Featured Post

Maximize Your Threat Intelligence Reporting

Reporting is one of the most important and least talked about aspects of a world-class threat intelligence program. Here’s how to do it right.

Join & Write a Comment

Foreword (July, 2015) Since I first wrote this article, years ago, a great many more people have begun using the internet.  They are coming online from every part of the globe, learning, reading, shopping and spending money at an ever-increasing ra…
Introduction Since I wrote the original article about Handling Date and Time in PHP and MySQL (http://www.experts-exchange.com/articles/201/Handling-Date-and-Time-in-PHP-and-MySQL.html) several years ago, it seemed like now was a good time to updat…
The purpose of this video is to demonstrate how to manually back up a WordPress Database. This will be demonstrated using a Windows 8 PC. The Host used will be IPage.com Log into your Hosting account. IPage will be used for demonstration : Locat…
The viewer will learn how to create a basic form using some HTML5 and PHP for later processing. Set up your basic HTML file. Open your form tag and set the method and action attributes.: (CODE) Set up your first few inputs one for the name and …

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

10 Experts available now in Live!

Get 1:1 Help Now