Solved

How to zap "curly quotes" or "smart quotes" from scraped page

Posted on 2004-08-01
Last Modified: 2012-06-21
I'm scraping content (news headlines) from another web site, which I then feed into a MySQL database. But the morons who create content for the site have "special quotes" or "curly quotes" generated by Microsoft products in their text. So when a headline has an apostrophe, it shows up as a question mark on my web pages: instead of "can't", you see "can?t".

I ran the HTML generated by the offending page through a hex editor and found that the apostrophe it creates has a hex code of 92. A real apostrophe has a hex code of 27.
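
The same byte check can be done from PHP instead of a hex editor. This is only a minimal sketch, assuming the scraped page is encoded as Windows-1252 (where byte 0x92 is the right curly quote); the helper name `zap_curly_quotes` is made up for illustration and is not a PHP built-in:

```php
<?php
// Sample headline containing the Windows-1252 "smart" apostrophe (byte 0x92).
$headline = "can\x92t";

// bin2hex() exposes the raw bytes: 63 61 6e 92 74
echo bin2hex($headline), "\n";

// Hypothetical helper: swap the four Windows-1252 curly quote bytes
// for their plain ASCII equivalents.
function zap_curly_quotes(string $text): string
{
    return strtr($text, [
        "\x91" => "'",  // left single quote
        "\x92" => "'",  // right single quote / apostrophe
        "\x93" => '"',  // left double quote
        "\x94" => '"',  // right double quote
    ]);
}

echo zap_curly_quotes($headline), "\n"; // can't
```

This byte-for-byte swap only makes sense for single-byte Windows-1252 input. In UTF-8 text, bytes 0x91 to 0x94 can be part of other multi-byte characters, so it should not be run on UTF-8 strings.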

I've always been confused as hell by character sets and getting around these kinds of problems. Is there a function in PHP that will solve this problem? I tried the htmlspecialchars() function with no results. Any tips/help would be great.
Question by:nysus1
LVL 36

Accepted Solution

by:
Zyloch earned 2000 total points
ID: 11689316
You can try htmlentities()
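
A minimal sketch of that suggestion, assuming the scraped text is Windows-1252. On current PHP versions the default encoding is UTF-8, so the charset is passed explicitly here; the `'cp1252'` argument is an assumption about the source page, not something stated in this thread:

```php
<?php
// 0x92 is the right curly quote in Windows-1252 (assumed source encoding).
$headline = "can\x92t";

// Convert every character that has a named HTML entity.
$safe = htmlentities($headline, ENT_QUOTES, 'cp1252');

echo $safe, "\n"; // can&rsquo;t
```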

Regards,
${Zyloch}

Author Comment

by:nysus1
ID: 11689347
You da man!  That did it.  But why would htmlentities work and not htmlspecialchars?  The PHP manual at http://us4.php.net/htmlentities says the two functions are identical 'except with htmlentities(), all characters which have HTML character entity equivalents are translated into these entities.'  Not too clear to me.

 
LVL 36

Expert Comment

by:Zyloch
ID: 11689387
Yes, it means htmlspecialchars only changes the handful of chars that have special meaning in HTML markup (&, ", < and >, plus ' if you pass ENT_QUOTES), but htmlentities changes every single char that has a named entity in HTML. That covers plenty of chars beyond plain ASCII, like accented letters and the curly quotes you ran into.
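
To see the difference side by side, here is a small sketch that runs both functions on the same string. It assumes the input bytes are Windows-1252; the sample text is invented for illustration:

```php
<?php
// Windows-1252 bytes: 0x92 = right curly quote, 0xE9 = e-acute (assumed encoding).
$text = "can\x92t & <b>caf\xE9</b>";

// Only the markup-significant characters are escaped;
// the 0x92 and 0xE9 bytes pass through untouched.
$special = htmlspecialchars($text, ENT_QUOTES, 'cp1252');

// Every character with a named entity is escaped,
// including the curly quote and the accented letter.
$entities = htmlentities($text, ENT_QUOTES, 'cp1252');

echo $entities, "\n"; // can&rsquo;t &amp; &lt;b&gt;caf&eacute;&lt;/b&gt;
```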

Regards,
${Zyloch}

Author Comment

by:nysus1
ID: 11689466
OK, one more question then, if I might.  Why does htmlspecialchars need a third argument, then?  If all it is translating is the &, ", ', <, and > chars, why would it need to know which character set to use in the conversion?  Wouldn't those basic characters have the same ASCII code across the different character sets?
 
LVL 36

Expert Comment

by:Zyloch
ID: 11689717
Not necessarily. For instance, Big5 is a multi-byte encoding used for Traditional Chinese, so a single character can span more than one byte and the function needs to know the charset to read the string correctly. Most of the time, though, you'll only be using the default.
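
As a rough illustration of why the charset matters even when only the basic characters are translated, the sketch below feeds the same bytes to htmlspecialchars() under two different encodings. The sample string and the choice of `'cp1252'` versus `'UTF-8'` are assumptions made for the example:

```php
<?php
// 0x92 is a legal character in Windows-1252 but an invalid sequence in UTF-8.
$headline = "can\x92t <b>";

// Treated as Windows-1252: the byte passes through, the tag is escaped.
$asCp1252 = htmlspecialchars($headline, ENT_QUOTES, 'cp1252');

// Treated as UTF-8: the invalid byte makes the function return an empty string.
$asUtf8 = htmlspecialchars($headline, ENT_QUOTES, 'UTF-8');

// ENT_SUBSTITUTE replaces the bad byte with U+FFFD instead of dropping everything.
$substituted = htmlspecialchars($headline, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($asUtf8 === ''); // bool(true)
```

So a wrong charset argument can silently wipe out a whole headline, which is one practical reason to state the encoding explicitly.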

Regards,
${Zyloch}
 

Author Comment

by:nysus1
ID: 11690688
OK, thanks for your help and explanation.