Solved

How to zap "curly quotes" or "smart quotes" from scraped page

Posted on 2004-08-01
Last Modified: 2012-06-21
I'm scraping content (news headlines) from another web site, which I then feed into a MySQL database.  But the morons who create content for the site have "special quotes" or "curly quotes" generated by Microsoft products in their text.  So when a headline has an apostrophe, it shows up as a question mark on my web pages: instead of "can't", you see "can?t".

I ran the HTML generated by the offending page through a hex editor and found that the apostrophe it creates has a hex code of 92.  A real apostrophe has a hex code of 27.

I've always been confused as hell by character sets and getting around these kinds of problems.  Is there a function in PHP that will solve this problem?  I tried the htmlspecialchars() function with no results.  Any tips/help would be great.
Question by:nysus1
LVL 36

Accepted Solution

by:
Zyloch earned 500 total points
ID: 11689316
You can try htmlentities()
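
A minimal sketch of that suggestion, assuming the scraped page is encoded as Windows-1252 (where byte 0x92 is the right single curly quote). The sample headline is made up, and passing the charset explicitly is an assumption added here so the result does not depend on the PHP version's default:

```php
<?php
// Assumption: the scraped bytes are Windows-1252, where 0x92 is the
// right single curly quote. The sample headline is made up.
$headline = "can\x92t";

// Name the source encoding explicitly so htmlentities() can map
// byte 0x92 to its named entity instead of relying on the default charset.
$safe = htmlentities($headline, ENT_QUOTES, 'Windows-1252');

echo $safe, "\n"; // can&rsquo;t
```

The browser then renders the entity as a curly apostrophe regardless of the encoding your own pages are served in.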

Regards,
${Zyloch}

Author Comment

by:nysus1
ID: 11689347
You da man!  That did it.  But why would htmlentities work and not htmlspecialchars?  The PHP manual at http://us4.php.net/htmlentities says the two functions are identical 'except with htmlentities(), all characters which have HTML character entity equivalents are translated into these entities.'  Not too clear to me.

LVL 36

Expert Comment

by:Zyloch
ID: 11689387
Yes, it means htmlspecialchars only changes &, ", < and > (plus ' when ENT_QUOTES is set), but htmlentities changes every character that has a named HTML entity equivalent. That covers a lot of characters outside the basic ASCII range, such as curly quotes and accented letters.
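
To make the difference concrete, here is a small sketch run on a made-up Windows-1252 string (the sample text and the choice of charset are assumptions, not taken from the scraped site):

```php
<?php
// Hypothetical Windows-1252 sample: curly double quotes (0x93, 0x94)
// and an e-acute (0xE9) around plain ASCII text.
$text = "Tom & Jerry: \x93caf\xE9\x94";

// htmlspecialchars() escapes only the markup-significant characters,
// so the curly quotes and the accented letter pass through as raw bytes.
$special = htmlspecialchars($text, ENT_QUOTES, 'Windows-1252');

// htmlentities() also converts every character that has a named entity.
$entities = htmlentities($text, ENT_QUOTES, 'Windows-1252');

echo $entities, "\n"; // Tom &amp; Jerry: &ldquo;caf&eacute;&rdquo;
```

In `$special` only the ampersand becomes `&amp;`; bytes 0x93, 0xE9 and 0x94 are still there unchanged, which is why the question marks kept showing up.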

Regards,
${Zyloch}

Author Comment

by:nysus1
ID: 11689466
OK, one more question then, if I might.  Why does htmlspecialchars need a third argument, then?  If all it is translating is the &, ", ', <, and > chars, why would it need to know which character set to use in the conversion?  Wouldn't those basic characters have the same ASCII code across the different character sets?
LVL 36

Expert Comment

by:Zyloch
ID: 11689717
Not necessarily. For instance, Big5 is a multibyte encoding used for Traditional Chinese, so the function has to know the encoding to work out where each character begins and ends in the byte stream. Most of the time, though, you'll only be using the default.
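
As a rough illustration of why the encoding matters, the sketch below reads the same byte 0x92 under two different charsets. It uses iconv() rather than htmlspecialchars(), which is a swapped-in technique chosen only to make the byte values visible, and it assumes the iconv extension is available:

```php
<?php
// The single byte from the hex editor in the original question.
$byte = "\x92";

// Read as Windows-1252, 0x92 is the right single curly quote (U+2019).
$asWindows1252 = iconv('Windows-1252', 'UTF-8', $byte);

// Read as ISO-8859-1, the same byte is an invisible control character (U+0092).
$asLatin1 = iconv('ISO-8859-1', 'UTF-8', $byte);

echo bin2hex($asWindows1252), "\n"; // e28099
echo bin2hex($asLatin1), "\n";      // c292
```

Same byte, two different characters, so any function that looks at text rather than raw bytes needs to be told which reading is the right one.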

Regards,
${Zyloch}

Author Comment

by:nysus1
ID: 11690688
OK, thanks for your help and explanation.