Solved

How to zap "curly quotes" or "smart quotes" from scraped page

Posted on 2004-08-01
Last Modified: 2012-06-21
I'm scraping content (news headlines) from another web site, which I then feed into a MySQL database. But the morons who create content for the site have "special quotes" or "curly quotes" generated by Microsoft products in their text. So when a headline has an apostrophe, it shows up as a question mark on my web pages: instead of "can't", you see "can?t".

I ran the HTML generated by the offending page through a hex editor and found the apostrophe it creates has a hex code of 92. A real apostrophe has a hex code of 27.

I've always been confused as hell by character sets and getting around these kinds of problems. Is there a function in PHP that will solve this problem? I tried the htmlspecialchars() function with no results. Any tips/help would be great.
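For reference, the literal "zap" described in the question can be sketched as a byte-for-byte swap. This is only a sketch under an assumption the thread does not confirm: that the scraped text is Windows-1252, where 0x91 to 0x94 are the curly single and double quotes. The function name zapSmartQuotes and the sample headline are made up for illustration.

```php
<?php
// Windows-1252 "smart" punctuation bytes mapped to plain ASCII.
// Assumption: the scraped text is Windows-1252, not UTF-8.
$smartToPlain = [
    "\x91" => "'",  // left single quote
    "\x92" => "'",  // right single quote, used as the apostrophe
    "\x93" => '"',  // left double quote
    "\x94" => '"',  // right double quote
];

// Hypothetical helper: replace each mapped byte with its ASCII stand-in.
function zapSmartQuotes(string $text, array $map): string
{
    // strtr() with an array replaces every key with its value.
    return strtr($text, $map);
}

// Made-up sample headline containing the offending bytes.
$headline = "She said \x93can\x92t\x94";

echo bin2hex("\x92"), " becomes ", bin2hex("'"), "\n";
echo zapSmartQuotes($headline, $smartToPlain), "\n";
```

This should not be run on text that is already UTF-8, since the same byte values can occur inside multibyte characters there.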
Question by:nysus1
LVL 36

Accepted Solution

by:Zyloch (earned 500 total points)
ID: 11689316
You can try htmlentities()
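A minimal sketch of that suggestion, assuming the scraped bytes are Windows-1252 (which matches the hex 92 found in the question, but is still an assumption about the source site). The charset is passed explicitly here rather than relying on the default, which differs between PHP versions. The sample string is made up.

```php
<?php
// Made-up sample: "can't" with the Windows-1252 byte 0x92 as the apostrophe.
$headline = "can\x92t";

// Name the input charset explicitly so byte 0x92 is read as the
// Windows-1252 right single quotation mark (an assumption about the data).
$safe = htmlentities($headline, ENT_QUOTES, 'cp1252');

// The result is pure ASCII, so it survives whatever charset the page uses.
echo $safe, "\n";
```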

Regards,
${Zyloch}
 

Author Comment

by:nysus1
ID: 11689347
You da man! That did it. But why would htmlentities work and not htmlspecialchars? The PHP manual at http://us4.php.net/htmlentities says the two functions are identical 'except with htmlentities(), all characters which have HTML character entity equivalents are translated into these entities.' Not too clear to me.

 
LVL 36

Expert Comment

by:Zyloch
ID: 11689387
Yes, it means htmlspecialchars only changes a handful of basic chars (&, ", ', < and >), but htmlentities changes every single one that has an equivalent named entity in HTML, which covers far more than the basic ASCII chart.
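The difference can be seen side by side in a small sketch. It assumes Windows-1252 input and uses a made-up sample string; ENT_QUOTES is passed so that single quotes are handled as well.

```php
<?php
// Made-up sample with Windows-1252 curly quotes (0x93, 0x92, 0x94) and an ampersand.
$text = "\x93Tom & Jerry\x92s\x94";

// htmlspecialchars() only touches &, ", ', < and >, so the curly-quote bytes pass through.
$special = htmlspecialchars($text, ENT_QUOTES, 'cp1252');

// htmlentities() also converts every character that has a named HTML entity.
$entities = htmlentities($text, ENT_QUOTES, 'cp1252');

echo bin2hex($special), "\n";  // hex dump, since the raw bytes are not printable ASCII
echo $entities, "\n";
```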

Regards,
${Zyloch}

Author Comment

by:nysus1
ID: 11689466
OK, one more question then, if I might. Why does htmlspecialchars need a third argument, then? If all it is translating is the &, ", ', <, and > chars, why would it need to know which character set to use in the conversion? Wouldn't those basic characters have the same ASCII code across the different character sets?
 
LVL 36

Expert Comment

by:Zyloch
ID: 11689717
Not necessarily. For instance, Big5 is a multibyte encoding used mainly for Traditional Chinese, so the same byte values can stand for different characters than they do in a single-byte set. Most of the time, though, you'll only be using the default.
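One place where the third argument visibly matters goes a little beyond what the thread covers: in PHP 5.4 and later, the input is checked against the named charset. The sketch below assumes such a PHP version and reuses a made-up sample containing byte 0x92, which is not valid UTF-8 on its own.

```php
<?php
// Made-up sample: 0x92 is a valid Windows-1252 character but not valid UTF-8.
$headline = "can\x92t";

// Declared as UTF-8: the input is rejected and an empty string comes back.
$asUtf8 = htmlspecialchars($headline, ENT_QUOTES, 'UTF-8');

// Declared as Windows-1252: the byte is accepted and passed through unchanged.
$asCp1252 = htmlspecialchars($headline, ENT_QUOTES, 'cp1252');

// ENT_SUBSTITUTE swaps the invalid byte for U+FFFD instead of giving up.
$substituted = htmlspecialchars($headline, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($asUtf8 === '');
echo bin2hex($asCp1252), "\n";
echo bin2hex($substituted), "\n";
```

So even when only five characters are translated, the function still has to know how to read the rest of the string.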

Regards,
${Zyloch}
 

Author Comment

by:nysus1
ID: 11690688
OK, thanks for your help and explanation.