How to zap "curly quotes" or "smart quotes" from a scraped page
Posted on 2004-08-01
I'm scraping content (news headlines) from another web site and feeding it into a MySQL database. But the morons who create content for the site have "special quotes" or "curly quotes" generated by Microsoft products in their text. So when a headline has an apostrophe, it shows up as a question mark on my web pages: instead of "can't", you see "can?t".
I ran the HTML generated by the offending page through a hex editor and found that the apostrophe it creates has a hex code of 92. A real apostrophe has a hex code of 27.
I've always been confused as hell by character sets and by getting around these kinds of problems. Is there a function in PHP that will solve this problem? I tried the htmlspecialchars() function with no results. Any tips/help would be great.
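To make the question concrete, here is a minimal sketch of the kind of fix I'm after. It assumes the scraped text is in the Windows-1252 character set, where bytes 0x91 through 0x94 are the curly single and double quotes (0x92 being the apostrophe I found in the hex editor). The function name straighten_quotes() is made up for illustration; it is not a PHP built-in. As far as I can tell, htmlspecialchars() only escapes &, <, > and the straight quotes, which would explain why it leaves byte 0x92 alone.

```php
<?php
// Hypothetical helper: swaps Windows-1252 curly quotes for plain ASCII quotes.
// Assumption: $text is Windows-1252 (single-byte), NOT UTF-8. In UTF-8 these
// byte values can occur inside multi-byte characters, so a byte-level
// replacement like this one would corrupt the text.
function straighten_quotes($text)
{
    $map = [
        "\x91" => "'",  // left single quote
        "\x92" => "'",  // right single quote, used as the apostrophe
        "\x93" => '"',  // left double quote
        "\x94" => '"',  // right double quote
    ];

    // strtr() with an array replaces each key with its value in one pass.
    return strtr($text, $map);
}

// Byte 0x92 between "can" and "t", as seen in the scraped headline.
echo straighten_quotes("can\x92t"), "\n";  // prints: can't
```

Running the headline through something like this before the INSERT into MySQL should store 0x27 instead of 0x92. Whether the character set assumption holds for the scraped site is the part I'm unsure about.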