  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 155

How do you get these � marks in your text?

Hi There,
All over my news pages at http://www.bizgro.jobs/news/ you can see rogue glyphs like �
How on earth do they get there and how do I remove them without actually going through every single page and removing them?
Thanks in advance, A
Asked by: Amanda Watson
2 Solutions
 
Dave Baldwin (Fixer of Problems) commented:
They indicate a mismatch in character sets. Your page is declared as UTF-8, but the articles are encoded as Western (Latin-1). When I switch the encoding in Firefox to Western, those glyphs disappear. Probably the most common cause is copying and pasting text generated in Word, which uses Windows-1252, a Latin / Western character encoding.
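
To illustrate the mismatch described above, here is a minimal Python sketch. It assumes (hypothetically) that a short phrase with Word-style curly quotes was saved as Windows-1252 bytes and then read back as UTF-8, which is roughly what the browser does when the page is declared as UTF-8. The sample phrase is made up for illustration.

```python
# Text as Word might produce it: curly double quotes and a curly apostrophe.
original = "\u201cdon\u2019t\u201d"

# Assumption: the text was stored as Windows-1252 bytes.
raw = original.encode("cp1252")

# A page declared as UTF-8 interprets those same bytes as UTF-8.
# Bytes 0x93, 0x92 and 0x94 are not valid UTF-8 here, so each one
# is shown as U+FFFD, the replacement character.
as_utf8 = raw.decode("utf-8", errors="replace")

# Interpreting the bytes as Windows-1252 (Firefox's "Western")
# recovers the original text.
as_cp1252 = raw.decode("cp1252")

print(as_utf8)
print(as_cp1252)
```

This mirrors what happens when the Firefox encoding is switched to Western: the bytes do not change, only the way they are interpreted.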
 
Terry Woods (IT Guru) commented:
There's a solution suggested here: http://wpfab.com/clean-up-weird-characters-in-your-wordpress-posts/

Back up your site first, or limit it to one post, just in case it does something unexpected.

UPDATE wp_posts SET post_content = REPLACE(post_content, '�', '');

 
Amanda Watson (Web Developer, Author) commented:
Shall I run that code as an SQL query against the database via phpMyAdmin?
 
Geert Gruwez (Oracle dba) commented:
This has nothing to do with Delphi.
Tag removed.
 
Amanda Watson (Web Developer, Author) commented:
Well, I ran the query and 0 rows were affected, so the glyphs remain.
Now what? Any ideas?
 
PortletPaul commented:
It may not be as simple as a single update query in the database. There seem to be several reasons for those glyphs including quotes, hyphens, apostrophes and possibly more than just those.

Here is an example (but for some parts I am guessing)
(as is)
�Marketing is telling the truth attractively� � as heard from Peter Daniels. To me, 
marketing is opening yourself up to the world – which is risky � and being proud of what you
 can give and serve others with. I also believe that if a business is not marketing it will have 
poor customer service. We should always be willing to give with no expectation of a return. 
That doesn�t mean we should never ask for remuneration for our goods and services. It just
 means that we have enough of our own self-esteem that we don�t need outward 
affirmation (paid or unpaid) that we have given enough.


(to be)
"Marketing is telling the truth attractively" as heard from Peter Daniels.

To me, marketing is opening yourself up to the world – which is risky -  and being proud of what you can give and serve others with. I also believe that if a business is not marketing it will have poor customer service. We should always be willing to give with no expectation of a return. That doesn't mean we should never ask for remuneration for our goods and services. It just means that we have enough of our own self-esteem that we don't need outward affirmation (paid or unpaid) that we have given enough.

In that example some of the glyphs relate to possible quotation marks or perhaps bullets while others relate to hyphenation or apostrophes.

Most likely, someone is using a WYSIWYG editor (such as Word) and pasting into the WordPress text boxes, assuming all formatting is universal (which is not true).

That practice will have to stop if you are to prevent this from happening again and again.
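
One possible safeguard, sketched here in Python purely as an illustration, is to convert "smart" punctuation to plain ASCII before the text goes anywhere near the WordPress text boxes. The mapping and the function name `to_plain_punctuation` are hypothetical and only cover the characters discussed in this thread.

```python
# Hypothetical mapping; extend it for whatever punctuation your authors use.
SMART_TO_PLAIN = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark (apostrophe)
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
})


def to_plain_punctuation(text: str) -> str:
    """Replace common Word-style punctuation with ASCII equivalents."""
    return text.translate(SMART_TO_PLAIN)


sample = "\u201cThat doesn\u2019t mean\u2026\u201d \u2013 risky"
print(to_plain_punctuation(sample))
```

This only helps with new content. It does nothing for posts that are already stored with the wrong encoding.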
 
DansDadUK commented:
I agree with the analysis of the problem by previous responders, indicating that the most likely cause is "... a mismatch in character sets ..." and "... glyphs relate to possible quotation marks or perhaps bullets while others relate to hyphenation or apostrophes ...".

The ISO-8859-1 Latin-1 character set is an exact subset of the Unicode character set.

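But the Windows Latin-1 (CP1252) character set (which is a 'superset' of ISO-8859-1) is not an exact subset of Unicode: all of its characters exist in Unicode, but those it places in the range 0x80 -> 0x9F sit at different Unicode code points.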

In ISO-8859-1 (and Unicode), the code-point range 0x80 -> 0x9F is reserved for the (little-used) non-graphic C1 control-code characters.

But the (frequently used) Windows Latin-1 (CP1252) character set uses this range to define various additional graphic characters, including 'smart quotes', and 'dot' and 'dash' characters:

(Image: C1 range in CP1252)
So (as others have said), if text encoded using this character set includes such characters and is pasted directly into a page which expects the UTF-8 encoding of Unicode, then these characters will be shown as the Unicode "REPLACEMENT CHARACTER" (U+FFFD), which is used to replace an unknown or unrepresentable character.
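
The difference between the two character sets in the 0x80 -> 0x9F range can be checked with a short Python sketch. The byte values chosen below are just a sample of that range, picked for illustration. The sketch decodes each byte once as CP1252 and once as ISO-8859-1 and reports what comes out.

```python
import unicodedata

# A sample of bytes from the 0x80-0x9F range (chosen for illustration).
samples = bytes([0x85, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97])

rows = []
for b in samples:
    one = bytes([b])
    cp1252_char = one.decode("cp1252")   # a graphic character in CP1252
    latin1_char = one.decode("latin-1")  # a C1 control code in ISO-8859-1
    rows.append((
        b,
        unicodedata.name(cp1252_char),
        unicodedata.category(latin1_char),
    ))

for b, name, category in rows:
    print(f"0x{b:02X}  CP1252: {name:<28} ISO-8859-1 category: {category}")
```

Every ISO-8859-1 result falls in Unicode category `Cc` (control), while the CP1252 results are the quotation marks, bullet, ellipsis and dashes mentioned above.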
 
Amanda Watson (Web Developer, Author) commented:
Thanks for the explanation. Any idea how to remove them easily as per the question?
 
PortletPaul commented:
As I attempted to demonstrate, it probably isn't as simple as a single update query.

I suggest you try manual correction on one or two, and you will then understand that if you replace all those glyphs with a single character you will still have a large problem to solve, perhaps a worse one than before, because the damaged characters will be harder to identify.
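
The point about a single replacement character can be sketched in Python. This is an illustration under assumptions, not a tested fix for this site: it assumes the posts are still stored as their original Windows-1252 bytes (the thread does not confirm how they are stored), and `decode_mixed` is a hypothetical helper name.

```python
def decode_mixed(raw: bytes) -> str:
    """Decode as UTF-8 when valid, otherwise fall back to Windows-1252."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("cp1252")


# Hypothetical stored bytes: CP1252 quotes, an apostrophe and a dash.
raw = "\u201cWe don\u2019t\u201d \u2013 risky".encode("cp1252")

# What the browser shows: every damaged byte collapses to the same glyph.
displayed = raw.decode("utf-8", errors="replace")

# A blanket replace has to pick one substitute for all of them,
# so the quotes and the dash come out wrong.
blanket = displayed.replace("\ufffd", "'")

# Repairing from the original bytes keeps each character distinct.
repaired = decode_mixed(raw)

print(displayed)
print(blanket)
print(repaired)
```

The fallback works on a whole string at a time, so text that mixes valid UTF-8 with stray CP1252 bytes would need a finer-grained approach, and bytes that CP1252 leaves undefined (such as 0x81) would still raise an error.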

Sorry.