Character encoding problem copying from Word to Wordpress

I use Wordpress on a large site that has multiple authors. Some authors are copying their articles from Microsoft Word into Wordpress. The character encoding is different. How can I make it write to the database in UTF-8 instead of Windows Western European?

WP is set up for UTF-8 in the wp-config.php file, but it does not convert the encoding when something is pasted into it.

I have written a REST service to pull the info I need. When it pulls the content from the database, it returns an XML error for encoding. WP is pulling from the same database, but somehow converts it to UTF-8 before it prints to the WP feed. Does anyone know what WP is doing so that I can do the same in my web service?
Bernard S.CTOCommented:
Any time you paste from Word into a blog or CMS, you face formatting problems.

My own practice now is to paste from Word into Notepad, then select all the Notepad text and paste it into Wordpress or equivalent.
You need to complete the formatting by hand... but usually this will be faster than trying to recover from Word's personal style of HTML.

I am not sure whether you would still face your charset problem if you followed this quick and dirty trick...
thsotoAuthor Commented:
It's not a formatting issue, it's an encoding issue.

Word has its default set as Western European. WP is set for UTF-8. Quotation marks, apostrophes, hyphens, etc. are coded differently in the two character sets. If you paste from Word to WP, it pastes the Western European bytes into a UTF-8 file. If I output that to an XML document that is encoded as UTF-8, I get XML errors when it gets to the first unknown character.

I have no idea what character set each person's word processing program will use.
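
To illustrate the mismatch described above, here is a minimal sketch in Python (used only for illustration; the thread does not say what language the REST service is written in). It assumes the stored bytes are either valid UTF-8 or Windows-1252 ("Western European"), which is a guess about the data rather than something the thread confirms. The function name `to_utf8_bytes` and its `fallback` parameter are hypothetical.

```python
def to_utf8_bytes(raw: bytes, fallback: str = "cp1252") -> bytes:
    """Return `raw` as UTF-8 bytes.

    Assumption: input that is not valid UTF-8 was written in `fallback`
    (Windows-1252 by default). This is a heuristic, not a detector.
    """
    try:
        raw.decode("utf-8")  # already valid UTF-8: leave untouched
        return raw
    except UnicodeDecodeError:
        # Bytes such as 0x93/0x94 (Word's curly quotes) land here.
        # errors="replace" covers the few bytes cp1252 leaves undefined.
        return raw.decode(fallback, errors="replace").encode("utf-8")


# Curly quotes, an en dash and a curly apostrophe as Windows-1252 bytes.
pasted = b"\x93quoted\x94 \x96 it\x92s"
print(to_utf8_bytes(pasted).decode("utf-8"))
```

Checking for valid UTF-8 first avoids re-encoding text that is already correct. The check can be fooled by Windows-1252 text that happens to form valid UTF-8 sequences, so it is best tried on a copy of the data. In PHP, `mb_check_encoding` and `mb_convert_encoding` could serve the same purpose.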
Bernard S.CTOCommented:
Have you tested my suggestion? What is the result?
thsotoAuthor Commented:
Your suggestion reformats the text, but does not change the encoding.

When pasted from Word 2007 into Notepad, bullet points turn into boxes. Notepad does not recognize the bullets as UTF-8.
Bernard S.CTOCommented:
1 - And what happens when you copy from notepad into wordpress?

2 - Test the option of saving the (empty at that time) notepad file as UTF-8 / test also saving the (empty) file as 8859-15

These tests are rather tedious, but if/when they work then the process is really easy.

3 - On the parts where you have control in PHP, you might test the effect of converting a string with utf8_encode before saving it to MySQL.
BE CAREFUL: run your tests only on staging data, since some effects might not be visible at once.
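
One caveat on suggestion 3: PHP's `utf8_encode` converts from ISO-8859-1, not from Windows-1252, and the two differ in exactly the byte range Word uses for curly quotes and dashes (0x80 to 0x9F). The sketch below uses Python only to show the difference. The helper names are hypothetical, and `latin1_to_utf8` merely mimics the ISO-8859-1 assumption; it is not the PHP function itself.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    # Treats every byte as ISO-8859-1, the assumption utf8_encode makes.
    return raw.decode("latin-1").encode("utf-8")


def cp1252_to_utf8(raw: bytes) -> bytes:
    # Treats bytes as Windows-1252; undefined bytes become U+FFFD.
    return raw.decode("cp1252", errors="replace").encode("utf-8")


left_quote = b"\x93"  # Word's left curly quote in Windows-1252

# ISO-8859-1 maps 0x93 to a control character (U+0093), not a quote.
print(latin1_to_utf8(left_quote))  # b'\xc2\x93'
# Windows-1252 maps 0x93 to U+201C, the left curly quote.
print(cp1252_to_utf8(left_quote))  # b'\xe2\x80\x9c'

# Converting text that is already UTF-8 double-encodes it.
already_utf8 = "\u00e9".encode("utf-8")  # e-acute, bytes C3 A9
print(latin1_to_utf8(already_utf8))  # b'\xc3\x83\xc2\xa9'
```

The last line shows one of the effects that "might not be visible at once": a blind conversion corrupts rows that were already stored correctly, which is another reason to test on staging data only.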
thsotoAuthor Commented:
My fix has been to change the default character encoding in Word to UTF-8 on each of the authors' computers. This only helps if they write their articles on the desktop where I made the change.

Bernard S.CTOCommented:
The comment you gave, "No one answered my question.", is slightly inappropriate, since you got answers and suggestions.
Although you did not test the answers given, you have found a solution on your own, which is fine since it solves your problem.

Next time a similar situation occurs, please be kind enough to word your reason correctly. On EE it is very common to read something like "I have found a solution on my own", and this makes everybody happy.
thsotoAuthor Commented:
I did test the answers given, they didn't work.

I found a solution on my own.