Parse/strip/replace bad MS Word characters

I have a friend for whom I wrote some very simple publishing software;
basically, he takes his writings and puts them online. His writings
are in Word, so he has a lot of special characters that he inputs,
some unknowingly, into the database. Are there any classes or samples
of what others have done to strip/replace/find these special
characters? I have asked him to be careful, but he will do it once or
twice, then forget and lapse, and I get a call asking if I can help
him get these out.

Any ideas?
Roonaan Commented:
What I've noticed sometimes works is:

echo htmlentities(utf8_decode($stringFromDatabase));
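
A minimal sketch of a variant, assuming the text in the database is UTF-8: passing the charset to htmlentities() directly turns Word's curly quotes and dashes into named entities such as &ldquo; and &mdash;. utf8_decode() only targets ISO-8859-1, which has no curly quotes, so it replaces them with "?" (and it is deprecated as of PHP 8.2). The function name wordTextToHtml is hypothetical.

```php
<?php
// Hypothetical helper. Assumes $stringFromDatabase is UTF-8 encoded.
// With the charset given explicitly, htmlentities() maps characters such as
// U+201C/U+201D (curly double quotes) to &ldquo;/&rdquo; instead of mangling them.
function wordTextToHtml(string $stringFromDatabase): string
{
    return htmlentities($stringFromDatabase, ENT_QUOTES, 'UTF-8');
}

echo wordTextToHtml("\u{201C}Hello\u{201D} \u{2014} it\u{2019}s done"), "\n";
// prints: &ldquo;Hello&rdquo; &mdash; it&rsquo;s done
```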

If you know which particular characters he uses and what you'd like them to become, you can use the strtr() function to map each one to its replacement.
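
A sketch of the strtr() approach, assuming the stored text is UTF-8 (the array keys are the UTF-8 forms of the characters). The helper name asciifyWordPunctuation and the chosen replacements are illustrative, not a fixed list of everything Word can produce.

```php
<?php
// Hypothetical helper: replaces Word's "smart" punctuation with plain ASCII.
// Assumes UTF-8 input; strtr() tries the longest keys first and never
// re-scans replaced text.
function asciifyWordPunctuation(string $text): string
{
    $map = [
        "\u{2018}" => "'",   // left single quote
        "\u{2019}" => "'",   // right single quote / apostrophe
        "\u{201C}" => '"',   // left double quote
        "\u{201D}" => '"',   // right double quote
        "\u{2013}" => '-',   // en dash
        "\u{2014}" => '--',  // em dash
        "\u{2026}" => '...', // horizontal ellipsis
        "\u{00A0}" => ' ',   // non-breaking space
    ];
    return strtr($text, $map);
}

echo asciifyWordPunctuation("\u{201C}It\u{2019}s fine\u{201D}"), "\n";
// output (including the straight double quotes): "It's fine"
```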
Otherwise, you can use preg_replace (or the older ereg_replace, which was removed in PHP 7) to strip anything that isn't allowed.
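
A blunt whitelist sketch with preg_replace: everything except printable ASCII, tabs, and line breaks is removed. The pattern is one possible choice, not the only one; note that it also deletes accented letters, so a mapping step like strtr() should run first if you want to keep the meaning of quotes and dashes. The function name stripNonAscii is hypothetical.

```php
<?php
// Hypothetical helper. Without the /u modifier the pattern works byte by byte,
// so every byte of a multi-byte character (all >= 0x80) is removed.
function stripNonAscii(string $text): string
{
    // Keep 0x20-0x7E (printable ASCII) plus tab, CR and LF; drop the rest.
    return preg_replace('/[^\x20-\x7E\t\r\n]/', '', $text) ?? '';
}

echo stripNonAscii("caf\u{00E9} \u{201C}ok\u{201D}"), "\n"; // prints: caf ok
```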
iceman19330 (Author) Commented:
That's my problem: in Word the characters look like one thing, but in the database they look like another. I tried parsing for them in the script and it couldn't pick them up, so I was hoping someone knew what the characters would look like to the script. The ones I know are a problem are ` and the fancy double quotes.
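
One likely reason the script "can't see" them: if the text reached the database as Windows-1252 (cp1252) bytes rather than UTF-8, which is an assumption here, the curly quotes are single bytes 0x91 to 0x94, so searching for the UTF-8 characters will not match. A sketch that maps those bytes to ASCII; the function name asciifyCp1252Punctuation is hypothetical.

```php
<?php
// Hypothetical helper. Assumes cp1252-encoded input, where Word's "smart"
// punctuation occupies single bytes in the 0x80-0x9F range.
function asciifyCp1252Punctuation(string $bytes): string
{
    return strtr($bytes, [
        "\x91" => "'",   // left single quote
        "\x92" => "'",   // right single quote / apostrophe
        "\x93" => '"',   // left double quote
        "\x94" => '"',   // right double quote
        "\x96" => '-',   // en dash
        "\x97" => '--',  // em dash
        "\x85" => '...', // ellipsis
    ]);
}

echo asciifyCp1252Punctuation("\x93Hi\x94 it\x92s"), "\n";
// output (including the straight double quotes): "Hi" it's
```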
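You can use the ord() function to find out the byte values of the special characters. ord() returns the value of a single byte, and for these characters it falls outside the 0 to 127 ASCII range.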
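
A small sketch of using ord() to inspect what actually arrives from the database: it prints every byte in hex. A UTF-8 curly quote shows up as three bytes (E2 80 9C), while a cp1252 one would be the single byte 93. The function name dumpBytes is hypothetical.

```php
<?php
// Hypothetical debugging helper: returns each byte of $text as two hex digits.
// ord() looks at one byte at a time, so multi-byte characters appear as
// several values.
function dumpBytes(string $text): string
{
    $parts = [];
    for ($i = 0, $n = strlen($text); $i < $n; $i++) {
        $parts[] = sprintf('%02X', ord($text[$i]));
    }
    return implode(' ', $parts);
}

echo dumpBytes("\u{201C}a"), "\n"; // prints: E2 80 9C 61
```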

Look in the user comments on that page; somebody has already written some functions with this particular problem in mind.
Muhammad Wasif Commented:
When pasting a Word document into a textarea (or whatever input you have), use JavaScript's clipboardData.getData() function. It works only in IE.
var content = clipboardData.getData("Text");