iceman19330
asked on
Parse/strip/replace bad MS Word characters
I wrote some very simple publishing software for a friend;
basically he takes his writings and puts them online. His
writings are in Word, so he has a lot of special characters that he
inputs, some unknowingly, into the database. Are there any classes or
samples of what others have done to strip/replace/find these special
characters? I have asked him to be careful, but he will do it once or
twice and then forget and lapse, and I get a call asking whether I can help
him get these out.
Any ideas?
ASKER
That's my problem: in Word they look like one thing, and in the db they look like another. I tried parsing them in the script and it couldn't pick them up, so I was hoping someone knew what the characters would look like to the script. The ones that I know are an issue are ` and the fancy double quotes.
You can use the ord() function to find out the byte value (the ASCII code, for plain characters) of each special character as the script sees it. http://www.php.net/ord
Look in the user comments on that page; somebody has already written some functions with this particular problem in mind.
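To see what the script actually receives, something along these lines might help. It is only a rough sketch: dump_bytes() is a made-up helper name (it is not one of the functions from the manual comments), and the two sample strings assume the fancy left double quote arrives either as a single Windows-1252 byte or as a three-byte UTF-8 sequence, which depends on how the form and database are set up.

```php
<?php
// Hypothetical helper: return the ord() value of every byte in a string.
function dump_bytes(string $text): array
{
    $values = [];
    $length = strlen($text); // counts bytes, not characters
    for ($i = 0; $i < $length; $i++) {
        $values[] = ord($text[$i]);
    }
    return $values;
}

// Left curly double quote, assumed Windows-1252 encoding (one byte).
print_r(dump_bytes("\x93"));         // 147
// The same character, assumed UTF-8 encoding (three bytes).
print_r(dump_bytes("\xE2\x80\x9C")); // 226, 128, 156
```

Running this on a value pulled straight from the database should show which of the two forms you are dealing with, and that tells you what to search for.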
When pasting a Word document into a textarea (or whatever input you have), use the JavaScript function getData() to read the clipboard as plain text. It works only in IE.
var content = clipboardData.getData("Text");
http://msdn.microsoft.com/library/default.asp?url=/workshop/author/dhtml/reference/methods/getdata.asp
Otherwise, you can use either ereg_replace or preg_replace to strip anything that isn't allowed.
http://www.php.net/ereg_replace
http://www.php.net/preg_replace
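A rough sketch of the replace-then-strip idea, in case it is useful. The function name clean_word_chars() and the character map are my own assumptions, not anything from the manual pages, and the map assumes the text reaches PHP as UTF-8 (Windows-1252 input would need different byte values). It uses preg_replace only, since ereg_replace was removed in PHP 7.

```php
<?php
// Hypothetical helper: swap common Word "smart" characters for plain ASCII,
// then strip any remaining bytes outside printable ASCII and whitespace.
// Assumes UTF-8 input; the map is illustrative, not exhaustive.
function clean_word_chars(string $text): string
{
    $map = [
        "\xE2\x80\x98" => "'",   // U+2018 left single quote
        "\xE2\x80\x99" => "'",   // U+2019 right single quote
        "\xE2\x80\x9C" => '"',   // U+201C left double quote
        "\xE2\x80\x9D" => '"',   // U+201D right double quote
        "\xE2\x80\x93" => '-',   // U+2013 en dash
        "\xE2\x80\x94" => '--',  // U+2014 em dash
        "\xE2\x80\xA6" => '...', // U+2026 ellipsis
    ];
    $text = strtr($text, $map);

    // Remove every byte that is not printable ASCII, tab, CR or LF.
    return preg_replace('/[^\x20-\x7E\t\r\n]/', '', $text);
}

echo clean_word_chars("\xE2\x80\x9CHi\xE2\x80\x9D \xE2\x80\x94 it\xE2\x80\x99s");
```

Be aware that the final strip also throws away legitimate accented letters, so if his writing contains any, drop that last step or widen the allowed range.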