birwin
asked on
How to Clean Word text pasted into a text box with PHP
I have a large text box. Users can type content into it, but many are creating their content in Word, or other programs, and pasting it into the text box. Especially with Word, I get some very unpredictable results.
For example, this was entered recently:
Even though we were never in one village for more than a day, or in one hospital for more than an afternoon, Iâï&iquest;½;ï&iquest;&frac12;d find myself meeting children whom Iâï&iquest;½;ï&iquest;&frac12;d form a bond with . . . and so on
The âï&iquest;½ï&iquest;&frac12; represents an apostrophe.
I strip the code of html entities, but how do I get rid of this type of garbage and replace it with the appropriate characters?
Also, line feeds from pasted documents are not predictable. Is there a way to tame them?
Thanks
ASKER
Thank you for your comment.
I realize that I can use str_replace as shown to clean known codes, and I am currently doing that for some common ones, but I keep getting new combinations that I haven't seen before. I assume that is because people paste from different versions of Word, or perhaps from OpenOffice.
My hope was that there was some class that had a full library of the possible Word codes.
@birwin: This is a great question, and a screaming pain (thanks again MSFT). You can tell people to post only plain text, but many clients will not know how to do that. Suggest you ask the moderators to help you find a couple of other zones and add the question to those zones. It deserves an answer because it is a problem we all face. Best, ~Ray
ASKER
I think I found the answer on another posting. I have applied it, and no garbage has come through since, although not a lot of postings have been made to test it.
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/Q_22810036.html?
SOLUTION
You'd need to append each Word-submitted string to the top array, and its replacement at the same position in the bottom array. I hope this makes sense.
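The two-array idea described above could look something like the sketch below. The solution's actual code is not visible here, so this is only an illustration: the function name clean_word_text and the list of characters are assumptions, and it assumes the submitted text is valid UTF-8 and that PHP 7.0 or later is available (for the "\u{...}" escapes).

```php
<?php
// Hypothetical helper: the name and the character list are illustrative
// assumptions, not the code from the members-only solution.
function clean_word_text(string $text): string
{
    // "Top" array: characters Word commonly substitutes while typing.
    $search = [
        "\u{2018}", // left single quotation mark
        "\u{2019}", // right single quotation mark (apostrophe)
        "\u{201C}", // left double quotation mark
        "\u{201D}", // right double quotation mark
        "\u{2013}", // en dash
        "\u{2014}", // em dash
        "\u{2026}", // horizontal ellipsis
    ];

    // "Bottom" array: the replacement sits at the same index as its match.
    $replace = ["'", "'", '"', '"', "-", "--", "..."];

    return str_replace($search, $replace, $text);
}

echo clean_word_text("I\u{2019}d find myself meeting children\u{2026}"), "\n";
// prints: I'd find myself meeting children...
```

Any new character that turns up is handled by adding one entry to each array, keeping the two in the same order. This only covers text that arrives as correctly encoded UTF-8; input that has already been mis-decoded into sequences like the one quoted in the question would need its encoding sorted out first.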
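On the second part of the question, the unpredictable line feeds, one possible approach is to normalize every line ending to "\n" and then cap the number of consecutive blank lines. This is a minimal sketch, not something taken from the thread: the function name normalize_line_feeds and the "at most one blank line" policy are assumptions.

```php
<?php
// Hypothetical helper: the name and the blank-line policy are assumptions.
function normalize_line_feeds(string $text): string
{
    // Convert Windows (\r\n) and old Mac (\r) line endings to \n.
    $text = str_replace(["\r\n", "\r"], "\n", $text);

    // Strip trailing spaces and tabs at the end of each line.
    $text = preg_replace('/[ \t]+$/m', '', $text);

    // Collapse runs of three or more newlines into a single blank line.
    $text = preg_replace('/\n{3,}/', "\n\n", $text);

    // Drop leading and trailing whitespace from the whole submission.
    return trim($text);
}

echo normalize_line_feeds("one\r\ntwo\rthree\n\n\n\nfour  \n"), "\n";
```

With the sample input above, the three styles of line ending come out as plain "\n", and the run of blank lines between "three" and "four" is reduced to one.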