Solved

How to Clean Word text pasted into a text box with PHP

Posted on 2009-04-06
Last Modified: 2012-05-06
I have a large text box. Users can type content into it, but many are creating their content in Word, or other programs, and pasting it into the text box. Especially with Word, I get some very unpredictable results.

For example, this was entered recently:
Even though we were never in one village for more than a day, or in one hospital for more than an afternoon, I�d find myself meeting children whom I�d form a bond with. . .. . . and so on

The � represents an apostrophe.

I strip HTML entities from the input, but how do I get rid of this type of garbage and replace it with the appropriate characters?

Also, line feeds from pasted documents are not predictable. Is there a way to tame them?

Thanks
Question by:birwin
6 Comments
 
LVL 10

Expert Comment

by:Phatzer
ID: 24101278
I'm not sure whether PHP has a built-in, easier way to do this, but if you don't mind going through Word-submitted content to work out what each string represents, this should do the job:

Add each Word-submitted string to the first array and its replacement at the same position in the second array. I hope this makes sense.
// Word-submitted strings
$replaceVar = array(
	"�",
	"SOMETHING ELSE"
);
 
// Replacement content for above strings
$replaceVal = array(
	"'",
	"REPLACEMENT"
);
 
// Go through content and replace occurrences of Word-submitted content with specified replacement
$submittedContent = str_replace($replaceVar, $replaceVal, $submittedContent);
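
A fuller starting list for the arrays above might look like the sketch below. It is not from the thread: the function name normalize_word_punctuation is hypothetical, and it assumes the submitted text arrives as UTF-8. The characters are written as byte escapes so the script does not depend on the encoding of the PHP file itself.

```php
<?php
// Hypothetical helper (not from the thread): map common "smart" punctuation
// produced by Word to plain ASCII. Assumes $text is UTF-8.
function normalize_word_punctuation($text)
{
    $map = array(
        "\xE2\x80\x98" => "'",   // U+2018 left single quote
        "\xE2\x80\x99" => "'",   // U+2019 right single quote / apostrophe
        "\xE2\x80\x9C" => '"',   // U+201C left double quote
        "\xE2\x80\x9D" => '"',   // U+201D right double quote
        "\xE2\x80\x93" => "-",   // U+2013 en dash
        "\xE2\x80\x94" => "-",   // U+2014 em dash
        "\xE2\x80\xA6" => "...", // U+2026 horizontal ellipsis
        "\xC2\xA0"     => " ",   // U+00A0 non-breaking space
    );

    // strtr() with an array replaces every key with its value in one pass.
    return strtr($text, $map);
}
```

This list is deliberately short and will not cover every character Word can produce; it only shows the shape of the lookup.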

 
LVL 6

Author Comment

by:birwin
ID: 24101511
Thank you for your comment.
I realize that I can use str_replace as shown to clean known codes, and I am currently doing that for some common ones, but I keep getting new combinations that I haven't seen before. I assume that is because people paste from different versions of Word, or perhaps from OpenOffice.
My hope was that there was some class that had a full library of the possible Word codes.
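
One possible cause of the ever-growing list, offered here as an assumption rather than a diagnosis, is an encoding mismatch: Word's curly apostrophe is the single byte 0x92 in Windows-1252, which is not valid UTF-8 on its own and is commonly displayed as �. If that is what is happening, converting the input once may work better than listing characters one by one. The sketch below is not from the thread; the name to_utf8 is hypothetical, and it assumes the site uses UTF-8, that the iconv extension is available, and that any input that is not valid UTF-8 is Windows-1252.

```php
<?php
// Hypothetical helper (not from the thread). Assumes the page and storage use
// UTF-8 and that any non-UTF-8 input is Windows-1252, which is only a guess.
function to_utf8($text)
{
    // With the /u modifier, preg_match() fails on invalid UTF-8 input,
    // so a return value of 1 means the string is already valid UTF-8.
    if (preg_match('//u', $text) === 1) {
        return $text;
    }

    // Reinterpret the bytes as Windows-1252 and convert them to UTF-8.
    $converted = iconv('Windows-1252', 'UTF-8', $text);

    // iconv() returns false when a byte has no mapping; keep the input then.
    return $converted === false ? $text : $converted;
}
```

After the conversion the apostrophe is a proper U+2019 character, which can either be stored as is or mapped to a plain ' with str_replace.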
 
LVL 10

Accepted Solution

by:
Phatzer earned 250 total points
ID: 24101705
I've had a look around, but I can't find much on this issue. It's so damn annoying that Microsoft throws in this metadata, though I'd guess it's there to preserve formatting when pasting between Word applications. If you are patient enough to build a list of the annoying tags, it will be worth it in the end, I guess. Sorry I can't help you any more.
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 24109776
@birwin: This is a great question, and a screaming pain (thanks again MSFT).  You can tell people to post only plain text, but many clients will not know how to do that.  Suggest you ask the moderators to help you find a couple of other zones and add the question to those zones.  It deserves an answer because it is a problem we all face.  Best, ~Ray
 
LVL 6

Author Comment

by:birwin
ID: 24110309
I think I found the answer on another posting. I have applied it, and no garbage has come through since, although not a lot of postings have been made to test it.
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/Q_22810036.html?
 
 
LVL 109

Assisted Solution

by:Ray Paseur
Ray Paseur earned 250 total points
ID: 24110424
That looks pretty good at first glance.  Glad you've found a solution!
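
On the line-feed part of the original question, one common approach is to normalize every line ending to a single form and then collapse long runs of blank lines. The sketch below is not from the thread; the name normalize_line_breaks is hypothetical, and the choice of "\n" as the target and of one blank line as the maximum gap are assumptions to adjust as needed.

```php
<?php
// Hypothetical helper (not from the thread): tame line breaks in pasted text.
function normalize_line_breaks($text)
{
    // Convert Windows (\r\n) and old Mac (\r) line endings to \n.
    // The array is processed in order, so \r\n is handled before lone \r.
    $text = str_replace(array("\r\n", "\r"), "\n", $text);

    // Strip spaces and tabs left at the end of each line.
    $text = preg_replace('/[ \t]+$/m', '', $text);

    // Collapse three or more consecutive newlines into one blank line.
    $text = preg_replace('/\n{3,}/', "\n\n", $text);

    // Drop leading and trailing whitespace around the whole text.
    return trim($text);
}
```

For display in HTML, the cleaned text could then be passed through nl2br() after escaping, so each remaining "\n" becomes a visible line break.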