How to paste special characters into a form text field.

I have numerous HTML forms where scientists paste abstracts/summaries of their research into a textarea field for peer review. Unfortunately, they use many special characters (Greek letters, scientific notation, etc.) that are available in their word processing programs, but which get lost in translation into the HTML form textarea field. This is causing much grief!

Example: use MS Word to write the following statement -- "The symbol of Pi is (symbol of Pi here)." Now paste that statement into an HTML form textarea field, and watch the Pi symbol turn into a generic open square (as all special characters do).

I don't know how to solve this, and would greatly appreciate any hints, clues or solutions.



The only way I know to get around this is by using the character code; in your example it would be "&Pi;" (or the numeric form "&#928;") instead of the pi character. But that's pretty cumbersome...
What's the charset for your page? Try adding a

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

to the page. And make sure your scientists are using Unicode characters (for example, with the Arial Unicode MS font) in their Word docs as well. See "Unicode" in the MS Word Help for more info.
Other solutions would be to write a converter script that they could upload text files to that would do the replacements. Or if you're using Dreamweaver, you might be able to write a custom extension for it that would let them pick the symbols out from a menu and automatically insert the equivalent entities into an HTML doc.
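A converter of the kind suggested above could be quite small. The following is only a sketch, assuming the text has already arrived as proper Unicode; the function name `toNumericEntities` is made up for illustration. It replaces every non-ASCII character with its numeric character reference, so the result displays correctly regardless of the page charset.

```javascript
// Hypothetical helper (not from the thread): replace each non-ASCII
// character with a numeric character reference such as &#960;.
function toNumericEntities(text) {
  let out = "";
  for (const ch of text) {              // for...of walks the string by code point
    const cp = ch.codePointAt(0);
    out += cp > 127 ? "&#" + cp + ";" : ch;
  }
  return out;
}

console.log(toNumericEntities("The symbol of Pi is \u03C0."));
// The symbol of Pi is &#960;.
```

Numeric references are used rather than named ones because they need no lookup table and cover every character.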

mupledge (Author) Commented:
I appreciate the suggestions. Changing the charset to utf-8 was a good idea, though it does not solve the problem.

To restate in more detail: Various scientists apply for speaking times at conferences through my office. Everyone wants to do it online now. But they use diverse word processing programs in creating their abstract submissions, and when they paste those into an online HTML form, many special characters are lost.

I cannot ask the submitters to write the HTML numeric or character entities -- they just copy and paste, then submit the form. I have thought of writing a translation block in JavaScript, but the problem appears to be that the HTML form doesn't get the special characters from the word processing format in the first place. Or am I wrong there, and just need to figure out how to decode those characters...

Still looking for the magic answer, and thank those that have responded thus far.
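Whether the characters reach the textarea at all can be checked by listing the code points of the pasted value. This is a rough diagnostic sketch; `listNonAscii` is a made-up name, and the reading of private-use code points as font-specific symbols is an assumption, not something confirmed in this thread.

```javascript
// Hypothetical diagnostic: report every non-ASCII character in a string.
function listNonAscii(text) {
  const found = [];
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp > 127) {
      found.push({
        char: ch,
        codePoint: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
        // Private Use Area: assumed here to indicate a font-specific symbol
        privateUse: cp >= 0xE000 && cp <= 0xF8FF,
      });
    }
  }
  return found;
}

// In a browser this might be called as:
//   listNonAscii(document.getElementById("abstract").value)
console.log(listNonAscii("The symbol of Pi is \u03C0."));
```

If the list comes back empty for text that visibly contained a symbol, the character was replaced before any script could see it, and decoding in JavaScript cannot recover it.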



I've just tested my hypothesis and it appears to be correct. This will work, but

1. The charset of the HTML page containing the textarea MUST be UTF-8.

2. The charset being used in the word processor MUST be UTF-8.
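To illustrate why the page charset matters, here is a small sketch (the helper names are made up) showing that the pi character has a two-byte UTF-8 encoding but falls outside the single-byte Latin-1 range, so a form submitted in a Latin-1 charset has no way to carry it as-is.

```javascript
// Hypothetical helpers for illustration only.

// Return the UTF-8 bytes of a string as space-separated hex.
function utf8Hex(text) {
  const bytes = new TextEncoder().encode(text);
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");
}

// True if every character fits in ISO-8859-1 (code points 0 to 255).
function fitsLatin1(text) {
  for (const ch of text) {
    if (ch.codePointAt(0) > 0xff) return false;
  }
  return true;
}

console.log(utf8Hex("\u03C0"));    // cf 80
console.log(fitsLatin1("\u03C0")); // false
```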

A better solution might be to have them upload their abstract files for conversion on the server -- but it would have to be a Windows server.
Another thought -- I know that Mozilla can read MathML, but I have no idea what its editing capabilities are in that regard.

Amaya can do some MathML editing --
Why don't you just let them attach the Word file?
That will solve your problem for sure. :c)
Try this. Copy a few letters out of Word, highlight the text on the HTML page, and press preview; pressing convertHTML will then give you the equivalent HTML in the textarea. The problem is that the amount of HTML is huge, but it may be OK. I have shown the amount necessary for sigma on the page, which is much less than Word generates. It may be possible to edit out the spurious stuff using regex. (Unfortunately, Word converts symbols to fonts, not to character codes.)

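<table id="t1"><tr>
<td id="c1">&nbsp; &nbsp; Highlight This &nbsp;</td>
</tr></table>
<form id="f1">
<!-- Pastes the clipboard over the current selection; relies on the browser allowing scripted paste, as IE does -->
<input type="button" value="preview" onclick="document.execCommand('Paste')"><p>
<textarea rows="10" cols="80" id="abstract"></textarea>
<!-- Copies the markup of the pasted cell into the textarea -->
<input type="button" value="convertHTML" onclick="document.getElementById('abstract').value=document.getElementById('t1').rows[0].cells[0].innerHTML">
</form>
<SPAN style="FONT-SIZE: 12pt; FONT-FAMILY: Symbol;">S</SPAN>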
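Since Word marks the symbol with a font rather than a character code, one possible regex cleanup is to turn each Symbol-font span into the matching Unicode character. This is a sketch only: the function name is made up, the mapping table is deliberately partial, and it assumes the span contains plain text with no nested tags.

```javascript
// Partial, illustrative mapping from Symbol-font letters to Unicode.
const SYMBOL_TO_UNICODE = {
  a: "\u03B1", b: "\u03B2", g: "\u03B3", d: "\u03B4", // alpha, beta, gamma, delta
  l: "\u03BB", m: "\u03BC", p: "\u03C0", s: "\u03C3", // lambda, mu, pi, sigma
  S: "\u03A3", D: "\u0394", W: "\u03A9",              // Sigma, Delta, Omega
};

// Hypothetical helper: replace <span style="...font-family: Symbol...">x</span>
// with the Unicode equivalent of x; unmapped characters are left unchanged.
function replaceSymbolSpans(html) {
  const spanPattern = /<span\b[^>]*font-family:\s*symbol[^>]*>([^<]*)<\/span>/gi;
  return html.replace(spanPattern, (match, inner) =>
    Array.from(inner, (ch) => SYMBOL_TO_UNICODE[ch] || ch).join("")
  );
}

const sample = '<SPAN style="FONT-SIZE: 12pt; FONT-FAMILY: Symbol;">S</SPAN>';
console.log(replaceSymbolSpans(sample)); // Σ
```

A real converter would need the full Symbol table and an actual HTML parser; the regex is only adequate for simple, unnested spans like the one shown.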
Are you still interested in this question? I have since seen how to do it.
mupledge (Author) Commented:

Yes, I'm still interested.
This does it, and the user can edit the box content as well. I have set the method to GET so you can see what gets sent; you will change it to POST for the final app. The problem with your design is that Word docs and many other types generate a very large amount of HTML. It may be OK, and there are ways of editing out the extra stuff.

<script type="text/javascript">
// Copy the pasted markup into the hidden field, then submit the form.
function sendSpan(){
 var theHTML = document.getElementById("theContent").innerHTML;
 var theForm = document.getElementById("MyForm");
 theForm.newText.value = theHTML;
 theForm.submit();
}
</script>

<form action="" id="MyForm" method="get">
<p><input type="hidden" name="newText">
<input type="button" value="Send" onclick="sendSpan()"></p>
Paste Document here
<table border="1" height="40%" width="95%" align="center"><tr><td>
<span id="theContent" contentEditable style="height: 100%; width: 100%;"></span>
</td></tr></table>
</form>
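As for editing out the extra stuff, the following is a rough sketch of one regex-based approach. The function name and the choice of what to strip are assumptions for illustration, not a complete Word cleaner. Note that any font-dependent symbols need to be converted to real characters before this runs, since removing the style attribute discards the font information.

```javascript
// Hypothetical helper: strip the most common Word-generated clutter.
function stripWordMarkup(html) {
  return html
    .replace(/<!--[\s\S]*?-->/g, "")               // comments, including conditional ones
    .replace(/<\/?[a-z]+:[^>]*>/gi, "")            // namespaced tags such as <o:p>
    .replace(/\s+(?:class|style|lang)=(?:"[^"]*"|'[^']*'|[^\s>]+)/gi, "") // presentational attributes
    .replace(/<\/?(?:span|font)\b[^>]*>/gi, "")    // wrapper tags left with no purpose
    .replace(/[ \t]+/g, " ")                       // collapse runs of spaces
    .trim();
}

const pasted = '<p class=MsoNormal><span style="font-size:12pt">Pi<o:p></o:p></span></p>';
console.log(stripWordMarkup(pasted)); // <p>Pi</p>
```

Regexes like these are fragile against unusual markup; on the server, a proper HTML parser or sanitizer would be the safer choice.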

mupledge (Author) Commented:

This solution works well using IE, but I couldn't get the text box to work with either Mozilla or Opera. Any thoughts on that?

In any event, this appears to be the best solution to the problem, so I will close this question. I appreciate all the suggestions and help.
The solution I posted will work in MSIE, Moz, and Opera.

Of course, it's not clueless-user-proof, I'm afraid.

(contentEditable has no analogue in the other two browsers, which is why I didn't suggest it.)
My post dated 11/21/2003 10:04PM PST does work cross-browser but needs a bit of work on its interface. Thx for the points. GfW