Solved

Character encoding - Form submit (Internet Explorer oddity)

Posted on 2002-04-16
Last Modified: 2013-12-03
[ Edit: Please note that the special characters I describe are not displayed correctly in Experts Exchange... speaking of charset issues ;-) ]

I have a very annoying character encoding issue with Internet Explorer.
I have a web application with an administration module that must handle many languages (e.g. Norwegian (charset iso-8859-1) and Polish (charset iso-8859-2)).

Try looking at http://testzone.subsero.dk/polish/
When I use Internet Explorer, I get the following result when submitting the form :

     testtext=k%F8rer&testtext2=Os%26%23322%3Buga

But with any other browser (e.g. Netscape 6, Opera 6), I get :

     testtext=k%F8rer&testtext2=Os%3Fuga

My problem is that I can't process the data correctly when I don't know what the input will be.
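
The two payloads above can be reproduced outside the browser. The following is a minimal sketch in Python, assuming that Internet Explorer substitutes a numeric character reference for any character the form charset cannot represent, while the other browsers substitute "?". That assumption is inferred from the output above, not from browser documentation, and the variable names are illustrative only.

```python
from urllib.parse import urlencode

# The two field values from the test form.
fields = {"testtext": "kører", "testtext2": "Osługa"}

# Assumed IE behaviour: characters missing from ISO-8859-1 become
# numeric character references (&#322;) before percent-encoding.
ie_body = urlencode(fields, encoding="iso-8859-1", errors="xmlcharrefreplace")

# Assumed Netscape/Opera behaviour: unrepresentable characters become "?".
other_body = urlencode(fields, encoding="iso-8859-1", errors="replace")

print(ie_body)
print(other_body)
```

The ø survives in both cases because ISO-8859-1 contains it (byte F8); only the ł, which ISO-8859-1 lacks, is treated differently.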

I've set the charset to ISO-8859-1 in the content-type, in the accept-charset attribute on the <form>, and in the ASP property Session.Codepage.

- Why does Internet Explorer perform this HTML encoding?
- Is there any way to force it to behave differently?
- Which versions of Internet Explorer do this?
Question by:Rohde
 
LVL 23

Expert Comment

by:b1xml2
ID: 6944381
When you use the following in show.asp, you will get the correct values for MSIE:

<%@language="VBScript"%>
<b><%=unescape(Request.Form)%></b>


Next, in the headers that MSIE 6 and NS 6 send, the following is interesting:


MSIE 6
======

HTTP_CONTENT_TYPE
application/x-www-form-urlencoded

HTTP_ACCEPT_ENCODING
gzip, deflate

(NO HTTP_ACCEPT_CHARSET)

NS 6
====

HTTP_CONTENT_TYPE
application/x-www-form-urlencoded

HTTP_ACCEPT_ENCODING
gzip, deflate, compress;q=0.9

HTTP_ACCEPT_CHARSET
ISO-8859-1, utf-8;q=0.66, *;q=0.66


By the way, the preferred charset name is ISO-8859-1, not ISO_8859-1.

 

Author Comment

by:Rohde
ID: 6944412
ISO_8859-1 -> Thanks, took the wrong one from the IANA list.

The unescape doesn't help (and it is done automatically by the ASP engine when accessing the Request.Form collection, e.g. Request.Form("testtext")).

The unescaped form becomes "testtext=kører&testtext2=Os&#322;uga".
Note that the ø is shown "as is", whereas the Polish character is HTML encoded. Not very nice: either both should be HTML encoded, or neither should.
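
One way to cope with the mixed input on the server is to decode the body with the form charset first and then resolve any character references that remain. This is a sketch in Python rather than ASP, and the names `raw_body` and `form` are hypothetical. It cannot tell a reference inserted by the browser apart from one the user typed literally, so it only fits fields where literal "&#...;" text is not expected.

```python
import html
from urllib.parse import parse_qs

# Body as submitted by Internet Explorer in the test above.
raw_body = "testtext=k%F8rer&testtext2=Os%26%23322%3Buga"

# Step 1: percent-decode using the charset the form was submitted in.
parsed = parse_qs(raw_body, encoding="iso-8859-1")

# Step 2: resolve character references such as &#322; into real characters.
form = {name: html.unescape(values[0]) for name, values in parsed.items()}

print(form)
```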

If I change all the charset settings from iso-8859-1 to iso-8859-2, the resulting string in Internet Explorer becomes "testtext=k%26oslash%3Brer&testtext2=Os%B3uga" (unescaped: "testtext=k&oslash;rer&testtext2=Osługa"). As you can see, this just swaps the two: the ø becomes HTML encoded (&oslash; is ø), and the Polish character (ł) is written as is.
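
The swap between the two charsets can be modelled with a custom encoding error handler. This Python sketch assumes that Internet Explorer emits a named entity when HTML 4 defines one for the character and a numeric reference otherwise. That rule is a guess that happens to match the strings quoted in this thread, and `ie_style_entities` and `submit` are hypothetical names.

```python
import codecs
from html.entities import codepoint2name  # HTML 4 entity names only
from urllib.parse import quote

def ie_style_entities(err):
    """Replace unencodable characters with &name; or &#NNN; references."""
    if not isinstance(err, UnicodeEncodeError):
        raise err
    parts = []
    for ch in err.object[err.start:err.end]:
        name = codepoint2name.get(ord(ch))
        parts.append(f"&{name};" if name else f"&#{ord(ch)};")
    # Return the replacement text and the position to resume encoding at.
    return "".join(parts), err.end

codecs.register_error("ie_entities", ie_style_entities)

def submit(value, charset):
    """Encode one field value in the form charset, then percent-encode it."""
    return quote(value.encode(charset, errors="ie_entities"))

print(submit("kører", "iso-8859-2"))
print(submit("Osługa", "iso-8859-2"))
print(submit("Osługa", "iso-8859-1"))
```

ISO-8859-2 has ł at byte B3 but no ø, while ISO-8859-1 has ø at byte F8 but no ł, which is why exactly one of the two characters is entity-encoded under each setting.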
 

Author Comment

by:Rohde
ID: 6944424
Were the headers that you describe the ones returned from my test site, or just a general observation?

What program/util do you use to view the request and headers?
I'm using various collections from ASP to check them, but it's tedious, and I'm not sure that ASP isn't sometimes doing "auto-stuff" behind my back, so that I don't get the full picture.
LVL 23

Expert Comment

by:b1xml2
ID: 6944467
I am running IIS 5 and MSIE 6. When I use the unescape function, the data is displayed correctly in MSIE 6, and the encoding setting for MSIE 6 is Western European (ISO).
 
LVL 23

Expert Comment

by:b1xml2
ID: 6944473
The HTTP headers were obtained from my dev site, to see what's going on under the covers.
 

Author Comment

by:Rohde
ID: 6944745
No problems at all....hmmm, would you try posting the exact output from "Response.Write Request.Form", just to make sure (because I'm getting fairly confused by this :)

Btw.: My files look like this:

default.asp
*****************
<%
Option Explicit

Session.Codepage = 28591

Response.ExpiresAbsolute = Now -1
%>
<!doctype html public "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Polish Test</title>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>

<body>
<form accept-charset="ISO-8859-1" action="show.asp" name="polishForm" method="post">
     <input type="text" value="k&#248;rer" name="testtext"><br>
     <input type="text" value="Os&#322;uga" name="testtext2"><br>
     <br>
     <br>
     <input type="submit"><br>
     <br>
</form>

</body>
</html>

show.asp
*******************
<%
Option Explicit

Session.Codepage = 28591

Response.ExpiresAbsolute = Now -1

Response.Write Request.Form
Response.Write VBNewline & VBNewline
Response.Write Request.ServerVariables("ALL_RAW")
%>
 
LVL 23

Expert Comment

by:b1xml2
ID: 6944856
try changing the show.asp to
============================
<%
Option Explicit

Session.Codepage = 28591

Response.ExpiresAbsolute = Now -1

Response.Write unescape(Request.Form)
Response.Write VBNewline & VBNewline
Response.Write Request.ServerVariables("ALL_RAW")
%>

And for your info about the ServerVariables, my show.asp is this:

<%@language="VBScript"%>
<html>
<head>
<title>Polish Test</title>
<meta http-equiv="content-type" content="text/html;charset=iso-8859-1">
</head>
<body>
<%
 For Each oItem In Request.ServerVariables
  Response.Write "<div>" & oItem & "</div>"
  Response.Write "<xmp>" & Request.ServerVariables(oItem) & "</xmp>"
  Response.Write "<hr>"
 
 Next

%>
<b style="font-variant:small-caps"><%Response.Write unescape(Request.Form)%></b>
</body>
</html>
 
LVL 19

Accepted Solution

by:
webwoman earned 150 total points
ID: 6944908
Can you just set it as Unicode, rather than specific character sets? That should allow all the characters, and IE should handle it better.

At least, I think it should...I don't have to deal with multiple languages, so it's a guess. But it is something that's built into most MS stuff.
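
To illustrate why a Unicode encoding sidesteps the problem: UTF-8 can represent both test characters, so nothing has to be replaced or entity-encoded on submit. A minimal Python sketch, assuming the form is submitted and decoded as UTF-8 (the variable names are illustrative):

```python
from urllib.parse import parse_qs, urlencode

fields = {"testtext": "kører", "testtext2": "Osługa"}

# Every character has a UTF-8 byte sequence, so no fallback is needed.
utf8_body = urlencode(fields, encoding="utf-8")
print(utf8_body)

# Decoding with the same charset restores the original text.
decoded = {name: values[0]
           for name, values in parse_qs(utf8_body, encoding="utf-8").items()}
print(decoded)
```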
 

Author Comment

by:Rohde
ID: 6944936
Ok, thanks.
It seems that the show.asp you posted is exactly the same as mine except for the unescape. But the escaped characters are not the problem: the HTML encoding has been done before the URL encoding, so unescaping just exposes the HTML entity.

When I use your show.asp, I get the following string :

   testtext=kører&testtext2=Os&#322;uga

The Polish special char is HTML encoded, but not the Danish one.

Try viewing the source and posting the exact string that the page outputs (not the one that is shown by the browser, because that will show the HTML encoded special-char correctly).
 

Author Comment

by:Rohde
ID: 6945035
Interleaving posts webwoman, didn't mean to ignore you....

Yes, I have also considered Unicode. I would have to install it on the server and change the database (and the rest of the application is finished, so I'd rather not go into all that if it can be avoided).

But does anyone have any good or bad experiences with using Unicode in an IIS/SQL Server environment? Would that be an easy way of working around the problem?
 

Author Comment

by:Rohde
ID: 6945173
Ok, I've tried it......... and I actually got it to work!! :-)
(when I'd tried using unicode earlier, I'd used the Family Code Page (#1200) instead of the "direct" codepage number (#65001), and that caused it to say "unknown codepage", as if it wasn't installed on the server...... the Family Code Page number for e.g. ISO-8859-1 DOES work.....).

I can just HTML encode the Unicode text that I receive from the form submit (UTF-8 encoded) before I put it in the database, although that is not a particularly nice thing to do. Storing the data HTML encoded is only a problem if I later want to use the data for something other than websites.
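
A sketch of that "HTML encode before storing" step, written in Python rather than ASP. The helper name `to_html_ascii` is hypothetical. It escapes markup characters first and then turns every non-ASCII character into a numeric reference, so the stored value is plain ASCII and can be reversed later.

```python
import html

def to_html_ascii(text):
    """Escape &, <, > and quotes, then replace non-ASCII with &#NNN;."""
    escaped = html.escape(text, quote=True)
    return escaped.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

stored = to_html_ascii("kører & Osługa")
print(stored)

# Reversing the encoding if the data is needed outside a web page.
restored = html.unescape(stored)
print(restored)
```

Escaping the ampersand first is what keeps the round trip lossless; without it, user text that already looks like an entity would be altered when decoded.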

I am, however, still interested in why Internet Explorer HTML encodes characters outside its current codepage, and what I can do about it.
 

Author Comment

by:Rohde
ID: 6947679
Thanks for the hints :)

After fixing the normal forms, I found out that I also had to change the upload component that I was using (http://www.safileup.com) because it didn't support changing the codepage. (I switched to ASPUpload from http://www.persits.com/.)