Solved

Character encoding - Form submit (Internet Explorer oddity)

Posted on 2002-04-16
Last Modified: 2013-12-03
[ Edit: Please note that the special characters that I describe are not displayed correctly in Experts Exchange.... speaking of charset issues ;-) ]

I have a very annoying character encoding issue with Internet Explorer.
I have a web application with an administration module that must handle many languages (e.g. Norwegian (charset iso-8859-1) and Polish (charset iso-8859-2)).

Try looking at http://testzone.subsero.dk/polish/
When I use Internet Explorer, I get the following result when submitting the form :

     testtext=k%F8rer&testtext2=Os%26%23322%3Buga

But with any other browser (e.g. Netscape 6, Opera 6), I get :

     testtext=k%F8rer&testtext2=Os%3Fuga

My problem is that I can't process the data correctly when I don't know what the input will be.

I've set the charset to ISO-8859-1 in three places: the content-type, the accept-charset attribute on the <form>, and the ASP property Session.Codepage.

- Why does Internet Explorer perform this HTML encoding?
- Is there any way to force it to behave differently?
- Which versions of Internet Explorer do this?
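
For anyone who wants to inspect the two submitted strings outside ASP, here is a minimal sketch in Python (used purely for illustration; it is not part of the test site). It assumes the form body is percent-decoded as ISO-8859-1, and the helper name `decode_form` is made up for this example.

```python
import html
from urllib.parse import parse_qs

# The two request bodies quoted above.
ie_body = "testtext=k%F8rer&testtext2=Os%26%23322%3Buga"
other_body = "testtext=k%F8rer&testtext2=Os%3Fuga"


def decode_form(body, charset="iso-8859-1"):
    """Percent-decode a form body and return the first value of each field."""
    parsed = parse_qs(body, encoding=charset, keep_blank_values=True)
    return {name: values[0] for name, values in parsed.items()}


ie = decode_form(ie_body)
other = decode_form(other_body)

print(ie)     # {'testtext': 'kører', 'testtext2': 'Os&#322;uga'}
print(other)  # {'testtext': 'kører', 'testtext2': 'Os?uga'}

# Internet Explorer's value still carries a numeric character reference;
# resolving it recovers the Polish character that the other browsers lost.
print(html.unescape(ie["testtext2"]))  # Osługa
```

So Internet Explorer's version still holds the original character in escaped form, while the "?" sent by the other browsers cannot be turned back into the Polish letter.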
Question by:Rohde
 
LVL 23

Expert Comment

by:b1xml2
When you use the following in show.asp, you will get the correct values for MSIE:

<%@language="VBScript"%>
<b><%=unescape(Request.Form)%></b>


Next, in the headers that MSIE 6 and NS 6 send, the following is interesting:


MSIE 6
======

HTTP_CONTENT_TYPE
application/x-www-form-urlencoded

HTTP_ACCEPT_ENCODING
gzip, deflate

(NO HTTP_ACCEPT_CHARSET)

NS 6
====

HTTP_CONTENT_TYPE
application/x-www-form-urlencoded

HTTP_ACCEPT_ENCODING
gzip, deflate, compress;q=0.9

HTTP_ACCEPT_CHARSET
ISO-8859-1, utf-8;q=0.66, *;q=0.66


By the way, the correct encoding is not ISO_8859-1 but ISO-8859-1

 

Author Comment

by:Rohde
ISO_8859-1 -> Thanks, took the wrong one from the IANA list.

The unescape doesn't help (and it is done automatically by the ASP engine when accessing the Request.Form collection, e.g. Request.Form("testtext")).

The unescaped form becomes "testtext=kører&testtext2=Os&#322;uga".
Note that the ø is shown "as is", whereas the Polish character is HTML encoded.... not very nice; either both should be HTML encoded, or neither should.

If I change all the charset settings from iso-8859-1 to iso-8859-2, the resulting string in Internet Explorer becomes "testtext=k%26oslash%3Brer&testtext2=Os%B3uga" (unescaped "testtext=k&oslash;rer&testtext2=Osługa"). As you can see, this just switches it around: the ø becomes HTML encoded (&oslash; is ø), and the Polish character (ł) is written as is.
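
The pattern in both results is consistent with this rule: a character that exists in the form's charset is sent as a raw byte, and one that does not is replaced by an HTML character reference before the URL encoding is applied. Below is a minimal Python sketch that imitates that observed behaviour. It is not Internet Explorer's actual algorithm, and the function name `submit_value` is invented for the example. One known difference: Python always emits numeric references, whereas IE used the named entity &oslash; in the iso-8859-2 case.

```python
from urllib.parse import quote


def submit_value(text, charset):
    """Imitate the observed submit: escape unencodable characters, then URL-encode."""
    # Characters missing from the charset become numeric references like &#322;.
    raw = text.encode(charset, errors="xmlcharrefreplace")
    # Percent-encode every byte that is not an unreserved URL character.
    return quote(raw, safe="")


print(submit_value("kører", "iso-8859-1"))   # k%F8rer
print(submit_value("Osługa", "iso-8859-1"))  # Os%26%23322%3Buga
print(submit_value("kører", "iso-8859-2"))   # k%26%23248%3Brer
print(submit_value("Osługa", "iso-8859-2"))  # Os%B3uga
```

The first, second and fourth lines match the strings reported above exactly; the third differs only in using &#248; where IE sent &oslash;.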
 

Author Comment

by:Rohde
The headers that you describe, were those the ones returned from my test site, or just a general observation?

What program/util do you use to view the request and headers?
I'm using various collections from ASP to check them, but it's tedious, and I'm not sure that it isn't sometimes doing "auto-stuff" behind my back, so that I don't get the full picture.
 
LVL 23

Expert Comment

by:b1xml2
I am running IIS 5 and use MSIE 6. When I use the unescape function, the data is displayed correctly in MSIE 6, and the encoding setting for MSIE 6 is Western European (ISO).
 
LVL 23

Expert Comment

by:b1xml2
The HTTP headers were obtained from my dev site to see what's going on under the covers.
 

Author Comment

by:Rohde
No problems at all....hmmm, would you try posting the exact output from "Response.Write Request.Form", just to make sure (because I'm getting fairly confused by this :) )

Btw.: My files look like this:

default.asp
*****************
<%
Option Explicit

Session.Codepage = 28591

Response.ExpiresAbsolute = Now -1
%>
<!doctype html public "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Polish Test</title>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>

<body>
<form accept-charset="ISO-8859-1" action="show.asp" name="polishForm" method="post">
     <input type="text" value="k&#248;rer" name="testtext"><br>
     <input type="text" value="Os&#322;uga" name="testtext2"><br>
     <br>
     <br>
     <input type="submit"><br>
     <br>
</form>

</body>
</html>

show.asp
*******************
<%
Option Explicit

Session.Codepage = 28591

Response.ExpiresAbsolute = Now -1

Response.Write Request.Form
Response.Write VBNewline & VBNewline
Response.Write Request.ServerVariables("ALL_RAW")
%>

 
LVL 23

Expert Comment

by:b1xml2
Try changing show.asp to:
============================
<%
Option Explicit

Session.Codepage = 28591

Response.ExpiresAbsolute = Now -1

Response.Write unescape(Request.Form)
Response.Write VBNewline & VBNewline
Response.Write Request.ServerVariables("ALL_RAW")
%>

And for your info about the ServerVariables, my show.asp is this:

<%@language="VBScript"%>
<html>
<head>
<title>Polish Test</title>
<meta http-equiv="content-type" content="text/html;charset=iso-8859-1">
</head>
<body>
<%
 For Each oItem In Request.ServerVariables
  Response.Write "<div>" & oItem & "</div>"
  Response.Write "<xmp>" & Request.ServerVariables(oItem) & "</xmp>"
  Response.Write "<hr>"
 
 Next

%>
<b style="font-variant:small-caps"><%Response.Write unescape(Request.Form)%></b>
</body>
</html>
 
LVL 19

Accepted Solution

by:webwoman (earned 150 total points)
Can you just set it as Unicode, rather than specific character sets? That should allow all the characters, and IE should handle it better.

At least, I think it should...I don't have to deal with multiple languages, so it's a guess. But it is something that's built into most MS stuff.
 

Author Comment

by:Rohde
Ok, thanks.
Seems that the show.asp that you posted is exactly the same except for the unescape. But the escaped characters are not the problem, the URL encoding has been done before the HTML encoding.

When I use your show.asp, I get the following string :

   testtext=kører&testtext2=Os&#322;uga

The Polish special char is HTML encoded, but not the Danish one.....

Try viewing the source and posting the exact string that the page outputs (not the one that is shown by the browser, because that will show the HTML encoded special-char correctly).
 

Author Comment

by:Rohde
Interleaving posts, webwoman; didn't mean to ignore you....

Yes, I have also considered Unicode. I would have to install it on the server and change the database (and the rest of the application is finished, so I'd rather not go into all that if it can be avoided).

But does anyone have any good or bad experiences with using Unicode in an IIS/SQL Server environment? Would that be an easy way of working around the problem?
 

Author Comment

by:Rohde
Ok, I've tried it......... and I actually got it to work!! :-)
(When I tried using Unicode earlier, I had used the Family Code Page (#1200) instead of the "direct" codepage number (#65001), and that caused it to say "unknown codepage", as if it wasn't installed on the server. The Family Code Page number for e.g. ISO-8859-1 DOES work.)

I can just HTML encode the Unicode that I receive from the form submit (UTF-8 encoded) before I put it in the database, although it is not a particularly nice thing to do. (Storing the data HTML encoded is only a problem if I later want to use the data for something other than websites.)

I am, however, still interested in why Internet Explorer HTML encodes characters outside its current codepage, and what I can do about it.
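
For reference, the UTF-8 round trip and the "HTML encode before storing" step can be sketched as follows. This is a minimal Python illustration under assumptions, not the ASP code actually used; `to_html_entities` is a made-up helper name. It only replaces non-ASCII characters and does not escape &, < or >.

```python
from urllib.parse import quote, unquote


def to_html_entities(text):
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


# With UTF-8 as the form charset every character is encodable,
# so the browser has no reason to fall back to HTML escaping.
submitted = quote("Osługa", safe="", encoding="utf-8")
print(submitted)  # Os%C5%82uga

# Server side: percent-decode as UTF-8, then escape for a non-Unicode database column.
received = unquote(submitted, encoding="utf-8")
stored = to_html_entities(received)
print(stored)  # Os&#322;uga
print(to_html_entities("kører"))  # k&#248;rer
```

Unlike the mixed output from the ISO-8859-1 form, both special characters end up escaped the same way here.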
 

Author Comment

by:Rohde
Thanks for the hints :)

After fixing the normal forms, I found out that I also had to change the upload component that I was using (http://www.safileup.com) because it didn't support changing the codepage. I switched to ASPUpload from http://www.persits.com/.
