
Character encoding - Form submit (Internet Explorer oddity)

[ Edit: Please note that the special characters that I describe are not displayed correctly in Experts Exchange... speaking of charset issues ;-) ]

I have a very annoying character encoding issue with Internet Explorer.
I have a web application with an administration module that must handle many languages (e.g. Norwegian (charset iso-8859-1) and Polish (charset iso-8859-2)).

Try looking at http://testzone.subsero.dk/polish/
When I use Internet Explorer, I get the following result when submitting the form :

     testtext=k%F8rer&testtext2=Os%26%23322%3Buga

But with any other browser (e.g. Netscape 6, Opera 6), I get :

     testtext=k%F8rer&testtext2=Os%3Fuga

My problem is that I can't process the data correctly when I don't know what the input will be.
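
The two results above are consistent with a difference in how each browser treats a character that the form's charset cannot represent: Internet Explorer appears to substitute an HTML numeric character reference before URL-encoding, while the other browsers substitute a question mark. A minimal sketch in Python (not ASP) reproduces both strings under that assumption. The function name `simulate_submit` and the use of Python's codec error handlers as a stand-in for browser behaviour are illustrative only.

```python
from urllib.parse import quote_plus

def simulate_submit(fields, charset, fallback):
    """URL-encode form fields roughly as a browser would for `charset`.

    `fallback` is a Python codec error handler standing in for what the
    browser does with unencodable characters (an assumption, not browser code):
    "xmlcharrefreplace" gives &#NNN; references, "replace" gives "?".
    """
    pairs = []
    for name, value in fields:
        raw = value.encode(charset, errors=fallback)  # bytes in the form charset
        pairs.append(f"{quote_plus(name)}={quote_plus(raw)}")
    return "&".join(pairs)

fields = [("testtext", "kører"), ("testtext2", "Osługa")]
ie_like = simulate_submit(fields, "iso-8859-1", "xmlcharrefreplace")
other_like = simulate_submit(fields, "iso-8859-1", "replace")
print(ie_like)     # testtext=k%F8rer&testtext2=Os%26%23322%3Buga
print(other_like)  # testtext=k%F8rer&testtext2=Os%3Fuga
```

The "ø" fits in ISO-8859-1 and is sent as the single byte F8 in both cases; only the "ł", which ISO-8859-1 lacks, triggers the fallback.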

I've set the charset to ISO-8859-1 in all three places: the content-type, the accept-charset attribute on the <form>, and the ASP property Session.Codepage.

- Why does Internet Explorer perform this HTML encoding?
- Is there any way to force it to behave differently?
- Which versions of Internet Explorer do this?
Asked by: Rohde
 
b1xml2Commented:
When you use the following in show.asp, you will get the correct values for MSIE:

<%@language="VBScript"%>
<b><%=unescape(Request.Form)%></b>


Next, the headers that MSIE 6 and NS 6 send show an interesting difference:


MSIE 6
======

HTTP_CONTENT_TYPE
application/x-www-form-urlencoded

HTTP_ACCEPT_ENCODING
gzip, deflate

(NO HTTP_ACCEPT_CHARSET)

NS 6
====

HTTP_CONTENT_TYPE
application/x-www-form-urlencoded

HTTP_ACCEPT_ENCODING
gzip, deflate, compress;q=0.9

HTTP_ACCEPT_CHARSET
ISO-8859-1, utf-8;q=0.66, *;q=0.66


By the way, the correct encoding is not ISO_8859-1 but ISO-8859-1

 
RohdeAuthor Commented:
ISO_8859-1 -> Thanks, took the wrong one from the IANA list.

The unescape doesn't help (and it is done automatically by the ASP engine when accessing the Request.Form collection, e.g. Request.Form("testtext")).

The unescaped form becomes "testtext=kører&testtext2=Os&#322;uga".
Note that the ø is shown as is, whereas the Polish character is HTML encoded. Not very nice: either both should be HTML encoded, or neither should.

If I change all the charset settings from iso-8859-1 to iso-8859-2, the resulting string in Internet Explorer becomes "testtext=k%26oslash%3Brer&testtext2=Os%B3uga" (unescaped: "testtext=k&oslash;rer&testtext2=Osługa"). As you can see, this just swaps the roles: the ø becomes HTML encoded (as &oslash;), and the Polish character (ł) is written as is.
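
Both variants can in principle be normalised on the server by undoing the two layers in reverse order: URL-decode to bytes, decode those bytes with the page charset, then resolve any HTML character references. A minimal sketch in Python (not ASP), where `decode_field` is a hypothetical helper. One caveat: a user who literally typed "&#322;" into the field cannot be told apart from a browser substitution.

```python
import html
from urllib.parse import unquote_to_bytes

def decode_field(raw_value, charset):
    """Turn one URL-encoded form value back into text.

    Hypothetical helper: it assumes any character reference in the value
    was inserted by the browser, which is not always true.
    """
    # In form encoding "+" means a space; handle it before percent-decoding.
    data = unquote_to_bytes(raw_value.replace("+", " "))
    text = data.decode(charset)   # bytes -> text in the page charset
    return html.unescape(text)    # &#322; or &oslash; -> real character

print(decode_field("Os%26%23322%3Buga", "iso-8859-1"))  # Osługa
print(decode_field("k%26oslash%3Brer", "iso-8859-2"))   # kører
print(decode_field("Os%B3uga", "iso-8859-2"))           # Osługa
```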
 
RohdeAuthor Commented:
Were the headers that you describe the ones returned from my test site, or just a general observation?

What program/util do you use to view the request and headers?
I'm using various collections from ASP to check them, but it's tedious, and I'm not sure that ASP isn't sometimes doing "auto-stuff" behind my back, so that I don't get the full picture.
 
b1xml2Commented:
I am running IIS 5 and use MSIE 6. When I use the unescape function, the data is displayed correctly inside MSIE 6, and the encoding setting for MSIE 6 is Western European (ISO).
 
b1xml2Commented:
The HTTP headers were obtained from my dev site, to see what's going on under the covers.
 
RohdeAuthor Commented:
No problems at all....hmmm, would you try posting the exact output from "Response.Write Request.Form", just to make sure (because I'm getting fairly confused by this :)

Btw.: My files look like this:

default.asp
*****************
<%
Option Explicit

Session.Codepage = 28591

Response.ExpiresAbsolute = Now -1
%>
<!doctype html public "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Polish Test</title>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>

<body>
<form accept-charset="ISO-8859-1" action="show.asp" name="polishForm" method="post">
     <input type="text" value="k&#248;rer" name="testtext"><br>
     <input type="text" value="Os&#322;uga" name="testtext2"><br>
     <br>
     <br>
     <input type="submit"><br>
     <br>
</form>

</body>
</html>

show.asp
*******************
<%
Option Explicit

Session.Codepage = 28591

Response.ExpiresAbsolute = Now -1

Response.Write Request.Form
Response.Write VBNewline & VBNewline
Response.Write Request.ServerVariables("ALL_RAW")
%>
 
b1xml2Commented:
try changing the show.asp to
============================
<%
Option Explicit

Session.Codepage = 28591

Response.ExpiresAbsolute = Now -1

Response.Write unescape(Request.Form)
Response.Write VBNewline & VBNewline
Response.Write Request.ServerVariables("ALL_RAW")
%>

And for your info about the ServerVariables:

my show.asp is this

<%@language="VBScript"%>
<html>
<head>
<title>Polish Test</title>
<meta http-equiv="content-type" content="text/html;charset=iso-8859-1">
</head>
<body>
<%
 For Each oItem In Request.ServerVariables
  Response.Write "<div>" & oItem & "</div>"
  Response.Write "<xmp>" & Request.ServerVariables(oItem) & "</xmp>"
  Response.Write "<hr>"
 
 Next

%>
<b style="font-variant:small-caps"><%Response.Write unescape(Request.Form)%></b>
</body>
</html>
 
webwomanCommented:
Can you just set it as Unicode, rather than specific character sets? That should allow all the characters, and IE should handle it better.

At least, I think it should...I don't have to deal with multiple languages, so it's a guess. But it is something that's built into most MS stuff.
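
For illustration: with UTF-8 as the form charset, both test values from the question are representable, so no fallback substitution is needed. A Python sketch (not ASP), assuming the browser honours the UTF-8 charset when submitting:

```python
from urllib.parse import quote_plus, unquote_plus

values = ["kører", "Osługa"]

# Encode each value to UTF-8 bytes, then percent-encode, as a form post would.
encoded = [quote_plus(v.encode("utf-8")) for v in values]
print(encoded)  # ['k%C3%B8rer', 'Os%C5%82uga']

# Decoding with the same charset recovers the original text exactly.
decoded = [unquote_plus(e, encoding="utf-8") for e in encoded]
assert decoded == values
```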
 
RohdeAuthor Commented:
Ok, thanks.
It seems that the show.asp you posted is exactly the same as mine except for the unescape. But the URL-escaped characters are not the problem: the HTML encoding is applied before the URL encoding, so undoing the URL encoding still leaves the HTML character reference.

When I use your show.asp, I get the following string :

   testtext=kører&testtext2=Os&#322;uga

The Polish special character is HTML encoded, but not the Danish one.

Try viewing the source and posting the exact string that the page outputs (not the one that is shown by the browser, because that will show the HTML encoded special-char correctly).
 
RohdeAuthor Commented:
Our posts crossed, webwoman; I didn't mean to ignore you.

Yes, I have also considered Unicode. I would have to install it on the server and change the database (and the rest of the application is finished, so I'd rather not go into all that if it can be avoided).

But does anyone have any good or bad experiences with using Unicode in an IIS/SQL Server environment? Would that be an easy way of working around the problem?
 
RohdeAuthor Commented:
Ok, I've tried it......... and I actually got it to work!! :-)
(When I tried using Unicode earlier, I used the Family Code Page (#1200) instead of the "direct" codepage number (#65001), and that caused it to say "unknown codepage", as if it wasn't installed on the server. The Family Code Page number for e.g. ISO-8859-1 DOES work.)
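
For reference, the code page numbers mentioned in this thread map to encodings as follows: 1200 is the Windows identifier for UTF-16 (little endian), 65001 is UTF-8, and 28591 and 28592 are ISO-8859-1 and ISO-8859-2. A small Python sketch (not ASP) showing how differently the same character is laid out under each; the names `CODE_PAGES` and `encode_for_code_page` are hypothetical.

```python
# Windows code page identifiers mapped to Python codec names.
CODE_PAGES = {
    1200: "utf-16-le",
    28591: "iso-8859-1",
    28592: "iso-8859-2",
    65001: "utf-8",
}

def encode_for_code_page(text, code_page):
    """Encode `text` with the codec that corresponds to a Windows code page."""
    return text.encode(CODE_PAGES[code_page])

print(encode_for_code_page("ł", 65001))  # b'\xc5\x82'
print(encode_for_code_page("ł", 1200))   # b'B\x01'
print(encode_for_code_page("ł", 28592))  # b'\xb3'
```

UTF-16 uses two-byte units rather than an ASCII-compatible byte stream, which is one plausible reason a page-level codepage setting would accept 65001 but not 1200; the thread itself does not confirm the cause.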

I can just HTML encode the Unicode that I receive from the form submit (UTF-8 encoded) before I put it in the database, although it is not a particularly nice thing to do. (Storing the data HTML encoded is only a problem if I later want to use the data for something other than websites.)
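
A minimal sketch of that storage step in Python (not ASP), assuming the submitted value has already been decoded from UTF-8 into text. The helper name `to_ascii_html` is hypothetical. It also escapes "&", "<" and ">" first, so that the stored form can be reversed unambiguously.

```python
import html

def to_ascii_html(text):
    """Return an ASCII-only, HTML-encoded version of `text` for storage."""
    escaped = html.escape(text, quote=True)  # & < > " ' become entities
    # Every remaining non-ASCII character becomes a &#NNN; reference.
    return escaped.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

stored = to_ascii_html("Osługa & kører")
print(stored)  # Os&#322;uga &amp; k&#248;rer

# html.unescape reverses the encoding when the data is needed as plain text.
assert html.unescape(stored) == "Osługa & kører"
```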

I am, however, still interested in why Internet Explorer HTML encodes characters outside its current codepage, and what I can do about it.
 
RohdeAuthor Commented:
Thanks for the hints :)

After fixing the normal forms I found out that I also had to change the upload component that I was using (http://www.safileup.com), because it didn't support changing the codepage. (I switched to ASPUpload from http://www.persits.com/.)
Question has a verified solution.