Solved

Character encoding - Form submit (Internet Explorer oddity)

Posted on 2002-04-16
Last Modified: 2013-12-03
[ Edit: Please note that the special characters I describe are not displayed correctly in Experts Exchange... speaking of charset issues ;-) ]

I have a very annoying character encoding issue with Internet Explorer.
I have a web application with an administration module that must handle many languages (e.g. Norwegian (charset iso-8859-1) and Polish (charset iso-8859-2)).

Try looking at http://testzone.subsero.dk/polish/
When I submit the form with Internet Explorer, I get the following result:

     testtext=k%F8rer&testtext2=Os%26%23322%3Buga

But with any other browser (e.g. Netscape 6, Opera 6), I get:

     testtext=k%F8rer&testtext2=Os%3Fuga

My problem is that I can't process the data correctly when I don't know what the input will be.
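For reference, the two request bodies above can be decoded side by side. This is a minimal Python sketch, not part of the original ASP application; it assumes both bodies are percent-encoded ISO-8859-1 bytes, and the names `IE_BODY`, `OTHER_BODY` and `decode_form` are made up for illustration.

```python
from html import unescape
from urllib.parse import parse_qsl

# The two request bodies quoted in the question.
IE_BODY = "testtext=k%F8rer&testtext2=Os%26%23322%3Buga"
OTHER_BODY = "testtext=k%F8rer&testtext2=Os%3Fuga"


def decode_form(body, charset="iso-8859-1"):
    """Percent-decode a urlencoded form body using the given charset."""
    return dict(parse_qsl(body, encoding=charset))


ie_fields = decode_form(IE_BODY)
other_fields = decode_form(OTHER_BODY)

print(ie_fields["testtext2"])            # Os&#322;uga (a numeric character reference)
print(unescape(ie_fields["testtext2"]))  # Osługa (original character is recoverable)
print(other_fields["testtext2"])         # Os?uga (original character is lost)
```

In both cases the Danish field decodes to the same text; only the Polish character, which ISO-8859-1 cannot represent, differs.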

I've set the charset to ISO-8859-1 in the content-type, in the accept-charset attribute on the <form>, and in the ASP property Session.Codepage.

- Why does Internet Explorer perform this HTML encoding?
- Is there any way to force it to behave differently?
- Which versions of Internet Explorer do this?
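The two outputs look like two different fallback strategies for a character that the form's charset cannot represent. The Python sketch below only imitates that difference with the standard codec error handlers; it is an analogy under that assumption and says nothing about how any browser is actually implemented.

```python
from urllib.parse import quote

polish = "Osługa"  # "ł" (U+0142) does not exist in ISO-8859-1

# Strategy 1: replace the unencodable character with a numeric character reference.
as_reference = polish.encode("iso-8859-1", errors="xmlcharrefreplace")

# Strategy 2: replace the unencodable character with a question mark.
as_question_mark = polish.encode("iso-8859-1", errors="replace")

# Percent-encode the resulting bytes the way a form submission would.
print(quote(as_reference, safe=""))      # Os%26%23322%3Buga
print(quote(as_question_mark, safe=""))  # Os%3Fuga
```

The first strategy keeps the character recoverable on the server, while the second discards it.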
Question by:Rohde
 
LVL 23

Expert Comment

by:b1xml2
ID: 6944381
When you use the following in show.asp, you will get the correct values for MSIE:

<%@language="VBScript"%>
<b><%=unescape(Request.Form)%></b>


Next, there is an interesting difference in the headers that MSIE 6 and NS 6 send:


MSIE 6
======

HTTP_CONTENT_TYPE
application/x-www-form-urlencoded

HTTP_ACCEPT_ENCODING
gzip, deflate

(NO HTTP_ACCEPT_CHARSET)

NS 6
====

HTTP_CONTENT_TYPE
application/x-www-form-urlencoded

HTTP_ACCEPT_ENCODING
gzip, deflate, compress;q=0.9

HTTP_ACCEPT_CHARSET
ISO-8859-1, utf-8;q=0.66, *;q=0.66


By the way, the correct encoding is not ISO_8859-1 but ISO-8859-1

 

Author Comment

by:Rohde
ID: 6944412
ISO_8859-1 -> Thanks, took the wrong one from the IANA list.

The unescape doesn't help (and it is done automatically by the ASP engine when accessing the Request.Form collection, e.g. Request.Form("testtext")).

The unescaped form becomes "testtext=kører&testtext2=Os&#322;uga".
Note that the ø is shown "as is", whereas the Polish character is HTML encoded. Not very nice: either both should be HTML encoded, or neither should.

If I change all the charset settings from iso-8859-1 to iso-8859-2, the resulting string in Internet Explorer becomes "testtext=k%26oslash%3Brer&testtext2=Os%B3uga" (unescaped: "testtext=k&oslash;rer&testtext2=Osługa"). As you can see, this just switches things around: the ø becomes HTML encoded (&oslash; is ø), and the Polish character (ł) is written as is.
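Both observed strings can be reproduced with one rule: encode each character in the form's charset and, if that fails, fall back to an HTML entity (named where an HTML 4 name exists, numeric otherwise). The function below, `encode_like_ie`, is a hypothetical Python sketch modelled only on the output quoted in this thread; it is not Internet Explorer's actual algorithm.

```python
from html.entities import codepoint2name
from urllib.parse import quote


def encode_like_ie(text, charset):
    """Encode text in charset, using HTML entities for unencodable characters,
    then percent-encode the result as a form value."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(charset)
        except UnicodeEncodeError:
            # Prefer a named HTML 4 entity, otherwise a numeric reference.
            name = codepoint2name.get(ord(ch))
            reference = f"&{name};" if name else f"&#{ord(ch)};"
            out += reference.encode("ascii")
    return quote(bytes(out), safe="")


print(encode_like_ie("kører", "iso-8859-1"))   # k%F8rer
print(encode_like_ie("Osługa", "iso-8859-1"))  # Os%26%23322%3Buga
print(encode_like_ie("kører", "iso-8859-2"))   # k%26oslash%3Brer
print(encode_like_ie("Osługa", "iso-8859-2"))  # Os%B3uga
```

Whichever single-byte charset is chosen, one of the two test words ends up as an entity, which matches the "switch" described above.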
 

Author Comment

by:Rohde
ID: 6944424
Were the headers that you describe the ones returned from my test site, or just a general observation?

What program/util do you use to view the request and headers?
I'm using various collections from ASP to check them, but it's tedious, and I'm not sure that ASP isn't sometimes doing "auto-stuff" behind my back, so that I don't get the full picture.
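One way to see the request without any server-side "auto-stuff" is to point the form at a tiny echo server that returns the request line, headers, and the undecoded body. The sketch below uses Python's standard `http.server`; `EchoHandler` and `make_server` are illustrative names, and the address and port are assumptions.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    """Echo the request line, headers and raw (undecoded) POST body as text."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        raw_body = self.rfile.read(length)  # bytes exactly as the browser sent them
        head = f"{self.requestline}\n{self.headers}\n".encode("ascii", "replace")
        report = head + raw_body
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=us-ascii")
        self.send_header("Content-Length", str(len(report)))
        self.end_headers()
        self.wfile.write(report)

    def log_message(self, *args):
        # Keep the console quiet; the response already contains everything.
        pass


def make_server(port=0):
    """Create the echo server on localhost (port 0 picks a free port)."""
    return HTTPServer(("127.0.0.1", port), EchoHandler)
```

Running `make_server(8000).serve_forever()` and setting the form's action to `http://127.0.0.1:8000/` would show the percent-encoded body untouched.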

 
LVL 23

Expert Comment

by:b1xml2
ID: 6944467
I am running IIS 5 and using MSIE 6. When I use the unescape function, the data is displayed correctly inside MSIE 6, and the encoding setting for MSIE 6 is Western European (ISO).
 
LVL 23

Expert Comment

by:b1xml2
ID: 6944473
The HTTP headers were obtained from my dev site, to see what's going on under the covers.
 

Author Comment

by:Rohde
ID: 6944745
No problems at all....hmmm, would you try posting the exact output from "Response.Write Request.Form", just to make sure (because I'm getting fairly confused by this :)

Btw.: My files look like this:

default.asp
*****************
<%
Option Explicit

Session.Codepage = 28591

Response.ExpiresAbsolute = Now -1
%>
<!doctype html public "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Polish Test</title>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>

<body>
<form accept-charset="ISO-8859-1" action="show.asp" name="polishForm" method="post">
     <input type="text" value="k&#248;rer" name="testtext"><br>
     <input type="text" value="Os&#322;uga" name="testtext2"><br>
     <br>
     <br>
     <input type="submit"><br>
     <br>
</form>

</body>
</html>

show.asp
*******************
<%
Option Explicit

Session.Codepage = 28591

Response.ExpiresAbsolute = Now -1

Response.Write Request.Form
Response.Write VBNewline & VBNewline
Response.Write Request.ServerVariables("ALL_RAW")
%>
 
LVL 23

Expert Comment

by:b1xml2
ID: 6944856
try changing the show.asp to
============================
<%
Option Explicit

Session.Codepage = 28591

Response.ExpiresAbsolute = Now -1

Response.Write unescape(Request.Form)
Response.Write VBNewline & VBNewline
Response.Write Request.ServerVariables("ALL_RAW")
%>

And for your information about the ServerVariables, my show.asp is this:

<%@language="VBScript"%>
<html>
<head>
<title>Polish Test</title>
<meta http-equiv="content-type" content="text/html;charset=iso-8859-1">
</head>
<body>
<%
 For Each oItem In Request.ServerVariables
  Response.Write "<div>" & oItem & "</div>"
  Response.Write "<xmp>" & Request.ServerVariables(oItem) & "</xmp>"
  Response.Write "<hr>"
 
 Next

%>
<b style="font-variant:small-caps"><%Response.Write unescape(Request.Form)%></b>
</body>
</html>
 
LVL 19

Accepted Solution

by:
webwoman earned 300 total points
ID: 6944908
Can you just set it as Unicode, rather than specific character sets? That should allow all the characters, and IE should handle it better.

At least, I think it should...I don't have to deal with multiple languages, so it's a guess. But it is something that's built into most MS stuff.
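To illustrate the suggestion: in UTF-8 both test words can be encoded directly, so no entity or question-mark fallback is needed. This is a Python sketch of the byte-level effect only; it assumes the page and form are served as UTF-8 and does not show any ASP configuration.

```python
from urllib.parse import parse_qsl, urlencode

fields = {"testtext": "kører", "testtext2": "Osługa"}

# Percent-encode the form values as UTF-8 bytes.
body = urlencode(fields, encoding="utf-8")
print(body)  # testtext=k%C3%B8rer&testtext2=Os%C5%82uga

# Decoding with the same charset restores both words unchanged.
decoded = dict(parse_qsl(body, encoding="utf-8"))
print(decoded == fields)  # True
```

The server then needs to decode the body as UTF-8 as well, otherwise each non-ASCII character shows up as two garbled characters.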
 

Author Comment

by:Rohde
ID: 6944936
Ok, thanks.
It seems that the show.asp you posted is exactly the same as mine except for the unescape. But the URL-escaped characters are not the problem: the HTML encoding was applied before the URL encoding, so unescaping only exposes the HTML-encoded character.

When I use your show.asp, I get the following string:

   testtext=kører&testtext2=Os&#322;uga

The Polish special character is HTML encoded, but the Danish one is not.

Try viewing the source and posting the exact string that the page outputs (not the one that is shown by the browser, because that will show the HTML encoded special-char correctly).
 

Author Comment

by:Rohde
ID: 6945035
Interleaving posts webwoman, didn't mean to ignore you....

Yes, I have also considered Unicode. I would have to install it on the server and change the database (and the rest of the application is finished, so I'd rather not go into all that if it can be avoided).

But does anyone have any good or bad experiences with using Unicode in an IIS/SQL Server environment? Would that be an easy way of working around the problem?
 

Author Comment

by:Rohde
ID: 6945173
OK, I've tried it... and I actually got it to work!! :-)
(When I tried using Unicode earlier, I used the Family Code Page (#1200) instead of the "direct" codepage number (#65001), and that caused it to say "unknown codepage", as if it wasn't installed on the server. The Family Code Page number for e.g. ISO-8859-1 DOES work.)
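The codepage numbers mentioned in this thread are Windows code page identifiers. The Python sketch below maps them to the corresponding codec names to show how differently they encode the same character; the dictionary and helper names are made up for illustration, and the note that 1200 denotes UTF-16 (little endian) rather than UTF-8 is my understanding of the identifier list, not something stated in the thread.

```python
# Windows code page identifiers used in this thread, mapped to Python codec names.
CODEPAGE_TO_CODEC = {
    28591: "iso-8859-1",  # Western European (ISO)
    28592: "iso-8859-2",  # Central European (ISO)
    65001: "utf-8",       # the value that worked for Session.Codepage
    1200: "utf-16-le",    # UTF-16, a two-byte encoding, not UTF-8
}


def encode_for_codepage(text, codepage):
    """Encode text with the codec that corresponds to a code page number."""
    return text.encode(CODEPAGE_TO_CODEC[codepage])


print(encode_for_codepage("ł", 28592))  # b'\xb3'
print(encode_for_codepage("ł", 65001))  # b'\xc5\x82'
print(encode_for_codepage("ł", 1200))   # b'B\x01'
```

Encoding "ł" with 28591 raises `UnicodeEncodeError`, which is the situation that triggers the HTML-encoding fallback discussed earlier.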

I can just HTML encode the Unicode that I receive from the form submit (UTF-8 encoded) before I put it in the database, although it is not a particularly nice thing to do. Storing the data HTML encoded is only a problem if I later want to use the data for something other than websites.
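The storage approach described here can be sketched in Python as follows. It assumes "HTML encode" means replacing every non-ASCII character with a numeric character reference; `to_char_refs` is a hypothetical helper name, and the real application would do this in ASP.

```python
from html import unescape


def to_char_refs(text):
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


stored = to_char_refs("kører Osługa")
print(stored)            # k&#248;rer Os&#322;uga
print(unescape(stored))  # kører Osługa
```

One caveat of this design: a user who literally types `&#322;` into a field becomes indistinguishable from one who typed the character itself, unless `&` is escaped as well before storing.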

I am, however, still interested in why Internet Explorer HTML encodes characters outside its current codepage, and what I can do about it.
 

Author Comment

by:Rohde
ID: 6947679
Thanks for the hints :)

After fixing the normal forms, I found out that I also had to change the upload component I was using (http://www.safileup.com), because it didn't support changing the codepage. I switched to ASPUpload from http://www.persits.com/.
Question has a verified solution.
