Solved

Character encoding - Form submit (Internet Explorer oddity)

Posted on 2002-04-16
Last Modified: 2013-12-03
[ Edit: Please note that the special characters I describe are not displayed correctly in Experts Exchange.... speaking of charset issues ;-) ]

I have a very annoying character encoding issue with Internet Explorer.
I have a web application with an administration module that must handle many languages (e.g. Norwegian (charset iso-8859-1) and Polish (charset iso-8859-2)).

Try looking at http://testzone.subsero.dk/polish/
When I use Internet Explorer, I get the following result when submitting the form :

     testtext=k%F8rer&testtext2=Os%26%23322%3Buga

But with any other browser (e.g. Netscape 6, Opera 6), I get :

     testtext=k%F8rer&testtext2=Os%3Fuga

My problem is that I can't process the data correctly when I don't know what the input will be.
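
The two results can be reproduced outside the browser. The Python sketch below only simulates what the two submissions look like; it is not IE's actual implementation. It assumes the form data is first converted to the page charset and then URL encoded, with IE substituting a numeric character reference for anything the charset cannot represent, and the other browsers substituting a question mark. The function names are made up for illustration.

```python
from urllib.parse import quote


def submit_like_ie(text, charset="iso-8859-1"):
    """Assumed IE behaviour: unrepresentable characters become &#NNN; before URL encoding."""
    raw = text.encode(charset, errors="xmlcharrefreplace")
    return quote(raw, safe="")


def submit_like_others(text, charset="iso-8859-1"):
    """Assumed Netscape/Opera behaviour: unrepresentable characters become '?'."""
    raw = text.encode(charset, errors="replace")
    return quote(raw, safe="")


if __name__ == "__main__":
    for value in ("kører", "Osługa"):
        print(submit_like_ie(value), submit_like_others(value))
```

One difference from the real thing: Python always emits numeric references, so it would not reproduce a named entity such as &oslash;.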

I've set the charset to ISO-8859-1 in the content-type, in the accept-charset attribute on the <form>, and in the ASP property Session.Codepage.

- Why does Internet Explorer perform this HTML encoding?
- Is there any way to force it to behave differently?
- Which versions of Internet Explorer do this?
Question by:Rohde
12 Comments
 
LVL 23

Expert Comment

by:b1xml2
ID: 6944381
When you use the following in show.asp, you will get the correct values for MSIE:

<%@language="VBScript"%>
<b><%=unescape(Request.Form)%></b>


Next, the headers that MSIE 6 and NS 6 send are interesting....


MSIE 6
======

HTTP_CONTENT_TYPE
application/x-www-form-urlencoded

HTTP_ACCEPT_ENCODING
gzip, deflate

(NO HTTP_ACCEPT_CHARSET)

NS 6
====

HTTP_CONTENT_TYPE
application/x-www-form-urlencoded

HTTP_ACCEPT_ENCODING
gzip, deflate, compress;q=0.9

HTTP_ACCEPT_CHARSET
ISO-8859-1, utf-8;q=0.66, *;q=0.66


By the way, the preferred charset name is ISO-8859-1, not ISO_8859-1 (the latter is only listed as an alias).

 

Author Comment

by:Rohde
ID: 6944412
ISO_8859-1 -> Thanks, took the wrong one from the IANA list.

The unescape doesn't help (and it is done automatically by the ASP engine when accessing the Request.Form collection, e.g. Request.Form("testtext")).

The unescaped form becomes "testtext=kører&testtext2=Os&#322;uga".
Note that the ø is shown "as is", whereas the Polish character is HTML encoded.... not very nice: either both should be HTML encoded, or neither should.

If I change all the charset settings from iso-8859-1 to iso-8859-2, the resulting string in Internet Explorer becomes "testtext=k%26oslash%3Brer&testtext2=Os%B3uga" (unescaped "testtext=k&oslash;rer&testtext2=Osługa"). As you can see, this just switches it, so the ø becomes HTML encoded (&oslash; is ø) and the Polish character (ł) is written as is.
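
The swap follows from which characters each charset can represent. A minimal Python check, assuming Python's codec tables match the browser's (the helper name is hypothetical):

```python
def representable(ch, charset):
    """Return True if the character has a byte value in the given charset."""
    try:
        ch.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for charset in ("iso-8859-1", "iso-8859-2"):
        for ch in ("ø", "ł"):
            print(charset, ch, representable(ch, charset))
```

ø exists only in iso-8859-1 and ł only in iso-8859-2, so whichever charset the page uses, one of the two test values falls outside it and gets the HTML-encoded fallback.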
 

Author Comment

by:Rohde
ID: 6944424
Were the headers that you describe the ones returned from my test site, or just a general observation?

What program/util do you use to view the request and headers?
I'm using various collections from ASP to check them, but it's tedious, and I'm not sure that it isn't sometimes doing "auto-stuff" behind my back, so that I don't get the full picture.
 
LVL 23

Expert Comment

by:b1xml2
ID: 6944467
I am running IIS 5 and use MSIE 6. When I use the unescape function, the data is displayed correctly inside MSIE 6, and the encoding setting for MSIE 6 is Western European (ISO).
 
LVL 23

Expert Comment

by:b1xml2
ID: 6944473
the HTTP headers were obtained from my dev site to see what's going on under the covers.
 

Author Comment

by:Rohde
ID: 6944745
No problems at all....hmmm, would you try posting the exact output from "Response.Write Request.Form", just to make sure (because I'm getting fairly confused by this :)

Btw.: My files look like this:

default.asp
*****************
<%
Option Explicit

Session.Codepage = 28591

Response.ExpiresAbsolute = Now -1
%>
<!doctype html public "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Polish Test</title>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>

<body>
<form accept-charset="ISO-8859-1" action="show.asp" name="polishForm" method="post">
     <input type="text" value="k&#248;rer" name="testtext"><br>
     <input type="text" value="Os&#322;uga" name="testtext2"><br>
     <br>
     <br>
     <input type="submit"><br>
     <br>
</form>

</body>
</html>

show.asp
*******************
<%
Option Explicit

Session.Codepage = 28591

Response.ExpiresAbsolute = Now -1

Response.Write Request.Form
Response.Write VBNewline & VBNewline
Response.Write Request.ServerVariables("ALL_RAW")
%>
 
LVL 23

Expert Comment

by:b1xml2
ID: 6944856
try changing the show.asp to
============================
<%
Option Explicit

Session.Codepage = 28591

Response.ExpiresAbsolute = Now -1

Response.Write unescape(Request.Form)
Response.Write VBNewline & VBNewline
Response.Write Request.ServerVariables("ALL_RAW")
%>

and for your info about the ServerVariables

my show.asp is this

<%@language="VBScript"%>
<html>
<head>
<title>Polish Test</title>
<meta http-equiv="content-type" content="text/html;charset=iso-8859-1">
</head>
<body>
<%
 ' Dump every server variable name together with its raw value
 For Each oItem In Request.ServerVariables
  Response.Write "<div>" & oItem & "</div>"
  Response.Write "<xmp>" & Request.ServerVariables(oItem) & "</xmp>"
  Response.Write "<hr>"
 Next

%>
<b style="font-variant:small-caps"><%Response.Write unescape(Request.Form)%></b>
</body>
</html>
 
LVL 19

Accepted Solution

by:
webwoman earned 300 total points
ID: 6944908
Can you just set it as Unicode, rather than specific character sets? That should allow all the characters, and IE should handle it better.

At least, I think it should...I don't have to deal with multiple languages, so it's a guess. But it is something that's built into most MS stuff.
 

Author Comment

by:Rohde
ID: 6944936
Ok, thanks.
It seems that the show.asp you posted is exactly the same except for the unescape. But the escaped characters are not the problem: the HTML encoding was applied before the URL encoding, so unescaping still leaves the HTML-encoded character behind.

When I use your show.asp, I get the following string :

   testtext=kører&testtext2=Os&#322;uga

The Polish special char is HTML encoded, but not the Danish one.....

Try viewing the source and posting the exact string that the page outputs (not the one that is shown by the browser, because that will show the HTML encoded special-char correctly).
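
For comparison, here is roughly what the server would have to do to get the original text back from either variant. This is a Python sketch, not ASP, and the function name is hypothetical; it URL decodes using the page charset and then resolves any HTML character references that remain.

```python
import html
from urllib.parse import unquote_plus


def decode_field(value, charset="iso-8859-1"):
    """URL decode with the page charset, then resolve leftover HTML entities."""
    text = unquote_plus(value, encoding=charset)
    return html.unescape(text)


if __name__ == "__main__":
    print(decode_field("k%F8rer"))
    print(decode_field("Os%26%23322%3Buga"))
    print(decode_field("k%26oslash%3Brer", "iso-8859-2"))
```

The catch is that this cannot tell a browser-generated &#322; apart from a user who literally typed "&#322;" into the field, which is exactly why the input is hard to process reliably.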
 

Author Comment

by:Rohde
ID: 6945035
Interleaving posts webwoman, didn't mean to ignore you....

Yes, I have also considered Unicode. I would have to install it on the server and change the database (and the rest of the application is finished, so I'd rather not have to go into all that if it can be avoided).

But does anyone have any good or bad experiences with using Unicode in an IIS/SQL Server environment? Would that be an easy way of working around the problem?
 

Author Comment

by:Rohde
ID: 6945173
Ok, I've tried it......... and I actually got it to work!! :-)
(When I'd tried using Unicode earlier, I'd used the Family Code Page (#1200) instead of the "direct" codepage number (#65001), and that caused it to say "unknown codepage", as if it wasn't installed on the server...... the Family Code Page number for e.g. ISO-8859-1 DOES work.....)
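
Codepage 65001 is UTF-8, which can represent both test characters, so no fallback encoding is needed. A small Python illustration of what the submitted values would look like under that assumption (function name hypothetical):

```python
from urllib.parse import quote, unquote


def submit_utf8(text):
    """URL encode the UTF-8 bytes of a form value."""
    return quote(text.encode("utf-8"), safe="")


if __name__ == "__main__":
    for value in ("kører", "Osługa"):
        encoded = submit_utf8(value)
        # unquote() decodes as UTF-8 by default, so the round trip is lossless
        print(encoded, unquote(encoded))
```

Both values now travel as plain percent-encoded bytes, with no HTML entities mixed in.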

I can just HTML encode the Unicode that I receive from the form submit (UTF-8 encoded) before I put it in the database (although it is not a particularly nice thing to do; storing the data HTML encoded is only a problem if I later want to use the data for something other than websites).
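
A rough sketch of that HTML-encoding step, again in Python rather than ASP. It is not a reimplementation of Server.HTMLEncode, and the function name is hypothetical: markup characters are escaped first, then everything outside ASCII is turned into a numeric character reference.

```python
import html


def html_encode_for_storage(text):
    """Escape markup characters, then replace non-ASCII with &#NNN; references."""
    escaped = html.escape(text, quote=True)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    for value in ("kører", "Osługa", "a<b & ł"):
        stored = html_encode_for_storage(value)
        # html.unescape() reverses the encoding when the data is needed as plain text
        print(stored, html.unescape(stored))
```

The stored value is pure ASCII, so it survives any single-byte database codepage, at the cost of needing html.unescape (or an equivalent) whenever the data is used outside a web page.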

I am, however, still interested in why Internet Explorer HTML encodes characters outside its current codepage, and what I can do about it.
 

Author Comment

by:Rohde
ID: 6947679
Thanks for the hints :)

After fixing the normal forms, I found out that I also had to change the upload component that I was using (http://www.safileup.com) because it didn't support changing the codepage (switched to ASPUpload from http://www.persits.com/).