Solved

UTF-8 character set does not support the ¬ character (logical not sign) in ASP pages

Posted on 2006-06-15
Last Modified: 2008-01-09
I am moving a product over to Unicode from all-ASCII/varchar data.

I changed all pages to have

Response.Charset = "UTF-8"

and

<meta http-equiv="content-type" content="text/html;charset=utf-8">

However, we have made extensive use of the ¬ character as a separator throughout the product. When I debug the following line under UTF-8:

            sHTMLTree = dc.GetPage("HTMLTree", "1", "CURRENTLIST", 0, 0, 0, 0, "", 0, 0, "MainMenu", 2, "0¬-1¬1,")

I see this:

            sHTMLTree = dc.GetPage("HTMLTree", "1", "CURRENTLIST", 0, 0, 0, 0, "", 0, 0, "MainMenu", 2, "0-11,")

So the IIS process has removed all ¬ characters.

Is there a charset we can use that would support this character? Most other characters, like the dollar sign and hash (pound in the US), have already been used, so we don't really know of an alternative.
Question by:plq

Author Comment

by:plq
ID: 16912205
By the way, if I switch my browser's encoding from ISO to Unicode, the logical not sign (even on this question's HTML page) changes to a question mark:

&#172;

Expert Comment

by:bpmurray
ID: 16912435
The UTF-8 encoding definitely supports this character: the problem isn't in UTF-8. Are you certain that the character is actually missing and hasn't just been filtered out by the debug display? The NOT sign is U+00AC, which UTF-8 encodes as the two bytes 0xC2 0xAC. Another thing to watch out for is whether the original source has been stored correctly; perhaps the editor has stripped out the character? Of course, you may have saved the character in its native Windows encoding (the single byte 0xAC) instead of its UTF-8 form, and that single byte on its own is not valid UTF-8.
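To see the byte values mentioned above, here is a minimal Python sketch (Python is used only as a neutral way to inspect bytes; it is not part of the ASP/VB.NET setup in this thread). It encodes U+00AC under a few encodings; the helper name `encoded_bytes` is purely illustrative.

```python
# The NOT sign is a single code point, U+00AC; how it is stored on disk
# depends entirely on the encoding the file is saved in.
NOT_SIGN = "\u00ac"  # "¬"


def encoded_bytes(ch: str, encoding: str) -> bytes:
    """Return the raw bytes that `ch` occupies in a file saved with `encoding`."""
    return ch.encode(encoding)


if __name__ == "__main__":
    for enc in ("utf-8", "cp1252", "latin-1"):
        # bytes.hex(" ") prints the bytes space-separated, e.g. "c2 ac"
        print(f"{enc:8} -> {encoded_bytes(NOT_SIGN, enc).hex(' ')}")
```

UTF-8 needs two bytes (c2 ac), while Windows-1252 and ISO-8859-1 both store it as the single byte ac.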

When you change your browser from ISO (which ISO? I presume you mean ISO-8859-1, or Latin-1) to Unicode, the character doesn't display because it was included as its Windows CP1252 value, not as UTF-8. In other words, this page isn't Unicode, so it can't be displayed as Unicode.
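The display problem can be sketched in a few lines of Python, assuming the page bytes are Windows-1252 while the reader decodes them as UTF-8. Python's `errors="replace"` substitutes U+FFFD, which browsers typically draw as a question mark or a diamond; exactly what a given browser shows is not something this sketch determines.

```python
# Bytes as they sit in a page saved in Windows-1252 (0xAC is the NOT sign there).
cp1252_page = "Hello\u00acWorld".encode("cp1252")

# Decoding with the encoding the bytes were actually written in works fine.
as_cp1252 = cp1252_page.decode("cp1252")

# Decoding the same bytes as UTF-8: 0xAC is a continuation byte with no lead
# byte, so it is invalid on its own and gets replaced by U+FFFD.
as_utf8 = cp1252_page.decode("utf-8", errors="replace")

print(as_cp1252)
print(as_utf8)
```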

Accepted Solution

by:smidgie82
smidgie82 earned 250 total points
ID: 16912502
The ¬ character isn't part of the ASCII-7 character set (rather, it belongs to the extended ASCII set, also known as ISO-8859-1). As such, if you just change the character set the page is rendered in without making sure the encoding of the files on disk is updated to match, that character (and any character with a code above 127 in the character set) will cause problems. This should solve the problem:

Back up all your code first. Then copy the existing code (viewed as ISO-8859-1) into Notepad and make sure it displays the way you want it to. Now do a "Save As", select "UTF-8" in the "Encoding" drop-down menu, and save it over itself. You'll need to repeat this for every file, which could be a major task depending on the size of your site, but I don't know of any faster automated way to do it. Once you're done, the file encoding should match the display character set, and you shouldn't see any more problems.
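The "Save As UTF-8" step can also be scripted. This is a minimal Python sketch, assuming each file is currently Windows-1252 (for the ¬ sign, ISO-8859-1 gives the same byte); `convert_to_utf8` is a hypothetical helper, not an existing tool. It overwrites the file in place, so the advice to back up first still applies.

```python
from pathlib import Path


def convert_to_utf8(path: Path, source_encoding: str = "cp1252", add_bom: bool = False) -> None:
    """Re-save a text file as UTF-8, assuming it is currently in `source_encoding`.

    Decoding is strict, so a byte that is undefined in the source encoding
    raises UnicodeDecodeError instead of being silently dropped.
    """
    # Decode the raw bytes directly, so line endings are not translated.
    text = path.read_bytes().decode(source_encoding)
    # "utf-8-sig" writes the EF BB BF byte-order mark before the text.
    encoding = "utf-8-sig" if add_bom else "utf-8"
    # newline="" writes CRLF/LF endings exactly as they were read.
    with path.open("w", encoding=encoding, newline="") as f:
        f.write(text)
```

`add_bom=True` mimics editors that prepend a UTF-8 byte-order mark; whether your server expects one is worth checking before converting everything.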

Assisted Solution

by:bpmurray
bpmurray earned 250 total points
ID: 16912600
Actually, ASCII is only 7-bit (ISO 646). It NEVER has 8 bits, and there is no such thing as "extended ASCII". The usual confusion is assuming that ANSI, ISO-8859-1 and Windows-1252 are the same. In fact, ANSI Latin-1 and ISO 8859-1 are identical, and their C1 region (0x80-0x9F) contains no printable characters. Microsoft, however, has seen fit to put characters into the range 0x80-0x9F in Windows-1252. In any case, the NOT sign *is* part of 8859-1 and has the value 0xAC.
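A minimal Python sketch of this distinction, using Python's own `ascii`, `latin-1` and `cp1252` codecs as stand-ins for the three interpretations; `describe` is a hypothetical helper. Note that Python's `cp1252` codec also treats five bytes in 0x80-0x9F (0x81, 0x8D, 0x8F, 0x90, 0x9D) as undefined.

```python
def describe(byte: int) -> dict:
    """Decode a single byte under each interpretation; None means 'no such character'."""
    raw = bytes([byte])
    result = {}
    for enc in ("ascii", "latin-1", "cp1252"):
        try:
            result[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            # ASCII rejects anything >= 0x80; cp1252 rejects its undefined slots.
            result[enc] = None
    return result


if __name__ == "__main__":
    for b in (0x41, 0xAC, 0x80, 0x81):
        print(f"0x{b:02X}", {k: (repr(v) if v is not None else None) for k, v in describe(b).items()})
```

0xAC means the NOT sign in both 8-bit sets and nothing at all in ASCII, while 0x80 is a C1 control code in ISO-8859-1 but the euro sign in Windows-1252.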

The bug here is that plq is including the NOT character as is, as the single byte 0xAC, in his code. That is *NOT* UTF-8, so it's being ignored.
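A lenient UTF-8 decoder that skips invalid bytes turns the separator string from the question into exactly what plq saw in the debugger. This Python sketch only shows that the symptom is consistent with that explanation; whether IIS/ASP drops the bytes in this precise way is not established in the thread. `decode_dropping_invalid` is a hypothetical name.

```python
def decode_dropping_invalid(raw: bytes) -> str:
    """Decode as UTF-8, silently discarding any bytes that are not valid UTF-8."""
    return raw.decode("utf-8", errors="ignore")


# The parameter from the question, stored as in a Windows-1252 file: each ¬ is a lone 0xAC.
stored_cp1252 = "0\u00ac-1\u00ac1,".encode("cp1252")
# The same text stored as real UTF-8: each ¬ is the pair 0xC2 0xAC.
stored_utf8 = "0\u00ac-1\u00ac1,".encode("utf-8")

print(decode_dropping_invalid(stored_cp1252))
print(decode_dropping_invalid(stored_utf8))
```

The first line loses both separators; the second keeps them, which is why re-saving the files as UTF-8 fixes the problem.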

Author Comment

by:plq
ID: 16912608
OK, go easy on me, as I'm not totally clear on encoding just yet.

The dc object is written in VB.NET. That dc.GetPage function parameter gets the value "0-11" instead of "0¬-1¬1", so the data behind the scenes has definitely been modified. When I remove the meta tag, the ¬ characters are preserved and get passed through to VB.

So, let's go back to basics. I created a test.asp with the following content:

<meta http-equiv="content-type" content="text/html;charset=utf-8">
<%
      Response.Write "Hello¬World"
%>

...and that worked fine.

So now I'm going to paste this into the product to see where ¬ starts getting ignored. I will come back in a short while...

Author Comment

by:plq
ID: 16912645
Right... crossed comments. I think I understand this.

The 300 ASP pages are being maintained in VB2005, so I think it will give me an option to upgrade the whole lot to Unicode if I just paste a bit of Unicode in there. I will report back on this.

Expert Comment

by:bpmurray
ID: 16912706
Just be careful when you refer to "Unicode". There are a number of representations of it:

   *  UTF-8: this uses 1-4 bytes per character to encode all of Unicode (1-3 bytes for characters in the Basic Multilingual Plane)
   *  UTF-16: this uses 16-bit values (unsigned short) to encode the Basic Multilingual Plane (BMP), using "surrogates" to address non-BMP characters, resulting in 2 x 16-bit values per character for non-BMP chars
   *  UCS-2: like UTF-16, this uses 16-bit values, but does not support surrogates, i.e. it only supports characters in the BMP
   *  UTF-32: this uses 32-bit values as a linear space to encode all of Unicode as fixed-size characters
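The size differences in the list above can be checked with a short Python sketch. The sample characters are arbitrary choices for illustration; the little-endian codec variants are used so that no byte-order mark is counted. UCS-2 is left out because, for BMP characters, it produces the same bytes as UTF-16.

```python
SAMPLES = {
    "LATIN A": "A",                  # ASCII range
    "NOT SIGN": "\u00ac",            # U+00AC, Latin-1 range
    "EURO SIGN": "\u20ac",           # U+20AC, still in the BMP
    "GRINNING FACE": "\U0001F600",   # U+1F600, outside the BMP
}


def byte_lengths(ch: str) -> dict:
    """Number of bytes one character takes in each encoding (no BOM)."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


if __name__ == "__main__":
    for name, ch in SAMPLES.items():
        print(f"U+{ord(ch):04X} {name:14} {byte_lengths(ch)}")
```

The non-BMP character takes 4 bytes in UTF-16 because it is written as a surrogate pair of two 16-bit values.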

UTF-8 is popular because it looks like ASCII for characters < 0x80, so it encodes English in 1 byte per character. It also never produces zero bytes (except when encoding U+0000 itself), so it doesn't waste space and can be processed much like ordinary null-terminated strings.
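A small Python sketch of those two properties, using the test string from earlier in the thread. The last check also shows why a multi-byte separator like ¬ is still safe to split on at the byte level in UTF-8: the encoding of one character never appears misaligned inside another character's bytes.

```python
text = "Hello\u00acWorld"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")

# ASCII-only text is byte-for-byte identical in ASCII and UTF-8.
ascii_compatible = "Hello".encode("utf-8") == "Hello".encode("ascii")

# UTF-8 contains no 0x00 bytes here; UTF-16 pads every ASCII letter with one.
utf8_has_zero = 0 in utf8
utf16_has_zero = 0 in utf16

# Splitting the raw UTF-8 bytes on the encoded separator recovers the fields.
fields = utf8.split("\u00ac".encode("utf-8"))

print(ascii_compatible, utf8_has_zero, utf16_has_zero, fields)
```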

Expert Comment

by:bpmurray
ID: 16912719
Oh, forgot to say: we're not being hard on you, only typing quickly, so it may come across as blunt. No offence meant; only trying to help! :-)

Author Comment

by:plq
ID: 16912791
Yes, I saved the problematic ASP file as Unicode and now it's working.

Now I've just got to do 299 more. Yyyyyuuukk! Actually, 338 more: bigger yuk.

Maybe I'll write a VB program to do it instead!

I'll keep this question open until the conversion is done.

Thanks for your help.
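For a few hundred files, a batch pass is one option. This is a minimal Python sketch, not the approach plq ended up using: `convert_tree` and `is_valid_utf8` are hypothetical helpers, the files are assumed to be Windows-1252, and the "already valid UTF-8" check is a heuristic (a legacy file could, rarely, happen to decode as UTF-8). The `*.asp` glob is case-sensitive on some file systems.

```python
from pathlib import Path


def is_valid_utf8(raw: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8 (pure ASCII always does)."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def convert_tree(root: Path, pattern: str = "*.asp", source_encoding: str = "cp1252") -> list:
    """Re-save every file under `root` matching `pattern` as UTF-8 (no BOM).

    Files that already decode as UTF-8 are skipped, so running the script
    twice does not double-encode anything. Returns the rewritten paths.
    """
    converted = []
    for path in sorted(root.rglob(pattern)):
        raw = path.read_bytes()
        if is_valid_utf8(raw):
            continue
        # Strict decode: an undefined byte raises instead of being dropped.
        path.write_bytes(raw.decode(source_encoding).encode("utf-8"))
        converted.append(path)
    return converted


if __name__ == "__main__":
    import sys

    for p in convert_tree(Path(sys.argv[1] if len(sys.argv) > 1 else ".")):
        print("converted", p)
```

As with the manual approach, back up the site before running anything that rewrites files in place.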

Author Comment

by:plq
ID: 16913277
Excellent. With a bit of swish global replacing in VS2005, I now have all the pages saved as Unicode without writing any programs to do it.

Thanks