
UTF-8 character set does not support the ¬ character (logical not sign) in ASP pages

Posted on 2006-06-15
Last Modified: 2008-01-09
I am moving a product over to Unicode from all-ASCII/varchar data.

I changed all pages to have

Response.Charset = "UTF-8"

and

<meta http-equiv="content-type" content="text/html;charset=utf-8">

But we have made extensive use of the ¬ character as a separator throughout the product. When I debug the following line under UTF-8:

            sHTMLTree = dc.GetPage("HTMLTree", "1", "CURRENTLIST", 0, 0, 0, 0, "", 0, 0, "MainMenu", 2, "0¬-1¬1,")

I see this:

            sHTMLTree = dc.GetPage("HTMLTree", "1", "CURRENTLIST", 0, 0, 0, 0, "", 0, 0, "MainMenu", 2, "0-11,")

So the IIS process has removed all ¬ characters.

Is there a charset we can use that would support this character? Most other characters, such as the dollar sign and hash (pound in the US), are already in use, so we don't really know of an alternative.
Question by:plq

Author Comment

by:plq
ID: 16912205
By the way, if I switch my browser's encoding from ISO to Unicode, the logical not sign (even on this question's HTML page) changes to a question mark.

&#172;

Expert Comment

by:bpmurray
ID: 16912435
The UTF-8 encoding definitely supports this character: the problem isn't in UTF-8. Are you certain the character is actually missing and hasn't just been filtered out by the debug display? The NOT sign is U+00AC, which is the two bytes 0xC2 0xAC in UTF-8. Another thing to watch out for is whether the original source has been stored correctly; perhaps the editor has stripped out the character? Of course, you may have saved the character in its native Windows encoding (the single byte 0xAC) instead of its UTF-8 value, and that single byte is meaningless in UTF-8.
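To make those byte values concrete, here is a small Python sketch (Python is used purely for illustration; it is not part of the ASP setup). It encodes ¬ both ways and shows that the lone Windows byte is rejected by a strict UTF-8 decoder.

```python
# Sketch: how U+00AC NOT SIGN is stored as bytes in two encodings.
not_sign = "\u00ac"  # the ¬ character

utf8_bytes = not_sign.encode("utf-8")     # UTF-8: two bytes
cp1252_bytes = not_sign.encode("cp1252")  # Windows-1252 ("ANSI"): one byte

print(utf8_bytes.hex(" "))    # c2 ac
print(cp1252_bytes.hex(" "))  # ac

# A lone 0xAC byte is a UTF-8 continuation byte with no lead byte in front,
# so a strict UTF-8 decoder rejects it.
try:
    cp1252_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```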

When you change your browser from ISO (which ISO? I presume you mean ISO-8859-1, or Latin-1) to Unicode, the character doesn't display because it has been included as its Windows CP1252 value, not as UTF-8. In other words, this page isn't Unicode, so it can't be displayed as Unicode.
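A rough Python model of what the browser does (an illustration, not how any particular browser is implemented): the same bytes are read once as ISO-8859-1 and once as UTF-8. Python's `errors="replace"` stands in for the browser, substituting U+FFFD REPLACEMENT CHARACTER for the undecodable byte, which many browsers draw as a question mark or a question mark in a diamond.

```python
# Sketch: one page's bytes, read under two different encodings.
page_bytes = "Hello\u00acWorld".encode("cp1252")  # ¬ stored as the single byte 0xAC

as_iso = page_bytes.decode("iso-8859-1")                # what an ISO-8859-1 view shows
as_utf8 = page_bytes.decode("utf-8", errors="replace")  # what a UTF-8 view shows

print(as_iso)          # Hello¬World
print(ascii(as_utf8))  # 'Hello\ufffdWorld'
```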

Accepted Solution

by:smidgie82 (earned 250 total points)
ID: 16912502
The ¬ character isn't part of the 7-bit ASCII character set (it belongs to ISO-8859-1, often loosely called "extended ASCII"). As such, if you just change the character set the page is rendered in without making sure the encoding of the files on disk is updated to match, that character (and any character with a code above 127) will cause problems. This should solve the problem:
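To see which characters in a string fall into that "above 127" group, a hypothetical helper like `non_ascii_chars` (a Python sketch, not something from the thread) can flag them:

```python
def non_ascii_chars(text: str) -> list[tuple[int, str]]:
    """Return (index, char) for every character outside 7-bit ASCII (code point > 127)."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 127]


sample = "0\u00ac-1\u00ac1,"    # the separator string from the question
print(non_ascii_chars(sample))  # [(1, '¬'), (4, '¬')]
print(ord("\u00ac"))            # 172
```

Every character it reports is one whose on-disk bytes change when a file is re-saved as UTF-8.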

Back up all your code first.  Then, copy the existing code (viewed as ISO-8859-1) into Notepad.  Make sure it displays the way you want it to. Now do a "Save As," and select "UTF-8" under the "Encoding" drop-down menu.  Save it over itself.  You'll need to repeat this for every file, which could be a major task depending on the size of your site, but I don't know of any faster automated way to do it.  After you get done, the file encoding should match the display character set, and you shouldn't see any more problems.
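The Notepad "Save As UTF-8" step can also be scripted. This is a Python sketch, not part of the original answer: `convert_to_utf8` is a hypothetical name, and it assumes each file really is Windows-1252 and that you have already made a backup. The optional BOM mirrors what Notepad of that era wrote when saving as UTF-8.

```python
import codecs
from pathlib import Path


def convert_to_utf8(path: Path, add_bom: bool = False) -> None:
    """Re-save a Windows-1252 text file as UTF-8, in place.

    Assumes the file is Windows-1252 (the usual "ANSI" encoding on Western
    Windows systems). Decoding raises UnicodeDecodeError if the file holds a
    byte Windows-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D).
    """
    text = path.read_bytes().decode("cp1252")
    data = text.encode("utf-8")
    if add_bom:
        # Optional 3-byte UTF-8 signature (EF BB BF), as Notepad writes it.
        data = codecs.BOM_UTF8 + data
    path.write_bytes(data)
```

Working on bytes rather than text mode leaves the CRLF line endings untouched.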

Assisted Solution

by:bpmurray (earned 250 total points)
ID: 16912600
Actually, ASCII is only 7-bit (ISO 646). It NEVER has 8 bits, and there is really no such thing as extended ASCII. The usual confusion is the belief that ANSI, ISO-8859-1 and Windows-1252 are all the same. In fact, ANSI Latin-1 and ISO-8859-1 are identical, and their C1 region (0x80-0x9F) is not populated with printable characters. Microsoft, however, has seen fit to put characters into the range 0x80-0x9F in Windows-1252. Either way, the NOT sign *is* part of ISO-8859-1 and has the value 0xAC.
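The difference between ISO-8859-1 and Windows-1252 is easy to check byte by byte. In this Python sketch (`decode_both` is just an illustrative helper), 0x80 differs between the two, while 0xAC is ¬ in both:

```python
def decode_both(byte: int) -> tuple[str, str]:
    """Decode one byte as ISO-8859-1 and as Windows-1252."""
    raw = bytes([byte])
    # Python's ISO-8859-1 codec maps 0x80-0x9F to the C1 control codes U+0080-U+009F;
    # Windows-1252 assigns printable characters (e.g. the euro sign) to most of them.
    return raw.decode("iso-8859-1"), raw.decode("cp1252")


for b in (0x41, 0x80, 0xAC):
    iso, win = decode_both(b)
    print(f"0x{b:02X}: ISO-8859-1={iso!r}  Windows-1252={win!r}")

# 7-bit ASCII has no 0xAC at all.
try:
    bytes([0xAC]).decode("ascii")
except UnicodeDecodeError:
    print("0xAC is not ASCII")
```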

The bug here is that plq is including the NOT character as is, as the single byte 0xAC, in his code. This is *NOT* UTF-8, so it's being ignored.
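We can't see exactly how IIS/ASP handles the invalid byte internally. Still, modelling it in Python as "decode as UTF-8 and silently drop invalid bytes" (`errors="ignore"`, an assumption) reproduces exactly the `0-11,` seen in the debugger:

```python
# Hypothetical reproduction: the source holds the Windows-1252 byte 0xAC, but it
# is decoded as UTF-8 and invalid bytes are dropped (modelled by errors="ignore").
source_bytes = "0\u00ac-1\u00ac1,".encode("cp1252")  # b"0\xac-1\xac1,"
seen_by_script = source_bytes.decode("utf-8", errors="ignore")
print(seen_by_script)  # 0-11,

# Saving the same text as UTF-8 keeps the separator intact.
fixed_bytes = "0\u00ac-1\u00ac1,".encode("utf-8")
print(fixed_bytes.decode("utf-8"))  # 0¬-1¬1,
```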

Author Comment

by:plq
ID: 16912608
OK, go easy on me, as I'm not totally clear on encodings just yet.

The dc object is written in VB.NET, and the dc.GetPage parameter receives the value "0-11" instead of "0¬-1¬1", so the string has definitely been modified before it reaches the component. When I remove the meta tag, the ¬ characters are preserved and get passed through to VB.

So, let's go back to basics. I created a test.asp with the following content:

<meta http-equiv="content-type" content="text/html;charset=utf-8">
<%
      Response.Write "Hello¬World"
%>

...and that worked fine.

So now I'm going to paste this into the product to see where ¬ starts getting ignored. I will come back in a short while...

Author Comment

by:plq
ID: 16912645
Right, our comments crossed. I think I understand this now.

The 300 ASP pages are maintained in VB2005, so I think it will give me the option to convert the whole lot to Unicode if I just paste a bit of Unicode text in there. I will report back on this.

Expert Comment

by:bpmurray
ID: 16912706
Just be careful when you refer to "Unicode". There are a number of representations of this:

   *  UTF-8: this uses 1 to 4 bytes per character to encode all of Unicode (1 to 3 bytes for characters in the BMP)
   *  UTF-16: this uses 16-bit values (unsigned short) to encode the Basic Multilingual Plane (BMP), using "surrogates" to address non-BMP characters, resulting in 2 x 16-bit values per character for non-BMP chars
   *  UCS-2: like UTF-16, this uses 16-bit values, but does not support surrogates, i.e. it only supports characters in the BMP
   *  UTF-32: this uses 32-bit values as a linear space to encode all of Unicode as fixed-size characters
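The size differences in that list can be checked in Python. This sketch counts bytes per character for a few samples (the "-le" codec variants are used so no byte-order mark is counted). UCS-2 has no Python codec, but the 4-byte UTF-16 result for the emoji is exactly the surrogate pair UCS-2 cannot represent.

```python
samples = {
    "A": "A",                       # ASCII letter
    "NOT SIGN": "\u00ac",           # ¬, in the BMP
    "EURO SIGN": "\u20ac",          # €, in the BMP
    "GRINNING FACE": "\U0001F600",  # outside the BMP
}


def encoded_lengths(ch: str) -> dict[str, int]:
    """Byte length of one character in UTF-8, UTF-16 and UTF-32 (no BOM)."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


for name, ch in samples.items():
    print(name, encoded_lengths(ch))

# The non-BMP character becomes a surrogate pair (two 16-bit units) in UTF-16.
print("\U0001F600".encode("utf-16-be").hex(" ", 2))  # d83d de00
```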

UTF-8 is popular because it looks like ASCII for characters < 0x80, so it encodes English text in 1 byte per character. It also has no NUL bytes (other than for U+0000 itself), so it doesn't waste space and can be processed in much the same way as ordinary byte strings.
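Both properties show up directly in a short Python comparison of UTF-8 and UTF-16 bytes for an English word:

```python
text = "Hello"
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")

# For characters below 0x80, UTF-8 bytes are identical to ASCII bytes.
print(utf8 == text.encode("ascii"))  # True

# UTF-16 pairs each ASCII character with a zero byte; UTF-8 does not.
print(b"\x00" in utf8, b"\x00" in utf16)  # False True
```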

Expert Comment

by:bpmurray
ID: 16912719
Oh, forgot to say: we're not being hard on you, only typing quickly, so it may come across as blunt. No offence meant, only trying to help! :-)

Author Comment

by:plq
ID: 16912791
Yes, I saved the problematic ASP file as Unicode and now it's working.

Now I've just got to do 299 more. Yuck! Actually it's 338 more, bigger yuck.

Maybe I'll write a VB program to do it instead!
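For anyone in the same position, here is the batch idea sketched in Python rather than VB. `convert_tree` is a hypothetical helper. It assumes every page that is not already valid UTF-8 is Windows-1252, it writes UTF-8 without a BOM, and it should only be run on a backed-up copy.

```python
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def convert_tree(root: Path, pattern: str = "*.asp") -> list[Path]:
    """Re-encode matching files under root from Windows-1252 to UTF-8.

    Files that already decode as UTF-8 (including pure-ASCII files, which need
    no change) are skipped. This is a heuristic: a Windows-1252 file whose
    non-ASCII bytes happen to form valid UTF-8 sequences would be skipped
    wrongly, which is rare in practice. Returns the paths that were rewritten.
    """
    converted = []
    for path in sorted(root.rglob(pattern)):
        if not path.is_file():
            continue
        data = path.read_bytes()
        if is_valid_utf8(data):
            continue
        path.write_bytes(data.decode("cp1252").encode("utf-8"))
        converted.append(path)
    return converted
```

Because already-converted files are skipped, running it a second time changes nothing.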

I'll keep this question open until the conversion is done.

Thanks for your help.

Author Comment

by:plq
ID: 16913277
Excellent. With a bit of swish global replace in VS2005, I now have all the pages saved as Unicode without writing any programs to do it.

Thanks