UTF-8 character set does not support the ¬ character (logical not sign) in ASP pages

Posted on 2006-06-15
Last Modified: 2008-01-09
I am moving a product over to Unicode from all ASCII/varchar data.

I changed all pages to have

Response.Charset = "UTF-8"


<meta http-equiv="content-type" content="text/html;charset=utf-8">

But we have made extensive use throughout the product of the ¬ character as a separator. When I debug the following line under UTF8

            sHTMLTree = dc.GetPage("HTMLTree", "1", "CURRENTLIST", 0, 0, 0, 0, "", 0, 0, "MainMenu", 2, "0¬-1¬1,")

I see this..

            sHTMLTree = dc.GetPage("HTMLTree", "1", "CURRENTLIST", 0, 0, 0, 0, "", 0, 0, "MainMenu", 2, "0-11,")

So the IIS process has removed all ¬ characters.

Is there a charset we can use that would support this character? Most other characters, like dollar and hash (pound in the US), have already been used, so we don't really know of an alternative.
Question by:plq

Author Comment

ID: 16912205
By the way, if I change my browser from ISO to unicode, the logical not sign (even on this question html page) changes to a question mark

Expert Comment

ID: 16912435
The UTF-8 encoding definitely supports this character: the problem isn't in UTF-8. Are you certain that the character is missing and hasn't just been filtered out by the debug display? The NOT sign is U+00AC, or 0xC2, 0xAC in UTF-8. Another thing to watch out for is whether the original source has been stored correctly: perhaps the editor has stripped out the character? Of course, you may have included the character in its native Windows encoding (the single byte 0xAC) instead of its UTF-8 encoding, and that byte on its own is meaningless in UTF-8.

When you change your browser from ISO (which ISO? I presume you mean ISO-8859-1 or Latin-1) to Unicode, the reason it doesn't display is because you have included the character as its Windows CP1252 value, not as UTF8, i.e. this page isn't Unicode, so it can't display as Unicode.
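
To make the byte values above concrete, here is a minimal sketch in Python (used purely for illustration; it is not part of the ASP product). It prints how the NOT sign is stored in UTF-8, Windows-1252 and ISO-8859-1, and then decodes the separator string from the question as if its raw 0xAC bytes were UTF-8. The lenient decoding is only an analogy for what the debugger showed; it is an assumption that IIS behaves in a similar way, not a statement about its internals.

```python
# Illustration only: how the NOT sign (U+00AC) is stored in each encoding.
NOT_SIGN = "\u00ac"

utf8_bytes = NOT_SIGN.encode("utf-8")      # two bytes: 0xC2 0xAC
cp1252_bytes = NOT_SIGN.encode("cp1252")   # one byte:  0xAC
latin1_bytes = NOT_SIGN.encode("latin-1")  # one byte:  0xAC

print(utf8_bytes.hex(), cp1252_bytes.hex(), latin1_bytes.hex())

# The separator string from the question, as stored by a Windows-1252 editor.
raw = b"0\xac-1\xac1,"

# A lone 0xAC is a continuation byte with no lead byte, so it is not valid
# UTF-8. A lenient decoder either drops it or substitutes U+FFFD.
dropped = raw.decode("utf-8", errors="ignore")
replaced = raw.decode("utf-8", errors="replace")

print(dropped)   # the separators are gone
print(replaced)  # the separators show up as replacement characters
```

The "dropped" result has the same shape as the string seen in the debugger, and the "replaced" result corresponds to the question mark the browser shows.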

Accepted Solution

smidgie82 earned 250 total points
ID: 16912502
The ¬ character isn't part of the 7-bit ASCII character set; it comes from the 8-bit set often called extended ASCII (here, ISO-8859-1).  As such, if you just change the character set the page is rendered in without ensuring that the encoding on disk is updated to match, it will cause problems with that character (and any character with a code above 127).  This should solve the problem:

Back up all your code first.  Then, copy the existing code (viewed as ISO-8859-1) into Notepad.  Make sure it displays the way you want it to. Now do a "Save As," and select "UTF-8" under the "Encoding" drop-down menu.  Save it over itself.  You'll need to repeat this for every file, which could be a major task depending on the size of your site, but I don't know of any faster automated way to do it.  After you get done, the file encoding should match the display character set, and you shouldn't see any more problems.
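
For a site with hundreds of pages, the Notepad procedure above could be scripted. The following is a minimal sketch in Python, not a tested migration tool: the function names `convert_to_utf8` and `convert_tree` are hypothetical, the source encoding is assumed to be Windows-1252, and the output is written with a UTF-8 byte order mark, which is what older versions of Notepad wrote for "UTF-8". Back up the files first, exactly as for the manual route.

```python
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"


def convert_to_utf8(path, source_encoding="cp1252"):
    """Re-save one file as UTF-8 with a BOM. Returns True if it was rewritten.

    Assumption: any file that is not already valid UTF-8 was saved in
    source_encoding. A byte that is undefined in that encoding raises
    UnicodeDecodeError and the file is left untouched.
    """
    path = Path(path)
    data = path.read_bytes()
    if data.startswith(UTF8_BOM):
        return False  # already marked as UTF-8
    try:
        data.decode("utf-8")
        return False  # already valid UTF-8 (this includes pure ASCII files)
    except UnicodeDecodeError:
        pass
    text = data.decode(source_encoding)
    # Work on bytes so that line endings are preserved exactly.
    path.write_bytes(UTF8_BOM + text.encode("utf-8"))
    return True


def convert_tree(root, pattern="*.asp"):
    """Convert every matching file under root; return the paths rewritten."""
    return [p for p in sorted(Path(root).rglob(pattern)) if convert_to_utf8(p)]
```

Calling `convert_tree` on a copy of the site folder returns the list of files it rewrote, which gives a quick way to review what changed before deploying.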

Assisted Solution

bpmurray earned 250 total points
ID: 16912600
Actually, ASCII is only 7-bit (ISO-646). It NEVER has 8 bits, and strictly speaking there is no such thing as extended ASCII. The usual confusion is the assumption that ANSI, ISO-8859-1 and Windows-1252 are the same. In fact, ANSI Latin-1 and ISO 8859-1 are identical, and the C1 region (0x80-0x9F) is not populated with printable characters, whereas Microsoft have seen fit to put characters into that range in Windows-1252. Either way, the NOT sign *is* part of 8859-1 and has the value 0xAC.

The bug here is that plq is including the NOT character as is, as the single byte 0xAC, in his code. This is *NOT* valid UTF-8. Therefore, it's being ignored.
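
One way to find where such bytes occur is to check each line of a source file for valid UTF-8. This is a minimal sketch in Python under the assumption that the files are either UTF-8 or a single-byte Windows encoding; the helper name `non_utf8_lines` is hypothetical.

```python
def non_utf8_lines(data: bytes):
    """Return the 1-based numbers of lines whose bytes are not valid UTF-8."""
    bad = []
    # Splitting on 0x0A is safe: that byte never occurs inside a
    # multi-byte UTF-8 sequence.
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(number)
    return bad


# Line 2 holds raw 0xAC bytes; line 3 holds the proper UTF-8 pair 0xC2 0xAC.
sample = b'a = "plain"\nb = "0\xac-1\xac1,"\nc = "0\xc2\xac-1"\n'
print(non_utf8_lines(sample))
```

Only the line containing the bare 0xAC bytes is reported; the line that already stores the NOT sign as 0xC2 0xAC passes.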

Author Comment

ID: 16912608
OK go easy on me as I'm not totally clear on encoding just yet.

The dc object is written in - that dc.GetPage function parameter gets the value "0-11" instead of "0¬-1¬1" so the memory behind the scenes has definitely been modified. When I remove the meta tag the ¬ characters are preserved and get passed through to vb.

So, let's go back to basics. I created a test.asp with the following content.

<meta http-equiv="content-type" content="text/html;charset=utf-8">
<% Response.Write "Hello¬World" %>

.. and that worked fine.

So now I'm going to paste this into the product to see where ¬ starts getting ignored. I will come back in a short while...

Author Comment

ID: 16912645
Right... crossed comments. I think I understand this.

The 300 ASP pages are being maintained in VB 2005, so I think it will give me an option to upgrade the whole lot to Unicode if I just paste a bit of Unicode in there. I will report back on this.

Expert Comment

ID: 16912706
Just be careful when you refer to "Unicode". There are a number of representations of this:

   *  UTF-8: this uses 1 to 4 bytes per character to encode all of Unicode (1 to 3 bytes for characters in the BMP)
   *  UTF-16: this uses 16-bit values (unsigned short) to encode the Basic Multilingual Plane (BMP), using "surrogates" to address non-BMP characters, resulting in 2 x 16-bit values per character for non-BMP chars
   *  UCS-2: like UTF-16, this uses 16-bit values, but does not support surrogates, i.e. it only supports characters in the BMP
   *  UTF-32: this uses 32-bit values as a linear space to encode all of Unicode as fixed-size characters

UTF-8 is popular because it is identical to ASCII for characters < 0x80, so it encodes English in 1 byte per character. It also contains no NULL (0x00) bytes except for U+0000 itself, so it doesn't waste space and can be processed in a manner similar to usual string processing.
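
As a quick illustration of the size differences between these encodings, the following Python sketch (illustration only; the helper name `encoded_sizes` is hypothetical) reports how many bytes a single character occupies in UTF-8, UTF-16 and UTF-32. The little-endian codec variants are used so that no byte order mark is counted.

```python
def encoded_sizes(ch):
    """Return the number of bytes one character occupies in each encoding."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),  # "-le": no BOM is added
        "utf-32": len(ch.encode("utf-32-le")),
    }


# ASCII letter, NOT sign, euro sign, and one character outside the BMP.
for ch in ["A", "\u00ac", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

The ASCII letter takes 1 byte in UTF-8, the NOT sign 2, the euro sign 3, and the non-BMP character 4; in UTF-16 the non-BMP character needs a surrogate pair (4 bytes), while UTF-32 always uses 4 bytes.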

Expert Comment

ID: 16912719
Oh - forgot to say: we're not being hard on you, only typing quickly so it may come across as blunt. No offence meant - only trying to help! :-)

Author Comment

ID: 16912791
Yes, I saved the problematic ASP file as Unicode and now it's working.

Now I've just got to do 299 more. yyyyyuuukk actually 338 more, bigger yuuk

Maybe I'll write a VB program to do it instead!

I'll keep this Question open just until the conversion is done.

thanks for your help

Author Comment

ID: 16913277
Excellent. With a bit of swish global replace in VS2005 I now have all the pages saved as Unicode without writing any programs to do it.

