UTF-8 character set does not support the ¬ character (logical not sign) in ASP pages

Posted on 2006-06-15
Last Modified: 2008-01-09
I am moving a product over to Unicode from all-ASCII/varchar data.

I changed all pages to have

Response.Charset = "UTF-8"

and

<meta http-equiv="content-type" content="text/html;charset=utf-8">

But we have made extensive use throughout the product of the ¬ character as a separator. When I debug the following line under UTF-8:

            sHTMLTree = dc.GetPage("HTMLTree", "1", "CURRENTLIST", 0, 0, 0, 0, "", 0, 0, "MainMenu", 2, "0¬-1¬1,")

I see this:

            sHTMLTree = dc.GetPage("HTMLTree", "1", "CURRENTLIST", 0, 0, 0, 0, "", 0, 0, "MainMenu", 2, "0-11,")

So the IIS process has removed all ¬ characters.

Is there a charset we can use that would support this character? Most other characters, like dollar and hash (pound in the US), have already been used, so we don't really know of an alternative.
Question by:plq
10 Comments
 
LVL 8

Author Comment

by:plq
ID: 16912205
By the way, if I switch my browser's encoding from ISO to Unicode, the logical not sign (even on this question's HTML page) changes to a question mark:

&#172;
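For reference, a small Python sketch (not part of the original thread) showing that the numeric character reference &#172; is the same code point as ¬, U+00AC, and that the reference itself is plain ASCII, so it does not depend on the page's declared charset:

```python
import html

ref = "&#172;"            # HTML numeric character reference, decimal 172
ch = html.unescape(ref)   # resolve it to the actual character

print(ch == "\u00ac")           # True: decimal 172 is code point U+00AC (NOT SIGN)
print(ord(ch), hex(ord(ch)))    # 172 0xac
print(ref.isascii())            # True: the reference uses only ASCII bytes
```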
 
LVL 15

Expert Comment

by:bpmurray
ID: 16912435
The UTF-8 encoding definitely supports this character: the problem isn't in UTF-8. Are you certain that the character is really missing and hasn't just been filtered out by the debug display? The NOT sign is U+00AC, which is the two bytes 0xC2 0xAC in UTF-8. Another thing to watch out for is whether the original source has been stored correctly; perhaps the editor has stripped out the character? Of course, you may have included the character in its native Windows encoding (the single byte 0xAC) instead of its UTF-8 value, and that byte on its own is meaningless in UTF-8.
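To make the byte values concrete, here is a minimal Python illustration (Python is used only as a convenient calculator; it is not part of the ASP setup). It encodes U+00AC both ways and shows that the single Windows byte is not a valid UTF-8 sequence:

```python
NOT_SIGN = "\u00ac"  # the NOT sign, U+00AC

utf8 = NOT_SIGN.encode("utf-8")      # how a UTF-8 file stores it
cp1252 = NOT_SIGN.encode("cp1252")   # how a Windows "ANSI" file stores it

print(utf8.hex(" "))    # c2 ac  (two bytes)
print(cp1252.hex(" "))  # ac     (one byte)

# The two UTF-8 bytes decode back to the character.
assert utf8.decode("utf-8") == NOT_SIGN

# 0xAC on its own is a UTF-8 continuation byte (bit pattern 10xxxxxx)
# with no lead byte in front of it, so strict decoding rejects it.
try:
    cp1252.decode("utf-8")
except UnicodeDecodeError as err:
    print("invalid UTF-8:", err.reason)
```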

When you change your browser from ISO (which ISO? I presume you mean ISO-8859-1 or Latin-1) to Unicode, the reason it doesn't display is that the character has been included as its Windows CP1252 value, not as UTF-8; i.e. this page isn't Unicode, so it can't display as Unicode.
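A rough Python analogy of what the browser is doing (browsers have their own decoders; Python's `errors="replace"` is only used here because it behaves similarly). The same CP1252 bytes read fine as Latin-1, but read as UTF-8 the 0xAC byte becomes the Unicode replacement character U+FFFD, which browsers typically draw as "?" or a question mark in a diamond:

```python
page_bytes = "Hello\u00acWorld".encode("cp1252")  # b'Hello\xacWorld'

as_latin1 = page_bytes.decode("iso-8859-1")               # the right charset
as_utf8 = page_bytes.decode("utf-8", errors="replace")    # the wrong charset

print(as_latin1 == "Hello\u00acWorld")  # True
print(ascii(as_utf8))                   # 'Hello\ufffdWorld'
```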
 
LVL 9

Accepted Solution

by:
smidgie82 earned 1000 total points
ID: 16912502
The ¬ character isn't part of the ASCII-7 character set (it's in extended ASCII, also known as ISO-8859-1). As such, if you just change the character set the page is rendered in without making sure the encoding of the files on disk is updated to match, you will have problems with that character (and with any character whose index in the character set is above 127). This should solve the problem:

Back up all your code first. Then, copy the existing code (viewed as ISO-8859-1) into Notepad. Make sure it displays the way you want it to. Now do a "Save As" and select "UTF-8" in the "Encoding" drop-down menu. Save it over itself. You'll need to repeat this for every file, which could be a major task depending on the size of your site, but I don't know of any faster automated way to do it. Once you're done, the file encoding should match the display character set, and you shouldn't see any more problems.
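One possible way to automate the Notepad round trip is sketched below in Python. Nobody in the thread used this script, so treat it as a starting point under stated assumptions: the legacy files are Windows-1252 ("ANSI"), only `.asp`/`.inc` files matter, and a UTF-8 byte-order mark is wanted (the older Notepad "UTF-8" option wrote one). The names `convert_file`, `convert_tree` and `EXTENSIONS` are hypothetical. Files that already decode as UTF-8, including pure ASCII files, are left alone, and the backup advice above still applies.

```python
import os
import sys

# Assumption: legacy files were saved in Windows-1252; change if yours differ.
SOURCE_ENCODING = "cp1252"
# "utf-8-sig" writes a UTF-8 byte-order mark, like the old Notepad "UTF-8" option.
TARGET_ENCODING = "utf-8-sig"
# Hypothetical list of file types to convert; adjust to your site.
EXTENSIONS = (".asp", ".inc")


def convert_file(path, source_encoding=SOURCE_ENCODING):
    """Re-encode one file as UTF-8. Returns True if the file was rewritten."""
    with open(path, "rb") as f:
        raw = f.read()
    try:
        raw.decode("utf-8")  # already valid UTF-8 (or plain ASCII): skip it
        return False
    except UnicodeDecodeError:
        pass
    # Raises UnicodeDecodeError if the file has bytes undefined in cp1252,
    # which is safer than silently guessing.
    text = raw.decode(source_encoding)
    # newline="" writes CRLF/LF exactly as they were in the original bytes.
    with open(path, "w", encoding=TARGET_ENCODING, newline="") as f:
        f.write(text)
    return True


def convert_tree(root, extensions=EXTENSIONS):
    """Walk a directory tree, convert matching files, return changed paths."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(extensions):
                path = os.path.join(dirpath, name)
                if convert_file(path):
                    changed.append(path)
    return changed


if __name__ == "__main__":
    # Usage: python convert_to_utf8.py C:\path\to\site   (back up first!)
    for root in sys.argv[1:]:
        for path in convert_tree(root):
            print("converted", path)
```

Skipping anything that already decodes as UTF-8 makes the script safe to run twice: a second pass finds nothing to do.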
LVL 15

Assisted Solution

by:bpmurray
bpmurray earned 1000 total points
ID: 16912600
Actually, ASCII is only 7-bit (ISO 646). It NEVER has 8 bits, and there is actually no such thing as extended ASCII. The usual confusion is assuming that ANSI, ISO-8859-1 and Windows-1252 are the same. In fact, ANSI Latin-1 and ISO-8859-1 are identical, and the C1 region (0x80-0x9F) is not populated with printable characters. Microsoft, however, have seen fit to put characters into the range 0x80-0x9F in Windows-1252. Either way, the NOT sign *is* part of 8859-1 and has the value 0xAC.
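The difference between ISO-8859-1 and Windows-1252 can be checked with Python's built-in codecs. This is an illustration only; the byte values picked are examples. In 0x80-0x9F the two disagree, while from 0xA0 upward (including 0xAC) they agree:

```python
# Bytes 0x80-0x9F: ISO-8859-1 gives C1 control codes (U+0080-U+009F),
# Windows-1252 gives printable characters for most of them.
for byte in (0x80, 0x93, 0x94):
    raw = bytes([byte])
    latin1 = raw.decode("iso-8859-1")
    cp1252 = raw.decode("cp1252")
    print(hex(byte), ascii(latin1), ascii(cp1252))
# 0x80 '\x80' '\u20ac'   (EURO SIGN in Windows-1252)
# 0x93 '\x93' '\u201c'   (LEFT DOUBLE QUOTATION MARK)
# 0x94 '\x94' '\u201d'   (RIGHT DOUBLE QUOTATION MARK)

# From 0xA0 up the two code pages agree, so 0xAC is the NOT sign in both.
assert b"\xac".decode("iso-8859-1") == b"\xac".decode("cp1252") == "\u00ac"
```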

The bug here is that plq is including the NOT character as is, as the single byte 0xAC, in his code. That is *NOT* UTF-8 (a lone 0xAC is an invalid sequence), so it's being ignored.
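The symptom from the question can be reproduced in Python. This is an analogy, not a claim about how the ASP engine is implemented: decoding the Windows-1252 bytes as UTF-8 while silently discarding invalid bytes gives exactly the "0-11," string seen in the debugger, whereas the same text saved as real UTF-8 survives intact:

```python
separator_text = "0\u00ac-1\u00ac1,"   # the argument from the question

# As stored in a Windows-1252 file: each ¬ is the single byte 0xAC.
legacy_bytes = separator_text.encode("cp1252")   # b'0\xac-1\xac1,'

# Read as UTF-8, dropping invalid bytes: the separators vanish.
print(legacy_bytes.decode("utf-8", errors="ignore"))  # 0-11,

# Saved as UTF-8 instead, each ¬ is 0xC2 0xAC and round-trips.
utf8_bytes = separator_text.encode("utf-8")       # b'0\xc2\xac-1\xc2\xac1,'
assert utf8_bytes.decode("utf-8") == separator_text
```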
 
LVL 8

Author Comment

by:plq
ID: 16912608
OK, go easy on me, as I'm not totally clear on encoding just yet.

The dc object is written in VB.NET. That dc.GetPage function parameter gets the value "0-11" instead of "0¬-1¬1", so the memory behind the scenes has definitely been modified. When I remove the meta tag, the ¬ characters are preserved and get passed through to VB.

So, let's go back to basics. I created a test.asp with the following content:

<meta http-equiv="content-type" content="text/html;charset=utf-8">
<%
      Response.Write "Hello¬World"
%>

...and that worked fine.

So now I'm going to paste this into the product to see where ¬ starts getting ignored. I will come back in a short while...
 
LVL 8

Author Comment

by:plq
ID: 16912645
Right... crossed comments. I think I understand this now.

The 300 ASP pages are being maintained in VB2005, so I think it will give me an option to upgrade the whole lot to Unicode if I just paste a bit of Unicode in there. I will report back on this.
 
LVL 15

Expert Comment

by:bpmurray
ID: 16912706
Just be careful when you refer to "Unicode". There are a number of representations of this:

   *  UTF-8: this uses 1 to 4 bytes per character to encode all of Unicode (1 to 3 bytes for characters in the BMP)
   *  UTF-16: this uses 16-bit values (unsigned short) to encode the Basic Multilingual Plane (BMP), using "surrogates" to address non-BMP characters, resulting in 2 x 16-bit values per character for non-BMP chars
   *  UCS-2: like UTF-16, this uses 16-bit values, but does not support surrogates, i.e. it only supports characters in the BMP
   *  UTF-32: this uses 32-bit values as a linear space to encode all of Unicode as fixed-size characters
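The size differences in the list above can be checked directly in Python (used here just as a calculator; Python has no UCS-2 codec, so that entry is not shown). The `-le` codec variants are chosen so that no byte-order mark is prepended:

```python
samples = {
    "A (U+0041)": "A",
    "NOT SIGN (U+00AC)": "\u00ac",
    "EURO SIGN (U+20AC)": "\u20ac",
    "GRINNING FACE (U+1F600, non-BMP)": "\U0001F600",
}

# Bytes needed per character in each encoding form.
for label, ch in samples.items():
    sizes = {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}
    print(f"{label:34} {sizes}")

# A non-BMP character needs a surrogate pair in UTF-16: two 16-bit units.
print("\U0001F600".encode("utf-16-be").hex(" "))  # d8 3d de 00
```

In UTF-8 / UTF-16 / UTF-32 the sizes come out as 1/2/4 bytes for "A", 2/2/4 for ¬, 3/2/4 for the euro sign and 4/4/4 for the non-BMP character.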

UTF-8 is popular because it looks like ASCII for characters < 0x80, so it encodes English in 1 byte per character. It also has no NUL (zero) bytes except for U+0000 itself, so it doesn't waste space and can be processed much like ordinary byte strings.
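A short Python check of those two properties, as an illustration only: ASCII text is byte-for-byte identical in UTF-8, UTF-16 fills ASCII text with zero bytes, and every byte of a multi-byte UTF-8 sequence is 0x80 or above, so it can never be mistaken for an ASCII byte:

```python
text = "Hello World"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")

# Below 0x80, UTF-8 bytes are the ASCII bytes.
print(utf8 == text.encode("ascii"))   # True

# UTF-16 pairs every ASCII character with a zero byte, which trips up
# code that treats 0x00 as the end of a C-style string.
print(0 in utf8, 0 in utf16)          # False True

# Multi-byte UTF-8 sequences use only bytes >= 0x80, so searching for an
# ASCII byte such as "," can never match in the middle of a character.
print(all(b >= 0x80 for b in "\u00ac\u20ac".encode("utf-8")))  # True
```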
 
LVL 15

Expert Comment

by:bpmurray
ID: 16912719
Oh, forgot to say: we're not being hard on you, only typing quickly, so it may come across as blunt. No offence meant; only trying to help! :-)
 
LVL 8

Author Comment

by:plq
ID: 16912791
Yes, I saved the problematic ASP file as Unicode and now it's working.

Now I've just got to do 299 more. Yyyyyuuukk. Actually 338 more, bigger yuuk.

Maybe I'll write a VB program to do it instead!

I'll keep this question open just until the conversion is done.

Thanks for your help.
 
LVL 8

Author Comment

by:plq
ID: 16913277
Excellent. With a bit of swish global replacing in VS2005, I now have all the pages saved as Unicode without writing any programs.

Thanks
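After a bulk conversion like this, a quick check that no file was missed can be reassuring. The Python sketch below is a hypothetical helper (`find_non_utf8` is not something used in the thread); it lists every `.asp`/`.inc` file under a folder that still fails strict UTF-8 decoding, together with the offset of the first bad byte. The extension list is an assumption to adjust.

```python
import os
import sys


def find_non_utf8(root, extensions=(".asp", ".inc")):
    """Return (path, offset) for files under root that are not valid UTF-8."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                data = f.read()
            try:
                data.decode("utf-8")  # a UTF-8 BOM is valid and passes
            except UnicodeDecodeError as err:
                bad.append((path, err.start))  # offset of the first bad byte
    return bad


if __name__ == "__main__":
    # Usage: python check_utf8.py C:\path\to\site
    for root in sys.argv[1:]:
        for path, offset in find_non_utf8(root):
            print(f"{path}: invalid UTF-8 at byte {offset}")
```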