Upper ASCII

singleton
I have a webpage that has some characters that I would call Upper ASCII or extended ASCII, i.e. if you look at a character in hex, the upper (x'80') bit is set

Examples are double quote marks that angle left or right, n with a tilde, letters with umlauts, etc

If I look at the page on my HD it displays fine, but if I look at it on the net all of these characters have a black diamond with a question mark

I think there is some statement I can put at the top of my page, to tell it to use the expanded character set, but what is it?
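One way to see which bytes in a file have the upper bit set is to scan the raw bytes directly. This is only an illustrative sketch: the sample bytes and the helper name `high_bit_bytes` are assumptions, not taken from the actual page.

```python
def high_bit_bytes(data: bytes) -> list:
    """Return (offset, hex value) for every byte with the x'80' bit set."""
    return [(i, f"{b:02X}") for i, b in enumerate(data) if b & 0x80]

# Hypothetical sample: curly quotes (x'93', x'94') and n with a tilde (x'F1'),
# each stored as a single byte, mixed with plain 7-bit ASCII letters.
sample = b"\x93se\xf1or\x94"
found = high_bit_bytes(sample)
print(found)
```

Running the same helper on the bytes of a real file (read with `open(path, "rb").read()`) would list every position that depends on the declared character set.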
In HTML set character encoding in the meta tags: http://en.wikipedia.org/wiki/Character_encodings_in_HTML

UTF-8 contains all ASCII

Author

Commented:
I presume that means
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

It does not seem to work
http://donsingleton.org/_islam101.htm is the page
Dave Baldwin, Fixer of Problems
Most Valuable Expert 2014

Commented:
I changed it to ISO-8859-1 in my browser and the question marks went away.  Character sets can be quite confusing.  ASCII is only the 7-bit characters, x'00' through x'7F'.  Codes x'80' and above belong to whichever larger character set is in use, and each of those sets also contains ASCII.  Unicode and UTF-8 were developed in an attempt to encompass all known languages.  But just changing the declared character set does not 'translate' the characters, because you haven't changed the actual codes that were used.  If the coding and the interpretation don't match, you get funny symbols on the page.
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
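The mismatch between the stored codes and their interpretation can be reproduced in a few lines. This is a minimal sketch with made-up sample bytes; it assumes the page bytes are Windows-1252 (`cp1252` in Python), which is how browsers treat the ISO-8859-1 label.

```python
# Hypothetical sample: x'93' and x'94' are curly quotes and x'F1' is
# n with a tilde when the bytes are read as Windows-1252.
raw = b"\x93Hola\x94 ma\xf1ana"

# Correct interpretation: each high byte maps to one character.
as_western = raw.decode("cp1252")

# Wrong interpretation: the same bytes are not valid UTF-8, so each bad
# byte becomes U+FFFD, which browsers draw as a diamond with a question mark.
as_utf8 = raw.decode("utf-8", errors="replace")

# ascii() shows the code points without depending on the console encoding.
print(ascii(as_western))
print(ascii(as_utf8))
```

The bytes are identical in both cases; only the decoding step differs.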


Author

Commented:
I made the change you suggest but still get errors. How do I change the characters?
For example the double quote at the start of a line is x'93'

I realize I could change to a lower ascii, i.e. the up and down double quote is x'22'

I could live with it on this webpage, by just changing the characters, but the reason I asked is that I have a friend trying to update http://artsrolla.org/vp/GXmqi/board-of-directors.html and every time they have a &nbsp;, their editor throws in an x'3F' and she has no way to stop that, yet she gets the same error.

I found the similar thing on my site, and I have complete control over it, so I thought I would seek an answer to my problem, and maybe it would fix hers

Is it possible I should specify Unicode instead of UTF-8 and if so how, just replace the UTF-8 with Unicode?
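One plausible explanation for the x'3F' mentioned above, offered as an assumption rather than a diagnosis of that editor: a no-break space (U+00A0, the character behind `&nbsp;`) cannot be stored in 7-bit ASCII, and a lossy save substitutes a question mark. The sample text below is hypothetical.

```python
# U+00A0 is the no-break space that the &nbsp; entity stands for.
text = "Board\u00a0of\u00a0Directors"

# ASCII has no code for U+00A0, so a lossy encode substitutes "?" (x'3F').
lossy = text.encode("ascii", errors="replace")

# Writing the character as an entity keeps the file pure 7-bit ASCII instead.
safe = text.replace("\u00a0", "&nbsp;").encode("ascii")

print(lossy)
print(safe)
```

Keeping such characters as entities sidesteps the character set question for them entirely, since the file then contains only 7-bit codes.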
Dave Baldwin, Fixer of Problems
Most Valuable Expert 2014

Commented:
UTF-8 is Unicode on the web.  But your page was not done in UTF-8 originally.  Did you refresh your page in your browser after you made the changes?  If you didn't, you may be looking at the old page displayed from cache in your browser.  Use Ctrl-F5 to force a page refresh whenever you are making changes.

You have the 'correct' charset in that page but it is not being displayed correctly.  Something is causing it to not be interpreted correctly.  Try removing <!--Machine Engine Emilia version 9.2.1--> from the file.

Author

Commented:
Page was written in a text editor, doing cut and paste from some web pages, which added the non-7-bit ASCII characters I want to display

Yes, I refreshed, and when it did not work I did a view source to make sure the upload worked.

I don't understand what <!--Machine Engine Emilia version 9.2.1--> means. I don't have that line or see it on view source

Am using Firefox 5 for my browser

Here is the first part of my HTML

<html>
<head>
<title>Islam 101</title>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
</head>
<body>
Dave Baldwin, Fixer of Problems
Most Valuable Expert 2014

Commented:
Oh, that was on the other page.  I don't understand.  Your charset is right but something is telling the browser (Firefox 5 for me too) to use UTF-8.  Chrome and IE8 also do it and display correctly if I manually switch to ISO-8859-1.

Author

Commented:
don't understand "if I manually switch"

page has <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> right now

It failed in Firefox 5 both now and when it was UTF-8
Dave Baldwin, Fixer of Problems
Most Valuable Expert 2014

Commented:
If you look under View -> Character Encoding, you will see that something is switching it to UTF-8 after the page loads.  That is what is causing the problem and I have not been able to figure out what is doing that.  Since you have set it to ISO-8859-1, it should stay that way but it is not.

Author

Commented:
A lot of websites now, especially ones with CSS have DOCTYPE statements on them. Do they possibly affect character encoding?

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
 "http://www.w3.org/TR/html4/loose.dtd">

I agree, I really need to make ISO-8859-1 stick. I did as you suggested, and the page does render correctly with it instead of UTF-8

Author

Commented:
I was going to try to figure out how to add this question to the Firefox Zone, thinking you were saying it was now just a firefox question, but IE8 still puts it in UTF-8, and there switching to Western European fixes it.

Author

Commented:
My idea of Doctype may not fix it, because the Artsrolla page I told you I did not have control of (which is why I am addressing this question in my website), has a <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> at the top of it.
They don't have <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">, but that does not seem to be enough for me so it probably would not help them
Dave Baldwin, Fixer of Problems
Most Valuable Expert 2014

Commented:
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> is the 4th line in the Artsrolla page and the DOCTYPE is not the problem because it does not affect the 'charset'.  Something that is being loaded After the page is loaded is in UTF-8 and is changing the page encoding, either through JavaScript or some other mechanism.

The basic way to find out what is causing it is to strip things from a test copy and keep refreshing it until the encoding stops changing.

Author

Commented:
It is in the Artsrolla page because I told my friend that it should work, but it does not. But there is a lot I can't control about it

That is why I presented the question based on my http://donsingleton.org/_islam101.htm page, because it has a similar problem and I know for certain it has no JavaScript or anything like that.

It was written in html in a text editor.

If you want I can make a smaller page that does the same sort of thing.

I have tried adding <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" /> and it did not help.

I added <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> and it did not help

Do you want me to make a very short page?
Dave Baldwin, Fixer of Problems
Most Valuable Expert 2014

Commented:
No, I just figured it out and it's not something you did.  I downloaded a copy of your page and put it on one of my webservers and it works perfectly and uses the correct character set.  So I checked the webserver that you are running on.  It is Forcing UTF-8 encoding and so is artsrolla.org.  You're probably on the same hosting.

You need to call your hosting company and tell them not to do that.  Their headers include "Content-Type: text/html; charset=UTF-8".  'Friendlier' hosting would just have "Content-Type: text/html;" which is what I see on mine.
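The reason the server header wins can be sketched as a simple precedence rule: a charset in the HTTP Content-Type header is used before any meta tag in the file is consulted. The code below is a simplified illustration, not how any particular browser is implemented; the function names, the 1024-byte scan window, and the `windows-1252` fallback are assumptions, and byte order marks are ignored.

```python
import re
from typing import Optional

def charset_from_header(content_type: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value."""
    m = re.search(r"charset\s*=\s*\"?([\w.:-]+)", content_type, re.IGNORECASE)
    return m.group(1).lower() if m else None

def charset_from_meta(html: bytes) -> Optional[str]:
    """Look for a charset declared in a meta tag near the top of the page."""
    m = re.search(rb"<meta[^>]*charset\s*=\s*[\"']?([\w.:-]+)",
                  html[:1024], re.IGNORECASE)
    return m.group(1).decode("ascii").lower() if m else None

def effective_charset(content_type: str, html: bytes,
                      default: str = "windows-1252") -> str:
    """Header first, then the meta tag, then an assumed fallback."""
    return charset_from_header(content_type) or charset_from_meta(html) or default

# Hypothetical page head that declares ISO-8859-1 in a meta tag.
page = (b'<html><head><meta http-equiv="Content-Type" '
        b'content="text/html; charset=ISO-8859-1"></head>')

print(effective_charset("text/html; charset=UTF-8", page))  # header overrides meta
print(effective_charset("text/html", page))                 # meta is used
```

With a header of plain `text/html`, the meta tag in the file decides, which matches the behaviour described for the 'friendlier' hosting.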

Author

Commented:
This is even weirder

I made a short page (actually 2)

http://donsingleton.org/_charset.htm has mainly the characters that gave me problems in http://donsingleton.org/_islam101.htm, but just them and it has a <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />

http://donsingleton.org/_charset2.htm is identical but with <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

Both display fine in Firefox 5 and the dot is on Unicode (UTF-8) and if I switch to Western (ISO-8859-1) it messes up

http://donsingleton.org/_islam101.htm has more text, but no javascript, and it has problems at Unicode (UTF-8) and is fixed with Western (ISO-8859-1)

The short file was actually made by cutting and pasting from the longer file (plus one line from my friend's file). It even has the same title statement as my large file. And it is on the same server (a Linux server)

I am totally confused.

Author

Commented:
DaveBaldwin, then why was my experience (comment above) exactly the opposite on the short _charset file?

http://donsingleton.org/_charset.htm and http://donsingleton.org/_charset2.htm are on the same server you say is forcing UTF-8
Dave Baldwin, Fixer of Problems
Most Valuable Expert 2014

Commented:
I don't know but I do know that you didn't get what you thought with those two files.  And they are both being forced to UTF-8 on my computer.  I just checked again.  Now you've managed to convert or upload one of them in UTF-8 so it does display properly.  Don't know how you did that.

None of that changes the problem I found.  Your host is overriding what you have in the file.  I think that is wrong.

Author

Commented:
All were uploaded with WinSCP to the same folder on the server; If you want I can upload all of them at the same time

How do the two _charset files display on your computer?  Are they both forced to UTF-8?

What happens when you look at them with Western (ISO-8859-1)?
Dave Baldwin, Fixer of Problems
Most Valuable Expert 2014

Commented:
I'm no longer interested in the small files because they don't address the original problem.  But yes, they are both forced to UTF-8.  The ONLY resolution to this is to get your hosting company to change that header.  Your '_islam101.htm' file works perfectly on my web server that does not have that problem.  Every other server that I have to work with but one uses "Content-Type: text/html" in the headers.  Note that I am talking about the 'headers' that the server sends Before it sends your page, something you have no control over.

Author

Commented:
The small files seem to me to be significant, because they appear to reverse the problem and they are on the same server, uploaded by the same program.

And talking to my hosting company is difficult, because I own the server.

I had someone help me set it up, and I am trying to get in touch with him, but so far have been unsuccessful.
Dave Baldwin, Fixer of Problems
Most Valuable Expert 2014

Commented:
You're misleading yourself, there is only one solution.  Those files are not reversing the problem.

Author

Commented:
Do they not both work on your computer?

And if you switch your computer to Western (ISO-8859-1) do they not mess up?
Dave Baldwin, Fixer of Problems
Most Valuable Expert 2014

Commented:
You're not listening.  You changed the actual file encoding somehow, I can see it in my browser.  Whether they mess up or not will not change the original problem on the server(s).  Unless you can force or convert all of the Actual character encoding (not just the <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />)  in All of your files to UTF-8, there is nothing you can do to fix the original problem on the server.

This site http://www.alanwood.net/ can tell you about Unicode and character sets.  It is not a little problem.
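Converting the actual character encoding, as opposed to editing the meta tag, means decoding the stored bytes with the encoding they were written in and writing them back out as UTF-8. The sketch below assumes the source bytes are Windows-1252; the function name and sample bytes are hypothetical.

```python
def reencode_to_utf8(data: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode bytes from their real encoding and re-encode them as UTF-8.

    Raises UnicodeDecodeError if the data contains a byte that the source
    encoding does not define (cp1252 leaves a few codes unassigned).
    """
    text = data.decode(source_encoding)
    return text.encode("utf-8")

# Hypothetical sample: one-byte curly quotes become three-byte UTF-8 sequences.
original = b"\x93quoted\x94"
converted = reencode_to_utf8(original)
print(converted)
```

After a conversion like this, the charset named in the meta tag would also need to say UTF-8 so that the declaration matches the bytes.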

Author

Commented:
How can I have changed the actual file encoding?

They were all written on the same laptop using the same text editor (UltraEdit) and uploaded using the same program (WinSCP) to the root folder of the same website (donsingleton.org) on the same server. No change was made to the server (I own it, and the only program I have used is WinSCP to upload). I can access it with PuTTY but I have not.

The files are different lengths, but is that significant?

And if someone magically changed the server why does my http://donsingleton.org/_islam101.htm still have a problem (I just did a refresh).

I realise you think the server is at fault, and perhaps it is, but would you please look at http://donsingleton.org/_charset.htm and http://donsingleton.org/_charset2.htm and see if they appear the same or different, and if they look good and are showing as UTF-8, go to View / Encoding, switch it, and tell me what happens.

I will then close the question and give you an A. You deserve it for all you have done.
Dave Baldwin, Fixer of Problems
Most Valuable Expert 2014
Commented:
Both of the little files are being served as UTF-8 because your server is forcing it.  And other than the charset declaration, they are identical and encoded as UTF-8.  I cannot see through the internet to see how you did that.  If you understood about character sets, you might know how you did it.  Here is the UltraEdit page on Unicode: http://www.ultraedit.com/support/tutorials_power_tips/ultraedit/unicode.html  Notice the part about automatically detecting Unicode pages.  It could happen simply because you pasted something with Unicode characters into your document.

Your server was not magically changed, nothing here has been magic.
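Whether a given file was actually saved as UTF-8 can be checked from its bytes alone, independent of the meta tag or the server. This is a minimal sketch; the helper name and the two samples are assumptions for illustration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Hypothetical samples: the same curly-quoted text saved two different ways.
single_byte_quote = b"\x93hi\x94"                # Windows-1252 style bytes
utf8_quote = "\u201chi\u201d".encode("utf-8")    # saved as UTF-8

print(is_valid_utf8(single_byte_quote), is_valid_utf8(utf8_quote))
```

A file made only of 7-bit ASCII also passes this check, so a True result matters mainly for files that contain high-bit bytes.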

Author

Commented:
I said all along that I pasted characters from websites into an ASCII text document, and I asked whether the characters were correctly displayed in the small files, while they were not in the larger file, and whether using View to select ISO-8859-1 made them fail while it fixed the larger file, but I guess I am not going to get that answer. And I promised to close the question.
