UTF-16 Support in CSS

I need to create a CSS document containing double-byte characters. All the HTML tags will be single-byte; only the data to be displayed will be double-byte. Please tell me which encoding I have to specify at the beginning of the HTML document, and if somebody has an example, do let me know.

Thanks

avner

Capslock

ASKER

I have already tried this solution, but it didn't work. If somebody has an example of an HTML document, please post it.

dorward

I believe you need to configure the _server_ to send the correct content type and character set in the HTTP headers - for both HTML and CSS documents.

How you do this depends on the server software you use.

Member_2_547613

Hi Capslock,

At least Mozilla wants the file to actually BE saved as UTF-16, and the CSS as well, in case you use any of the content, :before, :after rules/selectors.
This happened to me with a file 'declared' as UTF-8 (using <meta>) that was actually saved as ANSI.
This is the first thing you should check, especially if the files are loaded from the local drive for testing.
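
To check how a file was actually saved, as opposed to how it is declared, you can look at its first bytes and try to decode it. Below is a minimal Python sketch of that idea. The helper names sniff_bom and decodes_as are made up for this illustration, and UTF-32 is deliberately ignored.

```python
import codecs

# Byte-order marks ("magic headers") for the encodings discussed in this thread.
# UTF-32 is left out on purpose to keep the sketch short.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),         # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def sniff_bom(data: bytes):
    """Return the encoding announced by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def decodes_as(data: bytes, declared: str) -> bool:
    """Return True if the bytes are valid in the declared encoding."""
    try:
        data.decode(declared)
    except UnicodeDecodeError:
        return False
    return True


# A file "declared" as UTF-8 but saved as ANSI (Windows-1252 here):
ansi_bytes = "caf\u00e9".encode("cp1252")
print(sniff_bom(ansi_bytes), decodes_as(ansi_bytes, "utf-8"))
```

For a real file you would read the bytes with open(path, "rb").read() and pass them to the same two helpers.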

Then check the headers sent from the server with this free online-tool: http://www.delorie.com/web/headers.html
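
Once you have the header value, the part that matters is the charset parameter of Content-Type. As a small sketch using only the Python standard library (charset_from_content_type is a hypothetical helper name), this pulls the charset out of a header value and returns None when none is declared:

```python
from email.message import Message


def charset_from_content_type(value: str):
    """Return the lower-cased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


print(charset_from_content_type("text/css; charset=UTF-16"))
print(charset_from_content_type("text/html"))
```

Only a parameter literally named charset is picked up by this helper; any other parameter name is treated as if no charset had been declared.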

I had to add the encoding in <link type="text/css; encoding=UTF-8"> of the stylesheet as well - this is basically the same as the meta content-type, and to be safe those additional metas:
<meta http-equiv="Content-Style-Type" content="text/css; encoding=UTF-8" />
<meta http-equiv="Content-Script-Type" content="text/javascript; encoding=UTF-8" />
(or UTF-16 in your case)

The meta http-equiv "statements" as well as the type attributes in link, script, a etc. were "invented" to avoid configuration changes to the server, and simply because not every server can be configured to know every single MIME-type. Because in the western world iso-8859-1 is the default for all files and works well, they are usually omitted, and barely mentioned anywhere.
They usually override the HTTP response sent from the server with those given in the content="" or type="" attributes.

CirTap

Member_2_547613

Just checked a couple of my files: the HTML and CSS are sent w/o encoding, so it's the browser's default. Unlike PHP pages, which by default send an additional Content-Encoding header defaulting to "iso-8859-1".
So the METAs and types should work - on the client side.

Good luck

Capslock

ASKER

Well, I must explain something. I have a program which works on a certain input file and produces an MHTML file. This program runs on Unix, so there is no point in saving the file as ANSI or UTF-8. Now, when I save an HTML file on NT as UTF-8, I get a three-byte header: 0xEF,0xBB,0xBF. When I change my program to put this header in the MHTML file after the MHTML header, that is

Content-Type: multipart/related; boundary="==boundary-1"; type="text/html

Text displayed only to non-MIME-compliant mailers
--==boundary-1
Content-Type: text/html; charset=utf-16
Content-Transfer-Encoding: 16bit

I am able to view my MHTML file with the correct UTF-8 data in Internet Explorer. I have two questions related to this:

1) I have data in UTF-8 format, so why do I need to specify the charset as UTF-16? If I change this to utf-8 and the encoding to 8bit, it doesn't work.

2) The same file that opens in IE does not open in Netscape.

Please help me.

Member_2_547613

> Please help me
I'm trying to :)

First: I'm wondering about your "MHTML file". I never figured out what this "Microsoft HTML File" is supposed to be and be good for in real-life practice, except that there's a New-File-Template for it. I guess it appears if MS Office or FrontPad/FrontPage are installed so Explorer and MSIE may "switch" to the corresponding editor. In MSIE, an HTML file saved from Word (with .html as the extension) will always open with Word, even if you have another default editor for .html files.
This is Windoze specific stuff.

The "magic header" 0xEF,0xBB,0xBF comes from NT-Notepad (or Emacs as well) if the file is saved as UTF-8/UTF-16. The headers of course differ and also indicate the byte order in that file: big-endian/little-endian aka Motorola/Intel format.
AFAIK it's "recommended" for Unicode [XML-]files to have this header if no encoding is otherwise given.
So when your file is "labeled" with the magic header as UTF-16 BUT does not contain double-byte code, some User-Agents/applications may not recognize the content as what it may actually be: ASCII or ANSI, and they're not required to. It's like having a GIF file, changing the extension to .TIF and expecting this to also convert the 'data' in the file.
The same applies to double-byte TEXT files. ANSI, UTF-8 and UTF-16 (Unicode) *ARE* different things!
Assume a file with only 6 characters: abcdef
ANSI: 6 bytes
UTF-8: 6 bytes (9 bytes with the magic header EF BB BF)
UTF-16 little-endian: 14 bytes; magic header FF FE
UTF-16 big-endian: 14 bytes; magic header FE FF
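
These sizes are easy to check yourself. Here is a short Python sketch that encodes the same six characters in each form; for pure ASCII text UTF-8 needs one byte per character, and the magic header accounts for the extra bytes in the other rows.

```python
import codecs

text = "abcdef"

# Size in bytes of the same six characters in each encoding.
sizes = {
    "ANSI (cp1252)": len(text.encode("cp1252")),
    "UTF-8": len(text.encode("utf-8")),
    "UTF-8 with magic header": len(text.encode("utf-8-sig")),
    "UTF-16 little-endian with magic header": len(codecs.BOM_UTF16_LE + text.encode("utf-16-le")),
    "UTF-16 big-endian with magic header": len(codecs.BOM_UTF16_BE + text.encode("utf-16-be")),
}

for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```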

Understanding this, something like
> All the HTML tags will be in single byte only the data
> that is to be displayed will be in double bytes.
is not possible. Sure, you can create such files, but it's like having a BMP file containing some parts with JPG compression. This won't work either.
In fact, if you save an HTML file as UTF-8 or UTF-16, the tags will be stored in that encoding as well (two bytes per character in UTF-16). No way to mix them.

Just because MSIE is nice and 'analyzes' the data and finds out the file contains both ANSI and UTF-* although labeled as UTF-8/16 does not mean Netscape and others have to do this as well: they TRUST the header and they are right to do so.
Because MSIE always tries to be so super-smart, it's why many (mail-)viruses work in Windows.

From your last posting I assume you're actually creating an e-mail with an HTML body/content or attachment.
If you DO need double-byte characters in your mail, because there's Asian text or the like in it, your mail-builder script must be able to add this "boundary" with the right encoding, making this line
Content-Type: multipart/related; boundary="==boundary-1"; type="text/html
become
Content-Type: multipart/related; boundary="==boundary-1"; type="text/html; encoding=UTF-8"

and the WHOLE FILE must be in that format incl. tags, or conformant applications will ignore it or show stupid things, displaying the second bytes as blocks or weird special characters.

I had to do sth. like this in PHP lately when creating an XML file in UTF-8. Every single tag needed to be 'converted' to UTF-8 before I could add the UTF-8 data/string.
The XML and XSLT files started with <?xml version="1.0" encoding="UTF-8"?>, so the XSLT parser EXPECTED the file to BE UTF-8 and got confused about certain "invalid characters" - the parser was MSXML from Microsoft.

It *may* happen that due to the absence of the encoding in this "boundary header" your mail-builder-script assumes UTF-16 because it may also "parse" the file, find a Unicode magic-header (the one you added manually) and add this to the final boundary item.

So what you have to do is make your attachment/body be UTF-* in total, not just the data. If it's UTF-8 you must skip the magic-header, they're only required for UTF-16 to determine the byte order.
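
As an illustration only, here is roughly how such a message could be built in Python with the standard email package, so that the whole HTML part (tags and data) is encoded as UTF-8 with no magic header. The HTML string is a made-up sample, and Python picks base64 as the transfer encoding for UTF-8 text here rather than the 8bit used in your header.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Sample HTML (an assumption for this sketch); \u5730\u7528 are two Chinese characters.
html = '<html><body><p class="F0">\u5730\u7528</p></body></html>'

# Outer multipart/related container with the boundary and type from the thread.
root = MIMEMultipart("related", boundary="==boundary-1", type="text/html")
root.preamble = "Text displayed only to non-MIME-compliant mailers"

# The HTML part: tags and data are encoded together as UTF-8, without a BOM.
part = MIMEText(html, "html", "utf-8")
root.attach(part)

print(root["Content-Type"])
print(part["Content-Type"])
print(part["Content-Transfer-Encoding"])
```

The point of the sketch is that the charset label and the actual bytes of the part are produced in one step, so they cannot disagree.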

Same will be necessary for an external CSS. As I already said: Netscape WANTS this file to have the same encoding if the CSS adds content to the HTML.

Hope this helps.

CirTap

PS: maybe you provide a URL with the html and css files in an archive (ZIP, TAR, GZ will do) so I can see what data you actually have.

Capslock

ASKER

Hi Cirtap
Thanks for the good response. Here I am attaching the code of the MHTML document.

Content-Type: multipart/related; boundary="==boundary-1"; type="text/html

Text displayed only to non-MIME-compliant mailers
--==boundary-1
Content-Type: text/html; charset=utf-16
Content-Transfer-Encoding: 16bit

ï»¿<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
<html>
<head>
<title>HTML bill for customer no. </title>
<meta name="customer-id" content=" ">
<style type="text/css">
</style></head>
<body>
<div style="width:1262">
<p style="top:204;left:24;" class="F0">WORK FIELD</p>
<p style="top:159;left:135;" class="F0">English Name</p>
<p style="top:204;left:147;" class="F0">English Name</p>
<p style="top:121;left:150;" class="F0">English</p>
<p style="top:1103;left:240;" class="F0">End of BILL</p>
<p style="top:122;left:261;" class="F0">English char 1</p>
<p style="top:159;left:304;" class="F0">E</p>
<p style="top:55;left:360;" class="F0">This is a test format to test UTF8</p>
<p style="top:124;left:420;" class="F0">Chinese</p>
<p style="top:159;left:422;" class="F0">åœ°ç”¨</p>
<p style="top:205;left:422;" class="F0">åœ°ç”¨</p>
<p style="top:125;left:552;" class="F0">Chinese char 1</p>
<p style="top:159;left:612;" class="F0">åœ°</p></div>
<div style="width:1262"></div></body></html>
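
The odd-looking characters in this paste look like UTF-8 bytes displayed as Windows-1252. Assuming that is how the text got mangled, a small Python sketch can reverse the step and show what the bytes really are (undo_mojibake is a hypothetical helper name):

```python
# The garbled strings from the paste, written with escapes so this source stays ASCII.
garbled_header = "\u00ef\u00bb\u00bf"                    # the three characters before <!DOCTYPE
garbled_text = "\u00e5\u0153\u00b0\u00e7\u201d\u00a8"    # the six characters in the Chinese fields


def undo_mojibake(s: str) -> str:
    """Turn the characters back into bytes via Windows-1252, then decode as UTF-8."""
    return s.encode("cp1252").decode("utf-8")


header = undo_mojibake(garbled_header)
text = undo_mojibake(garbled_text)

# Print code points rather than the characters, to avoid console encoding issues.
print([hex(ord(c)) for c in header])
print([hex(ord(c)) for c in text])
```

If the assumption holds, the first string is the single byte-order mark character U+FEFF (the UTF-8 magic header) and the second is two Chinese characters, which would mean the body of the file is UTF-8 rather than UTF-16.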

Capslock

ASKER

Hi Cirtap
I checked today: if I don't put the magic header in the MHTML file, it still opens correctly in IE. The only problem I have now is with the MHTML header I use.

1) The MHTML file opens correctly in IE and not in Netscape if I use the header below:

Content-Type: multipart/related; boundary="==boundary-1"; type="text/html

Text displayed only to non-MIME-compliant mailers
--==boundary-1
Content-Type: text/html; charset=utf-16
Content-Transfer-Encoding: 16bit

2) The MHTML file opens correctly in Netscape and not in IE if I use the header below:

Content-Type: multipart/related; boundary="==boundary-1"; type="text/html

Text displayed only to non-MIME-compliant mailers
--==boundary-1
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: 8bit

As you can see, the only difference is the charset and the encoding. Can you help me with this situation?

Member_2_547613

Hi Capslock,

do you need some more info?
Still testing?
Disappointed? =)

Regards,

CirTap

COBOLdinosaur

This question has been classified abandoned. I will make a recommendation to the
moderators on its resolution in a week or two. I appreciate any comments
that would help me to make a recommendation.

<note>
Unless it is clear to me that the question has been answered I will recommend delete. It is possible that a Grade less than A will be given if no expert makes a case for an A grade. It is assumed that any participant not responding to this request is no longer interested in its final disposition.
</note>

If the user does not know how to close the question, the options are here:
https://www.experts-exchange.com/help/closing.jsp

Cd&