Convert Chinese characters to HTML

A client sent me a document containing Chinese characters. They want me to update their Chinese-language website.

How do I convert Chinese characters (see the uploaded Word document in ee_stuff) to HTML?
Richard Korts Asked:
 
AtanAsfaloth Commented:
One of the main reasons Unicode came into existence was to make dealing with multiple languages simpler. Chinese characters are not so different from Latin characters; we're just not used to them. To a computer a character is just a set of bits, and Unicode uses more bits per character than ANSI does. The fact is that Word recognizes a Chinese character by its unique character value. If Macromedia Homesite uses Unicode, it will as well. As I don't use it, I can't tell. Try copy/pasting. If you see Chinese characters appear, everything works fine. If nothing happens or you get unexpected results (question marks, rectangles, odd characters), your HTML editor probably doesn't use Unicode by default.
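To make the "set of bits" point concrete, here is a minimal sketch in Python (chosen only for illustration; nothing in this thread uses Python). It reports the Unicode code point and the UTF-8 byte count for a Latin letter and for the two Chinese characters that appear later in this thread as &#20135;&#21697;. The helper name `describe` is made up for this example.

```python
# Sample characters for illustration only.
samples = ["A", "产", "品"]

def describe(ch: str) -> tuple[int, int]:
    """Return (Unicode code point, number of bytes in UTF-8) for one character."""
    return ord(ch), len(ch.encode("utf-8"))

for ch in samples:
    code_point, size = describe(ch)
    print(f"{ch!r}: U+{code_point:04X} (decimal {code_point}), {size} byte(s) in UTF-8")
```

The Latin letter fits in one byte, while each of these Chinese characters needs three bytes in UTF-8. An editor that assumes one byte per character cannot store them.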
0
 
AtanAsfaloth Commented:
As far as I know, Chinese characters are part of Unicode, so UTF-8 can represent them. Just make sure you save your html in unicode and you should be safe just copy/pasting... I have no experience with this, however, so I wish you the best of luck!
Regards,
Atan Asfaloth
0
 
b0lsc0tt IT Manager Commented:
rkorts,

What server-side or programming languages can you use?

I have done this various ways but in an ASP script I used a line like ...

      if i < 127 then letter = chr(i) else letter = "&#" & i & ";"

The key is to look at each character and replace any non-ASCII character with its HTML numeric reference (&#N;, where N is the character code). With Chinese characters, setting the charset and encoding for the HTML is important too. The script line above is a basic example, but I can provide more specific help if needed. I assume you want help making something to do this and not just someone to do it for you.
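The same per-character idea as the ASP line above can be sketched in Python (an illustration only, not the script from this comment). The function name `to_numeric_entities` is hypothetical. It leaves ASCII alone and turns every other character into a decimal reference; it does not escape markup characters such as < or &.

```python
def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with a decimal HTML reference (&#N;)."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)

# Python's built-in "xmlcharrefreplace" error handler does the same job.
def to_numeric_entities_builtin(text: str) -> str:
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(to_numeric_entities("Products: 产品"))
```

For the sample string this prints `Products: &#20135;&#21697;`, the same references that show up in the Word output quoted later in the thread.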

Let me know if you have any questions or need more information.

b0lsc0tt
0
 
b0lsc0tt IT Manager Commented:
rkorts,

What Atan said is basically right.  Depending on the other content you might be able to get it all done with the right header and encoding.

The file the Asker mentioned above is at https://filedb.experts-exchange.com/incoming/ee-stuff/5566-ChineseSite.zip .

Let me know if you have a question.

b0lsc0tt
0
 
Richard Korts Author Commented:
to b0lsc0tt and AtanAsfaloth,

How do I "save your html in unicode"? I use Macromedia Homesite as an HTML editor. How do I do that in Homesite? Do I just copy the Chinese characters per se from the Word document and put them directly into the HTML? Somehow that doesn't sound right. That's EXACTLY how I do English.

PS to b0lsc0tt - That file you referenced is the Word document I uploaded.
0
 
b0lsc0tt IT Manager Commented:
>> That file you referenced is the Word document I uploaded. <<

I know.  I wanted to provide the link for any expert who needed the file, especially those who may not know what ee-stuff is or means.  I guess my description in that comment was a little confusing.  Hopefully it is clearer now. :)

If your editor is Homesite then I believe it will allow you to choose the encoding to save the file.  Under Options -> Settings -> File Settings make sure the setting to "Enable non-ANSI file encoding" is checked.  Then as you save the file you should be able to specify the encoding.  That will need to be UTF-8 in this case.

Give that a test and see how it works.  I haven't used Homesite like this so I don't know if that will work or not but it should.  Let us know how it works or if you have a question.
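What "save as UTF-8 and declare it" amounts to can be sketched in Python (an illustration, not a Homesite feature). The function names and the file name `utf8_test.html` are made up for this example. The point is that the bytes on disk and the charset declared in the page have to agree.

```python
import tempfile
from pathlib import Path

def build_page(body_text: str) -> str:
    """Return a small HTML page that declares UTF-8 as its encoding."""
    return (
        "<html>\n<head>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
        "</head>\n"
        f"<body><b>{body_text}</b></body>\n</html>\n"
    )

def save_utf8(path: Path, html: str) -> None:
    """Write the page as UTF-8 bytes so the file matches the declared charset."""
    path.write_bytes(html.encode("utf-8"))

# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    page_path = Path(tmp) / "utf8_test.html"
    save_utf8(page_path, build_page("产品"))
    print(page_path.read_bytes().decode("utf-8"))
```

If the file is saved in one encoding but the page declares another (or declares none), browsers may show garbage even though the file itself is fine.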

bol
0
 
Richard Korts Author Commented:
to b0lsc0tt and AtanAsfaloth,

OK, I see how to save a document in Homesite as UTF-8 (thanks to b0lsc0tt). Is that the right one? It also offers UNICODE and UNICODE Big Endian (no clue what that means).

I'll try it tomorrow with a simple example.

rkorts
0
 
Richard Korts Author Commented:
to b0lsc0tt and AtanAsfaloth,

One more thing. You guys are very responsive & very helpful. I'd like to give you 500 points each. Is there a way to do that?

rkorts
0
 
b0lsc0tt IT Manager Commented:
The UTF-8 encoding that Atan suggested should be the best for this.  Let us know the results and we can try others if needed.  I can tell you about the others if we need to use them.
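For reference, the three save options differ only in how the same character is laid out in bytes. In Windows editors of that era "Unicode" usually means UTF-16 little endian and "Unicode Big Endian" means UTF-16 with the two bytes swapped; that Homesite follows this convention is an assumption here. A small Python sketch (illustration only) shows the difference for one sample character:

```python
text = "产"  # U+4EA7, sample character only

encodings = {
    "UTF-8": "utf-8",
    "UTF-16 little endian": "utf-16-le",
    "UTF-16 big endian": "utf-16-be",
}

def hex_bytes(s: str, codec: str) -> str:
    """Encode s with the given codec and return the bytes as spaced hex."""
    return s.encode(codec).hex(" ")

for label, codec in encodings.items():
    print(f"{label:22} {hex_bytes(text, codec)}")
```

UTF-8 gives `e4 ba a7`, little endian gives `a7 4e`, and big endian gives `4e a7`. UTF-8 also keeps plain ASCII as single bytes, which is one reason it is the usual choice for web pages.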

The max points are 500.  I really appreciate the offer but a split of those points with an A will be reward enough, especially with your thanks. :)  Drop an email to EE as a thanks if you really want to go above and beyond to show appreciation.  Without them we wouldn't have the chance to do this. :)

bol
0
 
Richard Korts Author Commented:
To b0lsc0tt,

I opened the main page of the Chinese website in Homesite. The Chinese characters early in the page all show up as runs of question marks. For example, here are a few lines of the HTML:

            <ul id="nav">
            <li><a href="/index.html" class="on"><b>??</b></a> </li>
            <li><a href="/pages/company.html"><b>??</b></a>
                  <ul>
                  <li>
                  <div id="navItem"><a href="/pages/team.html">????</a></div>
                  <div id="navItem"><a href="/pages/board.html">???</a></div>
                  <div id="navItem"><a href="/pages/investors.html">?????</a></div>

So I copied the Chinese characters from the Word document and pasted them into the HTML. They still appear as ??. So I saved the document as UTF-8 and opened it in a browser. The ?? show in the browser too. This is going nowhere fast.

Any suggestions?
0
 
b0lsc0tt IT Manager Commented:
I am having a hard time too but I am missing some options that I would think I should have.  Check your computer and make sure Windows has the Chinese (or equivalent) language pack installed.  This would be found in the Regional area in Control Panel.  Let me know if you need specifics.  I believe (and I am going from memory a bit here) that not having it is making Windows have a hard time copying the data.
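One plausible mechanism for the question marks (an assumption, not something confirmed in this thread) is that the text passes through a single-byte Western code page somewhere between Word and the editor. Characters that the code page cannot represent are then replaced with "?". A Python sketch of that lossy round trip, with a hypothetical helper name:

```python
def through_legacy_code_page(text: str, code_page: str = "cp1252") -> str:
    """Round-trip text through a legacy code page, as a non-Unicode app might."""
    return text.encode(code_page, errors="replace").decode(code_page)

print(through_legacy_code_page("产品"))          # not in cp1252, so each becomes '?'
print(through_legacy_code_page("Products"))      # plain ASCII survives unchanged
print(through_legacy_code_page("产品", "gbk"))    # a Chinese code page keeps them
```

Once the characters have been replaced with "?", saving the file as UTF-8 afterwards cannot bring them back, which matches what was reported above.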

If you want another option and this is just a one-time thing, you could open the file in Word and save it as an HTML page (filtered).  I hate (yes, hate) Word's HTML, but with the filtered setting it should be a little better.  It will also read the characters and convert them to numeric HTML entities based on their Unicode values.  It is an option and might be the easiest with what you have on your computer now.

Let me know what you decide.  If you have a question let me know.  If you will be working with Chinese characters a lot then you should go through the effort of installing the language pack (your OS CD should already have the files).  Also, with Chinese characters you should find out if you need to use Traditional or Simplified.  Let us know if you want to pursue the non-Word option.

bol
0
 
Richard Korts Author Commented:
To b0lsc0tt,

It WORKS!!
But it's a pain in the butt.

I did just what you said. The Word document saved as HTML (filtered) (whatever that means) has the Chinese characters correctly encoded as numeric entities. Here's an example:

<span
     lang=ZH-CN style='font-family:SimSun'><b><span style='color:blue'>&#20135;&#21697;</span></b></span>

I have to copy from the Word-as-HTML version piece by piece, but it seems to work.
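The piece-by-piece copying could in principle be scripted. The following Python sketch (an illustration under stated assumptions, not a tested tool for real Word output) strips the tags from a fragment like the one above and re-emits the text with numeric references. The class and function names are hypothetical. One known limitation: references such as &amp; are decoded and not escaped again.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the text content of an HTML fragment, discarding all tags."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # &#20135; arrives as the real character
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)

def word_html_to_entities(fragment: str) -> str:
    """Strip tags, then re-encode non-ASCII text as decimal references."""
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()
    text = "".join(parser.parts).strip()
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

sample = (
    "<span\n     lang=ZH-CN style='font-family:SimSun'><b>"
    "<span style='color:blue'>&#20135;&#21697;</span></b></span>"
)
print(word_html_to_entities(sample))
```

For the sample fragment this prints `&#20135;&#21697;` without Word's span and style markup, ready to paste into an existing page.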


0
 
b0lsc0tt IT Manager Commented:
Using the HTML entities is probably the safest way to make sure the characters are displayed.  The page encoding isn't an issue in that case.  It is ugly when you look at the source and I really don't like the HTML Word makes, but I'm glad it works for you.
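Why the page encoding stops mattering can be shown with a short Python sketch (illustration only). The entity form contains nothing but ASCII characters, so it produces identical bytes in any ASCII-compatible encoding, and the browser turns the references back into characters when it renders the page.

```python
import html

entity_text = "&#20135;&#21697;"

# Every character in the entity form is plain ASCII, so the bytes are the same
# in Windows-1252, UTF-8, or a Chinese code page such as GBK.
as_cp1252 = entity_text.encode("cp1252")
as_utf8 = entity_text.encode("utf-8")
as_gbk = entity_text.encode("gbk")
print(as_cp1252 == as_utf8 == as_gbk)

# html.unescape performs the same decoding step a browser does for references.
print(html.unescape(entity_text))
```

The trade-off is the one mentioned above: the source is unreadable to a human, since every character becomes a number.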

Thanks for the grade, the points and the fun question.

bol
0
 
Richard Korts Author Commented:
To b0lsc0tt,

Not much fun for me. But the customer is paying, so I guess I can't complain.

There HAS TO BE an easier way than this.
0
 
Richard Korts Author Commented:
To b0lsc0tt,

Is there any way I can email you directly? I have more information that I have pieced together on this topic.

rkorts
0
 
b0lsc0tt IT Manager Commented:
My contact info is in my profile.  However if it is an EE question then post the URL here and I can look at it.  If I can help then I will post in it.  I saw one of your new questions but haven't had a chance to reply to it yet.  If it is an EE question then I won't be able to help you by email.  I do some "contract" work on the side and the contact info could be used for that as long as it isn't an EE issue.

Thanks for the interest and I'll be happy to help (if I can) whether it is an EE question or some "project for hire."

bol

p.s.  If it is something that will contribute to this Q then post it here.  I have still been trying to figure out a better way for this to work and am interested in this issue still. :)
0
Question has a verified solution.
