Solved

Convert Chinese characters to HTML

Posted on 2007-11-13
Last Modified: 2016-08-29
I have a document sent in by a client with Chinese characters. They want me to update their Chinese-language website.

How do I convert the Chinese characters (see the uploaded Word document in ee_stuff) to HTML?
Question by:Richard Korts
16 Comments
 
LVL 5

Expert Comment

by:AtanAsfaloth
As far as I know, Chinese characters are part of Unicode, and UTF-8 can encode all of them. Just make sure you save your HTML as Unicode (UTF-8) and you should be safe just copy/pasting... I have no experience with this, however, so I wish you the best of luck!
Regards,
Atan Asfaloth
 
LVL 54

Expert Comment

by:b0lsc0tt
rkorts,

What server languages can you use or programming languages?

I have done this various ways but in an ASP script I used a line like ...

      if i < 127 then letter = chr(i) else letter = "&#" & i & ";"

The key is to look at each character and replace any non-ASCII one with its HTML numeric character reference (&#NNNNN;).  With Chinese characters, setting the charset and encoding for the HTML is important too.  The script line above is a basic example, but I can provide more specific help if needed.  I assume you want help making something to do this and not just someone to do it for you.
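
For anyone who wants to try the per-character idea outside ASP, here is a minimal sketch in Python. It is not the script the poster used; the function name `to_numeric_entities` is made up for illustration, and it only handles the non-ASCII replacement (it does not escape `&` or `<`).

```python
def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with a decimal HTML reference."""
    parts = []
    for ch in text:
        code = ord(ch)  # Unicode code point of the character
        # Plain ASCII passes through; everything else becomes &#NNNNN;
        parts.append(ch if code < 128 else f"&#{code};")
    return "".join(parts)


converted = to_numeric_entities("Products: 产品")
print(converted)

# The standard library can do the same replacement while encoding to ASCII.
builtin = "Products: 产品".encode("ascii", "xmlcharrefreplace").decode("ascii")
print(builtin == converted)
```

The built-in `xmlcharrefreplace` error handler is probably the simpler choice in practice; the explicit loop is shown only because it mirrors the ASP line above.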

Let me know if you have any questions or need more information.

b0lsc0tt
 
LVL 54

Expert Comment

by:b0lsc0tt
rkorts,

What Atan said is basically right.  Depending on the other content you might be able to get it all done with the right header and encoding.

The file the Asker mentioned above is at https://filedb.experts-exchange.com/incoming/ee-stuff/5566-ChineseSite.zip .

Let me know if you have a question.

b0lsc0tt
 

Author Comment

by:Richard Korts
to b0lsc0tt and AtanAsfaloth,

How do I "save your html in unicode"? I use Macromedia Homesite as an HTML editor. How do I do that in Homesite? Do I just copy the Chinese characters per se from the Word document and put them directly into the HTML? Somehow that doesn't sound right. That's EXACTLY how I do English.

PS to b0lsc0tt - That file you referenced is the Word document I uploaded.
 
LVL 5

Accepted Solution

by:
AtanAsfaloth earned 250 total points
One of the main reasons Unicode came into existence was to make it simpler to deal with multiple languages. Chinese characters are not so different from Latin characters; we're just not used to them. To a computer a character is just a set of bits, and in Unicode a character can take more bits than in ANSI. Word recognizes a Chinese character because of its unique character value. If Macromedia Homesite uses Unicode, it will as well. As I don't use it, I can't tell. Try copy/pasting. If you see Chinese characters popping up, everything works fine. If you see nothing happening or you come across unexpected results (question marks, rectangles, odd characters), maybe your HTML editor doesn't use Unicode by default.
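
A small Python sketch of where the question marks can come from, assuming the editor falls back to a Western "ANSI" code page (Windows-1252 is used here as a stand-in; which code page Homesite actually uses is not known from this thread):

```python
text = "产品"  # two Chinese characters

# UTF-8 can represent them; each of these takes three bytes.
utf8_bytes = text.encode("utf-8")
print(len(utf8_bytes), utf8_bytes.decode("utf-8") == text)

# Windows-1252 has no slot for them, so a lossy conversion
# substitutes a literal question mark for each character.
legacy_bytes = text.encode("cp1252", errors="replace")
print(legacy_bytes)
```

Once the characters have been replaced this way the original text is gone, so re-saving the file as UTF-8 afterwards cannot bring it back.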
 
LVL 54

Expert Comment

by:b0lsc0tt
>> That file you referenced is the Word document I uploaded. <<

I know.  I wanted to provide the link for any expert who needed the file, especially those who may not know what ee-stuff is or means.  I guess my description in that comment was a little confusing.  Hopefully it is clearer now. :)

If your editor is Homesite then I believe it will allow you to choose the encoding to save the file.  Under Options -> Settings -> File Settings make sure the setting to "Enable non-ANSI file encoding" is checked.  Then as you save the file you should be able to specify the encoding.  That will need to be UTF-8 in this case.
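
For comparison, the same "save as UTF-8" step can be done from a script. This is only a sketch in Python: the file name `index.html`, the sample page, and the helper name `save_utf8` are assumptions for illustration, not anything from the client's site. The `meta` line tells the browser which encoding the saved bytes use.

```python
import tempfile
from pathlib import Path

# Hypothetical sample page; the charset declaration must match how the file is saved.
PAGE = """<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Demo</title>
</head>
<body><p>产品</p></body>
</html>
"""


def save_utf8(path: Path, html_text: str) -> None:
    """Write the page to disk explicitly encoded as UTF-8."""
    path.write_text(html_text, encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "index.html"
    save_utf8(target, PAGE)
    raw = target.read_bytes()  # the bytes a web server would send

# The Chinese text is stored as UTF-8 byte sequences, not question marks.
print("产品".encode("utf-8") in raw)
```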

Give that a test and see how it works.  I haven't used Homesite like this so I don't know if that will work or not but it should.  Let us know how it works or if you have a question.

bol
 

Author Comment

by:Richard Korts
to b0lsc0tt and AtanAsfaloth,

OK, I see how to save a document in Homesite as UTF-8 (thanks to b0lsc0tt). Is that the right one? It also offers UNICODE and UNICODE Big Endian (no clue what those mean).
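
On the three choices: in Windows editors of that era, "UNICODE" usually meant UTF-16 little-endian and "UNICODE Big Endian" meant UTF-16 big-endian. That mapping is an assumption about Homesite, not something confirmed in this thread. A short Python sketch of how the same character comes out as bytes in each:

```python
ch = "产"  # U+4EA7

utf8 = ch.encode("utf-8")          # variable length, ASCII-compatible
utf16_le = ch.encode("utf-16-le")  # two bytes, low byte first
utf16_be = ch.encode("utf-16-be")  # two bytes, high byte first

for name, data in [("UTF-8", utf8), ("UTF-16 LE", utf16_le), ("UTF-16 BE", utf16_be)]:
    print(name, data.hex())
```

The two UTF-16 forms hold the same two bytes in opposite order, which is all "big endian" refers to. UTF-8 is the usual choice for web pages because plain ASCII markup stays byte-for-byte unchanged.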

I'll try it tomorrow with a simple example.

rkorts
 

Author Comment

by:Richard Korts
to b0lsc0tt and AtanAsfaloth,

One more thing. You guys are very responsive & very helpful. I'd like to give you 500 points each. Is there a way to do that?

rkorts
 
LVL 54

Expert Comment

by:b0lsc0tt
The UTF-8 encoding that Atan suggested should be the best for this.  Let us know the results and we can try others if needed.  I can tell you about the others if we need to use them.

The max points are 500.  I really appreciate the offer but a split of those points with an A will be reward enough, especially with your thanks. :)  Drop an email to EE as a thanks if you really want to go above and beyond to show appreciation.  Without them we wouldn't have the chance to do this. :)

bol
 

Author Comment

by:Richard Korts
To b0lsc0tt,

I opened the main page of the Chinese website in Homesite. The Chinese characters already there all show up as runs of ?. For example, here are a few of the HTML lines:

            <ul id="nav">
            <li><a href="/index.html" class="on"><b>??</b></a> </li>
            <li><a href="/pages/company.html"><b>??</b></a>
                  <ul>
                  <li>
                  <div id="navItem"><a href="/pages/team.html">????</a></div>
                  <div id="navItem"><a href="/pages/board.html">???</a></div>
                  <div id="navItem"><a href="/pages/investors.html">?????</a></div>

So I copied the Chinese characters from the Word document and pasted them into the HTML. They still appear as ??. So I saved the document as UTF-8 and opened it in a browser. The ?? show in the browser too. This is going nowhere fast.

 Any suggestions?
 
LVL 54

Assisted Solution

by:b0lsc0tt
b0lsc0tt earned 250 total points
I am having a hard time too but I am missing some options that I would think I should have.  Check your computer and make sure Windows has the Chinese (or equivalent) language pack installed.  This would be found in the Regional area in Control Panel.  Let me know if you need specifics.  I believe (and I am going from memory a bit here) that not having it is making Windows have a hard time copying the data.

If you want another option and this is just a one-time thing, then you could use Word to open the file and save it as an HTML page (filtered).  I hate (yes, Hate) Word's HTML, but with the filtered setting it should be a little better.  Also, it will read the characters and convert them to Unicode numeric character references (HTML entities).  It is an option and might be the easiest with what you have on your computer now.

Let me know what you decide.  If you have a question let me know.  If you will be working with Chinese characters a lot then you should go through the effort of installing the language pack (your OS CD should already have the files).  Also, with Chinese characters you should find out if you need to use Traditional or Simplified.  Let us know if you want to pursue the non-Word option.

bol
 

Author Comment

by:Richard Korts
To b0lsc0tt,

It WORKS!!
But it's a pain in the butt.

I did just what you said. The Word document saved as HTML (filtered) (whatever that means) has the Chinese characters written out correctly as numeric character references. Here's an example:

<span
     lang=ZH-CN style='font-family:SimSun'><b><span style='color:blue'>&#20135;&#21697;</span></b></span>

I have to copy from the Word-generated HTML version piece by piece, but it seems to work.
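
If copying piece by piece gets tedious, the text can probably be pulled out of Word's filtered HTML by script. A rough sketch in Python using the standard library's `html.parser`; the class name `TextOnly` is made up, the sample string is the snippet quoted above, and real Word output will need more handling (for example, skipping `<style>` blocks).

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect the text between tags, dropping Word's span/style markup."""

    def __init__(self):
        # convert_charrefs=True decodes &#NNNNN; into real characters.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.chunks.append(stripped)


WORD_HTML = """<span
     lang=ZH-CN style='font-family:SimSun'><b><span style='color:blue'>&#20135;&#21697;</span></b></span>"""

parser = TextOnly()
parser.feed(WORD_HTML)
parser.close()

texts = parser.chunks
# Turn each piece back into references so it can be pasted into any editor.
entities = [t.encode("ascii", "xmlcharrefreplace").decode("ascii") for t in texts]
print(entities)
```

Each item in `entities` is plain ASCII, so it can be pasted into the existing page without the editor touching it.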


 
LVL 54

Expert Comment

by:b0lsc0tt
Using the HTML entities is probably the safest way to make sure the characters are displayed.  The page encoding isn't an issue in that case, because the references themselves are plain ASCII.  It is ugly when you look at the source and I really don't like the HTML Word makes, but I'm glad it works for you.
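
A quick Python check of that point, as a sketch only (the list of encodings is an arbitrary sample of ASCII-compatible ones): the entity form produces identical bytes in each of them, and it still decodes to the intended characters.

```python
import html

snippet = "<b>&#20135;&#21697;</b>"

# Encode the same markup under several ASCII-compatible encodings.
encodings = ["ascii", "utf-8", "cp1252", "gb2312"]
byte_forms = {snippet.encode(enc) for enc in encodings}
print(len(byte_forms))  # how many distinct byte sequences resulted

# A browser resolves the references to the real characters.
decoded = html.unescape("&#20135;&#21697;")
print(decoded == "产品")
```

This does not hold for UTF-16, which is not ASCII-compatible, so the file still should not be saved in either of the "UNICODE" formats unless the server declares it.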

Thanks for the grade, the points and the fun question.

bol
 

Author Comment

by:Richard Korts
To b0lsc0tt,

Not much fun for me. But the customer is paying, so I guess I can't complain.

There HAS TO BE an easier way than this.
 

Author Comment

by:Richard Korts
To b0lsc0tt,

Is there any way I can email you directly? I have more information that I have pieced together on this topic.

rkorts
 
LVL 54

Expert Comment

by:b0lsc0tt
My contact info is in my profile.  However if it is an EE question then post the URL here and I can look at it.  If I can help then I will post in it.  I saw one of your new questions but haven't had a chance to reply to it yet.  If it is an EE question then I won't be able to help you by email.  I do some "contract" work on the side and the contact info could be used for that as long as it isn't an EE issue.

Thanks for the interest and I'll be happy to help (if I can) whether it is an EE question or some "project for hire."

bol

p.s.  If it is something that will contribute to this Q then post it here.  I have still been trying to figure out a better way for this to work and am interested in this issue still. :)