Solved

Convert Chinese characters to HTML

Posted on 2007-11-13
Last Modified: 2016-08-29
I have a document sent in by a client with Chinese characters. They want me to update their Chinese-language website.

How do I convert Chinese characters (see the uploaded Word document in ee_stuff) to HTML?
Question by:Richard Korts
16 Comments
 
LVL 5

Expert Comment

by:AtanAsfaloth
ID: 20276379
As far as I know, Chinese characters are part of Unicode, so UTF-8 can represent them. Just make sure you save your HTML as UTF-8 and you should be safe just copying and pasting. I have no experience with this, however, so I wish you the best of luck!
Regards,
Atan Asfaloth
0
 
LVL 54

Expert Comment

by:b0lsc0tt
ID: 20276401
rkorts,

What server languages can you use or programming languages?

I have done this various ways but in an ASP script I used a line like ...

      if i < 127 then letter = chr(i) else letter = "&#" & i & ";"

The key is to look at each character and replace any non-ASCII character with its HTML numeric character reference.  In the case of Chinese characters, setting the charset and encoding for the HTML is important too.  The script line above is a basic example but I can provide more specific help if needed.  I assume you want help making something to do this and not just someone to do it for you.
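
A minimal sketch of that per-character replacement, written in Python rather than ASP purely for illustration. The function name `to_numeric_refs` is made up for this example. It leaves ASCII untouched and turns every other character into a decimal `&#NNNN;` reference; it does not escape `&` or `<`, which is assumed to be handled elsewhere.

```python
def to_numeric_refs(text: str) -> str:
    """Replace each non-ASCII character with a decimal numeric character reference."""
    # The built-in 'xmlcharrefreplace' error handler emits &#NNNN; for any
    # character that cannot be encoded as ASCII and leaves the rest alone.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


# \u4ea7\u54c1 are the two Chinese characters for "product".
print(to_numeric_refs("Products: \u4ea7\u54c1"))  # Products: &#20135;&#21697;
```

The output is plain ASCII, so it can be pasted into an editor that does not handle Unicode.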

Let me know if you have any questions or need more information.

b0lsc0tt
0
 
LVL 54

Expert Comment

by:b0lsc0tt
ID: 20276431
rkorts,

What Atan said is basically right.  Depending on the other content you might be able to get it all done with the right header and encoding.

The file the Asker mentioned above is at https://filedb.experts-exchange.com/incoming/ee-stuff/5566-ChineseSite.zip .

Let me know if you have a question.

b0lsc0tt
0
 

Author Comment

by:Richard Korts
ID: 20276569
to b0lsc0tt and AtanAsfaloth,

How do I "save your html in unicode"? I use Macromedia Homesite as an HTML editor. How do I do that there? Do I just copy the Chinese characters per se from the Word document and put them directly into the HTML? Somehow that doesn't sound right. That's EXACTLY how I do English.

PS to b0lsc0tt - That file you referenced is the Word document I uploaded.
0
 
LVL 5

Accepted Solution

by:
AtanAsfaloth earned 1000 total points
ID: 20276629
One of the main reasons Unicode came into existence was to make dealing with multiple languages simpler. Chinese characters are not so different from Latin characters; we're just not used to them. To a computer a character is just a set of bits: in Unicode encodings, more bits than in ANSI code pages. Word recognizes a Chinese character because of its unique character value. If Macromedia Homesite uses Unicode, it will as well. As I don't use it, I can't tell. Try copying and pasting. If you see Chinese characters appear, everything works fine. If nothing happens or you get unexpected results (question marks, rectangular boxes, odd characters), your HTML editor may not use Unicode by default.
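
To illustrate where question marks typically come from, here is a small Python sketch. It assumes the editor is effectively converting text to a Western code page (cp1252 is used here only as a stand-in for "ANSI"); whether Homesite does exactly this is an assumption, not something confirmed in this thread.

```python
text = "\u4ea7\u54c1"  # two Chinese characters ("product")

# A Western code page has no slot for these characters, so a lossy
# conversion substitutes '?' for each one.
lossy = text.encode("cp1252", errors="replace")
print(lossy)  # b'??'

# UTF-8 can represent them, so the round trip is lossless.
utf8_bytes = text.encode("utf-8")
restored = utf8_bytes.decode("utf-8")
print(restored == text)  # True
```

Once the characters have been replaced by `?`, saving the file as UTF-8 afterwards cannot bring them back.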
0
 
LVL 54

Expert Comment

by:b0lsc0tt
ID: 20276716
>> That file you referenced is the Word document I uploaded. <<

I know.  I wanted to provide the link to any expert that needed the file, especially some that may not know what ee-stuff is or means.  I guess my description in that comment was a little confusing.  Hopefully it is clearer now. :)

If your editor is Homesite then I believe it will allow you to choose the encoding to save the file.  Under Options -> Settings -> File Settings make sure the setting to "Enable non-ANSI file encoding" is checked.  Then as you save the file you should be able to specify the encoding.  That will need to be UTF-8 in this case.

Give that a test and see how it works.  I haven't used Homesite like this so I don't know if that will work or not but it should.  Let us know how it works or if you have a question.
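
For anyone testing outside Homesite, the same idea (save the bytes as UTF-8 and declare that charset in the page) can be sketched in Python. The file name `utf8_test.html`, the page content, and the `zh-CN` language tag are placeholders chosen for this example.

```python
import tempfile
from pathlib import Path

# \u4ea7\u54c1 are two Chinese characters ("product"); escapes are used so the
# script itself stays ASCII-only.
PAGE = """<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>UTF-8 test</title>
</head>
<body><p>\u4ea7\u54c1</p></body>
</html>
"""

# Write the page with an explicit encoding instead of the platform default.
out_path = Path(tempfile.gettempdir()) / "utf8_test.html"
out_path.write_text(PAGE, encoding="utf-8")

raw = out_path.read_bytes()
print(raw.decode("utf-8") .count("\u4ea7"))  # 1
```

Opening the resulting file in a browser should show the two characters if a font containing them is installed.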

bol
0
 

Author Comment

by:Richard Korts
ID: 20276744
to b0lsc0tt and AtanAsfaloth,

OK, I see how to save a document in Homesite as UTF-8 (thanks to b0lsc0tt). Is that the right one? It also allows UNICODE and UNICODE Big Endian (no clue what that means).

I'll try it tomorrow with a simple example.

rkorts
0
 

Author Comment

by:Richard Korts
ID: 20276749
to b0lsc0tt and AtanAsfaloth,

One more thing. You guys are very responsive & very helpful. I'd like to give you 500 points each. Is there a way to do that?

rkorts
0
 
LVL 54

Expert Comment

by:b0lsc0tt
ID: 20276824
The UTF-8 encoding that Atan suggested should be the best for this.  Let us know the results and we can try others if needed.  I can tell you about the others if we need to use them.
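
As a rough illustration of how the options differ, the Python sketch below prints the bytes one Chinese character gets in each encoding. It assumes that "UNICODE" and "UNICODE Big Endian" in the save dialog mean UTF-16 little-endian and UTF-16 big-endian, as those labels usually do in Windows programs; that mapping is an assumption about Homesite.

```python
ch = "\u4ea7"  # one Chinese character, code point U+4EA7

# Same character, three different byte sequences on disk.
encoded = {name: ch.encode(name) for name in ("utf-8", "utf-16-le", "utf-16-be")}

for name, data in encoded.items():
    print(f"{name:10} {data.hex(' ')}")
# utf-8      e4 ba a7
# utf-16-le  a7 4e
# utf-16-be  4e a7
```

"Big endian" only refers to which of the two UTF-16 bytes is written first. UTF-8 keeps plain ASCII markup as single bytes, which is one reason it is the usual choice for web pages.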

The max points are 500.  I really appreciate the offer but a split of those points with an A will be reward enough, especially with your thanks. :)  Drop an email to EE as a thanks if you really want to go above and beyond to show appreciation.  Without them we wouldn't have the chance to do this. :)

bol
0
 

Author Comment

by:Richard Korts
ID: 20277277
To b0lsc0tt,

I opened the main page of the Chinese web site in Homesite. The existing Chinese characters are all shown as a run of ?. For example, the following are a few HTML lines:

            <ul id="nav">
            <li><a href="/index.html" class="on"><b>??</b></a> </li>
            <li><a href="/pages/company.html"><b>??</b></a>
                  <ul>
                  <li>
                  <div id="navItem"><a href="/pages/team.html">????</a></div>
                  <div id="navItem"><a href="/pages/board.html">???</a></div>
                  <div id="navItem"><a href="/pages/investors.html">?????</a></div>

So I copied the Chinese characters from the Word document and pasted them into the HTML. They still appear as ??. So I saved the document as UTF-8 and opened it in a browser. The ?? show in the browser too. This is going nowhere fast.

 Any suggestions?
0
 
LVL 54

Assisted Solution

by:b0lsc0tt
b0lsc0tt earned 1000 total points
ID: 20277995
I am having a hard time too but I am missing some options that I would think I should have.  Check your computer and make sure Windows has the Chinese (or equivalent) language pack installed.  This would be found in the Regional area in Control Panel.  Let me know if you need specifics.  I believe (and I am going from memory a bit here) that not having it is making Windows have a hard time copying the data.

If you want another option and this is just a one-time thing, you could open the file in Word and save it as an HTML page (filtered).  I hate (yes, Hate) Word's HTML, but with the filtered setting it should be a little better.  It will also read the characters and convert them to Unicode numeric character references.  It is an option and might be the easiest with what you have on your computer now.

Let me know what you decide.  If you have a question let me know.  If you will be working with Chinese characters a lot then you should go through the effort of installing the language pack (your OS CD should already have the files).  Also, with Chinese characters you should find out if you need to use Traditional or Simplified.  Let us know if you want to pursue the non-Word option.

bol
0
 

Author Comment

by:Richard Korts
ID: 20281621
To b0lsc0tt,

It WORKS!!
But it's a pain in the butt.

I did just what you said. The Word document saved as HTML (filtered) (whatever that means) has the Chinese characters correctly encoded, as numeric character references. Here's an example:

<span
     lang=ZH-CN style='font-family:SimSun'><b><span style='color:blue'>&#20135;&#21697;</span></b></span>

I have to copy from the Word HTML version piece by piece, but it seems to work.
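
One possible way to cut down the piece-by-piece copying is to let a script pull the text out of Word's markup. The sketch below uses Python's standard `html.parser`; the class and function names are invented for this example, and the sample string is the snippet quoted above. It has only been thought through against that snippet, not a full Word export.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text between tags, ignoring Word's span and style markup."""

    def __init__(self):
        # convert_charrefs=True turns &#NNNN; references into real characters.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def extract_text(word_html: str) -> list:
    parser = TextExtractor()
    parser.feed(word_html)
    parser.close()
    return parser.chunks


sample = (
    "<span\n     lang=ZH-CN style='font-family:SimSun'>"
    "<b><span style='color:blue'>&#20135;&#21697;</span></b></span>"
)

chunks = extract_text(sample)
# Re-encode each chunk as numeric references, ready to paste into plain markup.
as_refs = [c.encode("ascii", errors="xmlcharrefreplace").decode("ascii") for c in chunks]
print(as_refs)  # ['&#20135;&#21697;']
```

Each entry in `as_refs` is the bare text of one run from the Word file, without the surrounding `span` and `style` attributes.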


0
 
LVL 54

Expert Comment

by:b0lsc0tt
ID: 20282171
Using the html entities is probably the safest way to make sure the characters are displayed.  The page encoding isn't an issue in that case.  It is ugly when you look at the source and I really don't like the html Word makes but I'm glad it works for you.
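
A short Python sketch of why the page encoding stops mattering with this approach: the markup is pure ASCII, and ASCII bytes read the same under the encodings listed here. The three encodings are picked only as examples.

```python
import html

entity_markup = "<b>&#20135;&#21697;</b>"

# The markup contains only ASCII, so this encode cannot fail.
raw = entity_markup.encode("ascii")

# Decoding the same bytes under different charsets gives identical text.
decoded = {enc: raw.decode(enc) for enc in ("utf-8", "cp1252", "gb2312")}
print(len(set(decoded.values())))  # 1

# A browser resolves the references to the actual characters when rendering.
rendered = html.unescape(entity_markup)
print(rendered == "<b>\u4ea7\u54c1</b>")  # True
```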

Thanks for the grade, the points and the fun question.

bol
0
 

Author Comment

by:Richard Korts
ID: 20282653
To b0lsc0tt,

Not much fun for me. But the customer is paying, so I guess I can't complain.

There HAS TO BE an easier way than this.
0
 

Author Comment

by:Richard Korts
ID: 20294904
To b0lsc0tt,

Is there any way I can email you directly? I have more information that I have pieced together on this topic.

rkorts
0
 
LVL 54

Expert Comment

by:b0lsc0tt
ID: 20295719
My contact info is in my profile.  However if it is an EE question then post the URL here and I can look at it.  If I can help then I will post in it.  I saw one of your new questions but haven't had a chance to reply to it yet.  If it is an EE question then I won't be able to help you by email.  I do some "contract" work on the side and the contact info could be used for that as long as it isn't an EE issue.

Thanks for the interest and I'll be happy to help (if I can) whether it is an EE question or some "project for hire."

bol

p.s.  If it is something that will contribute to this Q then post it here.  I have still been trying to figure out a better way for this to work and am interested in this issue still. :)
0


Question has a verified solution.
