Hi,
I need to convert a wide-character string stored in a CString object to a UTF-8 encoded string. The snippet contains the source code I use now, but it doesn't work as expected. It works fine with ASCII characters, but national characters like ..., a, é... are converted wrongly. Only the first half of the string gets converted, i.e. when converting 'éáíú' the result is only 'éá'. In combination with standard ASCII it looks like this:
abcdefg
~~ -> abcdefg
(~~ gets cut)
or
abc
aa -> abc
(aa gets cut)
If you could help me, I would really appreciate it. Thank you very much!
I'm developing online editing of HTML source using Mozilla's Gecko engine. I type something into a text editor (CRichEditView) and it appears in the embedded browser. Everything works fine except this conversion: the special national characters are not converted correctly. One character is displayed instead of two, two instead of four, etc. In the debugger I've seen that after conversion every special character consists of two weird characters. I don't really understand this well, and I'm probably not able to explain it clearly.
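The "two weird characters" are most likely the two UTF-8 bytes of each accented letter being displayed as ANSI characters. The portable C++ sketch below illustrates this; the function name misread_as_latin1 is hypothetical, and it uses Latin-1 as a stand-in for the Windows ANSI code page (the two agree for bytes 0xA0-0xFF). The letter é is stored in UTF-8 as the bytes 0xC3 0xA9, and shown byte by byte it appears as "Ã©".

```cpp
#include <string>

// Hypothetical helper: display every byte of a UTF-8 string as its own
// Latin-1 character, the way an ANSI view would show it.
// For bytes 0xA0-0xFF, Latin-1 and Windows-1252 show the same characters.
std::u16string misread_as_latin1(const std::string& utf8) {
    std::u16string shown;
    for (unsigned char byte : utf8) {
        shown += static_cast<char16_t>(byte);  // Latin-1: byte value == code point
    }
    return shown;
}
```

By the same reasoning, 'éáíú' is 4 wide characters but 8 UTF-8 bytes, which would be consistent with the "one instead of two, two instead of four" pattern described above.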
First I get the HTML source and convert it from UTF-8 to a wide (UTF-16) string using MultiByteToWideChar (see the snippet). Then I edit the source in the text editor and want to pass it back to the browser. Now I need to convert it back to UTF-8 (the code from my question), but the special characters don't work.
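For the first step (UTF-8 into a wide string), the sketch below shows in portable C++ what a CP_UTF8 decode does: it rebuilds each code point from 1 to 4 UTF-8 bytes and stores it as UTF-16, the form Windows uses for wchar_t. decode_utf8 is a hypothetical name, and the sketch assumes well-formed input; in the real application MultiByteToWideChar should still do this work.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decode well-formed UTF-8 into UTF-16.
// Minimal checking only, unlike MultiByteToWideChar.
std::u16string decode_utf8(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        int extra;  // number of continuation bytes after the lead byte
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        assert(i + extra < in.size() && "truncated UTF-8 sequence");
        // Each continuation byte (10xxxxxx) contributes six more bits.
        for (int k = 1; k <= extra; ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        }
        i += extra + 1;
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);  // fits in one UTF-16 unit
        } else {
            cp -= 0x10000;                     // needs a surrogate pair
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```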
@StraySod: are you kidding us?
Check GetLastError; you have a problem with the buffer size or something like that. CP_ACP is a kind of default for these functions.
Here you will find an opposite opinion:
http://www.experts
Here you will find the code and explanations:
http://www.ex
>>>> int sourceLength = cs.GetLength();
>>>> char* translated = new char[sourceLength + 1];
I assume your output buffer is too short. UTF-8 uses 8-bit code units, but every character beyond 7-bit ASCII is encoded as a multi-byte sequence of 2 to 4 bytes (é, for example, takes 2 bytes). Try
char* translated = new char[sourceLength * 3 + 1]; // don't be stingy: up to 3 UTF-8 bytes per UTF-16 unit, plus the terminator
and use CP_UTF8. If that doesn't work, you could try to first convert from wide national characters to ANSI, then from ANSI to Unicode, and finally to UTF-8. You could also try using a Unicode font such as Lucida Sans Unicode instead of your current font.
If things still are wrong, check your original 'wide string' by looking at the hex bytes and post them here.
itsmeandnobodyelse, welcome to the club. :)
If the source string is ANSI with character codes above 127, it takes only two steps:
1. MultiByteToWideChar with CP_ACP
2. WideCharToMultiByte with CP_UTF8
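The two steps can be sketched without the Win32 calls, under the explicit assumption that the ANSI code page is ISO-8859-1 (Latin-1), where each byte value equals its Unicode code point. latin1_to_utf8 is a hypothetical helper; on Windows, step 1 belongs to MultiByteToWideChar with CP_ACP, because code page 1252 and Latin-1 differ in the range 0x80-0x9F.

```cpp
#include <string>

// Hypothetical helper. ASSUMPTION: input is Latin-1, so byte value == code point.
// Step 1: ANSI byte -> code point (MultiByteToWideChar / CP_ACP role).
// Step 2: code point -> UTF-8 (WideCharToMultiByte / CP_UTF8 role).
std::string latin1_to_utf8(const std::string& ansi) {
    std::string utf8;
    utf8.reserve(ansi.size() * 2);  // Latin-1 needs at most 2 UTF-8 bytes per char
    for (unsigned char byte : ansi) {
        char32_t cp = byte;                         // step 1
        if (cp < 0x80) {                            // step 2: ASCII stays 1 byte
            utf8 += static_cast<char>(cp);
        } else {                                    // step 2: 110xxxxx 10xxxxxx
            utf8 += static_cast<char>(0xC0 | (cp >> 6));
            utf8 += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return utf8;
}
```

Each non-ASCII Latin-1 character grows from one byte to two, so the UTF-8 result can be up to twice as long as the ANSI input.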
I think you proposed a much longer way:
1. WideCharToMultiByte with CP_ACP (or with UTF8?)
2. MultiByteToWideChar with CP_ACP (or with UTF8?)
3. WideCharToMultiByte with CP_UTF8
I think for this question everything should work with WideCharToMultiByte and CP_UTF8; the problem is the buffer size.
UTF-8 is a multibyte encoding, and each character can take 1, 2, 3 or 4 bytes. Right?
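The byte counts are: 1 for ASCII, 2 up to U+07FF (which covers é, á, í, ú), 3 for the rest of the Basic Multilingual Plane, and 4 for characters outside it. The portable C++ sketch below (all names are hypothetical) applies that table in the same two-pass way the Win32 API is normally used: calling WideCharToMultiByte with CP_UTF8 and cbMultiByte set to 0 returns the required byte count, and a second call converts into a buffer of that size. For 'éáíú' the measuring pass reports 8 bytes, whereas new char[cs.GetLength() + 1] provides only 5.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of UTF-8 bytes for one code point: 1, 2, 3 or 4.
std::size_t utf8_len(char32_t cp) {
    if (cp < 0x80) return 1;     // ASCII
    if (cp < 0x800) return 2;    // e.g. é, á, í, ú
    if (cp < 0x10000) return 3;  // rest of the Basic Multilingual Plane
    return 4;                    // supplementary planes (surrogate pairs in UTF-16)
}

// Read one code point from UTF-16 text at index i and advance i.
// Assumes well-formed input: a high surrogate is always followed by a low one.
char32_t next_code_point(const std::u16string& s, std::size_t& i) {
    char32_t unit = s[i++];
    if (unit >= 0xD800 && unit <= 0xDBFF) {
        assert(i < s.size() && "high surrogate without a low surrogate");
        char32_t low = s[i++];
        return 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00);
    }
    return unit;
}

// Pass 1: measure (the role of WideCharToMultiByte with cbMultiByte == 0).
std::size_t utf8_size(const std::u16string& s) {
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < s.size();) {
        bytes += utf8_len(next_code_point(s, i));
    }
    return bytes;
}

// Pass 2: reserve exactly the measured size, then encode.
std::string utf16_to_utf8(const std::u16string& s) {
    static const unsigned char lead_marker[] = {0x00, 0x00, 0xC0, 0xE0, 0xF0};
    std::string out;
    out.reserve(utf8_size(s));
    for (std::size_t i = 0; i < s.size();) {
        char32_t cp = next_code_point(s, i);
        std::size_t len = utf8_len(cp);
        // Lead byte: length marker plus the highest bits of the code point.
        out += static_cast<char>(lead_marker[len] | (cp >> (6 * (len - 1))));
        // Continuation bytes: 10xxxxxx, six bits each, most significant first.
        for (std::size_t k = len - 1; k > 0; --k) {
            out += static_cast<char>(0x80 | ((cp >> (6 * (k - 1))) & 0x3F));
        }
    }
    return out;
}
```

Measuring first avoids both guessing (+1024) and over-allocating; std::string adds its own terminator, so no extra byte is counted here.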
Here is a boring explanation about MultiByteToWideChar:
http:/
by: StraySod, posted on 2009-09-11 at 06:28:18 (ID: 25309179)
Almost all special characters in my question were censored, so the parts with examples now look silly. If you can't understand my question, please let me know; I will try to provide an example somewhere else and post a link here.