Solved

Convert a C program to Unicode

Posted on 2006-04-21
Last Modified: 2010-04-01
I need some directional help on how to convert a C program to Unicode. The program has no UI and pulls information from a PC to write to a text file, as a sort of discovery process. However, when it runs on Japanese or Russian machines it doesn't work properly. The spec, roughly, is:

- All incoming strings (from registry, windows api, wmi, WNet functions etc) should be able to cope with unicode
- The program should still work without installation on clean build Windows 95 or later
- No reference to MFC or other DLLs that might not be there is allowed

The existing program is about 4000 lines long, with most output strings set to char*, and uses CRT funcs like strncat, memset, malloc, free, memcpy, strtok, strlen, etc

1. What string types should I use instead of char / char * ? I would like to standardise on a single string type throughout the program if possible.
2. Should I continue to use malloc and just multiply all the string lengths by 2?
3. What string functions should I use in place of those CRT functions?

At the moment I think some of the incoming data to the program (e.g. from WMI) is probably already in unicode or BSTR form. Some functions seem to declare incoming data as BSTR, some as WCHAR_T. It has intermediate calls to wcstombs, SysAllocString and the like

I would also like to know if    

        L"Win32_ComputerSystem"

is actually a unicode string or some other kind of string.

As you can probably tell my biggest problem is only basic knowledge of C programming with unicode so the more help and direction you can give me the better...

And it would be nice if I could wrap up the string conversion into functions away from the core logic, so any suggestions there would be appreciated.

thanks in advance
Question by:plq
12 Comments
 
LVL 8

Expert Comment

by:mrblue
ID: 16507456
1. Use the TCHAR, LPCTSTR and LPTSTR types. They resolve to char, const char * and char *, or to wchar_t, const wchar_t * and wchar_t *, depending on whether the Unicode constants are defined (_UNICODE for the CRT's tchar.h, UNICODE for the Windows headers; define both).
2. You can also use the new operator (C++ only), for example:
LPTSTR lpszText = new TCHAR[100];
and it will allocate the right number of bytes. In plain C, keep malloc but size the buffer as count * sizeof(TCHAR) rather than hard-coding a factor of 2.
3. There are generic-text macros for (almost) every string function.
Instead of strlen() use _tcslen(), which will be replaced by strlen() or wcslen() depending on whether _UNICODE is defined.
And so on (search MSDN for the correct macro names).
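Points 2 and 3 can be sketched in plain C as follows. This is only an illustration: it uses wchar_t and the wide CRT functions directly so that it compiles without tchar.h, and wide_concat is a hypothetical helper name, not something from the existing program. With _UNICODE defined, _tcslen maps to the same wcslen call used here.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper (not from the thread): returns a newly allocated
   wide string holding a followed by b. The caller releases it with free(). */
wchar_t *wide_concat(const wchar_t *a, const wchar_t *b)
{
    size_t len_a, len_b;
    wchar_t *out;

    assert(a != NULL && b != NULL);
    len_a = wcslen(a);                    /* wide counterpart of strlen */
    len_b = wcslen(b);

    /* Size by element type instead of multiplying by a hard-coded 2. */
    out = (wchar_t *)malloc((len_a + len_b + 1) * sizeof(wchar_t));
    if (out == NULL)
        return NULL;

    wmemcpy(out, a, len_a);               /* wide counterpart of memcpy */
    wmemcpy(out + len_a, b, len_b + 1);   /* +1 also copies the terminator */
    return out;
}
```

The lengths are counted in characters and only converted to bytes at the malloc call, which is the habit that tends to avoid most off-by-a-factor bugs during this kind of conversion.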
 
LVL 14

Expert Comment

by:hoomanv
ID: 16507465
could be helpful
http://icu.sourceforge.net/
 
LVL 8

Expert Comment

by:mrblue
ID: 16507547
L"Win32_ComputerSystem" is a Unicode (wide-character) string.
A BSTR is also a Unicode string, but it additionally carries a length prefix stored just before the first character (a 4-byte count of the string's bytes), so it can contain '\0' characters in the middle. It is used for marshalling in COM interfaces, where parameters are passed across process or machine boundaries.
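A small sketch of what the L prefix means in practice, assuming nothing beyond standard C. The names kClassName and wide_string_bytes are made up for the illustration. On Windows wchar_t is 16 bits wide; on other platforms it is often 32 bits, which is why the byte size is expressed through sizeof(wchar_t).

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* The L prefix makes this an array of wchar_t rather than char. */
static const wchar_t kClassName[] = L"Win32_ComputerSystem";

/* Hypothetical helper: bytes occupied by a wide string, terminator included. */
size_t wide_string_bytes(const wchar_t *s)
{
    assert(s != NULL);
    return (wcslen(s) + 1) * sizeof(wchar_t);
}
```

A literal like this can be passed wherever a const wchar_t * is expected. A BSTR parameter is different: it needs a string allocated with SysAllocString so that the length prefix is present.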
 
LVL 5

Assisted Solution

by:Dragon_Krome
Dragon_Krome earned 400 total points
ID: 16507579
You need to do a little bit of research on internationalisation issues before you start, so you don't run into trouble.

http://en.wikipedia.org/wiki/Internationalization   this page provides more links to valuable resources referring to this matter.

http://www.i18nfaq.com/
 
LVL 8

Expert Comment

by:mrblue
ID: 16507584
There are also macros for converting between ANSI and Unicode strings, such as T2A, T2BSTR and A2W (they are part of ATL, so C++ only). Search MSDN for "String Conversion Macros".
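For a plain C program, one way to keep conversions away from the core logic is a small wrapper around the CRT's mbstowcs. The sketch below is an assumption about how such a wrapper might look, with to_wide as a hypothetical name. It relies on mbstowcs accepting a NULL destination to report the required length, which the Microsoft CRT and POSIX both document but the C standard itself does not guarantee.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical wrapper: converts a multibyte (ANSI) string to a newly
   allocated wide string using the current C locale. Returns NULL on an
   invalid sequence or allocation failure; the caller frees the result. */
wchar_t *to_wide(const char *src)
{
    size_t count;
    wchar_t *out;

    assert(src != NULL);

    /* First pass: ask how many wide characters are needed (no terminator). */
    count = mbstowcs(NULL, src, 0);
    if (count == (size_t)-1)
        return NULL;

    out = (wchar_t *)malloc((count + 1) * sizeof(wchar_t));
    if (out == NULL)
        return NULL;

    /* Second pass: convert, leaving room for the terminator. */
    if (mbstowcs(out, src, count + 1) == (size_t)-1) {
        free(out);
        return NULL;
    }
    return out;
}
```

mbstowcs interprets the input according to the current locale. On Windows, MultiByteToWideChar is the API-level alternative when the code page has to be chosen explicitly.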
 
LVL 8

Author Comment

by:plq
ID: 16507723
Thanks for the feedback so far..

So I have to #define _UNICODE at the top.. ok. And then change all my char * and char declarations to be TCHAR etc as above .. ok too. And I'm OK with digging out or rewriting replacement functions for the CRT functions currently in use.

Looking around the web it seems that a Unicode character in Japan might have a different meaning to a unicode character in Russia even if they have the same two byte character code? So I presume I need to output the LOCALE as well so the receiving backend can know what locale the incoming data is in ???

Also can you recommend functions to replace sprintf ?

And finally this is our code to write a file containing what up to now has been char data:

SetErrorMode(SEM_NOOPENFILEERRORBOX | SEM_FAILCRITICALERRORS);
HANDLE hFile = CreateFile(obj.outputfile, GENERIC_WRITE, FILE_SHARE_WRITE, NULL, TRUNCATE_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hFile == INVALID_HANDLE_VALUE)
{
      hFile = CreateFile(mtpcacmdline.outputfile, GENERIC_WRITE, FILE_SHARE_WRITE, NULL, CREATE_NEW, FILE_ATTRIBUTE_NORMAL, NULL);
}
if (hFile != INVALID_HANDLE_VALUE)
{
      WriteFile(hFile, sbuffer, strlen(sbuffer), &dwBytesWritten, NULL);
      SetEndOfFile(hFile);
      CloseHandle(hFile);
}

Is WriteFile likely to work OK with UNICODE data, and does "CreateFile" need to specify that the file is a Unicode file?

thanks
 
LVL 5

Expert Comment

by:Dragon_Krome
ID: 16507746
 
LVL 14

Expert Comment

by:hoomanv
ID: 16507785
> Also can you recommend functions to replace sprintf ?
if you use ICU library as I have described above it provides all these functionalities
http://icu.sourceforge.net/apiref/icu4c/ustdio_8h.html  ---> for unicode I/O like sprintf
http://icu.sourceforge.net/apiref/icu4c/ustring_8h.html  ---> for Strings and Character Iteration like strlen , strcat
 
LVL 8

Accepted Solution

by:mrblue
mrblue earned 1600 total points
ID: 16508339
use _stprintf() macro (just find in MSDN sprintf() description and below you will find what's necessary - _stprintf() macro)
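As a rough illustration of the wide-character side of that macro, here is a sketch using the standard C swprintf, which takes the buffer size as its second argument. format_pair is a hypothetical name. Some older Microsoft compilers declare swprintf without the size parameter (the counted form there is _snwprintf), so treat the exact signature as an assumption to check against your compiler's headers.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: formats "name=value" into out, which holds cap
   wide characters. Returns the number of characters written, or a
   negative value if the result does not fit. */
int format_pair(wchar_t *out, size_t cap, const wchar_t *name, int value)
{
    assert(out != NULL && name != NULL && cap > 0);
    /* %ls is the portable conversion for a wide string argument. */
    return swprintf(out, cap, L"%ls=%d", name, value);
}
```

The format string itself must be a wide literal, so every existing sprintf format in the program needs the L prefix (or the _T() macro when using the generic-text names).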

I also think (I hope that I am not wrong) that the locale, or more precisely the code page, matters only for ANSI and multibyte strings, where the same byte value can mean different characters. That is not the case for Unicode strings: a given code always means the same character regardless of locale. Windows stores them as UTF-16, with 2 bytes per code unit (65,536 values); characters beyond that range take two units (a surrogate pair). Let me know if I am wrong :)

"CreateFile()" works fine with binary data so it will also work fine with UNICODE which is special case of binary data.
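One detail worth spelling out: WriteFile takes a byte count, so strlen(sbuffer) in the code above has to become a byte length once the buffer holds wide characters, and text editors generally need a byte order mark to recognise the file as Unicode. The sketch below shows one way to build such a byte buffer in portable C. encode_utf16le_with_bom and put_u16le are hypothetical names, and the surrogate branch only matters on platforms where wchar_t is wider than 16 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Stores one 16-bit code unit, low byte first. */
static void put_u16le(unsigned char *dst, unsigned long unit)
{
    dst[0] = (unsigned char)(unit & 0xFF);
    dst[1] = (unsigned char)((unit >> 8) & 0xFF);
}

/* Hypothetical helper: writes a UTF-16LE byte order mark followed by the
   text in src into dst. Returns the number of bytes written, or 0 if the
   buffer of cap bytes is too small. */
size_t encode_utf16le_with_bom(const wchar_t *src, unsigned char *dst, size_t cap)
{
    size_t n;

    assert(src != NULL && dst != NULL);
    if (cap < 2)
        return 0;
    put_u16le(dst, 0xFEFFUL);             /* BOM, stored as bytes FF FE */
    n = 2;

    for (; *src != L'\0'; ++src) {
        unsigned long cp = (unsigned long)*src;

        if (cp > 0xFFFFUL) {              /* needs a surrogate pair */
            unsigned long v = cp - 0x10000UL;
            if (cap - n < 4)
                return 0;
            put_u16le(dst + n, 0xD800UL + (v >> 10));
            put_u16le(dst + n + 2, 0xDC00UL + (v & 0x3FFUL));
            n += 4;
        } else {
            if (cap - n < 2)
                return 0;
            put_u16le(dst + n, cp);
            n += 2;
        }
    }
    return n;
}
```

The return value is what would be passed as the third argument of WriteFile. On Windows, where wchar_t is already UTF-16LE, writing the two BOM bytes and then wcslen(buffer) * sizeof(wchar_t) bytes of the buffer has the same effect.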
 
LVL 8

Author Comment

by:plq
ID: 16508554
Excellent responses - thank you

I am sure there will be a few more in this TA from me in the next few days (hours)

thanks very much
 
LVL 8

Author Comment

by:plq
ID: 16508575
One more quickie - anything wrong with saying TCHAR * instead of LPTSTR - or does that sound silly to you ?
 
LVL 8

Expert Comment

by:mrblue
ID: 16508641
No, it is exactly the same, but frankly speaking I've never used TCHAR * for LPTSTR ;)