Solved

How to Convert String Encoding in C Language

Posted on 2007-03-19
Last Modified: 2012-08-14
Dear Experts,

I have a string that is entered from the command line as ISO8859-6 (Arabic on Unix), and I want to send it to another server as Windows-1256, so my C application has to convert it from ISO8859-6 to Windows-1256. Can you please help me with how to change the encoding of strings in C? I know that this is very easy in Java and C#, but how do I do it in C?

Thanks,
Question by:BinaryTree
 
LVL 4

Expert Comment

by:chip3d
ID: 18747943
Can you also use C++ code?

With C++ you can use the locale object to convert from codepage to wide character:

...
char c = 0xD4;
locale loc("arabic");
wchar_t  wc = std::use_facet<std::ctype<wchar_t> > (loc).widen(c);
...


regards
 
LVL 86

Expert Comment

by:jkr
ID: 18748472
Are you thinking of

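size_t len = wcslen(pwszArabic);
char* pszWin1256 = new char[len + 1];

// 1256 is the Windows Arabic code page (1252 would be Western European)
WideCharToMultiByte(1256,0,pwszArabic,(int)len,pszWin1256,(int)len,NULL,NULL);
pszWin1256[len] = '\0';  // an explicit length does not copy the terminator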

?
 
LVL 4

Expert Comment

by:chip3d
ID: 18748688
To convert from ISO8859-6 to wide characters, use MultiByteToWideChar:

size_t len = strlen(ISO88596_string);
wchar_t* wide_string = new wchar_t[len + 1];

MultiByteToWideChar(28596,0,ISO88596_string,(int)len,wide_string,(int)len);
wide_string[len] = L'\0';  // an explicit length does not copy the terminator

Then use jkr's code to convert to Windows-1256...
 

Author Comment

by:BinaryTree
ID: 18754800
Dear All,

All the proposed solutions are in C++, but what about C? I am using C, not C++.

Thanks
 
LVL 4

Expert Comment

by:chip3d
ID: 18755895
Hi BinaryTree,

WideCharToMultiByte and MultiByteToWideChar should work with a windows C compiler...
Example:

/////////////////////////////////////////

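#include <windows.h>
#include <stdio.h>
#include <stdlib.h>   /* malloc, free */
#include <string.h>   /* strlen */

/* win1256 must have room for size + 1 bytes */
void convert(const char* iso88596, char* win1256, int size)
{
    wchar_t* ws = (wchar_t*)malloc((size+1)*sizeof(wchar_t));
    if (ws == NULL) { win1256[0] = 0; return; }

    /* 28596 = ISO 8859-6, 1256 = Windows Arabic; both are single-byte code pages */
    MultiByteToWideChar(28596,0,iso88596,size,ws,size);
    WideCharToMultiByte(1256,0,ws,size,win1256,size,NULL,NULL);
    win1256[size] = 0;

    free(ws);
}


int main()
{
    const char* arabic = "hello \xE0\n";
    char win[32] = {0};

    printf("%s", arabic);   /* never pass data as the format string */
    convert(arabic,win,(int)strlen(arabic));
    printf("%s", win);

    return 0;
}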

////////////////////////////////////////////

For Unix C I do not know of any conversion functions, so you could also write your own mapping function like

void convert_mapping(const char* iso88596, char* win1256, int size)
{
     // mapping of the upper 128 characters from ISO88596 to windows-1256 (other part is same)
    static char toWin1256[128] = {...};

    int i = 0;
    for (; i != size; ++i)
    {
        if ((unsigned char)iso88596[i] > 127)
            win1256[i] = toWin1256[(unsigned char)iso88596[i]-128];
        else win1256[i] = iso88596[i];
    }
    win1256[size] = 0;
}

You can find the tables for the mapping at:
http://en.wikipedia.org/wiki/Windows-1256
http://en.wikipedia.org/wiki/ISO_8859-6
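One way the table lookup could be filled in is sketched below. The byte ranges are my reading of the two published code charts, so please check them against the tables linked above before relying on them. The names range, ranges, map_byte and convert_mapping_sketch are made up for this illustration, and replacing unassigned ISO 8859-6 bytes with '?' is an assumption, not something from the original function.

```c
#include <stddef.h>

/* A run of consecutive ISO 8859-6 bytes and the Windows-1256 byte
   that the first byte of the run maps to (hypothetical helper type). */
struct range { unsigned char iso_first, iso_last, win_first; };

/* Assumed mapping, derived by matching Unicode code points in both charts. */
static const struct range ranges[] = {
    {0xA0, 0xA0, 0xA0}, /* no-break space */
    {0xA4, 0xA4, 0xA4}, /* currency sign */
    {0xAC, 0xAC, 0xA1}, /* Arabic comma */
    {0xAD, 0xAD, 0xAD}, /* soft hyphen */
    {0xBB, 0xBB, 0xBA}, /* Arabic semicolon */
    {0xBF, 0xBF, 0xBF}, /* Arabic question mark */
    {0xC1, 0xD6, 0xC1}, /* hamza .. dad */
    {0xD7, 0xDA, 0xD8}, /* tah .. ghain */
    {0xE0, 0xE3, 0xDC}, /* tatweel .. kaf */
    {0xE4, 0xE4, 0xE1}, /* lam */
    {0xE5, 0xE8, 0xE3}, /* meem .. waw */
    {0xE9, 0xEA, 0xEC}, /* alef maksura, yeh */
    {0xEB, 0xEE, 0xF0}, /* fathatan .. fatha */
    {0xEF, 0xF0, 0xF5}, /* damma, kasra */
    {0xF1, 0xF1, 0xF8}, /* shadda */
    {0xF2, 0xF2, 0xFA}  /* sukun */
};

/* Maps one byte; ASCII passes through, unassigned bytes become '?'. */
static unsigned char map_byte(unsigned char c)
{
    size_t i;
    if (c < 0x80)
        return c;
    for (i = 0; i < sizeof ranges / sizeof ranges[0]; ++i)
        if (c >= ranges[i].iso_first && c <= ranges[i].iso_last)
            return (unsigned char)(ranges[i].win_first + (c - ranges[i].iso_first));
    return '?';
}

/* win1256 must have room for size + 1 bytes. */
void convert_mapping_sketch(const char *iso88596, char *win1256, size_t size)
{
    size_t i;
    for (i = 0; i < size; ++i)
        win1256[i] = (char)map_byte((unsigned char)iso88596[i]);
    win1256[size] = '\0';
}
```

Storing ranges instead of 128 literal entries keeps the table short enough to compare against the charts by eye. A full 128-entry array would be faster per byte if that ever matters.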

regards
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 18756221
I don't think that converting to UNICODE and back would give any benefit. Note that ISO8859-6 is a single-byte code page, which means it has 256 codes, 0 .. 255. Codes from 0 to 127 (ASCII) are identical to WIN-1256, except that the digits (0 ... 9) are printed differently if you have an Arabic locale. But the codes are identical, so an ASCII-only text moved from ISO8859-6 to Windows-1256 prints fine on both sides. Codes from 128 to 255 differ. ISO8859-6 has Arabic letters while WIN1256 has umlauts, currency signs, Greek characters and more... However, there is no translation from an Arabic character to an ANSI character (WIN-1256), and it doesn't help to convert it to UNICODE first and back. Only the first part (the conversion to UNICODE) gives a valid result. So if you want to send non-ASCII Arabic letters to a non-Arabic Windows system you have to send UNICODE characters. Then they can be printed on any device capable of printing UNICODE characters (Arabic letters included). If you have ASCII only, you can send it without any conversion.

Regards, Alex

 

Author Comment

by:BinaryTree
ID: 18762955
Dear Alex,

What I understand from your reply is that it is impossible to do what I am looking for! OK, what about converting the C application to a library (API) that can be called from a Java application (calling my functions in the C code)?

Regards
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 18763109
>>>> I know from your reply is that it is impossible to do what I am looking for

I am not sure we both understand it the same way.

Look: if you have a text using Arabic letters, which are read from right to left, there is no way to show that text in Windows 1256 or any other single-byte codepage that contains no Arabic letters. Even if you change the codepage on the receiver's side to ISO8859-6, they will most likely see non-printable characters because the hardware and operating system do not have the appropriate fonts. However, if you convert it to UNICODE text, any Windows PC can print your text with all letters when using a UNICODE font, e.g. Lucida Sans Unicode, which is part of any Windows installation. I don't know about UNIX, but I am pretty sure that there is a means to print all UNICODE text there as well, with or without Arabic letters.

>>>> what about converting the C application to Library (API)
It isn't a technical problem (which could be solved by using another technique) but a logical one. The only way to show an Arabic text with the Windows-1256 code page on a single-byte character set would be a phonetic transliteration. For that you would need a conversion that maps one or more Arabic characters to one or more ANSI characters so that both sound (nearly) equivalent when spoken by a person who can speak Arabic as well as English. Of course there are programs that can do such a conversion, e.g. for the purpose of writing a dictionary. But for your case I don't think it is an alternative.
The only sensible thing I see is to convert to UNICODE and ship that.

Regards, Alex
 
LVL 4

Accepted Solution

by:
chip3d earned 250 total points
ID: 18763442

>>Look if you have a text using arabic letters which were read from right to left, there is no way to show that text in Windows 1256 or any other single-byte codepage which contains no arabic letters.

The windows 1256 codepage contains arabic letters, but it is not compatible with the ISO 8859-6. A conversion is possible.

But itsmeandnobodyelse is right about using unicode, better to convert your iso-string to a unicode-string and send that...
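In case the Unicode route is an option, here is a rough sketch in plain C of what converting the ISO string could look like. It assumes UTF-8 as the Unicode encoding (the thread does not say which one the receiver expects), treats bytes 0x80 to 0x9F as C1 control codes, and replaces unassigned bytes with U+FFFD. The function names iso88596_to_codepoint and iso88596_to_utf8 are made up for this example.

```c
#include <stddef.h>

/* Maps one ISO 8859-6 byte to a Unicode code point.
   Unassigned bytes map to U+FFFD (replacement character). */
static unsigned int iso88596_to_codepoint(unsigned char c)
{
    if (c <= 0xA0 || c == 0xA4 || c == 0xAD)
        return c;                                /* ASCII, C1 controls, NBSP, currency, soft hyphen */
    if (c == 0xAC) return 0x060C;                /* Arabic comma */
    if (c == 0xBB) return 0x061B;                /* Arabic semicolon */
    if (c == 0xBF) return 0x061F;                /* Arabic question mark */
    if (c >= 0xC1 && c <= 0xDA)
        return 0x0621u + (unsigned int)(c - 0xC1);   /* hamza .. ghain */
    if (c >= 0xE0 && c <= 0xF2)
        return 0x0640u + (unsigned int)(c - 0xE0);   /* tatweel .. sukun */
    return 0xFFFD;
}

/* Writes the UTF-8 form of a NUL-terminated ISO 8859-6 string to out.
   out must hold up to 3 * strlen(in) + 1 bytes.
   Returns the number of bytes written, not counting the terminator. */
size_t iso88596_to_utf8(const char *in, char *out)
{
    size_t n = 0;
    for (; *in != '\0'; ++in) {
        unsigned int cp = iso88596_to_codepoint((unsigned char)*in);
        if (cp < 0x80) {
            out[n++] = (char)cp;
        } else if (cp < 0x800) {
            out[n++] = (char)(0xC0 | (cp >> 6));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[n++] = (char)(0xE0 | (cp >> 12));
            out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    out[n] = '\0';
    return n;
}
```

Because ISO 8859-6 places the Arabic letters in Unicode order, two arithmetic ranges cover almost the whole table, so no lookup array is needed for this direction.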

regards
 
LVL 39

Assisted Solution

by:itsmeandnobodyelse
itsmeandnobodyelse earned 250 total points
ID: 18763717
>>>> The windows 1256 codepage contains arabic letters
Oh, I mixed it up with codepage 1252.

Sorry for that, BinaryTree. If you don't want to go the UNICODE way, which is surely recommended, you may try the solution chip3d and jkr have posted above. If the codepage contains Arabic letters, none of my above arguments against converting to it were valid.

Regards, Alex
 

Author Comment

by:BinaryTree
ID: 18787696
Thanks for your replies, guys. As I understand from you, it is impossible to do it in my case (since I have to deliver the text as Windows-1256, not as Unicode!). OK, I will split the points.

Regards.

 
LVL 4

Expert Comment

by:chip3d
ID: 18787726
>> Thanks for your replies, guys. As I understand from you, it is impossible to do it in my case (since I have to deliver the text as Windows-1256, not as Unicode!)

It is possible... To use unicode was just a suggestion. If you can convert your string on a windows system, you can use the first solution:
void convert(const char* iso88596, char* win1256, int size)... with WideCharToMultiByte and MultiByteToWideChar

If you have to convert your string on a unix system you could use your own mapping function. I posted an example as second solution:
void convert_mapping(const char* iso88596, char* win1256, int size)...
Well, the function is not complete, because it was only an example of how it could be done. So the map still has to be initialized correctly. But you can find the mapping information if you compare the two character tables and complete the function:
http://en.wikipedia.org/wiki/Windows-1256
http://en.wikipedia.org/wiki/ISO_8859-6

If you need help completing the mapping on your own, feel free to ask...
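For the Unix case there is also the POSIX iconv interface, which may save writing the table by hand. The sketch below assumes that the system's iconv knows the names "ISO-8859-6" and "CP1256" (glibc and GNU libiconv do, but the accepted names vary between platforms, so check with iconv -l). The function name iso88596_to_win1256 is made up for this example.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Converts a NUL-terminated ISO 8859-6 string to Windows-1256.
   out receives at most out_size - 1 bytes plus a terminator.
   Returns 0 on success, -1 on any failure (unknown charset name,
   unconvertible byte, or output buffer too small). */
int iso88596_to_win1256(const char *in, char *out, size_t out_size)
{
    iconv_t cd;
    char *src = (char *)in;   /* iconv wants char **; the input bytes are only read */
    char *dst = out;
    size_t in_left = strlen(in);
    size_t out_left;
    size_t rc;

    if (out_size == 0)
        return -1;
    out_left = out_size - 1;  /* keep one byte for the terminator */

    /* Note the argument order: target encoding first, source second. */
    cd = iconv_open("CP1256", "ISO-8859-6");
    if (cd == (iconv_t)-1)
        return -1;

    rc = iconv(cd, &src, &in_left, &dst, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;

    *dst = '\0';
    return 0;
}
```

A call such as iso88596_to_win1256(input, buffer, sizeof buffer) would then fill buffer with the Windows-1256 bytes. On some systems iconv lives in a separate library, in which case linking with -liconv is needed.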

regards
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 18787758
chip3d is right, it is possible. It was my fault that I mixed up 1256 with 1252. 1256 contains arabic letters while 1252 does not.

Try the following (taken from jkr and chip3d)

char* convert8859To1256(char* pszStringToConvert)
{
      int len = (int)strlen(pszStringToConvert);  // get string length
      wchar_t* pwsz = (wchar_t*)malloc((len+1) * sizeof(wchar_t));  // temporary UNICODE string
      if (pwsz == NULL) return NULL;
      MultiByteToWideChar(28596,0, pszStringToConvert, len, pwsz, len);
      // 1256 (Arabic), not 1252; converts in place, the length does not change
      WideCharToMultiByte(1256,0, pwsz, len, pszStringToConvert, len, NULL, NULL);
      free(pwsz);
      return pszStringToConvert;
}

Regards, Alex
