How to Convert String Encoding in C Language

Dear Experts,

I have a string that is entered from the command line as ISO8859-6 (Arabic on Unix), and I want to send it to another server as Windows-1256, so my C application needs to convert it from ISO8859-6 to Windows-1256. Can you please help me with how to change the encoding of strings in C? I know this is very easy in Java and C#, but how do I do it in C?

Thanks,
BinaryTreeAsked:
chip3dCommented:
Can you also use C++ code?

With C++ you can use the locale object to convert from codepage to wide character:

...
char c = 0xD4;
locale loc("arabic");
wchar_t  wc = std::use_facet<std::ctype<wchar_t> > (loc).widen(c);
...


regards
jkrCommented:
Are you thinking of

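size_t len = wcslen(pwszArabic);
char* pszWin1256 = new char[len + 1];

// 1256 is the Windows-1256 (Arabic) code page; 1252 is Western European and has no Arabic letters
int n = WideCharToMultiByte(1256, 0, pwszArabic, (int)len, pszWin1256, (int)len, NULL, NULL);
pszWin1256[n] = '\0';   // WideCharToMultiByte does not terminate when given an explicit length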

?
chip3dCommented:
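To convert from ISO8859-6 to wide characters, use MultiByteToWideChar:

size_t len = strlen(ISO88596_string);
wchar_t* wide_string = new wchar_t[len + 1];

// 28596 is the Windows code page identifier for ISO-8859-6
int n = MultiByteToWideChar(28596, 0, ISO88596_string, (int)len, wide_string, (int)len);
wide_string[n] = L'\0';   // not terminated automatically when an explicit length is passed

Then use jkr's code to convert to Windows-1256...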
BinaryTreeAuthor Commented:
Dear All,

All the proposed solutions are in C++, but what about C? I am using C, not C++.

Thanks
chip3dCommented:
Hi BinaryTree,

WideCharToMultiByte and MultiByteToWideChar should work with a windows C compiler...
Example:

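/////////////////////////////////////////

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>   /* malloc, free */
#include <string.h>   /* strlen */

/* Converts size bytes of ISO-8859-6 text to Windows-1256 via UTF-16.
   win1256 must have room for size + 1 bytes. */
void convert(const char* iso88596, char* win1256, int size)
{
    wchar_t* ws = (wchar_t*)malloc((size+1)*sizeof(wchar_t));
    int n = 0;

    if (ws != NULL)
    {
        /* 28596 = ISO-8859-6, 1256 = Windows-1256; both are single-byte,
           so one input byte becomes one wide character and one output byte */
        n = MultiByteToWideChar(28596,0,iso88596,size,ws,size);
        n = WideCharToMultiByte(1256,0,ws,n,win1256,size,NULL,NULL);
        free(ws);
    }
    win1256[n] = 0;   /* terminate; the result is empty if anything failed */
}


int main()
{
    const char* arabic = "hello \xE0\n";
    char win[32] = {0};

    printf("%s", arabic);   /* never pass data as the format string */
    convert(arabic,win,(int)strlen(arabic));
    printf("%s", win);

    return 0;
}

////////////////////////////////////////////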

For Unix C I do not know of any conversion functions, so you could also write your own mapping function like:

void convert_mapping(const char* iso88596, char* win1256, int size)
{
     // mapping of the upper 128 characters from ISO88596 to windows-1256 (other part is same)
    static char toWin1256[128] = {...};

    int i = 0;
    for (; i != size; ++i)
    {
        if ((unsigned char)iso88596[i] > 127)
            win1256[i] = toWin1256[(unsigned char)iso88596[i]-128];
        else win1256[i] = iso88596[i];
    }
    win1256[size] = 0;
}

You can find the tables for the mapping at:
http://en.wikipedia.org/wiki/Windows-1256
http://en.wikipedia.org/wiki/ISO_8859-6
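
As a rough illustration of how the elided table could be filled in, here is a minimal sketch that builds the 128-entry map at run time. Everything in it (the names to_win1256, build_table, convert_mapping_sketch, and the '?' substitute) is hypothetical and not from the posts above. The byte values were derived by lining up the Unicode code points of the two code pages, so they are worth checking against the linked tables before relying on them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for bytes with no Windows-1256 equivalent (an arbitrary choice). */
#define SUBST '?'

static unsigned char to_win1256[128];   /* indexed by (ISO-8859-6 byte - 128) */
static int table_ready = 0;

static void build_table(void)
{
    /* ISO-8859-6 bytes 0xE0..0xF2 (tatweel, feh..yeh, vowel marks)
       as Windows-1256 bytes, in order */
    static const unsigned char tail[] = {
        0xDC, 0xDD, 0xDE, 0xDF, 0xE1, 0xE3, 0xE4, 0xE5, 0xE6, 0xEC,
        0xED, 0xF0, 0xF1, 0xF2, 0xF3, 0xF5, 0xF6, 0xF8, 0xFA
    };
    unsigned b;

    assert(sizeof tail == 0xF2 - 0xE0 + 1);
    memset(to_win1256, SUBST, sizeof to_win1256);

    to_win1256[0xA0 - 128] = 0xA0;   /* no-break space */
    to_win1256[0xA4 - 128] = 0xA4;   /* currency sign */
    to_win1256[0xAC - 128] = 0xA1;   /* Arabic comma */
    to_win1256[0xAD - 128] = 0xAD;   /* soft hyphen */
    to_win1256[0xBB - 128] = 0xBA;   /* Arabic semicolon */
    to_win1256[0xBF - 128] = 0xBF;   /* Arabic question mark */

    for (b = 0xC1; b <= 0xD6; ++b)   /* hamza..dad: same byte in both */
        to_win1256[b - 128] = (unsigned char)b;
    for (b = 0xD7; b <= 0xDA; ++b)   /* tah..ghain: Windows-1256 has a
                                        multiplication sign at 0xD7 */
        to_win1256[b - 128] = (unsigned char)(b + 1);
    for (b = 0; b < sizeof tail; ++b)
        to_win1256[0xE0 - 128 + b] = tail[b];

    table_ready = 1;
}

/* out must have room for size + 1 bytes */
void convert_mapping_sketch(const char *in, char *out, size_t size)
{
    size_t i;

    if (!table_ready)
        build_table();
    for (i = 0; i < size; ++i) {
        unsigned char c = (unsigned char)in[i];
        out[i] = (char)(c > 127 ? to_win1256[c - 128] : c);
    }
    out[size] = '\0';
}
```

Building the table with loops instead of a 128-entry initializer keeps the few places where the two code pages diverge visible; a static initializer would work just as well.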

regards
itsmeandnobodyelseCommented:
I don't think that converting to UNICODE and back would give any benefit. Note that ISO8859-6 is a single-byte code page, which means it has 256 codes, 0 .. 255. Codes from 0 to 127 (ASCII) are identical to WIN-1256, except that the digits (0 ... 9) are printed differently if you have an Arabic locale. The codes themselves are identical, so an ASCII-only text moved from ISO8859-6 to Windows-1256 prints fine on both sides.

Codes from 128 to 255 are different. ISO8859-6 has Arabic letters, while WIN1256 has umlauts, currency signs, Greek characters and more... There is no translation from an Arabic character to an ANSI character (WIN-1256), and it doesn't help to convert to UNICODE first and then back. Only the first part (the conversion to UNICODE) gives a valid result.

So if you want to send non-ASCII Arabic letters to a non-Arabic Windows system, you have to send UNICODE characters. They can then be printed on any device capable of printing UNICODE characters (Arabic letters included). If you have ASCII only, you can send it without any conversion.

Regards, Alex
BinaryTreeAuthor Commented:
Dear Alex,

What I understand from your reply is that it is impossible to do what I am looking for! OK, what about converting the C application to a library (API) that can be called from a Java application (which would call my functions in the C code)?

Regards
itsmeandnobodyelseCommented:
>>>> I know from your reply is that it is impossible to do what I am looking for

I am not sure we both understand the same thing.

Look, if you have a text using Arabic letters, which are read from right to left, there is no way to show that text in Windows 1256 or any other single-byte codepage that contains no Arabic letters. Even if the receiver changes the codepage to ISO8859-6, they will most likely see non-printables, because the hardware and operating system do not have the appropriate fonts. However, if you convert it to UNICODE text, any Windows PC can print your text with all letters when using a UNICODE font, e.g. LUCIDA UNICODE, which is part of any Windows installation. I don't know about UNIX, but I am pretty sure there is a means to print all UNICODE text there as well, with or without Arabic letters.

>>>> what about converting the C application to Library (API)
It isn't a technical problem (which could be solved by using another technique) but a logical one. The only way to show an Arabic text with the Windows-1256 code page on a single-byte character set would be a phonetic transliteration. For that you would need a conversion that maps one or more Arabic characters to one or more ANSI characters so that both sound (nearly) equivalent when spoken by a person who speaks Arabic as well as English. Of course there are programs that can do such a conversion, e.g. for the purpose of writing a dictionary. But for your case I don't think it is an alternative.
The only sensible thing I see is to convert to UNICODE and ship that.

Regards, Alex
chip3dCommented:

>>Look if you have a text using arabic letters which were read from right to left, there is no way to show that text in Windows 1256 or any other single-byte codepage which contains no arabic letters.

The windows 1256 codepage contains arabic letters, but it is not compatible with the ISO 8859-6. A conversion is possible.

But itsmeandnobodyelse is right about using unicode, better to convert your iso-string to a unicode-string and send that...
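
For the Unicode route, a minimal sketch in plain C might look like the following. It assumes UTF-8 as the wire format, which the thread does not specify (on Windows, "UNICODE" usually means UTF-16), and the function names are hypothetical. The byte ranges follow the ISO 8859-6 layout, in which 0xC1..0xDA and 0xE0..0xF2 map onto consecutive code points starting at U+0621 and U+0640.

```c
#include <assert.h>
#include <stddef.h>

/* Map one ISO-8859-6 byte to a Unicode code point.
   0xFFFD (replacement character) marks bytes this sketch treats as unassigned. */
static unsigned iso88596_to_codepoint(unsigned char b)
{
    if (b < 0x80) return b;                                   /* ASCII */
    if (b >= 0xC1 && b <= 0xDA) return 0x0621u + (b - 0xC1);  /* hamza..ghain */
    if (b >= 0xE0 && b <= 0xF2) return 0x0640u + (b - 0xE0);  /* tatweel..sukun */
    switch (b) {
    case 0xA0: case 0xA4: case 0xAD: return b;  /* NBSP, currency sign, soft hyphen */
    case 0xAC: return 0x060C;                   /* Arabic comma */
    case 0xBB: return 0x061B;                   /* Arabic semicolon */
    case 0xBF: return 0x061F;                   /* Arabic question mark */
    default:   return 0xFFFD;
    }
}

/* Writes NUL-terminated UTF-8 into out. Returns the number of bytes written
   (not counting the NUL), or (size_t)-1 if out is too small. */
size_t iso88596_to_utf8(const char *in, char *out, size_t out_size)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    for (; *in != '\0'; ++in) {
        unsigned cp = iso88596_to_codepoint((unsigned char)*in);
        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : 3;

        if (n + need >= out_size)          /* keep one byte for the NUL */
            return (size_t)-1;
        if (need == 1) {
            out[n++] = (char)cp;
        } else if (need == 2) {
            out[n++] = (char)(0xC0 | (cp >> 6));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[n++] = (char)(0xE0 | (cp >> 12));
            out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    if (n >= out_size)
        return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

An output buffer of three times the input length plus one is always large enough, since no ISO 8859-6 byte expands to more than three UTF-8 bytes here.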

regards
itsmeandnobodyelseCommented:
>>>> The windows 1256 codepage contains arabic letters
Oh, I mixed it up with codepage 1252.

Sorry for that, BinaryTree. If you don't want to go the UNICODE way, which is surely recommended, you may try the solution chip3d and jkr have posted above. Since the codepage does contain Arabic letters, none of my arguments above against converting to it were valid.

Regards, Alex
BinaryTreeAuthor Commented:
Thanks for your replies, guys. As I understand from you, it is impossible to do it in my case (since I have to deliver the text as Windows-1256, not as Unicode!). OK, I will split the points.

Regards.

chip3dCommented:
>> Thanks for your replies guys , as I understand from you that it is impossible to do it in my case (since I have to deliver the txt as Windows-1256 not as Unicode !)

It is possible... To use unicode was just a suggestion. If you can convert your string on a windows system, you can use the first solution:
void convert(const char* iso88596, char* win1256, int size)... with WideCharToMultiByte and MultiByteToWideChar

If you have to convert your string on a unix system you could use your own mapping function. I posted an example as second solution:
void convert_mapping(const char* iso88596, char* win1256, int size)...
Well, the function is not complete, because it was only an example of how it could be done. The map still has to be initialized correctly, but you can find the mapping information by comparing the two character tables and then complete the function:
http://en.wikipedia.org/wiki/Windows-1256
http://en.wikipedia.org/wiki/ISO_8859-6

If you need help completing the mapping on your own, feel free to ask...
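
One option on Unix that the posts above do not mention is the POSIX iconv interface, which is available from plain C. The sketch below is hedged: the function name is hypothetical, and the encoding names "WINDOWS-1256" and "ISO-8859-6" are assumptions about what the local iconv implementation accepts (`iconv -l` lists the supported names; "CP1256" is a common alias). On some systems the program must be linked with -liconv.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Converts a NUL-terminated ISO-8859-6 string to Windows-1256.
   out receives a NUL-terminated result. Returns 0 on success, -1 on failure
   (unknown encoding name, unconvertible byte, or out too small). */
int iso88596_to_win1256_iconv(const char *in, char *out, size_t out_size)
{
    iconv_t cd;
    char *src, *dst;
    size_t src_left, dst_left, rc;

    assert(in != NULL && out != NULL);
    if (out_size == 0)
        return -1;

    cd = iconv_open("WINDOWS-1256", "ISO-8859-6");   /* (to, from) */
    if (cd == (iconv_t)-1)
        return -1;

    src = (char *)in;            /* iconv takes char **, but only reads the input */
    src_left = strlen(in);
    dst = out;
    dst_left = out_size - 1;     /* keep one byte for the terminator */

    rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;

    *dst = '\0';
    return 0;
}
```

Because both encodings are single-byte, an output buffer of strlen(in) + 1 bytes is enough. Compared with a hand-written table, this leaves the mapping data to the C library, at the cost of depending on which encodings that library ships.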

regards
itsmeandnobodyelseCommented:
chip3d is right, it is possible. It was my fault that I mixed up 1256 with 1252. 1256 contains arabic letters while 1252 does not.

Try the following (taken from jkr and chip3d)

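// needs <windows.h>, <stdlib.h> and <string.h>
char* convert8859To1256(char* pszStringToConvert)
{
      int len = (int)strlen(pszStringToConvert);  // get string length
      wchar_t* pwsz = (wchar_t*)malloc((len+1) * sizeof(wchar_t));  // temporary UNICODE string (malloc, since this is C)
      if (pwsz == NULL)
            return NULL;
      MultiByteToWideChar(28596, 0, pszStringToConvert, len, pwsz, len);
      // 1256 = Windows-1256 (Arabic); 1252 has no Arabic letters.
      // Both code pages are single-byte, so converting in place keeps the length.
      WideCharToMultiByte(1256, 0, pwsz, len, pszStringToConvert, len, NULL, NULL);
      free(pwsz);  // release the temporary buffer
      return pszStringToConvert;
}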

Regards, Alex
