Solved

Multibyte characters -> Unicode

Posted on 2001-07-24
Last Modified: 2007-12-19
Hi,

I need to convert a CString (VC++) to a String (Java). This CString is encoded in a multibyte character set (MBCS). I can use the function MultiByteToWideChar() to convert MBCS to Unicode in VC++, but when I construct a new java.lang.String from the converted Unicode string with the JNI function NewString(), the returned String is not Unicode.

What is the difference between Unicode in VC++ and Unicode in Java? How can I convert Unicode/MBCS correctly from VC++ to Java and vice versa?

Thanks,
Tommi
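A minimal Java sketch of where the two sides agree, assuming the VC++ build uses 16-bit WCHAR/wchar_t strings as Windows does: a Java String is also a sequence of 16-bit UTF-16 code units, so the characters themselves have the same representation on both sides. The class name Utf16Units is hypothetical, for illustration only.

```java
import java.nio.charset.StandardCharsets;

class Utf16Units {
    // Returns each char of s as a 4-digit hex UTF-16 code unit, e.g. "3088".
    static String[] codeUnits(String s) {
        String[] out = new String[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = String.format("%04X", (int) s.charAt(i));
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "\u3088\u3046A";
        // One 16-bit code unit per character here: 3088 3046 0041
        System.out.println(String.join(" ", codeUnits(s)));
        // Each code unit occupies 2 bytes, like a 16-bit WCHAR: 6 bytes in total
        System.out.println(s.getBytes(StandardCharsets.UTF_16LE).length);
    }
}
```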
Question by:tommi
 
LVL 92

Expert Comment

by:objects
ID: 6311642
> when I construct a new java.lang.String

Exactly what string are you passing to the String ctor?

> the return String is not Unicode.

All Java Strings are Unicode.
What do you mean when you say it's not Unicode?




Author Comment

by:tommi
ID: 6311750
> > when I construct a new java.lang.String
>
> Exactly what string are you passing to the String ctor?

The String I got after running MultiByteToWideChar().

> > the return String is not Unicode.
>
> All Java Strings are Unicode.
> What do you mean when you say it's not Unicode?

That was my unclear explanation ;-) You are correct! Java's String is Unicode. I mean that NewString() should recognize my VC++ Unicode string and convert it to a String correctly, but it seems to copy byte->byte only, whereas UTF-16 uses 16 bits per character. So do I need to express this String in \uXXXX format again in Java? And how can I do it?

Thanks,
Tommi
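On the \uXXXX question, a sketch suggesting that no re-escaping should be needed: \uXXXX is only Java source-code syntax for a single 16-bit char, and a String built directly from an array of code units (roughly what JNI NewString does with its const jchar* argument) is already the finished string. FromCodeUnits and fromUnits are hypothetical names.

```java
class FromCodeUnits {
    // Builds a String from the first len UTF-16 code units of the array,
    // roughly analogous to NewString(const jchar*, jsize) on the JNI side.
    static String fromUnits(char[] units, int len) {
        return new String(units, 0, len);
    }

    public static void main(String[] args) {
        // Like a null-terminated WCHAR buffer holding U+3088 U+3046 'A'
        char[] buf = { 0x3088, 0x3046, 'A', 0 };
        // Pass the count without the terminator
        String s = fromUnits(buf, 3);
        System.out.println(s.equals("\u3088\u3046A")); // true
    }
}
```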
LVL 92

Expert Comment

by:objects
ID: 6311968
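NewString expects a pointer to an array of Unicode (UTF-16) characters, together with the number of those characters.

The conversion should look something like this:


WCHAR buf[1024];

int n = MultiByteToWideChar(
CP_ACP,                        // code page
MB_ERR_INVALID_CHARS,          // options
s,                             // string to map
-1,                            // null-terminated
buf,                           // wide-character buffer address
sizeof(buf) / sizeof(buf[0])   // buffer size in WCHARs, not bytes
);

// n counts the terminating null; 0 means the conversion failed.
// Create java string: the length is in UTF-16 units, not MBCS bytes (strlen(s))
jstring result = (n > 0) ? env->NewString((const jchar*)buf, n - 1) : NULL;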


I'll have a closer look at the problem tomorrow.
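With -1 as the input length, MultiByteToWideChar returns the number of wide characters it wrote including the terminating null, and that count minus one is what NewString needs; strlen(s) counts MBCS bytes instead. A small Java sketch (hypothetical class LengthMismatch) makes the difference concrete, using UTF-8 as a stand-in multibyte encoding because it is always available; Shift_JIS, close to the Japanese Windows code page 932, is printed only if the JRE supports it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class LengthMismatch {
    // Number of bytes s occupies in the given encoding (what strlen() would report).
    static int byteLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String s = "\u3088\u3046A";
        // The count NewString needs: UTF-16 code units
        System.out.println("UTF-16 code units: " + s.length());                            // 3
        System.out.println("UTF-8 bytes: " + byteLength(s, StandardCharsets.UTF_8));       // 7
        if (Charset.isSupported("Shift_JIS")) {
            System.out.println("Shift_JIS bytes: " + byteLength(s, Charset.forName("Shift_JIS"))); // 5
        }
    }
}
```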


Author Comment

by:tommi
ID: 6315650
I certainly did that. But I don't know what the difference in Unicode format between VC++ and Java is. That's why the converted String didn't match my Unicode CString.

Regards,
Tommi
LVL 92

Expert Comment

by:objects
ID: 6315663
> I certainly did it.

Are you saying your code is the same as above?
Is it possible for you to post your code?

> But I don't know what the difference in Unicode format
> between VC++ and Java.

One would hope there is no difference as Unicode is a standard.


One thing to try would be to pick a small test string and print out the actual underlying bytes after each conversion step.
This may give you a better idea of what's happening.
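A small Java helper along the lines of that suggestion (the class ByteDump is hypothetical) prints the bytes of the same test string in the JVM default charset, in UTF-16LE (the layout of the corresponding WCHAR buffer on little-endian Windows), and in UTF-8, so each stage on the C++ side can be compared against it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ByteDump {
    // Formats bytes as space-separated two-digit uppercase hex, e.g. "41 00".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Prints s encoded with cs, labelled with the charset name.
    static void dump(String label, String s, Charset cs) {
        System.out.println(label + " [" + cs.name() + "]: " + hex(s.getBytes(cs)));
    }

    public static void main(String[] args) {
        String test = "\u3088\u3046A";
        dump("default", test, Charset.defaultCharset());   // what String.getBytes() sends
        dump("UTF-16LE", test, StandardCharsets.UTF_16LE); // layout of a WCHAR buffer
        dump("UTF-8", test, StandardCharsets.UTF_8);
    }
}
```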

Author Comment

by:tommi
ID: 6316172
I looked at the Java documents and found that JNI's NewString() "Constructs a new java.lang.String object from an array of UTF-8 characters." The VC++ string I passed to it is encoded in the "platform's default character encoding" (it's not UTF-8, though I don't know exactly what format it is). That's the difference ;-)

How do I solve it? Java's String.getBytes() "convert[s] this String into bytes according to the platform's default character encoding, storing the result into a new byte array". That's what I need ;-). I then pass it to JNI as a byte[] (jbyteArray), and VC++ recognizes it. I do the same when I return a VC++ string to Java: there's no need to run MultiByteToWideChar(); I just use the Java constructor String(byte[]) to convert it.

I'm posting my test code here in the hope that it helps someone who encounters the same problem ;-) I'm testing on Japanese Windows.

Thanks objects, anyway.

Tommi

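//Java code:
class Testing {
     static {
          System.loadLibrary("MyDLL");
     }

     public native byte[] test(byte[] string);

     // main must be public static to be launched
     public static void main(String[] args) {
          Testing prn = new Testing();

          String teststring = "\u3088\u3046" + "A";
          System.out.println("in java1: " + teststring);
          // getBytes() and new String(byte[]) both use the JVM default charset
          byte[] bytes = prn.test(teststring.getBytes());
          System.out.println("in java2: " + new String(bytes));
     }
}

//VC++ code:
#include <jni.h>
#include <stdio.h>
#include <string.h>

// The name must be Java_<class>_<method>, i.e. Java_Testing_test for class Testing
extern "C" JNIEXPORT jbyteArray JNICALL Java_Testing_test
  (JNIEnv * env, jobject obj, jbyteArray byteArray)
{
     jbyte *bytes = env->GetByteArrayElements(byteArray, NULL);
     jsize len = env->GetArrayLength(byteArray);

     // Java byte arrays are not null-terminated: copy exactly len bytes
     // and terminate the copy (strcpy could read past the end of the array)
     char*  string = new char[len+1];
     memcpy(string, bytes, len);
     string[len] = '\0';
     printf("\nin C++: %s\n", string);

     jbyteArray array = env->NewByteArray(len);
     env->SetByteArrayRegion(array, 0, len, (jbyte*)string);

     delete [] string;
     // (bytes must also be released with ReleaseByteArrayElements before returning)

     return array;
}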

Author Comment

by:tommi
ID: 6316200
I missed one line before returning the array ;-)

   //.....
   env->ReleaseByteArrayElements(byteArray, bytes, 0); //added
   return array;
}
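The byte[] approach works only because both sides agree on the encoding: the no-argument String.getBytes() and new String(byte[]) use the JVM default charset, which has to match the code page the C++ side assumes (code page 932 on Japanese Windows). Since JDK 18 the default charset is UTF-8 on every platform (JEP 400), so on newer JVMs those calls would not produce code page 932 bytes. A sketch that names the charset explicitly, with UTF-8 and ISO-8859-1 used only as always-available examples (CharsetRoundTrip is a hypothetical name); for code page 932, JREs that ship extended charsets typically expose it as windows-31j (alias MS932).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetRoundTrip {
    // Encode and decode with the SAME charset: the property the byte[] approach relies on.
    static String roundTrip(String s, Charset cs) {
        byte[] bytes = s.getBytes(cs);   // what would be passed to native code
        return new String(bytes, cs);    // what would be rebuilt from the returned byte[]
    }

    // Encode with one charset and decode with another: the kind of mismatch that garbles text.
    static String mismatch(String s, Charset enc, Charset dec) {
        return new String(s.getBytes(enc), dec);
    }

    public static void main(String[] args) {
        String s = "\u3088\u3046A";
        System.out.println(roundTrip(s, StandardCharsets.UTF_8).equals(s)); // true
        System.out.println(mismatch(s, StandardCharsets.UTF_8,
                StandardCharsets.ISO_8859_1).equals(s));                    // false
    }
}
```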
LVL 92

Expert Comment

by:objects
ID: 6316237
> "Constructs a new java.lang.String object
> from an array of UTF-8 characters."

Where's that quote from?

My understanding is that NewString constructs a new java.lang.String object from an array of Unicode characters, and that JNI uses UTF-8 strings to represent various string types.

They may mean the same thing; every time I look at this stuff it takes me a while to get my head around how it all works.


> Thanks objects, anyway.

No worries, as long as your problem is solved I'm happy.

Though I'm a bit confused about how your solution converts CString to Java String. I guess you have a routine that goes the other way that wasn't posted.

All the best :)
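This matches the JNI specification: NewString takes UTF-16 jchars, while the ...UTF functions (NewStringUTF, GetStringUTFChars) use modified UTF-8. Modified UTF-8 differs from standard UTF-8 in two visible ways: U+0000 is encoded as C0 80, and a character outside the BMP is encoded as two 3-byte surrogates (6 bytes) instead of 4 bytes. Java exposes the same format through DataOutputStream.writeUTF, so a sketch (hypothetical class ModifiedUtf8) can show both differences.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

class ModifiedUtf8 {
    // Returns the modified UTF-8 bytes of s, taken from DataOutputStream.writeUTF
    // with its 2-byte length prefix removed.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = bos.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    public static void main(String[] args) {
        // U+0000 becomes two bytes (standard UTF-8 would use a single 00 byte)
        byte[] nul = modifiedUtf8("\0");
        System.out.printf("U+0000 -> %02X %02X%n", nul[0] & 0xFF, nul[1] & 0xFF); // C0 80
        // A supplementary character (a surrogate pair in UTF-16) becomes 6 bytes, not 4
        System.out.println("U+1F600 -> " + modifiedUtf8("\uD83D\uDE00").length + " bytes");
    }
}
```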




Author Comment

by:tommi
ID: 6316329
> > "Constructs a new java.lang.String object
> > from an array of UTF-8 characters."
>
> Where's that quote from?

http://java.sun.com/j2se/1.3/docs/guide/jni/spec/functions.doc.html#5386
Then search for "NewStringUTF".

> > Thanks objects, anyway.
>
> No worries, as long as your problem is solved I'm happy.

Yeah... I'm happy too!

> Though I'm a bit confused about how your solution converts CString to Java String. I guess you have a routine that goes the other way that wasn't posted.

No, I posted it all. I convert CString->String in two lines:
    jbyteArray array = env->NewByteArray(len);
    env->SetByteArrayRegion(array, 0, len, (jbyte*)string);

For example, to convert CString VCString:
     // len must be a byte count: _mbstrlen() counts characters and would truncate DBCS text
     jsize len = (jsize)strlen((LPCTSTR)VCString);
     jbyteArray array = env->NewByteArray(len);
     env->SetByteArrayRegion(array, 0, len, (jbyte *)(LPCTSTR)VCString);

     return array; //----> Java String (call new String(byte[]) in the Java code)

> All the best :)

You too ;-)

Tommi
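If a character count such as _mbstrlen is used as the array length, the result is truncated: _mbstrlen returns the number of multibyte characters, not bytes, so for double-byte Japanese text it is smaller than the length NewByteArray/SetByteArrayRegion need. strlen, or CString::GetLength() in an _MBCS build (which counts bytes), gives the byte count. A Java sketch of the effect, using UTF-8 as a stand-in multibyte encoding and a hypothetical TruncationDemo class:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class TruncationDemo {
    // Decodes only the first n bytes, mimicking SetByteArrayRegion(array, 0, n, ...)
    // when n is a character count rather than a byte count.
    static String decodePrefix(byte[] encoded, int n) {
        return new String(Arrays.copyOf(encoded, n), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "\u3088\u3046A";
        byte[] encoded = s.getBytes(StandardCharsets.UTF_8); // 7 bytes for 3 characters
        System.out.println(decodePrefix(encoded, encoded.length).equals(s));   // true
        // Using the character count (3) keeps only the first 3 bytes: just U+3088
        System.out.println(decodePrefix(encoded, s.length()).equals("\u3088")); // true
    }
}
```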
LVL 92

Expert Comment

by:objects
ID: 6316362
Ahhhh, NewStringUTF(), not NewString().
You've been saying NewString.
One creates from UTF8, the other from Unicode.

mick :)
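To make that distinction concrete, a Java analogy (hypothetical class TwoConstructors): new String(char[]) plays the role of NewString (UTF-16 code units in), and DataInputStream.readUTF plays the role of NewStringUTF (modified UTF-8 bytes in). Given correctly encoded input, both yield the same String.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class TwoConstructors {
    // Analog of NewString: build from UTF-16 code units.
    static String fromUtf16(char[] units) {
        return new String(units);
    }

    // Analog of NewStringUTF: build from modified UTF-8 bytes.
    // readUTF expects a 2-byte big-endian length prefix, so one is added here.
    static String fromModifiedUtf8(byte[] mutf8) {
        byte[] framed = new byte[mutf8.length + 2];
        framed[0] = (byte) (mutf8.length >>> 8);
        framed[1] = (byte) mutf8.length;
        System.arraycopy(mutf8, 0, framed, 2, mutf8.length);
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String a = fromUtf16(new char[] { 0x3088, 0x3046, 'A' });
        String b = fromModifiedUtf8(new byte[] {
            (byte) 0xE3, (byte) 0x82, (byte) 0x88,  // U+3088
            (byte) 0xE3, (byte) 0x81, (byte) 0x86,  // U+3046
            0x41 });                                 // 'A'
        System.out.println(a.equals(b)); // true
    }
}
```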


Author Comment

by:tommi
ID: 6316433
> Ahhhh, NewStringUTF(), not NewString().
> You've been saying NewString.
> One creates from UTF8, the other from Unicode.

Yep! Even so, neither the UTF-8 function nor the Unicode function recognizes a VC++ string in the platform's default character encoding correctly. You can test it and see.

;-))
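A sketch of why neither function accepts the MBCS bytes directly, assuming the CString holds code page 932 (Shift_JIS) data: those bytes are not valid UTF-8, and reinterpreting them as 16-bit units does not give the original characters either. The code page 932 bytes are written out by hand, WrongDecoder is a hypothetical name, and standard UTF-8 decoding stands in for JNI's modified UTF-8, which rejects the same lead byte.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

class WrongDecoder {
    // True if the bytes are valid UTF-8 (newDecoder() reports errors instead of replacing).
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Reinterprets byte pairs as little-endian 16-bit code units, i.e. handing
    // an MBCS buffer to NewString as if it were a jchar array.
    static String bytesAsUtf16LE(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        // Code page 932 (Shift_JIS) bytes for U+3088 U+3046 'A'
        byte[] cp932 = { (byte) 0x82, (byte) 0xE6, (byte) 0x82, (byte) 0xA4, 0x41 };
        System.out.println(isValidUtf8(cp932));                              // false
        System.out.println(bytesAsUtf16LE(cp932).equals("\u3088\u3046A"));   // false
    }
}
```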
LVL 92

Expert Comment

by:objects
ID: 6316472
I believe you; I was just confused when you said NewString needed a UTF-8 array as input.

I think CStrings can use Unicode or MBCS, depending on how the application is built. It sounds like yours are Unicode, and that's why MultiByteToWideChar wasn't working,
i.e. you already had Unicode.

Author Comment

by:tommi
ID: 6316544
Right, it depends on whether _MBCS or _UNICODE is defined: the CString will be MBCS or Unicode respectively. You can't define both of them. I'm using _MBCS ;-), and MultiByteToWideChar() is suitable for this case.
Are you trying to take advantage of NewString() or NewStringUTF()?
LVL 92

Expert Comment

by:objects
ID: 6316563
> Are you trying to take advantage of NewString()
> or NewStringUTF()?

Not trying either, just trying to help you and understand what was going on :-)

Author Comment

by:tommi
ID: 6316591
It's quite fun to talk with you ;-) I'm keeping this question open while I wait for your help ;-)
LVL 5

Expert Comment

by:vemul
ID: 7669501
No comment has been added lately, so it's time to clean up this TA.
I will leave a recommendation in the Cleanup topic area that this question is:
- To be PAQ'ed and points NOT refunded
Please leave any comments here within the next seven days.

PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER !

vemul
Cleanup Volunteer

Accepted Solution

by:
SpideyMod earned 0 total points
ID: 7714453
per recommendation

SpideyMod
Community Support Moderator @Experts Exchange