Solved

Multibyte characters -> Unicode

Posted on 2001-07-24
Last Modified: 2007-12-19
Hi,

I need to convert a CString (VC++) to a String (Java). This CString is encoded in a multibyte character set (MBCS). I can use the function MultiByteToWideChar() to convert MBCS to Unicode in VC++, but when I construct a new java.lang.String from the converted Unicode string through the JNI function NewString(), the returned String is not Unicode.

What's the difference between Unicode in VC++ and Java? How can I convert Unicode/MBCS correctly from VC++ to Java and vice versa?

Thanks,
Tommi
Question by:tommi
19 Comments
 
LVL 92

Expert Comment

by:objects
ID: 6311642
> when I construct a new java.lang.String

Exactly what string are you passing to the String ctor?

> the return String is not Unicode.

All Java Strings are Unicode.
What do you mean when you say not Unicode?



0
 

Author Comment

by:tommi
ID: 6311750
> when I construct a new java.lang.String

> Exactly what string are you passing to the String ctor?

The String I got after running MultiByteToWideChar()

> the return String is not Unicode.
>
> All Java Strings are Unicode.
> What do you mean when you say not Unicode?

My explanation was unclear ;-) You are correct! Java's String is Unicode. I mean that the function NewString() should recognize my VC++ Unicode and convert it to a String correctly, but it seems to copy byte->byte only; as you know, UTF-16 uses 16 bits. So do I need to express this String in \uXXXX format again in Java? And how can I do that?

Thanks,
Tommi
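
On the \uXXXX question: that notation is only a source-code escape for a UTF-16 code unit, so a String that already holds the right characters does not need to be re-expressed that way. For debugging, though, it can help to print a String in that form. A minimal sketch, where the class name UnicodeEscapes and the method escape are made up for illustration:

```java
class UnicodeEscapes {
    // Render every non-ASCII UTF-16 code unit as a backslash-u escape
    // with four hex digits; ASCII characters pass through unchanged.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same test string as used later in this thread.
        String test = "\u3088\u3046" + "A";
        System.out.println(escape(test));
    }
}
```

If the String arrived intact from the native side, this should print the two escapes for U+3088 and U+3046 followed by A; anything else suggests the bytes were misinterpreted on the way in.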
0
 
LVL 92

Expert Comment

by:objects
ID: 6311968
NewString expects a pointer to a Unicode string.

The conversion should look something like this:


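jchar buf[1024];      // jchar is JNI's 16-bit UTF-16 code unit

// Returns the number of wide characters written, including the
// terminating null because the input length is -1; 0 means failure.
int n = MultiByteToWideChar(
CP_ACP,               // code page
MB_ERR_INVALID_CHARS, // options
s,                    // string to map
-1,                   // null-terminated
(LPWSTR)buf,          // wide-character buffer address
sizeof(buf) / sizeof(buf[0]) // buffer size in wide characters, not bytes
);

// Create java string; the length is in UTF-16 units, excluding the null.
// strlen(s) would count bytes of the multibyte input instead.
jstring result = (n > 0) ? env->NewString(buf, n - 1) : NULL;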


I'll have a closer look at the problem tomorrow.

0
 

Author Comment

by:tommi
ID: 6315650
I certainly did that. But I don't know what the difference in Unicode format is between VC++ and Java. That's why the converted String didn't match my Unicode CString.

Regards,
Tommi
0
 
LVL 92

Expert Comment

by:objects
ID: 6315663
> I certainly did it.

Are you saying your code is the same as above?
Is it possible for you to post your code?

> But I don't know what the difference in Unicode format
> between VC++ and Java.

One would hope there is no difference as Unicode is a standard.


One thing to try would be to pick a small test string and print out the actual underlying bytes after each conversion step.
This may give you a better idea of what's happening.
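
On the Java side, the byte-printing suggestion could look roughly like the sketch below. The class name HexDump and the helper hex are hypothetical, and the two encodings shown are just examples; the same helper can be pointed at whatever byte array comes back from the native call.

```java
import java.nio.charset.StandardCharsets;

class HexDump {
    // Format a byte array as space-separated two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String test = "\u3088\u3046" + "A";
        // UTF-16BE: 30 88 30 46 00 41
        System.out.println("UTF-16BE: " + hex(test.getBytes(StandardCharsets.UTF_16BE)));
        // UTF-8:    E3 82 88 E3 81 86 41
        System.out.println("UTF-8:    " + hex(test.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Comparing these dumps with the bytes seen in the C++ debugger shows quickly whether a buffer holds UTF-16, UTF-8, or a multibyte code page.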
0
 

Author Comment

by:tommi
ID: 6316172
I looked at the Java documents and found that JNI's NewString() "Constructs a new java.lang.String object from an array of UTF-8 characters." The VC++ string I passed in is encoded in the "platform's default character encoding" (it's not UTF-8; I don't know what its format is, though). That's the difference ;-)

How do I solve it? Java's String.getBytes() "convert this String into bytes according to the platform's default character encoding, storing the result into a new byte array". And that's what I need ;-) I pass the result to JNI as a byte[] and VC++ recognizes it. I do the same when I return a VC++ string to Java: there is no need to run MultiByteToWideChar(); the Java constructor String(byte[]) does the conversion.

I'm posting my test code here in the hope that it helps someone who encounters the same problem ;-) I'm testing on Japanese Windows.

Thanks objects, anyway.

Tommi

//Java code:
class Testing {
     static {
          System.loadLibrary("MyDLL");
     }

     public native byte[] test(byte[] string);

     public static void main(String [] args) {
          Testing prn = new Testing();

          String teststring = "\u3088\u3046" + "A";
          System.out.println("in java1: "+teststring);
          byte[] bytes = prn.test(teststring.getBytes());
          System.out.println("in java2: "+new String(bytes));
     }
}

//VC++ code:
JNIEXPORT jbyteArray JNICALL Java_Testing_test
  (JNIEnv * env, jobject obj, jbyteArray byteArray)
{
     jbyte *bytes = env->GetByteArrayElements(byteArray, NULL);
     int len = env->GetArrayLength(byteArray);

     // The Java byte array is not null-terminated, so copy exactly
     // len bytes (memcpy is in <string.h>) and terminate the copy.
     char*  string = new char[len+1];
     memcpy(string, bytes, len);
     string[len]='\0';
     printf("\nin C++: %s\n", string);

     jbyteArray array = env->NewByteArray(len);
     env->SetByteArrayRegion(array, 0, len, (const jbyte*)string);

     delete [] string;

     return array;
}
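
The getBytes() / new String(byte[]) pair above relies on both sides agreeing on the platform default encoding. A variation that may be more robust is to name the charset explicitly on the Java side so the round trip does not depend on the JVM's default. The sketch below is only an illustration: the class CharsetRoundTrip and the method roundTrip are made-up names, and UTF-8 and ISO-8859-1 stand in for whichever code page the native side really uses (on Japanese Windows that is typically the one Java calls "windows-31j", which is an assumption about the environment).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetRoundTrip {
    // Encode with a named charset, then decode with the same charset.
    // The result equals the input only if the charset can represent
    // every character in the string.
    static String roundTrip(String s, Charset cs) {
        byte[] bytes = s.getBytes(cs);
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        String test = "\u3088\u3046" + "A";
        // UTF-8 can represent the kana, so the round trip is lossless.
        System.out.println(roundTrip(test, StandardCharsets.UTF_8).equals(test));      // true
        // ISO-8859-1 cannot; unmappable characters become '?'.
        System.out.println(roundTrip(test, StandardCharsets.ISO_8859_1).equals(test)); // false
    }
}
```

The second call shows the failure mode to watch for: if the charset used for encoding cannot represent the text, the damage happens silently.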
0
 

Author Comment

by:tommi
ID: 6316200
I missed one line before returning the array ;-)

   //.....
   env->ReleaseByteArrayElements(byteArray, bytes, 0); //added
   return array;
}
0
 
LVL 92

Expert Comment

by:objects
ID: 6316237
> "Constructs a new java.lang.String object
> from an array of UTF-8 characters."

Where's that quote from?

My understanding is that NewString constructs a new java.lang.String object from an array of Unicode characters. And that the JNI uses UTF-8 strings to represent various string types.

They may mean the same thing, every time I look at this stuff it takes me a while to get my head around how it all works.


> Thanks objects, anyway.

No worries, as long as your problem is solved I'm happy.

Though I'm a bit confused about how your solution converts CString to Java String. I guess you have a routine that goes the other way that wasn't posted.

All the best :)



0
 

Author Comment

by:tommi
ID: 6316329
> "Constructs a new java.lang.String object
> from an array of UTF-8 characters."

> Where's that quote from?

http://java.sun.com/j2se/1.3/docs/guide/jni/spec/functions.doc.html#5386
Then search for "NewStringUTF".

> Thanks objects, anyway.
>
> No worries, as long as your problem is solved I'm happy.

Yeah... I'm happy too!

> Though I'm a bit confused about how your solution converts CString to Java String. I guess you have
> a routine that goes the other way that wasn't posted.

No, I posted them all. I convert CString->String in two lines:
    jbyteArray array = env->NewByteArray(len);
    env->SetByteArrayRegion(array, 0, len, (signed char*)string);

For example, to convert a CString named VCString:
     // GetLength() counts bytes in an MBCS build; _mbstrlen() counts
     // characters, which would truncate double-byte text.
     jsize len = VCString.GetLength();
     jbyteArray array = env->NewByteArray(len);
     env->SetByteArrayRegion(array, 0, len, (const jbyte *)(LPCTSTR)VCString);

     return array; //----> Java String (need to new String(byte[]) in Java code)

> All the best :)

You too ;-)

Tommi
0
 
LVL 92

Expert Comment

by:objects
ID: 6316362
Ahhhh, NewStringUTF(), not NewString().
You've been saying NewString.
One creates from UTF8, the other from Unicode.

mick :)
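
To make the difference concrete: NewString() takes an array of 16-bit jchar values plus a length counted in UTF-16 code units, while NewStringUTF() takes a null-terminated byte string in JNI's modified UTF-8. The same text therefore has different lengths in the two forms. A small Java sketch (EncodingLengths and its methods are hypothetical names; it uses standard UTF-8, which matches modified UTF-8 for this test string but not for U+0000 or supplementary characters):

```java
import java.nio.charset.StandardCharsets;

class EncodingLengths {
    // Number of UTF-16 code units: the unit NewString() counts in.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of bytes when the same text is encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String test = "\u3088\u3046" + "A";
        System.out.println(utf16Units(test)); // 3
        System.out.println(utf8Bytes(test));  // 7
    }
}
```

Neither count matches the byte length of the same text in a multibyte code page, which is one reason passing such a buffer to either function gives a wrong result.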

0
 

Author Comment

by:tommi
ID: 6316433
> Ahhhh, NewStringUTF(), not NewString().
> You've been saying NewString.
> One creates from UTF8, the other from Unicode.

Yep! Even so, neither the UTF-8 nor the Unicode function can interpret a VC++ string correctly when it is in the platform's default character encoding. You can test it to see.

;-))
0
 
LVL 92

Expert Comment

by:objects
ID: 6316472
I believe you, I was just confused when you said NewString needed a UTF8 array as input.

I think CStrings can use Unicode or MBCS, depending on how the application is built. It sounds like yours are Unicode, and that's why MultiByteToWideChar wasn't working,
i.e. you already had Unicode.
0
 

Author Comment

by:tommi
ID: 6316544
Right, it depends on whether _MBCS or _UNICODE is defined: the CString will be MBCS or Unicode respectively. You can't define both of them. I'm using _MBCS ;-) so MultiByteToWideChar() is suitable for this case.
Are you trying to take advantage of NewString() or NewStringUTF()?
0
 
LVL 92

Expert Comment

by:objects
ID: 6316563
> Are you trying to take advantage of NewString()
> or NewStringUTF()?

Not trying either, just trying to help you and understand what was going on :-)
0
 

Author Comment

by:tommi
ID: 6316591
It's quite fun to talk with you ;-) I'm still keeping this question open to wait for your help ;-)
0
 
LVL 92

Expert Comment

by:objects
ID: 6320260
0
 
LVL 5

Expert Comment

by:vemul
ID: 7669501
No comment has been added lately, so it's time to clean up this TA.
I will leave a recommendation in the Cleanup topic area that this question is:
- To be PAQ'ed and points NOT refunded
Please leave any comments here within the next seven days.

PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER !

vemul
Cleanup Volunteer
0
 

Accepted Solution

by:
SpideyMod earned 0 total points
ID: 7714453
per recommendation

SpideyMod
Community Support Moderator @Experts Exchange
0
