Solved

Multibyte characters -> Unicode

Posted on 2001-07-24
Last Modified: 2007-12-19
Hi,

I need to convert a CString (VC++) to a String (Java). The CString is encoded in a multibyte character set (MBCS). I can use MultiByteToWideChar() to convert MBCS to Unicode in VC++, but when I construct a new java.lang.String from the converted Unicode string through the JNI function NewString(), the returned String is not Unicode.

What is the difference between Unicode in VC++ and Java? How can I convert Unicode/MBCS correctly from VC++ to Java and vice versa?

Thanks,
Tommi
Question by:tommi

Expert Comment

by:objects
> when I construct a new java.lang.String

Exactly what string are you passing to the String ctor?

> the return String is not Unicode.

All Java Strings are Unicode.
What do you mean when you say it is not Unicode?




Author Comment

by:tommi
> when I construct a new java.lang.String
> Exactly what string are you passing to the String ctor?

The string I got after running MultiByteToWideChar().

> the return String is not Unicode.
> All Java Strings are Unicode.
> What do you mean when you say it is not Unicode?

My explanation was unclear ;-) You are correct, a Java String is Unicode. I mean that NewString() should recognize my VC++ Unicode string and convert it to a String correctly, but it seems to copy it byte by byte only, whereas UTF-16 uses 16 bits per code unit. So do I need to express this string in \uXXXX format again in Java? And how can I do that?

Thanks,
Tommi
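
On the \uXXXX question: a Java String already holds UTF-16 code units, and \uXXXX is only a way of writing them in source code. A minimal sketch, assuming a hypothetical helper class named UnicodeEscapes (not part of the thread), that prints the code units a String contains so they can be compared with what the native side passed in:

```java
// Hypothetical helper: shows the UTF-16 code units a Java String holds.
class UnicodeEscapes {
    // Renders every UTF-16 code unit of s as a backslash-u escape with four hex digits.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append(String.format("\\u%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same test string that is used later in the thread.
        System.out.println(escape("\u3088\u3046A"));
    }
}
```

If the escapes printed for a string that came back from JNI differ from the expected code points, the mismatch happened before NewString() was called, not inside Java.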

Expert Comment

by:objects
NewString expects a pointer to a Unicode string.

The conversion should look something like this:


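wchar_t buf[1024];

// With an input length of -1 the return value is the number of wide
// characters written, including the terminating null; 0 means failure.
int wlen = MultiByteToWideChar(
    CP_ACP,               // code page
    MB_ERR_INVALID_CHARS, // options
    s,                    // string to map
    -1,                   // null-terminated
    &buf[0],              // wide-character buffer address
    sizeof(buf) / sizeof(buf[0]) // buffer size in wide characters, not bytes
);

// Create java string; the length is in UTF-16 units, so use the converted
// length rather than strlen(s), which counts bytes of the MBCS input.
jstring result = (wlen > 0) ? env->NewString((const jchar*)&buf[0], wlen - 1) : NULL;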


I'll have a closer look at the problem tomorrow.


Author Comment

by:tommi
That is exactly what I did. But I don't know what the difference in Unicode format is between VC++ and Java, and that's why the converted String did not match my Unicode CString.

Regards,
Tommi

Expert Comment

by:objects
> I certainly did it.

Are you saying your code is the same as above?
Is it possible for you to post your code?

> But I don't know what the difference in Unicode format
> between VC++ and Java.

One would hope there is no difference as Unicode is a standard.


One thing to try would be to pick a small test string and print out the actual underlying bytes after each conversion step.
This may give you a better idea of what's happening.
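
The byte-dump idea can be sketched on the Java side as follows. This is only an illustration: the class name ByteDump and the choice of encodings are assumptions, and the same hex output would have to be produced on the C++ side for comparison.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for comparing bytes at each conversion step.
class ByteDump {
    // Formats bytes as space-separated two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String test = "\u3088\u3046A";
        // UTF-16LE matches the in-memory layout of a Windows wide string.
        System.out.println("UTF-16LE: " + hex(test.getBytes(StandardCharsets.UTF_16LE)));
        System.out.println("UTF-8:    " + hex(test.getBytes(StandardCharsets.UTF_8)));
        // Platform default; the result depends on the machine and JVM settings.
        System.out.println("default:  " + hex(test.getBytes()));
    }
}
```

Comparing the UTF-16LE line with a dump of the buffer filled by MultiByteToWideChar() shows quickly whether the wide string is what Java expects.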

Author Comment

by:tommi
I looked at the Java documentation and found that the JNI function NewString() "Constructs a new java.lang.String object from an array of UTF-8 characters." The VC++ string I passed in is encoded in the "platform's default character encoding" (it's not UTF-8, though I don't know what its format is). That's the difference ;-)

How did I solve it? Java's String.getBytes() will "convert this String into bytes according to the platform's default character encoding, storing the result into a new byte array", and that's what I need ;-) I pass the result to JNI as a byte[] and VC++ recognizes it. I do the same when returning a VC++ string to Java: there is no need to run MultiByteToWideChar(), and I use the Java constructor String(byte[]) to convert.

I'm posting my test code here in the hope that it helps someone who encounters the same problem ;-) I'm testing on Japanese Windows.

Thanks objects, anyway.

Tommi

//Java code:
class Testing {
     static {
          System.loadLibrary("MyDLL");
     }

     public native byte[] test(byte[] string);

     public static void main(String [] args) {
          Testing prn = new Testing();

          String teststring = "\u3088\u3046" + "A";
          System.out.println("in java1: "+teststring);
          byte[] bytes = prn.test(teststring.getBytes());
          System.out.println("in java2: "+new String(bytes));
     }
}

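//VC++ code:
#include <jni.h>
#include <stdio.h>
#include <string.h>

// The function name must match the Java class that declares the native method (Testing).
JNIEXPORT jbyteArray JNICALL Java_Testing_test
  (JNIEnv * env, jobject obj, jbyteArray byteArray)
{
     jbyte *bytes = env->GetByteArrayElements(byteArray, NULL);
     int len = env->GetArrayLength(byteArray);

     // The Java array is not null-terminated, so copy exactly len bytes
     // and terminate the local copy, not the JNI buffer.
     char*  string = new char[len+1];
     memcpy(string, bytes, len);
     string[len]='\0';
     printf("\nin C++: %s\n", string);

     jbyteArray array = env->NewByteArray(len);
     env->SetByteArrayRegion(array, 0, len, (const jbyte*)string);

     delete [] string;

     return array;
}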
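
One caveat about the test above: getBytes() and new String(byte[]) both use the JVM's default charset, so the round trip only works where that charset matches what the native code expects and can represent the text. Since JDK 18 the default charset is UTF-8 unless configured otherwise, which may no longer match the Windows ANSI code page. A minimal sketch, assuming a hypothetical class named CharsetRoundTrip, that names the charset explicitly and shows what a charset without Japanese characters does to the test string:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo of explicit charsets instead of the platform default.
class CharsetRoundTrip {
    // Encodes s to bytes and decodes it again with the same charset.
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String test = "\u3088\u3046A";
        System.out.println("default charset: " + Charset.defaultCharset());
        // UTF-8 can represent every character, so the round trip is lossless.
        System.out.println(roundTrip(test, StandardCharsets.UTF_8).equals(test));
        // ISO-8859-1 has no Japanese characters; getBytes substitutes '?'.
        System.out.println(roundTrip(test, StandardCharsets.ISO_8859_1));
    }
}
```

Passing the same Charset on both the encoding and decoding side keeps the behavior independent of the machine the code runs on.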

Author Comment

by:tommi
I missed one line before returning the array ;-)

   //.....
   env->ReleaseByteArrayElements(byteArray, bytes, 0); //added
   return array;
}

Expert Comment

by:objects
> "Constructs a new java.lang.String object
> from an array of UTF-8 characters."

Where's that quote from?

My understanding is that NewString constructs a new java.lang.String object from an array of Unicode characters. And that the JNI uses UTF-8 strings to represent various string types.

They may mean the same thing, every time I look at this stuff it takes me a while to get my head around how it all works.


> Thanks objects, anyway.

No worries, as long as your problem is solved I'm happy.

Though I'm a bit confused about how your solution converts CString to Java String. I guess you have a routine that goes the other way that wasn't posted.

All the best :)




Author Comment

by:tommi
> "Constructs a new java.lang.String object
> from an array of UTF-8 characters."
> Where's that quote from?

http://java.sun.com/j2se/1.3/docs/guide/jni/spec/functions.doc.html#5386
Then search for "NewStringUTF".

> Thanks objects, anyway.
> No worries, as long as your problem is solved I'm happy.

Yeah... I'm happy too!

> Though I'm a bit confused about how your solution converts CString to Java String. I guess you have
> a routine that goes the other way that wasn't posted.

No, I posted everything. I convert CString->String in two lines:
    jbyteArray array = env->NewByteArray(len);
    env->SetByteArrayRegion(array, 0, len, (signed char*)string);

For example, to convert a CString VCString:
     // GetLength() is the byte count in an MBCS build; _mbstrlen() would
     // count characters and truncate double-byte text.
     int len = VCString.GetLength();
     jbyteArray array = env->NewByteArray(len);
     env->SetByteArrayRegion(array, 0, len, (const jbyte *)(LPCTSTR)VCString);

     return array; //----> Java String (needs new String(byte[]) in the Java code)

> All the best :)

You too ;-)

Tommi
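
On the Java side, the byte[] returned by the native method has to be decoded with the code page the C++ side used, and its length is a byte count, not a character count. A minimal sketch under stated assumptions: the class name NativeBytes is hypothetical, and Shift_JIS is only a guess at the code page of a Japanese Windows machine (the thread never names it), so the code falls back to UTF-8 when the JVM does not provide that charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: decoding native MBCS bytes with a named charset.
class NativeBytes {
    // Decodes bytes received from native code using an explicitly named charset.
    static String decode(byte[] fromNative, Charset cs) {
        return new String(fromNative, cs);
    }

    public static void main(String[] args) {
        // Assumption: the native side uses a Shift_JIS code page.
        Charset cs = Charset.isSupported("Shift_JIS")
                ? Charset.forName("Shift_JIS")
                : StandardCharsets.UTF_8;
        String test = "\u3088\u3046A";
        byte[] bytes = test.getBytes(cs);
        // The character count and the byte count differ for double-byte text.
        System.out.println(cs + ": " + test.length() + " chars, " + bytes.length + " bytes");
        System.out.println(decode(bytes, cs).equals(test));
    }
}
```

The difference between the two counts is why the length passed to NewByteArray() must be measured in bytes.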

Expert Comment

by:objects
Ahhhh, NewStringUTF(), not NewString().
You've been saying NewString.
One creates from UTF8, the other from Unicode.

mick :)
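
To make the distinction concrete: NewString() takes UTF-16 code units, while NewStringUTF() takes what the JNI specification calls modified UTF-8, which differs from standard UTF-8 in how it encodes U+0000 and supplementary characters. A minimal sketch, assuming a hypothetical class named ModifiedUtf8 and using DataOutputStream.writeUTF(), which is documented to write the same modified format, to show the difference from pure Java:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo of modified UTF-8 versus standard UTF-8.
class ModifiedUtf8 {
    // Returns the modified UTF-8 form of s as written by DataOutputStream.writeUTF,
    // with the leading two-byte length prefix removed.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    public static void main(String[] args) {
        String s = "A\u0000";
        // Standard UTF-8 encodes U+0000 as a single zero byte.
        System.out.println(Arrays.toString(s.getBytes(StandardCharsets.UTF_8)));
        // Modified UTF-8 encodes U+0000 as two bytes, so the data has no embedded zero.
        System.out.println(Arrays.toString(modifiedUtf8(s)));
    }
}
```

Neither form is the Windows ANSI code page, so MBCS text passed straight to either function will not decode correctly unless it is pure ASCII.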


Author Comment

by:tommi
> Ahhhh, NewStringUTF(), not NewString().
> You've been saying NewString.
> One creates from UTF8, the other from Unicode.

Yep! Even so, neither the UTF-8 nor the Unicode function can recognize a VC++ string correctly when it is in the platform's default character encoding. You can test it and see.

;-))

Expert Comment

by:objects
I believe you, I was just confused when you said NewString needed a UTF8 array as input.

I think CStrings can use Unicode or MBCS depending on how the application is built. Sounds like yours are Unicode, and that's why MultiByteToWideChar wasn't working,
i.e. you already had Unicode.

Author Comment

by:tommi
Right, it depends on whether _MBCS or _UNICODE is defined; the CString will be MBCS or Unicode respectively, and you can't define both. I'm using _MBCS ;-) so MultiByteToWideChar() is suitable for this case.
Are you trying to take advantage of NewString() or NewStringUTF()?

Expert Comment

by:objects
> Are you trying to take advantage of NewString()
> or NewStringUTF()?

Not trying either, just trying to help you and understand what was going on :-)

Author Comment

by:tommi
It's quite fun to talk with you ;-) I'm keeping this question open while I wait for your help ;-)

Expert Comment

by:vemul
No comment has been added lately, so it's time to clean up this TA.
I will leave a recommendation in the Cleanup topic area that this question is:
- To be PAQ'ed and points NOT refunded
Please leave any comments here within the next seven days.

PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER !

vemul
Cleanup Volunteer

Accepted Solution

by: SpideyMod
per recommendation

SpideyMod
Community Support Moderator @Experts Exchange