C++

Avatar of G00fy
G00fy

Convert ASCII string to UTF8 string
Title explains it all ;)

I want to convert a string like "ûüâäç" etc to UTF8 encoding...

How can I do this with an easy C(++) function?



Avatar of AxterAxter🇺🇸

Your question does not match the title of your question.

Do you want to convert ASCII to UTF8?
Or do you want to convert UNICODE to UTF8?

ASKER CERTIFIED SOLUTION
Avatar of AxterAxter🇺🇸


Avatar of AxterAxter🇺🇸

The C/C++ mbstowcs function can be used to convert an ANSI string to UNICODE.

mbstowcs is more portable than MultiByteToWideChar and should work with any standard-compliant C or C++ compiler.
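
A minimal sketch of the mbstowcs approach, assuming the input is encoded in the current C locale. The helper name `to_wide` is hypothetical. For non-ASCII input the program would normally call `setlocale(LC_CTYPE, "")` first so that the user's code page is used. Note that this only produces a wide string; getting UTF8 out of it is a separate step.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Convert a multibyte string (interpreted in the current C locale)
// to a wide string with the standard mbstowcs function.
std::wstring to_wide(const std::string& narrow)
{
    // Every wide character consumes at least one input byte, so
    // narrow.size() + 1 elements (including the terminator) is always enough.
    std::vector<wchar_t> buffer(narrow.size() + 1);
    const std::size_t count =
        std::mbstowcs(buffer.data(), narrow.c_str(), buffer.size());
    if (count == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("invalid multibyte sequence for the current locale");
    }
    return std::wstring(buffer.data(), count);
}
```

Calling `to_wide("hello")` yields the wide string `L"hello"` in any locale, since plain ASCII converts the same way everywhere.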

Avatar of AxterAxter🇺🇸

To convert UNICODE string to ANSI string, check out the following link:

http://www.axter.com/faq/topic.asp?TOPIC_ID=63&FORUM_ID=4&CAT_ID=9



Avatar of AxterAxter🇺🇸

For VC++, if you want to convert an ASCII string to UTF8, you could use MultiByteToWideChar and then WideCharToMultiByte.

Use MultiByteToWideChar to convert ASCII to UNICODE, and then use WideCharToMultiByte to convert from UNICODE to UTF8.

Avatar of G00fyG00fy

ASKER

No, what I meant is to convert an ANSI string to UTF8 encoding...

so it means convert the 'ü' character (char -4) to utf8 (-62 -81 if I remember correctly)?

[btw, is it logical I did see emails coming in with replies from you, but that I didn't see the posts itself?]
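
For reference, the byte values for the 'ü' example can be worked out by hand. In ISO-8859-1 and Windows-1252, 'ü' is the single byte 0xFC (-4 as a signed char), which is code point U+00FC. Its UTF8 form is the two bytes 0xC3 0xBC, which read as -61 -68 when viewed as signed chars rather than -62 -81. The sketch below shows the general encoding rule; the function name `encode_utf8` is made up for illustration.

```cpp
#include <cstdint>
#include <string>

// Encode one Unicode code point as UTF-8 (1 to 4 bytes).
// Returns an empty string for values that cannot be encoded.
std::string encode_utf8(std::uint32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);                          // 0xxxxxxx
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));            // 110xxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));          // 10xxxxxx
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;          // surrogates are not encodable
        out += static_cast<char>(0xE0 | (cp >> 12));           // 1110xxxx
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        out += static_cast<char>(0xF0 | (cp >> 18));           // 11110xxx
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

With this rule, `encode_utf8(0xFC)` gives the two bytes 0xC3 0xBC.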

Avatar of G00fyG00fy

ASKER

Isn't there an easier way than WideCharToMultiByte & MultiByteToWideChar?

That works ... But :S It's so slow (I mean there SHOULD be something like a 3 lines function or so)



Avatar of AxterAxter🇺🇸

>>[btw, is it logical I did see emails coming in with replies from you, but that I didn't see the posts itself?]

You have to click on the link to Experts-Exchange, to see the reply.


>>so it means convert the 'ü' character (char -4) to utf8 (-62 -81 if I remember correctly)?

Did you try the functions I posted?

FYI:
'ü' is not an ASCII character.

Where are you getting this character from?  How is it introduced into your code?

Avatar of AxterAxter🇺🇸

>>That works ... But :S It's so slow (I mean there SHOULD be something like a 3 lines function or so)

What do you mean it's slow?
How do you know it's slow?
Did you run a benchmark test?

Can you post your code?

Avatar of G00fyG00fy

ASKER

I tried it, it works, but when importing like 20k lines from an ascii file, this is getting too slow for me...

The characters come to me via an ASCII file...
I read it line by line, parse it, and then convert, for example, the names of the people in it to UTF8 encoding (actually, all the non-numeric fields are converted).

And then I need it to submit it to SQLite, which is compiled in UTF8-mode



Avatar of G00fyG00fy

ASKER

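char* lijn; // holds the ANSI text that needs converting
int len = (int)strlen(lijn) + 1; // include the terminating NUL so the output is terminated too
wchar_t* lijn2 = new wchar_t[len];
MultiByteToWideChar(CP_ACP, 0, lijn, len, lijn2, len);
delete [] lijn;
int wlen = (int)wcslen(lijn2) + 1;
lijn = new char[wlen * 3]; // ugly yes :p (at most 3 UTF8 bytes per wide character)
WideCharToMultiByte(CP_UTF8, 0, lijn2, wlen, lijn, wlen * 3, NULL, NULL);
delete [] lijn2; // free the temporary wide buffer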

--> was something like that ... already ditched it
(currently going via wxWindows methods)
wxString test( lijn, wxConvLibc );
test.mb_str( wxConvUTF8 );

works OK for me ... But this also is ways too slow :(

Avatar of AxterAxter🇺🇸

>>I tried it, it works, but when importing like 20k lines from an ascii file, this is getting too slow for me...

Again, how do you know it's slow?
Did you run any type of valid test to see if it is slow?

If so, please explain.

This method should not noticeably slow down your code, since the real bottleneck will be reading the file.

Run a test with the function calls and compare it with running your code without them. I would be very surprised if you could measure a significant difference.
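
One way to run such a comparison is a small timing helper like the sketch below. The name `time_ms` and its parameters are hypothetical, and `std::chrono` requires C++11 or later. The idea is to time the file-reading loop once with the conversion calls and once without, then compare the two numbers.

```cpp
#include <chrono>

// Run `work` `repeats` times and return the elapsed wall-clock time
// in milliseconds, measured with a monotonic clock.
template <typename Work>
double time_ms(Work work, int repeats = 1)
{
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Repeating the work several times and looking at the spread between runs helps separate a real difference from disk-cache noise.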

Avatar of AxterAxter🇺🇸

>>works OK for me ... But this also is ways too slow :(

Please post your method for testing speed.



Avatar of AxterAxter🇺🇸

Why are you using UTF8 instead of wide string (UNICODE)?


Avatar of G00fyG00fy

ASKER

wxStopWatch sw;
wxMessageBox( wxString::Format( "Time elapsed: %ldms", sw.Time() ) );

This stopwatch starts before the file is read in and stops after the file has been read in...

It takes about 5.6 s to read in the file via wxString; via the other calls it takes 7.2 s...

Not a huge difference, but I think the real bottleneck is allocating the memory for the second string...

Avatar of G00fyG00fy

ASKER

Checked them all out:

      wchar_t * lijn2 = new wchar_t[MAX_BUFFER_LENGTH];
      MultiByteToWideChar(CP_ACP, 0, abuffer, strlen(abuffer)+1, lijn2,  MAX_BUFFER_LENGTH);
      WideCharToMultiByte(CP_UTF8, 0, lijn2, wcslen(lijn2)+1, abuffer, MAX_BUFFER_LENGTH, 0, NULL);

==> 1200ms <-> 1300ms

  *lijn2++ = (char)(192 + (((unsigned char)lijn[current_number]) / 64));
  *lijn2++ = (char)(128 + (((unsigned char)lijn[current_number]) % 64));
==> 1046ms <-> 1000ms


      wxString test( abuffer, wxConvLibc );
      strcpy(abuffer, test.mb_str( wxConvUTF8 ) );
==> 1360ms <-> 2703ms
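
The two-line hand-rolled variant above only shows the branch for bytes of 128 and up. A fuller sketch of the same idea follows, under one loud assumption: the input must be ISO-8859-1 (Latin-1), where each byte value equals its Unicode code point. For Windows-1252 input, the bytes 0x80 to 0x9F (the euro sign, for example) map to other code points and would come out wrong. The function name `latin1_to_utf8` is made up for illustration.

```cpp
#include <cstddef>
#include <string>

// Append the UTF-8 encoding of an ISO-8859-1 (Latin-1) buffer to `out`.
// Assumption: every input byte is a Latin-1 code point (U+0000..U+00FF).
void latin1_to_utf8(const char* in, std::size_t length, std::string& out)
{
    out.reserve(out.size() + 2 * length); // worst case: two bytes per input byte
    for (std::size_t i = 0; i < length; ++i) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            out += static_cast<char>(c);                 // ASCII passes through unchanged
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));   // same as 192 + c / 64
            out += static_cast<char>(0x80 | (c & 0x3F)); // same as 128 + c % 64
        }
    }
}
```

Taking the output string by reference lets the caller clear and reuse one buffer for every line of the file, which in practice avoids a fresh allocation per line.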



Avatar of G00fyG00fy

ASKER

PS: I took the writing to the database out of it, so it would be faster