Solved

UTF-8, string and wstring

Posted on 2014-12-10
Last Modified: 2014-12-12
Hi Experts,

I'm starting a new library and I'd like to use UTF-8. It seems to be good enough for the majority of the world, so that's what I'll go with. One of the libraries that my library uses stores UTF-8 in std::string. Is there a function in the standard library that converts to and from string and wstring, where the string side only ever holds UTF-8?

I'm thinking I'll do this conversion right where I have to use the string with this other library. When reading from the other library, I'll convert to wstring. When writing to it, I'll give it my wstring converted back to a string.

Also, are there any caveats here?

Thanks,
Mike
Question by:thready

Assisted Solution

by:sarabande
ID: 40493336
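as far as I know the standard library's only option is std::wstring_convert from C++11's <codecvt>, and that has been deprecated since C++17. windows, however, can convert through utf-16: MultiByteToWideChar turns utf-8 into utf-16, and WideCharToMultiByte turns utf-16 into a target code page (ansi in the function below).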

you may use the following function:

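#include <windows.h>
#include <vector>

// Converts a NUL-terminated utf-8 string to the current ansi code page (CP_ACP)
// by way of utf-16. Characters that do not exist in the ansi code page become '?'.
// Returns false for invalid utf-8 or if strOut (sizOut bytes) is too small;
// GetLastError() tells which.
bool ConvertUtf8ToAnsi(const char * strIn, char strOut[], int sizOut)
{
    // cbMultiByte = -1: the input is NUL-terminated and the result includes the NUL,
    // so the output is always terminated and an empty string converts correctly.
    // The first call only measures the required number of wchar_t.
    int wlen = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, strIn, -1, NULL, 0);
    if (wlen <= 0)
        return false;

    std::vector<wchar_t> wbuf(wlen);
    if (MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, strIn, -1, wbuf.data(), wlen) <= 0)
        return false;

    // utf-16 -> ansi; with -1 the result written to strOut is NUL-terminated as well
    int newlen = WideCharToMultiByte(CP_ACP, 0, wbuf.data(), -1, strOut, sizOut, "?", NULL);
    return newlen > 0;
}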



Sara

Accepted Solution

by:sarabande
ID: 40493342
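sorry, I see that you want the opposite: converting from ansi to utf-8. in that case exchange CP_UTF8 and CP_ACP in the two conversion calls, and also pass NULL instead of "?" as the default character, because WideCharToMultiByte fails when CP_UTF8 is combined with a default character. the output buffer should be at least three times as big as the input string plus one byte for the terminating zero, since a single ansi character can need up to 3 bytes in utf-8 (for example the euro sign of code page 1252).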
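The ansi-to-utf-8 direction can also be shown portably, without the winapi. This is a minimal sketch under one big assumption: the "ansi" input is ISO-8859-1 (Latin-1), where every byte value equals its Unicode code point. Windows code page 1252 differs from Latin-1 in 0x80..0x9F (0x80 is the euro sign there), so the hypothetical helper `Latin1ToUtf8` below is not a drop-in replacement for `MultiByteToWideChar(CP_ACP, ...)`. It does show why the output grows: every byte from 0x80 upward becomes two utf-8 bytes.

```cpp
#include <string>

// Hypothetical helper: converts ISO-8859-1 (Latin-1) text to UTF-8.
// Assumption: each input byte IS the Unicode code point (U+0000..U+00FF).
// This does not hold for Windows code page 1252 in the range 0x80..0x9F.
std::string Latin1ToUtf8(const std::string& in)
{
    std::string out;
    out.reserve(in.size() * 2);  // a Latin-1 byte needs at most 2 UTF-8 bytes
    for (unsigned char c : in)
    {
        if (c < 0x80)
        {
            out += static_cast<char>(c);                  // ASCII: copied unchanged
        }
        else
        {
            out += static_cast<char>(0xC0 | (c >> 6));    // lead byte 110xxxxx
            out += static_cast<char>(0x80 | (c & 0x3F));  // continuation 10xxxxxx
        }
    }
    return out;
}
```

For example, `Latin1ToUtf8("caf\xE9")` returns the 5-byte utf-8 string `"caf\xC3\xA9"` ("café").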

Sara

Author Comment

by:thready
ID: 40495422
Hi Sara, maybe what I'm saying doesn't make sense. I have a UTF-8 encoded string whose individual characters I obviously cannot access without examining the byte ranges myself. This string is given to me by the library I am using. Now, I don't want to change the encoding; I just want to store this UTF-8 string in a wstring instead, so that I can access individual characters. Does this make sense? Why would it be incorrect to create my wstring like so?

string s =  [some UTF-8 encoded string];
wstring w(s.begin(), s.end());

Thanks again!
Mike

Assisted Solution

by:sarabande
ID: 40495666
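utf-8 and utf-16 are very different. utf-16 uses two bytes for most characters, whether it is an ascii character or an Arabic or Chinese letter, but characters outside the Basic Multilingual Plane (for example emoji) take four bytes, a so-called surrogate pair. utf-8 is a multi-byte encoding which uses 1 byte for ascii and 2 to 4 bytes for any other character. so apart from the ascii characters (codes 0 ... 127 decimal) the two encodings have nothing in common, and translating from one to the other needs a real conversion, which is not trivial. your wstring w(s.begin(), s.end()) does no conversion at all: it copies each byte into its own wchar_t, so 'é' (utf-8 bytes 0xC3 0xA9) becomes two meaningless wchar_t elements instead of the single character U+00E9.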
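To make the non-trivial part concrete, here is a minimal sketch of what a utf-8 decoder has to do; `DecodeUtf8` is a hypothetical helper, not a library function. It decodes into `std::u32string` rather than `std::wstring`, assuming you want exactly one element per character on every platform: `wchar_t` is 16-bit utf-16 on Windows, where characters outside the Basic Multilingual Plane still take two elements. Validation is deliberately minimal (overlong encodings and surrogate code points are not rejected).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes UTF-8 into UTF-32, one char32_t per code point.
// The lead byte's high bits give the sequence length:
//   0xxxxxxx -> 1 byte (ASCII)   110xxxxx -> 2 bytes
//   1110xxxx -> 3 bytes          11110xxx -> 4 bytes
// Every following (continuation) byte must look like 10xxxxxx.
// Minimal validation only: overlong forms and surrogates are accepted.
std::u32string DecodeUtf8(const std::string& s)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < s.size())
    {
        unsigned char b = static_cast<unsigned char>(s[i]);
        int extra;     // number of continuation bytes after the lead byte
        char32_t cp;   // code point being assembled
        if (b < 0x80)                { extra = 0; cp = b; }
        else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (s.size() - i < static_cast<std::size_t>(extra) + 1)
            throw std::runtime_error("truncated UTF-8 sequence");

        for (int k = 1; k <= extra; ++k)
        {
            unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);  // append 6 payload bits
        }
        out.push_back(cp);
        i += static_cast<std::size_t>(extra) + 1;
    }
    return out;
}
```

For `std::string s = "caf\xC3\xA9";` ("café"), `std::wstring w(s.begin(), s.end())` has 5 elements because every byte is widened separately, while `DecodeUtf8(s)` has 4: 'c', 'a', 'f' and U+00E9.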

nevertheless there are a lot of libraries available which can do that. one of the oldest is the winapi, where you can call MultiByteToWideChar(CP_UTF8, ...) to do the translation into utf-16.
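For the string/wstring conversion the question asks about, a portable alternative to the winapi is C++11's `std::wstring_convert`. A minimal sketch follows; `Utf8ToWide` and `WideToUtf8` are hypothetical helper names. `std::wstring_convert` and the `<codecvt>` facets are deprecated since C++17, so compilers may warn. The facet is chosen per platform because `wchar_t` holds utf-16 on Windows but utf-32 on Linux and macOS, which also means a character outside the Basic Multilingual Plane still occupies two `wchar_t` on Windows.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Deprecated since C++17, but still provided by the major standard libraries.
#ifdef _WIN32
using Utf8Facet = std::codecvt_utf8_utf16<wchar_t>;  // UTF-8 <-> UTF-16 (16-bit wchar_t)
#else
using Utf8Facet = std::codecvt_utf8<wchar_t>;        // UTF-8 <-> UCS-4 (32-bit wchar_t)
#endif

// Hypothetical helper: UTF-8 std::string -> std::wstring.
// Throws std::range_error if the input is not valid UTF-8.
std::wstring Utf8ToWide(const std::string& utf8)
{
    std::wstring_convert<Utf8Facet> conv;
    return conv.from_bytes(utf8);
}

// Hypothetical helper: std::wstring -> UTF-8 std::string.
// Throws std::range_error if the wide string cannot be encoded.
std::string WideToUtf8(const std::wstring& wide)
{
    std::wstring_convert<Utf8Facet> conv;
    return conv.to_bytes(wide);
}
```

With these, the plan from the question becomes `Utf8ToWide` when reading from the other library and `WideToUtf8` when writing to it; the utf-8 string "caf\xC3\xA9" becomes a 4-element wstring and converts back to the same 5 bytes.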

Sara

Author Closing Comment

by:thready
ID: 40495672
Thank you Sara