Solved

UTF-8, string and wstring

Posted on 2014-12-10
Last Modified: 2014-12-12
Hi Experts,

I'm starting a new library and I'd like to use UTF-8.  It seems to be good enough for the majority of the world, so that's what I'll go with.  One of the libraries that my library uses stores UTF-8 in std::string.  Is there a function in the standard library that converts between string and wstring while only ever storing UTF-8?

I'm thinking I'll set up this conversion right when I have to use the string with this other library.  When reading from the other library, I'll convert to wstring.  When writing to it, I'll give it wstring converted to string.

Also, are there any caveats here?

Thanks,
Mike
Question by:thready
 
LVL 34

Assisted Solution

by:sarabande
sarabande earned 500 total points
ID: 40493336
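the standard has only limited support (C++11 added std::wstring_convert with std::codecvt_utf8 in <codecvt>, but both were deprecated in C++17), while windows can do it by converting to utf-16 first and then to the target code page.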

you may use the following function:

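#include <windows.h>
#include <cstring>

// converts a utf-8 string to the system ansi code page (CP_ACP).
// strOut receives a null-terminated result, sizOut is its capacity in bytes.
bool ConvertUtf8ToAnsi(const char * strIn, char strOut[], int sizOut)
{
    bool bok = false;
    int  len = (int)strlen(strIn);
    if (sizOut < 1)
        return false;
    strOut[0] = '\0';
    if (len == 0)
        return true;    // nothing to convert
    if (sizOut < 2)
        return false;   // no room for any character plus the terminator

    // utf-16 never needs more code units than utf-8 needs bytes
    wchar_t * pwsz = new wchar_t[len];

    int newlen = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, strIn, len, pwsz, len);
    if (newlen > 0)
    {
        // keep one byte free for the terminator; characters that do not
        // exist in the ansi code page are replaced by '?'
        newlen = WideCharToMultiByte(CP_ACP, 0, pwsz, newlen, strOut, sizOut - 1, "?", NULL);
        if (newlen > 0)
        {
            strOut[newlen] = '\0';
            bok = true;
        }
    }
    delete [] pwsz;

    if (bok == false)
    {
        //DWORD dwError = GetLastError();
        //std::cout << dwError << ", Conversion Utf8 to Ansi failed" << std::endl;
    }

    return bok;
}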



Sara
 
LVL 34

Accepted Solution

by:
sarabande earned 500 total points
ID: 40493342
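sorry, I see that you want the opposite and convert from ansi to utf8. simply exchange CP_UTF8 and CP_ACP in the conversion calls, pass NULL instead of "?" as the default character (WideCharToMultiByte requires NULL there when the code page is CP_UTF8), and pass a buffer that is at least three times as big as the input string (plus room for a terminating null), since a single ansi character can need up to three bytes in utf-8.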

Sara
 
LVL 1

Author Comment

by:thready
ID: 40495422
Hi Sara, maybe what I'm saying doesn't make sense.  I have a UTF-8 encoded string, whose individual characters I obviously cannot access without decoding the byte ranges myself.  It is given to me by the library I am using.  Now, I don't want to change the encoding, I just want to store this UTF-8 string in a wstring instead, so that I can access individual characters.  Does this make sense?  Why would it be incorrect to create my wstring like so?

string s =  [some UTF-8 encoded string];
wstring w(s.begin(), s.end());

Thanks again!
Mike
 
LVL 34

Assisted Solution

by:sarabande
sarabande earned 500 total points
ID: 40495666
utf-8 and utf-16 are very different. utf-16 uses two bytes for most characters, regardless of whether it is an ascii character or an Arabic or Chinese letter (characters outside the Basic Multilingual Plane take four bytes, stored as a surrogate pair). utf-8 is a multi-byte encoding which uses 1 byte for ascii and 2 to 4 bytes for any other character. so apart from the ascii characters (codes 0 ... 127 decimal) the two encodings have nothing in common, and any translation from one to the other needs a real conversion, which is not trivial. copying the bytes one by one into a wstring only widens each byte, it does not decode anything.
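To make the difference concrete, here is a minimal portable sketch (not taken from any library in this thread). `widen_bytes` does exactly what the `wstring w(s.begin(), s.end())` line from the question does, and `utf8_to_code_points` is a hypothetical hand-written decoder added purely for illustration. It assumes C++11 and is deliberately simplified: it does not reject overlong encodings or surrogate code points.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// What wstring w(s.begin(), s.end()) does: every byte becomes its own
// wide character, so multi-byte sequences are split apart, not decoded.
std::wstring widen_bytes(const std::string& s)
{
    return std::wstring(s.begin(), s.end());
}

// Decode a UTF-8 byte string into Unicode code points (one char32_t each).
// Throws std::invalid_argument on truncated or malformed input.
// Simplified: overlong forms and surrogate code points are not rejected.
std::u32string utf8_to_code_points(const std::string& in)
{
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes that follow
        if      (lead < 0x80)           { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);  // append the low 6 payload bits
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

For the six-byte input "h\xC3\xA9llo" (the word "héllo"), `widen_bytes` produces six elements while `utf8_to_code_points` produces five, which is why indexing into the byte-wise copy does not give you characters.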

nevertheless there are a lot of libraries available which could do that. one of the oldest is the winapi where you could call MultiByteToWideChar(CP_UTF8, ...) for doing the translation.
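As a rough sketch of the winapi route, the two helpers below wrap MultiByteToWideChar and WideCharToMultiByte with std::string and std::wstring. The names `Utf8ToWide` and `WideToUtf8` are made up for this example, and the code assumes a Windows build where wchar_t holds UTF-16 code units. Each helper calls the API twice: once with a zero-sized output buffer to get the required length, then again to convert.

```cpp
#ifdef _WIN32   // Windows-only sketch; compiles to nothing elsewhere
#include <windows.h>
#include <cstddef>
#include <stdexcept>
#include <string>

// UTF-8 bytes -> UTF-16 std::wstring.
std::wstring Utf8ToWide(const std::string& utf8)
{
    if (utf8.empty()) return std::wstring();
    const int inLen = static_cast<int>(utf8.size());
    // First call: ask how many UTF-16 code units are needed.
    const int outLen = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                           utf8.data(), inLen, NULL, 0);
    if (outLen <= 0) throw std::runtime_error("invalid UTF-8 input");
    std::wstring wide(static_cast<std::size_t>(outLen), L'\0');
    // Second call: convert into the correctly sized buffer.
    if (MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                            utf8.data(), inLen, &wide[0], outLen) != outLen)
        throw std::runtime_error("UTF-8 to UTF-16 conversion failed");
    return wide;
}

// UTF-16 std::wstring -> UTF-8 bytes, e.g. before handing text back
// to a library that expects UTF-8 in std::string.
std::string WideToUtf8(const std::wstring& wide)
{
    if (wide.empty()) return std::string();
    const int inLen = static_cast<int>(wide.size());
    // The last two arguments must be NULL when the code page is CP_UTF8.
    const int outLen = WideCharToMultiByte(CP_UTF8, 0, wide.data(), inLen,
                                           NULL, 0, NULL, NULL);
    if (outLen <= 0) throw std::runtime_error("invalid UTF-16 input");
    std::string utf8(static_cast<std::size_t>(outLen), '\0');
    if (WideCharToMultiByte(CP_UTF8, 0, wide.data(), inLen,
                            &utf8[0], outLen, NULL, NULL) != outLen)
        throw std::runtime_error("UTF-16 to UTF-8 conversion failed");
    return utf8;
}
#endif
```

Note that even after conversion, one wchar_t is not always one character: code points above U+FFFF occupy two elements of the wstring on Windows.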

Sara
 
LVL 1

Author Closing Comment

by:thready
ID: 40495672
Thank you Sara
