Solved

UTF-8, string and wstring

Posted on 2014-12-10
Last Modified: 2014-12-12
Hi Experts,

I'm starting a new library and I'd like to use UTF-8.  It seems to be good enough for the majority of the world, so that's what I'll go with.  One of the libraries that my library uses stores UTF-8 in std::string.  Is there a function in the standard library that converts between string and wstring while the stored data is always UTF-8?

I'm thinking I'll set up this conversion right when I have to use the string with this other library.  When reading from the other library, I'll convert to wstring.  When writing to it, I'll give it wstring converted to string.

Also, are there any caveats here?

Thanks,
Mike
Question by:thready
 
LVL 32

Assisted Solution

by:sarabande
sarabande earned 500 total points
the standard has no direct conversion (as far as I know), but windows does: you convert to utf-16 with MultiByteToWideChar and from there to the target encoding with WideCharToMultiByte.

you may use the following function:

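#include <windows.h>
#include <cstring>

// converts a null-terminated utf-8 string to the system ansi code page.
// strOut receives at most sizOut bytes, including the terminating '\0'.
bool ConvertUtf8ToAnsi(const char * strIn, char strOut[], int sizOut)
{
    bool bok = false;
    int  len = (int)strlen(strIn);

    if (sizOut < 2) return false;   // need room for one character plus '\0'
    strOut[0] = '\0';
    if (len == 0) return true;      // empty input, nothing to convert

    // utf-16 never needs more code units than the utf-8 input has bytes
    wchar_t * pwsz = new wchar_t[len+1];

    int newlen = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, strIn, len, pwsz, len+1);
    if (newlen > 0)
    {
        // keep one byte free for the terminator; unmappable characters become '?'
        newlen = WideCharToMultiByte(CP_ACP, 0, pwsz, newlen, strOut, sizOut - 1, "?", NULL);
        if (newlen > 0)
        {
            strOut[newlen] = '\0';  // the api does not terminate when given an explicit length
            bok = true;
        }
    }
    delete [] pwsz;

    if (bok == false)
    {
        //DWORD dwError = GetLastError();
        //std::cout << dwError << ", Conversion Utf8 to Ansi failed" << std::endl;
    }

    return bok;
}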


Sara
 
LVL 32

Accepted Solution

by:
sarabande earned 500 total points
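sorry, I see that you want the opposite and convert from ansi to utf8. simply exchange the CP_UTF8 and CP_ACP in the conversion calls and pass an output buffer that is at least three times as big as the input string, because a single ansi byte can need up to three bytes in utf-8. note that WideCharToMultiByte requires the default character argument (the "?") to be NULL when the target is CP_UTF8.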

Sara
 
LVL 1

Author Comment

by:thready
Hi Sara, maybe what I'm saying doesn't make sense.  I have a UTF-8 encoded string, in which I obviously cannot access individual characters without looking at the byte ranges myself.  This string is given to me by the library I am using.  Now, I don't want to change the encoding, I just want to store this UTF-8 string in a wstring instead, so that I can access individual characters.  First, does this make sense?  Why would it be incorrect to create my wstring like so?

string s =  [some UTF-8 encoded string];
wstring w(s.begin(), s.end());

Thanks again!
Mike
 
LVL 32

Assisted Solution

by:sarabande
sarabande earned 500 total points
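utf-8 and utf-16 are quite different. utf-16 uses two bytes (one 16-bit code unit) for each character of the basic multilingual plane, regardless of whether it is an ascii character or an Arabic or Chinese letter, and four bytes (a surrogate pair) for characters outside it. utf-8 is a multi-byte encoding which uses 1 byte for ascii and 2 to 4 bytes for any other character. so apart from the ascii characters (code 0 ... 127 decimal) the two encodings have nothing in common, and any translation from one to the other needs a real conversion which is not trivial. constructing the wstring from s.begin() and s.end() only widens each byte to a wchar_t, so every non-ascii character ends up spread over 2 to 4 elements instead of one.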
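to make the difference concrete, here is a minimal portable sketch of what a real conversion has to do. DecodeUtf8 is a name made up for this illustration, not a standard function. it targets std::u32string (one element per code point) rather than wstring, because wchar_t is 16 bits on windows and 32 bits on most other platforms. it does not reject overlong encodings or surrogate code points, so treat it as an illustration rather than a validating decoder.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into one char32_t per code point.
// Throws std::runtime_error on an invalid lead byte, a bad continuation
// byte, or a sequence that is cut off at the end of the input.
std::u32string DecodeUtf8(const std::string& utf8)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp;
        std::size_t extra;   // number of continuation bytes that follow

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (extra > utf8.size() - i - 1)
            throw std::runtime_error("truncated UTF-8 sequence");

        // Each continuation byte has the form 10xxxxxx and adds 6 bits.
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

for the two bytes "\xC3\xA9" (the letter é) the iterator-range wstring constructor produces two elements, 0xC3 and 0xA9, while DecodeUtf8 produces the single code point 0xE9.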

nevertheless there are a lot of libraries available which could do that. one of the oldest is the winapi where you could call MultiByteToWideChar(CP_UTF8, ...) for doing the translation.
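as a rough sketch of the winapi route, the call can be made twice: once to ask for the required length and once to convert. Utf8ToWide is a name chosen here for illustration. the code is windows-only, so it is wrapped in a guard that lets the file compile on other platforms.

```cpp
#include <cstddef>
#include <string>

#ifdef _WIN32   // Windows-only; the guard keeps the snippet compiling elsewhere
#include <windows.h>

// Hypothetical helper: convert UTF-8 bytes to UTF-16 in a std::wstring
// (wchar_t is 16 bits on Windows). Returns an empty string for empty or
// invalid input; GetLastError() distinguishes the two cases.
std::wstring Utf8ToWide(const std::string& utf8)
{
    if (utf8.empty()) return std::wstring();
    const int inLen = static_cast<int>(utf8.size());

    // First call: output size 0 asks how many wchar_t units are needed.
    int needed = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                     utf8.data(), inLen, NULL, 0);
    if (needed <= 0) return std::wstring();

    // Second call: convert into a buffer of exactly that size.
    std::wstring wide(static_cast<std::size_t>(needed), L'\0');
    int written = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                      utf8.data(), inLen, &wide[0], needed);
    if (written != needed) return std::wstring();
    return wide;
}
#endif
```

the result is utf-16, so characters outside the basic multilingual plane still occupy two wchar_t elements, and indexing the wstring gives code units rather than whole characters in that case.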

Sara
 
LVL 1

Author Closing Comment

by:thready
Thank you Sara