Reading/writing wide character files with wifstream/wofstream

I've just tried the following code with VC 7.1 on Windoze and GCC 3.2 on Linux, and in both environments it generates a file of three bytes:
--------8<--------
#include <iostream>
#include <fstream>
#include <string>

int main()
{
      const char filename[] = "three_wchars.txt";
      std::wofstream file(filename);
      if (!file) {
            std::cerr << "Error: Unable to create " << filename << '\n';
            return 1;
      }
      std::wstring wstr(L"abc");
      file << wstr;
      file.close();
}
--------8<--------
I was expecting to get a 6 byte file.

What's going on?
Asked by rstaveley

rstaveley (Author) commented:
If you've got an answer, you'll probably be able to sort out theblip's question at http:/Q_20796791.html too.
rstaveley (Author) commented:
Looks like DanRollins found what I'm after here....

http:/Q_20318167.html#7124067
rstaveley (Author) commented:
...but I can't get it to work.

Here's my best shot at re-implementing his approach. It writes three wide characters, which I was hoping would show up as a 6-byte file, but I still get 3 bytes on GCC 3.2 and VC 7.1:
--------8<--------
#include <iostream>
#include <fstream>
#include <string>
#include <locale>

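// Pass-through facet: do_always_noconv() returns true, i.e. it claims that no
// conversion is needed between wchar_t (internal) and char (external).
class Simple_codecvt : public std::codecvt<wchar_t,char,mbstate_t> {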
public:
        typedef wchar_t _E;
        typedef char _To;
        typedef mbstate_t _St;
        explicit Simple_codecvt(size_t _R = 0) : std::codecvt<wchar_t,char,mbstate_t>(_R) {}
protected:
        virtual result do_in(_St& _State,const _To *_F1,const _To *_L1,const _To *& _Mid1,_E *_F2,_E *_L2,_E *& _Mid2) const {return noconv;}
        virtual result do_out(_St& _State,const _E *_F1,const _E *_L1,const _E *& _Mid1,_To *_F2, _To *_L2,_To *& _Mid2) const {return noconv;}
        virtual result do_unshift(_St& _State,_To *_F2, _To *_L2,_To *& _Mid2) const {return noconv;}
        virtual int do_length(_St& _State, const _To *_F1,const _To *_L1, size_t _N2) const throw() {return (_N2 < (size_t)(_L1-_F1)?_N2 :_L1 - _F1);}
        virtual bool do_always_noconv() const throw() {return true;}
        virtual int do_max_length() const throw() {return 2;}
        virtual int do_encoding() const throw() {return 2;}
};

int main()
{
        try {
                std::locale loc(std::locale::classic(),new Simple_codecvt);
                std::wofstream file;
                file.imbue(loc);
                file.open("three_wchars.txt");
                if (!file) {
                        std::cerr << "Error: Unable to create file\n";
                        return 1;
                }

                //std::wstring wstr(L"abc");
                //file << wstr /*<< std::endl*/;

                file << L"123";

                file.close();
        }
        catch (const std::exception& e) {
                std::cerr << "Exception: " << e.what() << std::endl;
        }
}
--------8<--------
DanRollins commented:
I think I figured it out.  The code I provided works, but this doesn't even compile.  That could be part of the problem.

-- Dan
rstaveley (Author) commented:
Thanks for looking at it, Dan. Sorry to drag this out of the archives!

This compiles for me on VC 7.1, whereas yours didn't. However, it doesn't do what I want it to do :-)

Presumably you were using VC 6.0 (judging by the date)?

I'll try yours on VC 6.0, which I should still have hereabouts.
rstaveley (Author) commented:
Yes... yours works for VC6.

The _ADDFAC macro isn't supported on VC7.1. My attempt to implement it was probably what was wrong.
rstaveley (Author) commented:
Ah... I'd put a wchar_t in the external representation.

The following compiles and works in VC 7.1 and VC 6.0. Unsurprisingly, however, bearing in mind the leading underscore, the _ADDFAC macro isn't supported by GCC. I wonder what the portable way of doing this is?
--------8<--------
#include <iostream>
#include <fstream>
#include <string>
#include <locale>

class Simple_codecvt : public std::codecvt<wchar_t,char,mbstate_t> {
public:
        typedef wchar_t _E;
        typedef char _To;
        typedef mbstate_t _St;
        explicit Simple_codecvt(size_t _R = 0) : std::codecvt<wchar_t,char,mbstate_t>(_R) {}
protected:
        virtual result do_in(_St& _State,const _To *_F1,const _To *_L1,const _To *& _Mid1,_E *_F2,_E *_L2,_E *& _Mid2) const {return noconv;}
        virtual result do_out(_St& _State,const _E *_F1,const _E *_L1,const _E *& _Mid1,_To *_F2, _To *_L2,_To *& _Mid2) const {return noconv;}
        virtual result do_unshift(_St& _State,_To *_F2, _To *_L2,_To *& _Mid2) const {return noconv;}
        virtual int do_length(_St& _State, const _To *_F1,const _To *_L1, size_t _N2) const throw() {return (_N2 < (size_t)(_L1-_F1)?_N2 :_L1 - _F1);}
        virtual bool do_always_noconv() const throw() {return true;}
        virtual int do_max_length() const throw() {return 2;}
        virtual int do_encoding() const throw() {return 2;}
};

int main()
{
        try {
                // _ADDFAC: implementation-specific macro from the VC6/VC7.1 library, not standard C++
                std::locale loc = std::_ADDFAC(std::locale::classic(),new Simple_codecvt);
                std::wofstream file;
                file.imbue(loc);
                file.open("three_wchars.txt",std::ios::trunc|std::ios::binary);
                if (!file) {
                        std::cerr << "Error: Unable to create file\n";
                        return 1;
                }

                std::wstring wstr(L"abc");
                file << wstr;
                file.close();
        }
        catch (const std::exception& e) {
                std::cerr << "Exception: " << e.what() << std::endl;
        }
        return 0;
}
--------8<--------
DanRollins commented:
Thanks for the points.  
I tried unscrambling that STL mess to figure out what _ADDFAC does, but once they get into the locale support, it's just a hopeless jumble.  That's the main reason I hate STL.  I can't imagine that code so obfuscated could be anything close to efficient.

If I want to output 32 bytes, I just output them :)

-- Dan
rstaveley (Author) commented:
It beats me why there isn't a simple mechanism in the STL for specifying the encoding of a stream you are about to open for reading or writing. It feels very wrong that wofstream needs to be opened in binary mode, and that all these double-back somersaults are needed to do UTF-16-ish file I/O. The need to use implementation-specific macros like _ADDFAC... humph.

Having said that, many thanks for finding a way, Dan :-)
Axter commented:
hmmm....

Did I miss something?

What was the answer?
rstaveley (Author) commented:
> What was the answer?

The answer to the question was that wofstream has two representations of each character. There is the internal representation, which we all know and love; this is wchar_t (16 bits on VC/Windoze and 32 bits on GCC/Linux). There is, however, also an external representation (the representation that gets written to or read from the file), and that was what threw me.

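DanRollins's codecvt posting last year pointed me towards the embarrassingly virginal pages of my IOStreams and Locales for Dummies reference, which told me that IOStreams use an external representation, which is typically compact and appropriate for your locale. My locale is en_GB, but what actually decides the external representation is the codecvt facet of the locale imbued in the stream, which by default is the global C++ locale (the classic "C" locale unless std::locale::global() has been called). That facet writes each of these wide characters as a single 8-bit char, which is why file << wstr only generated a 3-byte file on my systems.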

The rest of this thread reflects the fact that it is hard for fools like me to apply a facet to a locale to make it behave differently from the default implementation. The need to use non-portable macros like _ADDFAC in VC6/VC7, which means that the code isn't portable to GCC, is like reading "There be dragons" on an ancient map.

Simple_codecvt isn't portable. I haven't been able to test it, but surely it needs the following modification to be portable:

        virtual int do_max_length() const throw() {return sizeof(wchar_t);} /* Not necessarily 2 */
        virtual int do_encoding() const throw() {return sizeof(wchar_t);}

The reason why I haven't been able to test it is that I haven't found out the portable (or indeed GCC-specific) equivalent of _ADDFAC and therefore don't really know how to add this facet to the locale in GCC.

It would be nice to think that a codecvt class could be written to allow UTF-16 files to be read/written by wifstream/wofstream (http:/Q_20796791.html#9752719), but I can't see how the BOM (byte order mark) at the beginning of the stream would be elegantly handled by codecvt.

The question was answered, but there are quite a few questions that ought to follow up from here...  :-)
Axter commented:
Thanks for the clarification.