Solved

Reading/writing wide character files with wifstream/wofstream

Posted on 2003-11-13
Last Modified: 2007-12-19
I've just tried the following code with VC 7.1 on Windoze and GCC 3.2 on Linux, and I generate a file of three bytes in both environments:
--------8<--------
#include <iostream>
#include <fstream>
#include <string>

int main()
{
      const char filename[] = "three_wchars.txt";
      std::wofstream file(filename);
      if (!file) {
            std::cerr << "Error: Unable to create " << filename << '\n';
            return 1;
      }
      std::wstring wstr(L"abc");
      file << wstr;
      file.close();
}
--------8<--------
I was expecting to get a 6 byte file.

What's going on?
Question by:rstaveley
LVL 17

Author Comment

by:rstaveley
ID: 9741890
If you've got an answer, you'll probably be able to sort out theblip's question at http:/Q_20796791.html too.
 
LVL 17

Author Comment

by:rstaveley
ID: 9746372
Looks like DanRollins found what I'm after here....

http:/Q_20318167.html#7124067
 
LVL 17

Author Comment

by:rstaveley
ID: 9747020
...but I can't get it to work.

Here's my best shot at re-implementing his approach to create a 3 character file, which I was hoping would show up as a 6 byte file (but I still get 3 bytes on GCC 3.2 and VC 7.1):
--------8<--------
#include <iostream>
#include <fstream>
#include <string>
#include <locale>

class Simple_codecvt : public std::codecvt<wchar_t,char,mbstate_t> {
public:
        typedef wchar_t _E;
        typedef char _To;
        typedef mbstate_t _St;
        explicit Simple_codecvt(size_t _R = 0) : std::codecvt<wchar_t,char,mbstate_t>(_R) {}
protected:
        virtual result do_in(_St& _State,const _To *_F1,const _To *_L1,const _To *& _Mid1,_E *_F2,_E *_L2,_E *& _Mid2) const {return noconv;}
        virtual result do_out(_St& _State,const _E *_F1,const _E *_L1,const _E *& _Mid1,_To *_F2, _To *_L2,_To *& _Mid2) const {return noconv;}
        virtual result do_unshift(_St& _State,_To *_F2, _To *_L2,_To *& _Mid2) const {return noconv;}
        virtual int do_length(_St& _State, const _To *_F1,const _To *_L1, size_t _N2) const throw() {return (_N2 < (size_t)(_L1-_F1)?_N2 :_L1 - _F1);}
        virtual bool do_always_noconv() const throw() {return true;}
        virtual int do_max_length() const throw() {return 2;}
        virtual int do_encoding() const throw() {return 2;}
};

int main()
{
        try {
                std::locale loc(std::locale::classic(),new Simple_codecvt);
                std::wofstream file;
                file.imbue(loc);
                file.open("three_wchars.txt");
                if (!file) {
                        std::cerr << "Error: Unable to create file\n";
                        return 1;
                }

                //std::wstring wstr(L"abc");
                //file << wstr /*<< std::endl*/;

                file << L"123";

                file.close();
        }
                catch (const std::exception& e) {
                std::cerr << "Exception: " << e.what() << std::endl;
        }
}
--------8<--------
 
LVL 49

Accepted Solution

by:
DanRollins earned 125 total points
ID: 9749102
I think I figured it out.  The code I provided works, but this doesn't even compile.  That could be part of the problem.

-- Dan
 
LVL 17

Author Comment

by:rstaveley
ID: 9749149
Thanks for looking at it, Dan. Sorry to drag this out of the archives!

This compiles for me on VC 7.1, when yours didn't. However, it doesn't do what I want it to do :-)

Presumably yours was VC 6.0 (because of the date)??

I'll try yours on VC 6.0, which I should still have hereabouts.
 
LVL 17

Author Comment

by:rstaveley
ID: 9749192
Yes... yours works for VC6.

The _ADDFAC macro isn't supported on VC7.1. My attempt to implement it was probably what was wrong.
 
LVL 17

Author Comment

by:rstaveley
ID: 9749822
Ah.... I'd put a wchar_t in the external representation.

The following compiles and works in VC 7.1 and VC 6.0. Unsurprisingly, however, bearing in mind the leading underscore, the _ADDFAC macro isn't supported by GCC. I wonder what the portable way of doing this is?
--------8<--------
#include <iostream>
#include <fstream>
#include <string>
#include <locale>

class Simple_codecvt : public std::codecvt<wchar_t,char,mbstate_t> {
public:
        typedef wchar_t _E;
        typedef char _To;
        typedef mbstate_t _St;
        explicit Simple_codecvt(size_t _R = 0) : std::codecvt<wchar_t,char,mbstate_t>(_R) {}
protected:
        virtual result do_in(_St& _State,const _To *_F1,const _To *_L1,const _To *& _Mid1,_E *_F2,_E *_L2,_E *& _Mid2) const {return noconv;}
        virtual result do_out(_St& _State,const _E *_F1,const _E *_L1,const _E *& _Mid1,_To *_F2, _To *_L2,_To *& _Mid2) const {return noconv;}
        virtual result do_unshift(_St& _State,_To *_F2, _To *_L2,_To *& _Mid2) const {return noconv;}
        virtual int do_length(_St& _State, const _To *_F1,const _To *_L1, size_t _N2) const throw() {return (_N2 < (size_t)(_L1-_F1)?_N2 :_L1 - _F1);}
        virtual bool do_always_noconv() const throw() {return true;}
        virtual int do_max_length() const throw() {return 2;}
        virtual int do_encoding() const throw() {return 2;}
};

int main()
{
        try {
                std::locale loc = std::_ADDFAC(std::locale::classic(),new Simple_codecvt);
                std::wofstream file;
                file.imbue(loc);
                file.open("three_wchars.txt",std::ios::trunc|std::ios::binary);
                if (!file) {
                        std::cerr << "Error: Unable to create file\n";
                        return 1;
                }

                std::wstring wstr(L"abc");
                file << wstr;
                file.close();
        }
                catch (const std::exception& e) {
                std::cerr << "Exception: " << e.what() << std::endl;
        }
        return 0;
}
--------8<--------
 
LVL 49

Expert Comment

by:DanRollins
ID: 9750165
Thanks for the points.  
I tried unscrambling that STL mess to figure out what _ADDFAC does, but once they get into the locale support, it's just a hopeless jumble.  That's the main reason I hate STL.  I can't imagine that code so obfuscated could be anything close to efficient.

If I want to output 32 bytes, I just output them :)

-- Dan
 
LVL 17

Author Comment

by:rstaveley
ID: 9750296
It beats me why there isn't a simple mechanism in STL to specify the encoding of the stream that you are about to open for reading/writing. It feels very wrong that wofstream needs to be opened in binary mode, and that UTF-16-ish file I/O takes all these double-back somersaults. The need to use implementation-specific macros like _ADDFAC... humph.

Having said that, many thanks for finding a way, Dan :-)
 
LVL 30

Expert Comment

by:Axter
ID: 9752590
hmmm....

Did I miss something?

What was the answer?
 
LVL 17

Author Comment

by:rstaveley
ID: 9753379
> What was the answer?

The answer to the question was that wofstream has two representations of the character. There is the internal representation, which we all know and love; this is wchar_t (16 bits on VC/Windoze and 32 bits on GCC/Linux). There is, however, also an external representation, the representation that gets written to or read from file, and that was what threw me.

DanRollins's codecvt posting last year pointed me towards the embarrassingly virginal pages of my IOStreams and Locales for Dummies reference, which told me that IOStreams use an external representation, which is typically compact and appropriate for your locale. My locale is en_GB, which means that wfstream uses an 8-bit representation for characters. That's why file << wstr only generated a 3 character file on my systems.
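
To see the narrowing in isolation, here is a minimal sketch. The helper name and the file name in the usage note are made up for illustration, and it assumes the program has not changed the global locale, so the stream picks up the default codecvt facet of the "C" locale.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper (not from the thread): write `text` through a wofstream
// that uses the default locale, then report the file size in bytes.
// Returns -1 if the file cannot be created or reopened.
long bytes_written_by_default_wofstream(const char* path, const std::wstring& text)
{
    {
        std::wofstream out(path, std::ios::binary | std::ios::trunc);
        if (!out) {
            return -1;
        }
        out << text;
    }   // the stream is flushed and closed here

    // Reopen as a plain byte stream, positioned at the end, to measure the size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) {
        return -1;
    }
    return static_cast<long>(in.tellg());
}
```

Calling bytes_written_by_default_wofstream("three_wchars_default.txt", L"abc") should report one byte per character for plain ASCII text, whatever sizeof(wchar_t) is on the platform.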

The rest of this communication is a reflection of the fact that it is hard for fools like me to apply a facet to my locale to get it to behave differently from the default implementation. The need to use non-portable macros like _ADDFAC in VC6/VC7, which means that the code isn't portable to GCC, is like reading "There be dragons" on an ancient map.

Simple_codecvt isn't portable. I haven't been able to test it, but it should surely need to have the following modification to be portable:

        virtual int do_max_length() const throw() {return sizeof(wchar_t);} /* Not necessarily 2 */
        virtual int do_encoding() const throw() {return sizeof(wchar_t);}

The reason why I haven't been able to test it is that I haven't found out the portable (or indeed GCC-specific) equivalent of _ADDFAC and therefore don't really know how to add this facet to the locale in GCC.
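
For what it's worth, the standard library does have a portable way to add a facet: the std::locale constructor template that takes an existing locale and a facet pointer. The sketch below is an assumption-laden illustration, not the code from this thread: RawWcharCodecvt and write_raw_wide are hypothetical names, it needs a C++11 compiler, and it does the byte copying itself in do_out/do_in rather than returning noconv, because what a file buffer writes for a noconv facet differs between implementations.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical facet: stores each wchar_t as its raw in-memory bytes,
// so the file holds sizeof(wchar_t) bytes per character in native byte order.
class RawWcharCodecvt : public std::codecvt<wchar_t, char, std::mbstate_t> {
public:
    explicit RawWcharCodecvt(std::size_t refs = 0)
        : std::codecvt<wchar_t, char, std::mbstate_t>(refs) {}

protected:
    result do_out(std::mbstate_t&, const wchar_t* from, const wchar_t* from_end,
                  const wchar_t*& from_next, char* to, char* to_end,
                  char*& to_next) const override {
        from_next = from;
        to_next = to;
        while (from_next != from_end) {
            if (static_cast<std::size_t>(to_end - to_next) < sizeof(wchar_t))
                return partial;  // output buffer full; the caller retries with the rest
            std::memcpy(to_next, from_next, sizeof(wchar_t));
            to_next += sizeof(wchar_t);
            ++from_next;
        }
        return ok;
    }

    result do_in(std::mbstate_t&, const char* from, const char* from_end,
                 const char*& from_next, wchar_t* to, wchar_t* to_end,
                 wchar_t*& to_next) const override {
        from_next = from;
        to_next = to;
        while (to_next != to_end &&
               static_cast<std::size_t>(from_end - from_next) >= sizeof(wchar_t)) {
            std::memcpy(to_next, from_next, sizeof(wchar_t));
            from_next += sizeof(wchar_t);
            ++to_next;
        }
        return from_next == from_end ? ok : partial;
    }

    result do_unshift(std::mbstate_t&, char* to, char*, char*& to_next) const override {
        to_next = to;   // stateless encoding: nothing to write at the end
        return noconv;
    }

    int do_length(std::mbstate_t&, const char* from, const char* from_end,
                  std::size_t max) const override {
        // Bytes consumed to produce at most `max` whole wide characters.
        std::size_t whole = static_cast<std::size_t>(from_end - from) / sizeof(wchar_t);
        return static_cast<int>((whole < max ? whole : max) * sizeof(wchar_t));
    }

    bool do_always_noconv() const noexcept override { return false; }
    int do_max_length() const noexcept override { return sizeof(wchar_t); }
    int do_encoding() const noexcept override { return sizeof(wchar_t); }
};

// Hypothetical helper: write `text` to `path` using the facet above.
bool write_raw_wide(const char* path, const std::wstring& text)
{
    std::wofstream file;
    // Portable replacement for _ADDFAC; the locale takes ownership of the facet.
    file.imbue(std::locale(std::locale::classic(), new RawWcharCodecvt));
    file.open(path, std::ios::binary | std::ios::trunc);
    if (!file) {
        return false;
    }
    file << text;
    file.close();
    return !file.fail();
}
```

With this, write_raw_wide("three_wchars_raw.txt", L"abc") should produce a file of 3 * sizeof(wchar_t) bytes. The imbue call comes before open, as in the code earlier in the thread, so the file buffer uses the new facet from the first character.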

It would be nice to think that a codecvt class could be written to allow UTF-16 files to be read/written by wfstream http:/Q_20796791.html#9752719, but I can't see how the BOM (byte order mark) at the beginning of the stream would be elegantly handled by codecvt.
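
One way to sidestep the BOM question, in the spirit of "just output the bytes", is to do the encoding by hand and write the result through an ordinary narrow stream opened in binary mode. The following is only a sketch: to_utf16le_with_bom is a hypothetical helper, it needs a C++11 compiler, it assumes the wide string holds valid Unicode code points (or UTF-16 code units where wchar_t is 16 bits), and it does not validate its input.

```cpp
#include <string>

// Hypothetical helper: encode a wide string as UTF-16 little-endian bytes,
// preceded by the byte order mark U+FEFF.
std::string to_utf16le_with_bom(const std::wstring& text)
{
    std::string out("\xFF\xFE", 2);   // BOM in little-endian byte order

    // Append one 16-bit code unit, low byte first.
    auto put_unit = [&out](unsigned long unit) {
        out.push_back(static_cast<char>(unit & 0xFF));
        out.push_back(static_cast<char>((unit >> 8) & 0xFF));
    };

    for (wchar_t wc : text) {
        unsigned long cp = static_cast<unsigned long>(wc);
        if (cp > 0xFFFF) {
            // Only reachable where wchar_t is wider than 16 bits:
            // split the code point into a surrogate pair.
            cp -= 0x10000;
            put_unit(0xD800 + (cp >> 10));
            put_unit(0xDC00 + (cp & 0x3FF));
        } else {
            put_unit(cp);
        }
    }
    return out;
}
```

The returned string can be written with std::ofstream in std::ios::binary mode. For L"abc" it holds 8 bytes: the two BOM bytes followed by three 2-byte code units.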

The question was answered, but there are quite a few questions that ought to follow up from here...  :-)
 
LVL 30

Expert Comment

by:Axter
ID: 9754477
Thanks for the clarification.