Solved

Reading/writing wide character files with wifstream/wofstream

Posted on 2003-11-13
Last Modified: 2007-12-19
I've just tried the following code with VC 7.1 on Windoze and GCC 3.2 on Linux, and it generates a three-byte file in both environments:
--------8<--------
#include <iostream>
#include <fstream>
#include <string>

int main()
{
      const char filename[] = "three_wchars.txt";
      std::wofstream file(filename);
      if (!file) {
            std::cerr << "Error: Unable to create " << filename << '\n';
            return 1;
      }
      std::wstring wstr(L"abc");
      file << wstr;
      file.close();
}
--------8<--------
I was expecting to get a 6 byte file.

What's going on?
Question by:rstaveley
 
LVL 17

Author Comment

by:rstaveley
ID: 9741890
If you've got an answer, you'll probably be able to sort out theblip's question at http:/Q_20796791.html too.
 
LVL 17

Author Comment

by:rstaveley
ID: 9746372
Looks like DanRollins found what I'm after here....

http:/Q_20318167.html#7124067
 
LVL 17

Author Comment

by:rstaveley
ID: 9747020
...but I can't get it to work.

Here's my best shot at re-implementing his approach to create a 3 character file, which I was hoping would show up as a 6 byte file (but I still get 3 bytes on GCC 3.2 and VC 7.1):
--------8<--------
#include <iostream>
#include <fstream>
#include <string>
#include <locale>
#include <exception>

class Simple_codecvt : public std::codecvt<wchar_t,char,mbstate_t> {
public:
        typedef wchar_t _E;
        typedef char _To;
        typedef mbstate_t _St;
        explicit Simple_codecvt(size_t _R = 0) : std::codecvt<wchar_t,char,mbstate_t>(_R) {}
protected:
        virtual result do_in(_St& _State,const _To *_F1,const _To *_L1,const _To *& _Mid1,_E *_F2,_E *_L2,_E *& _Mid2) const {return noconv;}
        virtual result do_out(_St& _State,const _E *_F1,const _E *_L1,const _E *& _Mid1,_To *_F2, _To *_L2,_To *& _Mid2) const {return noconv;}
        virtual result do_unshift(_St& _State,_To *_F2, _To *_L2,_To *& _Mid2) const {return noconv;}
        virtual int do_length(_St& _State, const _To *_F1,const _To *_L1, size_t _N2) const throw() {return (_N2 < (size_t)(_L1-_F1)?_N2 :_L1 - _F1);}
        virtual bool do_always_noconv() const throw() {return true;}
        virtual int do_max_length() const throw() {return 2;}
        virtual int do_encoding() const throw() {return 2;}
};

int main()
{
        try {
                std::locale loc(std::locale::classic(),new Simple_codecvt);
                std::wofstream file;
                file.imbue(loc);
                file.open("three_wchars.txt");
                if (!file) {
                        std::cerr << "Error: Unable to create file\n";
                        return 1;
                }

                //std::wstring wstr(L"abc");
                //file << wstr /*<< std::endl*/;

                file << L"123";

                file.close();
        }
        catch (const std::exception& e) {  // catch by const reference to avoid slicing
                std::cerr << "Exception: " << e.what() << std::endl;
        }
}
--------8<--------
 
LVL 49

Accepted Solution

by:
DanRollins earned 125 total points
ID: 9749102
I think I figured it out.  The code I provided works, but this doesn't even compile.  That could be part of the problem.

-- Dan
 
LVL 17

Author Comment

by:rstaveley
ID: 9749149
Thanks for looking at it, Dan. Sorry to drag this out of the archives!

This compiles for me on VC 7.1, when yours didn't. However, it doesn't do what I want it to do :-)

Presumably yours was VC 6.0 (because of the date)??

I'll try yours on VC 6.0, which I should still have hereabouts.
 
LVL 17

Author Comment

by:rstaveley
ID: 9749192
Yes... yours works for VC6.

The _ADDFAC macro isn't supported on VC7.1. My attempt to implement it was probably what was wrong.
 
LVL 17

Author Comment

by:rstaveley
ID: 9749822
Ah.... I'd put a wchar_t in the external representation.

The following compiles and works in VC 7.1 and VC 6.0. Unsurprisingly, however, bearing in mind the leading underscore, the _ADDFAC macro isn't supported by GCC. I wonder what the portable way of doing this is?
--------8<--------
#include <iostream>
#include <fstream>
#include <string>
#include <locale>
#include <exception>

class Simple_codecvt : public std::codecvt<wchar_t,char,mbstate_t> {
public:
        typedef wchar_t _E;
        typedef char _To;
        typedef mbstate_t _St;
        explicit Simple_codecvt(size_t _R = 0) : std::codecvt<wchar_t,char,mbstate_t>(_R) {}
protected:
        virtual result do_in(_St& _State,const _To *_F1,const _To *_L1,const _To *& _Mid1,_E *_F2,_E *_L2,_E *& _Mid2) const {return noconv;}
        virtual result do_out(_St& _State,const _E *_F1,const _E *_L1,const _E *& _Mid1,_To *_F2, _To *_L2,_To *& _Mid2) const {return noconv;}
        virtual result do_unshift(_St& _State,_To *_F2, _To *_L2,_To *& _Mid2) const {return noconv;}
        virtual int do_length(_St& _State, const _To *_F1,const _To *_L1, size_t _N2) const throw() {return (_N2 < (size_t)(_L1-_F1)?_N2 :_L1 - _F1);}
        virtual bool do_always_noconv() const throw() {return true;}
        virtual int do_max_length() const throw() {return 2;}
        virtual int do_encoding() const throw() {return 2;}
};

int main()
{
        try {
                std::locale loc = std::_ADDFAC(std::locale::classic(),new Simple_codecvt);
                std::wofstream file;
                file.imbue(loc);
                file.open("three_wchars.txt",std::ios::trunc|std::ios::binary);
                if (!file) {
                        std::cerr << "Error: Unable to create file\n";
                        return 1;
                }

                std::wstring wstr(L"abc");
                file << wstr;
                file.close();
        }
        catch (const std::exception& e) {  // catch by const reference to avoid slicing
                std::cerr << "Exception: " << e.what() << std::endl;
        }
        return 0;
}
--------8<--------
 
LVL 49

Expert Comment

by:DanRollins
ID: 9750165
Thanks for the points.  
I tried unscrambling that STL mess to figure out what _ADDFAC does, but once they get into the locale support, it's just a hopeless jumble.  That's the main reason I hate STL.  I can't imagine that code so obfuscated could be anything close to efficient.

If I want to output 32 bytes, I just output them :)

-- Dan
 
LVL 17

Author Comment

by:rstaveley
ID: 9750296
It beats me why there isn't a simple mechanism to specify the encoding of the stream that you are about to open for reading/writing in STL. It feels very wrong that wofstream needs to be opened in binary mode and for all the double-back somersaults to do UTF-16-ish file I/O. The need to use implementation specific macros like _ADDFAC... humph.

Having said that, many thanks for finding a way, Dan :-)
 
LVL 30

Expert Comment

by:Axter
ID: 9752590
hmmm....

Did I miss something?

What was the answer?
 
LVL 17

Author Comment

by:rstaveley
ID: 9753379
> What was the answer?

The answer to the question was that wofstream has two representations of the character. There is the internal representation, which we all know and love; this is wchar_t (16 bits on VC/Windoze and 32 bits on GCC/Linux). There is, however, also an external representation - the representation that gets written to or read from file - and that was what threw me.

DanRollins's codecvt posting last year pointed me towards the embarrassingly virginal pages of my IOStreams and Locales for Dummies reference, which told me that IOStreams use an external representation, which is typically compact and appropriate for your locale. My locale is en_GB, which means that wofstream uses an 8-bit representation for characters. That's why file << wstr only generated a 3 byte file on my systems.
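The internal/external split described above can be checked with a small sketch. It assumes the stream keeps the default global locale; the helper names `file_size_bytes` and `bytes_written_by_default_wofstream` are made up for illustration and do not come from the thread.

```cpp
#include <cassert>
#include <fstream>
#include <ios>
#include <string>

// Returns the size of the named file in bytes, or -1 if it cannot be opened.
long file_size_bytes(const char* name)
{
    assert(name != 0);
    std::ifstream in(name, std::ios::binary | std::ios::ate);
    if (!in) return -1;
    return static_cast<long>(static_cast<std::streamoff>(in.tellg()));
}

// Writes `text` through a wofstream that keeps the default locale, then
// reports how many bytes actually reached the file.
long bytes_written_by_default_wofstream(const char* name, const std::wstring& text)
{
    assert(name != 0);
    std::wofstream out(name, std::ios::binary | std::ios::trunc);
    if (!out) return -1;
    out << text;   // internal representation: one wchar_t per character
    out.close();   // flushed through the locale's codecvt<wchar_t, char, mbstate_t>
    return file_size_bytes(name);
}
```

Calling `bytes_written_by_default_wofstream("three_wchars_demo.txt", L"abc")` should report 3 rather than `3 * sizeof(wchar_t)`, because the default facet narrows each ASCII character to a single byte on the way out.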

The rest of this thread reflects how hard it is for the likes of me to apply a facet to my locale to make it behave differently from the default implementation. Having to use non-portable macros like _ADDFAC in VC6/VC7, which means that the code isn't portable to GCC, is like reading "There be dragons" on an ancient map.

Simple_codecvt isn't portable. I haven't been able to test it, but it would surely need the following modification to be portable:

        virtual int do_max_length() const throw() {return sizeof(wchar_t);} /* Not necessarily 2 */
        virtual int do_encoding() const throw() {return sizeof(wchar_t);}

The reason why I haven't been able to test it is that I haven't found out the portable (or indeed GCC-specific) equivalent of _ADDFAC and therefore don't really know how to add this facet to the locale in GCC.
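For what it's worth, the standard library does have a portable way to attach a facet: the `std::locale` constructor template that takes a base locale and a facet pointer, which is the form the earlier attempt in this thread already used. Below is a minimal sketch under stated assumptions: `Utf16le_codecvt`, `write_utf16le` and `read_bytes` are hypothetical names, the facet converts explicitly instead of returning noconv, it handles only characters up to U+FFFF, writes no BOM, and relies on C++11 (`override`, `noexcept`).

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical facet: each wchar_t up to U+FFFF becomes two bytes, low byte first.
class Utf16le_codecvt : public std::codecvt<wchar_t, char, std::mbstate_t> {
public:
    explicit Utf16le_codecvt(std::size_t refs = 0)
        : std::codecvt<wchar_t, char, std::mbstate_t>(refs) {}
protected:
    result do_out(std::mbstate_t&, const wchar_t* from, const wchar_t* from_end,
                  const wchar_t*& from_next, char* to, char* to_end,
                  char*& to_next) const override
    {
        from_next = from;
        to_next = to;
        while (from_next != from_end) {
            if (to_end - to_next < 2) return partial;   // output buffer is full
            const unsigned long c = static_cast<unsigned long>(*from_next);
            if (c > 0xFFFFul) return error;             // no surrogate pairs in this sketch
            *to_next++ = static_cast<char>(c & 0xFF);
            *to_next++ = static_cast<char>((c >> 8) & 0xFF);
            ++from_next;
        }
        return ok;
    }
    result do_in(std::mbstate_t&, const char* from, const char* from_end,
                 const char*& from_next, wchar_t* to, wchar_t* to_end,
                 wchar_t*& to_next) const override
    {
        from_next = from;
        to_next = to;
        while (from_end - from_next >= 2 && to_next != to_end) {
            const unsigned lo = static_cast<unsigned char>(from_next[0]);
            const unsigned hi = static_cast<unsigned char>(from_next[1]);
            *to_next++ = static_cast<wchar_t>(lo | (hi << 8));
            from_next += 2;
        }
        return from_next == from_end ? ok : partial;
    }
    result do_unshift(std::mbstate_t&, char* to, char*, char*& to_next) const override
    {
        to_next = to;       // stateless encoding: nothing to flush
        return noconv;
    }
    int do_length(std::mbstate_t&, const char* from, const char* from_end,
                  std::size_t max) const override
    {
        const std::size_t units = static_cast<std::size_t>(from_end - from) / 2;
        return static_cast<int>(2 * (units < max ? units : max));
    }
    bool do_always_noconv() const noexcept override { return false; }
    int do_max_length() const noexcept override { return 2; }
    int do_encoding() const noexcept override { return 2; }
};

// Writes `text` to `name` as UTF-16LE without a BOM; returns false on failure.
bool write_utf16le(const char* name, const std::wstring& text)
{
    assert(name != 0);
    std::wofstream out;
    // Portable stand-in for _ADDFAC: the new locale takes ownership of the facet.
    out.imbue(std::locale(std::locale::classic(), new Utf16le_codecvt));
    out.open(name, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out << text;
    out.close();
    return !out.fail();
}

// Reads the whole file back as raw bytes so the encoding can be inspected.
std::string read_bytes(const char* name)
{
    std::ifstream in(name, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

With this facet, `write_utf16le("utf16le_demo.txt", L"abc")` should leave six bytes on disk (`a`, 0, `b`, 0, `c`, 0) regardless of `sizeof(wchar_t)`, since the byte layout is fixed by `do_out` rather than by the platform's wide character width.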

It would be nice to think that a codecvt class could be written to allow UTF-16 files to be read/written by wfstream (see http:/Q_20796791.html#9752719), but I can't see how the BOM (byte order mark) at the beginning of the stream would be elegantly handled by codecvt.
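One possible workaround, sketched here as an assumption rather than as anything proposed in the thread, is to keep the BOM out of codecvt entirely: sniff the first two bytes with ordinary byte I/O, then choose a facet (or byte order) before reopening the file as a wide stream. The names `Utf16Bom` and `detect_utf16_bom` are hypothetical.

```cpp
#include <cassert>
#include <fstream>

// Result of inspecting the first two bytes of a file.
enum Utf16Bom { bom_none, bom_little_endian, bom_big_endian };

// Looks for a UTF-16 byte order mark (U+FEFF) at the start of the file.
// FF FE means little-endian, FE FF means big-endian; anything else,
// including a file shorter than two bytes, is reported as bom_none.
Utf16Bom detect_utf16_bom(const char* name)
{
    assert(name != 0);
    std::ifstream in(name, std::ios::binary);
    char raw[2];
    if (!in.read(raw, 2)) return bom_none;
    const unsigned char b0 = static_cast<unsigned char>(raw[0]);
    const unsigned char b1 = static_cast<unsigned char>(raw[1]);
    if (b0 == 0xFF && b1 == 0xFE) return bom_little_endian;
    if (b0 == 0xFE && b1 == 0xFF) return bom_big_endian;
    return bom_none;
}
```

A caller could then skip the two BOM bytes and imbue a matching conversion facet, which keeps the facet itself stateless at the cost of opening the file twice.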

The question was answered, but there are quite a few questions that ought to follow up from here...  :-)
 
LVL 30

Expert Comment

by:Axter
ID: 9754477
Thanks for the clarification.