• Status: Solved

C++ iostream UTF-16 file i/o with CRLF translation

I want to read and write UTF-16 files that use CR LF line separators (L"\r\n"), using C++ (Microsoft Visual Studio 2010) iostreams. Every L"\n" written to the stream should be translated to L"\r\n" transparently. Using the codecvt_utf16 locale facet requires opening the fstream in ios::binary mode, which loses the usual text-mode \n to \r\n translation.

#include <codecvt>
#include <fstream>
#include <locale>

int main()
{
    std::wofstream wofs;
    wofs.open("try_utf16.txt", std::ios::binary);
    wofs.imbue(
        std::locale(
            wofs.getloc(),
            new std::codecvt_utf16<wchar_t, 0x10ffff, std::generate_header>));
    wofs << L"Hi!\n"; // I want a '\r' to be inserted before the '\n' in the output file
    wofs.close();
}

I want a solution without needing extra libraries like BOOST.
Asked by: j66st
1 Solution
 
sarabandeCommented:
is there a reason why you didn't add a '\r' before the '\n'?

00000000 FE FF 00 48 00 69 00 21 00 0D 00 0A             ...H.i.!....

looks fine when I do so.

Sara
0
 
j66stAuthor Commented:
Of course, but I have a big program with a lot of string resources and code that writes to a generic ostream with only '\n' separators. So I want the '\n' to '\r\n' translation to happen behind the ostream interface.
0
 
j66stAuthor Commented:
Hint: I would think it should be possible by a customized basic_filebuf, overloading the overflow or underflow method. Or else by a custom codecvt class.

I am not familiar with this stuff, so I hope to find an expert who did this before, or who can point me to sample code.
0
 
j66stAuthor Commented:
OK, increasing points, if that may help. I thought it would be an issue that many have come across.
0
 
sarabandeCommented:
why not simply use a static member of some suitable class of type const std::wstring  which holds the crlf pair?

// global.h
...
class Global
{
public:
    static const std::wstring  lf;
    ....

// global.cpp
...
const std::wstring  Global::lf(L"\r\n");

you could use Global::lf wherever a line break is needed, like

wofs << L"Hi!" << Global::lf; 

deriving from filebuf or the codecvt class is difficult, and I don't have a really good idea how you could replace a wide linefeed character by a wide carriage-return/linefeed string without negative side effects. keep in mind that those classes are internal classes which have no information about the stream objects that use them.

it also makes little sense to derive from wofstream and provide new operator<< functions for wide text input from wchar_t * or wstring, because the operator<< functions take a wostream& as their stream argument and not a wofstream& (or your derived class). because of that you would need to provide overloads of operator<< for all possible operands, and not only for the wide text types. otherwise code like

mywofstream mwofs;
mwofs << 123<< L"abc\n" << "a " ;

would not use your overload

wostream& operator<<(mywofstream& os, const wchar_t * & wsz) 

because the above statement resolves to

operator<<(operator<<(mwofs.operator<<(123), L"abc\n"), "a ");

and the innermost call returns a wostream& and not a mywofstream&.
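
The overload-resolution point above can be checked with a small sketch. The names MyStream, custom_calls, custom_calls_direct and custom_calls_chained are hypothetical and do not come from the thread, and MyStream derives from a string stream (an assumption, instead of wofstream) so that nothing touches the file system:

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for "mywofstream"; it derives from a string stream
// so the sketch runs without touching the file system.
struct MyStream : std::wostringstream {
    int custom_calls;   // counts how often the overload below was chosen
    MyStream() : custom_calls(0) {}
};

// Overload that only accepts the derived stream type.
std::wostream& operator<<(MyStream& os, const wchar_t* s) {
    ++os.custom_calls;
    // Forward to the standard inserter through the base class to avoid recursion.
    return static_cast<std::wostream&>(os) << s;
}

// Direct insertion: the left operand is a MyStream&, so the overload is chosen.
int custom_calls_direct() {
    MyStream ms;
    ms << L"abc\n";
    return ms.custom_calls;
}

// Chained insertion: `ms << 123` returns std::wostream&, so the following
// `<< L"abc\n"` cannot bind to MyStream& and uses the standard inserter.
int custom_calls_chained() {
    MyStream ms;
    ms << 123 << L"abc\n";
    return ms.custom_calls;
}
```

With this sketch, the counter is incremented only in the direct case, which is the reason a derived-stream overload alone is not enough.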

however, you could provide an overload of

std::wostream& operator<<(std::wostream& os, const std::wstring & ws)
{
      // build a copy of ws in which every L'\n' is replaced by L"\r\n"
      std::wstring s;
      size_t pos1, pos2 = 0;
      while ((pos1 = ws.find(L'\n', pos2)) != std::wstring::npos)
      {
            s += ws.substr(pos2, pos1-pos2);
            s += L"\r\n";
            pos2 = pos1+1;
      }
      s += ws.substr(pos2);

      // write() expects a count of characters, not a count of bytes
      os.write(s.c_str(), static_cast<std::streamsize>(s.length()));
      return os;
}

which should do the job for wstring (and similarly for const wchar_t*). however, you should consider that the overload would be used for any call of operator<< where the operands fit, not only for your file operations. if that is an issue, you would need to derive from wofstream nevertheless and use that mywofstream class instead of wofstream. the operator<< overloads could then check by means of a dynamic_cast whether the given stream "is a" mywofstream and do the replacements only for that.
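
A minimal sketch of the dynamic_cast idea described above, under stated assumptions: CrlfStream is a hypothetical marker class standing in for mywofstream (derived from a string stream here so the sketch is testable without files), and expand_newlines, through_crlf_stream and through_plain_stream are helper names invented for illustration:

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical marker type standing in for "mywofstream".
struct CrlfStream : std::wostringstream {};

// Returns a copy of ws in which every L'\n' is preceded by L'\r'.
std::wstring expand_newlines(const std::wstring& ws) {
    std::wstring out;
    out.reserve(ws.size());
    for (std::size_t i = 0; i < ws.size(); ++i) {
        if (ws[i] == L'\n') out += L'\r';
        out += ws[i];
    }
    return out;
}

// Global overload; it does the replacement only when the stream "is a" CrlfStream.
std::wostream& operator<<(std::wostream& os, const std::wstring& ws) {
    const bool translate = dynamic_cast<CrlfStream*>(&os) != 0;
    const std::wstring s = translate ? expand_newlines(ws) : ws;
    // write() takes a character count and bypasses operator<<, so no recursion.
    os.write(s.data(), static_cast<std::streamsize>(s.size()));
    return os;
}

// Inserts text into the marker stream type and returns what was stored.
std::wstring through_crlf_stream(const std::wstring& text) {
    CrlfStream os;
    os << text;
    return os.str();
}

// Inserts text into an ordinary wide string stream and returns what was stored.
std::wstring through_plain_stream(const std::wstring& text) {
    std::wostringstream os;
    os << text;
    return os.str();
}
```

Note that this sketch writes unformatted (it ignores field width), and it only covers code that can see the overload at the point of the call.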

Sara
0
 
j66stAuthor Commented:
Defining a simple constant for the "\r\n" string is not a solution, as I said before. The \n characters are embedded in existing string resources and in functions generating dynamic output that I cannot change. Overloading the << operator is not sufficient; it would not cover output via the put and write methods.

Overloading of codecvt looks complicated.

To me overloading of basic_filebuf seems the way to go. I don't consider this an "internal class"; it is properly documented, and made to be overloaded. What would be the negative side effects you are talking about? Perhaps the unget/putback methods need some extra attention?

This will also need a custom "mywofstream" because there is no basic_ofstream constructor accepting an alternate streambuf. So the mywofstream constructor will then call the basic_ostream constructor with "mywfilebuf".
0
 
sarabandeCommented:
Quoting j66st: This will also need a custom "mywofstream" because there is no basic_ofstream constructor accepting an alternate streambuf.

you found the reason yourself why deriving from basic_filebuf is not the way to go. with "internal class" I didn't mean that it is not accessible, but that the creation and use of the object are outside of your control. moreover, basic_filebuf has some peculiarities, for example that its internal buffer is always of type char, regardless of the character type the stream class is instantiated with. see the remarks section of the msdn page for basic_filebuf:

Quoting MSDN: Objects of type basic_filebuf are created with an internal buffer of type char * regardless of the char_type specified by the type parameter Elem. This means that a Unicode string (containing wchar_t characters) will be converted to an ANSI string (containing char characters) before it is written to the internal buffer.

basic_filebuf has several constructors and member functions that are not virtual, so your derived class can rarely replace all of them. the only virtual function suitable for your purpose is the setbuf member function, but it is unlikely that setbuf is the only function that assigns a text buffer to a stream.

in my opinion, you should try to overload operator<<(std::ostream &, std::wstring) and operator<<(std::ostream&, const wchar_t*) as described before, which should work with little effort.

Sara
0
 
j66stAuthor Commented:
Quoting sarabande: in my opinion, you should try to overload operator<<(std::ostream &, std::wstring) and operator<<(std::ostream&, const wchar_t*) as described
I don't think I was clear enough. I want a myofstream class that can be passed to any existing function like
void Greet(std::wostream& wos) 
{ wos << L"Hi!\n"; wos.put(L'\n'); wos.write(L"Bye!\n", 5); }

The function does not know about '\r' conventions nor about any myofstream, so it won't call any custom global operator<<(myofstream&, ...) function. This can only be solved within the myofstream class.

Quoting sarabande: it always uses type char for template argument regardless of the type the stream class is associated to. see comment in the remarks section of msdn to basic_filebuf
An example on that same msdn page shows that this can be solved by a pubsetbuf method call. This could be done in the myofstream constructor.

Thanks for your insights so far, maybe you are right, but I'm not convinced yet that it is overly complicated to make a std::ostream-derived myofstream with a myfilebuf. I'm too busy right now with more urgent matters to give it a try.
0
 
sarabandeCommented:
my suggested solution was to provide overloads of operator<< (wostream &, wstring) and operator<< (wostream &, wchar_t *). they are overloads of the operators provided by the stl and would work for all stream operations done via <<. the only precondition is that the overloads are visible to the compiler at the point of the call.

it doesn't work for the write function, though.

the myofstream is optional, in case you want the replacement to happen only for file operations where a myofstream is involved.

Sara
0
 
j66stAuthor Commented:
I think it boils down to using a custom stream buffer: attach a custom wide-character buffer by calling pubsetbuf, and override the virtual overflow method so that it inserts a '\r' before every '\n' that comes in. Like this:
	virtual int_type overflow(int_type ch = traits_type::eof())
	{
		if (ch == '\n') {
			int_type iRet = BASE::overflow('\r');
			if (iRet != traits_type::not_eof('\r')) return iRet;
		}
		return BASE::overflow(ch);
	}

The wcrlfofstream derives from std::wostream and its constructor initializes the base class with a new wcrlf_filebuf object.
The stream can then be used by any functions that can write to a generic std::wostream.
 
I did a small test and it seems to work fine. Am I overlooking any issues?
0
 
j66stAuthor Commented:
Here is the complete code:
#include <iostream>
#include <fstream>
#include <cstring>   // memset
#include <cstdlib>   // _countof (Microsoft-specific macro)

class wcrlf_filebuf : public std::basic_filebuf<wchar_t>
{
	typedef std::basic_filebuf<wchar_t> BASE;
	wchar_t awch[128];
	bool bBomWritten;
public:
	wcrlf_filebuf() 
		: bBomWritten(false)
	{ memset(awch, 0, sizeof awch); }

	wcrlf_filebuf(const wchar_t *wszFilespec, std::ios_base::open_mode _Mode = std::ios_base::out) 
		: bBomWritten(false)
	{
		memset(awch, 0, sizeof awch);
		BASE::open(wszFilespec, _Mode | std::ios_base::binary);
		pubsetbuf(awch, _countof(awch));
	}

	wcrlf_filebuf *open(const wchar_t *wszFilespec, std::ios_base::open_mode _Mode = std::ios_base::out)
	{	
		BASE::open(wszFilespec, _Mode | std::ios_base::binary);
		pubsetbuf(awch, _countof(awch));
		return this;
	}

	virtual int_type overflow(int_type ch = traits_type::eof())
	{
		if (!bBomWritten) {
			bBomWritten = true;
			int_type iRet = BASE::overflow(0xfeff);
			if (iRet != traits_type::not_eof(0xfeff)) return iRet;
		}
		if (ch == '\n') {
			int_type iRet = BASE::overflow('\r');
			if (iRet != traits_type::not_eof('\r')) return iRet;
		}
		return BASE::overflow(ch);
	}
};

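class wcrlfofstream : public std::wostream
{
	typedef std::wostream BASE;
public:
	// the stream owns the buffer it creates here
	wcrlfofstream(const wchar_t *wszFilespec, std::ios_base::open_mode _Mode = std::ios_base::out) : std::wostream(new wcrlf_filebuf(wszFilespec, _Mode))
	{}

	// std::wostream does not delete its stream buffer, so release it here;
	// the basic_filebuf destructor also closes the file
	~wcrlfofstream()
	{
		delete rdbuf();
	}

	// hides (does not override) the non-virtual std::wostream::rdbuf
	wcrlf_filebuf* rdbuf()
	{
		return dynamic_cast<wcrlf_filebuf*>(std::wostream::rdbuf());
	}

	void close()
	{
		rdbuf()->close();
	}
};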

Comments are welcome.
0
 
sarabandeCommented:
you should try whether statements like

wcrlfofstream wstrm(L"xxxx.txt", std::ios::binary | std::ios::out);
wstrm << L"xyz\n" << 12345 << L"abc\n";

do work.

in the <ios> and <ostream> header files there are no virtual functions besides the destructors. i would have assumed that the rdbuf and close functions you provided would not be called. but maybe i haven't examined your approach deeply enough.

Sara
0
 
j66stAuthor Commented:
Yes, I tried your example, it works as expected.
std::basic_filebuf is defined in file <fstream>, std::basic_streambuf in <streambuf> and they contain many virtual methods.
My solution works because the overflow method is declared virtual in the basic_streambuf template class. Every character that leaves the buffer is passed through this method.

Alternatively, we could filter characters entering the buffer by overriding the virtual xsputn method. But that solution would be incomplete because the single character basic_ostream::put method is not virtual and will bypass xsputn.
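
Whether every character really reaches overflow depends on how the library's basic_filebuf manages its put area. A sketch that does not rely on that detail is a separate filtering stream buffer with no put area, so that sputc always ends up in overflow. This is not the code from the thread: crlf_filter_buf and greet_through_filter are hypothetical names, and a std::wstringbuf stands in for the file buffer. In a real program the sink would presumably be the rdbuf() of a binary-mode wofstream imbued with codecvt_utf16, which is an untested assumption here.

```cpp
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical filter: forwards every character to `sink`,
// inserting L'\r' before each L'\n'.
class crlf_filter_buf : public std::wstreambuf {
public:
    explicit crlf_filter_buf(std::wstreambuf* sink) : sink_(sink) {}

protected:
    // No put area is set up (setp is never called), so sputc, and with it
    // operator<<, put and write, reach overflow for every single character.
    virtual int_type overflow(int_type ch) {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);   // nothing buffered here
        if (traits_type::eq_int_type(ch, traits_type::to_int_type(L'\n'))) {
            if (traits_type::eq_int_type(sink_->sputc(L'\r'), traits_type::eof()))
                return traits_type::eof();     // sink refused the '\r'
        }
        return sink_->sputc(traits_type::to_char_type(ch));
    }

    // Pass flush requests on to the sink.
    virtual int sync() { return sink_->pubsync(); }

private:
    std::wstreambuf* sink_;   // not owned
};

// Mirrors the Greet function from the thread: formatted output, put and write.
std::wstring greet_through_filter() {
    std::wstringbuf sink;              // stands in for the real file buffer
    crlf_filter_buf filter(&sink);
    std::wostream wos(&filter);
    wos << L"Hi!\n";
    wos.put(L'\n');
    wos.write(L"Bye!\n", 5);
    wos.flush();
    return sink.str();
}
```

The trade-off of this design is one virtual call per character; the sink keeps its own buffering, so the file itself is still written in blocks.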
0
 
sarabandeCommented:
there are a lot of comments on this question which are worth keeping in the knowledge base. you should close the question properly instead and accept your own comment as the solution (0 points).

Sara
0
 
j66stAuthor Commented:
That's exactly what I did: I clicked on my own comment #a39679318 to accept it as a solution, and added a comment to explain. Status is now "Close request pending", I understand that others get a few days to object. But I am not familiar with how these official procedures work on EE so please tell me if I did wrong.
0
 
sarabandeCommented:
no, you did right. sorry, I misread the closing comment as a request for deletion.

Moderator, please delete my objection.

Sara
0
 
j66stAuthor Commented:
I think I clearly explained my goal: providing LF to CR LF expansion in wide character mode behind the basic_ostream interface. sarabande first seemed not to understand my goal, then thought it was not possible, and gave some advice for a workaround. (Sara, thanks for your time, though).
I finally decided to dig into the library source code, and I think I have now found a properly working solution myself.
0
