yasimplicity
Converting a Unicode file to an ASCII file, and vice versa?
How do I read a Unicode text file and convert it into an ASCII file,
and vice versa?
How do I read a Unicode file into an ASCII string,
or read an ASCII file into a Unicode string,
all using standard C++?
Thanks.
What have you tried so far?
What kind of Unicode text file do you have?
UTF-8, UTF-16 or UTF-32? Or just Windows Unicode (UCS-2)?
jkr,
Does that really work? I thought wfstreams expected characters to be narrow in their external representation and were only wchar_t when in RAM. If I write to a wofstream I see narrow characters on disk.
Here's some struggling I did with this a couple of years ago http:/Q_20797258.html
Yes, if you use a wostream, the codecvt facet narrows each wchar_t before writing. As long as every value fits in a single byte (below 256), everything is written to the file; as soon as a larger value turns up, such as an Arabic letter, the stream stops and sets the failbit.
You need to replace the default codecvt facet with one that performs no conversion, just as described in the article from your previous post.
But be careful with cross-platform apps. Most Windows compilers use a 16-bit wchar_t, whereas on Linux it is normally 32-bit.
Or just read the document with a byte stream and perform a reinterpret_cast, or a conversion to the internal type. That way you can also read the multibyte formats such as UTF-8 and UTF-16. You can get conversion source code from unicode.org: ftp://www.unicode.org/Public/PROGRAMS/CVTUTF/
For full Unicode support you can use the ICU library from IBM.
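To illustrate the byte-stream route mentioned above, here is a minimal sketch, assuming C++11 and a buffer that already holds the raw bytes of a UTF-16LE file. The name decode_utf16le is made up for this example and is not from anyone's post. It builds the code units in char16_t, so it does not care whether wchar_t is 16 or 32 bits.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the thread): assemble UTF-16 code units from
// little-endian bytes without relying on sizeof(wchar_t).
std::u16string decode_utf16le(const std::string& bytes)
{
    if (bytes.size() % 2 != 0)
        throw std::runtime_error("odd byte count: not UTF-16");

    std::u16string units;
    units.reserve(bytes.size() / 2);
    for (std::size_t i = 0; i < bytes.size(); i += 2) {
        // In a little-endian file the low byte of each unit comes first.
        const unsigned lo = static_cast<unsigned char>(bytes[i]);
        const unsigned hi = static_cast<unsigned char>(bytes[i + 1]);
        units.push_back(static_cast<char16_t>(lo | (hi << 8)));
    }
    return units;
}
```

Any BOM is returned as the first code unit (0xFEFF), so the caller decides whether to check it or strip it. The bytes themselves would come from a std::ifstream opened with std::ios::binary.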
The C++ standard library leaves this whole area as an exercise for the reader. Java provides a fundamental differentiation between character and byte streams - see http://java.sun.com/docs/books/tutorial/i18n/text/stream.html.
ASKER
Mr jkr:
your code does not convert anything.
Unicode is still Unicode,
ANSI is still ANSI.
yasimplicity, that's because wfstreams have an external (= on file) representation that uses narrow characters. I recommend that you look at my codecvt example at http:/Q_20797258.html#9749822 and read my conclusion at the bottom of that thread (that example isn't portable because it makes assumptions about sizeof(wchar_t)). As confirmed by chip3d, you need to prevent the stream from converting the wide characters to and from narrow ones when they are written to or read from disk. That's where a "do nothing" codecvt facet is needed, which you need to imbue the stream with.
Beware that a codecvt facet does not handle the BOM (= byte order mark), and you'll need to write it to and read it from the UNICODE stream yourself to make it a recognisable UNICODE text file.
I'm rusty in this area. It was a few years back that I had a struggle with it and I wasn't entirely comfortable with my conclusion. You'll find that the codecvt is usable, however, as long as you bear in mind the need to apply/strip your own byte order mark.
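Since the BOM has to be handled by hand, here is a minimal sketch of recognising one from the first few bytes of a file, assuming C++11. The Bom enum and detect_bom are names invented for this illustration. The byte signatures are the standard ones: EF BB BF for UTF-8, FF FE for UTF-16LE, FE FF for UTF-16BE, FF FE 00 00 for UTF-32LE and 00 00 FE FF for UTF-32BE.

```cpp
#include <cstddef>
#include <string>

enum class Bom { None, Utf8, Utf16LE, Utf16BE, Utf32LE, Utf32BE };

// Hypothetical helper (not from the thread): classify the leading bytes of a
// file that was read through a binary byte stream.
Bom detect_bom(const std::string& head)
{
    auto starts_with = [&head](const char* sig, std::size_t n) {
        return head.size() >= n && head.compare(0, n, sig, n) == 0;
    };
    // UTF-32LE must be tested before UTF-16LE: both begin with FF FE.
    if (starts_with("\xFF\xFE\x00\x00", 4)) return Bom::Utf32LE;
    if (starts_with("\x00\x00\xFE\xFF", 4)) return Bom::Utf32BE;
    if (starts_with("\xEF\xBB\xBF", 3))     return Bom::Utf8;
    if (starts_with("\xFF\xFE", 2))         return Bom::Utf16LE;
    if (starts_with("\xFE\xFF", 2))         return Bom::Utf16BE;
    return Bom::None;
}
```

One caveat: a UTF-16LE file whose first character after the BOM is U+0000 is indistinguishable from UTF-32LE by this test, so treat the result as a strong hint rather than proof.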
I've just dug up the following code snippet from a project in which I had to load a UNICODE UTF-16LE file. I'd figured out how to imbue streams properly with locales when I wrote this, which wasn't the case in the links I directed you to, which use a funky Microsoft-specific macro. This example also handles the BOM properly - though not portably.
You might find this useful, but beware that it isn't portable, because it makes assumptions that wchar_t is 16-bit (Windows) rather than 32-bit (Linux), because it was designed to work with the UTF-16LE files you get generated by MSXML.
What does it do? Nothing much! It reads a UTF-16LE file (unicode.xml) line by line and writes the content to a UTF-16LE file (unicode_2.xml) line by line, but with a few simple edits you could use it to convert to and from ANSI.
This compiles and works with MS VC 7.1 and compiles but won't work with GCC 3.2+ on Linux, because of the sizeof wchar_t assumption. If you can get this into shape for Linux, let me know. It would be nice to make this portable.
--------8<--------
#include <iostream>
#include <fstream>
#include <string>
#include <locale>
#include <list>
#include <algorithm>
#include <iterator>
namespace noconv {
class codecvt : public std::codecvt<wchar_t,char, mbstate_t> {
public:
explicit codecvt(size_t refs = 0) : std::codecvt<wchar_t,char, mbstate_t> (refs) {}
protected:
virtual result do_in(
state_type& state
,const extern_type *from_begin
,const extern_type *from_end
,const extern_type *&from_next
,intern_type *to_begin
,intern_type *to_end
,intern_type *&to_next
) const
{
return noconv;
}
/* Here's where we convert from the internal representation to the external
representation written to disk */
virtual result do_out(
state_type& state
,const intern_type *from_begin
,const intern_type *from_end
,const intern_type *&from_next
,extern_type *to_begin
,extern_type *to_end
,extern_type *&to_next
) const
{
const intern_type *src = from_begin;
const intern_type *src_end = from_end;
intern_type *dst = reinterpret_cast<intern_type*>(to_begin);
intern_type *dst_end = reinterpret_cast<intern_type*>(to_end);
while (dst+1 <= dst_end && src < src_end)
*dst++ = *src++;
from_next = src;
to_next = reinterpret_cast<extern_type*>(dst);
return ok;
}
virtual result do_unshift(
state_type& state
,extern_type *to_begin
,extern_type *to_end
,extern_type *&to_next
) const
{
return noconv;
}
virtual int do_length(
state_type& state
, const extern_type *from_begin
,const extern_type *from_end
,size_t max_internal_chars
) const throw()
{
return std::min(max_internal_chars,size_t(from_end-from_begin));
}
/* Never converts? Not true for us. */
virtual bool do_always_noconv() const throw()
{
return false;
}
/* Max extern_type for one intern_type */
virtual int do_max_length() const throw()
{
return sizeof(intern_type);
}
/* do_encoding returns one of the following:
-1, if the external representation of a character uses
a stateful encoding
a constant number representing the maximum width in externT
elements used to represent a character in a fixed-width encoding
0, if the external representation of the characters in the
character set uses a variable size encoding
*/
virtual int do_encoding() const throw()
{
return sizeof(intern_type);
}
};
} // noconv namespace
int main()
{
std::list<std::wstring> contents;
try {
std::locale loc(std::locale::classic(),new noconv::codecvt);
std::wifstream fin;
fin.imbue(loc);
fin.open("unicode.xml");
if (!fin) {
std::cerr << "Error: Unable to open fin\n";
return 2;
}
std::wstring wstr;
wchar_t signature;
fin.read(&signature,1);
// Little-endian reading
if (signature != 0xfeff)
return std::cerr << "Error: File is not UTF-16 UNICODE\n",3;
bool shown = false;
while (getline(fin,wstr)) {
if (!shown) {
std::wcout << L'"' << wstr << L'"' << L'\n';
shown = true;
}
contents.push_back(wstr);
}
fin.close();
std::wofstream fout;
fout.imbue(loc);
fout.open("unicode_2.xml");
if (!fout) {
std::cerr << "Error: Unable to create fout\n";
return 2;
}
signature = 0xfeff;
fout.write(&signature,1);
//copy(contents.begin(),contents.end(),std::ostream_iterator<std::wstring,wchar_t>(fout,L"\n"));
copy(contents.begin(),contents.end(),std::ostream_iterator<std::wstring,wchar_t>(fout));
fout.close();
}
catch (const std::exception& e) {
std::cerr << "Exception: " << e.what() << std::endl;
}
}
--------8<--------
ASKER
doesn't ascii have a BOM??
No. Try this:
--------8<--------
#include <cstdlib>
#include <fstream>
int main()
{
std::ofstream fout("hello.txt");
fout << "Hello";
fout.close();
system("dir hello.txt");
system("type hello.txt");
}
--------8<--------
Your text file has 5 bytes in it, with each byte corresponding to a character in the string "Hello".
Now try this:
--------8<--------
#include <cstdlib>
#include <iostream>
#include <fstream>
#include <string>
#include <locale>
namespace noconv {
class codecvt : public std::codecvt<wchar_t,char, mbstate_t> {
public:
explicit codecvt(size_t refs = 0) : std::codecvt<wchar_t,char, mbstate_t> (refs) {}
protected:
virtual result do_in(
state_type& state
,const extern_type *from_begin
,const extern_type *from_end
,const extern_type *&from_next
,intern_type *to_begin
,intern_type *to_end
,intern_type *&to_next
) const
{
return noconv;
}
/* Here's where we convert from the internal representation to the external
representation written to disk */
virtual result do_out(
state_type& state
,const intern_type *from_begin
,const intern_type *from_end
,const intern_type *&from_next
,extern_type *to_begin
,extern_type *to_end
,extern_type *&to_next
) const
{
const intern_type *src = from_begin;
const intern_type *src_end = from_end;
intern_type *dst = reinterpret_cast<intern_type*>(to_begin);
intern_type *dst_end = reinterpret_cast<intern_type*>(to_end);
while (dst+1 <= dst_end && src < src_end)
*dst++ = *src++;
from_next = src;
to_next = reinterpret_cast<extern_type*>(dst);
return ok;
}
virtual result do_unshift(
state_type& state
,extern_type *to_begin
,extern_type *to_end
,extern_type *&to_next
) const
{
return noconv;
}
virtual int do_length(
state_type& state
, const extern_type *from_begin
,const extern_type *from_end
,size_t max_internal_chars
) const throw()
{
return std::min(max_internal_chars,size_t(from_end-from_begin));
}
/* Never converts? Not true for us. */
virtual bool do_always_noconv() const throw()
{
return false;
}
/* Max extern_type for one intern_type */
virtual int do_max_length() const throw()
{
return sizeof(intern_type);
}
/* do_encoding returns one of the following:
-1, if the external representation of a character uses
a stateful encoding
a constant number representing the maximum width in externT
elements used to represent a character in a fixed-width encoding
0, if the external representation of the characters in the
character set uses a variable size encoding
*/
virtual int do_encoding() const throw()
{
return sizeof(intern_type);
}
};
} // noconv namespace
int main()
{
try {
std::locale loc(std::locale::classic(),new noconv::codecvt);
std::wofstream fout;
fout.imbue(loc);
fout.open("hello2.txt");
if (!fout) {
std::cerr << "Error: Unable to create fout\n";
return 2;
}
wchar_t signature = 0xfeff;
fout.write(&signature,1);
fout << L"Hello";
fout.close();
system("dir hello2.txt");
system("type hello2.txt");
}
catch (const std::exception& e) {
std::cerr << "Exception: " << e.what() << std::endl;
}
}
--------8<--------
Your Unicode text file has its 16-bit BOM in it, indicating that it is a 16-bit little-endian file. Along with the 5 x 16 bits for L"Hello", that gives a file size of 12 bytes.
BOMs are an uncomfortable thing. Here's a good look-up for them: http://www.i18nguy.com/unicode/c-unicode.html#BOM
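The 12-byte figure (2 bytes of BOM plus 5 x 2 bytes of text) can also be checked in memory, without a wide stream or a custom facet. This is a minimal sketch assuming C++11; encode_utf16le_with_bom is a name made up for the example.

```cpp
#include <string>

// Hypothetical helper (not from the thread): serialise UTF-16 code units as
// little-endian bytes, preceded by the BOM U+FEFF (bytes FF FE on disk).
std::string encode_utf16le_with_bom(const std::u16string& text)
{
    std::string bytes("\xFF\xFE", 2);
    bytes.reserve(2 + 2 * text.size());
    for (char16_t unit : text) {
        bytes.push_back(static_cast<char>(unit & 0xFF)); // low byte first
        bytes.push_back(static_cast<char>(unit >> 8));   // then the high byte
    }
    return bytes;
}
```

Calling encode_utf16le_with_bom(u"Hello").size() gives 12, against 5 for the plain std::string "Hello". Writing the result through a std::ofstream opened with std::ios::binary should produce the same bytes on any platform, whatever the size of wchar_t.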
It would be nice if content type and encoding could both be provided by the directory system (cf. MIME). Having BOMs to handle encoding partially and file extensions to cover content type partially is a real mess, isn't it?
ASKER
copy(
contents.begin(),
contents.end(),
std::ostream_iterator<std::wstring,wchar_t>(fout)
);
How do I copy it as ASCII rather than wide Unicode?
Converting from wchar_t to char the "standard way" is pretty ugly. You need to use ctype's narrow, which means using a facet from the locale. You would have thought that you'd be able to use iterators with it and convert directly from an istreambuf_iterator to an ostreambuf_iterator, but the standard just has it working with character pointers. In the code below, I load a wchar_t vector and use narrow to convert the wide characters to narrow characters.
Now that I'm writing this explanation I ask myself why I didn't simply copy from an istream_iterator imbued with our no-conversion locale and write to an ostream_iterator imbued with the classic locale. That would definitely be a lot simpler than the following code, but having gone to the effort of putting together the following illustration, I can't bring myself to delete it :-)
Here it is for what it's worth...
--------8<--------
#include <iostream>
#include <fstream>
#include <string>
#include <locale>
#include <iterator>
#include <algorithm>
#include <vector>
#include <cctype>
namespace noconv {
class codecvt : public std::codecvt<wchar_t,char, mbstate_t> {
public:
explicit codecvt(size_t refs = 0) : std::codecvt<wchar_t,char, mbstate_t> (refs) {}
protected:
virtual result do_in(
state_type& state
,const extern_type *from_begin
,const extern_type *from_end
,const extern_type *&from_next
,intern_type *to_begin
,intern_type *to_end
,intern_type *&to_next
) const
{
return noconv;
}
/* Here's where we convert from the internal representation to the external
representation written to disk */
virtual result do_out(
state_type& state
,const intern_type *from_begin
,const intern_type *from_end
,const intern_type *&from_next
,extern_type *to_begin
,extern_type *to_end
,extern_type *&to_next
) const
{
const intern_type *src = from_begin;
const intern_type *src_end = from_end;
intern_type *dst = reinterpret_cast<intern_type*>(to_begin);
intern_type *dst_end = reinterpret_cast<intern_type*>(to_end);
while (dst+1 <= dst_end && src < src_end)
*dst++ = *src++;
from_next = src;
to_next = reinterpret_cast<extern_type*>(dst);
return ok;
}
virtual result do_unshift(
state_type& state
,extern_type *to_begin
,extern_type *to_end
,extern_type *&to_next
) const
{
return noconv;
}
virtual int do_length(
state_type& state
, const extern_type *from_begin
,const extern_type *from_end
,size_t max_internal_chars
) const throw()
{
return std::min(max_internal_chars,size_t(from_end-from_begin));
}
/* Never converts? Not true for us. */
virtual bool do_always_noconv() const throw()
{
return false;
}
/* Max extern_type for one intern_type */
virtual int do_max_length() const throw()
{
return sizeof(intern_type);
}
/* do_encoding returns one of the following:
-1, if the external representation of a character uses
a stateful encoding
a constant number representing the maximum width in externT
elements used to represent a character in a fixed-width encoding
0, if the external representation of the characters in the
character set uses a variable size encoding
*/
virtual int do_encoding() const throw()
{
return sizeof(intern_type);
}
};
} // noconv namespace
int main()
{
try {
std::locale loc(std::locale::classic(),new noconv::codecvt);
{
std::wofstream wfout; // Create a 16LE unicode file
wfout.imbue(loc);
wfout.open("wide.txt");
if (!wfout)
return std::cerr << "Error: Unable to create wfout\n",2;
wchar_t signature = 0xfeff;
wfout.write(&signature,1);
wfout << L"Hello";
}
{
std::wifstream wfin; // Open the 16LE unicode file
wfin.imbue(loc);
wfin.open("wide.txt");
if (!wfin)
return std::cerr << "Error: Unable to open wfin\n",2;
wchar_t signature;
wfin.read(&signature,1);
// Little-endian reading
if (signature != 0xfeff)
return std::cerr << "Error: File is not UTF-16LE UNICODE\n",3;
typedef std::istreambuf_iterator<wchar_t> IItr;
std::vector<wchar_t> wcontent(IItr(wfin),(IItr()));
const int contentLength = wcontent.size();
std::vector<char> ncontent(contentLength);
bool success = (std::use_facet<std::ctype<wchar_t> >(loc).narrow
(&wcontent[0],&wcontent[contentLength],'?',&ncontent[0]) != 0);
if (!success)
return std::cerr << "Error: Narrow failed\n",4;
std::ofstream nfout; // Create a narrow character (ANSI) file
nfout.open("narrow.txt");
if (!nfout)
return std::cerr << "Error: Unable to create nfout\n",5;
typedef std::ostreambuf_iterator<char> OItr;
copy(ncontent.begin(),ncontent.end(),OItr(nfout));
}
}
catch (const std::exception& e) {
std::cerr << "Exception: " << e.what() << std::endl;
}
}
--------8<--------
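If strict 7-bit ASCII is all that is wanted, the narrowing step can also be done without a locale. The following is a minimal sketch assuming C++11, with narrow_to_ascii and widen_ascii as hypothetical names. It treats the input as UTF-16 code units and replaces anything outside ASCII, which is a design choice for the example rather than what ctype's narrow is guaranteed to do.

```cpp
#include <string>

// Hypothetical helper: keep 7-bit ASCII code units, replace everything else.
std::string narrow_to_ascii(const std::u16string& text, char replacement = '?')
{
    std::string out;
    out.reserve(text.size());
    for (char16_t unit : text)
        out.push_back(unit < 0x80 ? static_cast<char>(unit) : replacement);
    return out;
}

// Hypothetical helper for the other direction: ASCII bytes map one-to-one to
// UTF-16 code units; bytes of 0x80 and above are not ASCII and become U+FFFD.
std::u16string widen_ascii(const std::string& text)
{
    std::u16string out;
    out.reserve(text.size());
    for (unsigned char byte : text)
        out.push_back(byte < 0x80 ? static_cast<char16_t>(byte) : u'\uFFFD');
    return out;
}
```

Note that a character outside the Basic Multilingual Plane occupies two UTF-16 code units (a surrogate pair), so it comes out as two replacement characters here.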
Oh yes... beware that there is no attempt to handle non ASCII characters in that! If it was seriously being used to convert between UTF-8 and UTF-16 as it suggests, it ought to deal with multi-byte UTF-8 characters. It would be safer to use this only to convert between ISO-8859-1 (Latin1) and UTF-16. You need to work with jkr's suggested wcstombs/mbstowcs functions to work with multi-bytes.
All said and done, if you *really* need to convert between XML file formats in Windows, look no further than Microsoft's XSLT support. Scroll to the bottom of http://msdn.microsoft.com/XML/XMLDownloads/default.aspx and follow the link to the Command Line Transformation Utility (msxsl.exe), which comes with source code.
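For anyone who wants to see what dealing with multi-byte UTF-8 characters involves, here is a minimal sketch of a UTF-16 to UTF-8 encoder, assuming C++11. The name utf16_to_utf8 is made up for this example, and it is an illustration of the encoding rules rather than a replacement for ICU or the unicode.org converters mentioned earlier.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the thread): encode UTF-16 code units as UTF-8.
std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: it must be followed by a low surrogate.
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::runtime_error("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::runtime_error("unpaired low surrogate");
        }
        if (cp < 0x80) {                       // 1 byte: plain ASCII
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {               // 2 bytes
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {             // 3 bytes
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {                               // 4 bytes
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

ASCII passes through unchanged, which is why a converter that ignores everything else appears to work until the first accented or non-Latin character arrives.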
ASKER
That is it,
but there are some runtime errors about going out of the bounds of
"wcontent" when executing
&wcontent[contentLength]
I'll deal with it myself.
Anyhow, thanks a lot, Mr rstaveley.
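On the out-of-bounds report: &wcontent[contentLength] indexes one element past the end of the vector, which a checked standard library is entitled to reject at run time. A minimal sketch of one way round it follows, with narrow_with_ctype as a hypothetical wrapper name. It forms the end pointer by pointer arithmetic instead of indexing, and skips the call when the vector is empty.

```cpp
#include <cstddef>
#include <locale>
#include <string>
#include <vector>

// Hypothetical wrapper mirroring the narrowing step, with an in-bounds range.
std::string narrow_with_ctype(const std::vector<wchar_t>& wide,
                              const std::locale& loc = std::locale::classic())
{
    std::vector<char> narrow(wide.size());
    if (!wide.empty()) {
        const wchar_t* first = &wide[0];
        // first + size() is the one-past-the-end pointer; it is formed by
        // pointer arithmetic, never by indexing the vector out of range.
        std::use_facet<std::ctype<wchar_t> >(loc)
            .narrow(first, first + wide.size(), '?', &narrow[0]);
    }
    return std::string(narrow.begin(), narrow.end());
}
```

In the posted program this would amount to passing &wcontent[0] + contentLength as the second argument to narrow, after checking that wcontent is not empty.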