URGENT: String Manipulation: Getting Rid of invalid characters for XML output

Posted on 2003-12-03
Last Modified: 2013-12-03
I am trying to output an XML file that captures the subjects of all e-mails.  However, one of my e-mail subjects contains the square character (the character that is displayed for an invalid character), and I want to replace all such characters with white space when writing them to the XML file.  How can I accomplish this?  What characters are invalid for XML files?  The subjects are stored as char*.
Question by:jjacksn

Author Comment

ID: 9872353
I also don't mind opening the stream after it has been written, rewriting it, and removing all of the bad characters.

Expert Comment

by:Roshan Davis
ID: 9872522
Get all node texts and replace the 0x0A (line feed), 0x0D (carriage return), and 0x3F (question mark, ?) characters.

Assisted Solution

DanRollins earned 166 total points
ID: 9873116
>> square character (the character that is displayed for an invalid character),
That character just means that there is no glyph for the character code in the current font.  What you consider invalid may be perfectly valid if you display in a different font.

It is fairly easy to write code that reads an XML file and discards (or replaces with a space) all characters that do not have glyphs in a standard font, such as Arial or Courier.  Before generating the XML file, just trot each string through a function such as:

     void Cleanup( char* p )  {
            while( *p != '\0') {
                  if ( (*p < ' ') || (*p > '~') ) {  // others if you want...
                       *p= ' ';   // overwrite the character in place
                  }
                  p++;            // advance to the next character
            }
     }

-- Dan

Assisted Solution

rendaduiyan earned 166 total points
ID: 9873553
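Try STL replace:
// needs <algorithm>, <vector>, <cstring>, and using namespace std
// suppose char* pszSubject is declared; vSubject is a copy, so pszSubject itself is unchanged
vector<char> vSubject(pszSubject, pszSubject + strlen(pszSubject));
replace(vSubject.begin(), vSubject.end(), '[', ' ');
replace(vSubject.begin(), vSubject.end(), ']', ' ');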
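
A variation on the same idea, offered as a sketch rather than a definitive fix: the STL algorithms accept raw pointers as iterators, so std::replace_if can run directly on the char* buffer with no copy, and a predicate can cover a whole range of character codes in one pass. The names IsNotPrintableAscii and ReplaceNonPrintable are made up for illustration, and the printable range 0x20..0x7E is an assumption about what should be kept.

```cpp
#include <algorithm>
#include <cstring>

// Hypothetical predicate: true for bytes outside printable ASCII (0x20..0x7E).
// The cast avoids surprises on platforms where plain char is signed.
static bool IsNotPrintableAscii(char ch) {
    unsigned char c = static_cast<unsigned char>(ch);
    return c < 0x20 || c > 0x7E;
}

// Overwrites every non-printable byte of a NUL-terminated string with a space.
void ReplaceNonPrintable(char* pszSubject) {
    std::replace_if(pszSubject, pszSubject + std::strlen(pszSubject),
                    IsNotPrintableAscii, ' ');
}
```

Calling ReplaceNonPrintable(pszSubject) before writing the subject modifies the original buffer, so no second pass over the output stream is needed.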

Accepted Solution

xssass earned 168 total points
ID: 9956837

You will get a square for non-printable characters, so you just have to filter them out. Their ASCII codes are below 32 and above 126, so the following code should work:

     void Cleanup( char* p )  {
            while( *p != '\0') {
                  if ( (int(*p) <= 31) || (int(*p) >= 127) ) {  // test the character *p, not the pointer p
                       *p= ' ';
                  }
                  p++;   // advance to the next character
            }
     }
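
On the original question of which characters XML itself rejects: XML 1.0 allows tab (0x09), line feed (0x0A), carriage return (0x0D), and characters from 0x20 upward (apart from surrogates and 0xFFFE/0xFFFF), so for single-byte data the only illegal bytes are the control codes below 0x20 other than those three. Characters such as & and < are legal but must be escaped, which is a separate step. A minimal sketch under those assumptions follows; IsXml10Char and SanitizeForXml are hypothetical names, and bytes of 0x80 and above are left alone on the assumption that the file's declared encoding covers them.

```cpp
#include <cstring>

// Hypothetical helper: true if the byte may appear in XML 1.0 character data
// when treated as a single-byte character.
static bool IsXml10Char(unsigned char c) {
    return c == 0x09 || c == 0x0A || c == 0x0D || c >= 0x20;
}

// Replaces, in place, every byte that XML 1.0 disallows with a space.
// Assumes p points to a writable NUL-terminated string.
void SanitizeForXml(char* p) {
    for (; *p != '\0'; ++p) {
        if (!IsXml10Char(static_cast<unsigned char>(*p))) {
            *p = ' ';
        }
    }
}
```

This is less aggressive than filtering everything outside 32..126: it keeps tabs, line breaks, and accented characters, and removes only what would make a parser reject the file.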

