URGENT: String Manipulation: Getting Rid of invalid characters for XML output

Posted on 2003-12-03
Last Modified: 2013-12-03
I am trying to output an xml file that captures all of the subjects of e-mails.  However, in one of my e-mail subjects, I have the square character (the character that is displayed for an invalid character), and I want to replace all such characters with white space when outputting them to an XML file.  How can I accomplish this?  What characters are invalid for xml files?  the subjects are stored as char*
Question by:jjacksn
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions

Author Comment

ID: 9872353
I don't mind opening the stream after it has been written, rewritting the stream, and then removing all of the bad characters either.
LVL 23

Expert Comment

by:Roshan Davis
ID: 9872522
Get all node texts and replace 0x0A (carriage return char), 0x0D (enter char), 0x3F (question mark char (?)) characters
LVL 49

Assisted Solution

DanRollins earned 166 total points
ID: 9873116
>> square character (the character that is displayed for an invalid character),
That character just means that there is no glyph for the character code in the current font.  What you consider invalid may be perfectly valid if you display in a different font.

It is fairly easy to write code that reads an XML file and discards (or replaces with space) all characters that do not have glyphs in the a standard font, such as Arial or Courier.  Before generating the XML file, just trot each string through a function such as:

     Cleanup( char* p )  {
            while( *p != '\0') {
                  if ( (*p < ' ') || (*p > '~') ) {  // others if you want...
                       *p= ' '

-- Dan

Assisted Solution

rendaduiyan earned 166 total points
ID: 9873553
try STL replace,
//suppose char* pszSubject is declared
vector<char> vSubject(pszSubject, pszSubject + strlen(pszSubject));
replace(vSubject.begin(), vSubject.end(), '[', ' ') ;
replace(vSubject.begin(), vSubject.end(), ']', ' ') ;

Accepted Solution

xssass earned 168 total points
ID: 9956837

You will get a square for non-printable characters. So you just have to filter them out... Theire ascii codes are below 32 and above 126. So the next code should work:

     Cleanup( char* p )  {
            while( *p != '\0') {
                  if ( (int(p) <= 31) || int(p) >= 127) {
                       *p= ' '


Featured Post

On Demand Webinar: Networking for the Cloud Era

Did you know SD-WANs can improve network connectivity? Check out this webinar to learn how an SD-WAN simplified, one-click tool can help you migrate and manage data in the cloud.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

IntroductionThis article is the second in a three part article series on the Visual Studio 2008 Debugger.  It provides tips in setting and using breakpoints. If not familiar with this debugger, you can find a basic introduction in the EE article loc…
Many modern programming languages support the concept of a property -- a class member that combines characteristics of both a data member and a method.  These are sometimes called "smart fields" because you can add logic that is applied automaticall…
The goal of the tutorial is to teach the user how to use functions in C++. The video will cover how to define functions, how to call functions and how to create functions prototypes. Microsoft Visual C++ 2010 Express will be used as a text editor an…
The viewer will learn how to user default arguments when defining functions. This method of defining functions will be contrasted with the non-default-argument of defining functions.

623 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question