Solved

URGENT: String Manipulation: Getting Rid of invalid characters for XML output

Posted on 2003-12-03
Last Modified: 2013-12-03
I am trying to output an XML file that captures the subjects of all my e-mails.  However, one of the subjects contains the square character (the one displayed for an invalid character), and I want to replace all such characters with white space when writing them to the XML file.  How can I accomplish this?  Which characters are invalid in XML files?  The subjects are stored as char*.
Question by:jjacksn
 
LVL 5

Author Comment

by:jjacksn
ID: 9872353
I also don't mind reopening the stream after it has been written and rewriting it with all of the bad characters removed.
 
LVL 23

Expert Comment

by:Roshan Davis
ID: 9872522
Get the text of every node and replace the 0x0A (line feed), 0x0D (carriage return), and 0x3F (question mark, ?) characters.
 
LVL 49

Assisted Solution

by:DanRollins
DanRollins earned 166 total points
ID: 9873116
>> square character (the character that is displayed for an invalid character),
That character just means that there is no glyph for the character code in the current font.  What you consider invalid may be perfectly valid if you display in a different font.

It is fairly easy to write code that reads an XML file and discards (or replaces with a space) all characters that do not have glyphs in a standard font, such as Arial or Courier.  Before generating the XML file, just trot each string through a function such as:

     void Cleanup( char* p )  {
            while( *p != '\0') {
                  // replace anything outside printable ASCII (' ' through '~')
                  if ( (*p < ' ') || (*p > '~') ) {  // others if you want...
                       *p = ' ';
                  }
                  p++;
            }
     }
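
On the question of which characters XML itself rejects: XML 1.0 allows tab (0x09), line feed (0x0A) and carriage return (0x0D), but no other control codes below 0x20. A variant of the function that keeps those three might look like the sketch below. The names IsXmlSafeAscii and CleanupForXml are made up for illustration, and the sketch assumes the output is meant to be plain ASCII, which is why it also blanks 0x7F and every byte above it; in an encoding such as UTF-8 or ISO-8859-1 those bytes can be legitimate characters.

```cpp
#include <cassert>

// Hypothetical helper (not from the thread): true when a byte can be written
// unchanged into an ASCII-only XML 1.0 document.
static bool IsXmlSafeAscii(unsigned char c) {
    if (c == '\t' || c == '\n' || c == '\r') {
        return true;                    // the only control codes XML 1.0 allows
    }
    return c >= 0x20 && c <= 0x7E;      // printable ASCII
}

// Replaces every other byte with a space, in place.
void CleanupForXml(char* p) {
    assert(p != nullptr);
    for (; *p != '\0'; ++p) {
        // Cast first so bytes above 0x7F compare correctly when char is signed.
        if (!IsXmlSafeAscii(static_cast<unsigned char>(*p))) {
            *p = ' ';
        }
    }
}
```

Unlike the version above, this leaves tabs and line breaks inside a subject alone.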

-- Dan
 
LVL 5

Assisted Solution

by:rendaduiyan
rendaduiyan earned 166 total points
ID: 9873553
Try the STL replace algorithm:
// assumes char* pszSubject is declared, with <algorithm>, <vector>, <cstring> included and using namespace std
vector<char> vSubject(pszSubject, pszSubject + strlen(pszSubject));
replace(vSubject.begin(), vSubject.end(), '[', ' ');
replace(vSubject.begin(), vSubject.end(), ']', ' ');
....
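
If the set of unwanted characters is a range rather than a handful of specific values, std::replace_if with a predicate can cover them in one pass instead of one replace call per character. The following is only a sketch: IsUnprintable and CleanSubject are hypothetical names, and it copies the subject into a std::string rather than a vector<char> so the result is easy to write to a stream.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical predicate: true for bytes outside printable ASCII (32..126).
static bool IsUnprintable(char ch) {
    unsigned char c = static_cast<unsigned char>(ch);
    return c < 32 || c > 126;
}

// Returns a cleaned copy of the subject; the caller's buffer is not modified.
std::string CleanSubject(const char* pszSubject) {
    assert(pszSubject != nullptr);
    std::string subject(pszSubject);
    std::replace_if(subject.begin(), subject.end(), IsUnprintable, ' ');
    return subject;
}
```

A call such as CleanSubject(pszSubject) can then be passed straight to the code that writes the XML element.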
 
LVL 2

Accepted Solution

by:xssass
xssass earned 168 total points
ID: 9956837
hello,

You will get a square for non-printable characters, so you just have to filter them out. Their ASCII codes are below 32 and above 126, so the following code should work:

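     void Cleanup( char* p )  {
            while( *p != '\0') {
                  // non-printable: codes below 32 or above 126 (bytes above 127 are negative if char is signed)
                  if ( (int(*p) <= 31) || (int(*p) >= 127) ) {
                       *p = ' ';
                  }
                  p++;
            }
     }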

k.