Solved

Encoding ASCII characters 128-255 as character references

Posted on 2004-04-23
13
1,060 Views
Last Modified: 2006-11-17
How can I use the XMLDocument object to encode characters in the ascii range 128-255 as character references (such as ¿) rather than the two byte representation of the character?
0
Comment
Question by:PLavelle
  • 7
  • 4
  • 2
13 Comments
 
LVL 26

Expert Comment

by:rdcpro
ID: 10902547
You can't.  In fact, if you start out with character references in an XML object, they will be parsed into their characters, and  unless it's a markup character, or a character that should be escaped (according to HTML), it will be output as the character itself.  For example, the non-breaking space character, &#160; is parsed, and output as the actual character...it's not re-escaped.  In an XSLT transformation, some characters will be escaped, depending on whether you set the <xsl:output method="html" />

It sounds to me like you're experiencing an encoding problem, which can come from a number of different places.  If you're seeing two-byte characters, then I'm betting you're doing something where the XML (or the output of the transform) gets converted to a string.  This will force UTF-16, which is a switch in encoding, and can cause the wrong characters to be rendered. What exactly is going wrong?  

Regards,
Mike Sharp
0
 

Author Comment

by:PLavelle
ID: 10902619
I am using an XMLDocument object in .NET to write out the XML from a .NET application. When I read the XML into a VB6 application, the string contains two characters for every character over 127 in the ascii table. For instance, ¿ is replaced by ¿ .
0
 

Author Comment

by:PLavelle
ID: 10902656
Also, the XML is read into VB6 as a string, not using the DOM.
0
 
LVL 26

Expert Comment

by:rdcpro
ID: 10902779
Strings are always UTF-16.  This is probably the issue you're having.  VB6 thinks the xml is a different encoding because of the xml declaration...or some other similar reason.   If you're going to use strings at any point, you need to make sure the file is written out as UTF-16.  In an XSLT, you could do this with:

<xsl:output method="xml" encoding="utf-16" />

Can you post some of the code that you use to write out the XML in .NET, as well as a snippet of the XML (including the declaration, if present)?  


Regards,
Mike Sharp
0
 

Author Comment

by:PLavelle
ID: 10902896
I changed it to UTF-16 encoding and it still writes out two byte characters. VB6 interprets these two byte characters as two characters instead of just one.
0
 
LVL 10

Expert Comment

by:Yury_Delendik
ID: 10902960
Try this

Dim sw As New System.IO.StreamWriter("test.xml", System.Text.Encoding.ASCII)
xmldoc.Save(sw)
sw.Close()
0
What Security Threats Are You Missing?

Enhance your security with threat intelligence from the web. Get trending threat insights on hackers, exploits, and suspicious IP addresses delivered to your inbox with our free Cyber Daily.

 

Author Comment

by:PLavelle
ID: 10903036
Yury, that just replaces those characters with question marks.
0
 
LVL 26

Expert Comment

by:rdcpro
ID: 10903113
Yury:  The problem with that approach is that it will simply strip the high order bytes, losing the characters.  The characters will be replaced by the ASCII codepoint 63 which is a "?".   It would be better to save it in UTF-16, if it's subsequently going to be used in a string by VB6.  For Intel based systems:

Dim sw As New System.IO.StreamWriter("test.xml", System.Text.Encoding.Unicode)

PLavelle:  How did you change the encoding to UTF-16?  You can't do this simply by changing the declaration, you must explicitly change the encoding when it's being output.  Post your code, so I can see what's going on.

Regards,
Mike Sharp
0
 

Author Comment

by:PLavelle
ID: 10903154
       Dim iRow As Int32
        Dim iCol As Int32
        Dim sValue As String
        Dim xmlDoc As New XmlDocument
        Dim oElement As XmlElement
        Dim oAttribute As XmlAttribute
        Dim xmlWriter As New XmlTextWriter("c:\test2.xml", System.Text.Encoding.Unicode)

        oElement = xmlDoc.CreateElement("Table")
        oAttribute = xmlDoc.CreateAttribute("Company")
        oAttribute.Value = Constants.Company.Name
        oElement.Attributes.Append(oAttribute)
        oAttribute = xmlDoc.CreateAttribute("Version")
        oAttribute.Value = Constants.Versions.FundTable
        oElement.Attributes.Append(oAttribute)

        xmlDoc.AppendChild(oElement)
        For iRow = 1 To grd.Rows.Count - 1
            oElement = xmlDoc.CreateElement("DocElement")
            For iCol = 0 To grd.Cols.Count - 1
                sValue = Trim(grd(iRow, iCol))
                If Trim(sValue) <> "" Then
                    oAttribute = xmlDoc.CreateAttribute(grd.Cols(iCol).Name)
                    oAttribute.Value = sValue
                    oElement.Attributes.Append(oAttribute)
                End If
            Next
            xmlDoc.DocumentElement.AppendChild(oElement)
        Next

        xmlDoc.Save(xmlWriter)
0
 

Author Comment

by:PLavelle
ID: 10903161
I had to change some names, but there is the code that writes out the XML from the grid.
0
 
LVL 26

Accepted Solution

by:
rdcpro earned 500 total points
ID: 10903297
Hmmm... I don't see anything wrong there.  I'd say the data is messed up either before this (when the grd is filled out) or during the load in VB6.  You could try ISO-8859-1 for the output encoding...but I'd first try to specify

Dim xmlWriter As New XmlTextWriter("c:\test2.xml", System.Text.Encoding.Default)

This will cause the encoding to be whatever your systems default codepage is, and hopefully the VB6 app will figure it out when it reads the file.

Let me know how this works!

Regards,
Mike Sharp
0
 
LVL 10

Expert Comment

by:Yury_Delendik
ID: 10907704
What are you using to read data in VB6? FileSystem Object ot standard VB statements?
0
 

Author Comment

by:PLavelle
ID: 10929366
The ISO-8859-1 encoding seems to write out the correct character (1 byte). Thanks for all of your help.
0

Featured Post

Why You Should Analyze Threat Actor TTPs

After years of analyzing threat actor behavior, it’s become clear that at any given time there are specific tactics, techniques, and procedures (TTPs) that are particularly prevalent. By analyzing and understanding these TTPs, you can dramatically enhance your security program.

Join & Write a Comment

Summary Displaying images in RichTextBox is a common requirement with limited solutions available. Pasting through clipboard or embedding into RTF content only support static images.  This article describes how to insert Windows control objects int…
A long time ago (May 2011), I have written an article showing you how to create a DLL using Visual Studio 2005 to be hosted in SQL Server 2005. That was valid at that time and it is still valid if you are still using these versions. You can still re…
In this seventh video of the Xpdf series, we discuss and demonstrate the PDFfonts utility, which lists all the fonts used in a PDF file. It does this via a command line interface, making it suitable for use in programs, scripts, batch files — any pl…
This video gives you a great overview about bandwidth monitoring with SNMP and WMI with our network monitoring solution PRTG Network Monitor (https://www.paessler.com/prtg). If you're looking for how to monitor bandwidth using netflow or packet s…

758 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

22 Experts available now in Live!

Get 1:1 Help Now