Schema to Correct XML Output

Is it possible to write a "conversion" schema which would reformat the xml file itself so that this

<class name="BottomLine" extends="change">
     <prop name="description" type="string"/>
     <prop name="new"/>
     <prop name="new2"/><mapping-rule><simple-mapping>aaa</simple-mapping></mapping-rule>
</class>

would be changed to

<class name="BottomLine" extends="change">
     <prop name="description" type="string"/>
     <prop name="new"/>
     <prop name="new2"/>
          <mapping-rule>
               <simple-mapping>aaa</simple-mapping>
          </mapping-rule>
</class>

What I'm looking for is a way to permanently change the xml file, so that the conversion schema would only need to be used once.

Commented:
You can put

<xsl:output indent="yes"/>

right after the opening <xsl:stylesheet> tag, but it didn't have any effect for me.
-- I could offer VB functions that convert the XML source to a string with indentation and line breaks and save it to a file. They work great, but they aren't a pure XSL solution.
Martin Liss
Social distance - Don't touch your face - Wash your hands for 20 seconds
CERTIFIED EXPERT
Most Valuable Expert 2017
Distinguished Expert 2018

Author

Commented:
What I've done so far for example is to add a text node with a value of "VbCrLf & vbTab & vbTab" prior to the <mapping-rule> node. While that LOOKS good, it has the problem that the text node is left behind if at some future date the <mapping-rule> node is deleted. If your VB code does something else, I'd be interested in seeing it.
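One way around the leftover text node would be to delete the indentation together with the element. The following is only a sketch under assumptions: RemoveNodeAndIndent is a hypothetical helper name, and it assumes the indentation was added as a single whitespace-only text node immediately before the node being removed.

```vb
' Hypothetical helper: remove a node and, if present, the whitespace-only
' text node that was inserted just before it for indentation.
Public Sub RemoveNodeAndIndent(ByVal objNode As MSXML2.IXMLDOMNode)
    Dim objPrev As MSXML2.IXMLDOMNode
    Dim strText As String

    Set objPrev = objNode.previousSibling
    If Not objPrev Is Nothing Then
        If objPrev.nodeType = NODE_TEXT Then
            ' Strip line breaks, tabs and spaces; if nothing is left,
            ' the text node was pure indentation.
            strText = Replace(objPrev.nodeValue, vbCr, "")
            strText = Replace(strText, vbLf, "")
            strText = Replace(strText, vbTab, "")
            If Len(Trim$(strText)) = 0 Then
                objPrev.parentNode.removeChild objPrev
            End If
        End If
    End If

    objNode.parentNode.removeChild objNode
End Sub
```

Text nodes that contain anything other than whitespace are left alone, so real content next to the deleted element is not touched.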

Commented:
Well, that code works without any problem, but it's lengthy and might clutter this thread. I'll post it if there is no XSL solution in 2 days, OK?

Author

Commented:
Sure.

Commented:
A good tool to reformat XML is Tidy, see:

http://www.w3.org/People/Raggett/tidy/

Author

Commented:
robbert, I don't think that chabaud's suggestion is something that I can use from within VB, so I would appreciate it if you would post your code. Thanks.

Commented:
Meanwhile, I found the following stylesheet, and it worked great for me. The code I originally meant is the VB function posted below it, which I have used extensively without any modification.

-----------------------------------------------------------

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml" indent="yes"/>
      <xsl:template match = "/">
            <xsl:copy-of select="." />
      </xsl:template>
</xsl:stylesheet>

-----------------------------------------------------------

Public Function FormatXML(ByRef objNode As MSXML2.IXMLDOMNode, _
Optional ByVal intTabLevel As Integer = 0, _
Optional ByVal strLineBreakChars As String = vbCrLf) As String
    On Error GoTo ErrHandler
   
    Dim blnMixedTextNode As Boolean
    Dim blnHasATextNodeOnly As Boolean
    Dim i As Integer

    With objNode
        Select Case .nodeType
        Case NODE_DOCUMENT, NODE_DOCUMENT_FRAGMENT
            ' all child nodes of the document should be at the same indent Level
            ' just iterate over them and recurse with 0 indent
            For i = 0 To .childNodes.length - 1
                FormatXML = FormatXML & FormatXML(.childNodes(i))
            Next i
       
        Case NODE_TEXT
            ' should render the same way the default IE5 stylesheet does for mixed content
            ' figure out if we're in some mixed content
            blnMixedTextNode = (.parentNode.childNodes.length > 1)  'if this text node has any siblings it's in mixed content

            ' if mixed indent this string
            If blnMixedTextNode Then FormatXML = String(intTabLevel, vbTab)
            ' we're gonna strip out any tabs and carriage returns from the Text
            FormatXML = FormatXML & Trim(Replace(Replace(.xml, strLineBreakChars, " "), vbTab, " "))
            ' if mixed add carriage return
            If blnMixedTextNode Then FormatXML = FormatXML & strLineBreakChars

        Case NODE_ELEMENT
            If .hasChildNodes Then
                ' if the node has only one child and that child is text we won't add carriage return after opening tag
                blnHasATextNodeOnly = (.childNodes(0).nodeType = NODE_TEXT) And (.childNodes.length = 1)
            End If
           
            ' open the start tag
            FormatXML = String(intTabLevel, vbTab) & "<" & .nodeName

            ' recurse over the attributes
            For i = 0 To .Attributes.length - 1
                FormatXML = FormatXML & FormatXML(.Attributes(i))
            Next i

            ' properly close the start tag based on node's contents
            If Not .hasChildNodes Then       ' no child nodes so it's an empty element
                FormatXML = FormatXML & "/>" & strLineBreakChars
               
            Else
                If blnHasATextNodeOnly Then    ' has only text for children - don't add carriage return
                    FormatXML = FormatXML & ">"
                Else                            ' has child elements - add carriage return
                    FormatXML = FormatXML & ">" & strLineBreakChars
                End If
               
                ' recurse if there's children
                For i = 0 To .childNodes.length - 1
                    FormatXML = FormatXML & FormatXML(.childNodes(i), intTabLevel + 1)
                Next i
               
                ' properly indent and add the end tag
                If Not blnHasATextNodeOnly Then FormatXML = FormatXML & String(intTabLevel, vbTab)
                FormatXML = FormatXML & "</" & .nodeName & ">" & strLineBreakChars
               
            End If
                   
        Case NODE_COMMENT, NODE_CDATA_SECTION
            ' if comment is on more than one line don't indent
            If InStr(1, .xml, vbCr) = 0 Then FormatXML = String(intTabLevel, vbTab)
            FormatXML = FormatXML & .xml & strLineBreakChars
       
        Case NODE_ATTRIBUTE
            ' if there are double quotes in the attribute use single quotes to surround the attr value
            If InStr(1, .Text, Chr(34)) > 0 Then
                FormatXML = " " & .nodeName & "='" & .Text & "'"
            Else
                FormatXML = " " & .nodeName & "=" & Chr(34) & .Text & Chr(34)
            End If
       
        Case NODE_ENTITY
            ' and we would never want entities expanded
           
        Case Else
            ' all other node types should just return their xml (properly indented)
            ' these include - entity refs, pi's, notations, doctypes
            FormatXML = String(intTabLevel, vbTab) & .xml & strLineBreakChars
           
        End Select
    End With
   
    ' the msxml parser is using vbCrLf as line separator...
    If strLineBreakChars <> vbCrLf Then
        FormatXML = Replace(FormatXML, vbCrLf, strLineBreakChars)
    End If
   
    Exit Function
   
ErrHandler:
    Err.Raise Err.Number, Err.Source, Err.Description
End Function

Commented:
& good luck :-)

Author

Commented:
I'm sorry it has taken me this long to respond, but I've been busy with another project. I tried using the VB code but it doesn't work for me. I first tried to call it with a node that I just added to the XML and the second time I tried to pass it the root node at program termination. Neither way seemed to correct the output. As far as the stylesheet goes, I have to confess my ignorance on how to use it. Do I have to insert it or refer to it in the XML file itself, or can I "call" it somehow from within my VB program?

Commented:
Here is a good doc from Microsoft:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/xmlsdk30/htm/xmmthtransformnode.asp


Dim source As New Msxml2.DOMDocument
Dim stylesheet As New Msxml2.DOMDocument

' Load data.
source.async = False
source.Load "books.xml"
 
' Load style sheet.
stylesheet.async = False
stylesheet.Load "sample.xsl"

' Do the transform
MsgBox source.transformNode(stylesheet)

Commented:
As for the VB code, it returns a formatted XML-string but doesn't change the XML of any node.
As for the stylesheet, you would do what chabaud suggested (the same you're doing all the time when applying an XSL stylesheet).
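To make the first point concrete: because FormatXML only returns a string, the caller has to write that string to disk. A minimal sketch, assuming the FormatXML function posted above is in the same project. SaveFormattedXML and its two path parameters are hypothetical names, and Open ... For Output writes ANSI text, so this assumes the document contains no characters outside the system code page.

```vb
' Hypothetical wrapper: load an XML file, format it with FormatXML
' (posted earlier in this thread) and write the result to another file.
Public Sub SaveFormattedXML(ByVal strSourcePath As String, _
                            ByVal strTargetPath As String)
    Dim objDoc As New MSXML2.DOMDocument
    Dim objNode As MSXML2.IXMLDOMNode
    Dim intFile As Integer

    objDoc.async = False
    If Not objDoc.Load(strSourcePath) Then
        Err.Raise vbObjectError + 1, "SaveFormattedXML", objDoc.parseError.reason
    End If

    ' FormatXML expects an IXMLDOMNode, so hand it the document as a node.
    Set objNode = objDoc

    intFile = FreeFile
    Open strTargetPath For Output As #intFile
    ' The trailing semicolon stops Print from appending an extra line break.
    Print #intFile, FormatXML(objNode);
    Close #intFile
End Sub
```

Called as SaveFormattedXML App.Path & "\small.xml", App.Path & "\_result.xml", this leaves the loaded DOM untouched and only the target file gets the new layout.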

Author

Commented:
chabaud (or anyone): Nothing that I do seems to correct the look of my output xml file. Even the messagebox in your code example displays the xml unchanged. (Note: I did not Dim or use "source", but rather I substituted my already loaded DOMDocument named XMLDoc and did "MsgBox XMLDoc.transformNode(stylesheet)".)

I've increased the points for this and if someone would email me at martin.liss@icn.siemens.com I will send them a zip file containing the problem xml file, the stylesheet and VB code that they can hopefully change to show me what I am doing wrong.

To restate what I am trying to do... I'm looking for a way to permanently change the xml file so that when it is looked at as a text file (in standard Notepad for example) the indenting is the same as would be seen if I opened the xml in IE.

Commented:
>  the stylesheet

You would use the stylesheet I've posted above. - I emailed you.

Commented:
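Private Sub Command2_Click()

    Dim source As New Msxml2.DOMDocument
    Dim stylesheet As New Msxml2.DOMDocument
    Dim result As Object    ' late-bound ADODB.Stream (assumes ADO is installed)
   
    ' Load data.
    source.async = False
    source.Load App.Path & "\small.xml"
   
    ' Load style sheet.
    stylesheet.async = False
    stylesheet.Load App.Path & "\output.xsl"
   
    ' Do the transform. The transform does not modify "source" itself,
    ' so the output is serialized into a stream and the stream is saved.
    Set result = CreateObject("ADODB.Stream")
    result.Type = 1         ' adTypeBinary
    result.Open
    source.transformNodeToObject stylesheet, result
   
    result.SaveToFile App.Path & "\_result.xml", 2   ' adSaveCreateOverWrite
    result.Close
   
End Sub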

------------------
works for me; it's indenting the nodes I un-indented.

But I changed standalone to Yes, removed the <?xml-stylesheet> tag and the namespaces as there was an error loading it.

Author

Commented:
robbert: Thank you for the code you sent me. It does reformat the XML file but I can't use it. The reason for that is that my actual xml file is very large (12,000+ lines) and the code to reformat it takes much too long to be acceptable to a user of my program. I'm still hoping for a stylesheet or similar approach that will work.

Commented:
OK. - For reference, let's note that the following stylesheet:

-----------------------------------------------------------
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output method="xml" indent="yes"/>
     <xsl:template match = "/">
          <xsl:copy-of select="." />
     </xsl:template>
</xsl:stylesheet>
-----------------------------------------------------------

has a bug: it doesn't rearrange multiple nodes that sit on one line.

Commented:
...with MSXML v3.0

Author

Commented:
MSXML 4 is supposed to be out any day now and maybe it will fix that bug. When I get the chance to try it, maybe I could use code similar to yours to just find and break up lines that contain more than one node, and then use the stylesheet. I'm hoping that that code along with the stylesheet will not take long to execute. If it works, I'll award the points to you.

Commented:
I thought the nodes in one line might be "divided" by vbCr's (or even vbLf's?), and MSXML splits by vbCrLf. But I don't think the probability is high. - MSXML is using vbLf as line feed within a node's text.

Author

Commented:
I'm accepting your answers to my problem as the solution because, while my problem is not completely solved, your code and comments suggested to me that I could (at least) break up the run-on lines. I've done that, and while I still haven't been able to get the indentations to change, at least now the file is more easily read in a text editor.

(I hope you don't mind the "grade".)
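For anyone reading later, breaking up the run-on lines could look roughly like the sketch below. It is not the code actually used in this thread: BreakRunOnLines is a hypothetical name, and the approach is a plain string replacement that assumes the sequence "><" only ever occurs between two tags. It would also split that sequence inside comments or CDATA sections, and it puts a line break inside empty pairs such as <a></a>.

```vb
' Hypothetical helper: start a new line wherever one tag directly
' follows another on the same line.
Public Function BreakRunOnLines(ByVal strXml As String) As String
    BreakRunOnLines = Replace(strXml, "><", ">" & vbCrLf & "<")
End Function
```

Applied to the line from the question, <prop name="new2"/><mapping-rule><simple-mapping>aaa</simple-mapping></mapping-rule>, this puts each tag that directly follows another tag on a line of its own, while aaa stays on the same line as its opening and closing tags. It does not add any indentation.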