Link to home
Start Free TrialLog in
Avatar of Watnog
WatnogFlag for Belgium

asked on

excel vbs extract data from xml files: 1 line per file

Dear Experts, related to former question and the vba solution proposed there by Bill Prew.

In fact it's pretty good, but I need output for each file (so for each "timestamp") on 1 line.

The vbs looks like this:

' Define folder to scan for files, and output extract file
sBaseDir = "B:\ee\EE29062182\xmltest"
sOutFile = "B:\ee\EE29062182\extract.xml"

' Template for output lines with place holders for extracted text
sTemplate = "<timestamp>[tim]</timestamp>~<connection>[con]</connection>~<sql>[sql]</sql>"

' Create filesystem object, and XMLDOM object for parsing XML
Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oXmlDoc = CreateObject("Microsoft.XMLDOM")
oXmlDoc.Async = "False"

' Open output file for writing
Set oOutFile = oFSO.OpenTextFile (sOutFile, 2, True)

' Process all the XML files in the folder
For Each oFile In oFSO.GetFolder(sBaseDir).Files
    If UCase(Right(oFile.Name, 4)) = ".XML" Then

        ' Load the XML file for parsing
        oXmlDoc.Load(oFile.Path)

        ' Process all historyItem nodes (typically only one)
        Set oNodes = oXmlDoc.selectNodes ("/history/historyItem")
        For Each oNode in oNodes

            ' Get the values of child notes we want
            sTim = oNode.getElementsByTagName("timestamp").Item(0).Text
            sCon = oNode.getElementsByTagName("connection").Item(0).Text
            sSql = oNode.getElementsByTagName("sql").Item(0).Text

            ' Build and write extracted values using template to output file
            sOut = Replace(Replace(Replace(sTemplate, "[tim]", sTim), "[con]", sCon), "[sql]", sSql)
            oOutFile.WriteLine sOut

        Next

    End If
Next

' Close output file
oOutFile.Close

Open in new window


It produces a file like below:

<timestamp>17/10/17 15:34</timestamp>~<connection>DB1</connection>~<sql>Delete from EPK.ZYQF3 where USERID = 'U325371';</sql>
<timestamp>17/10/17 15:26</timestamp>~<connection>DB1</connection>~<sql>update cwd32 set staexc='9900' where refopn = 'B7J13MPAR03IQGXY' and staexc='3502';
update cwd32 set staexc='9900' where refopn = 'B7J13MPAO10IQG13' and staexc='3502';
delete from nmd10 where refexn in 'B3D12CB5T00A006R';
delete from nmd10 where refexn in 'B6E10CBJJR13B3Z8';</sql>
<timestamp>17/10/17 15:24</timestamp>~<connection>DB2</connection>~<sql>delete blengadm.egi_ctarubsi where (numcta, iderub, valrub, datdeb) in (select egi_ctarubsi.numcta, egi_ctarubsi.iderub, egi_ctarubsi.valrub, egi_ctarubsi.datdeb from blengadm.egi_ctarubsi, blengadm.eg_ctasi, blengadm.eg_ctasi CPS
where egi_ctarubsi.numcta = eg_ctasi.numcta and egi_ctarubsi.iderub = 'SITX-CPS' and eg_ctasi.indclo = '0' and eg_ctasi.numcps = CPS.numctx and eg_ctasi.numcta <> CPS.numcta and
(CPS.numcta, egi_ctarubsi.valrub, egi_ctarubsi.datdeb) not in (select numcta, valrub, datdeb from blengadm.egi_ctarubsi where iderub='SITEX'));</sql>

There's 9 lines whereas ideally I have 4.

Can this be done?

Thanks for  your help.
W
Avatar of Bill Prew
Bill Prew

The multiple lines are because there are line breaks in the SQL data in the input XML file.  Do you want to remove them and force onto a single line, losing that formatting?


»bp
ASKER CERTIFIED SOLUTION
Avatar of Bill Prew
Bill Prew

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Watnog

ASKER

Yes, that's it.
Thank you Bill. Have a nice day!
Welcome.


»bp