bat or vbs to extract data from directory holding xml files

Hi Experts,

I need to parse "Sql Developer" history. That history is retained in a dir under appdata as xml files and looks like this:

<?xml version='1.0'  encoding='UTF-8' ?>
<history>
                <historyItem>
                                <sql><![CDATA[update epd01 set monres_fis_2 = '1,47' where numcpt = '103521560782' and monres_fis_2 = '2,94';
update epd01 set monres_fis_2 = '6,09' where numcpt = '103521723359' and monres_fis_2 = '12,18';
update epd01 set monres_fis_2 = '253,26' where numcpt = '103521738416' and monres_fis_2 = '506,52';
update epd01 set monres_fis_2 = '27,72' where numcpt = '103521762260' and monres_fis_2 = '55,44';
update epd01 set monres_fis_2 = '450,84' where numcpt = '103521762361' and monres_fis_2 = '901,68';]]></sql>
                                <connection><![CDATA[databasename]]></connection>
                                <timestamp><![CDATA[12/10/17 17:02]]></timestamp>
                                <type><![CDATA[Script]]></type>
                                <executed><![CDATA[1]]></executed>
                                <execTime><![CDATA[0.131]]></execTime>
                </historyItem>
</history>

Open in new window


All files in the dir need processed (in order) and certain values redirected into a file.
"timestamp" "connection" "sql" are the values needed.

Preferably we have "timestamp" "connection" "sql" piped along in the result file if that would be possible.
Can  you please check into this? I've been looking around EE but couldn't find something spot on.

Thanks.
Pascal
WatnogAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Bill PrewCommented:
What do you mean by processed "in order" ?

Please provide a sample of the output file you want.

Will all the XML files processed go into the same single output file?


»bp
0
WatnogAuthor Commented:
Good day Bill.
I attach a zip with just 4 xml files (they are small), and a file showing how the output could look like. I gave it an xml extension as that shows better in Notpad++.
With "in order" I mean that the files are to be read in the order they have in windows explorer (sorted on date/time), but i guess in Excel it could be sorted by the timestamp value, so that might not matter much.
And yes the output is collected in 1 file.

Thank you for your time.
Pascal
extract.xml
xml.zip
0
Ben Personick (Previously QCubed)Lead Network EngineerCommented:
REM Script: ConsolidateXMLs.cmd
@(
	SETLOCAL
	ECHO OFF
	SET "_SourceDir=C:\Admin\xmltest"
	SET "_FileGlob=*History.xml"
	SET "_OutputXML=C:\Admin\OuputXML.xml"
	SET "_eLvL=0"
)

CALL :Main

(
	ENDLOCAL
	EXIT /b %eLvl%
)

:Main
	FOR /R "%_SourceDir%" %%A IN ("%_FileGlob%") DO (
		ECHO. Processing "%%~fA"
		findstr /C:"<sql>" /C:"<timestamp>" /C:"<connection>" "%%~fA" >>"%_OutputXML%"
	)
	echo.
	echo. Processing Completed.
	echo. Output File Located here: "%_OutputXML%"
	echo.
	echo. To see output typed to screen press any key
	pause
	type "%_OutputXML%"
GOTO :EOF

Open in new window

0
Cloud Class® Course: CompTIA Healthcare IT Tech

This course will help prep you to earn the CompTIA Healthcare IT Technician certification showing that you have the knowledge and skills needed to succeed in installing, managing, and troubleshooting IT systems in medical and clinical settings.

Ben Personick (Previously QCubed)Lead Network EngineerCommented:
I realised that the above was not quite what you wanted.

So here you are, a working version which keeps the output as single lines and in the order you wanted:

REM Script: ConsolidateXMLs.cmd
@(
	SETLOCAL ENABLEDELAYEDEXPANSION
	ECHO OFF
	SET "_SourceDir=C:\Admin\xmltest"
	SET "_FileGlob=*History.xml"
	SET "_OutputXML=C:\Admin\extract.xml"
	SET "_eLvL=0"
)

CALL :Main

(
	ENDLOCAL
	EXIT /b %eLvl%
)

:Main
	ECHO.>"%_OutputXML%"
	FOR /R "%_SourceDir%" %%A IN ("%_FileGlob%") DO (
		ECHO. Processing "%%~fA"
		REM Trim the output
		FOR /F "Tokens=*" %%B IN ('findstr /C:"<timestamp>" /C:"<connection>" "%%~fA"') DO (
			REM Collect the Timestamp and Connection:
			FOR /F "Tokens=1 Delims=<> " %%a IN ("%%~B") DO (
				CALL SET "_Var%%~a=%%~B"
			) 
		)
		REM Collect the SQL Query:
		FOR /F "Tokens=*" %%B IN ('findstr /C:"<sql>" "%%~fA"') DO (
			FOR /F "Tokens=*" %%b IN ('ECHO. "!_Vartimestamp!~!_Varconnection!~%%~B"') DO (
				ECHO.%%~b
			)
		) >>"%_OutputXML%"
	)
	echo.
	echo. Processing Completed.
	echo. Output File Located here: "%_OutputXML%"
	echo.
	echo. To see output typed to screen press any key
	pause
	type "%_OutputXML%"
GOTO :EOF

:OrderThem
	ECHO.
	FOR /F "Tokens=* Delims=<> " %%A IN (%*) DO (
		ECHO. CALL SET _Tmp%%~A="<%%~A><%%~B"
		CALL SET _Tmp%%~A="<%%~A><%%~B"
	)
	IF /I "%_Tmptimestamp%" NEQ "" (
		FOR /F "Tokens=*"
		ECHO."%_Tmptimestamp%~%_Tmpconection%~%_Tmpsql%">>"%_OutputXML%"
		FOR %%A IN (sql connection timestamp) DO (
			ECHO.CALL SET "_Tmp%%~A="
			CALL SET "_Tmp%%~A="
		)
	)
GOTO :EOF

Open in new window

0
WatnogAuthor Commented:
That looks really good except that the timestamp "value" appears to be missing...
The output looks like this:

<timestamp><08]]></timestamp>~<connection><[CDATA[BLBESE]]></connection>~<sql><[CDATA[update eg_ctasi set indclo = '1' where numcta = 'LSKEK114QAG9V28LLXJW';

I attach what's produced.
Again, thanks.
W.
extract.xml
1
Bill PrewCommented:
Here is a basic VBS approach, give it a try an see what you think.  Save as a VBS file, and run as below after adjusting the folder and output paths near the top.

cscript EE29062182.vbs

' Define folder to scan for files, and output extract file
sBaseDir = "B:\ee\EE29062182\xmltest"
sOutFile = "B:\ee\EE29062182\extract.xml"

' Template for output lines with place holders for extracted text
sTemplate = "<timestamp>[tim]</timestamp>~<connection>[con]</connection>~<sql>[sql]</sql>"

' Create filesystem object, and XMLDOM object for parsing XML
Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oXmlDoc = CreateObject("Microsoft.XMLDOM")
oXmlDoc.Async = "False"

' Open output file for writing
Set oOutFile = oFSO.OpenTextFile (sOutFile, 2, True)

' Process all the XML files in the folder
For Each oFile In oFSO.GetFolder(sBaseDir).Files
    If UCase(Right(oFile.Name, 4)) = ".XML" Then

        ' Load the XML file for parsing
        oXmlDoc.Load(oFile.Path)

        ' Process all historyItem nodes (typically only one)
        Set oNodes = oXmlDoc.selectNodes ("/history/historyItem")
        For Each oNode in oNodes

            ' Get the values of child notes we want
            sTim = oNode.getElementsByTagName("timestamp").Item(0).Text
            sCon = oNode.getElementsByTagName("connection").Item(0).Text
            sSql = oNode.getElementsByTagName("sql").Item(0).Text

            ' Build and write extracted values using template to output file
            sOut = Replace(Replace(Replace(sTemplate, "[tim]", sTim), "[con]", sCon), "[sql]", sSql)
            oOutFile.WriteLine sOut

        Next

    End If
Next

' Close output file
oOutFile.Close

Open in new window


»bp
0
Ben Personick (Previously QCubed)Lead Network EngineerCommented:
Hey there Watnog,

It's because at the last tick I tightened up the script into using Delayed expansion instead of using the "CALL" because I thought it looked nicer.

Here it is working with the delayed expansion removed again. :)

REM Script: ConsolidateXMLs.cmd
@(
	SETLOCAL
	ECHO OFF
	SET "_SourceDir=C:\Admin\xmltest"
	SET "_FileGlob=*History.xml"
	SET "_OutputXML=C:\Admin\extract.xml"
	SET "_eLvL=0"
)

CALL :Main

(
	ENDLOCAL
	EXIT /b %eLvl%
)

:Main
	DEL /F /Q "%_OutputXML%"
	FOR /R "%_SourceDir%" %%A IN ("%_FileGlob%") DO (
		ECHO. Processing "%%~fA"
		REM Trim the output
		FOR /F "Tokens=*" %%B IN ('findstr /C:"<timestamp>" "%%~fA"') DO (
			 SET "_Vartimestamp=%%~B"
		)
		FOR /F "Tokens=*" %%B IN ('findstr /C:"<connection>" "%%~fA"') DO (
			SET "_Varconnection=%%~B"
		)
		CALL :OrderThem "%%~fA"
	)
	echo.
	echo. Processing Completed.
	echo. Output File Located here: "%_OutputXML%"
	echo.
	echo. To see output typed to screen press any key
	pause
	type "%_OutputXML%"
GOTO :EOF

:setCMD
	ECHO.SET %*
	SET %*
GOTO :EOF

:OrderThem
	REM Collect the SQL Query:
	FOR /F "Tokens=*" %%B IN ('findstr /C:"<sql>" "%~1"') DO (
		FOR /F "Tokens=*" %%b IN ('ECHO. "%_Vartimestamp%~%_Varconnection%~%%~B"') DO (
			ECHO.%%~b
		)
	)>>"%_OutputXML%"
GOTO :EOF

Open in new window



Here is the resulting XML I generated from your source files:

<timestamp><![CDATA[12/10/17 17:01]]></timestamp>~<connection><![CDATA[CRELANE_ThalerProd_epk]]></connection>~<sql><![CDATA[select 'update epd01 set monres_fis_2 = '''||((select case when sum(monopn) is not null then sum(monopn) else 0 end from epd10 where typopn='17' and staopn between '4000' and '9000' and numcpt=epd01.numcpt)+(select case when sum(montot_ant_tax) is not null then sum(montot_ant_tax) else 0 end from epd10 where typopn in ('04','05') and staopn <='9000' and numcpt=epd01.numcpt)-(select case when sum(montot_ant_tax) is not null then sum(montot_ant_tax) else 0 end from epd10 where typopn='02' and staopn between '4000' and '9000' and numcpt=epd01.numcpt))||''' where numcpt = '''||numcpt||''' and monres_fis_2 = '''||monres_fis_2||''';'
<timestamp><![CDATA[12/10/17 16:01]]></timestamp>~<connection><![CDATA[CRELANE_ThalerProd_epk]]></connection>~<sql><![CDATA[update cbd31 set valatt_car='0' where numcar_int in ('B7F29ZOGMJ1S8DQU','B7J11ZOFA138QBSN') and codzon='00Z';]]></sql>
<timestamp><![CDATA[12/10/17 17:02]]></timestamp>~<connection><![CDATA[CRELANE_ThalerProd_epk]]></connection>~<sql><![CDATA[update epd01 set monres_fis_2 = '1,47' where numcpt = '103521560782' and monres_fis_2 = '2,94';
<timestamp><![CDATA[12/10/17 15:08]]></timestamp>~<connection><![CDATA[BLBESE]]></connection>~<sql><![CDATA[update eg_ctasi set indclo = '1' where numcta = 'LSKEK114QAG9V28LLXJW';

Open in new window

0
Ben Personick (Previously QCubed)Lead Network EngineerCommented:
@Watnot

  Note that the above script goes by general sort (Name) which is the default windows sort.
 
  If you want the XML sorted by modified date then a slight variation would be necessary which is in the script here:

REM Script: ConsolidateXMLs.cmd
@(
	SETLOCAL
	ECHO OFF
	SET "_SourceDir=C:\Admin\xmltest"
	SET "_FileGlob=*History.xml"
	SET "_OutputXML=C:\Admin\extract.xml"
	SET "_eLvL=0"
)

CALL :Main

(
	ENDLOCAL
	EXIT /b %eLvl%
)

:Main
	DEL /F /Q "%_OutputXML%"
	FOR /F "Tokens=*" %%A IN ('DIR /A:-D /B /O:D "%_SourceDir%\%_FileGlob%"') DO (
		ECHO. Processing "%_SourceDir%\%%~A"
		REM Trim the output
		FOR /F "Tokens=*" %%B IN ('findstr /C:"<timestamp>" "%_SourceDir%\%%~A"') DO (
			 SET "_Vartimestamp=%%~B"
		)
		FOR /F "Tokens=*" %%B IN ('findstr /C:"<connection>" "%_SourceDir%\%%~A"') DO (
			SET "_Varconnection=%%~B"
		)
		CALL :OrderThem "%_SourceDir%\%%~A"
	)
	echo.
	echo. Processing Completed.
	echo. Output File Located here: "%_OutputXML%"
	echo.
	echo. To see output typed to screen press any key
	pause
	type "%_OutputXML%"
GOTO :EOF

:setCMD
	ECHO.SET %*
	SET %*
GOTO :EOF

:OrderThem
	REM Collect the SQL Query:
	FOR /F "Tokens=*" %%B IN ('findstr /C:"<sql>" "%~1"') DO (
		FOR /F "Tokens=*" %%b IN ('ECHO. "%_Vartimestamp%~%_Varconnection%~%%~B"') DO (
			ECHO.%%~b
		)
	)>>"%_OutputXML%"
GOTO :EOF

Open in new window


Example Outpur XML:

<timestamp><![CDATA[12/10/17 15:08]]></timestamp>~<connection><![CDATA[BLBESE]]></connection>~<sql><![CDATA[update eg_ctasi set indclo = '1' where numcta = 'LSKEK114QAG9V28LLXJW';
<timestamp><![CDATA[12/10/17 16:01]]></timestamp>~<connection><![CDATA[CRELANE_ThalerProd_epk]]></connection>~<sql><![CDATA[update cbd31 set valatt_car='0' where numcar_int in ('B7F29ZOGMJ1S8DQU','B7J11ZOFA138QBSN') and codzon='00Z';]]></sql>
<timestamp><![CDATA[12/10/17 17:01]]></timestamp>~<connection><![CDATA[CRELANE_ThalerProd_epk]]></connection>~<sql><![CDATA[select 'update epd01 set monres_fis_2 = '''||((select case when sum(monopn) is not null then sum(monopn) else 0 end from epd10 where typopn='17' and staopn between '4000' and '9000' and numcpt=epd01.numcpt)+(select case when sum(montot_ant_tax) is not null then sum(montot_ant_tax) else 0 end from epd10 where typopn in ('04','05') and staopn <='9000' and numcpt=epd01.numcpt)-(select case when sum(montot_ant_tax) is not null then sum(montot_ant_tax) else 0 end from epd10 where typopn='02' and staopn between '4000' and '9000' and numcpt=epd01.numcpt))||''' where numcpt = '''||numcpt||''' and monres_fis_2 = '''||monres_fis_2||''';'
<timestamp><![CDATA[12/10/17 17:02]]></timestamp>~<connection><![CDATA[CRELANE_ThalerProd_epk]]></connection>~<sql><![CDATA[update epd01 set monres_fis_2 = '1,47' where numcpt = '103521560782' and monres_fis_2 = '2,94';

Open in new window



If you want to reverse the sort orrder than you can change the DIR with "/O:D" to read "/O:-D"

IE, THIS:
	FOR /F "Tokens=*" %%A IN ('DIR /A:-D /B /O:D "%_SourceDir%\%_FileGlob%"') DO (

Open in new window



Becomes THIS:
	FOR /F "Tokens=*" %%A IN ('DIR /A:-D /B /O:-D "%_SourceDir%\%_FileGlob%"') DO (

Open in new window


and the resulting XML looks like this:

<timestamp><![CDATA[12/10/17 17:02]]></timestamp>~<connection><![CDATA[CRELANE_ThalerProd_epk]]></connection>~<sql><![CDATA[update epd01 set monres_fis_2 = '1,47' where numcpt = '103521560782' and monres_fis_2 = '2,94';
<timestamp><![CDATA[12/10/17 17:01]]></timestamp>~<connection><![CDATA[CRELANE_ThalerProd_epk]]></connection>~<sql><![CDATA[select 'update epd01 set monres_fis_2 = '''||((select case when sum(monopn) is not null then sum(monopn) else 0 end from epd10 where typopn='17' and staopn between '4000' and '9000' and numcpt=epd01.numcpt)+(select case when sum(montot_ant_tax) is not null then sum(montot_ant_tax) else 0 end from epd10 where typopn in ('04','05') and staopn <='9000' and numcpt=epd01.numcpt)-(select case when sum(montot_ant_tax) is not null then sum(montot_ant_tax) else 0 end from epd10 where typopn='02' and staopn between '4000' and '9000' and numcpt=epd01.numcpt))||''' where numcpt = '''||numcpt||''' and monres_fis_2 = '''||monres_fis_2||''';'
<timestamp><![CDATA[12/10/17 16:01]]></timestamp>~<connection><![CDATA[CRELANE_ThalerProd_epk]]></connection>~<sql><![CDATA[update cbd31 set valatt_car='0' where numcar_int in ('B7F29ZOGMJ1S8DQU','B7J11ZOFA138QBSN') and codzon='00Z';]]></sql>
<timestamp><![CDATA[12/10/17 15:08]]></timestamp>~<connection><![CDATA[BLBESE]]></connection>~<sql><![CDATA[update eg_ctasi set indclo = '1' where numcta = 'LSKEK114QAG9V28LLXJW';

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
WatnogAuthor Commented:
I stick to Ben's solution as it allows me to sort and because I have 1 row per file processed: since I need vlookup potential that's crucial. I'm  very grateful.
Thank you both.
W.
0
Ben Personick (Previously QCubed)Lead Network EngineerCommented:
Hey Watnog, glad to help :)
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Windows Batch

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.