[Last Call] Learn how to a build a cloud-first strategyRegister Now

x
?
Solved

bat or vbs to extract data from directory holding xml files

Posted on 2017-10-12
10
Medium Priority
?
72 Views
Last Modified: 2017-10-16
Hi Experts,

I need to parse "Sql Developer" history. That history is retained in a dir under appdata as xml files and looks like this:

<?xml version='1.0'  encoding='UTF-8' ?>
<history>
                <historyItem>
                                <sql><![CDATA[update epd01 set monres_fis_2 = '1,47' where numcpt = '103521560782' and monres_fis_2 = '2,94';
update epd01 set monres_fis_2 = '6,09' where numcpt = '103521723359' and monres_fis_2 = '12,18';
update epd01 set monres_fis_2 = '253,26' where numcpt = '103521738416' and monres_fis_2 = '506,52';
update epd01 set monres_fis_2 = '27,72' where numcpt = '103521762260' and monres_fis_2 = '55,44';
update epd01 set monres_fis_2 = '450,84' where numcpt = '103521762361' and monres_fis_2 = '901,68';]]></sql>
                                <connection><![CDATA[databasename]]></connection>
                                <timestamp><![CDATA[12/10/17 17:02]]></timestamp>
                                <type><![CDATA[Script]]></type>
                                <executed><![CDATA[1]]></executed>
                                <execTime><![CDATA[0.131]]></execTime>
                </historyItem>
</history>

Open in new window


All files in the dir need processed (in order) and certain values redirected into a file.
"timestamp" "connection" "sql" are the values needed.

Preferably we have "timestamp" "connection" "sql" piped along in the result file if that would be possible.
Can  you please check into this? I've been looking around EE but couldn't find something spot on.

Thanks.
Pascal
0
Comment
Question by:Watnog
  • 5
  • 3
  • 2
10 Comments
 
LVL 59

Expert Comment

by:Bill Prew
ID: 42328789
What do you mean by processed "in order" ?

Please provide a sample of the output file you want.

Will all the XML files processed go into the same single output file?


»bp
0
 

Author Comment

by:Watnog
ID: 42328966
Good day Bill.
I attach a zip with just 4 xml files (they are small), and a file showing how the output could look like. I gave it an xml extension as that shows better in Notpad++.
With "in order" I mean that the files are to be read in the order they have in windows explorer (sorted on date/time), but i guess in Excel it could be sorted by the timestamp value, so that might not matter much.
And yes the output is collected in 1 file.

Thank you for your time.
Pascal
extract.xml
xml.zip
0
 
LVL 15
ID: 42329343
REM Script: ConsolidateXMLs.cmd
@(
	SETLOCAL
	ECHO OFF
	SET "_SourceDir=C:\Admin\xmltest"
	SET "_FileGlob=*History.xml"
	SET "_OutputXML=C:\Admin\OuputXML.xml"
	SET "_eLvL=0"
)

CALL :Main

(
	ENDLOCAL
	EXIT /b %eLvl%
)

:Main
	FOR /R "%_SourceDir%" %%A IN ("%_FileGlob%") DO (
		ECHO. Processing "%%~fA"
		findstr /C:"<sql>" /C:"<timestamp>" /C:"<connection>" "%%~fA" >>"%_OutputXML%"
	)
	echo.
	echo. Processing Completed.
	echo. Output File Located here: "%_OutputXML%"
	echo.
	echo. To see output typed to screen press any key
	pause
	type "%_OutputXML%"
GOTO :EOF

Open in new window

0
Free Tool: SSL Checker

Scans your site and returns information about your SSL implementation and certificate. Helpful for debugging and validating your SSL configuration.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

 
LVL 15
ID: 42329390
I realised that the above was not quite what you wanted.

So here you are, a working version which keeps the output as single lines and in the order you wanted:

REM Script: ConsolidateXMLs.cmd
@(
	SETLOCAL ENABLEDELAYEDEXPANSION
	ECHO OFF
	SET "_SourceDir=C:\Admin\xmltest"
	SET "_FileGlob=*History.xml"
	SET "_OutputXML=C:\Admin\extract.xml"
	SET "_eLvL=0"
)

CALL :Main

(
	ENDLOCAL
	EXIT /b %eLvl%
)

:Main
	ECHO.>"%_OutputXML%"
	FOR /R "%_SourceDir%" %%A IN ("%_FileGlob%") DO (
		ECHO. Processing "%%~fA"
		REM Trim the output
		FOR /F "Tokens=*" %%B IN ('findstr /C:"<timestamp>" /C:"<connection>" "%%~fA"') DO (
			REM Collect the Timestamp and Connection:
			FOR /F "Tokens=1 Delims=<> " %%a IN ("%%~B") DO (
				CALL SET "_Var%%~a=%%~B"
			) 
		)
		REM Collect the SQL Query:
		FOR /F "Tokens=*" %%B IN ('findstr /C:"<sql>" "%%~fA"') DO (
			FOR /F "Tokens=*" %%b IN ('ECHO. "!_Vartimestamp!~!_Varconnection!~%%~B"') DO (
				ECHO.%%~b
			)
		) >>"%_OutputXML%"
	)
	echo.
	echo. Processing Completed.
	echo. Output File Located here: "%_OutputXML%"
	echo.
	echo. To see output typed to screen press any key
	pause
	type "%_OutputXML%"
GOTO :EOF

:OrderThem
	ECHO.
	FOR /F "Tokens=* Delims=<> " %%A IN (%*) DO (
		ECHO. CALL SET _Tmp%%~A="<%%~A><%%~B"
		CALL SET _Tmp%%~A="<%%~A><%%~B"
	)
	IF /I "%_Tmptimestamp%" NEQ "" (
		FOR /F "Tokens=*"
		ECHO."%_Tmptimestamp%~%_Tmpconection%~%_Tmpsql%">>"%_OutputXML%"
		FOR %%A IN (sql connection timestamp) DO (
			ECHO.CALL SET "_Tmp%%~A="
			CALL SET "_Tmp%%~A="
		)
	)
GOTO :EOF

Open in new window

0
 

Author Comment

by:Watnog
ID: 42329734
That looks really good except that the timestamp "value" appears to be missing...
The output looks like this:

<timestamp><08]]></timestamp>~<connection><[CDATA[BLBESE]]></connection>~<sql><[CDATA[update eg_ctasi set indclo = '1' where numcta = 'LSKEK114QAG9V28LLXJW';

I attach what's produced.
Again, thanks.
W.
extract.xml
1
 
LVL 59

Expert Comment

by:Bill Prew
ID: 42330665
Here is a basic VBS approach, give it a try an see what you think.  Save as a VBS file, and run as below after adjusting the folder and output paths near the top.

cscript EE29062182.vbs

' Define folder to scan for files, and output extract file
sBaseDir = "B:\ee\EE29062182\xmltest"
sOutFile = "B:\ee\EE29062182\extract.xml"

' Template for output lines with place holders for extracted text
sTemplate = "<timestamp>[tim]</timestamp>~<connection>[con]</connection>~<sql>[sql]</sql>"

' Create filesystem object, and XMLDOM object for parsing XML
Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oXmlDoc = CreateObject("Microsoft.XMLDOM")
oXmlDoc.Async = "False"

' Open output file for writing
Set oOutFile = oFSO.OpenTextFile (sOutFile, 2, True)

' Process all the XML files in the folder
For Each oFile In oFSO.GetFolder(sBaseDir).Files
    If UCase(Right(oFile.Name, 4)) = ".XML" Then

        ' Load the XML file for parsing
        oXmlDoc.Load(oFile.Path)

        ' Process all historyItem nodes (typically only one)
        Set oNodes = oXmlDoc.selectNodes ("/history/historyItem")
        For Each oNode in oNodes

            ' Get the values of child notes we want
            sTim = oNode.getElementsByTagName("timestamp").Item(0).Text
            sCon = oNode.getElementsByTagName("connection").Item(0).Text
            sSql = oNode.getElementsByTagName("sql").Item(0).Text

            ' Build and write extracted values using template to output file
            sOut = Replace(Replace(Replace(sTemplate, "[tim]", sTim), "[con]", sCon), "[sql]", sSql)
            oOutFile.WriteLine sOut

        Next

    End If
Next

' Close output file
oOutFile.Close

Open in new window


»bp
0
 
LVL 15
ID: 42330701
Hey there Watnog,

It's because at the last tick I tightened up the script into using Delayed expansion instead of using the "CALL" because I thought it looked nicer.

Here it is working with the delayed expansion removed again. :)

REM Script: ConsolidateXMLs.cmd
@(
	SETLOCAL
	ECHO OFF
	SET "_SourceDir=C:\Admin\xmltest"
	SET "_FileGlob=*History.xml"
	SET "_OutputXML=C:\Admin\extract.xml"
	SET "_eLvL=0"
)

CALL :Main

(
	ENDLOCAL
	EXIT /b %eLvl%
)

:Main
	DEL /F /Q "%_OutputXML%"
	FOR /R "%_SourceDir%" %%A IN ("%_FileGlob%") DO (
		ECHO. Processing "%%~fA"
		REM Trim the output
		FOR /F "Tokens=*" %%B IN ('findstr /C:"<timestamp>" "%%~fA"') DO (
			 SET "_Vartimestamp=%%~B"
		)
		FOR /F "Tokens=*" %%B IN ('findstr /C:"<connection>" "%%~fA"') DO (
			SET "_Varconnection=%%~B"
		)
		CALL :OrderThem "%%~fA"
	)
	echo.
	echo. Processing Completed.
	echo. Output File Located here: "%_OutputXML%"
	echo.
	echo. To see output typed to screen press any key
	pause
	type "%_OutputXML%"
GOTO :EOF

:setCMD
	ECHO.SET %*
	SET %*
GOTO :EOF

:OrderThem
	REM Collect the SQL Query:
	FOR /F "Tokens=*" %%B IN ('findstr /C:"<sql>" "%~1"') DO (
		FOR /F "Tokens=*" %%b IN ('ECHO. "%_Vartimestamp%~%_Varconnection%~%%~B"') DO (
			ECHO.%%~b
		)
	)>>"%_OutputXML%"
GOTO :EOF

Open in new window



Here is the resulting XML I generated from your source files:

<timestamp><![CDATA[12/10/17 17:01]]></timestamp>~<connection><![CDATA[CRELANE_ThalerProd_epk]]></connection>~<sql><![CDATA[select 'update epd01 set monres_fis_2 = '''||((select case when sum(monopn) is not null then sum(monopn) else 0 end from epd10 where typopn='17' and staopn between '4000' and '9000' and numcpt=epd01.numcpt)+(select case when sum(montot_ant_tax) is not null then sum(montot_ant_tax) else 0 end from epd10 where typopn in ('04','05') and staopn <='9000' and numcpt=epd01.numcpt)-(select case when sum(montot_ant_tax) is not null then sum(montot_ant_tax) else 0 end from epd10 where typopn='02' and staopn between '4000' and '9000' and numcpt=epd01.numcpt))||''' where numcpt = '''||numcpt||''' and monres_fis_2 = '''||monres_fis_2||''';'
<timestamp><![CDATA[12/10/17 16:01]]></timestamp>~<connection><![CDATA[CRELANE_ThalerProd_epk]]></connection>~<sql><![CDATA[update cbd31 set valatt_car='0' where numcar_int in ('B7F29ZOGMJ1S8DQU','B7J11ZOFA138QBSN') and codzon='00Z';]]></sql>
<timestamp><![CDATA[12/10/17 17:02]]></timestamp>~<connection><![CDATA[CRELANE_ThalerProd_epk]]></connection>~<sql><![CDATA[update epd01 set monres_fis_2 = '1,47' where numcpt = '103521560782' and monres_fis_2 = '2,94';
<timestamp><![CDATA[12/10/17 15:08]]></timestamp>~<connection><![CDATA[BLBESE]]></connection>~<sql><![CDATA[update eg_ctasi set indclo = '1' where numcta = 'LSKEK114QAG9V28LLXJW';

Open in new window

0
 
LVL 15

Accepted Solution

by:
Ben Personick (Previously QCubed) earned 2000 total points
ID: 42330705
@Watnot

  Note that the above script goes by general sort (Name) which is the default windows sort.
 
  If you want the XML sorted by modified date then a slight variation would be necessary which is in the script here:

REM Script: ConsolidateXMLs.cmd
@(
	SETLOCAL
	ECHO OFF
	SET "_SourceDir=C:\Admin\xmltest"
	SET "_FileGlob=*History.xml"
	SET "_OutputXML=C:\Admin\extract.xml"
	SET "_eLvL=0"
)

CALL :Main

(
	ENDLOCAL
	EXIT /b %eLvl%
)

:Main
	DEL /F /Q "%_OutputXML%"
	FOR /F "Tokens=*" %%A IN ('DIR /A:-D /B /O:D "%_SourceDir%\%_FileGlob%"') DO (
		ECHO. Processing "%_SourceDir%\%%~A"
		REM Trim the output
		FOR /F "Tokens=*" %%B IN ('findstr /C:"<timestamp>" "%_SourceDir%\%%~A"') DO (
			 SET "_Vartimestamp=%%~B"
		)
		FOR /F "Tokens=*" %%B IN ('findstr /C:"<connection>" "%_SourceDir%\%%~A"') DO (
			SET "_Varconnection=%%~B"
		)
		CALL :OrderThem "%_SourceDir%\%%~A"
	)
	echo.
	echo. Processing Completed.
	echo. Output File Located here: "%_OutputXML%"
	echo.
	echo. To see output typed to screen press any key
	pause
	type "%_OutputXML%"
GOTO :EOF

:setCMD
	ECHO.SET %*
	SET %*
GOTO :EOF

:OrderThem
	REM Collect the SQL Query:
	FOR /F "Tokens=*" %%B IN ('findstr /C:"<sql>" "%~1"') DO (
		FOR /F "Tokens=*" %%b IN ('ECHO. "%_Vartimestamp%~%_Varconnection%~%%~B"') DO (
			ECHO.%%~b
		)
	)>>"%_OutputXML%"
GOTO :EOF

Open in new window


Example Outpur XML:

<timestamp><![CDATA[12/10/17 15:08]]></timestamp>~<connection><![CDATA[BLBESE]]></connection>~<sql><![CDATA[update eg_ctasi set indclo = '1' where numcta = 'LSKEK114QAG9V28LLXJW';
<timestamp><![CDATA[12/10/17 16:01]]></timestamp>~<connection><![CDATA[CRELANE_ThalerProd_epk]]></connection>~<sql><![CDATA[update cbd31 set valatt_car='0' where numcar_int in ('B7F29ZOGMJ1S8DQU','B7J11ZOFA138QBSN') and codzon='00Z';]]></sql>
<timestamp><![CDATA[12/10/17 17:01]]></timestamp>~<connection><![CDATA[CRELANE_ThalerProd_epk]]></connection>~<sql><![CDATA[select 'update epd01 set monres_fis_2 = '''||((select case when sum(monopn) is not null then sum(monopn) else 0 end from epd10 where typopn='17' and staopn between '4000' and '9000' and numcpt=epd01.numcpt)+(select case when sum(montot_ant_tax) is not null then sum(montot_ant_tax) else 0 end from epd10 where typopn in ('04','05') and staopn <='9000' and numcpt=epd01.numcpt)-(select case when sum(montot_ant_tax) is not null then sum(montot_ant_tax) else 0 end from epd10 where typopn='02' and staopn between '4000' and '9000' and numcpt=epd01.numcpt))||''' where numcpt = '''||numcpt||''' and monres_fis_2 = '''||monres_fis_2||''';'
<timestamp><![CDATA[12/10/17 17:02]]></timestamp>~<connection><![CDATA[CRELANE_ThalerProd_epk]]></connection>~<sql><![CDATA[update epd01 set monres_fis_2 = '1,47' where numcpt = '103521560782' and monres_fis_2 = '2,94';

Open in new window



If you want to reverse the sort orrder than you can change the DIR with "/O:D" to read "/O:-D"

IE, THIS:
	FOR /F "Tokens=*" %%A IN ('DIR /A:-D /B /O:D "%_SourceDir%\%_FileGlob%"') DO (

Open in new window



Becomes THIS:
	FOR /F "Tokens=*" %%A IN ('DIR /A:-D /B /O:-D "%_SourceDir%\%_FileGlob%"') DO (

Open in new window


and the resulting XML looks like this:

<timestamp><![CDATA[12/10/17 17:02]]></timestamp>~<connection><![CDATA[CRELANE_ThalerProd_epk]]></connection>~<sql><![CDATA[update epd01 set monres_fis_2 = '1,47' where numcpt = '103521560782' and monres_fis_2 = '2,94';
<timestamp><![CDATA[12/10/17 17:01]]></timestamp>~<connection><![CDATA[CRELANE_ThalerProd_epk]]></connection>~<sql><![CDATA[select 'update epd01 set monres_fis_2 = '''||((select case when sum(monopn) is not null then sum(monopn) else 0 end from epd10 where typopn='17' and staopn between '4000' and '9000' and numcpt=epd01.numcpt)+(select case when sum(montot_ant_tax) is not null then sum(montot_ant_tax) else 0 end from epd10 where typopn in ('04','05') and staopn <='9000' and numcpt=epd01.numcpt)-(select case when sum(montot_ant_tax) is not null then sum(montot_ant_tax) else 0 end from epd10 where typopn='02' and staopn between '4000' and '9000' and numcpt=epd01.numcpt))||''' where numcpt = '''||numcpt||''' and monres_fis_2 = '''||monres_fis_2||''';'
<timestamp><![CDATA[12/10/17 16:01]]></timestamp>~<connection><![CDATA[CRELANE_ThalerProd_epk]]></connection>~<sql><![CDATA[update cbd31 set valatt_car='0' where numcar_int in ('B7F29ZOGMJ1S8DQU','B7J11ZOFA138QBSN') and codzon='00Z';]]></sql>
<timestamp><![CDATA[12/10/17 15:08]]></timestamp>~<connection><![CDATA[BLBESE]]></connection>~<sql><![CDATA[update eg_ctasi set indclo = '1' where numcta = 'LSKEK114QAG9V28LLXJW';

Open in new window

0
 

Author Closing Comment

by:Watnog
ID: 42332579
I stick to Ben's solution as it allows me to sort and because I have 1 row per file processed: since I need vlookup potential that's crucial. I'm  very grateful.
Thank you both.
W.
0
 
LVL 15
ID: 42332745
Hey Watnog, glad to help :)
0

Featured Post

How to Use the Help Bell

Need to boost the visibility of your question for solutions? Use the Experts Exchange Help Bell to confirm priority levels and contact subject-matter experts for question attention.  Check out this how-to article for more information.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

How to remove superseded packages in windows w60 or w61 installation media (.wim) or online system to prevent unnecessary space. w60 means Windows Vista or Windows Server 2008. w61 means Windows 7 or Windows Server 2008 R2. There are various …
Introduction: Recently, I got a requirement to zip all files individually with batch file script in Windows OS. I don't know much about scripting, but I searched Google and found a lot of examples and websites to complete my task. Finally, I was ab…
This Micro Tutorial will teach you how to add a cinematic look to any film or video out there. There are very few simple steps that you will follow to do so. This will be demonstrated using Adobe Premiere Pro CS6.
In a question here at Experts Exchange (https://www.experts-exchange.com/questions/29062564/Adobe-acrobat-reader-DC.html), a member asked how to create a signature in Adobe Acrobat Reader DC (the free Reader product, not the paid, full Acrobat produ…
Suggested Courses

831 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question