[Last Call] Learn about multicloud storage options and how to improve your company's cloud strategy. Register Now

x
?
Solved

bat or vbs to extract data from directory holding xml files

Posted on 2017-10-12
10
Medium Priority
?
53 Views
Last Modified: 2017-10-16
Hi Experts,

I need to parse "Sql Developer" history. That history is retained in a dir under appdata as xml files and looks like this:

<?xml version='1.0'  encoding='UTF-8' ?>
<history>
                <historyItem>
                                <sql><![CDATA[update epd01 set monres_fis_2 = '1,47' where numcpt = '103521560782' and monres_fis_2 = '2,94';
update epd01 set monres_fis_2 = '6,09' where numcpt = '103521723359' and monres_fis_2 = '12,18';
update epd01 set monres_fis_2 = '253,26' where numcpt = '103521738416' and monres_fis_2 = '506,52';
update epd01 set monres_fis_2 = '27,72' where numcpt = '103521762260' and monres_fis_2 = '55,44';
update epd01 set monres_fis_2 = '450,84' where numcpt = '103521762361' and monres_fis_2 = '901,68';]]></sql>
                                <connection><![CDATA[databasename]]></connection>
                                <timestamp><![CDATA[12/10/17 17:02]]></timestamp>
                                <type><![CDATA[Script]]></type>
                                <executed><![CDATA[1]]></executed>
                                <execTime><![CDATA[0.131]]></execTime>
                </historyItem>
</history>

Open in new window


All files in the dir need processed (in order) and certain values redirected into a file.
"timestamp" "connection" "sql" are the values needed.

Preferably we have "timestamp" "connection" "sql" piped along in the result file if that would be possible.
Can  you please check into this? I've been looking around EE but couldn't find something spot on.

Thanks.
Pascal
0
Comment
Question by:Watnog
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 5
  • 3
  • 2
10 Comments
 
LVL 58

Expert Comment

by:Bill Prew
What do you mean by processed "in order" ?

Please provide a sample of the output file you want.

Will all the XML files processed go into the same single output file?


»bp
0
 

Author Comment

by:Watnog
Good day Bill.
I attach a zip with just 4 xml files (they are small), and a file showing how the output could look like. I gave it an xml extension as that shows better in Notpad++.
With "in order" I mean that the files are to be read in the order they have in windows explorer (sorted on date/time), but i guess in Excel it could be sorted by the timestamp value, so that might not matter much.
And yes the output is collected in 1 file.

Thank you for your time.
Pascal
extract.xml
xml.zip
0
 
LVL 14
REM Script: ConsolidateXMLs.cmd
@(
	SETLOCAL
	ECHO OFF
	SET "_SourceDir=C:\Admin\xmltest"
	SET "_FileGlob=*History.xml"
	SET "_OutputXML=C:\Admin\OuputXML.xml"
	SET "_eLvL=0"
)

CALL :Main

(
	ENDLOCAL
	EXIT /b %eLvl%
)

:Main
	FOR /R "%_SourceDir%" %%A IN ("%_FileGlob%") DO (
		ECHO. Processing "%%~fA"
		findstr /C:"<sql>" /C:"<timestamp>" /C:"<connection>" "%%~fA" >>"%_OutputXML%"
	)
	echo.
	echo. Processing Completed.
	echo. Output File Located here: "%_OutputXML%"
	echo.
	echo. To see output typed to screen press any key
	pause
	type "%_OutputXML%"
GOTO :EOF

Open in new window

0
Hire Technology Freelancers with Gigs

Work with freelancers specializing in everything from database administration to programming, who have proven themselves as experts in their field. Hire the best, collaborate easily, pay securely, and get projects done right.

 
LVL 14
I realised that the above was not quite what you wanted.

So here you are, a working version which keeps the output as single lines and in the order you wanted:

REM Script: ConsolidateXMLs.cmd
@(
	SETLOCAL ENABLEDELAYEDEXPANSION
	ECHO OFF
	SET "_SourceDir=C:\Admin\xmltest"
	SET "_FileGlob=*History.xml"
	SET "_OutputXML=C:\Admin\extract.xml"
	SET "_eLvL=0"
)

CALL :Main

(
	ENDLOCAL
	EXIT /b %eLvl%
)

:Main
	ECHO.>"%_OutputXML%"
	FOR /R "%_SourceDir%" %%A IN ("%_FileGlob%") DO (
		ECHO. Processing "%%~fA"
		REM Trim the output
		FOR /F "Tokens=*" %%B IN ('findstr /C:"<timestamp>" /C:"<connection>" "%%~fA"') DO (
			REM Collect the Timestamp and Connection:
			FOR /F "Tokens=1 Delims=<> " %%a IN ("%%~B") DO (
				CALL SET "_Var%%~a=%%~B"
			) 
		)
		REM Collect the SQL Query:
		FOR /F "Tokens=*" %%B IN ('findstr /C:"<sql>" "%%~fA"') DO (
			FOR /F "Tokens=*" %%b IN ('ECHO. "!_Vartimestamp!~!_Varconnection!~%%~B"') DO (
				ECHO.%%~b
			)
		) >>"%_OutputXML%"
	)
	echo.
	echo. Processing Completed.
	echo. Output File Located here: "%_OutputXML%"
	echo.
	echo. To see output typed to screen press any key
	pause
	type "%_OutputXML%"
GOTO :EOF

:OrderThem
	ECHO.
	FOR /F "Tokens=* Delims=<> " %%A IN (%*) DO (
		ECHO. CALL SET _Tmp%%~A="<%%~A><%%~B"
		CALL SET _Tmp%%~A="<%%~A><%%~B"
	)
	IF /I "%_Tmptimestamp%" NEQ "" (
		FOR /F "Tokens=*"
		ECHO."%_Tmptimestamp%~%_Tmpconection%~%_Tmpsql%">>"%_OutputXML%"
		FOR %%A IN (sql connection timestamp) DO (
			ECHO.CALL SET "_Tmp%%~A="
			CALL SET "_Tmp%%~A="
		)
	)
GOTO :EOF

Open in new window

0
 

Author Comment

by:Watnog
That looks really good except that the timestamp "value" appears to be missing...
The output looks like this:

<timestamp><08]]></timestamp>~<connection><[CDATA[BLBESE]]></connection>~<sql><[CDATA[update eg_ctasi set indclo = '1' where numcta = 'LSKEK114QAG9V28LLXJW';

I attach what's produced.
Again, thanks.
W.
extract.xml
1
 
LVL 58

Expert Comment

by:Bill Prew
Here is a basic VBS approach, give it a try an see what you think.  Save as a VBS file, and run as below after adjusting the folder and output paths near the top.

cscript EE29062182.vbs

' Define folder to scan for files, and output extract file
sBaseDir = "B:\ee\EE29062182\xmltest"
sOutFile = "B:\ee\EE29062182\extract.xml"

' Template for output lines with place holders for extracted text
sTemplate = "<timestamp>[tim]</timestamp>~<connection>[con]</connection>~<sql>[sql]</sql>"

' Create filesystem object, and XMLDOM object for parsing XML
Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oXmlDoc = CreateObject("Microsoft.XMLDOM")
oXmlDoc.Async = "False"

' Open output file for writing
Set oOutFile = oFSO.OpenTextFile (sOutFile, 2, True)

' Process all the XML files in the folder
For Each oFile In oFSO.GetFolder(sBaseDir).Files
    If UCase(Right(oFile.Name, 4)) = ".XML" Then

        ' Load the XML file for parsing
        oXmlDoc.Load(oFile.Path)

        ' Process all historyItem nodes (typically only one)
        Set oNodes = oXmlDoc.selectNodes ("/history/historyItem")
        For Each oNode in oNodes

            ' Get the values of child notes we want
            sTim = oNode.getElementsByTagName("timestamp").Item(0).Text
            sCon = oNode.getElementsByTagName("connection").Item(0).Text
            sSql = oNode.getElementsByTagName("sql").Item(0).Text

            ' Build and write extracted values using template to output file
            sOut = Replace(Replace(Replace(sTemplate, "[tim]", sTim), "[con]", sCon), "[sql]", sSql)
            oOutFile.WriteLine sOut

        Next

    End If
Next

' Close output file
oOutFile.Close

Open in new window


»bp
0
 
LVL 14
Hey there Watnog,

It's because at the last tick I tightened up the script into using Delayed expansion instead of using the "CALL" because I thought it looked nicer.

Here it is working with the delayed expansion removed again. :)

REM Script: ConsolidateXMLs.cmd
@(
	SETLOCAL
	ECHO OFF
	SET "_SourceDir=C:\Admin\xmltest"
	SET "_FileGlob=*History.xml"
	SET "_OutputXML=C:\Admin\extract.xml"
	SET "_eLvL=0"
)

CALL :Main

(
	ENDLOCAL
	EXIT /b %eLvl%
)

:Main
	DEL /F /Q "%_OutputXML%"
	FOR /R "%_SourceDir%" %%A IN ("%_FileGlob%") DO (
		ECHO. Processing "%%~fA"
		REM Trim the output
		FOR /F "Tokens=*" %%B IN ('findstr /C:"<timestamp>" "%%~fA"') DO (
			 SET "_Vartimestamp=%%~B"
		)
		FOR /F "Tokens=*" %%B IN ('findstr /C:"<connection>" "%%~fA"') DO (
			SET "_Varconnection=%%~B"
		)
		CALL :OrderThem "%%~fA"
	)
	echo.
	echo. Processing Completed.
	echo. Output File Located here: "%_OutputXML%"
	echo.
	echo. To see output typed to screen press any key
	pause
	type "%_OutputXML%"
GOTO :EOF

:setCMD
	ECHO.SET %*
	SET %*
GOTO :EOF

:OrderThem
	REM Collect the SQL Query:
	FOR /F "Tokens=*" %%B IN ('findstr /C:"<sql>" "%~1"') DO (
		FOR /F "Tokens=*" %%b IN ('ECHO. "%_Vartimestamp%~%_Varconnection%~%%~B"') DO (
			ECHO.%%~b
		)
	)>>"%_OutputXML%"
GOTO :EOF

Open in new window



Here is the resulting XML I generated from your source files:

<timestamp><![CDATA[12/10/17 17:01]]></timestamp>~<connection><![CDATA[CRELANE_ThalerProd_epk]]></connection>~<sql><![CDATA[select 'update epd01 set monres_fis_2 = '''||((select case when sum(monopn) is not null then sum(monopn) else 0 end from epd10 where typopn='17' and staopn between '4000' and '9000' and numcpt=epd01.numcpt)+(select case when sum(montot_ant_tax) is not null then sum(montot_ant_tax) else 0 end from epd10 where typopn in ('04','05') and staopn <='9000' and numcpt=epd01.numcpt)-(select case when sum(montot_ant_tax) is not null then sum(montot_ant_tax) else 0 end from epd10 where typopn='02' and staopn between '4000' and '9000' and numcpt=epd01.numcpt))||''' where numcpt = '''||numcpt||''' and monres_fis_2 = '''||monres_fis_2||''';'
<timestamp><![CDATA[12/10/17 16:01]]></timestamp>~<connection><![CDATA[CRELANE_ThalerProd_epk]]></connection>~<sql><![CDATA[update cbd31 set valatt_car='0' where numcar_int in ('B7F29ZOGMJ1S8DQU','B7J11ZOFA138QBSN') and codzon='00Z';]]></sql>
<timestamp><![CDATA[12/10/17 17:02]]></timestamp>~<connection><![CDATA[CRELANE_ThalerProd_epk]]></connection>~<sql><![CDATA[update epd01 set monres_fis_2 = '1,47' where numcpt = '103521560782' and monres_fis_2 = '2,94';
<timestamp><![CDATA[12/10/17 15:08]]></timestamp>~<connection><![CDATA[BLBESE]]></connection>~<sql><![CDATA[update eg_ctasi set indclo = '1' where numcta = 'LSKEK114QAG9V28LLXJW';

Open in new window

0
 
LVL 14

Accepted Solution

by:
Ben Personick (Previously QCubed) earned 2000 total points
@Watnot

  Note that the above script goes by general sort (Name) which is the default windows sort.
 
  If you want the XML sorted by modified date then a slight variation would be necessary which is in the script here:

REM Script: ConsolidateXMLs.cmd
@(
	SETLOCAL
	ECHO OFF
	SET "_SourceDir=C:\Admin\xmltest"
	SET "_FileGlob=*History.xml"
	SET "_OutputXML=C:\Admin\extract.xml"
	SET "_eLvL=0"
)

CALL :Main

(
	ENDLOCAL
	EXIT /b %eLvl%
)

:Main
	DEL /F /Q "%_OutputXML%"
	FOR /F "Tokens=*" %%A IN ('DIR /A:-D /B /O:D "%_SourceDir%\%_FileGlob%"') DO (
		ECHO. Processing "%_SourceDir%\%%~A"
		REM Trim the output
		FOR /F "Tokens=*" %%B IN ('findstr /C:"<timestamp>" "%_SourceDir%\%%~A"') DO (
			 SET "_Vartimestamp=%%~B"
		)
		FOR /F "Tokens=*" %%B IN ('findstr /C:"<connection>" "%_SourceDir%\%%~A"') DO (
			SET "_Varconnection=%%~B"
		)
		CALL :OrderThem "%_SourceDir%\%%~A"
	)
	echo.
	echo. Processing Completed.
	echo. Output File Located here: "%_OutputXML%"
	echo.
	echo. To see output typed to screen press any key
	pause
	type "%_OutputXML%"
GOTO :EOF

:setCMD
	ECHO.SET %*
	SET %*
GOTO :EOF

:OrderThem
	REM Collect the SQL Query:
	FOR /F "Tokens=*" %%B IN ('findstr /C:"<sql>" "%~1"') DO (
		FOR /F "Tokens=*" %%b IN ('ECHO. "%_Vartimestamp%~%_Varconnection%~%%~B"') DO (
			ECHO.%%~b
		)
	)>>"%_OutputXML%"
GOTO :EOF

Open in new window


Example Outpur XML:

<timestamp><![CDATA[12/10/17 15:08]]></timestamp>~<connection><![CDATA[BLBESE]]></connection>~<sql><![CDATA[update eg_ctasi set indclo = '1' where numcta = 'LSKEK114QAG9V28LLXJW';
<timestamp><![CDATA[12/10/17 16:01]]></timestamp>~<connection><![CDATA[CRELANE_ThalerProd_epk]]></connection>~<sql><![CDATA[update cbd31 set valatt_car='0' where numcar_int in ('B7F29ZOGMJ1S8DQU','B7J11ZOFA138QBSN') and codzon='00Z';]]></sql>
<timestamp><![CDATA[12/10/17 17:01]]></timestamp>~<connection><![CDATA[CRELANE_ThalerProd_epk]]></connection>~<sql><![CDATA[select 'update epd01 set monres_fis_2 = '''||((select case when sum(monopn) is not null then sum(monopn) else 0 end from epd10 where typopn='17' and staopn between '4000' and '9000' and numcpt=epd01.numcpt)+(select case when sum(montot_ant_tax) is not null then sum(montot_ant_tax) else 0 end from epd10 where typopn in ('04','05') and staopn <='9000' and numcpt=epd01.numcpt)-(select case when sum(montot_ant_tax) is not null then sum(montot_ant_tax) else 0 end from epd10 where typopn='02' and staopn between '4000' and '9000' and numcpt=epd01.numcpt))||''' where numcpt = '''||numcpt||''' and monres_fis_2 = '''||monres_fis_2||''';'
<timestamp><![CDATA[12/10/17 17:02]]></timestamp>~<connection><![CDATA[CRELANE_ThalerProd_epk]]></connection>~<sql><![CDATA[update epd01 set monres_fis_2 = '1,47' where numcpt = '103521560782' and monres_fis_2 = '2,94';

Open in new window



If you want to reverse the sort orrder than you can change the DIR with "/O:D" to read "/O:-D"

IE, THIS:
	FOR /F "Tokens=*" %%A IN ('DIR /A:-D /B /O:D "%_SourceDir%\%_FileGlob%"') DO (

Open in new window



Becomes THIS:
	FOR /F "Tokens=*" %%A IN ('DIR /A:-D /B /O:-D "%_SourceDir%\%_FileGlob%"') DO (

Open in new window


and the resulting XML looks like this:

<timestamp><![CDATA[12/10/17 17:02]]></timestamp>~<connection><![CDATA[CRELANE_ThalerProd_epk]]></connection>~<sql><![CDATA[update epd01 set monres_fis_2 = '1,47' where numcpt = '103521560782' and monres_fis_2 = '2,94';
<timestamp><![CDATA[12/10/17 17:01]]></timestamp>~<connection><![CDATA[CRELANE_ThalerProd_epk]]></connection>~<sql><![CDATA[select 'update epd01 set monres_fis_2 = '''||((select case when sum(monopn) is not null then sum(monopn) else 0 end from epd10 where typopn='17' and staopn between '4000' and '9000' and numcpt=epd01.numcpt)+(select case when sum(montot_ant_tax) is not null then sum(montot_ant_tax) else 0 end from epd10 where typopn in ('04','05') and staopn <='9000' and numcpt=epd01.numcpt)-(select case when sum(montot_ant_tax) is not null then sum(montot_ant_tax) else 0 end from epd10 where typopn='02' and staopn between '4000' and '9000' and numcpt=epd01.numcpt))||''' where numcpt = '''||numcpt||''' and monres_fis_2 = '''||monres_fis_2||''';'
<timestamp><![CDATA[12/10/17 16:01]]></timestamp>~<connection><![CDATA[CRELANE_ThalerProd_epk]]></connection>~<sql><![CDATA[update cbd31 set valatt_car='0' where numcar_int in ('B7F29ZOGMJ1S8DQU','B7J11ZOFA138QBSN') and codzon='00Z';]]></sql>
<timestamp><![CDATA[12/10/17 15:08]]></timestamp>~<connection><![CDATA[BLBESE]]></connection>~<sql><![CDATA[update eg_ctasi set indclo = '1' where numcta = 'LSKEK114QAG9V28LLXJW';

Open in new window

0
 

Author Closing Comment

by:Watnog
I stick to Ben's solution as it allows me to sort and because I have 1 row per file processed: since I need vlookup potential that's crucial. I'm  very grateful.
Thank you both.
W.
0
 
LVL 14
Hey Watnog, glad to help :)
0

Featured Post

[Webinar] Lessons on Recovering from Petya

Skyport is working hard to help customers recover from recent attacks, like the Petya worm. This work has brought to light some important lessons. New malware attacks like this can take down your entire environment. Learn from others mistakes on how to prevent Petya like worms.

Join & Write a Comment

When you receive another warning that your shared drive is almost full and you have asked your users to clean out old files again and again, here is a single command that may help. This command will place all the files that have not been used rec…
This is a fine trick which I've found useful many times, when you just don't want to accidentally run a batch script or the commands needs administrator rights.
Video by: ITPro.TV
In this episode Don builds upon the troubleshooting techniques by demonstrating how to properly monitor a vSphere deployment to detect problems before they occur. He begins the show using tools found within the vSphere suite as ends the show demonst…
Please read the paragraph below before following the instructions in the video — there are important caveats in the paragraph that I did not mention in the video. If your PaperPort 12 or PaperPort 14 is failing to start, or crashing, or hanging, …

650 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question