Solved

Dealing with multiple xml files

Posted on 2008-10-31
Last Modified: 2012-05-05
Hi,

I am an ETL developer writing an interface that extracts data from an Oracle source and generates XML files. Due to lack of space I have to generate multiple XML files. I am new to this board and new to batch scripting. Please help me out, as I am stuck.

I have two issues.

ISSUE 1) I am generating multiple XML files with the following naming convention:
    01_GAPSMain_CA_20081029131933
    01_GAPSMain_CA_20081029132011
    01_GAPSMain_CA_20081029132145
 - First, I have to create a list containing all the XML files generated.
 - Then merge the files, but the merged file should not contain duplicates of the common header and closing tags.
  The XML file content looks like this:

XML FILE1

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<all xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <transaction>
    <table name="ClbMyGapsDelivery">
      <record id="1">
        <hkeycol name="MyBpaMainHkey">1905</hkeycol>
        <datacol name="MyBpaMainHkey">1905</datacol>
       </record>
      <formula column="MyBpaMainPKey">pkey from bpamain </formula>
      <formula column="MyPrdMainPKey">pkey from prdmain </formula>
    </table>
  </transaction>
</all>

XML FILE2

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<all xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <transaction>
    <table name="ClbMyGapsDelivery">
      <record id="2">
        <hkeycol name="MyBpaMainHkey">1905</hkeycol>
        <datacol name="MyBpaMainHkey">1905</datacol>
       </record>
      <formula column="MyBpaMainPKey">pkey from bpamain </formula>
      <formula column="MyPrdMainPKey">pkey from prdmain </formula>
    </table>
  </transaction>
</all>
 
Merged file should look like below

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<all xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <transaction>
    <table name="ClbMyGapsDelivery">
      <record id="1">
        <hkeycol name="MyBpaMainHkey">1905</hkeycol>
        <datacol name="MyBpaMainHkey">1905</datacol>
       </record>
      <record id="2">
        <hkeycol name="MyBpaMainHkey">1905</hkeycol>
        <datacol name="MyBpaMainHkey">1905</datacol>
       </record>
      <formula column="MyBpaMainPKey">pkey from bpamain </formula>
      <formula column="MyPrdMainPKey">pkey from prdmain </formula>
    </table>
  </transaction>
</all>

ISSUE 2) After merging, the merged file name should have the format 01_GAPSMain_CA_<maximum timestamp from all the files>, and this file name has to be kept in a list file for further processing.
Question by:vvgpal123
29 Comments
 
LVL 38

Expert Comment

by:BillDL
ID: 22859616
OK, well I don't know much about XML and you aren't too conversant with batch files, so we'll put our heads together ;-)

Actually, just treat the XML files as plain text files and here's a couple of fundamentals that you can tailor to suit.

So, from XML File 1 you need to strip off the tags:

      <formula column="MyBpaMainPKey">pkey from bpamain </formula>
      <formula column="MyPrdMainPKey">pkey from prdmain </formula>
    </table>
  </transaction>
</all>

and from XML File 2 you need to strip off the tags:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<all xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <transaction>
    <table name="ClbMyGapsDelivery">

Basics:

The command:

copy /a source1.xml+source2.xml /a final.xml

would append the contents of source2.xml to the contents of source1.xml and create the new "final.xml" file.  The /a switches treat the files as ascii text rather than as binary files (/b).

The following command will search the file "source2.xml" for the text string "?xml version=", which is unique to the first header tag, and (because of the /v switch) show every line that does NOT contain it.

find /i /v "?xml version=" source2.xml

The /i switch tells it to disregard the case.
The /v switch tells it to look for all lines NOT containing the text string
The text enclosed in " " is the string to search for. It does not need to be the entire line: FIND matches any line containing it and acts on the whole line as instructed.

Now, you can redirect what would be shown on screen as the Find results to another file using the > symbol after the command like this:

find /i /v "?xml version=" source2.xml > outfile.txt

This unfortunately will contain the unwanted heading created at the top of the output file by the Find command.  To get around this, you can first use the TYPE command which loads the file as though displaying its contents on screen, but then you can PIPE it (ie. pass its results to) into the Find command with the |  symbol, like this:

type source2.xml | find /i /v "<?xml version=" > outfile.txt

Usually it is not recommended to use a reserved batch scripting character like < in a command unless you want it to be used as intended, but in this case, because the < of the tag is enclosed in " " as the string to find, it is treated literally.

Note that the single redirection > will overwrite the contents of the file into which screen output or results are directed.  Double >> append the redirected contents.
Try that single command on one *.xml file, replacing the file name as required.  You will see that it strips out the first header.

The following series of commands in your batch file would strip out the 4 common headers from your XML File 2 using the Find command with the /v switch (ie. find all lines that DON'T contain the string) and, using several temporary files, eventually write the contents (minus the headers) back to the original source2.xml file, overwriting and replacing its existing contents:

@echo off
type source2.xml | find /i /v "?xml version=" > outfile1.txt
type outfile1.txt | find /i /v "all xmlns:xsi=" > outfile2.txt
del outfile1.txt > nul
type outfile2.txt | find /i /v "table name=" > outfile3.txt
del outfile2.txt > nul
type outfile3.txt | find /i /v "<transaction>" > outfile4.txt
del outfile3.txt > nul
type outfile4.txt | find /i /v "<table name=" > source2.xml
del outfile4.txt > nul

OK, so we can use the same method to strip out the unwanted 5 lines from your XML File 1:

type source1.xml | find /i /v "<formula column=" > outfile1.txt
type outfile1.txt | find /i /v "</table" > outfile2.txt
del outfile1.txt > nul
type outfile2.txt | find /i /v "</transaction" > outfile3.txt
del outfile2.txt > nul
type outfile3.txt | find /i /v "</all" > source1.xml
del outfile3.txt > nul


Follow this with the command to append source2.xml to source1.xml, and you have your merged file:

copy /a source1.xml+source2.xml /a final.xml

All that's missing is the final name of the file containing the date and time.

If you use the following sequence of variables together when naming a file, it should give you the naming convention that you need.  In the example, it will rename "final.xml" to:

01_GAPSMain_CA_yyyymmddhhminminss

This assumes that the following US Time and Date format is output in a Command Window when you type in the two separate commands:

echo %DATE%
echo %TIME%

mm/dd/yyyy ie. 11/02/2008
16:07:58.03

where:

yyyy = 2008
mm = month (now 11)
dd = day (now 02)
hh = hour (24 hour clock)
minmin = minutes past the hour
ss = seconds

ren final.xml 01_GAPSMain_CA_%DATE:~6,4%%DATE:~0,2%%DATE:~3,2%%TIME:~0,2%%TIME:~3,2%%TIME:~6,2%.xml

I'm sure I've got that right.  I'm UK based, and our Date format is DD/MM/YYYY rather than MM/DD/YYYY, so I'm counting characters on screen while typing.

The theory is that the %DATE% system variable can be modified using :~x,y where x = the character to start at, and y = how many characters to include.  Counting begins as zero being the leftmost character of what would be the unmodified %DATE% output, and the idea is to exclude the / and : characters from the fragments taken from the Date and Time output.
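
To see the :~x,y slicing in isolation, here is a minimal sketch that applies it to a fixed sample string rather than the live %DATE% value. The variable names SAMPLE, YYYY, MM and DD are made up for illustration, and the sample assumes the US mm/dd/yyyy layout with no day-name prefix.

```bat
@echo off
:: Hypothetical sample value in US mm/dd/yyyy layout (not read from the clock)
set SAMPLE=11/02/2008
:: ~6,4 skips 6 characters and takes the next 4 (the year)
set YYYY=%SAMPLE:~6,4%
:: ~0,2 takes the first 2 characters (the month)
set MM=%SAMPLE:~0,2%
:: ~3,2 skips "11/" and takes the next 2 characters (the day)
set DD=%SAMPLE:~3,2%
:: Prints 20081102
echo %YYYY%%MM%%DD%
```

If %DATE% on a given machine has a different layout (for example a leading day name), the offsets have to shift accordingly.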

Lastly, to list all files of the *.xml file type in a folder just involves using the "Directory" (DIR) command and redirecting it to a list file like this:

dir /on /b /s *.xml > _XML_File_List.txt

The /b switch would tell it to list only the files' names, but the /s switch (as well as looking in sub-folders) converts the names in the list into fully qualified paths.  The /on switch sorts the files by name in ascending alphabetical order.  The underscore in the list file name places it at the top of the files in that folder.

Put this all together into a sample batch file that I've pasted into the Code Snippet and test it using two *.xml files named "source1.xml" and "source2.xml".  Let us know if it is something you can work with and we can then modify it to work with more than two source files, probably using the FOR command that loops through things in a batch file.

As I always say here, I use a very clunky and long-handed visual way of writing batch files.  Others will condense this into a few lines, or suggest a Visual Basic script that I'm not too conversant with.

Regards
Bill

@echo off
::
:: Place batch file in same directory as XML files,
:: or else add paths to all file names herein.
::
:: Strip off upper common header tags in XML File 2
::
type source2.xml | find /i /v "?xml version=" > outfile1.txt
type outfile1.txt | find /i /v "all xmlns:xsi=" > outfile2.txt
del outfile1.txt > nul
type outfile2.txt | find /i /v "table name=" > outfile3.txt
del outfile2.txt > nul
type outfile3.txt | find /i /v "  <transaction>" > outfile4.txt
del outfile3.txt > nul
type outfile4.txt | find /i /v "<table name=" > source2.xml
del outfile4.txt > nul
::
:: Strip off bottom common tags in XML File 1
::
type source1.xml | find /i /v "<formula column=" > outfile1.txt
type outfile1.txt | find /i /v "</table" > outfile2.txt
del outfile1.txt > nul
type outfile2.txt | find /i /v "</transaction" > outfile3.txt
del outfile2.txt > nul
type outfile3.txt | find /i /v "</all" > source1.xml
del outfile3.txt > nul
::
:: Create some variables for file name
::
set PREFIX=01_GAPSMain_CA_
set NOW=%DATE:~6,4%%DATE:~3,2%%DATE:~0,2%%TIME:~0,2%%TIME:~3,2%%TIME:~6,2%
::
:: Append XML File 2 to XML File 1 and send to output file
::
copy /a source1.xml+source2.xml /a final.xml
::
:: Rename output file with date and time plus common prefix
::
ren final.xml %PREFIX%%NOW%.xml
::
:: Create list of all source XML files and new one
::
dir /on /b /s *.xml > _XML_File_List.txt
::
:: Clean up variables and close batch file
::
set PREFIX=
set NOW=
exit

 
LVL 38

Expert Comment

by:BillDL
ID: 22859660
Hmmm.

For some reason the code that is supposed to rename the final.xml file to one with the date and time isn't working.  I will need to look at this and see where it is wrong, but I need some sleep for now.

I would also recommend NOT overwriting the source1.xml and source2.xml files.  Instead, it would be better to use two new *.txt files named source1.txt and source2.txt.  This is just in case you end up overwriting one of the *.xml files you may later need.

3 lines to change in the batch file, and one to add:

type outfile4.txt | find /i /v "<table name=" > source2.xml
CHANGE TO:
type outfile4.txt | find /i /v "<table name=" > source2.txt

type outfile3.txt | find /i /v "</all" > source1.xml
CHANGE TO:
type outfile3.txt | find /i /v "</all" > source1.txt

copy /a source1.xml+source2.xml /a final.xml
CHANGE TO
copy /a source1.txt+source2.txt /a final.xml

Add line immediately below this:
del source*.txt > nul

New batch file attached as a *.txt file.  Rename to *.cmd or *.bat.

Strip-and-Concat.txt
 

Author Comment

by:vvgpal123
ID: 22859667
Thanks for the answer. Actually there will be a number of files, not just 2. 1) Header tags from the first file should remain as is. 2) Trailer tags from the last file should remain as is. 3) For the rest of the files, both header and trailer tags should be removed.

Once again thanks for the response.
 
LVL 38

Expert Comment

by:BillDL
ID: 22859685
Watch out for the code snippet boxes.  My pasted batch file was screwed up where one of the lines was split over onto a new line.  Use the
 
LVL 38

Expert Comment

by:BillDL
ID: 22859709
Sorry, I missed your response.  I was just heading to bed and checked my emails one last time and saw you had responded.

If the batch file works as intended for you on two test files, then I'm sure it can be modified to process all *.xml files in a given folder.

I'm thinking along the lines of:

1. Stripping out the common headers from the start of an *.xml file and storing them in a *.txt file
2. Doing the same with the trailing lines and storing them in another text file.
3. Stripping out ALL the common headers from ALL *.xml files
4. Merge ALL the stripped-down *.xml files to one text file.
5. Append the file created at step 4 to the file created at step 1, and then appending the file created at step 2 to the very end of the new file.

The FOR command allows you in a batch file to process all files of a named type in a folder and repeat the specified action for all.  This could be achieved by using the CALL command to execute a 2nd batch file to process each *.xml file, and then repeat that for all *.xml files found in the folder.
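
A minimal sketch of that FOR plus CALL pattern, with one thing swapped in: rather than a 2nd batch file, it calls a labelled subroutine (:process, a made-up name) inside the same script, so that the example is self-contained. COUNT is likewise only there for illustration.

```bat
@echo off
:: Sketch only: walk every *.xml file in the current folder and
:: hand each file name to a subroutine, one file per pass.
set COUNT=0
for %%F in (*.xml) do call :process "%%F"
echo %COUNT% file(s) processed
goto :done

:process
:: The tilde form of argument 1 strips the surrounding quotes
echo Processing %~1
set /a COUNT+=1
:: Return to the FOR loop that issued the CALL
goto :eof

:done
```

Calling a separate batch file works the same way: replace ":process" with the name of that file, and each pass receives the next file name as its first argument.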

If the test batch file works, then this can all be done next.
 

Author Comment

by:vvgpal123
ID: 22859729
BillDL,
Can you please modify it a bit and send the code, as I am new to this scripting? Take your time, please.
Thanks.
 

Author Comment

by:vvgpal123
ID: 22859804
Bill this is just a suggestion based on your idea.

In the generated XML data files there will be a lot of common XML elements which we should not remove, so my suggestion is: how about keeping the common headers and common trailers in 2 text files?

Here is an example

XML FILE1

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<all xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <transaction>
    <table name="ClbMyGapsDelivery">
      <record id="1">
        <hkeycol name="MyBpaMainHkey">1905</hkeycol>
        <datacol name="MyBpaMainHkey">1905</datacol>
       </record>
      <formula column="MyBpaMainPKey">pkey from bpamain </formula>
      <formula column="MyPrdMainPKey">pkey from prdmain </formula>
    </table>
  </transaction>
</all>

XML FILE2

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<all xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <transaction>
    <table name="ClbMyGapsDelivery">
      <record id="2">
        <hkeycol name="MyBpaMainHkey">1905</hkeycol>
        <datacol name="MyBpaMainHkey">1905</datacol>
       </record>
      <formula column="MyBpaMainPKey">pkey from bpamain </formula>
      <formula column="MyPrdMainPKey">pkey from prdmain </formula>
    </table>
  </transaction>
</all>


XML FILE3

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<all xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <transaction>
    <table name="ClbMyGapsDelivery">
      <record id="2">
        <hkeycol name="MyBpaMainHkey">1905</hkeycol>
        <datacol name="MyBpaMainHkey">1905</datacol>
       </record>
      <formula column="MyBpaMainPKey">pkey from bpamain </formula>
      <formula column="MyPrdMainPKey">pkey from prdmain </formula>
    </table>
  </transaction>
</all>




Step1: we can remove the headers and trailers from all the files in the first step.


step2: we create 2 text files, with common headers and common trailers.


    text file 1:( header text)

      <?xml version="1.0" encoding="UTF-8" standalone="no"?>
      <all xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
          <transaction>
              <table name="ClbMyGapsDelivery">

     text file 2:( trailer text)


      <formula column="MyBpaMainPKey">pkey from bpamain </formula>
      <formula column="MyPrdMainPKey">pkey from prdmain </formula>
    </table>
  </transaction>
</all>
     

step3:

   After stripping out the header and trailer records from all the files, we will merge all the files, along with the predefined header and trailer text files.


Bill Just FYI... I have to do this for 40 different interfaces, this GAPS is just one Interface.

Thanks
 
LVL 38

Expert Comment

by:BillDL
ID: 22860893
Hi again

If the actual XML file structure is the same for all of the 40 interfaces, and it is just the odd line of text that contains reference to the interface it was derived from (eg. GAPS), then a common batch file would work on all of them and you would just have to change the part in the batch file concerned with file naming.

It is probable that there is a pre-compiled program out there as freeware, shareware, or retail, which would allow you to do the job (or part of it) from within a Windows dialog.  A quick google search on the words "strip 10 lines from text file" revealed this page of utilities:
http://wareseeker.com/Utilities/Remove-Delete-Strip-Metadata-In-Multiple-Files-Software-7.0.zip/8010094
Obviously a more refined search would reveal programs more aligned to your specific needs and might make the job a bit more visual and involve less testing or tweaking code in a batch file.

Linux/Unix tends to have a lot more single programs that allow you to manipulate files, such as searching text with the GREP command (and replacing it with related tools such as SED).  Many commands have now been ported to Windows Executables and you'll find them included in programs like this:
http://www.powergrep.com/replace.html

There are two Linux/Unix commands that I can think of from my college days that are concerned with stripping off or reading specific lines in files: HEAD and TAIL.  These have been compiled into Windows programs, and you may even find them in some of the Windows Support Tools or Resource Kits.

I'm going to try and finish what I started, ie. using batch programming WITHOUT calling any other program files.  I'm not saying it's the best method, but if it works then why not use it.  You should be aware that a big clunky batch file can take quite a while to process a lot of files in comparison to a little Windows program.  

OK, I'm going to make the batch file create a HEADER.txt and FOOTER.txt that will be re-used later.  Then it will remove ALL the common lines from ALL *.xml files in one folder.  All of those *.xml files will then be appended to the first, sequentially, with the Header in place at the top and Footer text applied at the very end.

I will try to complete this now and post when done, but I hope you can be patient if I am unable to post back today.  I have quite a lot on.

Bill
 
LVL 38

Expert Comment

by:BillDL
ID: 22860906
By the way, the batch file that I attached as a *.txt file should work as it is on two test files named "source1.xml" and "source2.xml".  From my tests it did just what you need, except that the creation of the final file name inclusive of the Time and Date isn't working meantime.

If you want to miss out the batch processing part that extracts the "header" and "footer", and just create these ahead of time for re-use, then that would be a bit easier.
 

Author Comment

by:vvgpal123
ID: 22861945
Yes, it's better to have the header and trailer files prepared ahead of time, so that we can use them both for removing those lines from the XML files and for appending at the end.
 
LVL 38

Expert Comment

by:BillDL
ID: 22863693
Hi, I had a few minutes spare before going out to work and I looked at the problem and the most universally acceptable solution.  I need to check with you before continuing and finalising my proposed solution in the morning after I return from work, though.

My proposal will have a header.txt and footer.txt prepared in advance.  It will use 2 batch files:

The 1st batch file is designed to strip off ALL the unwanted lines from ONE *.xml file and write the remainder to a temporary file.

The 2nd batch file calls the first batch file and tells it to process a named *.xml file.  What it does is walk through the folder containing the *.xml files sequentially and call the other batch file with the name of the next *.xml file found, and therefore only one file is processed at a time.

Because this is done sequentially, the contents being appended to the temporary file should be in order.

Finally, the 2nd batch file creates the final *.xml file by writing the header to it, then appends the cleaned up lines from the temporary file, and then tacks on the footer.  Lastly, it renames the file with the time and date and adds this to a running log of *.xml files created.
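
The final merge step could be sketched roughly as below. This is only an illustration: the three stand-in files are created on the spot with made-up content, and "body.txt" is a hypothetical name for the temporary file. It also uses COPY /b rather than /a, which is a choice made for this sketch so that the bytes are copied unchanged.

```bat
@echo off
:: Sketch only: build three tiny stand-in files, then merge them.
:: The carets stop the angle brackets being read as redirection.
> header.txt echo ^<all^>
> body.txt echo   ^<record^>1^</record^>
> footer.txt echo ^</all^>
:: /b copies the bytes as they are, so no Ctrl-Z end-of-file
:: character is appended to the merged file.
copy /b header.txt+body.txt+footer.txt final.xml > nul
type final.xml
```

The order of the names joined with + is the order in which the pieces end up in final.xml, so the header comes first and the footer last.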

Does that sound OK to you?
Is it OK to have two batch files working together?

If so, I'll finish this off and you can test it.
 

Author Comment

by:vvgpal123
ID: 22863712
I don't have any problem with that, thanks.
 
LVL 38

Expert Comment

by:BillDL
ID: 22869700
OK, I have created two batch files that work together and should do what you need to achieve.
If you look at screenshot No. 1, you will see the following batch files:
"_Process_XMLs.cmd"
This is the main batch file that should be run.  It does the "calling" and all other processing apart from the one task that the next batch file does.
"_Strip_Files.cmd"
All this does is receive the next named *.xml file in that folder, strip out unwanted tag lines, and append the remainder to a temporary file named "output.txt".  With each pass (ie. for each *.xml file found in that folder), it loops back to the main batch file until all *.xml files in the folder are processed.
The *.xml files in screenshot No. 1 are examples created from yours.  The only thing that differs in them is the "record number" tag line:
<record id="1"> - source1.xml
<record id="2"> - source2.xml
<record id="3"> - source3.xml
etc.
That appears from your examples to be the only difference.  I don't know *.xml code, but I am surprised that is the only difference.  I can only assume that the following tag lines will each contain unique data:
<hkeycol name="MyBpaMainHkey">1905</hkeycol>
<datacol name="MyBpaMainHkey">1905</datacol>
If so, then that is fine, because the FIND command used in the batch file only looks for the actual tag names, and will select the entire line for each record and append it into the final file.
PLEASE CONFIRM that this IS the case, because the batch files would need to be modified if I have not fully grasped what data will be in your *.xml files.
Anyway, these are the test files I used.  The file names are NOT important, because they are read in as each one is retrieved and processed sequentially.  So this should work with ANY *.xml file names.
I've named the batch files with a _ prefix so that they sit up at the top of the files when copied into any folder.
Going back to the main "_Process_XMLs.cmd" batch file, here's what it does in sequential order:
  1. Stores the "01_GAPSMain_CA_" file prefix for later use
  2. Creates a new log file of processing activity
  3. Selects an *.xml file in the folder and creates a source text file
  4. Tests for the presence of the new file and errors out if not found
  5. Extracts specific lines from source text file and creates Header.txt
  6. Extracts specific lines from source text file and creates Footer.txt
  7. Walks the folder sending names of *.xml files to "_Strip_Files.cmd"
    (this creates a new "outfile.txt" with sequential records)
  8. Creates new "merge file" by combining header, outfile, and footer
  9. Renames merged file with naming convention including date and time
  10. Creates a directory listing of new file and files used to make it.
  11. Deletes superfluous files from that folder.
Screenshot No. 2 shows the files that will be created.
_XML_File_List.txt is created with the following layout:
New Merged File:
G:\DOCS\Desktop\EE_XML\01_GAPSMain_CA_20081103165929.xml
 
Created From source Files:
 
source1.xml
source2.xml
source3.xml
source4.xml
source5.xml
_debug_log.txt (from my test) contains this content:
Log file started 03/11/2008 at 16:59:26.01
 
Creating '_source_file.txt' from one xml file
Source File Created OK
Creating header file fragment
Creating footer file fragment
Passing control to strip batch
stripping out lines of source1.xml
returning control to main
stripping out lines of source2.xml
returning control to main
stripping out lines of source3.xml
returning control to main
stripping out lines of source4.xml
returning control to main
stripping out lines of source5.xml
returning control to main
Control returned from strip batch, percent 1 is now ""
All xml files processed
Merging Header and Footer to final file
Merging done, concatenated file is 'final.xml'
'_final.xml' copied as 01_GAPSMain_CA_20081103165929.xml
You may notice that the TIME in the new merged file name (ie. 165929) does not match the time that the log file started (ie. 16:59:26). In other words, it took 3 seconds to process 5 *.xml files.
You may be able to get a rough idea of how long it would take to process the number of *.xml files that you will have in one folder.  The batch files are fully commented, and there is selected screen output that slows things down marginally.  The logging also uses up a bit of time, but the main bit that slows things down is the fact that the main batch file has to call the other one (ie. pass control to it) and then repeat this for as many times as there are *.xml files in the folder.
This is what I was describing about my way of writing batch files being clunky, but I like them to be laid out in a very visual and easy to follow format because I usually have to write them in between interruptions.  I am absolutely certain that someone could condense this into 10 lines using Visual Basic Script (ie. as a single *.vbs file) or with some more mature batch file scripting.
Your final merged *.xml file (in my example: 01_GAPSMain_CA_20081103165929.xml) is in the following format:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<all xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <transaction>
    <table name="ClbMyGapsDelivery">
      <record id="1">
        <hkeycol name="MyBpaMainHkey">1905</hkeycol>
        <datacol name="MyBpaMainHkey">1905</datacol>
       </record>
      <record id="2">
        <hkeycol name="MyBpaMainHkey">1905</hkeycol>
        <datacol name="MyBpaMainHkey">1905</datacol>
       </record>
      <record id="3">
etc, down to the closing tag of record No. 5:
       </record>
      <formula column="MyBpaMainPKey">pkey from bpamain </formula>
      <formula column="MyPrdMainPKey">pkey from prdmain </formula>
    </table>
  </transaction>
</all>
As I mentioned earlier, I assume that each <hkeycol> and <datacol> tags will contain different data.

I sincerely hope that I have not misinterpreted what you need or I will need to go through the batch files and modify/test them.   Please tell me I understood correctly :-)
OK, so the CRITERIA for using the batch files:
  • As written, they will process ONLY the *.xml files in the folder from which they are run.  This can be changed quite easily to work on a sub-folder, or even adding a new Right-Click > "Send To.." shortcut or similar if preferred.
  • BOTH batch files MUST remain together in the same folder.  Again this can be changed easily if needed.
  • I have only tested this on a directory path with no spaces, ie. G:\DOCS\Desktop\EE_XML\   I DO NOT know if it will work on file names with spaces or in directory paths with spaces.  The workaround is to enclose paths in " " but that can sometimes cause complications with some commands that treat " literally.
  • I have ONLY tested this in Windows XP SP3.  I'm sure it will work in Windows XP with no Service Packs, and PROBABLY in Windows 2000, but I don't know for sure.
  • You can name "_Process_XMLs.cmd" whatever you wish, but DO NOT rename "_Strip_Files.cmd" without changing all references in the main batch file to reflect the changed name.
  • Most importantly, "_Process_XMLs.cmd" has been written specifically for your "GAPSMain" *.xml files.  To use it in another folder on *.xml files that relate to another contract (or whatever the "GAPSMain" is a reference to), you need to change the batch file in two places as follows:
Right up near the top of the batch file you will see the line:
set PREFIX=01_GAPSMain_CA_
This is the text string that will be used to create the first part of the new and final file name.  To have the final *.xml file named with eg. "07_GAPSAncilliary_TX_", you would just change that line to:
set PREFIX=07_GAPSAncilliary_TX_
The other place you would need to change a line is down almost at the end of the batch file:
:: Create list of all source XML files and new one
::
echo New Merged File: >> _XML_File_List.txt
dir /on /b /s *GAPSMain*.xml >> _XML_File_List.txt
echo. >> _XML_File_List.txt
echo Created From source Files: >> _XML_File_List.txt
echo. >> _XML_File_List.txt
dir /on /b *.xml | find /i /v "GAPSMain" >> _XML_File_List.txt
::
echo Finished!
echo New merged file is %PREFIX%%NOW%.xml
The FIRST DIR command lists all the names of *.xml files CONTAINING "GAPSMain" in the FILE NAME and would output the list to screen.  But in this case the results are sent to your listing file "_XML_File_List.txt" to form the line:
New Merged File:
G:\DOCS\Desktop\EE_XML\01_GAPSMain_CA_20081103165929.xml
The purpose of the 2ND DIR command is to append only the names of the OTHER *.xml files FROM WHICH your final file was created, and creates the following lines in my file list:
Created From source Files:
 
source1.xml
source2.xml
source3.xml
etc.
This was just to make the file listing look neater, that's all.  If aesthetics aren't important, then we can lose this portion of the batch file and just create a list of ALL *.xml files in the folder, where the final renamed file may sit somewhere in the middle of the file list.
That's about all I can think of, and I'm sure I've covered everything.
I will upload the two batch files as .txt file attachments.  Rename them with the *.cmd file extension to replace the *.txt extension.
Please test them on a folder containing copies of the files.  I would hate to have them mess up irreplaceable source files.  I'm sure it will work, but you can't be too safe.
Regards
Bill

-Process-XMLs.txt
-Strip-Files.txt
01-Files-Needed.jpg
02-Resultant-Files.jpg
 

Author Comment

by:vvgpal123
ID: 22870285
Hi,
I tried to test the scripts, but they didn't generate the merged file.

Here is what I did:
1) Created 4 test files
2) Renamed both the script files
3) Copied all the test files and scripts into the GAPSMAIN folder
4) Ran the first script.

I am sending all the scripts, test files and generated logs in this zip file. Please rename the test files to .xml and the scripts to .cmd

Thank you


GAPSMAIN.zip
 

Author Comment

by:vvgpal123
ID: 22870483
Please find the attached document, which shows some error messages in the DOS window
Doc1.doc
 

Author Comment

by:vvgpal123
ID: 22871439
Bill,

I am still testing and trying to make it work, but one more suggestion: in _Strip_Files.cmd, instead of hardcoding the values, we can take the lines from the header and footer files and exclude those lines from all the XML files, because there are going to be something like 20 or 30 of those columns.
 

Author Comment

by:vvgpal123
ID: 22871506
Sorry, let me put it more clearly. In the _Strip_Files.cmd script, you have hard-coded the values based on the sample I gave, but in reality there are going to be 20-30 columns, and they are different for each interface. So the best thing is to make use of our extracted HEADER and FOOTER files to strip the XML files: basically, subtract the header and footer lines from each XML file.
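
One way this subtraction could be sketched is with FINDSTR, which can read its search strings from a file through the /g: switch. This is a sketch under assumptions, not a tested replacement for _Strip_Files.cmd: the stand-in files and the names sample.xml, body.tmp and stripped.txt are made up, and header.txt and footer.txt must contain the common lines character for character (indentation included, no blank lines) for the whole-line match to work.

```bat
@echo off
:: Sketch only: stand-in files with made-up content.
> header.txt echo ^<all^>
> footer.txt echo ^</all^>
>  sample.xml echo ^<all^>
>> sample.xml echo   ^<record^>1^</record^>
>> sample.xml echo ^</all^>

:: /g:file reads one search string per line, /l treats them literally,
:: /x matches whole lines only, /v keeps the lines that do NOT match.
:: /i (ignore case) is added as a precaution, because FINDSTR is reported
:: to be unreliable with several case-sensitive literal search strings.
findstr /v /x /l /i /g:header.txt sample.xml > body.tmp
findstr /v /x /l /i /g:footer.txt body.tmp > stripped.txt
del body.tmp
type stripped.txt
```

Because the lines to remove come from the two text files, the same script could be reused for another interface by swapping in that interface's header.txt and footer.txt.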
 

Author Comment

by:vvgpal123
ID: 22871670
Bill,
While testing, I commented out the line
"del _final.xml > nul" and could see that _final.xml was created with all the data showing up properly. We are almost there. I think the problem is after this line.

Please see the log below. If you look at the last line, the final renamed file name doesn't look right. I think that is the problem.


Log file started Mon 11/03/2008 at 16:40:20.45
Creating '_source_file.txt' from one xml file
Source File Created OK
Creating header file fragment
Creating footer file fragment
Passing control to strip batch
stripping out lines of 01_GAPSMain_CA_1.xml
returning control to main
stripping out lines of 01_GAPSMain_CA_2.xml
returning control to main
stripping out lines of 01_GAPSMain_CA_3.xml
returning control to main
stripping out lines of 01_GAPSMain_CA_4.xml
returning control to main
Control returned from strip batch, percent 1 is now ""
All xml files processed
Merging Header and Footer to final file
Merging done, concatenated file is 'final.xml'
'_final.xml' copied as 01_GAPSMain_CA_/03/ 1Mo164022.xml

Bill, to finalize this script we need to do a few things.
1) Fix the error shown above.
2) _Strip_Files.cmd needs to use the HEADER and FOOTER files rather than hard-coded values.
3) Can you make a change so that I can pass the directory names, along with their paths, as parameters to the script for all the temp files, source files and the final merged files? I tried to do that but ended up with errors.

Thanks for the effort.


Author Comment

by:vvgpal123
ID: 22871956
Hi Bill,
I was able to fix the issue by copying the code from some samples.

The change I made is in _Process_XMLs.cmd.

Instead of the line
set NOW=%DATE:~6,4%%DATE:~3,2%%DATE:~0,2%%TIME:~0,2%%TIME:~3,2%%TIME:~6,2%

I used the following lines

for /f "tokens=2-4 delims=/ "  %%a in ("%date%") do (set month=%%a& set day=%%b& set year=%%c)
for /f "tokens=1-3 delims=/:." %%a in ("%time%") do (set /a hour=%%a& set minute=%%b& set second=%%c)
REM If the hour is single digit, prefix it with a zero.
if %hour% lss 10 set hour=0%hour%
set NOW=%month%%day%%hour%%minute%%second%

Now the file is getting generated.

Apart from this, can you please make the changes for the other 2 things which I mentioned in the previous comment? Thank you

Author Comment

by:vvgpal123
ID: 22872322
Hi Bill,

In _Process_XMLs.cmd, the header and footer tags will be constant for all the files, and also for all the other 40 interfaces. You can modify the code so that it creates the header and footer files only once. After that we use the same static files for stripping all the xml files, and at the end the same header and footer files are used for appending.

Just a suggestion for the code you have written: instead of using those text files, I think you can use a few variables for the header and a few variables for the footer, and append the variables to create the header and footer files.

:: Create HEADER.txt from new '_source_file.txt'
::
echo Creating header file fragment >> _debug_log.txt
echo Creating header file fragment
::
type _source_file.txt | find /i /v "<record id=" > outfile1.txt
type outfile1.txt | find /i /v "hkeycol" > outfile2.txt
del outfile1.txt > nul
type outfile2.txt | find /i /v "datacol" > outfile3.txt
del outfile2.txt > nul
type outfile3.txt | find /i /v "</record>" > outfile4.txt
del outfile3.txt > nul
type outfile4.txt | find /i /v "<formula column=" > outfile5.txt
del outfile4.txt > nul
type outfile5.txt | find /i /v "hkeycol" > outfile6.txt
del outfile5.txt > nul
type outfile6.txt | find /i /v "</table>" > outfile7.txt
del outfile6.txt > nul
type outfile7.txt | find /i /v "</transaction" > outfile8.txt
del outfile7.txt > nul
type outfile8.txt | find /i /v "</all" > HEADER.txt
del outfile*.txt > nul
::
:: Create FOOTER.txt from new '_source_file.txt'
::
echo Creating footer file fragment >> _debug_log.txt
echo.
echo Creating footer file fragment
::
:: The following command would create the footer.txt file as
:: long as the xml files all have the same number of lines
::   more +8 _source_file.txt > FOOTER.txt
:: Delete the block of code below if using the 'more' command.
::
type _source_file.txt | find /i /v "?xml version=" > outfile1.txt
type outfile1.txt | find /i /v "all xmlns:xsi=" > outfile2.txt
del outfile1.txt > nul
type outfile2.txt | find /i /v "table name=" > outfile3.txt
del outfile2.txt > nul
type outfile3.txt | find /i /v "  <transaction>" > outfile4.txt
del outfile3.txt > nul
type outfile4.txt | find /i /v "<table name=" > outfile5.txt
del outfile4.txt > nul
type outfile5.txt | find /i /v "<record id=" > outfile6.txt
del outfile5.txt > nul
type outfile6.txt | find /i /v "hkeycol" > outfile7.txt
del outfile6.txt > nul
type outfile7.txt | find /i /v "datacol" > outfile8.txt
del outfile7.txt > nul
type outfile8.txt | find /i /v "</record" > FOOTER.txt
del outfile*.txt > nul

Author Comment

by:vvgpal123
ID: 22873390
Hi Bill,
Please find the attached zip file with all the scripts, source files and generated targets. Please rename the scripts to .cmd and the source files to .xml.
I am able to make the script work, but I made changes in both the scripts. Of course, all this is based on the script which you developed. I am passing the parameters to the main script.

After renaming the files, just call the script "Call_With_Parameters.txt" and it will do the rest of the job. Please review the code and suggest the best methods to implement.

Thanks a lot for so much effort from you.

Awaiting your comments and suggestions.
Copy-of-GAPSMAIN.zip
LVL 38

Accepted Solution

by:
BillDL earned 500 total points
ID: 22885419
Hi, and sorry to have left you for a day.  I had a very busy night before last, and I was too tired to think when I got home.  You seem to have been working away at this with some success.

The reason that the batch files I posted did not work for you is simply that the code to extract only the digits from the date and miss out the / characters didn't match your date format.  It ended up including two / characters which made the path nonsensical because / is a non-permissible character in a file path.

Your date format (from the log file) is:
Mon 11/03/2008
(Day_Name mm/dd/2008)
and the time format is:
14:33:29.15
(hh:mm:ss.cc - where cc = hundredths of a second)

The batch file code did not take account of the fact that you have a week day like Mon, Tue, Wed, Thu, Fri, Sat, Sun leading the mm/dd/yyyy.

Based on the Day Name always being a 3 character word, the following command typed into a command window should extract, combine, and display a string containing ONLY the numeric characters of the date and time, and also miss out the / characters from the date (remember, the 'M' of 'Mon' is your Zero or "offset" character):

echo %DATE:~10,4%%DATE:~4,2%%DATE:~7,2%%TIME:~0,2%%TIME:~3,2%%TIME:~6,2%

Should now be (today being 5th Nov 2008 and 10:58am):
20081105105845
(yyyymmddhhmmss)

You'll see where it was wrong by looking at the file name (after the "01_GAPSMain_CA_" %prefix%) generated by my batch file on your system, which made it impossible to create a file by that name:

Mon 11/03/2008 <------- your date
/03/ 1Mo143331 <------- characters grabbed for file name

%DATE:~6,4% - The 6 grabbed the first / from the left, being offset 6, and included 4 characters, ie. /03/

%DATE:~3,2% - The 3 grabbed the space after Mon and included that and the first 1 of the month

%DATE:~0,2% - The zero grabbed the M of Mon and included that and the lowercase o of Mon

To correctly capture the characters at the desired offsets in the output from the %DATE% variable on your system you need:
%DATE:~10,4%%DATE:~4,2%%DATE:~7,2%

The characters extracted from the various offsets of the output from the %TIME% variable are OK.
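
The offsets above can be checked without depending on the local date settings by running the same substring expressions against a fixed sample. In this sketch the variables D and T are made-up stand-ins for the DATE and TIME variables, filled with values in the format shown in the log:

```batch
@echo off
setlocal
rem D and T are stand-ins for DATE and TIME (names invented for this demo).
set "D=Mon 11/03/2008"
set "T=16:40:22.45"
rem Offsets from the original batch file: these pick up the "/" characters.
echo Original offsets:  %D:~6,4%%D:~3,2%%D:~0,2%%T:~0,2%%T:~3,2%%T:~6,2%
rem Offsets that skip the 3-letter day name and the "/" characters.
echo Corrected offsets: %D:~10,4%%D:~4,2%%D:~7,2%%T:~0,2%%T:~3,2%%T:~6,2%
endlocal
```

The first line prints /03/ 1Mo164022, the same unusable fragment seen in the log, and the second prints 20081103164022.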

OK, that addresses where the process of the file naming went wrong.

The UNC Path Error is one that I would try and avoid arriving at by using relative path references.  That was the main reason that I proposed placing the batch files right in amongst the *.xml files.  That way you would just need the names of the files, and the batch file assumes them to be in the same folder.

The following would also be useful for referencing directories relative to the location of the batch file.

..\sub-folder\filename.xml

ie. back up one folder level from the batch file and then forward and down to a sub-folder.

The variable %0 holds the name of the batch file being run, exactly as it was invoked. When the batch file is launched by double-clicking it, that includes the path to it enclosed in " ".

In terms of directory paths, %CD% will give you the "Current Directory" as a fully qualified path WITHOUT a trailing backslash and NOT enclosed in " "

I have attached a text file that lists additional command line Modifiers acceptable to Windows XP's 'FOR' command, but the following %0 Modifier will return the drive and path of the folder containing the batch file (which is the Current Directory only when the batch file is run from its own folder), WITH the trailing backslash:

%~dp0

Also useful in stripping off the trailing backslash from a directory path held in the variable %1 is:

set STRING=%1
if "%STRING:~-1%"=="\" set STRING=%STRING:~0,-1%
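
A minimal sketch of that trailing-backslash check, using a hard-coded sample path so that it can be run on its own (the variable name TARGET and the path are invented for the example). In a real script the value would come from the parameter instead, and writing it as %~1 rather than %1 removes any surrounding quotes first, so that the last character tested is not a quote mark:

```batch
@echo off
setlocal
rem Sample value with a trailing backslash.
set "TARGET=C:\GAPSMAIN\tgt\"
rem Compare the last character; if it is a backslash, keep everything except it.
if "%TARGET:~-1%"=="\" set "TARGET=%TARGET:~0,-1%"
echo %TARGET%
endlocal
```

This prints C:\GAPSMAIN\tgt.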

You can use Modifiers to choose what portions of a string referenced by a variable you want to include or reject, like this:

%STRING:~10,5%
Uses only the 5 characters starting at offset 10 (offsets count from zero), reading left to right.

%STRING:~-10%
Uses only the LAST 10 characters of the string, beginning from the last offset (reading from left to right).

%STRING:~0,-2%
Uses all EXCEPT the last two characters (offsets) of the string, reading the string from left to right.
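
These three forms can be tried on a sample value. The sketch below uses one of the file names from the question as the string:

```batch
@echo off
setlocal
set "STRING=01_GAPSMain_CA_20081029131933"
rem 5 characters starting at offset 10 (offsets count from zero).
echo %STRING:~10,5%
rem The last 10 characters.
echo %STRING:~-10%
rem Everything except the last 2 characters.
echo %STRING:~0,-2%
endlocal
```

The three lines printed are n_CA_, then 1029131933, then 01_GAPSMain_CA_200810291319.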

Anyway, you seem to have circumvented some issues VERY effectively and craftily using the FOR command to break up the DATE into fragments, resulting in the creation of the %NOW% variable.  I like your troubleshooting IF statement to take account of single digit hours that I had overlooked:

if %hour% lss 10 set hour=0%hour%
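
Another way to handle the single-digit hour, sketched here on the assumption that the time value pads hours before 10 with a leading space (as in " 9:05:07.31"), is to replace that space with a zero. T and HH are names made up for the example, and T holds a fixed sample instead of the real clock value:

```batch
@echo off
setlocal
rem Sample time before 10 a.m., with the hour padded by a leading space.
set "T= 9:05:07.31"
set "HH=%T:~0,2%"
rem Replace any space in the hour with a zero.
set "HH=%HH: =0%"
echo %HH%%T:~3,2%%T:~6,2%
endlocal
```

This prints 090507.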

I believe that you made a good choice by "calling" the main batch file using one that can be modified to suit each of the 40 file sets that you need to work on.  It makes sense having to only make changes in one batch file for each job.

There is ONE place that I think I MAY be able to improve (for want of a better expression) your batch files.  I changed "Call_With_Parameters.cmd" as follows, making use of the "Current Directory" (%CD%) system variable that I mentioned earlier:

for /F "delims=\" %%I in ("%CD%") do set FRAGMENT=01_%%~nI_CA_
Process_XMLs.cmd %FRAGMENT% Merged_file_List.txt %CD%\temp\ %CD%\src\ %CD%\tgt\ %CD%\

If you look at the attached text file showing the modifiers that can be used with the "FOR" command in XP, you will see the following:

%~I         - expands %I removing any surrounding quotes (")
%~nI        - expands %I to a file name only

If the command used was this:

for /F "delims=\" %%I in ("%CD%") do set FRAGMENT=%%~nI

Then it would store only the folder name in %FRAGMENT%. Although %~nI is described as expanding to a FILE NAME, with the syntax suggested it picks up the FOLDER NAME, and I have made the "do SET" command prefix the Current Directory's folder name with "01_" and suffix it with "_CA_" ready for the file names.

I hope that this might make your set of batch files universally usable with ALL your project file sets without any individual modification, BUT it assumes that the folder containing your 3 batch files will ALWAYS be named the same as the prefix wanted for the name of the final merged file etc, and with "01_" and "_CA_" added.  For example, if your project folder is:

C:\BILLSMain\

then the file name prefix will be "01_BILLSMain_CA_"
where the "BILLSMain" bit is derived directly from the master folder containing the batch file being run.

I have tested your final batch files using this modified "Call_With_Parameters.cmd" batch file.  This involved moving the folder around from drive to drive, and renaming the master folder.  The file name prefix reflected the new folder name each time, and the parent folders that I moved it all in and out of made no difference to the results.
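
For comparison, the folder name can also be taken with a plain FOR (no /F) and the %~nxI modifier, which keeps both the name and the extension part, so a folder name containing a dot is not cut short. This is only a sketch: it assumes the folder is not the root of a drive, and the sample path C:\Projects\BILLSMain is invented so that the result does not depend on where the demo is run. In the real batch file the quoted Current Directory variable would take its place.

```batch
@echo off
setlocal
rem Sample path; swap in the quoted CD variable to use the real current directory.
for %%I in ("C:\Projects\BILLSMain") do set "FRAGMENT=01_%%~nxI_CA_"
echo %FRAGMENT%
endlocal
```

This prints 01_BILLSMain_CA_.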

Great Job :-)

Let me know what you think about my little modification.

Regards
Bill
FOR-Command-Modifiers.txt
Call-With-Parameters.txt

Author Closing Comment

by:vvgpal123
ID: 31512090
Thanks Bill for so much of your time. Can you please send your email address to my personal email, vvgpal@hotmail.com
LVL 38

Expert Comment

by:BillDL
ID: 22885758
It's a pleasure, and don't forget that you put an extraordinary amount of effort into this yourself :-)

Author Comment

by:vvgpal123
ID: 22885855
Bill, would you mind giving your email address please
LVL 38

Expert Comment

by:BillDL
ID: 22885970
You will find it in my profile.  Just click on my user name, which is a link to my profile page.  You will see it in the first section of what may be an interesting read (or not!) ;-)
LVL 38

Expert Comment

by:BillDL
ID: 22885982
Hopefully you will want to post me a flight ticket to Southern CA where it's a lot warmer than it is here right now, he, he.

Author Comment

by:vvgpal123
ID: 22886002
lol...........