Merge Excel file with contents of XML file - Part 2

Hi guys

Thanx to Geert yesterday via my previous question ( I was able to merge the contents of an XLSX file and an XML file to provide me with a completed XML file.

We now move onto another step that I need to work upon and hopefully it's something we can accomplish. I'll walk through the scenario we need to perform:

We receive a hard copy paper file that needs to be scanned. At the same time we receive an XLSX file that relates to this paper file;
The XLSX file is saved to a directory;
We scan the paper file and as part of the process we output metadata contents into an XML file that relates to this;
The scanner will output the XML file directly to the same directory we saved the XLSX file to (the file name will be identical except one file is XLSX and the other XML;
We run the merge routine to join the XLSX file and the corresponding XML file together (both files will have the same name);
We output the resulting file into a new sub-directory with the same name as the file and it will have the same name as the original XML file;

So Step 1 is that we receive the XLSX files and we would store them like this:

Directory with XLSX Files only
Step 2 would be to perform the scanning and the output files would be added to the directory like this:

Directory with XLSX and XML files
Step 3 is after we've performed the merge of each file and the directory would look like this:

Directory with XLSX, XML and sub-folders
Step 4 - The newly created file appears in the sub-folder

Finalised sub-directory with newly created merged XML file
From my earlier question ( I was able to determine we can merge the files together but we need to convert the XLSX files to XML before we can perform the merge work that Geert was so kind to walk me through.

In order to achieve this yesterday we created a .XSD file that we manually attached to the Excel file to allow us to do the mapping and the extract. I've attached an example of the Box 1A.XLSX file and the Box 1A-converted.XML file for comparison. The issue we have with the XLSX file is it comes to us with some fields merged together (so we have to remove the merge from these fields) then remove the blank columns. Doing this manually is easy enough - open the file; highlight the worksheet; right-click and choose to remove the merge; using Find & Select to remove the blank columns is quite simple but I'm looking to automate the create of the XML file from the XLSX if I can.

So I'm looking for advice on how we automate portions of this. We have the ability to do the work on a file by file basis but I'm now looking to perform this work on all files in a directory (we can output the files to a different directory if it helps matters). The idea would be that the job would be run once a day and everything contained in the folder would be used - Box 1A files joined together and output to the Box 1A directory; Box 2A files joined together and output to the Box 2A directory and so on...every night the folder area would be cleared out and all files/sub-directories would be removed and a fresh batch of data would be added the next day.

Hope this makes sense...I know it looks a little complicated but hopefully my directory snapshots and the files (plus the original question) make things a little easier to follow. If this has to be broken down into multiple questions (Excel conversion and then the automation process) then no problems and I'm happy to do this as well.

Here is a copy of the FileCreate batch file we have that pulls the 2 files and outputs them to a 3rd file. We need to automate this to pick up each file in the directory, with the corresponding ""-converted.XML file (so Box 1A.XML would be matched to Box 1A-converted.XML and so on) and we also need to create a directory based upon the name of the original XML file (the one without -converted at the end):

set root=C:\ProjectFolder\Project002\

set xslt=%root%ProjectFiles\merge-files.xsl

set saxon=%root%ProjectFiles\saxon9he.jar

set file1=%root%DataFiles\Box 1A-converted.xml

set file2=%root%DataFiles\Box 1A.xml

set file3=%root%DataFiles\Box 1A\Box 1A.xml

java -Xmx1200M -Xss3000k -cp %saxon% net.sf.saxon.Transform -xsl:%xslt% -

s:"%file2%" -o:"%file3%" initial-file-name="%file1%"

Open in new window

Hope this all makes sense and thanx for looking.
Steven O'NeillSolutions ArchitectAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Gertone (Geert Bormans)Information ArchitectCommented:
Hi Steven,

If I am not mistaken, I see two tasks in your resume
1 facilitate the Excel to XML part
2 run yesterdays operation on a multitude of files in a directory

2. is the easy part. A bit of loop programming in a windows batch file and you are done
Run it once a day as you please and you are good to go. Using variables as we did in the posted batch script gives you a high degree of configuration

About 1. I noted in your example excel (so I know what you are talking about I believe) dat there are a bunch of unmerged columns. Trying to bind that to XML fields is on option, but might be tiredsome. If the files are consequently the same, I would save all the excel files as XML 2003 spreadsheet format and use a second XSLT... no manual work required
Steven O'NeillSolutions ArchitectAuthor Commented:
Hi Geert

Yes you are correct in both parts. If you've got something for Part 2 that is appreciated.

Part 1 is the problem area in my mind as well and I thought the easiest way was to do something like you say (take the original XLSX file and save it as another format for XML purposes) but I still thot that columns B and D & E would have been the issue. I only need the data from what is effectively columns A, C, F & G (so we can discard B and D & E plus all the rest of the columns for now) but I wasn't sure of the best way to handle this (which is why I ended up doing this manually with the removal of the merges and then the Find & Select option to delete the blank columns).

If you think that saving everything as a particular file format in a batch is a way forward (again I'm quite open to doing this whatever way is easiest tbh) then I'm happy to listen to this.

Hope that helps.

Gertone (Geert Bormans)Information ArchitectCommented:
That is how I usually tackle Excel 2 XML automation
(Batch) save as spreadsheet XML 2003 => XSLT => done

excel => ssxml2003 => XML1
scan => metadata xml2
merge xml1 & xml2 => xml3

sounds a perfect workflow to me

with a bit of VB or C# you can do the excel to ss xml 2003
I have done the XSLT for ssxml2003 a zilion time
it is a breeze that way

... but likely not today :-(
Microsoft Azure 2017

Azure has a changed a lot since it was originally introduce by adding new services and features. Do you know everything you need to about Azure? This course will teach you about the Azure App Service, monitoring and application insights, DevOps, and Team Services.

Steven O'NeillSolutions ArchitectAuthor Commented:
Hi Geert

Thanx for this. Absolutely not needing it today (too late in the day for us both I suspect lol).

If there's anything you can provide tomorrow/Thursday or Friday then that would be ideal :)

Thanx again

Gertone (Geert Bormans)Information ArchitectCommented:
Will visit customers for the next two days. Will try to do something in between. If not, will be in my office all Friday
Steven O'NeillSolutions ArchitectAuthor Commented:
Sweet...thanx Geert
Steven O'NeillSolutions ArchitectAuthor Commented:
Hi Geert

Hope you've had a good couple of days. Was wondering if/when you'll have the chance to review this question?

Thanx again

Gertone (Geert Bormans)Information ArchitectCommented:
Hi, been bitten by a sudden increase in my workload :-(

Anyway, I made an XSLT that transforms your excel file nicely into the XML required.
As a source you need to safe the excell as Spreadsheet XML 2003
You could open them all and save as... or you could write a little .net program that does that

Let's discuss next steps as soon as you tested this
Steven O'NeillSolutions ArchitectAuthor Commented:
Hi Geert

Thanx for getting back to me and thanx for the file. Question on usage though...where should I use this? Does it get used at the FileCreate file stage (where we state the root location, the xslt file (merge-file.xsl at present); set saxon and the 3 files before running everything together) - I've supplied my FileCreate file within the original post in case you want to review.

Or am I supposed to use this independently atm for testing against my original XLSX files to ensure they convert correctly? If so I'm not sure how I do that.

Sorry and thanx again for all your efforts.

Gertone (Geert Bormans)Information ArchitectCommented:
In a meeting now...
It is a step that transforms the excel spreadsheet XML into XML
so you should test seperately first on a number of files

you could use a batch like this (untested)

set root=C:\LocalData\GBSP\excel-merge\

set xslte=%root%excel-2-xml.xsl

set saxon=%root%saxon9he.jar

set source=%root%excel.xml

set target=%root%result.xml

java -Xmx1200M -Xss3000k -cp %saxon% net.sf.saxon.Transform -xsl:%xslte% -s:%source% -o:%target%

Open in new window

Steven O'NeillSolutions ArchitectAuthor Commented:
Hi Geert

No probs. So here's what I've now done:

Taken a few of the XLSX spreadsheets;
Opened them manually and saved them as XML Spreadsheet 2003 format;
Ran the batch file you gave me above (following tweaks for appropriate library) and the transformed XML file appears as expected;

So this is great :)

I believe now I need to do the following:

Find a way to convert a directory full of XLSX files into XML Spreadsheet 2003 format automagically;
Then once we have the files transformed we need a way to run the original batch file to merge the scanned XML file against the newly transformed XML file (matching against the file names);
Send the resulting merged XML file to a directory of the same name as the original file name;

Possibly a step or 3 I've missed here but hopefully you will keep me right with this :)

The updated finalised folder structure after we've added the scanned XML file for each box and the transformed XML file from the original XLSX file would look like this:

Updated Directory structure complete with transformed XLSX files into XML
The above image is a finished representation of how the files and directory structure should look (in my head) once we've completed each stage of the transformation/merging. You can see each step of the directory structure - following the transformation/merging stages - in the original post as well.

Does that make sense?

All looking very good so far Geert and thanx again for the help.

Gertone (Geert Bormans)Information ArchitectCommented:
maybe you could look into this tool for step 1

I have just been given a deadline for something by 4 this evening
But once 1. is done, we just need to do the magic in the batch, that I can handle :-)
Gertone (Geert Bormans)Information ArchitectCommented:
1. being "Find a way to convert a directory full of XLSX files into XML Spreadsheet 2003 format automagically;"
Steven O'NeillSolutions ArchitectAuthor Commented:
Sweet Geert

Works really well. I found this one - - which can work on the command line without any issues but it's very expensive for what we need it for (about $500!!) so the converter you've sent me works very nicely and for $34 seems a nice bargain...just need to determine if we can do this within the batch file but if not then no real biggy as we'll just add this as a step in the process (to perform a manual run of the software).

So we've now been able to take a bunch of the XLSX files and use this tool to convert them to XML Spreadsheet 2003 format.

If you're able to review the batch file work that would be absolutely fantastic :) and totally understand that deadlines are deadlines lol.

Thanx again

Steven O'NeillSolutions ArchitectAuthor Commented:
Hi Geert

Just to let you know I've now managed to find another converter which I'll be using: - very nice little tool in my opinion and reasonable cost.

It has the ability to be called from a batch file as well (even creates it's own batch file code for you depending on the selections made).

So I've now been able to find a way to convert the XLSX to XML Spreadsheet 2003 format and we have the coding to do everything else we need (thanx to you) but I believe we are still needing to alter the coding to perform steps 2 & 3 from above:

Then once we have the files transformed we need a way to run the original batch file to merge the scanned XML file against the newly transformed XML file (matching against the file names);
Send the resulting merged XML file to a directory of the same name as the original file name;

I also think we need a Step 2.5 which will create the Directory based upon the finalised filename - this might actually be Step 4 but wasn't sure if we should create the directory first before adding the file to it (wasn't sure which was easier).

Thanx again

Gertone (Geert Bormans)Information ArchitectCommented:

I lost track a bit. I researched a number of tools yesterday evening and found one that allows to create some batch callable script, similar to what you just described. But that one had the problem of spitting out a very old excel XML file which requires me to change the XSLT. Not sure if it is the same (I have seen so many yesterday, installed them and tested them, and wasn't happy with any one of them) Note that at the end all you need is a 5 lines of C# compiled into an exe because you can hard code the formats in and out

That is what i did yesterday. Today I had a meeting until just now

Having said that, if you are happy with the one you found, that is good. But can you please verify the XSLT transform on a couple of the results ujust to be sure?
Steven O'NeillSolutions ArchitectAuthor Commented:
Hi Geert

Attached are 4 files:

Original XLSX File;
Converted XML File using the Excel to XML tool I used;
Converted XML file using the excel-2-XML XSLT you provided;
The original XSLT file you provided me with;

All appear to be doing what I need so atm I'm happy enough. I don't have enough data atm to test this out as complete as I'd like but I've been assured that the files we receive will not deviate from what I've been working with (only time will tell lol).

Is this what you need/


Steven O'NeillSolutions ArchitectAuthor Commented:
Hi Geert

Can I just confirm that what I sent to you on Friday (via my previous reply) was what you were after?

Thanx again

Gertone (Geert Bormans)Information ArchitectCommented:
From your Friday post I understand that the batch transformer you are using is doing the right thing
Now let me look into your batch requirements
Steven O'NeillSolutions ArchitectAuthor Commented:
Hi Geert

Thanx again. Yes you are correct. The batch transformer I'll be using is the Total Excel Converter from ( Does exactly what it says on the tin and provides batch ready code to insert into your batch file.

As an example I have this in my projects area:

"C:\Program Files (x86)\TotalExcelConverter\ExcelConverter.exe" "C:\ProjectFolder\Project005\DataFiles\*.xlsx" "C:\ProjectFolder\Project005\DataFiles\" -kfs -c XML -template [Name].xml -cs -td Semi -UseOOFilter -pc M -PDFAuthor Softplicity -pdfver 1.4 -tjq 80 -HeadAlign c -HeadFont Tahoma -HeadSize 10 -FootAlign c -FootFont Tahoma -FootSize 10 -FontMapping ReplaceAllFonts -PNHP 1

Open in new window

Probably a load in there I can do without but I'll go through this once we've begun working on the finished file :)


Gertone (Geert Bormans)Information ArchitectCommented:
Working on th batch, it occurs to me
Box-1A.xlsx the excel file
Box-1A.xml the xml raw from the excel (generated by coolutils)
Box-1A-Converted.xml excel file transformed in useful XML

But what do you have in mind for the scanned extract metadata?
Box-1A-ScanningExtract.xml in the same folder?
Gertone (Geert Bormans)Information ArchitectCommented:
set root=E:\experts-exchange\201506-xml-merge\

set xslte=%root%excel-2-XML.xsl
set xsltm=%root%merge-files.xsl

set saxon=%root%saxon\saxon9he.jar

set in=%root%files\

for %%A in (%in%*.xlsx) do (
  java -Xmx1200M -Xss3000k -cp %saxon% net.sf.saxon.Transform -xsl:%xslte% -s:%in%%%~nA.xml -o:%in%%%~nA-converted.xml
  mkdir %in%%%~nA
  java -Xmx1200M -Xss3000k -cp %saxon% net.sf.saxon.Transform -xsl:%xsltm% -s:%in%%%~nA-ScanningExtract.xml -o:%in%%%~nA\%%~nA.xml  initial-file-name=%in%%%~nA-converted.xml

Open in new window

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Gertone (Geert Bormans)Information ArchitectCommented:
root to be set as the project root
(that is where I put the XSLTs in)

input files are in files
For each excel file in the files directory
- I run the transformer to make a excel to xml transform
- I create the dir
- I make the merge transform result in the subdir

Have fun
Steven O'NeillSolutions ArchitectAuthor Commented:
Sweet thanx Greet

Seems to do what we need it to do now :)

I'll do some more work on this in the morning after we've able to perform some more tests on it but it would appear to work as expected. I'll add the batch coding to this for the Excel conversion routine and fingers crossed it works without any issues lol.

I'll get back to you tomorrow (Tuesday) on this.

Thanx again

Steven O'NeillSolutions ArchitectAuthor Commented:
Absolutely superb Geert...thanx once again for everything you've done with this. All works a treat and does exactly what I was after.
Gertone (Geert Bormans)Information ArchitectCommented:
OK great, that is good news

You are welcome

Sorry I made you wait a bit longer than I intended.
I hope you understand we are all volunteers here,
so when pressure grows on other stuff, this one comes second :-(
Steven O'NeillSolutions ArchitectAuthor Commented:
Nope totally understand this Geert and appreciate every effort that is put in here by everyone who takes part at EE :)
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today

From novice to tech pro — start learning today.