We help IT Professionals succeed at work.
Get Started

Merge Excel file with contents of XML file - Part 2

Steven O'Neill
Last Modified: 2015-07-07
Hi guys

Thanx to Geert yesterday via my previous question (https://www.experts-exchange.com/questions/28691313/Merge-Excel-file-with-contents-of-XML-file.html#a40843318) I was able to merge the contents of an XLSX file and an XML file to provide me with a completed XML file.

We now move onto another step that I need to work upon and hopefully it's something we can accomplish. I'll walk through the scenario we need to perform:

We receive a hard copy paper file that needs to be scanned. At the same time we receive an XLSX file that relates to this paper file;
The XLSX file is saved to a directory;
We scan the paper file and as part of the process we output metadata contents into an XML file that relates to this;
The scanner will output the XML file directly to the same directory we saved the XLSX file to (the file name will be identical except one file is XLSX and the other XML;
We run the merge routine to join the XLSX file and the corresponding XML file together (both files will have the same name);
We output the resulting file into a new sub-directory with the same name as the file and it will have the same name as the original XML file;

So Step 1 is that we receive the XLSX files and we would store them like this:

Directory with XLSX Files only
Step 2 would be to perform the scanning and the output files would be added to the directory like this:

Directory with XLSX and XML files
Step 3 is after we've performed the merge of each file and the directory would look like this:

Directory with XLSX, XML and sub-folders
Step 4 - The newly created file appears in the sub-folder

Finalised sub-directory with newly created merged XML file
From my earlier question (https://www.experts-exchange.com/questions/28691313/Merge-Excel-file-with-contents-of-XML-file.html#a40843318) I was able to determine we can merge the files together but we need to convert the XLSX files to XML before we can perform the merge work that Geert was so kind to walk me through.

In order to achieve this yesterday we created a .XSD file that we manually attached to the Excel file to allow us to do the mapping and the extract. I've attached an example of the Box 1A.XLSX file and the Box 1A-converted.XML file for comparison. The issue we have with the XLSX file is it comes to us with some fields merged together (so we have to remove the merge from these fields) then remove the blank columns. Doing this manually is easy enough - open the file; highlight the worksheet; right-click and choose to remove the merge; using Find & Select to remove the blank columns is quite simple but I'm looking to automate the create of the XML file from the XLSX if I can.

So I'm looking for advice on how we automate portions of this. We have the ability to do the work on a file by file basis but I'm now looking to perform this work on all files in a directory (we can output the files to a different directory if it helps matters). The idea would be that the job would be run once a day and everything contained in the folder would be used - Box 1A files joined together and output to the Box 1A directory; Box 2A files joined together and output to the Box 2A directory and so on...every night the folder area would be cleared out and all files/sub-directories would be removed and a fresh batch of data would be added the next day.

Hope this makes sense...I know it looks a little complicated but hopefully my directory snapshots and the files (plus the original question) make things a little easier to follow. If this has to be broken down into multiple questions (Excel conversion and then the automation process) then no problems and I'm happy to do this as well.

Here is a copy of the FileCreate batch file we have that pulls the 2 files and outputs them to a 3rd file. We need to automate this to pick up each file in the directory, with the corresponding ""-converted.XML file (so Box 1A.XML would be matched to Box 1A-converted.XML and so on) and we also need to create a directory based upon the name of the original XML file (the one without -converted at the end):

set root=C:\ProjectFolder\Project002\

set xslt=%root%ProjectFiles\merge-files.xsl

set saxon=%root%ProjectFiles\saxon9he.jar

set file1=%root%DataFiles\Box 1A-converted.xml

set file2=%root%DataFiles\Box 1A.xml

set file3=%root%DataFiles\Box 1A\Box 1A.xml

java -Xmx1200M -Xss3000k -cp %saxon% net.sf.saxon.Transform -xsl:%xslt% -

s:"%file2%" -o:"%file3%" initial-file-name="%file1%"

Open in new window

Hope this all makes sense and thanx for looking.
Watch Question
Information Architect
Top Expert 2006
This problem has been solved!
Unlock 1 Answer and 27 Comments.
See Answer
Why Experts Exchange?

Experts Exchange always has the answer, or at the least points me in the correct direction! It is like having another employee that is extremely experienced.

Jim Murphy
Programmer at Smart IT Solutions

When asked, what has been your best career decision?

Deciding to stick with EE.

Mohamed Asif
Technical Department Head

Being involved with EE helped me to grow personally and professionally.

Carl Webster
CTP, Sr Infrastructure Consultant
Ask ANY Question

Connect with Certified Experts to gain insight and support on specific technology challenges including:

  • Troubleshooting
  • Research
  • Professional Opinions
Did You Know?

We've partnered with two important charities to provide clean water and computer science education to those who need it most. READ MORE