Go Premium for a chance to win a PS4. Enter to Win

x
?
Solved

Merge Excel file with contents of XML file - Part 2

Posted on 2015-06-23
27
Medium Priority
?
199 Views
Last Modified: 2015-07-07
Hi guys

Thanx to Geert yesterday via my previous question (http://www.experts-exchange.com/questions/28691313/Merge-Excel-file-with-contents-of-XML-file.html#a40843318) I was able to merge the contents of an XLSX file and an XML file to provide me with a completed XML file.

We now move onto another step that I need to work upon and hopefully it's something we can accomplish. I'll walk through the scenario we need to perform:

We receive a hard copy paper file that needs to be scanned. At the same time we receive an XLSX file that relates to this paper file;
The XLSX file is saved to a directory;
We scan the paper file and as part of the process we output metadata contents into an XML file that relates to this;
The scanner will output the XML file directly to the same directory we saved the XLSX file to (the file name will be identical except one file is XLSX and the other XML;
We run the merge routine to join the XLSX file and the corresponding XML file together (both files will have the same name);
We output the resulting file into a new sub-directory with the same name as the file and it will have the same name as the original XML file;

So Step 1 is that we receive the XLSX files and we would store them like this:

Directory with XLSX Files only
Step 2 would be to perform the scanning and the output files would be added to the directory like this:

Directory with XLSX and XML files
Step 3 is after we've performed the merge of each file and the directory would look like this:

Directory with XLSX, XML and sub-folders
Step 4 - The newly created file appears in the sub-folder

Finalised sub-directory with newly created merged XML file
From my earlier question (http://www.experts-exchange.com/questions/28691313/Merge-Excel-file-with-contents-of-XML-file.html#a40843318) I was able to determine we can merge the files together but we need to convert the XLSX files to XML before we can perform the merge work that Geert was so kind to walk me through.

In order to achieve this yesterday we created a .XSD file that we manually attached to the Excel file to allow us to do the mapping and the extract. I've attached an example of the Box 1A.XLSX file and the Box 1A-converted.XML file for comparison. The issue we have with the XLSX file is it comes to us with some fields merged together (so we have to remove the merge from these fields) then remove the blank columns. Doing this manually is easy enough - open the file; highlight the worksheet; right-click and choose to remove the merge; using Find & Select to remove the blank columns is quite simple but I'm looking to automate the create of the XML file from the XLSX if I can.

So I'm looking for advice on how we automate portions of this. We have the ability to do the work on a file by file basis but I'm now looking to perform this work on all files in a directory (we can output the files to a different directory if it helps matters). The idea would be that the job would be run once a day and everything contained in the folder would be used - Box 1A files joined together and output to the Box 1A directory; Box 2A files joined together and output to the Box 2A directory and so on...every night the folder area would be cleared out and all files/sub-directories would be removed and a fresh batch of data would be added the next day.

Hope this makes sense...I know it looks a little complicated but hopefully my directory snapshots and the files (plus the original question) make things a little easier to follow. If this has to be broken down into multiple questions (Excel conversion and then the automation process) then no problems and I'm happy to do this as well.

Here is a copy of the FileCreate batch file we have that pulls the 2 files and outputs them to a 3rd file. We need to automate this to pick up each file in the directory, with the corresponding ""-converted.XML file (so Box 1A.XML would be matched to Box 1A-converted.XML and so on) and we also need to create a directory based upon the name of the original XML file (the one without -converted at the end):

set root=C:\ProjectFolder\Project002\

set xslt=%root%ProjectFiles\merge-files.xsl

set saxon=%root%ProjectFiles\saxon9he.jar

set file1=%root%DataFiles\Box 1A-converted.xml

set file2=%root%DataFiles\Box 1A.xml

set file3=%root%DataFiles\Box 1A\Box 1A.xml

java -Xmx1200M -Xss3000k -cp %saxon% net.sf.saxon.Transform -xsl:%xslt% -

s:"%file2%" -o:"%file3%" initial-file-name="%file1%"

Open in new window


Hope this all makes sense and thanx for looking.
Box-1A.xlsx
Box-1A-converted.xml
Import-mapping.xsd
merge-files.xsl
Box-1A.xml
0
Comment
Question by:Steven O'Neill
  • 14
  • 13
27 Comments
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 40846947
Hi Steven,

If I am not mistaken, I see two tasks in your resume
1 facilitate the Excel to XML part
2 run yesterdays operation on a multitude of files in a directory

2. is the easy part. A bit of loop programming in a windows batch file and you are done
Run it once a day as you please and you are good to go. Using variables as we did in the posted batch script gives you a high degree of configuration

About 1. I noted in your example excel (so I know what you are talking about I believe) dat there are a bunch of unmerged columns. Trying to bind that to XML fields is on option, but might be tiredsome. If the files are consequently the same, I would save all the excel files as XML 2003 spreadsheet format and use a second XSLT... no manual work required
0
 
LVL 2

Author Comment

by:Steven O'Neill
ID: 40846975
Hi Geert

Yes you are correct in both parts. If you've got something for Part 2 that is appreciated.

Part 1 is the problem area in my mind as well and I thought the easiest way was to do something like you say (take the original XLSX file and save it as another format for XML purposes) but I still thot that columns B and D & E would have been the issue. I only need the data from what is effectively columns A, C, F & G (so we can discard B and D & E plus all the rest of the columns for now) but I wasn't sure of the best way to handle this (which is why I ended up doing this manually with the removal of the merges and then the Find & Select option to delete the blank columns).

If you think that saving everything as a particular file format in a batch is a way forward (again I'm quite open to doing this whatever way is easiest tbh) then I'm happy to listen to this.

Hope that helps.

Steven
0
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 40847023
That is how I usually tackle Excel 2 XML automation
(Batch) save as spreadsheet XML 2003 => XSLT => done

excel => ssxml2003 => XML1
scan => metadata xml2
merge xml1 & xml2 => xml3

sounds a perfect workflow to me

with a bit of VB or C# you can do the excel to ss xml 2003
I have done the XSLT for ssxml2003 a zilion time
it is a breeze that way

... but likely not today :-(
0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 2

Author Comment

by:Steven O'Neill
ID: 40847068
Hi Geert

Thanx for this. Absolutely not needing it today (too late in the day for us both I suspect lol).

If there's anything you can provide tomorrow/Thursday or Friday then that would be ideal :)

Thanx again

Steven
0
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 40847102
Will visit customers for the next two days. Will try to do something in between. If not, will be in my office all Friday
0
 
LVL 2

Author Comment

by:Steven O'Neill
ID: 40847107
Sweet...thanx Geert
0
 
LVL 2

Author Comment

by:Steven O'Neill
ID: 40852378
Hi Geert

Hope you've had a good couple of days. Was wondering if/when you'll have the chance to review this question?

Thanx again

Steven
0
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 40862690
Hi, been bitten by a sudden increase in my workload :-(

Anyway, I made an XSLT that transforms your excel file nicely into the XML required.
As a source you need to safe the excell as Spreadsheet XML 2003
You could open them all and save as... or you could write a little .net program that does that

Let's discuss next steps as soon as you tested this
excel-2-XML.xsl
0
 
LVL 2

Author Comment

by:Steven O'Neill
ID: 40863153
Hi Geert

Thanx for getting back to me and thanx for the file. Question on usage though...where should I use this? Does it get used at the FileCreate file stage (where we state the root location, the xslt file (merge-file.xsl at present); set saxon and the 3 files before running everything together) - I've supplied my FileCreate file within the original post in case you want to review.

Or am I supposed to use this independently atm for testing against my original XLSX files to ensure they convert correctly? If so I'm not sure how I do that.

Sorry and thanx again for all your efforts.

Steven
0
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 40863183
In a meeting now...
It is a step that transforms the excel spreadsheet XML into XML
so you should test seperately first on a number of files

you could use a batch like this (untested)

set root=C:\LocalData\GBSP\excel-merge\

set xslte=%root%excel-2-xml.xsl

set saxon=%root%saxon9he.jar

set source=%root%excel.xml

set target=%root%result.xml

java -Xmx1200M -Xss3000k -cp %saxon% net.sf.saxon.Transform -xsl:%xslte% -s:%source% -o:%target%

Open in new window

0
 
LVL 2

Author Comment

by:Steven O'Neill
ID: 40863219
Hi Geert

No probs. So here's what I've now done:

Taken a few of the XLSX spreadsheets;
Opened them manually and saved them as XML Spreadsheet 2003 format;
Ran the batch file you gave me above (following tweaks for appropriate library) and the transformed XML file appears as expected;

So this is great :)

I believe now I need to do the following:

Find a way to convert a directory full of XLSX files into XML Spreadsheet 2003 format automagically;
Then once we have the files transformed we need a way to run the original batch file to merge the scanned XML file against the newly transformed XML file (matching against the file names);
Send the resulting merged XML file to a directory of the same name as the original file name;

Possibly a step or 3 I've missed here but hopefully you will keep me right with this :)

The updated finalised folder structure after we've added the scanned XML file for each box and the transformed XML file from the original XLSX file would look like this:

Updated Directory structure complete with transformed XLSX files into XML
The above image is a finished representation of how the files and directory structure should look (in my head) once we've completed each stage of the transformation/merging. You can see each step of the directory structure - following the transformation/merging stages - in the original post as well.

Does that make sense?

All looking very good so far Geert and thanx again for the help.

Steven
0
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 40863373
maybe you could look into this tool for step 1
http://download.cnet.com/Batch-Excel-File-Converter/3000-2079_4-76166368.html

I have just been given a deadline for something by 4 this evening
But once 1. is done, we just need to do the magic in the batch, that I can handle :-)
0
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 40863374
1. being "Find a way to convert a directory full of XLSX files into XML Spreadsheet 2003 format automagically;"
0
 
LVL 2

Author Comment

by:Steven O'Neill
ID: 40863405
Sweet Geert

Works really well. I found this one - http://www.softinterface.com/Convert-XLS/Features/Convert-XLSX-To-HTML-XML.htm - which can work on the command line without any issues but it's very expensive for what we need it for (about $500!!) so the converter you've sent me works very nicely and for $34 seems a nice bargain...just need to determine if we can do this within the batch file but if not then no real biggy as we'll just add this as a step in the process (to perform a manual run of the software).

So we've now been able to take a bunch of the XLSX files and use this tool to convert them to XML Spreadsheet 2003 format.

If you're able to review the batch file work that would be absolutely fantastic :) and totally understand that deadlines are deadlines lol.

Thanx again

Steven
0
 
LVL 2

Author Comment

by:Steven O'Neill
ID: 40865152
Hi Geert

Just to let you know I've now managed to find another converter which I'll be using: http://www.coolutils.com/Excel-to-XML - very nice little tool in my opinion and reasonable cost.

It has the ability to be called from a batch file as well (even creates it's own batch file code for you depending on the selections made).

So I've now been able to find a way to convert the XLSX to XML Spreadsheet 2003 format and we have the coding to do everything else we need (thanx to you) but I believe we are still needing to alter the coding to perform steps 2 & 3 from above:

Then once we have the files transformed we need a way to run the original batch file to merge the scanned XML file against the newly transformed XML file (matching against the file names);
Send the resulting merged XML file to a directory of the same name as the original file name;

I also think we need a Step 2.5 which will create the Directory based upon the finalised filename - this might actually be Step 4 but wasn't sure if we should create the directory first before adding the file to it (wasn't sure which was easier).

Thanx again

Steven
0
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 40865379
Steven,

I lost track a bit. I researched a number of tools yesterday evening and found one that allows to create some batch callable script, similar to what you just described. But that one had the problem of spitting out a very old excel XML file which requires me to change the XSLT. Not sure if it is the same (I have seen so many yesterday, installed them and tested them, and wasn't happy with any one of them) Note that at the end all you need is a 5 lines of C# compiled into an exe because you can hard code the formats in and out

That is what i did yesterday. Today I had a meeting until just now

Having said that, if you are happy with the one you found, that is good. But can you please verify the XSLT transform on a couple of the results ujust to be sure?
0
 
LVL 2

Author Comment

by:Steven O'Neill
ID: 40865392
Hi Geert

Attached are 4 files:

Original XLSX File;
Converted XML File using the Excel to XML tool I used;
Converted XML file using the excel-2-XML XSLT you provided;
The original XSLT file you provided me with;

All appear to be doing what I need so atm I'm happy enough. I don't have enough data atm to test this out as complete as I'd like but I've been assured that the files we receive will not deviate from what I've been working with (only time will tell lol).

Is this what you need/

Thanx

Steven
Box-1A.xlsx
Box-1A.xml
Box-1A-Converted.xml
excel-2-XML.xsl
0
 
LVL 2

Author Comment

by:Steven O'Neill
ID: 40868136
Hi Geert

Can I just confirm that what I sent to you on Friday (via my previous reply) was what you were after?

Thanx again

Steven
0
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 40869390
Steven,
From your Friday post I understand that the batch transformer you are using is doing the right thing
Now let me look into your batch requirements
0
 
LVL 2

Author Comment

by:Steven O'Neill
ID: 40869446
Hi Geert

Thanx again. Yes you are correct. The batch transformer I'll be using is the Total Excel Converter from CoolUtils.com (http://www.coolutils.com/TotalExcelConverter). Does exactly what it says on the tin and provides batch ready code to insert into your batch file.

As an example I have this in my projects area:

"C:\Program Files (x86)\TotalExcelConverter\ExcelConverter.exe" "C:\ProjectFolder\Project005\DataFiles\*.xlsx" "C:\ProjectFolder\Project005\DataFiles\" -kfs -c XML -template [Name].xml -cs -td Semi -UseOOFilter -pc M -PDFAuthor Softplicity -pdfver 1.4 -tjq 80 -HeadAlign c -HeadFont Tahoma -HeadSize 10 -FootAlign c -FootFont Tahoma -FootSize 10 -FontMapping ReplaceAllFonts -PNHP 1

Open in new window


Probably a load in there I can do without but I'll go through this once we've begun working on the finished file :)

Thanx

Steven
0
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 40869451
Working on th batch, it occurs to me
Box-1A.xlsx the excel file
Box-1A.xml the xml raw from the excel (generated by coolutils)
Box-1A-Converted.xml excel file transformed in useful XML

But what do you have in mind for the scanned extract metadata?
Box-1A-ScanningExtract.xml in the same folder?
0
 
LVL 60

Accepted Solution

by:
Geert Bormans earned 2000 total points
ID: 40869453
set root=E:\experts-exchange\201506-xml-merge\

set xslte=%root%excel-2-XML.xsl
set xsltm=%root%merge-files.xsl

set saxon=%root%saxon\saxon9he.jar

set in=%root%files\

for %%A in (%in%*.xlsx) do (
  java -Xmx1200M -Xss3000k -cp %saxon% net.sf.saxon.Transform -xsl:%xslte% -s:%in%%%~nA.xml -o:%in%%%~nA-converted.xml
  mkdir %in%%%~nA
  java -Xmx1200M -Xss3000k -cp %saxon% net.sf.saxon.Transform -xsl:%xsltm% -s:%in%%%~nA-ScanningExtract.xml -o:%in%%%~nA\%%~nA.xml  initial-file-name=%in%%%~nA-converted.xml
)

Open in new window

0
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 40869458
root to be set as the project root
(that is where I put the XSLTs in)

input files are in files
For each excel file in the files directory
- I run the transformer to make a excel to xml transform
- I create the dir
- I make the merge transform result in the subdir

Have fun
0
 
LVL 2

Author Comment

by:Steven O'Neill
ID: 40869486
Sweet thanx Greet

Seems to do what we need it to do now :)

I'll do some more work on this in the morning after we've able to perform some more tests on it but it would appear to work as expected. I'll add the batch coding to this for the Excel conversion routine and fingers crossed it works without any issues lol.

I'll get back to you tomorrow (Tuesday) on this.

Thanx again

Steven
0
 
LVL 2

Author Closing Comment

by:Steven O'Neill
ID: 40870797
Absolutely superb Geert...thanx once again for everything you've done with this. All works a treat and does exactly what I was after.
0
 
LVL 60

Expert Comment

by:Geert Bormans
ID: 40870871
OK great, that is good news

You are welcome

Sorry I made you wait a bit longer than I intended.
I hope you understand we are all volunteers here,
so when pressure grows on other stuff, this one comes second :-(
0
 
LVL 2

Author Comment

by:Steven O'Neill
ID: 40870875
Nope totally understand this Geert and appreciate every effort that is put in here by everyone who takes part at EE :)
0

Featured Post

Vote for the Most Valuable Expert

It’s time to recognize experts that go above and beyond with helpful solutions and engagement on site. Choose from the top experts in the Hall of Fame or on the right rail of your favorite topic page. Look for the blue “Nominate” button on their profile to vote.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Do you use a spreadsheet like Microsoft's Excel?  Have you ever wanted to link out to a non excel file on your computer or network drive?  This is the way I found to do it!
This article describes how to use a set of graphical playing cards to create a Draw Poker game in Excel or VB6.
The viewer will learn how to use the =DISCRINV command to create a discrete random variable, use this command to model a set of probabilities and outcomes in a Monte Carlo simulation, and learn how to find the standard deviation of a set of probabil…
This Micro Tutorial demonstrates using Microsoft Excel pivot tables, how to reverse engineer competitors' marketing strategies through backlinks.

782 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question