We now move onto another step that I need to work upon and hopefully it's something we can accomplish. I'll walk through the scenario we need to perform:
We receive a hard copy paper file that needs to be scanned. At the same time we receive an XLSX file that relates to this paper file;
The XLSX file is saved to a directory;
We scan the paper file and as part of the process we output metadata contents into an XML file that relates to this;
The scanner will output the XML file directly to the same directory we saved the XLSX file to (the file name will be identical except one file is XLSX and the other XML;
We run the merge routine to join the XLSX file and the corresponding XML file together (both files will have the same name);
We output the resulting file into a new sub-directory with the same name as the file and it will have the same name as the original XML file;
So Step 1 is that we receive the XLSX files and we would store them like this:
Step 2 would be to perform the scanning and the output files would be added to the directory like this:
Step 3 is after we've performed the merge of each file and the directory would look like this:
Step 4 - The newly created file appears in the sub-folder
In order to achieve this yesterday we created a .XSD file that we manually attached to the Excel file to allow us to do the mapping and the extract. I've attached an example of the Box 1A.XLSX file and the Box 1A-converted.XML file for comparison. The issue we have with the XLSX file is it comes to us with some fields merged together (so we have to remove the merge from these fields) then remove the blank columns. Doing this manually is easy enough - open the file; highlight the worksheet; right-click and choose to remove the merge; using Find & Select to remove the blank columns is quite simple but I'm looking to automate the create of the XML file from the XLSX if I can.
So I'm looking for advice on how we automate portions of this. We have the ability to do the work on a file by file basis but I'm now looking to perform this work on all files in a directory (we can output the files to a different directory if it helps matters). The idea would be that the job would be run once a day and everything contained in the folder would be used - Box 1A files joined together and output to the Box 1A directory; Box 2A files joined together and output to the Box 2A directory and so on...every night the folder area would be cleared out and all files/sub-directories would be removed and a fresh batch of data would be added the next day.
Hope this makes sense...I know it looks a little complicated but hopefully my directory snapshots and the files (plus the original question) make things a little easier to follow. If this has to be broken down into multiple questions (Excel conversion and then the automation process) then no problems and I'm happy to do this as well.
Here is a copy of the FileCreate batch file we have that pulls the 2 files and outputs them to a 3rd file. We need to automate this to pick up each file in the directory, with the corresponding ""-converted.XML file (so Box 1A.XML would be matched to Box 1A-converted.XML and so on) and we also need to create a directory based upon the name of the original XML file (the one without -converted at the end):
If I am not mistaken, I see two tasks in your resume
1 facilitate the Excel to XML part
2 run yesterdays operation on a multitude of files in a directory
2. is the easy part. A bit of loop programming in a windows batch file and you are done
Run it once a day as you please and you are good to go. Using variables as we did in the posted batch script gives you a high degree of configuration
About 1. I noted in your example excel (so I know what you are talking about I believe) dat there are a bunch of unmerged columns. Trying to bind that to XML fields is on option, but might be tiredsome. If the files are consequently the same, I would save all the excel files as XML 2003 spreadsheet format and use a second XSLT... no manual work required
Steven O'Neill
ASKER
Hi Geert
Yes you are correct in both parts. If you've got something for Part 2 that is appreciated.
Part 1 is the problem area in my mind as well and I thought the easiest way was to do something like you say (take the original XLSX file and save it as another format for XML purposes) but I still thot that columns B and D & E would have been the issue. I only need the data from what is effectively columns A, C, F & G (so we can discard B and D & E plus all the rest of the columns for now) but I wasn't sure of the best way to handle this (which is why I ended up doing this manually with the removal of the merges and then the Find & Select option to delete the blank columns).
If you think that saving everything as a particular file format in a batch is a way forward (again I'm quite open to doing this whatever way is easiest tbh) then I'm happy to listen to this.
Hope that helps.
Steven
Gertone (Geert Bormans)
That is how I usually tackle Excel 2 XML automation
(Batch) save as spreadsheet XML 2003 => XSLT => done
Hope you've had a good couple of days. Was wondering if/when you'll have the chance to review this question?
Thanx again
Steven
Gertone (Geert Bormans)
Hi, been bitten by a sudden increase in my workload :-(
Anyway, I made an XSLT that transforms your excel file nicely into the XML required.
As a source you need to safe the excell as Spreadsheet XML 2003
You could open them all and save as... or you could write a little .net program that does that
Let's discuss next steps as soon as you tested this excel-2-XML.xsl
Steven O'Neill
ASKER
Hi Geert
Thanx for getting back to me and thanx for the file. Question on usage though...where should I use this? Does it get used at the FileCreate file stage (where we state the root location, the xslt file (merge-file.xsl at present); set saxon and the 3 files before running everything together) - I've supplied my FileCreate file within the original post in case you want to review.
Or am I supposed to use this independently atm for testing against my original XLSX files to ensure they convert correctly? If so I'm not sure how I do that.
Opened them manually and saved them as XML Spreadsheet 2003 format;
Ran the batch file you gave me above (following tweaks for appropriate library) and the transformed XML file appears as expected;
So this is great :)
I believe now I need to do the following:
Find a way to convert a directory full of XLSX files into XML Spreadsheet 2003 format automagically;
Then once we have the files transformed we need a way to run the original batch file to merge the scanned XML file against the newly transformed XML file (matching against the file names);
Send the resulting merged XML file to a directory of the same name as the original file name;
Possibly a step or 3 I've missed here but hopefully you will keep me right with this :)
The updated finalised folder structure after we've added the scanned XML file for each box and the transformed XML file from the original XLSX file would look like this:
The above image is a finished representation of how the files and directory structure should look (in my head) once we've completed each stage of the transformation/merging. You can see each step of the directory structure - following the transformation/merging stages - in the original post as well.
Does that make sense?
All looking very good so far Geert and thanx again for the help.
I have just been given a deadline for something by 4 this evening
But once 1. is done, we just need to do the magic in the batch, that I can handle :-)
1. being "Find a way to convert a directory full of XLSX files into XML Spreadsheet 2003 format automagically;"
Steven O'Neill
ASKER
Sweet Geert
Works really well. I found this one - http://www.softinterface.com/Convert-XLS/Features/Convert-XLSX-To-HTML-XML.htm - which can work on the command line without any issues but it's very expensive for what we need it for (about $500!!) so the converter you've sent me works very nicely and for $34 seems a nice bargain...just need to determine if we can do this within the batch file but if not then no real biggy as we'll just add this as a step in the process (to perform a manual run of the software).
So we've now been able to take a bunch of the XLSX files and use this tool to convert them to XML Spreadsheet 2003 format.
If you're able to review the batch file work that would be absolutely fantastic :) and totally understand that deadlines are deadlines lol.
Thanx again
Steven
Steven O'Neill
ASKER
Hi Geert
Just to let you know I've now managed to find another converter which I'll be using: http://www.coolutils.com/Excel-to-XML - very nice little tool in my opinion and reasonable cost.
It has the ability to be called from a batch file as well (even creates it's own batch file code for you depending on the selections made).
So I've now been able to find a way to convert the XLSX to XML Spreadsheet 2003 format and we have the coding to do everything else we need (thanx to you) but I believe we are still needing to alter the coding to perform steps 2 & 3 from above:
Then once we have the files transformed we need a way to run the original batch file to merge the scanned XML file against the newly transformed XML file (matching against the file names);
Send the resulting merged XML file to a directory of the same name as the original file name;
I also think we need a Step 2.5 which will create the Directory based upon the finalised filename - this might actually be Step 4 but wasn't sure if we should create the directory first before adding the file to it (wasn't sure which was easier).
I lost track a bit. I researched a number of tools yesterday evening and found one that allows to create some batch callable script, similar to what you just described. But that one had the problem of spitting out a very old excel XML file which requires me to change the XSLT. Not sure if it is the same (I have seen so many yesterday, installed them and tested them, and wasn't happy with any one of them) Note that at the end all you need is a 5 lines of C# compiled into an exe because you can hard code the formats in and out
That is what i did yesterday. Today I had a meeting until just now
Having said that, if you are happy with the one you found, that is good. But can you please verify the XSLT transform on a couple of the results ujust to be sure?
Steven O'Neill
ASKER
Hi Geert
Attached are 4 files:
Original XLSX File;
Converted XML File using the Excel to XML tool I used;
Converted XML file using the excel-2-XML XSLT you provided;
The original XSLT file you provided me with;
All appear to be doing what I need so atm I'm happy enough. I don't have enough data atm to test this out as complete as I'd like but I've been assured that the files we receive will not deviate from what I've been working with (only time will tell lol).
Steven,
From your Friday post I understand that the batch transformer you are using is doing the right thing
Now let me look into your batch requirements
Steven O'Neill
ASKER
Hi Geert
Thanx again. Yes you are correct. The batch transformer I'll be using is the Total Excel Converter from CoolUtils.com (http://www.coolutils.com/TotalExcelConverter). Does exactly what it says on the tin and provides batch ready code to insert into your batch file.
As an example I have this in my projects area:
"C:\Program Files (x86)\TotalExcelConverter\ExcelConverter.exe" "C:\ProjectFolder\Project005\DataFiles\*.xlsx" "C:\ProjectFolder\Project005\DataFiles\" -kfs -c XML -template [Name].xml -cs -td Semi -UseOOFilter -pc M -PDFAuthor Softplicity -pdfver 1.4 -tjq 80 -HeadAlign c -HeadFont Tahoma -HeadSize 10 -FootAlign c -FootFont Tahoma -FootSize 10 -FontMapping ReplaceAllFonts -PNHP 1
Probably a load in there I can do without but I'll go through this once we've begun working on the finished file :)
Thanx
Steven
Gertone (Geert Bormans)
Working on th batch, it occurs to me
Box-1A.xlsx the excel file
Box-1A.xml the xml raw from the excel (generated by coolutils)
Box-1A-Converted.xml excel file transformed in useful XML
But what do you have in mind for the scanned extract metadata?
Box-1A-ScanningExtract.xml in the same folder?
root to be set as the project root
(that is where I put the XSLTs in)
input files are in files
For each excel file in the files directory
- I run the transformer to make a excel to xml transform
- I create the dir
- I make the merge transform result in the subdir
Have fun
Steven O'Neill
ASKER
Sweet thanx Greet
Seems to do what we need it to do now :)
I'll do some more work on this in the morning after we've able to perform some more tests on it but it would appear to work as expected. I'll add the batch coding to this for the Excel conversion routine and fingers crossed it works without any issues lol.
I'll get back to you tomorrow (Tuesday) on this.
Thanx again
Steven
Steven O'Neill
ASKER
Absolutely superb Geert...thanx once again for everything you've done with this. All works a treat and does exactly what I was after.
Sorry I made you wait a bit longer than I intended.
I hope you understand we are all volunteers here,
so when pressure grows on other stuff, this one comes second :-(
Steven O'Neill
ASKER
Nope totally understand this Geert and appreciate every effort that is put in here by everyone who takes part at EE :)
If I am not mistaken, I see two tasks in your resume
1 facilitate the Excel to XML part
2 run yesterdays operation on a multitude of files in a directory
2. is the easy part. A bit of loop programming in a windows batch file and you are done
Run it once a day as you please and you are good to go. Using variables as we did in the posted batch script gives you a high degree of configuration
About 1. I noted in your example excel (so I know what you are talking about I believe) dat there are a bunch of unmerged columns. Trying to bind that to XML fields is on option, but might be tiredsome. If the files are consequently the same, I would save all the excel files as XML 2003 spreadsheet format and use a second XSLT... no manual work required