Solved

Merge XML Files

Posted on 2012-03-09
1,721 Views
Last Modified: 2012-03-12
Looking for a way to merge a folder with about 50 different XML files into one XML file. I will schedule it as a daily task to run and merge all the XML files. Anyone know if this can be done with PowerShell, or any other type of script? I found this link that talks about merging CSV files with PowerShell; is this close to what I need if we changed it to XML? Please help!

http://www.youdidwhatwithtsql.com/merging-csv-files-with-powershell/330
Question by:LeviDaily
22 Comments
 
LVL 51

Expert Comment

by:Bill Prew
Do all the XML files have a common schema?

I'd want to see some samples of the files to be merged.

Will there be any "duplicate" root nodes among the files, or is this more of an "append each file to the rest" type deal?

~bp
 
LVL 2

Author Comment

by:LeviDaily
Can I email you the files? I would rather not post them. There would not be duplicates.
 
LVL 51

Expert Comment

by:Bill Prew
It's actually against EE policy for us to exchange offline info while working a question (not my rule, but I get it).

If they contain sensitive data then typically posters will edit the file before posting to remove the sensitive data and replace it with meaningless info.

If it's a size issue then strip it down to just a representative sample of the file(s).

Understand, I'm not trying to be difficult, but I do want to abide by the EE rules on this so that all experts get to see the same info and messages.

~bp
 
LVL 51

Accepted Solution

by:ahoffmann (earned 200 total points)
Do you mean to just concatenate the files? In DOS or similar that's:
type file1 file2 file3 > new-file

Or do you mean to merge the content of the files according to their XML structure?
Then you need a proper XML parser and have to walk through the XML tree programmatically.
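
If simple concatenation is all that's needed, a minimal batch sketch for the daily scheduled task might look like this (the folder names are just placeholders, and writing the output outside the source folder keeps it out of the *.xml wildcard on later runs):

@echo off
rem concatenate every XML export for the day into one file (paths are hypothetical)
type "C:\POS\xmlfiles\*.xml" > "C:\POS\merged\combined.xml"

Keep in mind the result is a plain text dump of the files, not one well-formed XML document.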
 
LVL 2

Author Comment

by:LeviDaily
I totally understand. I will remove the sensitive data shortly and post. Not sure if I know the difference between concatenate and merge. Say I have two XML files: I would go into file 1, select all and copy, then open file 2 and paste the data.
 
LVL 26

Assisted Solution

by:wilcoxon (earned 150 total points)
You could certainly do this with perl.  The script would basically look like this (you'll likely need to tweak the XMLin and XMLout and merge calls):

#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;
use Data::Merger;
my $dir = shift or die "Usage: $0 <dir to merge XML files in>\n";
# get the list of files
opendir DIR, $dir or die "could not open dir $dir: $!";
my @files = grep /\.xml$/, readdir DIR;
closedir DIR;
# get the first XML (prefix with $dir since readdir returns bare file names)
# see perldoc XML::Simple for options
my %in_opts = ( ForceArray => 1 );
my $xml = XMLin("$dir/" . shift(@files), %in_opts);
# loop and merge the others
foreach my $fil (@files) {
    my $tmp = XMLin("$dir/$fil", %in_opts);
    $xml = merger($xml, $tmp);
}
# output the XML
open OUT, '>', 'output.xml' or die "could not write output.xml: $!";
# see perldoc XML::Simple for XMLout options (e.g. RootName, KeepRoot);
# ForceArray is an XMLin-only option, so it is not passed here
print OUT XMLout($xml);
close OUT;



If Data::Merger doesn't give you enough control, you can use Data::Nested instead:

#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;
use Data::Nested;
my $dir = shift or die "Usage: $0 <dir to merge XML files in>\n";
# get the list of files
opendir DIR, $dir or die "could not open dir $dir: $!";
my @files = grep /\.xml$/, readdir DIR;
closedir DIR;
# get the first XML (prefix with $dir since readdir returns bare file names)
# see perldoc XML::Simple for options
my %in_opts = ( ForceArray => 1 );
my $xml = XMLin("$dir/" . shift(@files), %in_opts);
# create nested data object - see perldoc Data::Nested
my $nds = new Data::Nested;
$nds->set_merge('merge_ul', 'merge');
# you could validate structure by calls to $nds->structure(...)
# loop and merge the others
foreach my $fil (@files) {
    my $tmp = XMLin("$dir/$fil", %in_opts);
    $nds->merge($xml, $tmp, undef, 1);
}
# output the XML
open OUT, '>', 'output.xml' or die "could not write output.xml: $!";
# see perldoc XML::Simple for XMLout options (ForceArray is an XMLin-only option)
print OUT XMLout($xml);
close OUT;


 
LVL 51

Expert Comment

by:ahoffmann
> i would go into file 1 and select all and copy, then open file 2 and paste the data.
I assume that's concatenation, hence see my simple suggestion :)
 
LVL 51

Expert Comment

by:Bill Prew
I wouldn't expect just concatenating all the single files together to yield a new single-schema XML data file, but maybe I don't understand how you plan to use the new merged file?

~bp
 
LVL 26

Expert Comment

by:wilcoxon
Concatenating will definitely not produce a valid XML file.  You need to merge them.  If there are no duplicates between files and only a single structure under the root element then this script will work (modified from earlier to get rid of heavy-duty merge modules):

#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;
my $dir = shift or die "Usage: $0 <dir to merge XML files in>\n";
# get the list of files
opendir DIR, $dir or die "could not open dir $dir: $!";
my @files = grep /\.xml$/, readdir DIR;
closedir DIR;
# get the first XML (prefix with $dir since readdir returns bare file names)
# see perldoc XML::Simple for options
# KeyAttr => [] keeps repeated elements as plain arrays instead of folded hashes
my %in_opts = ( ForceArray => 1, KeyAttr => [] );
my $xml = XMLin("$dir/" . shift(@files), %in_opts);
# loop and merge the others
foreach my $fil (@files) {
    my $tmp = XMLin("$dir/$fil", %in_opts);
    # XMLin returns a hash reference, so append each top-level element's children
    # onto the first file's structure - this may still need tweaking for your
    # schema (I don't have a handy XML file to test on), but the goal is to append
    # the elements just below the root element to produce a merged, valid output XML
    foreach my $key (keys %$tmp) {
        next unless ref($tmp->{$key}) eq 'ARRAY';   # skip root-level attributes
        push @{$xml->{$key}}, @{$tmp->{$key}};
    }
}
# output the XML
open OUT, '>', 'output.xml' or die "could not write output.xml: $!";
# see perldoc XML::Simple for XMLout options (ForceArray is an XMLin-only option)
print OUT XMLout($xml);
close OUT;


 
LVL 51

Expert Comment

by:ahoffmann
Do you insist on a PowerShell solution?
 
LVL 2

Author Comment

by:LeviDaily
No, I am up for anything. Using Windows XP; I don't care what solution I use as long as it works.
 
LVL 51

Expert Comment

by:Bill Prew
@LeviDaily

I was still waiting for example files before I posted a solution...

~bp
 
LVL 2

Author Comment

by:LeviDaily
OK, sorry to take a while. Here is an update. This is for a custom POS system, so for every transaction there are 3 XML files created. Looks like they have 3 different schemas. I have attached 1, 2, & 3.
1.xml
2.xml
3.xml
 
LVL 2

Author Comment

by:LeviDaily
Would it be easier if it converted all the .xml files to a .csv file? I guess it doesn't have to be CSV.
 
LVL 26

Expert Comment

by:wilcoxon
If the XML files have different schemas, why are you trying to merge them into a single XML file? It will never be valid (at least according to any DTD or XSD).

To go back to the very basics, what is it you are trying to do with the data? You now mention CSV when the original question was asking for merged XML (two very different things).
 
LVL 2

Author Comment

by:LeviDaily
Sorry, you are right. I just need all the data in one readable file. Since this is a point-of-sale machine, I want to be able to "merge" all those XML files into one file that will then get transferred over FTP to a server. The accounting team will have access to it.

If there were a "transaction" in question, the accounting team would look at that file (.xml or .csv?) to verify the transaction took place.
 
LVL 51

Expert Comment

by:Bill Prew
Why not just ZIP the files together first, and then transfer that over for possible future reference?

~bp
 
LVL 2

Author Comment

by:LeviDaily
BP, you are right, except the accounting team wants it all in one file so the data is searchable. For instance, if we merge all the files into one every night, the accounting team can go to the day in question, open that one file, and search for the data.

There could be hundreds of XML files in one day, and right now they have to open each file individually to find what they are looking for. The titles of the XML files are the timestamp, with nothing referencing the customer.
 
LVL 26

Expert Comment

by:wilcoxon
Can you teach the accounting team about grep (or the Windows equivalent, findstr)?  There's no reason to open all of the files to search for something; see the sketch at the end of this comment.

If they really want everything in one file and it does not have to be a valid XML file, then I would just go with the simple "type *.xml > combined.xml" approach.  It's simple, can be done in a bat file, and will produce something as good as any other "random" merge of the files.

If it does have to be valid XML, then I'd still do something in perl.
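
For the findstr idea, a quick sketch (the search string and folder below are just placeholders):

rem list the day's XML files that mention a given transaction reference
rem /s = include subfolders, /i = ignore case, /m = print matching file names only
findstr /s /i /m "1234567" "C:\POS\xmlfiles\*.xml"

That way the accounting team can find the right file without opening anything by hand.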
 
LVL 51

Expert Comment

by:ahoffmann
> .. the Accounting Team wants it all in one file ..
what's the problem with my very first solution ID: 37705103
 
LVL 51

Assisted Solution

by:Bill Prew (earned 150 total points)
Simplest way is to use the COPY command to do the concatenation then.  You can either do:

copy file1.xml+file2.xml+file3.xml all.xml

or, if you want to merge all the files in the current folder then you can do:

copy *.xml all.txt

and then rename the all.txt to all.xml
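
One caveat worth testing: when COPY concatenates files it defaults to ASCII mode and can tack a Ctrl-Z (0x1A) end-of-file character onto the combined output. Adding /b keeps the copy byte-for-byte:

copy /b *.xml all.txt

and then rename as above.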

~bp
 
LVL 2

Author Closing Comment

by:LeviDaily
Thanks for all the help, guys! I am fine with the "type file > output" solution since it doesn't require any other installations and is native to Windows. Sorry for being all over the place on this, but it looks like they can search a text file. Appreciate it much!
