Evan Cutler
asked on
merging XML in bash
Greetings,
I have two xml documents:
<document>
  <header></header>
  <tag1>
    <tag1a></tag1a>
  </tag1>
</document>
<images>
  <image>
    <name></name>
    <size></size>
  </image>
  ....(more images)
</images>
I need to get <images> into <document> like this:
<document>
  <header></header>
  <tag1>
    <tag1a></tag1a>
  </tag1>
  <images>
    <image>
      <name></name>
      <size></size>
    </image>
    ....(more images)
  </images>
</document>
Is there a way to do it in a bash script? or something like that? xmllint?
Thanks
ASKER
Yeah, there is. Unfortunately, document.xml is a HUGE XML document, and the only thing I have in my arsenal is XPath.
SOLUTION
You can try scripting xmllint, namely xmllint --shell, which can traverse the XML tree and emit the converted structure(s). You can then validate the result against a DTD if needed.
If the layout is as you described, and the </tag1> tag only occurs once in the file, a simple awk would do it:
awk '/<\/tag1>/{print;system("cat image.xml");next}{print}' doc.xml > output.xml
It wouldn't be indented in the way you show, but that shouldn't affect the XML itself. If you really wanted it indented, that would be just a bit more complex and messier.
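On the indentation point, one option is to let xmllint re-indent the merged file afterwards. This is only a sketch: it assumes xmllint (from libxml2) is installed and that the merged file is well-formed, and the function name reindent_xml is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: re-indent a merged XML file with xmllint (part of libxml2).
# Assumes xmllint is installed and the input is well-formed; xmllint
# prints parse errors and exits non-zero if it is not.

reindent_xml() {
    local src="$1" dest="$2"
    # --format parses the document and writes it back out with fresh indentation.
    xmllint --format "$src" > "$dest"
}
```

Usage would be something like `reindent_xml output.xml output.pretty.xml`. The result may differ cosmetically from the input beyond indentation (for example, xmllint adds an XML declaration), and since the whole document is parsed, it may be slow on a very large file.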
ASKER
That's pretty genius, Simon.
Instead of matching the tag and then printing, can you print (cat...) before </document>
to guarantee placement?
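One way the "insert before </document>" idea could look is sketched below. It assumes the closing </document> tag sits on a line of its own and occurs only once. The function name insert_before_close is hypothetical, and the images file is read with awk's getline rather than system("cat ..."), which keeps all output in one stream.

```shell
#!/usr/bin/env bash
# Sketch: print the contents of an images file immediately before the line
# that holds the closing root tag of the main document.
# Assumption: the closing tag is on its own line and appears exactly once.

insert_before_close() {
    local doc="$1" images="$2" closing="${3:-</document>}"
    awk -v images="$images" -v closing="$closing" '
        index($0, closing) {
            # Copy the images file line by line ahead of the closing tag.
            while ((getline line < images) > 0) print line
            close(images)
        }
        { print }   # print every line of the main document, closing tag included
    ' "$doc"
}
```

Usage would be along the lines of `insert_before_close document.xml images.xml > newdocument.xml`. Because awk streams the file line by line, the size of document.xml should not matter much.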
ASKER CERTIFIED SOLUTION
I would just:
cat document.xml images.xml > newdocument.xml
And then edit 'newdocument.xml' and move the "</document>" line.
Do you have like 1000 (or more) files that you need to do this with? Is there other stuff *after* the "</document>" line?