Link to home
Start Free TrialLog in
Avatar of Evan Cutler
Evan CutlerFlag for United States of America

asked on

merging XML in bash

Greetings,
I have two xml documents:
<document>
     <header></header>
     <tag1>
          <tag1a></tag1a>
    </tag1>
</document>

Open in new window


<images>
     <image>
          <name></name>
          <size></size>
     </image>
     ....(more images)
</images>

Open in new window


I need to get <images> into <document> like this:
<document>
     <header></header>
     <tag1>
          <tag1a></tag1a>
    </tag1>
     <images>
          <image>
               <name></name>
               <size></size>
          </image>
          ....(more images)
     </images>
</document>

Open in new window


Is there a way to do it in a bash script? or something like that?  xmllint?

Thanks
Avatar of Nem Schlecht
Nem Schlecht
Flag of United States of America image

Not that I know of that is XML aware.

I would just:

cat document.xml images.xml > newdocument.xml

And then edit 'newdocument.xml' and move the "</document>" line.

Do you have like 1000 (or more) files that you need to do this with?  Is there other stuff *after* the "</document>" line?
Avatar of Evan Cutler

ASKER

yeah there is.  unfortunately the document.xml is a HUGE XML document...and the only thing I have in my arsonal is my XPATH.
SOLUTION
Avatar of Nem Schlecht
Nem Schlecht
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
You can try programming xmllint, namely xmllint --shell which can traverse xml tree and emit converted structure(s) and validate against DTD after if needed.
If the layout is as you described, and the </tag1> tag only occurs once in the file, a simple awk would do it:
awk '/<\/tag1>/{print;system("cat image.xml");next}{print}' doc.xml > output.xml

Open in new window

It wouldn't be indented in the way you show, but that shouldn't affect the XML itself.  If you really wanted it indented, that would be just a bit more complex and messier.
that's pretty genius simon,
instead of tag then print, can you do print (cat...) before </document>
to guarantee placement?
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial