How do I edit Office 2007 Word XML files programmatically?
Posted on 2007-11-30
If you take one of the new *.docx files that MS Word 2007 produces when it saves and open it in WinZip, you can see all of the XML files that make up the "DNA" of the Word file. One of them, 'document.xml', stores all of the text content of the Word doc.
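To make the package layout concrete, here is a minimal sketch of the same inspection done in code instead of WinZip. It assumes Python's standard zipfile module, and it assumes the main text part is stored at 'word/document.xml' as UTF-8, which is where Word itself puts it when it saves. The function names are made up for illustration.

```python
import zipfile

# Assumption: files saved by Word keep the main body at this path.
MAIN_PART = "word/document.xml"

def list_parts(docx_path):
    """Return the name of every entry stored in the .docx package."""
    with zipfile.ZipFile(docx_path) as package:
        return package.namelist()

def read_main_part(docx_path, part=MAIN_PART):
    """Return the raw XML of the main document part as a string."""
    with zipfile.ZipFile(docx_path) as package:
        # Assumption: the part is UTF-8 encoded.
        return package.read(part).decode("utf-8")
```

Calling list_parts on a docx shows the exact entry names and paths inside the archive, which is useful for comparing a file Word wrote against one that was re-zipped by hand.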
So here is the question: if you make a simple Word doc, for example one that contains only the text "Hello World", you can easily find the tag that contains this text in the 'document.xml' that's part of the docx package. However, when I changed that text in a text editor (to, let's say, "Goodbye, World"), replaced the original 'document.xml' with the modified one, and re-zipped the package, Word thought the file was corrupted.
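For reference, the manual steps above can be sketched in code, which removes the hand re-zipping from the picture. This is only a rough sketch under several assumptions: it uses Python's standard zipfile module, it assumes the text sits unbroken inside 'word/document.xml' encoded as UTF-8, and replace_text_in_docx is a hypothetical helper name. Every other entry is copied across under its original name, so the new package has the same internal paths as the original.

```python
import zipfile
from xml.sax.saxutils import escape

def replace_text_in_docx(src_path, dst_path, old, new,
                         part="word/document.xml"):
    """Copy a .docx package to dst_path, replacing `old` with `new` in one part.

    Assumptions: `part` is UTF-8 XML, and `old` appears as one unbroken
    string in it (Word can split text across several runs, in which
    case a plain string replace finds nothing).
    """
    with zipfile.ZipFile(src_path) as src, \
         zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == part:
                xml = data.decode("utf-8")
                # Escape both strings so characters such as & and <
                # match, and stay valid, inside the XML.
                xml = xml.replace(escape(old), escape(new))
                data = xml.encode("utf-8")
            # Keep the entry name exactly as it was in the source
            # archive, with no extra top-level folder.
            dst.writestr(item.filename, data)
```

A call such as replace_text_in_docx("hello.docx", "goodbye.docx", "Hello World", "Goodbye, World") writes a new file and leaves the original untouched, so the two archives can be compared entry by entry if Word still complains.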
I must be missing something, because I know that the whole point of using XML as the source for a document is that other programming languages can edit the XML directly to read, modify, and create Word docs. I can't even get this to work when editing it manually in a text editor!
I suspect that the zipping step isn't correct, or maybe I have to update another XML file in the package so that it accepts the modified document.xml file? I would like to learn how to edit MS Office documents by editing the underlying XML. What am I missing? I'll give points for an answer that explains why my zipping didn't work, rather than answers that are just links to general information about Open XML for Office. I feel like I've read them all and couldn't find the answer!