Automatically Generate Documents from XML database

Posted on 2014-03-14
Last Modified: 2016-05-30
I am in the early phases of researching a solution to a very niche need. Basically, we are taking data from an XML data file, and we want to generate a Word document or PDF from a preexisting template, with the XML file filling in the variable fields; the rest of the text is static. Normally this wouldn't seem too hard, but where most of the solutions we have tried get hung up is that sometimes we have multiple fields of the same type, say "Product Code 1," "Product Code 2," etc.

What we would like is something that, when it parses the XML file and finds data with a tag it has already encountered, inserts a new field on a new line.

So if the XML file had three items with the product code tag, the part of the document that has the product codes would say:
Product Code 1: XXXX
Product Code 2: XXXX
Product Code 3: XXXX

If the XML file had two, it would say:
Product Code 1: XXXX
Product Code 2: XXXX

And if the file didn't contain any, these fields would be excluded altogether.
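A minimal sketch of the numbering rule described above, using Python's standard-library ElementTree parser. The tag name `ProductCode`, the label text, the function name and the sample data are hypothetical; adapt them to the real XML schema. It emits one numbered line per occurrence and an empty string when there are none, so the calling code can drop the section entirely.

```python
import xml.etree.ElementTree as ET


def render_repeated_field(xml_text: str, tag: str, label: str) -> str:
    """Return one numbered line per occurrence of <tag>, or "" if there are none.

    The tag and label are hypothetical placeholders for the real schema.
    """
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so nested occurrences are found too.
    values = [(el.text or "").strip() for el in root.iter(tag)]
    # enumerate(..., start=1) produces the 1-based numbering from the question.
    lines = [f"{label} {i}: {value}" for i, value in enumerate(values, start=1)]
    # Zero occurrences yield "", so the caller can omit the whole section.
    return "\n".join(lines)


sample = """<order>
  <ProductCode>A100</ProductCode>
  <ProductCode>B200</ProductCode>
  <ProductCode>C300</ProductCode>
</order>"""

print(render_repeated_field(sample, "ProductCode", "Product Code"))
```

With the three-element sample this prints three numbered lines; with two elements it prints two, and with none it prints an empty line.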

Anyway, I am trying to figure out whether I should develop something in VBA, but since I want this to be automatic, I was thinking of doing it in C#. However, I am open to third-party software as well. I know this is a weird question, so please ask for clarification if necessary!

Question by:indigo6

Accepted Solution

Geert Bormans earned 250 total points
ID: 39931294
Honestly, this is not a weird question at all.
I have had to do similar things for many customers already.

My approach would be:
- have the template in XML (for Word that could be Word 2003 XML; for PDF, XSL-FO would be a good option)
- pull the data from the XML database and merge it into the XML template (XSLT would be a good choice for that)
- call the XSLT from C# or whatever language you prefer

If you save a Word 2003 XML file with a .doc extension, Word will open it seamlessly.
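The answer recommends XSLT for the merge step. The sketch below swaps in plain Python (standard-library ElementTree) to show the same idea: static template text plus one generated paragraph per repeated data element, written out as Word 2003 XML (WordprocessingML, namespace `http://schemas.microsoft.com/office/word/2003/wordml`). The `ProductCode` tag, the "Order summary" heading and the function names are hypothetical, and a real template would also carry styles and layout.

```python
import xml.etree.ElementTree as ET

# Word 2003 XML (WordprocessingML) namespace, serialized with the "w:" prefix.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W_NS)


def w(tag: str) -> str:
    """Qualify a tag name with the WordprocessingML namespace."""
    return f"{{{W_NS}}}{tag}"


def add_paragraph(body: ET.Element, text: str) -> None:
    """Append <w:p><w:r><w:t>text</w:t></w:r></w:p> to the document body."""
    p = ET.SubElement(body, w("p"))
    r = ET.SubElement(p, w("r"))
    t = ET.SubElement(r, w("t"))
    t.text = text


def build_word2003(data_xml: str) -> str:
    """Merge repeated <ProductCode> values (hypothetical tag) into static text."""
    data = ET.fromstring(data_xml)
    doc = ET.Element(w("wordDocument"))
    body = ET.SubElement(doc, w("body"))
    add_paragraph(body, "Order summary")  # static template text (hypothetical)
    codes = [(el.text or "").strip() for el in data.iter("ProductCode")]
    # One paragraph per code; with no codes, the section simply disappears.
    for i, code in enumerate(codes, start=1):
        add_paragraph(body, f"Product Code {i}: {code}")
    xml_body = ET.tostring(doc, encoding="unicode")
    # The mso-application processing instruction marks the file as a Word document.
    return ('<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
            '<?mso-application progid="Word.Document"?>\n' + xml_body)


print(build_word2003(
    "<order><ProductCode>A100</ProductCode><ProductCode>B200</ProductCode></order>"
))
```

Writing the returned string to a file with a .doc extension should then open in Word, as described in the answer; an XSLT stylesheet would express the same loop declaratively.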

Expert Comment

ID: 39931309

Expert Comment

by:Geert Bormans
ID: 39931460
I disagree on DITA.
There is a publishing framework for DITA that creates Word and PDF output, the DITA Open Toolkit (aikimark's second link),
but it could be pretty tough to customize the OT for merging data in.
DITA is meant for single-source publishing and component reuse,
not for merging XML data with document templates.
Unless you have an automated process that generates DITA on the fly by merging your XML data with some stub info, I see little added value. Why go for a generated DITA solution if direct publishing is more straightforward?

Assisted Solution

aikimark earned 250 total points
ID: 39931725
I think my suggestion to "look at" DITA was misconstrued as an "implement DITA" recommendation. DITA implementation is not a trivial undertaking. However, if your client/employer might grow into a larger system, then knowing the DITA framework and tools might help you make decisions on this task that would make a later DITA implementation easier.

In addition to Geert's suggestions, you have other solution path options...
* Some companies, such as DataDirect, have ODBC drivers for XML data.
* You can use Access or Excel to read the XML data and then consume their data once it is in row and column format.
* If the XML file was saved with the recordset.Save method and a second parameter of adPersistXML, then you should be able to open a recordset with a provider of "MSPersist;"
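To illustrate what the "row and column format" looks like, here is a sketch that flattens nested XML into one row per product code and writes it as CSV. It is done in Python rather than through an Access or Excel import, and the `<orders>`/`<order id>`/`<Customer>`/`<ProductCode>` structure and the column names are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

FIELDS = ["order_id", "customer", "product_code"]  # hypothetical column names


def xml_to_rows(xml_text: str) -> list[dict[str, str]]:
    """Flatten each <order>/<ProductCode> pair into one row (hypothetical schema)."""
    root = ET.fromstring(xml_text)
    rows = []
    for order in root.iter("order"):
        # findtext returns the default when <Customer> is missing.
        customer = order.findtext("Customer", default="").strip()
        for code in order.findall("ProductCode"):
            rows.append({
                "order_id": order.get("id", ""),
                "customer": customer,
                "product_code": (code.text or "").strip(),
            })
    return rows


def rows_to_csv(rows: list[dict[str, str]]) -> str:
    """Serialize the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


sample = ('<orders>'
          '<order id="1"><Customer>Acme</Customer>'
          '<ProductCode>A100</ProductCode><ProductCode>B200</ProductCode></order>'
          '<order id="2"><Customer>Globex</Customer></order>'
          '</orders>')
print(rows_to_csv(xml_to_rows(sample)), end="")
```

Note that an order with no product codes produces no rows at all (much like an inner join); if such orders must still appear, append a row with an empty `product_code` instead.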
Author Comment

ID: 39934444
Thank you all for the suggestions! I will look into these today. I also have demos scheduled with Windward, HotDocs, and Ecrion. Since I won't be around much longer to maintain the system, I'd like my successor to be able to call on vendor support if the requirements change.

Expert Comment

by:Geert Bormans
ID: 39934664
Ecrion actually does XSL-FO, which I believe would be a good choice.

Author Comment

ID: 39943251
Ok, thank you guys! I am scheduling live demos this week and will let you know what I go with.

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 41624563
I've requested that this question be deleted for the following reason:

Not enough information to confirm an answer.

Expert Comment

by:Geert Bormans
ID: 41624564
The accepted solution (ID: 39931294) gives an industry-proven approach for solving exactly the issue the OP was facing.
The question asked for an "approach".
A valid approach has been offered and detailed as far as possible without writing code.
The approach is generally useful for others viewing this question.
Please accept ID: 39931294 as the answer.
