• Status: Solved

xml

I have some XML data from an application that I want in Excel. Each record is its own XML file. I need a way to cycle through all of the XML files in a particular directory and parse out a few data items from the XML. Is there an easy way to do this? I am running Excel:Mac 2004 and OS X 10.4.6.

Thanks
JDF
 
d_g_watson Commented:
You could write a simple Java class to parse the XML files, extract the relevant data, then write it out to a comma-separated values (CSV) file. CSV files can be loaded into Excel and then saved as XLS files.

Hope that helps,
Dave.
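
For the "cycle through all of the XML files in a particular directory" part of the question, a minimal sketch might look like the following. The class name XmlFileLister and its method are hypothetical, and the sketch assumes the files end in ".xml" and sit directly in one directory (no subdirectories).

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: collects the *.xml files found directly in one directory.
class XmlFileLister
{
   static List<File> listXmlFiles(File directory)
   {
      // Keep regular files whose name ends in .xml (case-insensitive)
      File[] matches = directory.listFiles(
         f -> f.isFile() && f.getName().toLowerCase().endsWith(".xml"));

      // listFiles returns null if the path is not a readable directory
      if (matches == null)
      {
         return new ArrayList<>();
      }

      Arrays.sort(matches); // predictable processing order
      return new ArrayList<>(Arrays.asList(matches));
   }

   public static void main(String[] args)
   {
      // Directory is taken from the command line; defaults to the current directory
      File directory = new File(args.length > 0 ? args[0] : ".");
      for (File xmlFile : listXmlFiles(directory))
      {
         System.out.println(xmlFile.getPath());
      }
   }
}
```

Each File returned could then be handed to whatever parsing step is used, with one CSV row (or group of rows) written per file.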
 
johnfaig (Author) Commented:
d_g_watson,

Can you point me towards a code segment to get me started???

JDF
 
d_g_watson Commented:
This is a bit rough, but I think it demonstrates everything you should need:

package test.xml;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class ParseXmlFile
{
   public static void main(String[] args)
   {  
      try
      {
         // Read the XML document from file
         Document xmlDocument = null;
         {
            // XML file looks like:
            //<elementRoot>
            //   <elementA>Value A</elementA>
            //   <elementB>Value B</elementB>
            //</elementRoot>
            File xmlFile = new File("test.xml");
           
            DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
            xmlDocument = docBuilder.parse(xmlFile);
         }
         
         // Extract relevant info from the XML document
         // and write to CSV file. Resulting CSV file will look like:
         // elementA,Value A
         // elementB,Value B
         {
            // Create the CSV file
            File outputFile = new File("test.csv");
            outputFile.createNewFile();
            FileOutputStream outputFileStream = new FileOutputStream(outputFile, true);
         
            // Extract data from the DOM tree
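            // getDocumentElement() always returns the root element, whereas
            // getFirstChild() may return a comment or the DOCTYPE node instead
            Node rootElement = xmlDocument.getDocumentElement();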
            NodeList childNodes = rootElement.getChildNodes();            
            for(int i = 0; i < childNodes.getLength(); i++)
            {
               Node currentNode = childNodes.item(i);
               if(currentNode.getNodeType() == Node.ELEMENT_NODE)
               {
                  String elementName = currentNode.getNodeName();
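                  // getTextContent() returns "" for an empty element, where
                  // getFirstChild() would be null and throw a NullPointerException
                  String elementValue = currentNode.getTextContent();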
                 
                  outputFileStream.write((elementName + "," + elementValue + "\n").getBytes());
               }
            }
           
            // Close the file output stream
            outputFileStream.close();
         }
      }
      catch (DOMException e)
      {
         e.printStackTrace();
      }
      catch (FactoryConfigurationError e)
      {
         e.printStackTrace();
      }
      catch (ParserConfigurationException e)
      {
         e.printStackTrace();
      }
      catch (SAXException e)
      {
         e.printStackTrace();
      }
      catch (IOException e)
      {
         e.printStackTrace();
      }
   }
}
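
One caveat with writing elementName + "," + elementValue directly: a value that itself contains a comma, a double quote, or a line break will produce a broken CSV row. A minimal sketch of a quoting helper, assuming the common convention of wrapping such values in double quotes and doubling any embedded quotes (the class and method names here are hypothetical):

```java
// Hypothetical helper for writing CSV rows that Excel can read back safely.
class CsvEscaper
{
   // Quotes a single field only when it contains a comma, quote, or line break
   static String escape(String value)
   {
      if (value == null)
      {
         return "";
      }
      boolean needsQuotes = value.contains(",") || value.contains("\"")
         || value.contains("\n") || value.contains("\r");
      if (!needsQuotes)
      {
         return value;
      }
      // Embedded double quotes are doubled, then the whole field is wrapped
      return "\"" + value.replace("\"", "\"\"") + "\"";
   }

   // Joins the escaped fields with commas and terminates the row with a newline
   static String toRow(String... fields)
   {
      StringBuilder row = new StringBuilder();
      for (int i = 0; i < fields.length; i++)
      {
         if (i > 0)
         {
            row.append(',');
         }
         row.append(escape(fields[i]));
      }
      return row.append('\n').toString();
   }
}
```

With this in place, the write call could use CsvEscaper.toRow(elementName, elementValue) instead of concatenating the strings by hand.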
 
johnfaig (Author) Commented:
d_g_watson,

I don't want you to debug my code, but please take a quick look at the error message I received.  

[Fatal Error] test.xml:1:1: Content is not allowed in prolog.
org.xml.sax.SAXParseException: Content is not allowed in prolog.
        at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:264)
        at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:292)
        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:172)
        at javaapplication1.Main.main(Main.java:61)

Thanks
JDF
 
d_g_watson Commented:
My first guess would be that there's something wrong with the XML being input - either the XML is invalid, or the file is not being read properly.

If you're sure the file contains valid XML (as you probably are), I would investigate if the file is being read properly.

Try inserting this after the line: File xmlFile = new File("test.xml");


            if(xmlFile.exists())
            {
               System.out.println("File exists!");
            }
            else
            {
               System.out.println("File DOES NOT exist!");
            }

This will tell you if the class is actually finding the "test.xml" file.

Let me know how that goes.
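
Since the parser reports the problem at line 1, column 1, it can also help to look at how the file actually begins, for example whether the first byte is "<", a byte order mark, or something else. A rough sketch, assuming the file is small enough to read into memory; the class name PrologCheck and its messages are made up for illustration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical diagnostic: reports what the first bytes of a file look like.
class PrologCheck
{
   static String describeStart(byte[] data)
   {
      if (data.length == 0)
      {
         return "empty file";
      }
      int b0 = data[0] & 0xFF;
      int b1 = data.length > 1 ? data[1] & 0xFF : -1;
      int b2 = data.length > 2 ? data[2] & 0xFF : -1;

      if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF)
      {
         return "UTF-8 byte order mark";
      }
      if ((b0 == 0xFE && b1 == 0xFF) || (b0 == 0xFF && b1 == 0xFE))
      {
         return "UTF-16 byte order mark";
      }
      if (b0 == '<')
      {
         return "starts with '<'";
      }
      return "unexpected leading byte 0x" + String.format("%02X", b0);
   }

   public static void main(String[] args) throws IOException
   {
      // File name comes from the command line; defaults to test.xml
      String fileName = args.length > 0 ? args[0] : "test.xml";
      byte[] data = Files.readAllBytes(Paths.get(fileName));
      System.out.println(fileName + ": " + describeStart(data));
   }
}
```

This only describes the start of the file; it does not decide whether the document as a whole is acceptable to the parser.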
 
johnfaig (Author) Commented:
d_g_Watson,

The file exists and the file contents are listed below.  I'm not sure if the xml is syntactically correct, but I have no control over it.

<FCFORMSHEADER>
<?xml version="1.0"?>
<!DOCTYPE FIRSTCLASS SYSTEM "firstclass.dtd">
<firstclass>
      <fcobject objtype="oConfItem" formid="141" objname="Teacher">
            <field id="8018" index="0" type="number">23009</field>
            <subject index="0" >Student</subject>
            <tonames index="0" >RAP Slip</tonames>
            <tonames index="1" ></tonames>
            <ccnames index="0" ></ccnames>
            <bccnames index="0" ></bccnames>
      </fcobject>
</firstclass>
</FCFORMSHEADER>
 
d_g_watson Commented:
Ah...this isn't well-formed XML. The "?xml" declaration must be the very first thing in the file, and the "!DOCTYPE" declaration must come before the root element, like this:

<?xml version="1.0"?>
<!DOCTYPE FIRSTCLASS SYSTEM "firstclass.dtd">
<FCFORMSHEADER>
   <firstclass>
        <fcobject objtype="oConfItem" formid="141" objname="Teacher">
             <field id="8018" index="0" type="number">23009</field>
             <subject index="0" >Student</subject>
             <tonames index="0" >RAP Slip</tonames>
             <tonames index="1" ></tonames>
             <ccnames index="0" ></ccnames>
             <bccnames index="0" ></bccnames>
        </fcobject>
   </firstclass>
</FCFORMSHEADER>

Are all your XML fragments like this? If it's not possible to change the XML, you may want to read the file in as a string, then perform some manipulation on the string to make it well-formed XML. For example, you could remove the "FCFORMSHEADER" start and end tags, or you could remove the "?xml" and "!DOCTYPE" declarations. Either of these options would make the XML well formed.

Alternatively, if your XML fragments are all of a similar format, you could always just read the file as a string, and extract the relevant data using regular expressions. This could get messy though, if your XML varies in size and format. See java.util.regex.Pattern javadocs for more info about regular expressions.
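
As a rough illustration of the regular expression route, the sketch below pulls the text of the first element with a given tag name out of a string. It assumes the element contains plain text only (no nested elements) and does not decode entities such as &amp;amp;. The class name RegexExtract and its method are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts simple element text without an XML parser.
class RegexExtract
{
   // Returns the trimmed text of the first <tagName ...>text</tagName>, or null if absent
   static String firstValue(String xml, String tagName)
   {
      String quotedTag = Pattern.quote(tagName);
      // \b stops "subject" from matching "subjects"; [^>]* skips any attributes
      Pattern pattern = Pattern.compile(
         "<" + quotedTag + "\\b[^>]*>(.*?)</" + quotedTag + ">",
         Pattern.DOTALL);
      Matcher matcher = pattern.matcher(xml);
      return matcher.find() ? matcher.group(1).trim() : null;
   }

   public static void main(String[] args)
   {
      String sample = "<subject index=\"0\" >Student</subject>"
         + "<tonames index=\"0\" >RAP Slip</tonames>";
      System.out.println(firstValue(sample, "subject"));
      System.out.println(firstValue(sample, "tonames"));
   }
}
```

This works on the raw file contents, so the misplaced "FCFORMSHEADER" wrapper does not get in the way, but it becomes fragile as soon as the format varies.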

The easiest option is probably to strip off the "FCFORMSHEADER" start and end tags using string manipulation. This would leave you with:

<?xml version="1.0"?>
<!DOCTYPE FIRSTCLASS SYSTEM "firstclass.dtd">
<firstclass>
     <fcobject objtype="oConfItem" formid="141" objname="Teacher">
          <field id="8018" index="0" type="number">23009</field>
          <subject index="0" >Student</subject>
          <tonames index="0" >RAP Slip</tonames>
          <tonames index="1" ></tonames>
          <ccnames index="0" ></ccnames>
          <bccnames index="0" ></bccnames>
     </fcobject>
</firstclass>

This is well formed XML.
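
A minimal sketch of that string manipulation, followed by parsing the cleaned text, is shown below. The class name WrapperStripper is hypothetical. The sketch assumes "firstclass.dtd" may not be available next to the XML files, so it installs an entity resolver that supplies an empty DTD rather than letting the parser look for the file on disk.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: removes the FCFORMSHEADER wrapper, then parses the rest.
class WrapperStripper
{
   static String stripWrapper(String raw)
   {
      // trim() matters: the "?xml" declaration must be the very first
      // characters, so the line break left behind by the start tag has to go
      return raw.replace("<FCFORMSHEADER>", "")
                .replace("</FCFORMSHEADER>", "")
                .trim();
   }

   static Document parse(String xml) throws Exception
   {
      DocumentBuilder builder =
         DocumentBuilderFactory.newInstance().newDocumentBuilder();

      // Assumption: the DTD is not needed for extraction, so any external
      // entity (here firstclass.dtd) is resolved to an empty stream
      builder.setEntityResolver(
         (publicId, systemId) -> new InputSource(new StringReader("")));

      return builder.parse(new InputSource(new StringReader(xml)));
   }
}
```

The Document returned by parse can then be walked in the same way as in the earlier ParseXmlFile class.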

~Dave