
Want simple-to-use free XML parser library.

Posted on 2006-06-07
Last Modified: 2012-05-05
Hi All,

Does anyone know of a free XML parser library I can link with in VS 6 (not .NET)?

I will read XML data out of a text file and I need to pull the values of known fields out of it. The chunks won't be more than a few KB in length. I need a reasonable level of performance, as there may be a few million records in the data file, but each record can be treated as a separate XML fragment.

If this is already available in VS6 could someone post me some sample code to pull fields out of the XML.

Here's a sample fragment:

<Description enabled=""1"">3 For 2 Briefs</Description><Description country=""GB"" language=""en"" variant="""">3 For 2 Briefs</Description>",1970-01-01 00:00:00,2069-12-31 00:00:00,1003,1127988423781,2005-09-22 14:35:25,"<Promotion><LastUpdated>2005-09-22T14:35:25+01:00</LastUpdated><MajorVersion>1</MajorVersion><MinorVersion>3</MinorVersion><PromotionID>9980051</PromotionID><Description enabled=""1"">3 For 2 Briefs</Description><Description country=""GB"" language=""en"" variant="""">3 For 2 Briefs</Description><MultibuyGroup><CriterionID>1</CriterionID><ThresholdType>1</ThresholdType><ThresholdValue>0.0</ThresholdValue><ThresholdValue currency=""GBP"">3.0</ThresholdValue><MinItemPriceRange>0.0</MinItemPriceRange><MinItemPriceRange currency=""GBP"">0.0</MinItemPriceRange><MaxItemPriceRange>0.0</MaxItemPriceRange><MaxItemPriceRange currency=""GBP"">0.0</MaxItemPriceRange><RewardType>7</RewardType><RewardValue>0.0</RewardValue><RewardValue currency=""GBP"">0.0</RewardValue><EffectiveRewardValue>0.0</EffectiveRewardValue><EffectiveRewardValue currency=""GBP"">0.0</EffectiveRewardValue><AccountToDept></AccountToDept><AlertThresholdValue>0.0</AlertThresholdValue><AlertThresholdValue currency=""GBP"">0.0</AlertThresholdValue><UseFixedValueInBestDeal>0</UseFixedValueInBestDeal><GroupDescription>3 for 2 breifs</GroupDescription><Rolling>0</Rolling><UniqueItems>0</UniqueItems><AllItems>0</AllItems><RoundingRule>2</RoundingRule><ProductID>1204071003</ProductID></MultibuyGroup><Timetable><CriterionID></CriterionID><StartDate>1970-01-01T00:00:00+00:00</StartDate><FinishDate>2069-12-31T00:00:00+00:00</FinishDate></Timetable><Notes enabled=""1""></Notes><AlertMessage enabled=""1""></AlertMessage></Promotion>

It defines a store promotion. I want to pull out fields from it such as RewardType=7 RewardValue=0.0
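For records as regular as this sample, a naive string search can already pull out a single field while a proper parser is being evaluated. The sketch below is not an XML parser: `find_element_text` is a hypothetical helper, and it assumes no CDATA, comments, entities or `>` inside attribute values. It returns the text of the first element whose name matches exactly, and is only meant for leaf elements such as `RewardType`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT a real XML parser.
   Copies the text of the first element named exactly `tag` into out.
   Matches <tag>, <tag attr="..."> and <tag .../> (empty element).
   Returns 1 on success, 0 if the tag is missing or out is too small.
   No entity decoding, CDATA, comment or nesting handling. */
int find_element_text(const char *xml, const char *tag, char *out, size_t outlen)
{
    size_t taglen;
    const char *p = xml;

    assert(xml != NULL && tag != NULL && out != NULL);
    taglen = strlen(tag);

    while ((p = strchr(p, '<')) != NULL) {
        const char *start, *end;
        size_t len;
        char after;

        p++;  /* step past '<' */
        if (strncmp(p, tag, taglen) != 0)
            continue;
        /* Exact name only: "Reward" must not match "<RewardType>". */
        after = p[taglen];
        if (after != '>' && after != '/' && after != ' ' && after != '\t' &&
            after != '\r' && after != '\n')
            continue;

        start = strchr(p + taglen, '>');
        if (start == NULL)
            return 0;                    /* unterminated start tag */
        if (start[-1] == '/') {          /* empty element: <tag .../> */
            out[0] = '\0';
            return outlen > 0;
        }
        start++;                         /* first character of the content */
        end = strchr(start, '<');        /* content ends at the next tag */
        if (end == NULL)
            return 0;
        len = (size_t)(end - start);
        if (len + 1 > outlen)
            return 0;
        memcpy(out, start, len);
        out[len] = '\0';
        return 1;
    }
    return 0;
}
```

Called as `find_element_text(record, "RewardType", buf, sizeof buf)` on the fragment above, it copies the text between `<RewardType>` and `</RewardType>`. Because it only looks at the first match, the attributed `<RewardValue currency="GBP">` variant would need a different search.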

Thanks all.

Paul
Question by:PaulCaswell
Accepted Solution

by: fridom
ID: 16852072
No problem Paul, just use libxml2. Its homepage is here:
http://xmlsoft.org/

Or, if you can stand COM programming with C and are running on Windows, you can use MSXML.

Extracting information can be done in different ways: you can walk the tree recursively or you can use XPath expressions. In your case an XPath expression seems appropriate.

Here's some code to give you an idea
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <libxml/tree.h>
#include <libxml/parser.h>

/* Parse a whole file into a DOM tree; returns NULL on failure. */
xmlDocPtr parse_xml_file(const char *file_name){
      xmlDocPtr result = NULL;
      assert(file_name);
      result = xmlReadFile(file_name, NULL, XML_PARSE_PEDANTIC);
      if (NULL == result) {
            fprintf(stderr, "Parsing of %s failed\n", file_name);
      }
      return result;
}

/* An alternative action for walk(): describe each node it is given.
   Attributes are not children, so walk() never passes them here;
   see print_attributes for those. */
static void print_node_information (xmlNode *node){
      switch (node->type) {
      case XML_ELEMENT_NODE:
            printf("type = element, name = %s\n", (char*) node->name);
            break;
      case XML_TEXT_NODE:
            printf("type = text, name = %s, value = %s\n", (char*) node->name,
                   (char*) node->content);
            break;
      default:
            printf("not yet provided\n");
            break;
      }
}

/* Call action on node, each following sibling and, recursively, all
   their descendants; whitespace-only text nodes are skipped. */
static void walk (xmlNode *node, void (*action) (xmlNode*)){
      xmlNode *cur = NULL;
      for(cur = node; cur; cur = cur->next){
            if (! xmlIsBlankNode(cur)){
                  action(cur);
            }
            walk(cur->children, action);
      }
}

/* Action for walk(): print every attribute of an element node. */
static void print_attributes (xmlNode *node){
      xmlAttr *attr = NULL;
      if (node->type != XML_ELEMENT_NODE){
            return;
      }
      for(attr = node->properties; attr; attr = attr->next){
            /* an empty value such as variant="" may have no child text node */
            const char *value = attr->children ? (const char*) attr->children->content : "";
            printf("the node named %s has an attribute with\n"
                   "the name %s it's value is %s\n", (const char*) node->name,
                   (const char*) attr->name, value);
      }
}

int main(void){
      const char *file_name = "bookstore.xml";
      int result = EXIT_SUCCESS;
      xmlDoc *doc = NULL;
      xmlNode *root_element = NULL;

      LIBXML_TEST_VERSION

      doc = parse_xml_file(file_name);
      if (NULL == doc){
            return EXIT_FAILURE;
      }

      root_element = xmlDocGetRootElement(doc);
      if (NULL == root_element){
            fprintf(stderr, "Could not get the root_element\n");
            result = EXIT_FAILURE;
            goto clean;
      }

      walk(root_element, print_attributes);

      clean:
            xmlFreeDoc(doc);
            xmlCleanupParser();
            return result;
}

with the following xml file:
<?xml version="1.0"?>
<!-- A fragment of a book store inventory database -->
<bookstore>
  <book genre="novel" publicationdate="1997" ISBN="1-861001-57-8">
    <title>Pride And Prejudice</title>
    <author>
      <first-name>Jane</first-name>
      <last-name>Austen</last-name>
    </author>
    <price>24.95</price>
  </book>
  <book genre="novel" publicationdate="1992" ISBN="1-861002-30-1">
    <title>The Handmaid's Tale</title>
    <author>
      <first-name>Margaret</first-name>
      <last-name>Atwood</last-name>
    </author>
    <price>29.95</price>
  </book>
  <book genre="novel" publicationdate="1991" ISBN="1-861001-57-6">
    <title>Emma</title>
    <author>
      <first-name>Jane</first-name>
      <last-name>Austen</last-name>
    </author>
    <price>19.95</price>
  </book>
  <book genre="novel" publicationdate="1982" ISBN="1-861001-45-3">
    <title>Sense and Sensibility</title>
    <author>
      <first-name>Jane</first-name>
      <last-name>Austen</last-name>
    </author>
    <price>19.95</price>
  </book>
</bookstore>

I get (partial output):
the node named book has an attribute with
the name genre it's value is novel
the node named book has an attribute with
the name publicationdate it's value is 1997
the node named book has an attribute with
the name ISBN it's value is 1-861001-57-8
the node named book has an attribute with
the name genre it's value is novel
the node named book has an attribute with
the name publicationdate it's value is 1992
the node named book has an attribute with
the name ISBN it's value is 1-861002-30-1
the node named book has an attribute with
the name genre it's value is novel

Regards
Friedrich

Expert Comment

by:fridom
ID: 16852096
Sorry for the follow-up. In MSXML the same thing looks like this:
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <oaidl.h>
#include <objbase.h>
#include <oleauto.h>
#include <assert.h>
/* helpers not shown here: dom_from_com, parse_xml_file, clean_var, clean_sys_string */
#include "helper.h"

#include "xml_utils.h"
#ifdef __LCC__
#include "msxml4.h"
#else
#import <msxml4.dll> raw_interfaces_only
using namespace MSXML2;
#endif
#ifndef HRCALL
/* evaluate a COM call; on failure report it and jump to the local "clean" label */
#define HRCALL(a, errmsg) \
do { \
    h_rval = (a); \
    if (FAILED(h_rval)) { \
        fprintf(stderr, "%s:%d  HRCALL Failed: %s\n  0x%.8lx = %s\n", \
                __FILE__, __LINE__, errmsg, (unsigned long) h_rval, #a ); \
        goto clean; \
    } \
} while (0)
#endif

static void print_attributes_fun (IXMLDOMNode *node){
      BSTR node_name = NULL;
      BSTR attribute_name = NULL;
      VARIANT attribute_value;
      HRESULT h_rval;
      LONG count = 0;
      IXMLDOMNamedNodeMap *attributes = NULL;
      IXMLDOMNode *an_attribute = NULL;
      /* don't forget this or your program will crash */
      VariantInit(&attribute_value);
      HRCALL(node->get_nodeName(&node_name), "getting node name");

      /* according to the docs it is usually ok to treat a BSTR as an OLECHAR* */
      if (0 == wcscmp(L"book", node_name)){
            /* you've seen this before in the has_attributes function */
            HRCALL(node->get_attributes(&attributes), "fetching attributes");
            if (attributes){
                  /* how many attributes are there? */
                  HRCALL(attributes->get_length(&count), "getting count of attributes");
                  for (LONG i = 0; i < count; ++i){
                        /* each item is itself an IXMLDOMNode with all
                           the functions of it */
                        HRCALL(attributes->get_item(i, &an_attribute), "getting attribute");
                        HRCALL(an_attribute->get_nodeName(&attribute_name), "getting attribute name");
                        /* the value is a VARIANT, so you have to deal with this
                           union; for an attribute it holds a BSTR */
                        HRCALL(an_attribute->get_nodeValue(&attribute_value), "getting attribute value");
                        printf("the node named %ls has an attribute with\n"
                               "the name %ls it's value is %ls\n", node_name,
                               attribute_name, V_BSTR(&attribute_value));
                        /* if you do not release these three on every pass, your application will leak! */
                        VariantClear(&attribute_value);
                        clean_sys_string(&attribute_name);
                        clean_var(&an_attribute, "an attribute");
                  }
            } else {
                  fprintf(stderr, "No attributes?\n");
            }
      }

      clean:
            clean_sys_string(&node_name);
            clean_var(&attributes, "attributes");
            clean_var(&an_attribute, "an attribute");
            clean_sys_string(&attribute_name);
            VariantClear(&attribute_value);
}


/* visit every descendant of el, depth first */
static void walk(IXMLDOMNode *el){
      IXMLDOMNode *cur = NULL;
      IXMLDOMNode *next = NULL;
      HRESULT h_rval;
      /* get the first child (NULL if there is none) */
      HRCALL(el->get_firstChild(&cur), "firstChild");
      while(cur){
            print_attributes_fun(cur);
            walk(cur); /* recurse down if the node contains other nodes */
            /* go to the next child; release the current one first,
               otherwise every sibling leaks a reference */
            HRCALL(cur->get_nextSibling(&next), "nextSibling");
            cur->Release();
            cur = next;
            next = NULL;
      }

      clean:
            /* only non-NULL if a call above failed while cur was still held */
            clean_var(&cur, "cur");
}


static void show_reading_from_the_dom (IXMLDOMDocument2 *doc){
      IXMLDOMElement *root = NULL;
      BSTR node_name = NULL;
      HRESULT h_rval;
      HRCALL(doc->get_documentElement(&root), "getdocument");
      if (NULL == root){
            fprintf(stderr, "the document has no root element\n");
            goto clean;
      }
      HRCALL(root->get_nodeName(&node_name), "getNodeName");
      walk(root);

      if (node_name){
            printf("the name of the root node is %ls\n", node_name);
      }

      clean:
            clean_var(&root, "root element");
            clean_sys_string(&node_name);
}


int main(void){
      int result = EXIT_SUCCESS;
      int i_rval = 0;
      char file_name[] = "bookstore.xml";
      /* create the DOM document through COM (helper not shown) */
      IXMLDOMDocument2 *doc = dom_from_com();
      if (NULL == doc){
            return EXIT_FAILURE;
      }

      i_rval = parse_xml_file(doc, file_name);
      if (i_rval < 0){
            result = EXIT_FAILURE;
            goto clean;
      }
      show_reading_from_the_dom(doc);

      clean:
            clean_var(&doc, "cleaning the main COM object");
            return result;
}


And there are a lot of utilities (dom_from_com, parse_xml_file, clean_var, clean_sys_string) still not shown here.

If you want to do yourself a favour, you'd better use Visual Basic here ;-)

Regards
Friedrich

Author Comment

by:PaulCaswell
ID: 16852124
That sounds good Friedrich. Unfortunately I will be pulling the XML out of a CSV file and the XML will be in one of the fields.

Is there a mechanism I can use that allows me to pass the XML as a string to the parser?

Paul
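Since the XML sits inside one CSV field, it first has to be cut out and un-quoted: in the sample above, quotes inside the field are doubled (`enabled=""1""`). A sketch assuming the file follows the usual CSV convention (comma separator, fields optionally wrapped in double quotes, `""` for a literal quote); `csv_field` is a hypothetical helper, not part of any library. The resulting string could then be handed to a parser that reads from memory, such as libxml2's `xmlReadMemory`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy field `index` (0-based) of one CSV record into
   out, undoing the usual CSV quoting: a field wrapped in double quotes may
   contain commas, and "" inside it stands for a single ".
   Returns 1 on success, 0 if there is no such field or out is too small. */
int csv_field(const char *line, int index, char *out, size_t outlen)
{
    const char *p = line;
    int field;

    assert(line != NULL && out != NULL);
    if (outlen == 0)
        return 0;

    for (field = 0; ; field++) {
        size_t n = 0;                 /* length of the current field */
        int want = (field == index);

        if (*p == '"') {
            /* Quoted field: copy char by char, collapsing "" to ". */
            p++;
            while (*p != '\0') {
                if (*p == '"') {
                    if (p[1] != '"') { p++; break; }  /* closing quote */
                    p++;                              /* first of "" */
                }
                if (want) {
                    if (n + 1 >= outlen) return 0;
                    out[n] = *p;
                }
                n++;
                p++;
            }
        } else {
            /* Unquoted field: runs to the next comma or end of line. */
            n = strcspn(p, ",\r\n");
            if (want) {
                if (n + 1 > outlen) return 0;
                memcpy(out, p, n);
            }
            p += n;
        }

        if (want) {
            out[n] = '\0';
            return 1;
        }
        if (*p != ',')
            return 0;                 /* the record has fewer fields */
        p++;                          /* skip the separator */
    }
}
```

Scanning the record once per wanted field is simple but repeats work; with millions of records it may be worth splitting all fields in a single pass instead.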

Author Comment

by:PaulCaswell
ID: 16852403
Friedrich,

I understand there are two main ways to walk XML, one using DOM and one much faster method (I forget what it is called).

Would it be fairly easy in my situation to use the faster one? I would prefer speed over completeness here. I don't need the integrity of the XML checked; I just want field names, attributes and values, and I wouldn't mind walking the whole tree.

Thanks,

Paul

Author Comment

by:PaulCaswell
ID: 16852593
Darn!

My boss just said it has to be done in Java! Oh well! Thanks for the help!

Paul

Expert Comment

by:fridom
ID: 16852690
OK, one question after the other:
1) Is there a mechanism to read XML from a string?
Yes, there is a function that takes a string (a memory buffer) instead of a file, e.g. xmlReadMemory.
2) DOM and SAX. SAX is a streaming API: you register callbacks that are invoked while the parser walks through the XML. Yes, you could use the SAX part of libxml2 to do the job, and yes, it's supposed to be much faster. DOM has to parse the whole file and build an internal tree representation of the XML. SAX does not care a dime; it just reacts when it reads the parts you are interested in.

However, you have probably heard of Ajax, and there you always have a DOM representation. One can expect the next round of hardware upgrades to satisfy the growing memory needs.


3) Not too bad
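To make the callback idea from point 2 concrete: in a SAX-style parser you hand over callbacks and the parser calls them as it reads, so no tree is kept in memory. The sketch below is a toy scanner written only to show that control flow; it is not libxml2's SAX interface (which uses an `xmlSAXHandler` struct of callbacks), and `toy_sax_scan`, `grab_field` and the other names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Toy SAX-style scanner, NOT libxml2's SAX API: it reports events through
   callbacks while reading and never builds a tree. Names and text are
   (pointer, length) slices of the input, so nothing is copied. */
struct toy_sax_handler {
    void (*start_element)(void *ctx, const char *name, size_t len);
    void (*end_element)(void *ctx, const char *name, size_t len);
    void (*characters)(void *ctx, const char *text, size_t len);
};

/* Returns 0 when the input is consumed, -1 on a tag with no closing '>'.
   Attributes, entities, CDATA and well-formedness are ignored. */
int toy_sax_scan(const char *s, const struct toy_sax_handler *h, void *ctx)
{
    assert(s != NULL && h != NULL);
    while (*s != '\0') {
        const char *gt, *name;
        size_t nlen;
        int closing;

        if (*s != '<') {                           /* text up to the next tag */
            size_t len = strcspn(s, "<");
            if (h->characters) h->characters(ctx, s, len);
            s += len;
            continue;
        }
        gt = strchr(s, '>');
        if (gt == NULL) return -1;
        if (s[1] == '?' || s[1] == '!') {          /* <?xml?>, <!-- --> (no '>' inside) */
            s = gt + 1;
            continue;
        }
        closing = (s[1] == '/');
        name = s + 1 + closing;
        nlen = strcspn(name, " \t\r\n/>");
        if (closing) {
            if (h->end_element) h->end_element(ctx, name, nlen);
        } else {
            if (h->start_element) h->start_element(ctx, name, nlen);
            if (gt[-1] == '/' && h->end_element)   /* <empty/> ends itself */
                h->end_element(ctx, name, nlen);
        }
        s = gt + 1;
    }
    return 0;
}

/* Example handler: capture the text of the first element named `tag`. */
struct grab_ctx {
    const char *tag;
    int inside, found;   /* inside: between <tag> and </tag> */
    char *out;
    size_t outlen, used;
};

static int is_tag(const struct grab_ctx *g, const char *name, size_t len)
{
    return strlen(g->tag) == len && strncmp(g->tag, name, len) == 0;
}

static void grab_start(void *ctx, const char *name, size_t len)
{
    struct grab_ctx *g = ctx;
    if (!g->found && is_tag(g, name, len)) g->inside = 1;
}

static void grab_end(void *ctx, const char *name, size_t len)
{
    struct grab_ctx *g = ctx;
    if (g->inside && is_tag(g, name, len)) { g->inside = 0; g->found = 1; }
}

static void grab_text(void *ctx, const char *text, size_t len)
{
    struct grab_ctx *g = ctx;
    if (g->inside && g->used + len < g->outlen) {  /* text that won't fit is dropped */
        memcpy(g->out + g->used, text, len);
        g->used += len;
        g->out[g->used] = '\0';
    }
}

/* Returns 1 and fills out if `tag` was found and closed, else 0. */
int grab_field(const char *xml, const char *tag, char *out, size_t outlen)
{
    struct grab_ctx g = { 0 };
    struct toy_sax_handler h = { grab_start, grab_end, grab_text };
    if (outlen == 0) return 0;
    g.tag = tag; g.out = out; g.outlen = outlen;
    out[0] = '\0';
    return toy_sax_scan(xml, &h, &g) == 0 && g.found;
}
```

The handler only keeps a few flags and the output buffer, which is why this style scales to millions of records: memory use does not depend on the size of the document.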

Have a nice day
Friedrich

Expert Comment

by:Anu2117
ID: 22819591
Hi Friedrich,
 
    I was wondering if it's possible to have the recursive function return a string instead of void. I want to modify the code from the accepted solution (reposted below) so that whatever currently goes to the print statements goes into a string instead, and walk() returns that string.
Thanks,
 
[Friedrich's libxml2 example from the accepted solution, reposted unchanged.]

Expert Comment

by:fridom
ID: 22819681
Well, it's programming, so you can replace void (*action)(xmlNode*) with whatever you like.
If you want a string, just use char* (*action)(xmlNode*) and rewrite the walk function.
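A variant of this suggestion keeps the walk() recursion but gives the action a caller-owned context pointer to accumulate into, instead of returning a char*. The sketch assumes a hypothetical `toy_node` struct that mimics only the `name`, `next` and `children` fields of libxml2's xmlNode, so it compiles without libxml2; `walk_ctx`, `append_name` and `name_list` are illustrative names.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for libxml2's xmlNode: only the fields walk()
   uses, linked the same way (siblings via next, first child via children). */
struct toy_node {
    const char *name;
    struct toy_node *next;
    struct toy_node *children;
};

/* Same recursion as walk(), but the action also gets a caller-owned
   context pointer, so results are accumulated instead of printed. */
static void walk_ctx(struct toy_node *node,
                     void (*action)(struct toy_node *, void *), void *ctx)
{
    struct toy_node *cur;
    for (cur = node; cur != NULL; cur = cur->next) {
        action(cur, ctx);
        walk_ctx(cur->children, action, ctx);
    }
}

/* Example context: a fixed-size buffer of space-separated names. */
struct name_list {
    char buf[256];
};

/* Example action: append the node's name; output is truncated if full. */
static void append_name(struct toy_node *node, void *ctx)
{
    struct name_list *list = ctx;
    size_t used = strlen(list->buf);
    size_t room = sizeof list->buf - 1 - used;  /* chars left before the '\0' */

    assert(node != NULL && node->name != NULL);
    if (used > 0 && room > 0) {
        strcat(list->buf, " ");
        room--;
    }
    strncat(list->buf, node->name, room);
}
```

With libxml2 itself the action type would become `void (*)(xmlNode *, void *)`, and walk() would pass the context through unchanged on every recursive call.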

I can't tell what you want to achieve, and your post does not really state that. I have no idea what you mean by the string as a return value.

So you can use something like

char *very_long_string = malloc(1024 * 1024); /* around 1 MB */
then in print_attributes you append to that string instead of printing, or return something longer.

 for(attr = node->properties; attr; attr = attr->next){
       /* instead of printing, append this text to your string */
       printf("the node named %s has an attribute with\n"
              "the name %s it's value is %s\n", node->name,
              attr->name, attr->children->content);
 }

Use something like append_to_string, which just appends that text to a string. If you do not want to do that "by hand", have a look at e.g. GLib, which gives you quite a lot of handy functions for handling strings:
http://library.gnome.org/devel/glib/stable/
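A minimal sketch of the append_to_string idea: a hand-rolled growable buffer with a printf-style append, so print_attributes can build its text instead of printing it. The names `strbuf`, `strbuf_appendf` and `strbuf_free` are hypothetical; GLib's `GString` (`g_string_append_printf`) provides the same thing ready-made. It assumes C99 `vsnprintf` semantics (returning the required length when given a NULL buffer); VS6's `_vsnprintf` behaves differently.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string (compare GLib's GString). */
struct strbuf {
    char *data;   /* NUL-terminated once the first append succeeds */
    size_t len;   /* characters used, excluding the NUL */
    size_t cap;   /* bytes allocated */
};

/* printf-style append that grows the buffer as needed.
   Returns 0 on success, -1 on allocation or formatting failure. */
int strbuf_appendf(struct strbuf *sb, const char *fmt, ...)
{
    va_list ap;
    int needed;

    assert(sb != NULL && fmt != NULL);
    va_start(ap, fmt);
    needed = vsnprintf(NULL, 0, fmt, ap);   /* first pass: measure only */
    va_end(ap);
    if (needed < 0)
        return -1;

    if (sb->len + (size_t)needed + 1 > sb->cap) {
        size_t new_cap = sb->cap ? sb->cap : 64;
        char *p;
        while (new_cap < sb->len + (size_t)needed + 1)
            new_cap *= 2;                   /* doubling keeps appends cheap */
        p = realloc(sb->data, new_cap);
        if (p == NULL)
            return -1;
        sb->data = p;
        sb->cap = new_cap;
    }

    va_start(ap, fmt);
    vsnprintf(sb->data + sb->len, sb->cap - sb->len, fmt, ap);  /* second pass: write */
    va_end(ap);
    sb->len += (size_t)needed;
    return 0;
}

void strbuf_free(struct strbuf *sb)
{
    free(sb->data);
    sb->data = NULL;
    sb->len = sb->cap = 0;
}
```

Inside print_attributes you would call `strbuf_appendf(sb, ...)` with the same format string where it now calls printf, with `sb` reaching the action through a context pointer or a return value as discussed above.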

If you clarify what you would like to see and do not have the means to do it yourself, you might look for someone able to implement it for you.

Regards
Friedrich

