
Want simple-to-use free XML parser library.

Hi All,

Does anyone know of a free XML parser library I can link with in VS 6 (not .NET)?

I will read XML data out of a text file and I need to pull the values of known fields out of it. The chunks won't be more than a few KB in length. I need reasonable performance, as there may be a few million records in the data file, but each record can be treated as a separate XML fragment.

If this is already available in VS6, could someone post some sample code showing how to pull fields out of the XML?

Here's a sample fragment:

<Description enabled=""1"">3 For 2 Briefs</Description><Description country=""GB"" language=""en"" variant="""">3 For 2 Briefs</Description>",1970-01-01 00:00:00,2069-12-31 00:00:00,1003,1127988423781,2005-09-22 14:35:25,"<Promotion><LastUpdated>2005-09-22T14:35:25+01:00</LastUpdated><MajorVersion>1</MajorVersion><MinorVersion>3</MinorVersion><PromotionID>9980051</PromotionID><Description enabled=""1"">3 For 2 Briefs</Description><Description country=""GB"" language=""en"" variant="""">3 For 2 Briefs</Description><MultibuyGroup><CriterionID>1</CriterionID><ThresholdType>1</ThresholdType><ThresholdValue>0.0</ThresholdValue><ThresholdValue currency=""GBP"">3.0</ThresholdValue><MinItemPriceRange>0.0</MinItemPriceRange><MinItemPriceRange currency=""GBP"">0.0</MinItemPriceRange><MaxItemPriceRange>0.0</MaxItemPriceRange><MaxItemPriceRange currency=""GBP"">0.0</MaxItemPriceRange><RewardType>7</RewardType><RewardValue>0.0</RewardValue><RewardValue currency=""GBP"">0.0</RewardValue><EffectiveRewardValue>0.0</EffectiveRewardValue><EffectiveRewardValue currency=""GBP"">0.0</EffectiveRewardValue><AccountToDept></AccountToDept><AlertThresholdValue>0.0</AlertThresholdValue><AlertThresholdValue currency=""GBP"">0.0</AlertThresholdValue><UseFixedValueInBestDeal>0</UseFixedValueInBestDeal><GroupDescription>3 for 2 breifs</GroupDescription><Rolling>0</Rolling><UniqueItems>0</UniqueItems><AllItems>0</AllItems><RoundingRule>2</RoundingRule><ProductID>1204071003</ProductID></MultibuyGroup><Timetable><CriterionID></CriterionID><StartDate>1970-01-01T00:00:00+00:00</StartDate><FinishDate>2069-12-31T00:00:00+00:00</FinishDate></Timetable><Notes enabled=""1""></Notes><AlertMessage enabled=""1""></AlertMessage></Promotion>

It defines a store promotion. I want to pull out fields from it such as RewardType=7 and RewardValue=0.0.
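The sample looks like it was copied straight out of the CSV file: the XML field is wrapped in double quotes and every quote inside it is doubled (enabled=""1""), which is the usual CSV escaping. Whichever parser you choose, that escaping has to be undone first, because as it stands the field is not well-formed XML. A minimal sketch in plain C, assuming the field has already been isolated from the record; csv_unquote is a hypothetical helper, not a library function:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: undo CSV quoting on one field.
   "<a x=""1""/>"  becomes  <a x="1"/>
   One pair of enclosing quotes (if present) is dropped and each "" becomes ".
   Returns a newly malloc'd string the caller must free(), or NULL if the
   allocation fails. */
char *csv_unquote(const char *field, size_t len)
{
    char *out = malloc(len + 1);   /* the result is never longer than the input */
    size_t i = 0, j = 0;

    if (out == NULL)
        return NULL;

    /* strip one pair of enclosing quotes */
    if (len >= 2 && field[0] == '"' && field[len - 1] == '"') {
        i = 1;
        len -= 1;                  /* stop before the closing quote */
    }

    while (i < len) {
        if (field[i] == '"' && i + 1 < len && field[i + 1] == '"')
            i++;                   /* skip the first quote of a "" pair */
        out[j++] = field[i++];
    }
    out[j] = '\0';
    return out;
}
```

For example, the raw field "<Notes enabled=""1""></Notes>" (including its enclosing quotes) comes back as <Notes enabled="1"></Notes>, which any XML parser will accept.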

Thanks all.

Paul

Asked by PaulCaswell
 
fridom (CEO/Programmer) commented:
No problem, Paul, just use libxml2. Its homepage is here:
http://xmlsoft.org/

Or, if you can stand COM programming with C and are running on Windows, you can use MSXML.

Extracting information can be done in different ways: you can walk the tree recursively, or you can use an XPath expression. In your case an XPath expression seems appropriate.

Here's some code to give you an idea (this one walks the tree rather than using XPath):
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <libxml/tree.h>
#include <libxml/parser.h>
#include <libxml/xmlreader.h>




xmlDocPtr parse_xml_file(char *file_name){
      xmlDocPtr result = NULL;
      assert(file_name);
      result = xmlReadFile(file_name, NULL, XML_PARSE_PEDANTIC);
      if (NULL == result) {
            fprintf(stderr, "Parsing of %s failed\n", file_name);
      }
      return result;
}

static void print_node_information (xmlNode *node){
      switch (node->type) {
      case XML_ELEMENT_NODE:
            printf("type = element, name = %s\n", (char*) node->name);
            break;
      case XML_TEXT_NODE:
            printf("type = text, name = %s, value = %s\n", (char*) node->name,
                              (char*) node->content);
            break;
      case XML_ATTRIBUTE_NODE:
            printf("type = attribute, name = %s, value = %s\n", (char*) node->name,
                             (char*)node->content);
            break;
    default:
            printf("not yet provided\n");
            break;
      }
}

static void walk (xmlNode *node, void (*action) (xmlNode*)){
      xmlNode *cur = NULL;
      for(cur = node; cur; cur = cur->next){
            if (! xmlIsBlankNode(cur)){
                  action(cur);
            }
            walk(cur->children, action);
      }
}


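static void print_attributes (xmlNode *node){
      xmlAttr *attr = NULL;
      if (node->type == XML_ELEMENT_NODE){
            for(attr = node->properties; attr; attr = attr->next){
                  /* an empty attribute such as variant="" has no text
                     child, so guard against a NULL children pointer */
                  const xmlChar *value = (attr->children && attr->children->content)
                        ? attr->children->content : (const xmlChar *) "";
                  printf("the node named %s has an attribute with\n"
                         "the name %s it's value is %s\n", (char*) node->name,
                         (char*) attr->name, (char*) value);
            }
      }
}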

int main(void){
      char *file_name = "bookstore.xml";
      xmlDoc *doc = NULL;
    xmlNode *root_element = NULL;

    LIBXML_TEST_VERSION

      doc = parse_xml_file(file_name);
      if (NULL == doc){
            exit(EXIT_FAILURE);
      }

      root_element = xmlDocGetRootElement(doc);
      if (NULL == root_element){
            fprintf(stderr, "Could not get the root_element\n");
            goto clean;
      }

      walk(root_element,  print_attributes);

      clean:
            if (doc) xmlFreeDoc(doc);
            xmlCleanupParser();



      return 0;
}

With the following XML file:
<?xml version="1.0"?>
<!-- A fragment of a book store inventory database -->
<bookstore>
  <book genre="novel" publicationdate="1997" ISBN="1-861001-57-8">
    <title>Pride And Prejudice</title>
    <author>
      <first-name>Jane</first-name>
      <last-name>Austen</last-name>
    </author>
    <price>24.95</price>
  </book>
  <book genre="novel" publicationdate="1992" ISBN="1-861002-30-1">
    <title>The Handmaid's Tale</title>
    <author>
      <first-name>Margaret</first-name>
      <last-name>Atwood</last-name>
    </author>
    <price>29.95</price>
  </book>
  <book genre="novel" publicationdate="1991" ISBN="1-861001-57-6">
    <title>Emma</title>
    <author>
      <first-name>Jane</first-name>
      <last-name>Austen</last-name>
    </author>
    <price>19.95</price>
  </book>
  <book genre="novel" publicationdate="1982" ISBN="1-861001-45-3">
    <title>Sense and Sensibility</title>
    <author>
      <first-name>Jane</first-name>
      <last-name>Austen</last-name>
    </author>
    <price>19.95</price>
  </book>
</bookstore>

I get this output (partial):
the node named book has an attribute with
the name genre it's value is novel
the node named book has an attribute with
the name publicationdate it's value is 1997
the node named book has an attribute with
the name ISBN it's value is 1-861001-57-8
the node named book has an attribute with
the name genre it's value is novel
the node named book has an attribute with
the name publicationdate it's value is 1992
the node named book has an attribute with
the name ISBN it's value is 1-861002-30-1
the node named book has an attribute with
the name genre it's value is novel

Regards
Friedrich
 
fridom (CEO/Programmer) commented:
Sorry for the follow-up. In MSXML the same thing looks like this:
#include <oaidl.h>
#include <objbase.h>
#include <oleauto.h>
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include "helper.h"

#include "xml_utils.h"
#ifdef __LCC__
#include "msxml4.h"
#else
#import <msxml4.dll> raw_interfaces_only
using namespace MSXML2;
#endif
#ifndef HRCALL
#define HRCALL(a, errmsg) \
do { \
    h_rval = (a); \
    if (FAILED(h_rval)) { \
        fprintf(stderr, "%s:%d  HRCALL Failed: %s\n  0x%.8x = %s\n", \
                __FILE__, __LINE__, errmsg, h_rval, #a ); \
        goto clean; \
    } \
} while (0)
#endif

static void print_attributes_fun (IXMLDOMNode *node){
      BSTR node_name = NULL;
      BSTR attribute_name = NULL;
      VARIANT attribute_value;
      HRESULT h_rval;
      LONG count;
      IXMLDOMNamedNodeMap *attributes = NULL;
      IXMLDOMNode *an_attribute = NULL;
      /* don't forget this or your program will crash */
      VariantInit(&attribute_value);
      HRCALL(node->get_nodeName(&node_name), "getting node name");
      printf("length of node_name = %d\n", SysStringByteLen(node_name));

      /* according to the docs it is usually OK to treat a BSTR as an OLECHAR* */
      if (0 == wcscmp(L"book", node_name)){
            HRCALL(node->get_attributes(&attributes), "fetching attributes");
            /* you've seen this before in the has_attributes function */
            if (attributes){
                  /* how many attributes are there? */
                  HRCALL(attributes->get_length(&count), "getting count of attributes");
                  for (int i = 0; i < count; ++i){
                        /* each item is itself an IXMLDOMNode with all
                           of its functions
                        */
                        HRCALL(attributes->get_item(i, &an_attribute), "getting attribute");
                        HRCALL(an_attribute->get_nodeName(&attribute_name), "getting attribute name");
                        HRCALL(an_attribute->get_nodeValue(&attribute_value), "getting attribute value");
                        /* the value is a VARIANT, so you have to deal with this
                           union; for an attribute it holds a BSTR
                        */
                        printf("the node named %ls has an attribute with\n"
                               "the name %ls it's value is %ls\n", node_name,
                                       attribute_name, V_BSTR(&attribute_value));
                        VariantClear(&attribute_value);
                        clean_sys_string(&attribute_name);
                        clean_var(&an_attribute, "an attribute");
                        /* if you do not do the above three lines your application will leak! */
                  }
            } else {
                  fprintf(stderr, "No attributes?\n");
            }
      }

      clean:
            clean_sys_string(&node_name);
            clean_var(&attributes, "attributes");
            clean_var(&an_attribute, "an attribute");
            clean_sys_string(&attribute_name);
            VariantClear(&attribute_value);

}


static void walk(IXMLDOMNode *el){
      IXMLDOMNode *cur = NULL;
      IXMLDOMNode *next = NULL;
      HRESULT h_rval;
      /* get the first child */
      HRCALL(el->get_firstChild(&cur), "firstChild");
      while(cur){
            print_attributes_fun(cur);
            walk(cur); /* recurse down if the node contains other nodes */
            /* go to the next child; release the current node first,
               otherwise every sibling except the last one leaks */
            HRCALL(cur->get_nextSibling(&next), "nextSibling");
            clean_var(&cur, "cur");
            cur = next;
            next = NULL;
      }

      clean:
            clean_var(&cur, "cur");
            clean_var(&next, "next");
}


static void show_reading_from_the_dom (IXMLDOMDocument2 * doc){
      IXMLDOMElement *root = NULL;
      BSTR node_name = NULL;
      HRESULT h_rval;
      HRCALL(doc->get_documentElement(&root), "getdocument");
      HRCALL(root->get_nodeName(&node_name), "getNodeName");
      walk(root);

      if (node_name){
            printf("the name of the root node is %ls\n", node_name);
      }

      clean_sys_string(&node_name);
      clean:
            clean_var(&root, "root element");
            if (node_name) clean_sys_string(&node_name);
}




int main(void){
      int result = EXIT_SUCCESS;
      int i_rval = 0;
      char *file_name = "bookstore.xml";
      /* one call only: calling dom_from_com() twice leaks the first object */
      IXMLDOMDocument2 *doc = dom_from_com();
      if (NULL == doc){
            result = EXIT_FAILURE;
            goto clean;
      }

      i_rval = parse_xml_file(doc, file_name);
      if (i_rval < 0){
            result = EXIT_FAILURE;
            goto clean;
      }
      show_reading_from_the_dom(doc);

      /* fall through so the document is released on success too */
      clean:
            clean_var(&doc, "cleaning the main COM object");
      return result;
}

And there are still a lot of utilities that are not shown here (dom_from_com, parse_xml_file, clean_var, clean_sys_string and so on).

If you want to do yourself a favour, you'd better go for Visual Basic here ;-)

Regards
Friedrich
 
PaulCaswell (Author) commented:
That sounds good, Friedrich. Unfortunately, I will be pulling the XML out of a CSV file, and the XML will be in one of the fields.

Is there a mechanism I can use that allows me to pass the XML as a string to the parser?
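Getting the XML column out of a CSV record is a small job in plain C, as long as commas inside quoted fields are respected. A minimal sketch, assuming RFC 4180-style quoting (fields containing commas or quotes are wrapped in double quotes, embedded quotes are doubled) and one record per line; csv_field is a hypothetical helper. It returns the raw field, so the doubled quotes still have to be undone before the text is handed to an XML parser as a string:

```c
#include <stddef.h>

/* Hypothetical helper: find field number `index` (0-based) in one CSV
   record. Returns a pointer to the raw field text (enclosing and doubled
   quotes still in place) and stores its length in *len, or returns NULL
   if the record has fewer fields. Commas and newlines inside "..." do not
   end a field. */
const char *csv_field(const char *line, int index, size_t *len)
{
    const char *start = line;
    const char *p;
    int in_quotes = 0;
    int field = 0;

    for (p = line; ; p++) {
        if (*p == '"') {
            in_quotes = !in_quotes;   /* a doubled "" toggles twice: no net change */
        } else if (*p == '\0' || (!in_quotes && (*p == ',' || *p == '\n'))) {
            if (field == index) {
                *len = (size_t)(p - start);
                return start;
            }
            if (*p != ',')
                return NULL;          /* end of record: not enough fields */
            field++;
            start = p + 1;
        }
    }
}
```

The function does not copy anything; it only points into the line you already have, which keeps the per-record cost low when there are millions of lines.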

Paul
 
PaulCaswell (Author) commented:
Friedrich,

I understand there are two main ways to walk XML, one using DOM and one much faster method (I forget what it is called).

Would it be fairly easy in my situation to use the faster one? I would prefer speed over completeness here. I don't need the integrity of the XML checked; I just want field names, attributes and values, and I wouldn't mind walking the whole tree.

Thanks,

Paul
 
PaulCaswell (Author) commented:
Darn!

My boss just said it has to be done in Java! Oh well! Thanks for the help!

Paul
 
fridom (CEO/Programmer) commented:
OK, one question at a time:
1) Is there a mechanism to read XML from a string?
Yes, there is a function that takes a string (a memory buffer) instead of a file name, e.g. xmlReadMemory().
2) DOM and SAX. SAX is a streaming API: you register callbacks that the parser calls while it reads through the XML. Yes, you could use the SAX part of libxml2 to do the job, and yes, it's supposed to be much faster. DOM has to parse the whole file and build an in-memory tree representation of the XML. SAX builds nothing; it just reports what it reads, and you react to the parts you are interested in.

However, you have probably heard of Ajax, and there you always have a DOM representation. One can expect the next round of upgrades to cover the growing memory needs.


3) Java: not too bad.

Have a nice day
Friedrich
 
Anu2117 commented:
Hi Friedrich,

I was wondering if it's possible to have the recursive function return a string instead of void. I want to modify the code below so that whatever currently goes to the printf statements goes into a string instead, and walk() returns that string.
Thanks,
 
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <libxml/tree.h>
#include <libxml/parser.h>
#include <libxml/xmlreader.h>




xmlDocPtr parse_xml_file(char *file_name){
      xmlDocPtr result = NULL;
      assert(file_name);
      result = xmlReadFile(file_name, NULL, XML_PARSE_PEDANTIC);
      if (NULL == result) {
            fprintf(stderr, "Parsing of %s failed\n", file_name);
      }
      return result;
}

static void print_node_information (xmlNode *node){
      switch (node->type) {
      case XML_ELEMENT_NODE:
            printf("type = element, name = %s\n", (char*) node->name);
            break;
      case XML_TEXT_NODE:
            printf("type = text, name = %s, value = %s\n", (char*) node->name,
                              (char*) node->content);
            break;
      case XML_ATTRIBUTE_NODE:
            printf("type = attribute, name = %s, value = %s\n", (char*) node->name,
                             (char*)node->content);
            break;
    default:
            printf("not yet provided\n");
            break;
      }
}

static void walk (xmlNode *node, void (*action) (xmlNode*)){
      xmlNode *cur = NULL;
      for(cur = node; cur; cur = cur->next){
            if (! xmlIsBlankNode(cur)){
                  action(cur);
            }
            walk(cur->children, action);
      }
}


static void print_attributes (xmlNode *node){
      xmlAttr *attr =NULL;
      if (node->type == XML_ELEMENT_NODE){
            if (node->properties){
                  for(attr = node->properties; attr; attr= attr->next){
                        printf("the node named %s has an attribute with\n"
                               "the name %s it's value is %s\n", node->name,
                                       attr->name, attr->children->content);

                  }
            }
      }
}

int main(void){
      char *file_name = "bookstore.xml";
      xmlDoc *doc = NULL;
    xmlNode *root_element = NULL;

    LIBXML_TEST_VERSION

      doc = parse_xml_file(file_name);
      if (NULL == doc){
            exit(EXIT_FAILURE);
      }

      root_element = xmlDocGetRootElement(doc);
      if (NULL == root_element){
            fprintf(stderr, "Could not get the root_element\n");
            goto clean;
      }

      walk(root_element,  print_attributes);

      clean:
            if (doc) xmlFreeDoc(doc);
            xmlCleanupParser();



      return 0;
}
 
fridom (CEO/Programmer) commented:
Well, it's programming, so you can replace void (*action)(xmlNode*) with whatever you like.
If you want a string, just use char *(*action)(xmlNode*) and rewrite the walk function.

I can't tell what you'd like to achieve, and your message does not really state it. I have no idea what you mean by the string as a return value.

So you can use something like

char *very_long_string = malloc(1024 * 1024); /* around 1 MB */
and then, in print_attributes, append to that string (or return a longer one) instead of printing, e.g. in this loop:

for(attr = node->properties; attr; attr = attr->next){
      /* instead of printf, append this text to your string */
      printf("the node named %s has an attribute with\n"
             "the name %s it's value is %s\n", node->name,
             attr->name, attr->children->content);
}

Use something like an append_to_string function that just appends that text to the string. If you don't want to do that by hand, have a look at e.g. glib, which gives you quite a few handy functions for handling strings:
http://library.gnome.org/devel/glib/stable/
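If you would rather not pull in glib, an append_to_string-style helper is only a few lines of standard C. This is a minimal sketch assuming a C99 vsnprintf (Visual C++ 6 only has the non-standard _vsnprintf, which behaves differently when the buffer is too small); strbuf and append_to_string are hypothetical names. The buffer grows by doubling, so there is no need to guess a size up front:

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string; initialise as {NULL, 0, 0}. */
typedef struct {
    char *data;
    size_t len;   /* characters used, excluding the terminating NUL */
    size_t cap;   /* bytes allocated */
} strbuf;

/* printf-style append. Returns 0 on success, -1 on error. */
int append_to_string(strbuf *sb, const char *fmt, ...)
{
    va_list ap;
    int needed;
    size_t required;

    va_start(ap, fmt);
    needed = vsnprintf(NULL, 0, fmt, ap);   /* first pass: measure only */
    va_end(ap);
    if (needed < 0)
        return -1;

    required = sb->len + (size_t)needed + 1;
    if (required > sb->cap) {
        size_t new_cap = sb->cap ? sb->cap : 64;
        char *p;
        while (new_cap < required)
            new_cap *= 2;                   /* grow geometrically */
        p = realloc(sb->data, new_cap);
        if (p == NULL)
            return -1;
        sb->data = p;
        sb->cap = new_cap;
    }

    va_start(ap, fmt);                      /* second pass: write */
    vsnprintf(sb->data + sb->len, sb->cap - sb->len, fmt, ap);
    va_end(ap);
    sb->len += (size_t)needed;
    return 0;
}
```

In print_attributes you would then call append_to_string(&sb, ...) with the same format string and arguments as the printf, passing a pointer to one strbuf down through walk, and free(sb.data) when you are done with the result.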

If you could clarify what you'd like to see, and you don't have the means to do it yourself, you might look for someone who can implement it for you.

Regards
Friedrich

