XmlTextReader or DOM/XPath? Custom forward only parsing class?

Posted on 2004-03-30
Medium Priority
Last Modified: 2012-08-13

My web app uses content parsed from XML files with an XmlTextReader.

I have read that the XmlTextReader uses a forward-only pull model that is much less memory intensive than the DOM.

I am not positive, but I think that if my page class uses an XmlTextReader, the content from the reader is not cached. Instead, this part of the page is rendered by my XmlTextReader code on every request. This represents a substantial amount of code, because I am storing a lot of XHTML structural markup in my XML files along with metadata, workflow information, etc.

At this point my app is starting to look like a big plate of spaghetti. I've got a bunch of counter variables, collections, and conditions. I don't have any formal OO training, but I am feeling like this would be a good time for me to write a class that parses my XML taken from the XmlTextReader. I'm also using this same logic in a number of places.

What I would really like to know is whether this makes any sense at all? Am I wasting my time doing this? If ASP.NET would serve cached pages until changes were made to the XML files, I could just use the DOM and life would be easier. On the other hand if this is not possible it seems like writing this custom class would be handy?

I have read that performance is improved by up to 5 times with forward-only parsing. If this is the case, it seems like it would make sense to write a class to help with this logic? For example, I've noticed that the XmlTextReader reports the same Name property for both the opening and the closing tag of an element. So I've written logic that keeps track with counter variables and checks whether I've hit the closing tag with a statement like
if(subSwitchCount % 2 == 0)

This allows me to store the keywords below in a multidimensional array so the categories remain separate.

  <item title="Home">
    <keyword>help seeking</keyword>
    <keyword>information seeking</keyword>
    <keyword>social networks</keyword>
    <keyword>community information</keyword>
    <keyword>information psychology</keyword>
    <keyword>information seeking contexts</keyword>
    <keyword>University of Washington</keyword>
    <keyword>University of Michigan</keyword>
    <keyword>Information School</keyword>
    <keyword>School of Information</keyword>
    <description>Information Behavior in Everyday Contexts (IBEC) - </description>
  </item>
  <item title="About IBEC">
    <keyword>Contact information</keyword>
    <keyword>Funding organizations</keyword>
    <keyword>Best practice</keyword>
    <keyword>Information products</keyword>
    <keyword>Information delivery</keyword>
    <description>The following page contains detailed information about IBEC’s research efforts to maximize the impact of information in communities.</description>
  </item>
  <item title="Projects">
    <keyword>Field studies</keyword>
    <keyword>United Way</keyword>
    <keyword>Community Programming</keyword>
    <keyword>Info grounds</keyword>
    <keyword>Health Information</keyword>
    <keyword>Tipping points</keyword>
    <keyword>After school</keyword>
    <description>At IBEC we conduct field studies of real people in real situations by partnering with government, corporate and nonprofit organizations.  Descriptions of current and past projects are provided through this page.</description>
  </item>
  <item title="Publications">
    <description>Publications - </description>
  </item>
  <item title="Tools and Resources">
    <keyword>IBEC database</keyword>
    <description>Several tools and resources developed and utilized by IBEC are presented on this page.</description>
  </item>

In my amateur estimation this approach would sort of be like having DOM functionality a la carte.
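
For illustration, here is a minimal sketch of what such a parsing class might look like. It is not the code from my app: the class name KeywordParser, the <items> root element, and the shortened sample data are assumptions. It checks the reader's NodeType so that only start tags are handled, which means no open/close counter is needed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper: collects the <keyword> text of each <item>, keyed by title.
public static class KeywordParser
{
    public static Dictionary<string, List<string>> Parse(TextReader input)
    {
        var result = new Dictionary<string, List<string>>();
        List<string> current = null;

        using (var reader = new XmlTextReader(input))
        {
            while (reader.Read())
            {
                // React to start tags only. End tags arrive as
                // XmlNodeType.EndElement and are skipped here.
                if (reader.NodeType != XmlNodeType.Element) continue;

                if (reader.Name == "item")
                {
                    string title = reader.GetAttribute("title") ?? "";
                    current = new List<string>();
                    result[title] = current;
                }
                else if (reader.Name == "keyword" && current != null)
                {
                    // ReadString returns the element's text and leaves the
                    // reader on the closing </keyword> tag.
                    current.Add(reader.ReadString());
                }
            }
        }
        return result;
    }

    public static void Main()
    {
        // Assumed sample: a shortened, well-formed version of the data above,
        // wrapped in a hypothetical <items> root element.
        const string xml = @"<items>
  <item title=""Home"">
    <keyword>help seeking</keyword>
    <keyword>social networks</keyword>
    <description>Information Behavior in Everyday Contexts (IBEC)</description>
  </item>
  <item title=""Publications"">
    <description>Publications</description>
  </item>
</items>";

        foreach (var entry in Parse(new StringReader(xml)))
        {
            Console.WriteLine(entry.Key + ": " + string.Join(", ", entry.Value));
        }
    }
}
```

Each category keeps its own list, so a multidimensional array and the modulo check would both go away under these assumptions.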

Any thoughts would be much appreciated.

Thanks in advance.
Question by:coltrane2003

Expert Comment

ID: 10723702
If you are not modifying your XML doc then you don't need the DOM.

You can get more functionality (and avoid the counting) without the full cost of the DOM by using an XPathDocument. This allows you to pick out nodes and collections of nodes without having to count open and close tags.
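
A rough sketch of that idea, assuming the items sit under a root element called <items> (the root name, the class name KeywordQuery, and the trimmed sample data are made up for the example):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class KeywordQuery
{
    // Returns the keywords of the item with the given title.
    // Assumes the title contains no double-quote character.
    public static List<string> KeywordsFor(XPathNavigator nav, string title)
    {
        var keywords = new List<string>();
        XPathNodeIterator it = nav.Select("//item[@title=\"" + title + "\"]/keyword");
        while (it.MoveNext())
        {
            keywords.Add(it.Current.Value);
        }
        return keywords;
    }

    public static void Main()
    {
        // Assumed sample data, shortened from the question.
        const string xml = @"<items>
  <item title=""Home"">
    <keyword>help seeking</keyword>
    <keyword>social networks</keyword>
  </item>
  <item title=""About IBEC"">
    <keyword>Best practice</keyword>
  </item>
</items>";

        // XPathDocument is read-only; queries go through a navigator.
        var doc = new XPathDocument(new StringReader(xml));
        XPathNavigator nav = doc.CreateNavigator();

        foreach (string keyword in KeywordsFor(nav, "Home"))
        {
            Console.WriteLine(keyword);
        }
    }
}
```

The XPath expression does the work that the counters and conditions do in the reader-based version.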

Re caching:
If your XML doc does not change frequently, I suggest loading it into a structure that suits your processing needs (could be a DataSet) and caching that at the application level. Every time a new session accesses it, you can check whether the source file's date has changed. Alternatively, if changes are made through your app, you can clear your cached version every time a change is made and rebuild it on demand the next time a request for it comes in. I do this with a custom XML config file for a web app I've built.
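
A minimal sketch of both variants, under stated assumptions: FileBackedCache is a hypothetical class (not part of ASP.NET), and it compares the file's last-write time on every access rather than once per session. In a web app, one instance could be held in a static field or in application state.

```csharp
using System;
using System.IO;

// Hypothetical helper: caches a value built from a file and rebuilds it
// when the file's last-write time changes or when Invalidate is called.
public sealed class FileBackedCache<T>
{
    private readonly string path;
    private readonly Func<string, T> load;
    private readonly object gate = new object();
    private T value;
    private bool hasValue;
    private DateTime loadedStamp;

    public FileBackedCache(string path, Func<string, T> load)
    {
        this.path = path;
        this.load = load;
    }

    // Number of times the loader has run; exposed only to make the demo visible.
    public int LoadCount { get; private set; }

    public T Get()
    {
        lock (gate) // the cache is shared by all requests
        {
            DateTime stamp = File.GetLastWriteTimeUtc(path);
            if (!hasValue || stamp != loadedStamp)
            {
                value = load(path);
                loadedStamp = stamp;
                hasValue = true;
                LoadCount++;
            }
            return value;
        }
    }

    // For the case where edits go through the app: drop the cached copy
    // and let the next Get rebuild it on demand.
    public void Invalidate()
    {
        lock (gate) { hasValue = false; }
    }
}

public static class CacheDemo
{
    public static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "<items/>");

        var cache = new FileBackedCache<string>(path, File.ReadAllText);
        cache.Get();
        cache.Get(); // served from memory, file unchanged
        Console.WriteLine("Loads so far: " + cache.LoadCount);

        File.WriteAllText(path, "<items><item title=\"Home\"/></items>");
        File.SetLastWriteTimeUtc(path, DateTime.UtcNow.AddMinutes(1)); // force a new stamp
        Console.WriteLine(cache.Get());
        Console.WriteLine("Loads so far: " + cache.LoadCount);

        File.Delete(path);
    }
}
```

ASP.NET's built-in Cache object can also take a file-based CacheDependency, which may make a hand-rolled timestamp check unnecessary.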


Author Comment

ID: 10725787
It sounds like the XPathDocument can be instantiated with either an XmlTextReader or an XmlDocument? So I am assuming that as long as I instantiate it with the XmlTextReader, I get SAX-style processing and memory usage?

Are there any limitations to application-level storage? Pros/cons? My XML docs store the main page area markup for pages on my site. This markup is loaded into an RTE editor control in a separate content management tool. If app-level storage is where standard cached pages and controls go, then I don't see why there would be any pitfalls to your suggestion? What do you think?


Accepted Solution

monosodiumg earned 750 total points
ID: 10739148
In .NET you always have a pull model, not a push model. XmlTextReader does not actually give you SAX processing: you call methods on it rather than the other way round. Way less complicated than SAX.
XPathDocument can be instantiated with an XML reader, a string (a URI), a stream, or a TextReader (check the docs), but not with an XmlDocument (I don't see the point of building an XPathDocument once you've already paid the price of building an XmlDocument).
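
A short sketch of the reader-based constructor (the class name XPathDocumentSources and the sample XML are made up for the example). Note that the XPathDocument reads the whole input into its own read-only in-memory store when it is constructed, so passing it a reader does not make later queries streaming.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class XPathDocumentSources
{
    // Builds an XPathDocument from any XmlReader and counts <keyword> elements.
    public static int CountKeywords(XmlReader reader)
    {
        var doc = new XPathDocument(reader); // consumes the reader here
        XPathNavigator nav = doc.CreateNavigator();

        // XPath count() evaluates to a number, returned as a boxed double.
        return (int)(double)nav.Evaluate("count(//keyword)");
    }

    public static void Main()
    {
        const string xml =
            "<items><item title=\"Home\"><keyword>a</keyword><keyword>b</keyword></item></items>";

        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            Console.WriteLine(CountKeywords(reader));
        }
    }
}
```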

>Are there any limitations to application level storage? Pros/cons?
No major limitations I'm aware of. It's shared across all instances, so don't store session-specific data there. It's not like classic ASP, where ActiveX objects or large objects could lead to various problems (thread affinity, for example).

Are you editing the XML? Are multiple users editing the XML? In either case you need to manage cache validity. In the latter case you also need to manage mutual exclusion.

Cached pages and controls are managed separately by a built-in mechanism. Your caching is your own process. I thought you were searching the XML doc server-side. If the client accesses the doc via a URL, then you can use the built-in caching capability, but you will need to set it up and make sure it gets invalidated if you change your XML doc.


Author Comment

ID: 10742680
I have never used SAX. I only mentioned it because it is forward only and not random access.

I have a separate editing webform that employs an XmlDocument object for writing to the XML files. I do not care about performance in this case because there are only a small number of editors.

The webform that creates the live pages uses an XmlTextReader. I am going to continue using this method because I have read that it is faster. At some point I might go back and clean this code up to make it more object oriented by creating a class that handles parsing logic. But before I do that, I think I will set up a test scenario with one of those tools that simulates page requests. If I find the XmlTextReader greatly superior to the version running with XPathNavigator, then I will stick with what I've got. Otherwise I will clean things up with XPath.

The caching problem is a pain. I am going to allow it for a period of time on the client, not at all on the proxy, and may try your suggestion for the server.
