XmlTextReader or DOM/XPath? Custom forward-only parsing class?

Posted on 2004-03-30
Last Modified: 2012-08-13

My web app uses content parsed from XML files with an XmlTextReader.

I have read that the XmlTextReader uses a forward-only pull model that is much less memory-intensive than using the DOM.

I am not positive, but I think that if my page class uses an XmlTextReader, the content from the reader is not cached. Instead, this part of the page is rendered on every request by my XmlTextReader code. That is a substantial amount of code, because I store a lot of XHTML structural markup in my XML files along with metadata, workflow, etc.

At this point my app is starting to look like a big plate of spaghetti. I've got a bunch of counter variables, collections, and conditions. I don't have any formal OO training, but this feels like a good time to write a class that parses the XML read by the XmlTextReader, especially since I'm using the same logic in a number of places.

What I would really like to know is whether this makes any sense at all. Am I wasting my time doing this? If ASP.NET served cached pages until the XML files changed, I could just use the DOM and life would be easier. If that is not possible, though, it seems like writing this custom class would be handy.

I have read that forward-only parsing can improve performance by up to 5 times. If that is the case, it seems to make sense to write a class that encapsulates this logic. For example, I've noticed that the XmlTextReader's Name property reports both the opening and the closing tag. So I've written logic that tracks them with counter variables and checks whether I've hit the closing tag with a statement like
if(subSwitchCount % 2 == 0)

This allows me to store the keywords below in a multidimensional array so the categories remain separate.

<item title="Home">
      <keyword>help seeking</keyword>
      <keyword>information seeking</keyword>
      <keyword>social networks</keyword>
      <keyword>community information</keyword>
      <keyword>information psychology</keyword>
      <keyword>information seeking contexts</keyword>
      <keyword>University of Washington</keyword>
      <keyword>University of Michigan</keyword>
      <keyword>Information School</keyword>
      <keyword>School of Information</keyword>
    <description>Information Behavior in Everyday Contexts (IBEC) - </description>
  <item title="About IBEC">
      <keyword>Contact information</keyword>
      <keyword>Funding organizations</keyword>
      <keyword>Best practice</keyword>
      <keyword>Information products</keyword>
      <keyword>Information delivery</keyword>
    <description>The following page contains detailed information about IBEC’s research efforts to maximize the impact of information in communities.</description>
  <item title="Projects">
      <keyword>Field studies</keyword>
      <keyword>United Way</keyword>
      <keyword>Community Programming</keyword>
      <keyword>Info grounds</keyword>
      <keyword>Health Information</keyword>
      <keyword>Tipping points</keyword>
      <keyword>After school</keyword>
    <description>At IBEC we conduct field studies of real people in real situations by partnering with government, corporate and nonprofit organizations.  Descriptions of current and past projects are provided through this page.</description>
  <item title="Publications">
    <description>Publications - </description>
  <item title="Tools and Resources">
      <keyword>IBEC database</keyword>
    <description>Several tools and resources developed and utilized by IBEC are presented on this page.</description>
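
For comparison, here is a minimal sketch of the kind of parsing class described above. It assumes the fragment is wrapped in a root element such as <items> and that each <item> is properly closed, since the closing tags do not appear in the sample. Instead of counting Name hits, it checks NodeType. NodeType tells XmlNodeType.Element (<item>) apart from XmlNodeType.EndElement (</item>). IsEmptyElement covers <item/>, which has no closing node and would throw a modulo counter off.

The sketch uses XmlReader.Create, which later .NET versions recommend over new XmlTextReader(...). The Read/NodeType/Name calls work the same way on XmlTextReader. The class and method names are hypothetical. A Dictionary of lists stands in for the multidimensional array.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper: groups <keyword> text under the title of its enclosing <item>.
public static class KeywordParser
{
    public static Dictionary<string, List<string>> Parse(TextReader input)
    {
        var result = new Dictionary<string, List<string>>();
        List<string> current = null; // keywords of the <item> being read, if any

        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                bool isElement = reader.NodeType == XmlNodeType.Element;

                if (isElement && reader.Name == "keyword" && current != null)
                {
                    // Reads the text and moves past </keyword>, so skip the Read() below;
                    // otherwise an immediately following <keyword> would be skipped.
                    current.Add(reader.ReadElementContentAsString());
                    continue;
                }

                if (isElement && reader.Name == "item")
                {
                    current = new List<string>();
                    result[reader.GetAttribute("title") ?? ""] = current;
                    // <item/> produces no EndElement node, so close it right away.
                    if (reader.IsEmptyElement) current = null;
                }
                else if (reader.NodeType == XmlNodeType.EndElement && reader.Name == "item")
                {
                    current = null; // </item> detected directly, no counter needed
                }

                reader.Read();
            }
        }
        return result;
    }
}
```

Each call walks the input once and keeps only the current item's list in play, so it stays a forward-only pass.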

In my amateur estimation, this approach would be a bit like having DOM functionality à la carte.

Any thoughts would be much appreciated.

Thanks in advance.
Question by:coltrane2003

Expert Comment

ID: 10723702
If you are not modifying your XML doc, then you don't need the DOM.

You can get more functionality (and avoid the counting) without the full cost of the DOM by using an XPathDocument. It lets you pick out individual nodes and collections of nodes without having to count opening and closing tags.
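
A minimal sketch of the XPathDocument approach. It assumes the same well-formed layout as above: an <items> root and closed <item> elements. The XPath expressions //item and keyword select the nodes directly, so no open/close bookkeeping is needed. XPathKeywordLookup and KeywordsFor are hypothetical names. Titles are compared in code rather than spliced into the XPath string, which avoids quoting trouble with titles that contain apostrophes.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

// Hypothetical helper: returns the <keyword> values of the <item> with the given title.
public static class XPathKeywordLookup
{
    public static List<string> KeywordsFor(TextReader input, string title)
    {
        // Read-only tree: no editing support, so it is lighter than XmlDocument.
        var doc = new XPathDocument(input);
        XPathNavigator nav = doc.CreateNavigator();

        var keywords = new List<string>();
        XPathNodeIterator items = nav.Select("//item");
        while (items.MoveNext())
        {
            XPathNavigator item = items.Current;
            if (item.GetAttribute("title", "") != title) continue;

            // Relative path: only <keyword> children of this <item>.
            XPathNodeIterator kws = item.Select("keyword");
            while (kws.MoveNext())
                keywords.Add(kws.Current.Value);
        }
        return keywords;
    }
}
```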

Re caching:
If your XML doc does not change frequently, I suggest loading it into a structure that suits your processing needs (it could be a DataSet) and caching that at the application level. Every time a new session accesses it, you can check whether the source file's date has changed. Alternatively, if changes are made through your app, you can clear your cached version every time a change is made and rebuild it on demand the next time a request for it comes in. I do this with a custom XML config file for a web app I've built.
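
Here is one way to read this suggestion, sketched under assumptions. The parsed document is held in a static field shared by the whole application. It is reloaded only when the file's last-write time changes. An Invalidate() method covers the "clear it when a change is made" variant. CachedSiteXml is a hypothetical name, and XPathDocument is just one choice of structure (the comment above mentions a DataSet as another). Because the field is shared across requests, access is locked. Within ASP.NET itself, HttpRuntime.Cache combined with a CacheDependency on the file path is the built-in way to get the same file-change invalidation.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

// Hypothetical application-wide cache for one XML file.
public static class CachedSiteXml
{
    private static readonly object Sync = new object();
    private static XPathDocument cached;
    private static DateTime cachedStamp;
    private static string cachedPath;

    public static XPathDocument Get(string path)
    {
        DateTime stamp = File.GetLastWriteTimeUtc(path);
        lock (Sync)
        {
            // Reload on first use, on a different file, or when the file has changed.
            if (cached == null || path != cachedPath || stamp != cachedStamp)
            {
                cached = new XPathDocument(path);
                cachedStamp = stamp;
                cachedPath = path;
            }
            return cached;
        }
    }

    // Call after an edit is saved so the next request rebuilds the cached copy.
    public static void Invalidate()
    {
        lock (Sync) { cached = null; }
    }
}
```

Each request would call CachedSiteXml.Get(path).CreateNavigator() and work with its own navigator. The sketch assumes, rather than guarantees, that concurrent reads of a shared XPathDocument are safe.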


Author Comment

ID: 10725787
It sounds like the XPathDocument can be instantiated with either an XmlTextReader or an XmlDocument? So I am assuming that as long as I instantiate it with the XmlTextReader, I get SAX-style processing and memory usage?

Are there any limitations to application-level storage? Pros/cons? My XML docs store the main page-area markup for pages on my site. This markup is loaded into a rich-text editor (RTE) control in a separate content management tool. If app-level storage is where standard cached pages and controls go, then I don't see why there would be any pitfalls to your suggestion. What do you think?


Accepted Solution

monosodiumg earned 250 total points
ID: 10739148
In .NET you always have a pull model, not a push model. XmlTextReader does not actually give you SAX processing: you call methods on it rather than the other way round. That is far less complicated than SAX.
XPathDocument can be instantiated with an XmlReader, a string (a URI), a Stream or a TextReader (check the docs), but not with an XmlDocument. I don't see the point of building an XPathDocument once you've already paid the price of building an XmlDocument.
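
The two cases above can be illustrated with a short sketch. NavigatorSources and its methods are hypothetical names. Building an XPathDocument from an XmlReader still pulls the input node by node, but every node is kept in the in-memory tree. Memory use therefore grows with the document, unlike a plain XmlTextReader loop. An existing XmlDocument already implements IXPathNavigable and exposes CreateNavigator() itself, so there is no need to copy it into an XPathDocument.

```csharp
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class NavigatorSources
{
    // Read-only path: the reader feeds a full in-memory XPathDocument tree.
    public static XPathNavigator FromReader(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            // The tree is complete once the constructor returns,
            // so disposing the reader afterwards is fine.
            return new XPathDocument(reader).CreateNavigator();
        }
    }

    // Editable path: query the existing DOM directly instead of copying it.
    public static XPathNavigator FromDom(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.CreateNavigator();
    }
}
```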

> Are there any limitations to application-level storage? Pros/cons?
No major limitations I'm aware of. It's shared across all instances, so don't store session-specific data there. It's not like ASP, where ActiveX objects or large objects could lead to various problems (thread affinity, for example).

Are you editing the XML? Are multiple users editing the XML? In either case you need to manage cache validity. In the latter case you also need to manage mutual exclusion.

Cached pages and controls are managed separately by a built-in mechanism; your caching is your own process. I thought you were searching the XML doc server-side. If the client accesses the doc via a URL, you can use the built-in caching capability, but you will need to set it up and make sure it gets invalidated when you change your XML doc.


Author Comment

ID: 10742680
I have never used SAX. I only mentioned it because it is forward-only rather than random-access.

I have a separate editing webform that uses an XmlDocument object to write to the XML files. I do not care about performance in this case because there are only a small number of editors.
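
Here is a sketch of the editing side, assuming a single web server process. A static lock serialises editors so two saves cannot interleave, which is the mutual exclusion mentioned above. The caller would then clear whatever cached copy the live pages use. KeywordEditor and AddKeyword are hypothetical names. The lock does nothing for writers in other processes or on other machines.

```csharp
using System.Xml;

// Hypothetical editor helper built on XmlDocument (the DOM), as in the editing webform.
public static class KeywordEditor
{
    private static readonly object WriteLock = new object();

    // Returns false if no <item> has the given title.
    public static bool AddKeyword(string path, string itemTitle, string keyword)
    {
        lock (WriteLock) // load-modify-save as one step per process
        {
            var doc = new XmlDocument();
            doc.Load(path);

            foreach (XmlElement item in doc.SelectNodes("//item"))
            {
                if (item.GetAttribute("title") != itemTitle) continue;

                XmlElement kw = doc.CreateElement("keyword");
                kw.InnerText = keyword; // escaped as needed when saved
                // Keep keywords ahead of <description>, matching the sample layout.
                XmlNode description = item.SelectSingleNode("description");
                if (description != null) item.InsertBefore(kw, description);
                else item.AppendChild(kw);

                doc.Save(path);
                return true;
            }
            return false;
        }
    }
}
```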

The webform that creates the live pages uses an XmlTextReader. I am going to keep using this approach because I have read that it is faster. At some point I might go back and clean up this code to make it more object-oriented by creating a class that handles the parsing logic. Before I do that, though, I think I will set up a test scenario with one of those tools that simulate page requests. If I find the XmlTextReader greatly superior to the version running with XPathNavigator, I will stick with what I've got. Otherwise I will clean things up with XPath.
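
Before reaching for a load-testing tool, a rough in-process comparison could be sketched like this. It times an XmlReader loop against an XPathDocument query, and both produce the same keyword count. ParseBenchmark and its methods are hypothetical. The numbers ignore request handling, caching and concurrency, so they only hint at relative parsing cost. The XPath timing includes building the XPathDocument on every call, which is exactly the cost an application-level cache would remove.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Xml;
using System.Xml.XPath;

// Hypothetical micro-benchmark comparing forward-only reading with XPath queries.
public static class ParseBenchmark
{
    public static int CountWithReader(string xml)
    {
        int count = 0;
        using (XmlReader r = XmlReader.Create(new StringReader(xml)))
        {
            while (r.Read())
            {
                if (r.NodeType == XmlNodeType.Element && r.Name == "keyword") count++;
            }
        }
        return count;
    }

    public static int CountWithXPath(string xml)
    {
        XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();
        return nav.Select("//keyword").Count;
    }

    // Runs `parse` `iterations` times and returns the elapsed milliseconds.
    public static long Time(Func<string, int> parse, string xml, int iterations)
    {
        parse(xml); // warm-up so JIT compilation is not included in the timing
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) parse(xml);
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

For example, ParseBenchmark.Time(ParseBenchmark.CountWithReader, xml, 1000) and ParseBenchmark.Time(ParseBenchmark.CountWithXPath, xml, 1000) can be run against a copy of a real content file.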

The caching problem is a pain. I am going to allow caching for a period of time on the client, not at all on proxies, and may try your suggestion for the server.

Question has a verified solution.
