XmlTextReader or DOM/XPath? Custom forward only parsing class?

Posted on 2004-03-30
Medium Priority
Last Modified: 2012-08-13

My web app uses content parsed from XML files with an XmlTextReader.

I have read that the XmlTextReader employs a forward only pull method that is much less memory intensive than using the DOM.

I am not positive but I think that if my page class uses an XmlTextReader that the content from the reader is not cached. Instead this part of the page is rendered each time with my XmlTextReader code. This actually represents a substantial amount of code because I am storing a bunch of XHTML structural markup in my XML files along with meta data, workflow, etc.

At this point my app is starting to look like a big plate of spaghetti. I've got a bunch of counter variables, collections, and conditions. I don't have any formal OO training, but I am feeling like this would be a good time for me to write a class that parses my XML taken from the XmlTextReader. I'm also using this same logic in a number of places.

What I would really like to know is whether this makes any sense at all? Am I wasting my time doing this? If ASP.NET would serve cached pages until changes were made to the XML files, I could just use the DOM and life would be easier. On the other hand if this is not possible it seems like writing this custom class would be handy?

I have read that performance is improved by up to 5 times with forward only parsing. If this is the case, it seems like it would make sense to write a class to help with this logic? For example, I've noticed that the XmlTextReader counts both the opening and closing tags with the name property. So, I've written logic that keeps track with counter variables and checks to see if I've hit the closing tag with a statement like
if(subSwitchCount % 2 == 0)

This allows me to store the keywords below in a multidimensional array so the categories remain separate.

<item title="Home">
      <keyword>help seeking</keyword>
      <keyword>information seeking</keyword>
      <keyword>social networks</keyword>
      <keyword>community information</keyword>
      <keyword>information psychology</keyword>
      <keyword>information seeking contexts</keyword>
      <keyword>University of Washington</keyword>
      <keyword>University of Michigan</keyword>
      <keyword>Information School</keyword>
      <keyword>School of Information</keyword>
    <description>Information Behavior in Everyday Contexts (IBEC) - </description>
  <item title="About IBEC">
      <keyword>Contact information</keyword>
      <keyword>Funding organizations</keyword>
      <keyword>Best practice</keyword>
      <keyword>Information products</keyword>
      <keyword>Information delivery</keyword>
    <description>The following page contains detailed information about IBEC’s research efforts to maximize the impact of information in communities.</description>
  <item title="Projects">
      <keyword>Field studies</keyword>
      <keyword>United Way</keyword>
      <keyword>Community Programming</keyword>
      <keyword>Info grounds</keyword>
      <keyword>Health Information</keyword>
      <keyword>Tipping points</keyword>
      <keyword>After school</keyword>
    <description>At IBEC we conduct field studies of real people in real situations by partnering with government, corporate and nonprofit organizations.  Descriptions of current and past projects are provided through this page.</description>
  <item title="Publications">
    <description>Publications - </description>
  <item title="Tools and Resources">
      <keyword>IBEC database</keyword>
    <description>Several tools and resources developed and utilized by IBEC are presented on this page.</description>

In my amateur estimation this approach would sort of be like having DOM functionality a la carte.

Any thoughts would be much appreciated.

Thanks in advance.
Question by:coltrane2003

Expert Comment

ID: 10723702
If you are not modifying your XML doc then you don't need the DOM.

You can get more functionality (and avoid the counting) without the full cost of the DOM by using an XPathDocument. This allows you to pick out nodes and collections of nodes without having to count open and close tags.
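As a rough illustration of the XPathDocument approach, here is a minimal sketch that pulls the keywords for one item by title. It assumes the file is well-formed, with each `<item>` closed and wrapped in a root element; the root name `site`, the class name `KeywordQuery`, and the method `KeywordsFor` are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class KeywordQuery
{
    // Returns the <keyword> values under the <item> whose title matches.
    // Assumes a hypothetical root element named <site>.
    public static List<string> KeywordsFor(string xml, string title)
    {
        // XPathDocument is a read-only store, cheaper than XmlDocument.
        XPathDocument doc = new XPathDocument(new StringReader(xml));
        XPathNavigator nav = doc.CreateNavigator();

        List<string> keywords = new List<string>();
        XPathNodeIterator items = nav.Select("/site/item");
        while (items.MoveNext())
        {
            XPathNavigator item = items.Current;
            // Compare the attribute in code instead of splicing the
            // title into the XPath string, which avoids quoting problems.
            if (item.GetAttribute("title", "") != title)
                continue;

            XPathNodeIterator kw = item.Select("keyword");
            while (kw.MoveNext())
                keywords.Add(kw.Current.Value);
        }
        return keywords;
    }
}
```

No counters or open/close bookkeeping are needed here: the XPath expressions select the nodes, and each item's keywords stay separate because they are selected relative to that item.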

Re caching:
If your XML doc does not change frequently, I suggest loading it into a structure that suits your processing needs (it could be a DataSet) and caching that at the application level. Every time a new session accesses it, you can check whether the source file's date has changed. Alternatively, if changes are made through your app, you can clear your cached version every time a change is made and rebuild it on demand the next time a request for it comes in. I do this with a custom XML config file for a web app I've built.
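A minimal sketch of that caching idea, assuming the parsed structure is an XPathDocument and that one instance of the cache is held at application level (for example in a static field). The class name `XmlFileCache` and its members are hypothetical; this checks the file date on every access rather than once per session, which is a simplification.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public sealed class XmlFileCache
{
    private readonly object _gate = new object();
    private readonly string _path;
    private XPathDocument _doc;
    private DateTime _loadedStamp;

    public XmlFileCache(string path) { _path = path; }

    // Number of times the file was actually parsed (for illustration).
    public int LoadCount { get; private set; }

    // Returns the cached document, reloading it only when the
    // source file's last-write time differs from the cached one.
    public XPathDocument Get()
    {
        DateTime stamp = File.GetLastWriteTimeUtc(_path);
        lock (_gate)
        {
            if (_doc == null || stamp != _loadedStamp)
            {
                _doc = new XPathDocument(_path);
                _loadedStamp = stamp;
                LoadCount++;
            }
            return _doc;
        }
    }

    // Call this after the app itself writes the file, so the next
    // request rebuilds the cached copy on demand.
    public void Invalidate()
    {
        lock (_gate) { _doc = null; }
    }
}
```

Both variants from the paragraph above are covered: `Get` handles the file-date check, and `Invalidate` handles the case where edits go through the app. ASP.NET's own `Cache` object with a file-based `CacheDependency` may serve the same purpose with less code.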


Author Comment

ID: 10725787
It sounds like the XPathDocument can be instantiated with either an XmlTextReader or an XmlDocument? So I am assuming that as long as I instantiate it with the XmlTextReader, I get SAX-style processing and memory usage?

Are there any limitations to application-level storage? Pros/cons? My XML docs store the main page area markup for pages on my site. This markup is loaded into an RTE editor control in a separate content management tool. If app-level storage is where standard cached pages and controls go, then I don't see why there would be any pitfalls to your suggestion. What do you think?

Accepted Solution

monosodiumg earned 750 total points
ID: 10739148
In .NET you always have a pull model, not a push model. XmlTextReader does not actually give you SAX processing: you call methods on it rather than the other way round. It is far less complicated than SAX.
XPathDocument can be instantiated from an XmlReader, a URI string, a Stream, or a TextReader (check the docs), but not from an XmlDocument (I don't see the point of building an XPathDocument once you've already paid the price of building an XmlDocument).
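To make the pull model concrete, here is a minimal sketch of a forward-only loop that groups keywords by item title. It relies on `NodeType` to tell an opening tag (`Element`) from a closing tag (`EndElement`), so no modulo counter is needed. The input is assumed to be well-formed with a single root element; the class name `KeywordReader` and the method `Read` are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class KeywordReader
{
    // Maps each item's title attribute to the list of its keywords.
    public static Dictionary<string, List<string>> Read(TextReader input)
    {
        Dictionary<string, List<string>> result =
            new Dictionary<string, List<string>>();
        List<string> current = null;   // keywords of the item being read
        string openElement = null;     // name of the innermost open tag

        using (XmlTextReader reader = new XmlTextReader(input))
        {
            // The caller drives the parse: each Read() pulls one node.
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:      // opening tag
                        openElement = reader.Name;
                        if (reader.Name == "item")
                        {
                            current = new List<string>();
                            result[reader.GetAttribute("title") ?? ""] = current;
                        }
                        break;
                    case XmlNodeType.Text:         // character data
                        if (openElement == "keyword" && current != null)
                            current.Add(reader.Value);
                        break;
                    case XmlNodeType.EndElement:   // closing tag
                        openElement = null;
                        break;
                }
            }
        }
        return result;
    }
}
```

A class along these lines would keep the parsing logic in one place, so the page code only sees the resulting dictionary.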

>Are there any limitations to application level storage? Pros/cons?
No major limitations I'm aware of. It's shared across all instances, so don't store session-specific data there. It's not like ASP, where ActiveX objects or large objects could lead to various problems (thread affinity, for example).

Are you editing the XML? Are multiple users editing the XML? In either case you need to manage cache validity. In the latter case you also need to manage mutual exclusion.

Cached pages and controls are managed separately by a built-in mechanism. Your caching is your own process. I thought you were searching the XML doc server-side. If the client accesses the doc via a URL, then you can use the built-in caching capability, but you will need to set it up and make sure it gets invalidated if you change your XML doc.


Author Comment

ID: 10742680
I have never used SAX. I only mentioned it because it is forward only and not random access.

I have a separate editing webform that employs an XmlDocument object for writing to the XML files. I do not care about performance in this case because there are only a small number of editors.

The webform that creates the live pages uses an XmlTextReader. I am going to continue using this method because I have read that it is faster. At some point I might go back and clean this code up to make it more object oriented by creating a class that handles parsing logic. But before I do that, I think I will set up a test scenario with one of those tools that simulates page requests. If I find the XmlTextReader greatly superior to the version running with XPathNavigator, then I will stick with what I've got. Otherwise I will clean things up with XPath.

The caching problem is a pain. I am going to allow it for a period of time on the client, not at all on the proxy, and may try your suggestion for the server.
