How to distinguish between similarly named elements when parsing using XMLReader

Hi, please see the two attached bits of code. I am using XMLReader to parse an XML document, but I'm having trouble distinguishing between different nodes with similar names, which I would like to treat separately. For example:
Product->RecordSourceIdentifier->IDValue,
Product->ProductIdentifier->IDValue,
Product->Distributors->Distributor->IDValue

Could anyone please describe how I can accomplish this?
testxmlreader.php
products.xml
HonyaAsked:
HonyaAuthor Commented:
Hi ste5an, I finally got it to work, hopefully with no bugs. Comment ID: 42392436 was really helpful. I just needed to include a while loop. Please see below.

<?php
$reader = new XMLReader();
$reader->open('products.xml');
while ($reader->read()) {
    if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'ProductName') {
        // Step from <ProductName> onto its text node.
        $reader->read();
        echo "Product Name: " . $reader->value . "<br/>";
    }

    if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'RecordSourceIdentifier') {
        // Advance to the first IDValue below this element; read() returning
        // false (end of file) also ends the loop, so it cannot spin forever.
        while ($reader->name != 'IDValue' && $reader->read()) {
            if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'IDValue') {
                $reader->read();
                echo "Record Source ID Value: " . $reader->value . "<br/>";
            }
        }
    }

    if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'ProductIdentifier') {
        while ($reader->name != 'IDValue' && $reader->read()) {
            if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'IDValue') {
                $reader->read();
                echo "Product ID Value: " . $reader->value . "<br/>";
            }
        }
    }

    if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'Distributor') {
        while ($reader->name != 'IDValue' && $reader->read()) {
            if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'IDValue') {
                $reader->read();
                echo "Distributor ID Value: " . $reader->value . "<br/>";
            }
        }
    }
}
$reader->close();
?>

 
ste5anSenior DeveloperCommented:
The problem is simple. You need to track the levels (parent nodes) you're operating on.

Is the file size typical? If so, I would use SimpleXML, which allows using XPath to select those nodes.
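
For a file small enough to load into memory, the XPath idea might look roughly like the sketch below. The inline XML is a made-up stand-in for products.xml (only the element names come from the question), and idValues() is a hypothetical helper, not part of SimpleXML.

```php
<?php
// Hypothetical sample; the real products.xml may be shaped differently.
$xml = <<<XML
<Products>
  <Product>
    <ProductName>Widget</ProductName>
    <RecordSourceIdentifier><IDValue>RS-1</IDValue></RecordSourceIdentifier>
    <ProductIdentifier><IDValue>P-1</IDValue></ProductIdentifier>
    <Distributors>
      <Distributor><IDValue>D-1</IDValue></Distributor>
      <Distributor><IDValue>D-2</IDValue></Distributor>
    </Distributors>
  </Product>
</Products>
XML;

// Hypothetical helper: run an XPath query and return the matches as strings.
function idValues(SimpleXMLElement $doc, string $xpath): array
{
    return array_map('strval', $doc->xpath($xpath));
}

$doc = simplexml_load_string($xml);

// Each expression spells out the full parent chain, so the three kinds of
// IDValue cannot be confused with one another.
$recordSourceIds = idValues($doc, '/Products/Product/RecordSourceIdentifier/IDValue');
$productIds      = idValues($doc, '/Products/Product/ProductIdentifier/IDValue');
$distributorIds  = idValues($doc, '/Products/Product/Distributors/Distributor/IDValue');

echo "Record Source ID Values: " . implode(', ', $recordSourceIds) . "<br/>";
echo "Product ID Values: " . implode(', ', $productIds) . "<br/>";
echo "Distributor ID Values: " . implode(', ', $distributorIds) . "<br/>";
```

SimpleXML builds the whole tree in memory, so this only suits files of modest size.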
 
HonyaAuthor Commented:
The actual file is about 2GB. That is why I wanted to use a stream-based parser.
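
One common way to combine both ideas for a file this large is to stream with XMLReader but hand each Product to SimpleXML on its own, so only one Product is in memory at a time. A minimal sketch, assuming each Product is small and that the element names match the question; streamProducts() and the inline sample are hypothetical.

```php
<?php
// Hypothetical sample standing in for the real 2GB products.xml.
$xml = <<<XML
<Products>
  <Product>
    <ProductName>Widget</ProductName>
    <RecordSourceIdentifier><IDValue>RS-1</IDValue></RecordSourceIdentifier>
    <ProductIdentifier><IDValue>P-1</IDValue></ProductIdentifier>
    <Distributors>
      <Distributor><IDValue>D-1</IDValue></Distributor>
      <Distributor><IDValue>D-2</IDValue></Distributor>
    </Distributors>
  </Product>
  <Product>
    <ProductName>Gadget</ProductName>
    <RecordSourceIdentifier><IDValue>RS-2</IDValue></RecordSourceIdentifier>
    <ProductIdentifier><IDValue>P-2</IDValue></ProductIdentifier>
    <Distributors>
      <Distributor><IDValue>D-3</IDValue></Distributor>
    </Distributors>
  </Product>
</Products>
XML;

// Hypothetical helper: yields every <Product> as its own SimpleXML tree.
function streamProducts(XMLReader $reader): Generator
{
    // Advance to the first <Product> element, if there is one.
    $found = false;
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'Product') {
            $found = true;
            break;
        }
    }
    // Parse the current <Product> on its own, then skip to the next sibling
    // <Product> without descending into the subtree again.
    while ($found) {
        yield simplexml_load_string($reader->readOuterXml());
        $found = $reader->next('Product');
    }
}

$reader = new XMLReader();
$reader->XML($xml); // for the real file: $reader->open('products.xml');

$rows = [];
foreach (streamProducts($reader) as $product) {
    $rows[] = [
        'name'         => (string) $product->ProductName,
        'recordSource' => (string) $product->RecordSourceIdentifier->IDValue,
        'productId'    => (string) $product->ProductIdentifier->IDValue,
        'distributors' => array_map('strval', $product->xpath('Distributors/Distributor/IDValue')),
    ];
}
$reader->close();

foreach ($rows as $row) {
    echo "Product Name: " . $row['name'] . "<br/>";
    echo "Record Source ID Value: " . $row['recordSource'] . "<br/>";
    echo "Product ID Value: " . $row['productId'] . "<br/>";
    echo "Distributor ID Values: " . implode(', ', $row['distributors']) . "<br/>";
}
```

With the real file you would probably print or store each row inside the first loop instead of collecting everything in $rows.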
 
ste5anSenior DeveloperCommented:
In this case you need to keep track of the nodes you're currently on. Something like this (untested):

<?php
$reader = new XMLReader();
$reader->open('products.xml');
while ($reader->read()) {
    if ($reader->nodeType == XMLREADER::ELEMENT && $reader->name == 'Product') {
        $reader->read();
        if ($reader->nodeType == XMLREADER::ELEMENT && $reader->name == 'ProductName') {
            $reader->read();
            echo "Product Name: " . $reader->value . "<br/>";
        }

        if ($reader->nodeType == XMLREADER::ELEMENT  && $reader->name == 'RecordSourceIdentifier') {
            $reader->read();
            if ($reader->nodeType == XMLREADER::ELEMENT  && $reader->name == 'IDValue') {
                $reader->read();
                echo "Product ID Value: " . $reader->value . "<br/>";
            }
        }

        if ($reader->nodeType == XMLREADER::ELEMENT  && $reader->name == 'ProductIdentifier') {
            $reader->read();
            if ($reader->nodeType == XMLREADER::ELEMENT  && $reader->name == 'IDValue') {
                $reader->read();
                echo "Product ID Value: " . $reader->value . "<br/>";
            }
        }

        if ($reader->nodeType == XMLREADER::ELEMENT  && $reader->name == 'Distributors') {
            $reader->read();
            if ($reader->nodeType == XMLREADER::ELEMENT  && $reader->name == 'Distributor') {
                $reader->read();
                if ($reader->nodeType == XMLREADER::ELEMENT  && $reader->name == 'IDValue') {
                    $reader->read();
                    echo "Product ID Value: " . $reader->value . "<br/>";
                }
            }
        }
    }
}
?>

 
HonyaAuthor Commented:
Hi ste5an, I'm not getting any output after running the script, just a blank screen. No information in view source and no error messages.
 
ste5anSenior DeveloperCommented:
This wasn't intended to be run-as-is. It should just give you an idea of how you can track on which node you are.

Keep in mind: XMLReader::read() moves to the next node in document order, so it walks down and back up the hierarchy. You need to control this traversal.
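
A quick way to see what read() actually visits is to log every node. The sketch below runs on a tiny made-up fragment (not the real products.xml); the $labels map is only there to make the output readable. The whitespace between tags shows up as a node of its own, which is one likely reason why a nested if placed directly after a single read() never matches.

```php
<?php
// Hypothetical fragment with the usual indentation between tags.
$xml = "<Product>\n  <ProductName>Widget</ProductName>\n</Product>";

// Readable labels for the node types this fragment can produce.
$labels = [
    XMLReader::ELEMENT                => 'ELEMENT',
    XMLReader::TEXT                   => 'TEXT',
    XMLReader::WHITESPACE             => 'WHITESPACE',
    XMLReader::SIGNIFICANT_WHITESPACE => 'SIGNIFICANT_WHITESPACE',
    XMLReader::END_ELEMENT            => 'END_ELEMENT',
];

$reader = new XMLReader();
$reader->XML($xml);

// Record "type name" for every node the reader stops on.
$trace = [];
while ($reader->read()) {
    $type = $labels[$reader->nodeType] ?? (string) $reader->nodeType;
    $trace[] = $type . ' ' . $reader->name;
}
$reader->close();

echo implode("<br/>", $trace) . "<br/>";
```

The first entry is the Product element; the entry right after it is a whitespace node named #text, not ProductName.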
 
HonyaAuthor Commented:
Hi ste5an, you mentioned that I will need to control the traversal. The question is how?
 
ste5anSenior DeveloperCommented:
When using stream reading you need to realize that you have the "next" element after a read(). "next" in this context means the "nearest" one.

E.g.
[code]
<root>
    <a></a>
    <b></b>
    <c></c>
</root>

<!-- vs. -->

<root>
    <a>
        <b></b>
    </a>
    <c></c>
</root>[/code]

When you're on node a and call read(), the next element you reach is b in both cases. But how do you know which case you're in? That is the core problem with stream reading: you have to know the structure of your XML in advance. Then you can structure your code accordingly, using structural approaches or by counting elements and depth. Because stream reading only gives you data about the current position, you need to construct the context yourself.

So for the product name and the ID of the record source, it may look like this:

<?php
$reader = new XMLReader();
$reader->open('products.xml');
$path = '';
while ($reader->read()) {
    if ($reader->nodeType == XMLReader::ELEMENT) {
        $parent = $reader->name;
        $path = $path . '\\' . $parent;
        if ($parent == 'Product') {
            // Every <Product> starts a fresh path.
            $path = 'Products\\Product';
        }

        echo "Current path: " . $path . "<br/>";
        if ($path == 'Products\\Product\\ProductName') {
            // readString() returns the element's text without moving the cursor.
            echo "Product Name: " . $reader->readString() . "<br/>";
            // Reset to the parent so the next sibling builds a clean path.
            $path = 'Products\\Product';
        }

        if ($path == 'Products\\Product\\RecordSourceIdentifier\\IDValue' ||
            $path == 'Products\\Product\\RecordSourceIdentifier\\RecordSourceIDType\\IDValue') {
            echo "Record Source ID Value: " . $reader->readString() . "<br/>";
            $path = 'Products\\Product';
        }
    }
}
$reader->close();
?>


In the above sample we use our knowledge of the expected structure to reset the path.
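
If you would rather not reset the path by hand, the reader's depth property can do that bookkeeping. A minimal sketch of the "counting depth" idea, assuming the element names from the question; collectByPath() and the inline sample are hypothetical, and the paths use / as the separator.

```php
<?php
// Hypothetical sample; the real products.xml may be shaped differently.
$xml = <<<XML
<Products>
  <Product>
    <ProductName>Widget</ProductName>
    <RecordSourceIdentifier><IDValue>RS-1</IDValue></RecordSourceIdentifier>
    <ProductIdentifier><IDValue>P-1</IDValue></ProductIdentifier>
    <Distributors>
      <Distributor><IDValue>D-1</IDValue></Distributor>
      <Distributor><IDValue>D-2</IDValue></Distributor>
    </Distributors>
  </Product>
</Products>
XML;

// Hypothetical helper: collects the text of every element whose full path
// is listed in $wanted, returned as [path => [values...]].
function collectByPath(XMLReader $reader, array $wanted): array
{
    $stack = [];                           // open element names, indexed by depth
    $found = array_fill_keys($wanted, []);
    while ($reader->read()) {
        if ($reader->nodeType !== XMLReader::ELEMENT) {
            continue;
        }
        // Forget anything at or below the current depth, then add this element.
        $stack = array_slice($stack, 0, $reader->depth);
        $stack[] = $reader->name;
        $path = implode('/', $stack);
        if (isset($found[$path])) {
            // readString() returns the text content without moving the cursor.
            $found[$path][] = $reader->readString();
        }
    }
    return $found;
}

$reader = new XMLReader();
$reader->XML($xml); // for the real file: $reader->open('products.xml');
$result = collectByPath($reader, [
    'Products/Product/ProductName',
    'Products/Product/RecordSourceIdentifier/IDValue',
    'Products/Product/ProductIdentifier/IDValue',
    'Products/Product/Distributors/Distributor/IDValue',
]);
$reader->close();

foreach ($result as $path => $values) {
    echo $path . ": " . implode(', ', $values) . "<br/>";
}
```

Because the stack is trimmed to the current depth on every element, no END_ELEMENT handling is needed. For a 2GB file you would likely echo or store each value as it is found rather than collecting everything in one array.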
 
HonyaAuthor Commented:
My solution expanded just a little on ste5an's comments, which did not provide the exact result that I was looking for. However, his comments did put me on the right track.
Question has a verified solution.
