Solved

xml parsing in php4

Posted on 2010-09-05
Last Modified: 2012-05-10
I am trying to parse an XML file in PHP4, as seen below.  I'm able to parse it via SAX; however, the whole file is being parsed, whereas I only need to parse a portion of it, depending on which product page is called via a browser.  For example, instead of parsing the entire XML file, I only need it to parse the section containing <pageid>74-5002JEE11</pageid> and its siblings.  This is for SEO purposes.  Any help would be greatly appreciated!

Carol

<?xml version="1.0" encoding="UTF-8"?>
<products xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<product xsi:type="ProductWithReviews" locale="en_US">
<pageid>74-5002JEE11</pageid>
<name>HON  Park Avenue 5000 Series Mid-Back Managers Chair, Henna Cherry/Black Vinyl</name>
<smallstarimagelocation>pwr/engine/images/stars_small.gif</smallstarimagelocation>
<largestarimagelocation>pwr/engine/images/stars.gif</largestarimagelocation>
<newestreviewdate>2010-08-30</newestreviewdate>
<oldestreviewdate>2010-08-24</oldestreviewdate>
<averageoverallrating>5</averageoverallrating>
<average_rating_decimal>5</average_rating_decimal>
<fullreviews>2</fullreviews>
<confirmstatusgroup>
<confirmstatus>Verified Reviewer</confirmstatus>
</confirmstatusgroup>
<taggroup key="bestuses" name="Best Uses">
<tag isuseradded="false" count="1">Computer chair</tag>
</taggroup>
<taggroup key="describeyourself" name="Describe Yourself">
<tag isuseradded="false" count="2">Midrange shopper</tag>
</taggroup>
<taggroup key="primaryuse" name="Primary use">
<tag isuseradded="false" count="2">Business</tag>
</taggroup>
<taggroup key="pros" name="Pros">
<tag isuseradded="false" count="2">Attractive</tag>
<tag isuseradded="false" count="2">Comfortable</tag>
<tag isuseradded="false" count="2">Durable</tag>
<tag isuseradded="false" count="1">Easy to assemble</tag>
<tag isuseradded="false" count="2">Ergonomic</tag>
<tag isuseradded="false" count="2">Good lumbar support</tag>
<tag isuseradded="false" count="2">Rolls smoothly</tag>
</taggroup>
<bottom_line_yes_votes>2</bottom_line_yes_votes>
<bottom_line_no_votes>0</bottom_line_no_votes>
<customerimages>false</customerimages>
<customervideos>false</customervideos>
<inlinefiles>
<inlinefile reviewpage="1">pwr/7gdt43ap/inline/06/30/74__5002JEE11-en_US-1-reviews.html</inlinefile>
</inlinefiles>
<reviews>
<fullreview>
<id>xxxxx</id>
<merchant_review_id>xxxxx</merchant_review_id>
<merchantuserid/>
<status>Approved</status>
<createddate>2010-08-24</createddate>
<helpfulvotes>0</helpfulvotes>
<nothelpfulvotes>0</nothelpfulvotes>
<source>web</source>
<confirmstatusgroup>
<confirmstatus>Verified Reviewer</confirmstatus>
</confirmstatusgroup>
<headline>test</headline>
<overallrating>5</overallrating>
<taggroup key="pros" name="Pros">
<tag isuseradded="false" count="1">Sturdy</tag>
<tag isuseradded="false" count="1">Rolls Smoothly</tag>
<tag isuseradded="false" count="1">Good Lumbar Support</tag>
<tag isuseradded="false" count="1">Comfortable</tag>
<tag isuseradded="false" count="1">Ergonomic</tag>
<tag isuseradded="false" count="1">Attractive Design</tag>
</taggroup>
<taggroup key="describeyourself" name="Describe Yourself">
<tag isuseradded="false" count="1">Midrange Shopper</tag>
</taggroup>
<taggroup key="bestuses" name="Best Uses">
<tag isuseradded="false" count="1">Computer Chair</tag>
</taggroup>
<taggroup key="primaryuse" name="Primary use">
<tag isuseradded="false" count="1">Business</tag>
</taggroup>
<bottom_line>recommended</bottom_line>
<comments>test</comments>
<nickname>carol</nickname>
<location>oregon</location>
<email_address_from_user>xxxxx@xxxxx.com</email_address_from_user>
<site_id>1</site_id>
</fullreview>
<fullreview>
<id>xxxxx</id>
<merchant_review_id>xxxxx</merchant_review_id>
<merchantuserid/>
<status>Approved</status>
<createddate>2010-08-30</createddate>
<helpfulvotes>0</helpfulvotes>
<nothelpfulvotes>0</nothelpfulvotes>
<source>web</source>
<confirmstatusgroup>
<confirmstatus>Verified Reviewer</confirmstatus>
</confirmstatusgroup>
<headline>An awesome chair.</headline>
<overallrating>5</overallrating>
<taggroup key="pros" name="Pros">
<tag isuseradded="false" count="1">Ergonomic</tag>
<tag isuseradded="false" count="1">Comfortable</tag>
<tag isuseradded="false" count="1">Good Lumbar Support</tag>
<tag isuseradded="false" count="1">Easy To Assemble</tag>
<tag isuseradded="false" count="1">Rolls Smoothly</tag>
<tag isuseradded="false" count="1">Attractive Design</tag>
<tag isuseradded="false" count="1">Sturdy</tag>
</taggroup>
<taggroup key="describeyourself" name="Describe Yourself">
<tag isuseradded="false" count="1">Midrange Shopper</tag>
</taggroup>
<taggroup key="primaryuse" name="Primary use">
<tag isuseradded="false" count="1">Business</tag>
</taggroup>
<bottom_line>recommended</bottom_line>
<comments>I bought this chair for my office and love it.  I could sit in it all day.</comments>
<nickname>Gerry</nickname>
<location>Grants Pass, OR</location>
<email_address_from_user>xxxxx@xxxxx.com</email_address_from_user>
<site_id>1</site_id>
</fullreview>
</reviews>
</product>

<product xsi:type="ProductWithReviews" locale="en_US">
<pageid>14-PSD-TS-CTM-MS2</pageid>
<name>Dual Pole TV Floor Stand with Tilt Mount - for 37" to 63" Displays</name>
<smallstarimagelocation>pwr/engine/images/stars_small.gif</smallstarimagelocation>
<largestarimagelocation>pwr/engine/images/stars.gif</largestarimagelocation>
<newestreviewdate>2010-08-24</newestreviewdate>
<oldestreviewdate>2010-08-24</oldestreviewdate>
<averageoverallrating>5</averageoverallrating>
<average_rating_decimal>5</average_rating_decimal>
<fullreviews>1</fullreviews>
<confirmstatusgroup>
<confirmstatus>Verified Reviewer</confirmstatus>
</confirmstatusgroup>
<taggroup key="bestuses" name="Best Uses">
<tag isuseradded="false" count="1">Space saver</tag>
</taggroup>
<taggroup key="describeyourself" name="Describe Yourself">
<tag isuseradded="false" count="1">Stylish</tag>
</taggroup>
<taggroup key="primaryuse" name="Primary use">
<tag isuseradded="false" count="1">Business</tag>
</taggroup>
<taggroup key="pros" name="Pros">
<tag isuseradded="false" count="1">Attractive</tag>
<tag isuseradded="false" count="1">Easy to assemble</tag>
<tag isuseradded="false" count="1">Smooth edges</tag>
<tag isuseradded="false" count="1">Supports weight</tag>
</taggroup>
<bottom_line_yes_votes>1</bottom_line_yes_votes>
<bottom_line_no_votes>0</bottom_line_no_votes>
<customerimages>false</customerimages>
<customervideos>false</customervideos>
<inlinefiles>
<inlinefile reviewpage="1">pwr/7gdt43ap/inline/02/48/14__PSD__TS__CTM__MS2-en_US-1-reviews.html</inlinefile>
</inlinefiles>
<reviews>
<fullreview>
<id>xxxxx</id>
<merchant_review_id>xxxxx</merchant_review_id>
<merchantuserid/>
<status>Approved</status>
<createddate>2010-08-24</createddate>
<helpfulvotes>0</helpfulvotes>
<nothelpfulvotes>0</nothelpfulvotes>
<source>web</source>
<confirmstatusgroup>
<confirmstatus>Verified Reviewer</confirmstatus>
</confirmstatusgroup>
<headline>test</headline>
<overallrating>5</overallrating>
<taggroup key="pros" name="Pros">
<tag isuseradded="false" count="1">Attractive Design</tag>
<tag isuseradded="false" count="1">Smooth Edges</tag>
<tag isuseradded="false" count="1">Easy To Assemble</tag>
<tag isuseradded="false" count="1">Supports Weight</tag>
</taggroup>
<taggroup key="describeyourself" name="Describe Yourself">
<tag isuseradded="false" count="1">Stylish</tag>
</taggroup>
<taggroup key="bestuses" name="Best Uses">
<tag isuseradded="false" count="1">Space Saver</tag>
</taggroup>
<taggroup key="primaryuse" name="Primary use">
<tag isuseradded="false" count="1">Business</tag>
</taggroup>
<bottom_line>recommended</bottom_line>
<comments>test</comments>
<nickname>carol</nickname>
<location>oregon</location>
<email_address_from_user>xxxxxl@xxxxx.com</email_address_from_user>
<site_id>1</site_id>
</fullreview>
</reviews>
</product>

<product xsi:type="ProductWithReviews" locale="en_US">
<pageid>18-9414AG-BU</pageid>
<name>Comfort Series Ergonomic Posture Chair - Blue</name>
<smallstarimagelocation>pwr/engine/images/stars_small.gif</smallstarimagelocation>
<largestarimagelocation>pwr/engine/images/stars.gif</largestarimagelocation>
<newestreviewdate>2010-08-24</newestreviewdate>
<oldestreviewdate>2010-08-24</oldestreviewdate>
<averageoverallrating>4</averageoverallrating>
<average_rating_decimal>4</average_rating_decimal>
<fullreviews>1</fullreviews>
<confirmstatusgroup>
<confirmstatus>Unverified</confirmstatus>
</confirmstatusgroup>
<taggroup key="bestuses" name="Best Uses">
<tag isuseradded="false" count="1">Computer chair</tag>
</taggroup>
<taggroup key="describeyourself" name="Describe Yourself">
<tag isuseradded="false" count="1">Midrange shopper</tag>
</taggroup>
<taggroup key="primaryuse" name="Primary use">
<tag isuseradded="false" count="1">Business</tag>
</taggroup>
<taggroup key="pros" name="Pros">
<tag isuseradded="false" count="1">Attractive</tag>
<tag isuseradded="false" count="1">Comfortable</tag>
<tag isuseradded="false" count="1">Durable</tag>
<tag isuseradded="false" count="1">Easy to assemble</tag>
<tag isuseradded="false" count="1">Ergonomic</tag>
<tag isuseradded="false" count="1">Good lumbar support</tag>
<tag isuseradded="false" count="1">Rolls smoothly</tag>
</taggroup>
<bottom_line_yes_votes>1</bottom_line_yes_votes>
<bottom_line_no_votes>0</bottom_line_no_votes>
<customerimages>false</customerimages>
<customervideos>false</customervideos>
<inlinefiles>
<inlinefile reviewpage="1">pwr/7gdt43ap/inline/09/12/18__9414AG__BU-en_US-1-reviews.html</inlinefile>
</inlinefiles>
<reviews>
<fullreview>
<id>xxxxx</id>
<merchant_review_id>xxxxx</merchant_review_id>
<merchantuserid/>
<status>Approved</status>
<createddate>2010-08-24</createddate>
<helpfulvotes>0</helpfulvotes>
<nothelpfulvotes>0</nothelpfulvotes>
<source>web</source>
<confirmstatusgroup>
<confirmstatus>Unverified</confirmstatus>
</confirmstatusgroup>
<headline>test</headline>
<overallrating>4</overallrating>
<taggroup key="pros" name="Pros">
<tag isuseradded="false" count="1">Attractive Design</tag>
<tag isuseradded="false" count="1">Comfortable</tag>
<tag isuseradded="false" count="1">Sturdy</tag>
<tag isuseradded="false" count="1">Rolls Smoothly</tag>
<tag isuseradded="false" count="1">Ergonomic</tag>
<tag isuseradded="false" count="1">Easy To Assemble</tag>
<tag isuseradded="false" count="1">Good Lumbar Support</tag>
</taggroup>
<taggroup key="describeyourself" name="Describe Yourself">
<tag isuseradded="false" count="1">Midrange Shopper</tag>
</taggroup>
<taggroup key="primaryuse" name="Primary use">
<tag isuseradded="false" count="1">Business</tag>
</taggroup>
<taggroup key="bestuses" name="Best Uses">
<tag isuseradded="false" count="1">Computer Chair</tag>
</taggroup>
<bottom_line>recommended</bottom_line>
<comments>test</comments>
<nickname>carol</nickname>
<location>oregon</location>
<site_id>1</site_id>
</fullreview>
</reviews>
</product>

<product xsi:type="ProductWithReviews" locale="en_US">
<pageid>18-640LTL</pageid>
<name>VariTask LT Adjustable Corner Computer Workstation</name>
<smallstarimagelocation>pwr/engine/images/stars_small.gif</smallstarimagelocation>
<largestarimagelocation>pwr/engine/images/stars.gif</largestarimagelocation>
<newestreviewdate>2010-08-23</newestreviewdate>
<oldestreviewdate>2010-08-23</oldestreviewdate>
<averageoverallrating>5</averageoverallrating>
<average_rating_decimal>5</average_rating_decimal>
<fullreviews>1</fullreviews>
<confirmstatusgroup>
<confirmstatus>Unverified</confirmstatus>
</confirmstatusgroup>
<taggroup key="bestuses" name="Best Uses">
<tag isuseradded="false" count="1">Informal use</tag>
</taggroup>
<taggroup key="describeyourself" name="Describe Yourself">
<tag isuseradded="false" count="1">Midrange shopper</tag>
</taggroup>
<taggroup key="primaryuse" name="Primary use">
<tag isuseradded="false" count="1">Business</tag>
</taggroup>
<taggroup key="pros" name="Pros">
<tag isuseradded="false" count="1">Attractive</tag>
<tag isuseradded="false" count="1">Comfortable</tag>
<tag isuseradded="false" count="1">Well built / quality</tag>
</taggroup>
<bottom_line_yes_votes>1</bottom_line_yes_votes>
<bottom_line_no_votes>0</bottom_line_no_votes>
<customerimages>false</customerimages>
<customervideos>false</customervideos>
<inlinefiles>
<inlinefile reviewpage="1">pwr/7gdt43ap/inline/07/73/18__640LTL-en_US-1-reviews.html</inlinefile>
</inlinefiles>
<reviews>
<fullreview>
<id>xxxxx</id>
<merchant_review_id>xxxxx</merchant_review_id>
<merchantuserid/>
<status>Approved</status>
<createddate>2010-08-23</createddate>
<helpfulvotes>0</helpfulvotes>
<nothelpfulvotes>0</nothelpfulvotes>
<source>web</source>
<confirmstatusgroup>
<confirmstatus>Unverified</confirmstatus>
</confirmstatusgroup>
<headline>Test</headline>
<overallrating>5</overallrating>
<taggroup key="pros" name="Pros">
<tag isuseradded="false" count="1">Quality Construction</tag>
<tag isuseradded="false" count="1">Comfortable</tag>
<tag isuseradded="false" count="1">Attractive Design</tag>
</taggroup>
<taggroup key="describeyourself" name="Describe Yourself">
<tag isuseradded="false" count="1">Midrange Shopper</tag>
</taggroup>
<taggroup key="bestuses" name="Best Uses">
<tag isuseradded="false" count="1">Informal Use</tag>
</taggroup>
<taggroup key="primaryuse" name="Primary use">
<tag isuseradded="false" count="1">Business</tag>
</taggroup>
<bottom_line>recommended</bottom_line>
<comments>test</comments>
<nickname>Carol</nickname>
<location>Oregon</location>
<site_id>1</site_id>
</fullreview>
</reviews>
</product>

<product xsi:type="ProductWithReviews" locale="en_US">
<pageid>74-VL601VA</pageid>
<name/>
<smallstarimagelocation>pwr/engine/images/stars_small.gif</smallstarimagelocation>
<largestarimagelocation>pwr/engine/images/stars.gif</largestarimagelocation>
<newestreviewdate>2010-08-24</newestreviewdate>
<oldestreviewdate>2010-08-24</oldestreviewdate>
<averageoverallrating>5</averageoverallrating>
<average_rating_decimal>5</average_rating_decimal>
<fullreviews>1</fullreviews>
<confirmstatusgroup>
<confirmstatus>Verified Reviewer</confirmstatus>
</confirmstatusgroup>
<taggroup key="bestuses" name="Best Uses">
<tag isuseradded="false" count="1">Board meetings</tag>
</taggroup>
<taggroup key="describeyourself" name="Describe Yourself">
<tag isuseradded="false" count="1">Midrange shopper</tag>
</taggroup>
<taggroup key="pros" name="Pros">
<tag isuseradded="false" count="1">Attractive</tag>
<tag isuseradded="false" count="1">Comfortable</tag>
<tag isuseradded="false" count="1">Well built / quality</tag>
</taggroup>
<bottom_line_yes_votes>1</bottom_line_yes_votes>
<bottom_line_no_votes>0</bottom_line_no_votes>
<customerimages>false</customerimages>
<customervideos>false</customervideos>
<inlinefiles>
<inlinefile reviewpage="1">pwr/7gdt43ap/inline/07/80/74__VL601VA-en_US-1-reviews.html</inlinefile>
</inlinefiles>
<reviews>
<fullreview>
<id>xxxxx</id>
<merchant_review_id>xxxxx</merchant_review_id>
<merchantuserid/>
<status>Approved</status>
<createddate>2010-08-24</createddate>
<helpfulvotes>0</helpfulvotes>
<nothelpfulvotes>0</nothelpfulvotes>
<source>web</source>
<confirmstatusgroup>
<confirmstatus>Verified Reviewer</confirmstatus>
</confirmstatusgroup>
<headline>test</headline>
<overallrating>5</overallrating>
<taggroup key="pros" name="Pros">
<tag isuseradded="false" count="1">Comfortable</tag>
<tag isuseradded="false" count="1">Quality Construction</tag>
<tag isuseradded="false" count="1">Attractive Design</tag>
</taggroup>
<taggroup key="describeyourself" name="Describe Yourself">
<tag isuseradded="false" count="1">Midrange Shopper</tag>
</taggroup>
<taggroup key="bestuses" name="Best Uses">
<tag isuseradded="false" count="1">Board Meetings</tag>
</taggroup>
<bottom_line>recommended</bottom_line>
<comments>test</comments>
<nickname>Carol</nickname>
<location>Oregon</location>
<email_address_from_user>xxxxx@xxxxx.com</email_address_from_user>
<site_id>1</site_id>
</fullreview>
</reviews>
</product>

<product xsi:type="ProductWithReviews" locale="en_US">
<pageid>14-FLIP</pageid>
<name>Flip Down Under Cabinet TV Mount for 10" to 18" Displays</name>
<smallstarimagelocation>pwr/engine/images/stars_small.gif</smallstarimagelocation>
<largestarimagelocation>pwr/engine/images/stars.gif</largestarimagelocation>
<newestreviewdate>2010-08-24</newestreviewdate>
<oldestreviewdate>2010-08-24</oldestreviewdate>
<averageoverallrating>4</averageoverallrating>
<average_rating_decimal>4</average_rating_decimal>
<fullreviews>1</fullreviews>
<confirmstatusgroup>
<confirmstatus>Verified Reviewer</confirmstatus>
</confirmstatusgroup>
<taggroup key="describeyourself" name="Describe Yourself">
<tag isuseradded="false" count="1">Quality oriented</tag>
</taggroup>
<taggroup key="primaryuse" name="Primary use">
<tag isuseradded="false" count="1">Personal</tag>
</taggroup>
<taggroup key="pros" name="Pros">
<tag isuseradded="false" count="1">Compact</tag>
<tag isuseradded="false" count="1">Durable</tag>
</taggroup>
<bottom_line_yes_votes>1</bottom_line_yes_votes>
<bottom_line_no_votes>0</bottom_line_no_votes>
<customerimages>false</customerimages>
<customervideos>false</customervideos>
<inlinefiles>
<inlinefile reviewpage="1">pwr/7gdt43ap/inline/05/44/14__FLIP-en_US-1-reviews.html</inlinefile>
</inlinefiles>
<reviews>
<fullreview>
<id>xxxxxxxx</id>
<merchant_review_id>xxxxxxxx</merchant_review_id>
<merchantuserid/>
<status>Approved</status>
<createddate>2010-08-24</createddate>
<helpfulvotes>0</helpfulvotes>
<nothelpfulvotes>0</nothelpfulvotes>
<source>web</source>
<confirmstatusgroup>
<confirmstatus>Verified Reviewer</confirmstatus>
</confirmstatusgroup>
<headline>test</headline>
<overallrating>4</overallrating>
<taggroup key="pros" name="Pros">
<tag isuseradded="false" count="1">Durable</tag>
<tag isuseradded="false" count="1">Compact</tag>
</taggroup>
<taggroup key="describeyourself" name="Describe Yourself">
<tag isuseradded="false" count="1">Quality Oriented</tag>
</taggroup>
<taggroup key="primaryuse" name="Primary use">
<tag isuseradded="false" count="1">Personal</tag>
</taggroup>
<bottom_line>recommended</bottom_line>
<comments>test</comments>
<nickname>carol</nickname>
<location>oregon</location>
<email_address_from_user>xxxx@xxxxxxxxx.com</email_address_from_user>
<site_id>1</site_id>
</fullreview>
</reviews>
</product>

</products> 

Question by:ergoindemand
15 Comments
 
LVL 6

Expert Comment

by:apresence
ID: 33609126
The attached code should do it for you.  It grabs everything between the closing </pageid> tag and the </product> tag for the pageid you requested, prepends an XML header, and assigns the result to $xml, which you can then pass to your XML parser.

Testing (I saved your sample input XML file into test10.in):
root@beta:~/exex/test10 $ php test10.php <test10.in
------
<?xml version="1.0" encoding="UTF-8"?>
<name>HON  Park Avenue 5000 Series Mid-Back Managers Chair, Henna Cherry/Black Vinyl</name>
<smallstarimagelocation>pwr/engine/images/stars_small.gif</smallstarimagelocation>
<largestarimagelocation>pwr/engine/images/stars.gif</largestarimagelocation>
<newestreviewdate>2010-08-30</newestreviewdate>
<oldestreviewdate>2010-08-24</oldestreviewdate>
<averageoverallrating>5</averageoverallrating>
<average_rating_decimal>5</average_rating_decimal>
<fullreviews>2</fullreviews>
<confirmstatusgroup>
<confirmstatus>Verified Reviewer</confirmstatus>
</confirmstatusgroup>
<taggroup key="bestuses" name="Best Uses">
<tag isuseradded="false" count="1">Computer chair</tag>
</taggroup>
<taggroup key="describeyourself" name="Describe Yourself">
<tag isuseradded="false" count="2">Midrange shopper</tag>
</taggroup>
<taggroup key="primaryuse" name="Primary use">
<tag isuseradded="false" count="2">Business</tag>
</taggroup>
<taggroup key="pros" name="Pros">
<tag isuseradded="false" count="2">Attractive</tag>
<tag isuseradded="false" count="2">Comfortable</tag>
<tag isuseradded="false" count="2">Durable</tag>
<tag isuseradded="false" count="1">Easy to assemble</tag>
<tag isuseradded="false" count="2">Ergonomic</tag>
<tag isuseradded="false" count="2">Good lumbar support</tag>
<tag isuseradded="false" count="2">Rolls smoothly</tag>
</taggroup>
<bottom_line_yes_votes>2</bottom_line_yes_votes>
<bottom_line_no_votes>0</bottom_line_no_votes>
<customerimages>false</customerimages>
<customervideos>false</customervideos>
<inlinefiles>
<inlinefile reviewpage="1">pwr/7gdt43ap/inline/06/30/74__5002JEE11-en_US-1-reviews.html</inlinefile>
</inlinefiles>
<reviews>
<fullreview>
<id>xxxxx</id>
<merchant_review_id>xxxxx</merchant_review_id>
<merchantuserid/>
<status>Approved</status>
<createddate>2010-08-24</createddate>
<helpfulvotes>0</helpfulvotes>
<nothelpfulvotes>0</nothelpfulvotes>
<source>web</source>
<confirmstatusgroup>
<confirmstatus>Verified Reviewer</confirmstatus>
</confirmstatusgroup>
<headline>test</headline>
<overallrating>5</overallrating>
<taggroup key="pros" name="Pros">
<tag isuseradded="false" count="1">Sturdy</tag>
<tag isuseradded="false" count="1">Rolls Smoothly</tag>
<tag isuseradded="false" count="1">Good Lumbar Support</tag>
<tag isuseradded="false" count="1">Comfortable</tag>
<tag isuseradded="false" count="1">Ergonomic</tag>
<tag isuseradded="false" count="1">Attractive Design</tag>
</taggroup>
<taggroup key="describeyourself" name="Describe Yourself">
<tag isuseradded="false" count="1">Midrange Shopper</tag>
</taggroup>
<taggroup key="bestuses" name="Best Uses">
<tag isuseradded="false" count="1">Computer Chair</tag>
</taggroup>
<taggroup key="primaryuse" name="Primary use">
<tag isuseradded="false" count="1">Business</tag>
</taggroup>
<bottom_line>recommended</bottom_line>
<comments>test</comments>
<nickname>carol</nickname>
<location>oregon</location>
<email_address_from_user>xxxxx@xxxxx.com</email_address_from_user>
<site_id>1</site_id>
</fullreview>
<fullreview>
<id>xxxxx</id>
<merchant_review_id>xxxxx</merchant_review_id>
<merchantuserid/>
<status>Approved</status>
<createddate>2010-08-30</createddate>
<helpfulvotes>0</helpfulvotes>
<nothelpfulvotes>0</nothelpfulvotes>
<source>web</source>
<confirmstatusgroup>
<confirmstatus>Verified Reviewer</confirmstatus>
</confirmstatusgroup>
<headline>An awesome chair.</headline>
<overallrating>5</overallrating>
<taggroup key="pros" name="Pros">
<tag isuseradded="false" count="1">Ergonomic</tag>
<tag isuseradded="false" count="1">Comfortable</tag>
<tag isuseradded="false" count="1">Good Lumbar Support</tag>
<tag isuseradded="false" count="1">Easy To Assemble</tag>
<tag isuseradded="false" count="1">Rolls Smoothly</tag>
<tag isuseradded="false" count="1">Attractive Design</tag>
<tag isuseradded="false" count="1">Sturdy</tag>
</taggroup>
<taggroup key="describeyourself" name="Describe Yourself">
<tag isuseradded="false" count="1">Midrange Shopper</tag>
</taggroup>
<taggroup key="primaryuse" name="Primary use">
<tag isuseradded="false" count="1">Business</tag>
</taggroup>
<bottom_line>recommended</bottom_line>
<comments>I bought this chair for my office and love it.  I could sit in it all day.</comments>
<nickname>Gerry</nickname>
<location>Grants Pass, OR</location>
<email_address_from_user>xxxxx@xxxxx.com</email_address_from_user>
<site_id>1</site_id>
</fullreview>
</reviews>

root@beta:~/exex/test10 $
<?php
$data = file_get_contents('php://stdin');
preg_match_all('|<pageid>74\-5002JEE11</pageid>(.*)</product>|imsU', $data, $matches);
if (count($matches) >= 1)
{
  // We just want the grouped part...
  $grouped = $matches[1];
  for ($i=0; $i<count($grouped); $i++)
  {
    $xml = '<?xml version="1.0" encoding="UTF-8"?>' . $grouped[$i];
    print "------\n$xml\n";
    // Parse the XML here...
  }
}
?>
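
A variation on the script above, as a minimal sketch: the page id is usually not known in advance, so the pattern can be built from a variable. `extract_product()` is a hypothetical helper name, not part of the original script. It runs the id through `preg_quote()` and returns the whole `<product>` element, so the result has a single root element (the fragment printed above has several top-level elements, which an XML parser would reject as not well-formed).

```php
<?php
// Hypothetical helper: returns the complete <product>...</product> element
// whose <pageid> equals $pageid, or false when no product matches.
function extract_product($xml, $pageid)
{
    // Escape regex metacharacters in the id; '|' is the pattern delimiter.
    $id = preg_quote($pageid, '|');

    // \b keeps "<product" from matching "<products".
    // [^<]* only allows the rest of the start tag plus whitespace before
    // <pageid>, so the match cannot begin at an earlier product.
    // .*? with the s flag stops at the first closing </product>.
    $pattern = '|<product\b[^<]*<pageid>' . $id . '</pageid>.*?</product>|s';

    if (preg_match($pattern, $xml, $m)) {
        return $m[0];
    }
    return false;
}
```

For example, `extract_product($data, '74-5002JEE11')` would return the first product of the sample file. Where the id comes from (a query-string parameter, say) is left open here.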

 
LVL 111

Expert Comment

by:Ray Paseur
ID: 33610573
You might want to consider moving to PHP5.  PHP4 has been dead a long time.  Not even security fixes are available any more.  And with PHP5 you get some valuable new functions, like this:
http://us2.php.net/manual/en/function.simplexml-load-file.php
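
For reference once PHP 5 is available, a minimal sketch of the SimpleXML approach. `find_product()` is a hypothetical helper name; `simplexml_load_string()` is the string counterpart of the function linked above. SimpleXML still loads the whole document, so this only makes picking out one product simpler. It does not avoid reading the file.

```php
<?php
// Hypothetical helper (PHP 5+ only): returns the <product> element whose
// <pageid> equals $pageid as a SimpleXMLElement, or null when none matches.
function find_product($xmlString, $pageid)
{
    $products = simplexml_load_string($xmlString);
    if ($products === false) {
        return null; // input was not well-formed XML
    }

    // The child elements carry no namespace prefix, so plain property
    // access works even though the root declares xmlns:xsi.
    foreach ($products->product as $product) {
        if ((string) $product->pageid === $pageid) {
            return $product;
        }
    }
    return null;
}
```

With the sample file loaded into a string, `find_product($xml, '74-5002JEE11')->name` would give the chair's product name.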
 

Author Comment

by:ergoindemand
ID: 33613265
Awesome...thank you so much!

 

Author Comment

by:ergoindemand
ID: 33613272
And we definitely are considering moving to PHP5 in the near future.  We are currently running CRE Loaded 6.3 for our store, so once we upgrade to the new version of CRE Loaded, we'll definitely be updating to PHP5.
 

Author Comment

by:ergoindemand
ID: 33639602
apresence, I'm unable to get your script to work.  Here is the php4 script I'm using to parse the xml.  Any suggestions...help?  Thanks!

<?php

$file = "http://www.ergoindemand.com/pwr/7gdt43ap/rawdata/review_data_complete.xml";

function trustedFile($file)
{
   // only trust local files owned by ourselves
   if (!eregi("^([a-z]+)://", $file)
       && fileowner($file) == getmyuid()) {
           return true;
   }
   return false;
}


function startElement($parser, $name, $attribs)
{
   echo "<<font color=\"#0000cc\">$name</font>";
   if (sizeof($attribs)) {
       while (list($k, $v) = each($attribs)) {
           echo " <font color=\"#009900\">$k</font>=\"<font
                   color=\"#990000\">$v</font>\"";
       }
   }
   echo ">";
}

function endElement($parser, $name)
{
   echo "</<font color=\"#0000cc\">$name</font>>";
}

function characterData($parser, $data)
{
   echo "<b>$data</b>";
}

function PIHandler($parser, $target, $data)
{
   switch (strtolower($target)) {
       case "php":
           global $parser_file;
           // If the parsed document is "trusted", we say it is safe
           // to execute PHP code inside it.  If not, display the code
           // instead.
           if (trustedFile($parser_file[$parser])) {
               eval($data);
           } else {
               printf("Untrusted PHP code: <i>%s</i>",
                       htmlspecialchars($data));
           }
           break;
   }
}

function defaultHandler($parser, $data)
{
   if (substr($data, 0, 1) == "&" && substr($data, -1, 1) == ";") {
       printf('<font color="#aa00aa">%s</font>',
               htmlspecialchars($data));
   } else {
       printf('<font size="-1">%s</font>',
               htmlspecialchars($data));
   }
}

function externalEntityRefHandler($parser, $openEntityNames, $base, $systemId,
                                 $publicId) {
   if ($systemId) {
       if (!list($parser, $fp) = new_xml_parser($systemId)) {
           printf("Could not open entity %s at %s\n", $openEntityNames,
                   $systemId);
           return false;
       }
       while ($data = fread($fp, 4096)) {
           if (!xml_parse($parser, $data, feof($fp))) {
               printf("XML error: %s at line %d while parsing entity %s\n",
                       xml_error_string(xml_get_error_code($parser)),
                       xml_get_current_line_number($parser), $openEntityNames);
               xml_parser_free($parser);
               return false;
           }
       }
       xml_parser_free($parser);
       return true;
   }
   return false;
}

function new_xml_parser($file)
{
   global $parser_file;

   $xml_parser = xml_parser_create();
   xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, 1);
   xml_set_element_handler($xml_parser, "startElement", "endElement");
   xml_set_character_data_handler($xml_parser, "characterData");
   xml_set_processing_instruction_handler($xml_parser, "PIHandler");
   xml_set_default_handler($xml_parser, "defaultHandler");
   xml_set_external_entity_ref_handler($xml_parser, "externalEntityRefHandler");
 
   if (!($fp = @fopen($file, "r"))) {
       return false;
   }
   if (!is_array($parser_file)) {
       settype($parser_file, "array");
   }
   $parser_file[$xml_parser] = $file;
   return array($xml_parser, $fp);
}

if (!(list($xml_parser, $fp) = new_xml_parser($file))) {
   die("could not open XML input");
}

echo "<pre>";
while ($data = fread($fp, 4096)) {
   if (!xml_parse($xml_parser, $data, feof($fp))) {
       die(sprintf("XML error: %s at line %d\n",
                   xml_error_string(xml_get_error_code($xml_parser)),
                   xml_get_current_line_number($xml_parser)));
   }
}
echo "</pre>";
echo "parse complete\n";
xml_parser_free($xml_parser);

?>

 
LVL 6

Accepted Solution

by:
apresence earned 2000 total points
ID: 33640124
Adjusted script attached.  I added apresence_scrub_xml() and changed the code after that function a little bit.

Is there a reason you're using "<<" and ">>" to start/end the output you're generating?  For example:
function startElement($parser, $name, $attribs)
{
  echo "<<font color=\"#0000cc\">$name</font>";
  if (sizeof($attribs)) {
      while (list($k, $v) = each($attribs)) {
          echo " <font color=\"#009900\">$k</font>=\"<font
                  color=\"#990000\">$v</font>\"";
      }
  }
  echo ">";
}

Generates:
<<font color ... /font>>


<?php

$file = "http://www.ergoindemand.com/pwr/7gdt43ap/rawdata/review_data_complete.xml";

function trustedFile($file)
{
  // only trust local files owned by ourselves
  if (!eregi("^([a-z]+)://", $file)
      && fileowner($file) == getmyuid()) {
          return true;
  }
  return false;
}


function startElement($parser, $name, $attribs)
{
  echo "<<font color=\"#0000cc\">$name</font>";
  if (sizeof($attribs)) {
      while (list($k, $v) = each($attribs)) {
          echo " <font color=\"#009900\">$k</font>=\"<font
                  color=\"#990000\">$v</font>\"";
      }
  }
  echo ">";
}

function endElement($parser, $name)
{
  echo "</<font color=\"#0000cc\">$name</font>>";
}

function characterData($parser, $data)
{
  echo "<b>$data</b>";
}

function PIHandler($parser, $target, $data)
{
  switch (strtolower($target)) {
      case "php":
          global $parser_file;
          // If the parsed document is "trusted", we say it is safe
          // to execute PHP code inside it.  If not, display the code
          // instead.
          if (trustedFile($parser_file[$parser])) {
              eval($data);
          } else {
              printf("Untrusted PHP code: <i>%s</i>",
                      htmlspecialchars($data));
          }
          break;
  }
}

function defaultHandler($parser, $data)
{
  if (substr($data, 0, 1) == "&" && substr($data, -1, 1) == ";") {
      printf('<font color="#aa00aa">%s</font>',
              htmlspecialchars($data));
  } else {
      printf('<font size="-1">%s</font>',
              htmlspecialchars($data));
  }
}

function externalEntityRefHandler($parser, $openEntityNames, $base, $systemId,
                                $publicId) {
  if ($systemId) {
      if (!list($parser, $fp) = new_xml_parser($systemId)) {
          printf("Could not open entity %s at %s\n", $openEntityNames,
                  $systemId);
          return false;
      }
      while ($data = fread($fp, 4096)) {
          if (!xml_parse($parser, $data, feof($fp))) {
              printf("XML error: %s at line %d while parsing entity %s\n",
                      xml_error_string(xml_get_error_code($parser)),
                      xml_get_current_line_number($parser), $openEntityNames);
              xml_parser_free($parser);
              return false;
          }
      }
      xml_parser_free($parser);
      return true;
  }
  return false;
}

function new_xml_parser($file)
{
  global $parser_file;

  $xml_parser = xml_parser_create();
  xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, 1);
  xml_set_element_handler($xml_parser, "startElement", "endElement");
  xml_set_character_data_handler($xml_parser, "characterData");
  xml_set_processing_instruction_handler($xml_parser, "PIHandler");
  xml_set_default_handler($xml_parser, "defaultHandler");
  xml_set_external_entity_ref_handler($xml_parser, "externalEntityRefHandler");
 
  if (!($fp = @fopen($file, "r"))) {
      return false;
  }
  if (!is_array($parser_file)) {
      settype($parser_file, "array");
  }
  $parser_file[$xml_parser] = $file;
  return array($xml_parser, $fp);
}

function apresence_scrub_xml($xml)
{
  // Grab 2 lines from xml header
  preg_match('|^([^$]+$){2}|imsU', $xml, $matches);
  $new_xml = $matches[0];

  preg_match_all('|(<product [^<]+<pageid>74\-5002JEE11</pageid>.*</product>)|imsU', $xml, $matches);
  if (count($matches) >= 1)
  {
    // We just want the grouped part...
    $grouped = $matches[1];
    for ($i=0; $i<count($grouped); $i++)
    {
      $new_xml .= $grouped[$i];
    }
  }

  // Append the xml footer
  $new_xml .= "\n</products>\n";

  return $new_xml;
}

if (!(list($xml_parser, $fp) = new_xml_parser($file))) {
  die("could not open XML input");
}

// Read all the data into one string
$data = '';
while ($this_data = fread($fp, 4096)) {
  $data .= $this_data;
}

// Just return the stuff we're interested in
$data = apresence_scrub_xml($data);

echo "<pre>";
// Uncomment the next line to see the scrubbed xml before parsing
// echo htmlspecialchars($data); echo "\n\n\n--------------------\n\n\n";

if (!xml_parse($xml_parser, $data, true)) {
    die(sprintf("XML error: %s at line %d\n",
                xml_error_string(xml_get_error_code($xml_parser)),
                xml_get_current_line_number($xml_parser)));
}
echo "</pre>";
echo "parse complete\n";
xml_parser_free($xml_parser);

?>

 

Author Comment

by:ergoindemand
ID: 33640203
Works perfectly!!!  Thanks!  No special reason for the double angle brackets.  They are going to be removed as I tweak the script so it only shows up in the html code for seo purposes.  Thanks a million!!
 
LVL 6

Expert Comment

by:apresence
ID: 33640205
I think I see what you were trying to do with the < and > characters now.  If you want those to show up on your web page, you should use < and >, respectively.

Here's another version of the script with that fixed.
<?php

$file = "http://www.ergoindemand.com/pwr/7gdt43ap/rawdata/review_data_complete.xml";

function trustedFile($file)
{
  // only trust local files owned by ourselves
  if (!preg_match('|^[a-z]+://|i', $file)
      && fileowner($file) == getmyuid()) {
          return true;
  }
  return false;
}


function startElement($parser, $name, $attribs)
{
  echo "&lt;<font color=\"#0000cc\">$name</font>";
  if (sizeof($attribs)) {
      foreach ($attribs as $k => $v) {
          echo " <font color=\"#009900\">$k</font>=\"<font
                  color=\"#990000\">$v</font>\"";
      }
  }
  echo "&gt;";
}

function endElement($parser, $name)
{
  echo "&lt;/<font color=\"#0000cc\">$name</font>&gt;";
}

function characterData($parser, $data)
{
  echo "<b>$data</b>";
}

function PIHandler($parser, $target, $data)
{
  switch (strtolower($target)) {
      case "php":
          global $parser_file;
          // If the parsed document is "trusted", we say it is safe
          // to execute PHP code inside it.  If not, display the code
          // instead.
          if (trustedFile($parser_file[$parser])) {
              eval($data);
          } else {
              printf("Untrusted PHP code: <i>%s</i>",
                      htmlspecialchars($data));
          }
          break;
  }
}

function defaultHandler($parser, $data)
{
  if (substr($data, 0, 1) == "&" && substr($data, -1, 1) == ";") {
      printf('<font color="#aa00aa">%s</font>',
              htmlspecialchars($data));
  } else {
      printf('<font size="-1">%s</font>',
              htmlspecialchars($data));
  }
}

function externalEntityRefHandler($parser, $openEntityNames, $base, $systemId,
                                $publicId) {
  if ($systemId) {
      if (!list($parser, $fp) = new_xml_parser($systemId)) {
          printf("Could not open entity %s at %s\n", $openEntityNames,
                  $systemId);
          return false;
      }
      while ($data = fread($fp, 4096)) {
          if (!xml_parse($parser, $data, feof($fp))) {
              printf("XML error: %s at line %d while parsing entity %s\n",
                      xml_error_string(xml_get_error_code($parser)),
                      xml_get_current_line_number($parser), $openEntityNames);
              xml_parser_free($parser);
              return false;
          }
      }
      xml_parser_free($parser);
      return true;
  }
  return false;
}

function new_xml_parser($file)
{
  global $parser_file;

  $xml_parser = xml_parser_create();
  xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, 1);
  xml_set_element_handler($xml_parser, "startElement", "endElement");
  xml_set_character_data_handler($xml_parser, "characterData");
  xml_set_processing_instruction_handler($xml_parser, "PIHandler");
  xml_set_default_handler($xml_parser, "defaultHandler");
  xml_set_external_entity_ref_handler($xml_parser, "externalEntityRefHandler");
 
  if (!($fp = @fopen($file, "r"))) {
      return false;
  }
  if (!is_array($parser_file)) {
      settype($parser_file, "array");
  }
  $parser_file[$xml_parser] = $file;
  return array($xml_parser, $fp);
}

function apresence_scrub_xml($xml)
{
  // Grab 2 lines from xml header
  preg_match('|^([^$]+$){2}|imsU', $xml, $matches);
  $new_xml = $matches[0];

  preg_match_all('|(<product [^<]+<pageid>74\-5002JEE11</pageid>.*</product>)|imsU', $xml, $matches);
  if (count($matches) >= 1)
  {
    // We just want the grouped part...
    $grouped = $matches[1];
    for ($i=0; $i<count($grouped); $i++)
    {
      $new_xml .= $grouped[$i];
    }
  }

  // Append the xml footer
  $new_xml .= "\n</products>\n";

  return $new_xml;
}

if (!(list($xml_parser, $fp) = new_xml_parser($file))) {
  die("could not open XML input");
}

// Read all the data into one string
$data = '';
while ($this_data = fread($fp, 4096)) {
  $data .= $this_data;
}

// Just return the stuff we're interested in
$data = apresence_scrub_xml($data);

echo "<pre>";
// Uncomment the next line to see the scrubbed xml before parsing
// echo htmlspecialchars($data); echo "\n\n\n--------------------\n\n\n";

if (!xml_parse($xml_parser, $data, true)) {
    die(sprintf("XML error: %s at line %d\n",
                xml_error_string(xml_get_error_code($xml_parser)),
                xml_get_current_line_number($xml_parser)));
}
echo "</pre>";
echo "parse complete\n";
xml_parser_free($xml_parser);

?>

 
LVL 6

Expert Comment

by:apresence
ID: 33640236
I put &amp;lt; and &amp;gt; in that last comment, but of course the browser is interpreting them and showing the characters instead.

I meant it to show like this:
Whenever you want to show < or > on an HTML page, you should use &amp;lt; and &amp;gt; instead.
 

Author Comment

by:ergoindemand
ID: 33640241
I knew exactly what you meant...and thanks again!
 
LVL 6

Expert Comment

by:apresence
ID: 33640250
Still not right, let's try it like this:
Whenever you want to show < or > on an HTML page, you should use &lt; and &gt; instead.
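In PHP the entities do not have to be typed by hand: htmlspecialchars() performs the conversion. A minimal sketch, assuming the goal is only to display a tag name as text; the helper name visible_tag is made up for illustration and is not part of the script above.

```php
<?php
// Hypothetical helper (not in the original script): returns an element
// name as text that a browser will display literally as "<name>".
function visible_tag($name)
{
    // htmlspecialchars() converts <, >, & and " into HTML entities,
    // so the browser shows the characters instead of interpreting them.
    return htmlspecialchars("<" . $name . ">");
}

// The page source receives the entity form; the browser renders <pageid>.
echo visible_tag("pageid");
```

The same call can wrap $data in characterData() if the parsed text itself may contain these characters.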

 

Author Comment

by:ergoindemand
ID: 33646965
Last issue I have. I need to substitute the sku, "74-5002JEE11" with the variable that represents the sku in mysql since the sku needs to be dynamic for all our various products.  The variable called is $products_model.  I'm trying to figure out how to incorporate that in your function.  Any suggestions?
 

Author Comment

by:ergoindemand
ID: 33648485
This is what I've tried adding to the function so that the product sku is used within the ... elements in the preg_match_all call.

$products_model = $product_info['products_model'];

preg_match_all('|(<product [^<]+<pageid>"/^$products_model$/"</pageid>.*</product>)|imsU', $xml, $matches);

$products_model calls the sku...  but I don't think I'm calling it correctly.  I just basically need to echo or return the value of $products_model within the string.  What am I doing wrong?
 
LVL 6

Expert Comment

by:apresence
ID: 33650310
Since this is unrelated to the original question, please open up a new question and I'll be happy to answer it for you.
 

Author Comment

by:ergoindemand
ID: 33650374
ok...just posted new question
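
For the sku substitution asked about above, the attempt fails for two reasons: PHP does not expand variables inside single-quoted strings, and the extra "/^ ... $/" characters become literal parts of the pattern. One possible approach is sketched below, assuming the sku arrives in $products_model as described. The function name scrub_xml_for_sku and its second parameter are illustrative, not from the accepted script, and the header is taken with explode() instead of the original header regex.

```php
<?php
// Hypothetical variant of apresence_scrub_xml() that takes the sku as a
// parameter instead of hard-coding 74-5002JEE11.
function scrub_xml_for_sku($xml, $sku)
{
    // Keep the first two lines: the XML declaration and the <products> tag.
    $lines = explode("\n", $xml);
    $new_xml = implode("\n", array_slice($lines, 0, 2)) . "\n";

    // preg_quote() escapes regex metacharacters in the sku; the second
    // argument also escapes the "|" delimiter used by the pattern.
    $quoted = preg_quote($sku, '|');

    // Concatenation inserts the value; inside single quotes PHP would
    // leave "$products_model" as literal text.
    $pattern = '|(<product [^<]+<pageid>' . $quoted
             . '</pageid>.*</product>)|imsU';

    // Append every <product> block whose <pageid> equals the sku.
    preg_match_all($pattern, $xml, $matches);
    foreach ($matches[1] as $block) {
        $new_xml .= $block;
    }

    // Append the xml footer
    return $new_xml . "\n</products>\n";
}
```

It would be called in place of the existing scrub call, for example $data = scrub_xml_for_sku($data, $products_model);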


Question has a verified solution.
