weekapaug

asked on

simple html dom php accessing table/cell values

I can't get at the table values with simple_html_dom.

I need to first access the table with the ID mainTable, then loop through the TH values and the spans with class valueIneed in parallel.

I tried the same approach I use with DIVs, and it does not pick up any data.

I tried the code below, with both innertext and plaintext, and neither returns anything. I also tried looping through every 'tr' tag without asking for the table first, and that still returns nothing.

foreach ($html->find('table') as $div_element) {

    foreach ($div_element->find('th') as $pcts) {
        $pct[] = $pcts->plaintext;
    }

    foreach ($div_element->find('span[class=valueIneed]') as $pcus) {
        $pcu[] = $pcus->plaintext;
    }
}

Below is a sample of the data. My code returns nothing for it, although the same approach works great for divs. Is there something special to do for table data?

<table id="mainTable">
    <tr id="LargeCell">
        <td colspan="2">
            Large Cell
        </td>
    </tr>
    <tr class="NextRow">
        <th>
            Header1
        </th>
        <td>
            <span class="valueIneed">val1</span>
        </td>
    </tr>
    <tr>
        <th>
            Header2
        </th>
        <td>
            <span class="valueIneed">val2</span>
        </td>
    </tr>
    <tr class="NextRow">
        <th>
            Header3
        </th>
        <td>
            <span class="valueIneed">val3</span>
        </td>
    </tr>
    <tr>
        <th>
            Header 4
        </th>
        <td>
            <span class="valueIneed">val4</span> 
        </td>
    </tr>
</table>


ASKER CERTIFIED SOLUTION
Ray Paseur
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
weekapaug

ASKER

I wrote something exactly like this but it didn't work either.

I narrowed it down to the fact that I'm loading the HTML from a file. That works fine 99% of the time, but in this case something is funky with the loaded data. Strangely, it works for div, meta, span, and so on, but asking for table, tr, td, or anything else related to a table just returns blank info.

Are there any procedures you can recommend to prep this stored file for loading?  Is there anything you can think of that would corrupt ONLY the function for table info?
"...it didn't work either."

Erm?

This produced sensible output given the inputs we have to work with. E.g.:
array(4) {
  [0]=>
  string(7) "Header1"
  [1]=>
  string(7) "Header2"
  [2]=>
  string(7) "Header3"
  [3]=>
  string(8) "Header 4"
}
array(4) {
  [0]=>
  string(4) "val1"
  [1]=>
  string(4) "val2"
  [2]=>
  string(4) "val3"
  [3]=>
  string(4) "val4"
}

TH: Header1 SPAN: val1
TH: Header2 SPAN: val2
TH: Header3 SPAN: val3
TH: Header 4 SPAN: val4


There are two concepts I'd like to share with you.  The first is Test-Driven Development.  It teaches us a way of thinking about problems in terms of given inputs and expected outputs.  The second is the SSCCE; it teaches us a way of communicating about these same problems in terms of given inputs and expected outputs.  

In this case, it looks a lot to me like we are dealing with hypothetical data in the HTML document.  And since this is clearly a data-dependent problem, we might be able to get better results if we deal with the actual problem data set, or at least a faithful SSCCE representation of the problem data set.  The abstraction layer of hypothetical HTML may be eclipsing our ability to get a workable solution.  Often the devil is in the details, and our abstraction of the problem may obscure the (important) details.

If you want to post the actual problem inputs and the desired outputs, I'll be glad to take a run at it.   A link to the real input file would be helpful, since that sounds like what you're using in your tests.
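As a rough illustration of the "given inputs, expected outputs" idea, here is a minimal SSCCE-style sketch. It is an assumption-laden example, not the solution posted above: it uses PHP's built-in DOMDocument and DOMXPath instead of simple_html_dom so that it runs without any include, the function name extractPairs is hypothetical, and the input is a trimmed copy of the sample table from the question.

```php
<?php
// Fixed input: a shortened copy of the sample table from the question.
$html = <<<'HTML'
<table id="mainTable">
    <tr id="LargeCell"><td colspan="2">Large Cell</td></tr>
    <tr class="NextRow">
        <th>
            Header1
        </th>
        <td><span class="valueIneed">val1</span></td>
    </tr>
    <tr><th>Header2</th><td><span class="valueIneed">val2</span></td></tr>
</table>
HTML;

// Hypothetical helper: maps each TH text to the span value in the same row.
function extractPairs(string $html): array
{
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $pairs = [];

    // Only rows that contain a TH are of interest; the "Large Cell" row is skipped.
    foreach ($xpath->query('//table[@id="mainTable"]//tr[th]') as $row) {
        $th   = $xpath->query('./th', $row)->item(0);
        $span = $xpath->query('.//span[@class="valueIneed"]', $row)->item(0);
        if ($th !== null && $span !== null) {
            $pairs[trim($th->textContent)] = trim($span->textContent);
        }
    }
    return $pairs;
}

// Expected output, stated up front, then compared with the actual result.
$expected = ['Header1' => 'val1', 'Header2' => 'val2'];
$pairs    = extractPairs($html);

echo ($pairs === $expected) ? "PASS\n" : "FAIL\n";
```

Running the same script against the real input file instead of the hypothetical table is the quickest way to see whether the problem lies in the code or in the data.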
This worked, although it was something I had already done. I jumped the gun and assumed the problem was my syntax, but it turned out to be caused by malformed HTML on the page.
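For anyone who hits the same symptom, one possible way to check a stored page for malformed markup is to let PHP's built-in libxml parser report what it trips over. This is only a sketch under assumptions: htmlParseProblems is a hypothetical helper name, the two sample strings are invented for illustration, and libxml's parser is not the one simple_html_dom uses, so its messages are hints about where to look rather than proof of what simple_html_dom sees.

```php
<?php
// Hypothetical helper: returns libxml's parse messages for an HTML string.
function htmlParseProblems(string $html): array
{
    $previous = libxml_use_internal_errors(true); // collect errors instead of emitting warnings
    libxml_clear_errors();

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    $problems = [];
    foreach (libxml_get_errors() as $error) {
        // Each LibXMLError carries the line number and a message.
        $problems[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting
    return $problems;
}

// Invented samples: one clean table, one with mismatched closing tags.
$wellFormed = '<!DOCTYPE html><html><body><table><tr><th>Header1</th><td>val1</td></tr></table></body></html>';
$broken     = '<!DOCTYPE html><html><body><table><tr><th>Header1</td></span></tr></table></body></html>';

foreach (['wellFormed' => $wellFormed, 'broken' => $broken] as $label => $sample) {
    $problems = htmlParseProblems($sample);
    echo $label . ': ' . count($problems) . " message(s)\n";
    foreach ($problems as $problem) {
        echo '  ' . $problem . "\n";
    }
}
```

To check a stored file, the string could come from file_get_contents() on that file. The exact messages depend on the installed libxml version.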