
Scraping HTML using DOMDocument + PHP

Member_2_5230414
Last Modified: 2015-07-29
I would like to scrape the following HTML:
     <div class="venue-event-list " rel="GB">
                                <div class="tracks-list">
    <div class="single-track">
                <a href="//livevideo.betfair.com/Default.do?mi=119408124" target="_blank" class="live-video-link"><div class="bf-icon-live-video tag-i13n i13n-ltxt-LVid i13n-sec-GB i13n-tab-today" title="Watch now on Betfair Live Video"></div></a>
        <div class="info-container">
            <span class="track-name">
                <a class="tag-i13n i13n-ltxt-meeting i13n-sec-GB i13n-tab-today" href="/exchange/plus/#/horse-racing/market/1.119408124">Lingfield</a>
            </span>
            <div class="races-list">
                    
                    
    <div class="single-race" id="m-1_119408124">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408124"
                title="5f Nursery | 7 Runners">14:10</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408128">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408128"
                title="6f Mdn Stks | 11 Runners">14:40</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408132">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408132"
                title="7f Mdn Stks | 6 Runners">15:10</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408136">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408136"
                title="2m Hcap | 12 Runners">15:40</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408140">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408140"
                title="1m2f Sell Stks | 6 Runners">16:10</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408144">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408144"
                title="1m3f Hcap | 8 Runners">16:40</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408148">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408148"
                title="1m1f Hcap | 14 Runners">17:10</a>
        </span>
    </div>
            </div>
        </div>
    </div>
                        </div>
                                <div class="tracks-list">
    <div class="single-track">
                <a href="//livevideo.betfair.com/Default.do?mi=119408153" target="_blank" class="live-video-link"><div class="bf-icon-live-video tag-i13n i13n-ltxt-LVid i13n-sec-GB i13n-tab-today" title="Watch now on Betfair Live Video"></div></a>
        <div class="info-container">
            <span class="track-name">
                <a class="tag-i13n i13n-ltxt-meeting i13n-sec-GB i13n-tab-today" href="/exchange/plus/#/horse-racing/market/1.119408153">Wolverhampton</a>
            </span>
            <div class="races-list">
                    
                    
    <div class="single-race" id="m-1_119408153">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408153"
                title="5f Mdn Stks | 7 Runners">14:20</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408157">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408157"
                title="1m6f Hcap | 7 Runners">14:50</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408161">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408161"
                title="1m4f Sell Stks | 5 Runners">15:20</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408165">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408165"
                title="1m1f Hcap | 13 Runners">15:50</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408169">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408169"
                title="1m1f Hcap | 11 Runners">16:20</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408173">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408173"
                title="1m Mdn Stks | 11 Runners">16:50</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408177">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408177"
                title="1m Hcap | 13 Runners">17:20</a>
        </span>
    </div>
            </div>
        </div>
    </div>
                        </div>


I have used the following code to pull the race name and the time of the race:

   
    $url  = "";   // page URL (left blank in the original post)
    $html = file_get_contents($url);

    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = false;   // set before loading; it has no effect afterwards
    libxml_use_internal_errors(true);   // collect parser warnings instead of hiding them with @
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);

    // select the block for each track
    $trackLists = $xpath->query('//div[contains(@class, "tracks-list")]');

    // loop through each track block and print all of its text
    foreach ($trackLists as $trackList) {
        echo $trackList->textContent . "<br />";
    }



What I would like to do is pull the meeting name only if the link (shown below) contains "GB" and "today" in its class attribute:

    <a class="tag-i13n i13n-ltxt-meeting i13n-sec-GB i13n-tab-today"
       href="/exchange/plus/#/horse-racing/market/1.119408124">Lingfield</a>

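
One way to express the "GB" and "today" condition is an XPath predicate that matches whole class tokens rather than substrings, so a class that merely contains the letters "GB" somewhere would not count. The following is only a sketch: `hasClassToken` is a hypothetical helper name, the sample markup is the link from the question, and it assumes the tokens to test are `i13n-sec-GB` and `i13n-tab-today`.

```php
<?php
// Hypothetical helper: builds an XPath 1.0 predicate that matches one whole class token.
function hasClassToken(string $token): string
{
    return sprintf(
        'contains(concat(" ", normalize-space(@class), " "), " %s ")',
        $token
    );
}

// Sample markup copied from the question.
$html = '<span class="track-name">'
      . '<a class="tag-i13n i13n-ltxt-meeting i13n-sec-GB i13n-tab-today" '
      . 'href="/exchange/plus/#/horse-racing/market/1.119408124">Lingfield</a>'
      . '</span>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Meeting links inside span.track-name that carry both tokens.
$query = '//span[' . hasClassToken('track-name') . ']/a['
       . hasClassToken('i13n-sec-GB') . ' and '
       . hasClassToken('i13n-tab-today') . ']';

$meetingNames = [];
foreach ($xpath->query($query) as $link) {
    $meetingNames[] = trim($link->textContent);
}

echo implode("\n", $meetingNames), "\n";
```

For the sample link this prints `Lingfield`. A plain `contains(@class, "GB")` would be shorter, at the cost of matching any class name that happens to include those letters.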


So the outcome would be "Lingfield". If the link matches, I would then like to pull the time of the race and the market ID from the following:
 

    <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
            href="/exchange/plus/#/horse-racing/market/1.119408124"
            title="5f Nursery | 7 Runners">14:10</a>
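
In this markup the race time is the link's text and the market ID appears to be the last segment of the `href`. A small sketch of pulling the ID out of the `href` value, assuming it always ends in `/market/` followed by digits, a dot, and more digits (`marketIdFromHref` is a hypothetical name):

```php
<?php
// Hypothetical helper: returns the market ID at the end of a race link's href,
// or null when the href does not end in /market/<digits>.<digits>.
function marketIdFromHref(string $href): ?string
{
    return preg_match('~/market/(\d+\.\d+)$~', $href, $m) === 1 ? $m[1] : null;
}

echo marketIdFromHref('/exchange/plus/#/horse-racing/market/1.119408124'), "\n";
```

With a `DOMElement` for the link, the input would come from `$link->getAttribute('href')` and the time from `trim($link->textContent)`.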



So the final output would be:

    Lingfield 14:10 1.119408124
    Lingfield 14:40 1.119408128
    .............................
    Wolverhampton 14:20 1.119408153
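
A possible way to produce lines in that shape is to loop over each `single-track` block, check its meeting link, and then run a second XPath query relative to that block for the race links. This is a sketch rather than a definitive answer: `hasClassToken` and `listRaces` are hypothetical names, the sample is a shortened copy of the markup above, and it assumes the class tokens `i13n-sec-GB` and `i13n-tab-today` are what identify a GB meeting on the "today" tab.

```php
<?php
// Hypothetical helper: XPath 1.0 predicate matching one whole class token.
function hasClassToken(string $token): string
{
    return 'contains(concat(" ", normalize-space(@class), " "), " ' . $token . ' ")';
}

// Hypothetical helper: returns "Meeting HH:MM marketId" lines for GB meetings on the today tab.
function listRaces(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // keep parser warnings out of the output
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    $wanted = hasClassToken('i13n-sec-GB') . ' and ' . hasClassToken('i13n-tab-today');
    $lines  = [];

    foreach ($xpath->query('//div[' . hasClassToken('single-track') . ']') as $track) {
        // Meeting link, queried relative to this track; skip the track if it lacks both tokens.
        $meeting = $xpath->query(
            './/span[' . hasClassToken('track-name') . ']/a[' . $wanted . ']',
            $track
        )->item(0);
        if ($meeting === null) {
            continue;
        }
        $name = trim($meeting->textContent);

        // Race links inside the same track: the text is the time, the href ends with the market ID.
        foreach ($xpath->query('.//a[' . hasClassToken('race-link') . ']', $track) as $race) {
            if (preg_match('~/market/(\d+\.\d+)$~', $race->getAttribute('href'), $m) === 1) {
                $lines[] = $name . ' ' . trim($race->textContent) . ' ' . $m[1];
            }
        }
    }

    return $lines;
}

// Shortened sample based on the markup in the question.
$sample = <<<'HTML'
<div class="venue-event-list " rel="GB">
<div class="tracks-list"><div class="single-track">
<span class="track-name"><a class="tag-i13n i13n-ltxt-meeting i13n-sec-GB i13n-tab-today" href="/exchange/plus/#/horse-racing/market/1.119408124">Lingfield</a></span>
<div class="single-race"><a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today" href="/exchange/plus/#/horse-racing/market/1.119408124">14:10</a></div>
<div class="single-race"><a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today" href="/exchange/plus/#/horse-racing/market/1.119408128">14:40</a></div>
</div></div>
<div class="tracks-list"><div class="single-track">
<span class="track-name"><a class="tag-i13n i13n-ltxt-meeting i13n-sec-GB i13n-tab-today" href="/exchange/plus/#/horse-racing/market/1.119408153">Wolverhampton</a></span>
<div class="single-race"><a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today" href="/exchange/plus/#/horse-racing/market/1.119408153">14:20</a></div>
</div></div>
</div>
HTML;

echo implode("\n", listRaces($sample)), "\n";
```

For the shortened sample this prints one line per race: `Lingfield 14:10 1.119408124`, `Lingfield 14:40 1.119408128`, and `Wolverhampton 14:20 1.119408153`. Passing `$track` as the second argument to `query()` keeps each inner lookup scoped to one meeting, which is what ties every race time to the right meeting name.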

