Member_2_5230414

Scraping HTML using DOMDocument + PHP

I would like to scrape the following HTML:
     <div class="venue-event-list " rel="GB">
                                <div class="tracks-list">
    <div class="single-track">
                <a href="//livevideo.betfair.com/Default.do?mi=119408124" target="_blank" class="live-video-link"><div class="bf-icon-live-video tag-i13n i13n-ltxt-LVid i13n-sec-GB i13n-tab-today" title="Watch now on Betfair Live Video"></div></a>
        <div class="info-container">
            <span class="track-name">
                <a class="tag-i13n i13n-ltxt-meeting i13n-sec-GB i13n-tab-today" href="/exchange/plus/#/horse-racing/market/1.119408124">Lingfield</a>
            </span>
            <div class="races-list">
                    
                    
    <div class="single-race" id="m-1_119408124">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408124"
                title="5f Nursery | 7 Runners">14:10</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408128">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408128"
                title="6f Mdn Stks | 11 Runners">14:40</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408132">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408132"
                title="7f Mdn Stks | 6 Runners">15:10</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408136">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408136"
                title="2m Hcap | 12 Runners">15:40</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408140">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408140"
                title="1m2f Sell Stks | 6 Runners">16:10</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408144">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408144"
                title="1m3f Hcap | 8 Runners">16:40</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408148">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408148"
                title="1m1f Hcap | 14 Runners">17:10</a>
        </span>
    </div>
            </div>
        </div>
    </div>
                        </div>
                                <div class="tracks-list">
    <div class="single-track">
                <a href="//livevideo.betfair.com/Default.do?mi=119408153" target="_blank" class="live-video-link"><div class="bf-icon-live-video tag-i13n i13n-ltxt-LVid i13n-sec-GB i13n-tab-today" title="Watch now on Betfair Live Video"></div></a>
        <div class="info-container">
            <span class="track-name">
                <a class="tag-i13n i13n-ltxt-meeting i13n-sec-GB i13n-tab-today" href="/exchange/plus/#/horse-racing/market/1.119408153">Wolverhampton</a>
            </span>
            <div class="races-list">
                    
                    
    <div class="single-race" id="m-1_119408153">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408153"
                title="5f Mdn Stks | 7 Runners">14:20</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408157">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408157"
                title="1m6f Hcap | 7 Runners">14:50</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408161">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408161"
                title="1m4f Sell Stks | 5 Runners">15:20</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408165">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408165"
                title="1m1f Hcap | 13 Runners">15:50</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408169">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408169"
                title="1m1f Hcap | 11 Runners">16:20</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408173">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408173"
                title="1m Mdn Stks | 11 Runners">16:50</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408177">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408177"
                title="1m Hcap | 13 Runners">17:20</a>
        </span>
    </div>
            </div>
        </div>
    </div>
                        </div>


I have used the following code to pull the race name and the time of the race:

   
    $url  = "";
    $html = file_get_contents($url);
    $dom  = new DOMDocument();
    // Set parser options before loading; changing them afterwards
    // does not affect the already parsed document
    $dom->preserveWhiteSpace = false;
    // @ suppresses the warnings libxml raises on malformed real-world HTML
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    // Select each track block for the day
    $getdropdown  = '//div[contains(@class, "tracks-list")]';
    $getdropdown2 = $xpath->query($getdropdown);
    // Loop through each track block and print its text
    foreach ($getdropdown2 as $dropresults) {
        echo $dropresults->textContent . "<br />";
    }



What I would like to do is pull the meeting name only if the link (shown below) contains "GB" and "today" in its class attribute:

    <a class="tag-i13n i13n-ltxt-meeting i13n-sec-GB i13n-tab-today"
       href="/exchange/plus/#/horse-racing/market/1.119408124">Lingfield</a>
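
One possible way to express the "GB and today" condition, as a minimal sketch rather than a definitive answer: XPath 1.0 has no class selector, so a common idiom is to pad the class attribute with spaces and test for a whole token. The helper name `hasClass`, the second (non-GB) link and its `/x` href are hypothetical and exist only to show the filter rejecting something.

```php
<?php
// Sample markup: the first link is taken from the HTML above,
// the second one is hypothetical test data that should be filtered out.
$html = '<div><span class="track-name">'
      . '<a class="tag-i13n i13n-ltxt-meeting i13n-sec-GB i13n-tab-today" '
      . 'href="/exchange/plus/#/horse-racing/market/1.119408124">Lingfield</a>'
      . '</span><span class="track-name">'
      . '<a class="tag-i13n i13n-ltxt-meeting i13n-sec-IE i13n-tab-today" href="/x">Dundalk</a>'
      . '</span></div>';

// Build an XPath 1.0 test that matches a whole class token,
// so "i13n-sec-GB" does not match a longer class that merely contains it.
function hasClass(string $token): string {
    return 'contains(concat(" ", normalize-space(@class), " "), " ' . $token . ' ")';
}

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Meeting links that carry both the GB token and the today token
$query = '//span[' . hasClass('track-name') . ']/a['
       . hasClass('i13n-sec-GB') . ' and ' . hasClass('i13n-tab-today') . ']';

$meetings = [];
foreach ($xpath->query($query) as $a) {
    $meetings[] = trim($a->textContent);
}
print_r($meetings);
```

A plain `contains(@class, "GB")` would also work on this markup, but it matches any class that happens to include those letters, which is why the token form is used here.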



So the outcome would be Lingfield. If this is true, I would then like to pull the time of the race and the market ID from the following:
 

    <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
            href="/exchange/plus/#/horse-racing/market/1.119408124"
            title="5f Nursery | 7 Runners">14:10</a>
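
For the market ID, a minimal sketch assuming it is always the final `digits.digits` segment of the `href`, as in the link above. The function name `marketIdFromHref` is hypothetical.

```php
<?php
// Return the market id (e.g. "1.119408124") from a race link href,
// or null when the href does not end in /market/<digits>.<digits>.
function marketIdFromHref(string $href): ?string {
    if (preg_match('~/market/(\d+\.\d+)$~', $href, $m)) {
        return $m[1];
    }
    return null;
}

echo marketIdFromHref('/exchange/plus/#/horse-racing/market/1.119408124'), "\n";
```

The enclosing `single-race` div carries the same digits in its `id` attribute (for example `m-1_119408124`), so that attribute could serve as a fallback if the `href` format changes.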



So the outcome would be:

   
    Lingfield 14:10 1.119408124
    Lingfield 14:40 1.119408128
    .............................
    Wolverhampton 14:20 1.119408153
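
A hedged sketch of how the whole extraction might be put together, assuming the page keeps the structure shown in the question (`single-track` > `track-name` link > `race-link` anchors). The names `hasClass` and `scrapeRaces` are hypothetical, and the sample string is a trimmed copy of the posted HTML rather than a live fetch.

```php
<?php
// Build an XPath 1.0 test that matches a whole class token
function hasClass(string $token): string {
    return 'contains(concat(" ", normalize-space(@class), " "), " ' . $token . ' ")';
}

// Return "meeting time marketId" strings for meetings tagged GB and today
function scrapeRaces(string $html): array {
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    $filter = hasClass('i13n-sec-GB') . ' and ' . hasClass('i13n-tab-today');
    $rows = [];
    foreach ($xpath->query('//div[' . hasClass('single-track') . ']') as $track) {
        // Meeting link relative to this track; skip tracks that fail the filter
        $meeting = $xpath->query(
            './/span[' . hasClass('track-name') . ']/a[' . $filter . ']',
            $track
        )->item(0);
        if ($meeting === null) {
            continue;
        }
        $name = trim($meeting->textContent);
        foreach ($xpath->query('.//a[' . hasClass('race-link') . ']', $track) as $race) {
            // The market id is the last path segment of the href
            $href = $race->getAttribute('href');
            $id   = substr($href, strrpos($href, '/') + 1);
            $rows[] = $name . ' ' . trim($race->textContent) . ' ' . $id;
        }
    }
    return $rows;
}

// Trimmed sample based on the HTML in the question
$sample = <<<HTML
<div class="tracks-list"><div class="single-track">
  <span class="track-name"><a class="tag-i13n i13n-ltxt-meeting i13n-sec-GB i13n-tab-today" href="/exchange/plus/#/horse-racing/market/1.119408124">Lingfield</a></span>
  <div class="races-list">
    <div class="single-race"><a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today" href="/exchange/plus/#/horse-racing/market/1.119408124" title="5f Nursery | 7 Runners">14:10</a></div>
    <div class="single-race"><a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today" href="/exchange/plus/#/horse-racing/market/1.119408128" title="6f Mdn Stks | 11 Runners">14:40</a></div>
  </div>
</div></div>
HTML;

echo implode("\n", scrapeRaces($sample)), "\n";
```

Each output line pairs the meeting name with one race time and the market ID taken from that race's own `href`. Querying relative to each `single-track` node (the leading `.` in the inner expressions) is what keeps Lingfield's races from being attributed to Wolverhampton.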


Ray Paseur

Please show us the true URL you are trying to scrape.  We would need to be able to use PHP file_get_contents() (the same way you do) to read the HTML in order to get the test data.
Member_2_5230414

ASKER

ASKER CERTIFIED SOLUTION
hielo
