Member_2_5230414 asked:

Scraping HTML using DOMDocument + PHP

I would like to scrape the following HTML:
     <div class="venue-event-list " rel="GB">
                                <div class="tracks-list">
    <div class="single-track">
                <a href="//livevideo.betfair.com/Default.do?mi=119408124" target="_blank" class="live-video-link"><div class="bf-icon-live-video tag-i13n i13n-ltxt-LVid i13n-sec-GB i13n-tab-today" title="Watch now on Betfair Live Video"></div></a>
        <div class="info-container">
            <span class="track-name">
                <a class="tag-i13n i13n-ltxt-meeting i13n-sec-GB i13n-tab-today" href="/exchange/plus/#/horse-racing/market/1.119408124">Lingfield</a>
            </span>
            <div class="races-list">
                    
                    
    <div class="single-race" id="m-1_119408124">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408124"
                title="5f Nursery | 7 Runners">14:10</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408128">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408128"
                title="6f Mdn Stks | 11 Runners">14:40</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408132">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408132"
                title="7f Mdn Stks | 6 Runners">15:10</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408136">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408136"
                title="2m Hcap | 12 Runners">15:40</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408140">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408140"
                title="1m2f Sell Stks | 6 Runners">16:10</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408144">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408144"
                title="1m3f Hcap | 8 Runners">16:40</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408148">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408148"
                title="1m1f Hcap | 14 Runners">17:10</a>
        </span>
    </div>
            </div>
        </div>
    </div>
                        </div>
                                <div class="tracks-list">
    <div class="single-track">
                <a href="//livevideo.betfair.com/Default.do?mi=119408153" target="_blank" class="live-video-link"><div class="bf-icon-live-video tag-i13n i13n-ltxt-LVid i13n-sec-GB i13n-tab-today" title="Watch now on Betfair Live Video"></div></a>
        <div class="info-container">
            <span class="track-name">
                <a class="tag-i13n i13n-ltxt-meeting i13n-sec-GB i13n-tab-today" href="/exchange/plus/#/horse-racing/market/1.119408153">Wolverhampton</a>
            </span>
            <div class="races-list">
                    
                    
    <div class="single-race" id="m-1_119408153">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408153"
                title="5f Mdn Stks | 7 Runners">14:20</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408157">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408157"
                title="1m6f Hcap | 7 Runners">14:50</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408161">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408161"
                title="1m4f Sell Stks | 5 Runners">15:20</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408165">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408165"
                title="1m1f Hcap | 13 Runners">15:50</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408169">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408169"
                title="1m1f Hcap | 11 Runners">16:20</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408173">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408173"
                title="1m Mdn Stks | 11 Runners">16:50</a>
        </span>
            <span class="separator">|</span>
    </div>
                    
                    
    <div class="single-race" id="m-1_119408177">
        <span class="race-time link-text">
            <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
                href="/exchange/plus/#/horse-racing/market/1.119408177"
                title="1m Hcap | 13 Runners">17:20</a>
        </span>
    </div>
            </div>
        </div>
    </div>
                        </div>


I have used the following code to pull the race name and the time of the race:

   
    $url  = "";
    $html = file_get_contents($url);
    // collect markup warnings instead of silencing them with @
    libxml_use_internal_errors(true);
    $dom  = new DOMDocument();
    // only has a chance of taking effect if set before the document is loaded
    $dom->preserveWhiteSpace = false;
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);
    // select every "tracks-list" block (one per meeting)
    $getdropdown  = '//div[contains(@class, "tracks-list")]';
    $getdropdown2 = $xpath->query($getdropdown);
    // loop through each meeting block and print its text content
    foreach ($getdropdown2 as $dropresults) {
        echo $dropresults->textContent . "<br />";
    }



What I would like to do is pull the meeting name only if the link (shown below) contains "GB" and "today" in its class attribute:

    <a class="tag-i13n i13n-ltxt-meeting i13n-sec-GB i13n-tab-today"
       href="/exchange/plus/#/horse-racing/market/1.119408124">Lingfield</a>
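One way to express the "GB and today" condition is an XPath predicate on the link's class attribute. The following is only a minimal sketch, assuming the class tokens are exactly `i13n-sec-GB` and `i13n-tab-today` as in the sample above. The `hasClass()` helper, the `$sample` string and the second ("Elsewhere") link are hypothetical and exist only to show a non-matching case:

```php
<?php
// Hypothetical helper: builds an XPath test that matches a whole class token,
// so "i13n-sec-GB" would not also match a longer token such as "i13n-sec-GBX".
function hasClass(string $token): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $token ')";
}

// Cut-down copy of the meeting link from the question, plus one made-up
// non-GB link for contrast.
$sample = '<div>'
    . '<span class="track-name"><a class="tag-i13n i13n-ltxt-meeting i13n-sec-GB i13n-tab-today"'
    . ' href="/exchange/plus/#/horse-racing/market/1.119408124">Lingfield</a></span>'
    . '<span class="track-name"><a class="tag-i13n i13n-ltxt-meeting i13n-sec-IE i13n-tab-today"'
    . ' href="#">Elsewhere</a></span>'
    . '</div>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($sample);
$xpath = new DOMXPath($dom);

// Meeting links that carry both the GB and the today class tokens.
$query = '//span[' . hasClass('track-name') . ']/a['
       . hasClass('i13n-sec-GB') . ' and ' . hasClass('i13n-tab-today') . ']';

$meetings = [];
foreach ($xpath->query($query) as $link) {
    $meetings[] = trim($link->textContent);
}
echo implode("\n", $meetings), "\n";
```

A plain `contains(@class, "i13n-sec-GB")` would probably be enough for this page; the `concat`/`normalize-space` form just avoids accidental substring matches.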



So the outcome would be Lingfield. If this condition is true, I would then like to pull the time of the race and the market ID from the following:
 

    <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
            href="/exchange/plus/#/horse-racing/market/1.119408124"
            title="5f Nursery | 7 Runners">14:10</a>



So the outcome would be:

    Lingfield 14:10 1.119408124
    Lingfield 14:40 1.119408128
    .............................
    Wolverhampton 14:20 1.119408153
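The output described above could be produced by walking each `single-track` block, checking its meeting link, and then reading every race link inside it. This is a rough sketch rather than a tested solution against the live page: `extractRaces()` is a hypothetical function name, `$sample` is a shortened copy of the markup from the question, and the class test is the same substring-style `contains()` already used in the question's code.

```php
<?php
// Hypothetical helper: returns lines of "meeting time marketId" for links
// whose class contains both the GB and the today markers.
function extractRaces(string $html): array
{
    libxml_use_internal_errors(true);   // tolerate imperfect real-world markup
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    $gbToday = 'contains(@class, "i13n-sec-GB") and contains(@class, "i13n-tab-today")';

    $rows = [];
    foreach ($xpath->query('//div[contains(@class, "single-track")]') as $track) {
        // Meeting name: skip the whole track if its link lacks either marker.
        $meeting = $xpath->query(
            './/span[contains(@class, "track-name")]/a[' . $gbToday . ']',
            $track
        );
        if ($meeting->length === 0) {
            continue;
        }
        $name = trim($meeting->item(0)->textContent);

        $races = $xpath->query('.//a[contains(@class, "race-link") and ' . $gbToday . ']', $track);
        foreach ($races as $race) {
            // The market id is the part of the href that follows "/market/".
            if (preg_match('~/market/([0-9.]+)~', $race->getAttribute('href'), $m)) {
                $rows[] = $name . ' ' . trim($race->textContent) . ' ' . $m[1];
            }
        }
    }
    return $rows;
}

$sample = <<<HTML
<div class="tracks-list">
 <div class="single-track">
  <span class="track-name"><a class="tag-i13n i13n-ltxt-meeting i13n-sec-GB i13n-tab-today" href="/exchange/plus/#/horse-racing/market/1.119408124">Lingfield</a></span>
  <div class="single-race"><a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today" href="/exchange/plus/#/horse-racing/market/1.119408124" title="5f Nursery | 7 Runners">14:10</a></div>
  <div class="single-race"><a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today" href="/exchange/plus/#/horse-racing/market/1.119408128" title="6f Mdn Stks | 11 Runners">14:40</a></div>
 </div>
</div>
HTML;

$rows = extractRaces($sample);
echo implode("\n", $rows), "\n";
```

For the two races in `$sample` this prints `Lingfield 14:10 1.119408124` and `Lingfield 14:40 1.119408128`. Against the real page the same function would be called with the string returned by `file_get_contents($url)`, assuming the server-delivered HTML actually contains this markup.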


PHP, JavaScript, HTML


8/22/2022 - Mon
Ray Paseur

Please show us the true URL you are trying to scrape.  We would need to be able to use PHP file_get_contents() (the same way you do) to read the HTML in order to get the test data.
Member_2_5230414

ASKER
ASKER CERTIFIED SOLUTION
hielo
