Member_2_5230414
asked on
Scraping HTML using DOMDocument + PHP
I would like to scrape the following HTML
<div class="venue-event-list " rel="GB">
<div class="tracks-list">
<div class="single-track">
<a href="//livevideo.betfair.com/Default.do?mi=119408124" target="_blank" class="live-video-link"><div class="bf-icon-live-video tag-i13n i13n-ltxt-LVid i13n-sec-GB i13n-tab-today" title="Watch now on Betfair Live Video"></div></a>
<div class="info-container">
<span class="track-name">
<a class="tag-i13n i13n-ltxt-meeting i13n-sec-GB i13n-tab-today" href="/exchange/plus/#/horse-racing/market/1.119408124">Lingfield</a>
</span>
<div class="races-list">
<div class="single-race" id="m-1_119408124">
<span class="race-time link-text">
<a class="race-link tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
href="/exchange/plus/#/horse-racing/market/1.119408124"
title="5f Nursery | 7 Runners">14:10</a>
</span>
<span class="separator">|</span>
</div>
<div class="single-race" id="m-1_119408128">
<span class="race-time link-text">
<a class="race-link tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
href="/exchange/plus/#/horse-racing/market/1.119408128"
title="6f Mdn Stks | 11 Runners">14:40</a>
</span>
<span class="separator">|</span>
</div>
<div class="single-race" id="m-1_119408132">
<span class="race-time link-text">
<a class="race-link tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
href="/exchange/plus/#/horse-racing/market/1.119408132"
title="7f Mdn Stks | 6 Runners">15:10</a>
</span>
<span class="separator">|</span>
</div>
<div class="single-race" id="m-1_119408136">
<span class="race-time link-text">
<a class="race-link tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
href="/exchange/plus/#/horse-racing/market/1.119408136"
title="2m Hcap | 12 Runners">15:40</a>
</span>
<span class="separator">|</span>
</div>
<div class="single-race" id="m-1_119408140">
<span class="race-time link-text">
<a class="race-link tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
href="/exchange/plus/#/horse-racing/market/1.119408140"
title="1m2f Sell Stks | 6 Runners">16:10</a>
</span>
<span class="separator">|</span>
</div>
<div class="single-race" id="m-1_119408144">
<span class="race-time link-text">
<a class="race-link tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
href="/exchange/plus/#/horse-racing/market/1.119408144"
title="1m3f Hcap | 8 Runners">16:40</a>
</span>
<span class="separator">|</span>
</div>
<div class="single-race" id="m-1_119408148">
<span class="race-time link-text">
<a class="race-link tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
href="/exchange/plus/#/horse-racing/market/1.119408148"
title="1m1f Hcap | 14 Runners">17:10</a>
</span>
</div>
</div>
</div>
</div>
</div>
<div class="tracks-list">
<div class="single-track">
<a href="//livevideo.betfair.com/Default.do?mi=119408153" target="_blank" class="live-video-link"><div class="bf-icon-live-video tag-i13n i13n-ltxt-LVid i13n-sec-GB i13n-tab-today" title="Watch now on Betfair Live Video"></div></a>
<div class="info-container">
<span class="track-name">
<a class="tag-i13n i13n-ltxt-meeting i13n-sec-GB i13n-tab-today" href="/exchange/plus/#/horse-racing/market/1.119408153">Wolverhampton</a>
</span>
<div class="races-list">
<div class="single-race" id="m-1_119408153">
<span class="race-time link-text">
<a class="race-link tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
href="/exchange/plus/#/horse-racing/market/1.119408153"
title="5f Mdn Stks | 7 Runners">14:20</a>
</span>
<span class="separator">|</span>
</div>
<div class="single-race" id="m-1_119408157">
<span class="race-time link-text">
<a class="race-link tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
href="/exchange/plus/#/horse-racing/market/1.119408157"
title="1m6f Hcap | 7 Runners">14:50</a>
</span>
<span class="separator">|</span>
</div>
<div class="single-race" id="m-1_119408161">
<span class="race-time link-text">
<a class="race-link tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
href="/exchange/plus/#/horse-racing/market/1.119408161"
title="1m4f Sell Stks | 5 Runners">15:20</a>
</span>
<span class="separator">|</span>
</div>
<div class="single-race" id="m-1_119408165">
<span class="race-time link-text">
<a class="race-link tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
href="/exchange/plus/#/horse-racing/market/1.119408165"
title="1m1f Hcap | 13 Runners">15:50</a>
</span>
<span class="separator">|</span>
</div>
<div class="single-race" id="m-1_119408169">
<span class="race-time link-text">
<a class="race-link tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
href="/exchange/plus/#/horse-racing/market/1.119408169"
title="1m1f Hcap | 11 Runners">16:20</a>
</span>
<span class="separator">|</span>
</div>
<div class="single-race" id="m-1_119408173">
<span class="race-time link-text">
<a class="race-link tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
href="/exchange/plus/#/horse-racing/market/1.119408173"
title="1m Mdn Stks | 11 Runners">16:50</a>
</span>
<span class="separator">|</span>
</div>
<div class="single-race" id="m-1_119408177">
<span class="race-time link-text">
<a class="race-link tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
href="/exchange/plus/#/horse-racing/market/1.119408177"
title="1m Hcap | 13 Runners">17:20</a>
</span>
</div>
</div>
</div>
</div>
</div>
I have used the following code to pull the race name and the time of the race:
$url = "";
$html = file_get_contents($url);
$dom = new DOMDocument();
@$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$xpath = new DOMXPath($dom);
//pull the individual cards for the day
//li class="rac-cardsclass="ix ixc"
$getdropdown = '//div[contains(@class, "tracks-list")]';
$getdropdown2 = $xpath->query($getdropdown);
//loop through each individual card
foreach ($getdropdown2 as $dropresults) {
echo $dropresults->textContent. "<br />";
}
What I would like to do is pull the meeting name only if the link (shown below) contains "GB" and "today" (both appear in the class attribute):
> <a class="tag-i13n i13n-ltxt-meeting i13n-sec-GB i13n-tab-today"
> href="/exchange/plus/#/horse-racing/market/1.119408124">Lingfield</a>
So the outcome would be Lingfield. If this is true, I would then like to pull the time of the race and the market ID from the following:
<a class="race-link tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
href="/exchange/plus/#/horse-racing/market/1.119408124"
title="5f Nursery | 7 Runners">14:10</a>
So the outcome would be:
Lingfield 14:10 1.119408124
Lingfield 14:40 1.119408128
.............................
Wolverhampton 14:20 1.119408153
Please show us the true URL you are trying to scrape. We would need to be able to use PHP file_get_contents() (the same way you do) to read the HTML in order to get the test data.
ASKER
Sorry Ray, the URL is: https://www.betfair.com/exchange/horse-racing
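For reference, here is a minimal sketch of one way the extraction described in the question could be done with DOMDocument and DOMXPath. It is not the certified solution from this thread. The function name extractRaces() is hypothetical, and the sample markup is a trimmed copy of the HTML posted above. It assumes the HTML you fetch has the same structure as that snippet. If the live page builds this list with JavaScript, file_get_contents() will not see it.

```php
<?php
// Sketch only. extractRaces() is a hypothetical helper, not part of any library.
function extractRaces(string $html, string $country = 'GB', string $tab = 'today'): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects an empty string
    }

    $dom = new DOMDocument();
    // Collect parser warnings quietly instead of silencing everything with @.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // Build an XPath test that matches a whole class token, so that
    // "i13n-sec-GB" does not also match a longer class that merely contains it.
    $hasClass = function (string $token): string {
        return 'contains(concat(" ", normalize-space(@class), " "), " ' . $token . ' ")';
    };
    $filter = $hasClass('i13n-sec-' . $country) . ' and ' . $hasClass('i13n-tab-' . $tab);

    $rows = [];
    foreach ($xpath->query('//div[' . $hasClass('single-track') . ']') as $track) {
        // Meeting link: accepted only when it carries both the country and tab classes.
        $meeting = $xpath->query(
            './/span[' . $hasClass('track-name') . ']/a[' . $filter . ']',
            $track
        )->item(0);
        if ($meeting === null) {
            continue;
        }
        $name = trim($meeting->textContent);

        $races = $xpath->query('.//a[' . $hasClass('race-link') . ' and ' . $filter . ']', $track);
        foreach ($races as $race) {
            // The market id is the last path segment of the href, e.g. 1.119408124.
            if (preg_match('~/market/([\d.]+)$~', $race->getAttribute('href'), $m)) {
                $rows[] = [$name, trim($race->textContent), $m[1]];
            }
        }
    }
    return $rows;
}

// Trimmed copy of the markup from the question (two tracks, three races).
$sample = <<<'HTML'
<div class="venue-event-list " rel="GB">
<div class="tracks-list"><div class="single-track"><div class="info-container">
<span class="track-name">
<a class="tag-i13n i13n-ltxt-meeting i13n-sec-GB i13n-tab-today" href="/exchange/plus/#/horse-racing/market/1.119408124">Lingfield</a>
</span>
<div class="races-list">
<div class="single-race" id="m-1_119408124"><span class="race-time link-text">
<a class="race-link tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today" href="/exchange/plus/#/horse-racing/market/1.119408124" title="5f Nursery | 7 Runners">14:10</a>
</span></div>
<div class="single-race" id="m-1_119408128"><span class="race-time link-text">
<a class="race-link tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today" href="/exchange/plus/#/horse-racing/market/1.119408128" title="6f Mdn Stks | 11 Runners">14:40</a>
</span></div>
</div></div></div></div>
<div class="tracks-list"><div class="single-track"><div class="info-container">
<span class="track-name">
<a class="tag-i13n i13n-ltxt-meeting i13n-sec-GB i13n-tab-today" href="/exchange/plus/#/horse-racing/market/1.119408153">Wolverhampton</a>
</span>
<div class="races-list">
<div class="single-race" id="m-1_119408153"><span class="race-time link-text">
<a class="race-link tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today" href="/exchange/plus/#/horse-racing/market/1.119408153" title="5f Mdn Stks | 7 Runners">14:20</a>
</span></div>
</div></div></div></div>
</div>
HTML;

foreach (extractRaces($sample) as [$name, $time, $marketId]) {
    echo "$name $time $marketId\n";
}
```

To run it against the real page, pass the string returned by file_get_contents($url) to extractRaces() in place of $sample. The query starts from each single-track div rather than from tracks-list, so every race is paired with its own meeting name. Printing textContent of the outer tracks-list div, as in the code in the question, returns all the text in one block.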