mmalik15
asked on
How to extract certain rows of a table from HTML using regex
I need to extract certain records from an HTML page. The records are rows of a table, and I need only the rows that have a "Review" button in them. The link for the web page is http://campbellcollaboration.org/lib/index.php?noframe&go=browse_small&sort=title&view=all
Alternatively, I have pasted the HTML of three rows below, from which I need to filter the records.
Get only those rows which have <span style="padding: 3px;">Review</span>
thanks
<tr>
<td style="padding-left: 5px; width: 26px;">
7.
</td>
<td style="width: 120px; padding-left: 0px; padding-right: 15px;">
<a class="badge_new badge_new_title_proposal ui-corner-all" rel="nofollow" href="download/61/">
<span style="padding: 3px;">Title proposal</span> </a><a class="badge_new badge_new_protocol ui-corner-all"
rel="nofollow" href="download/62/"><span style="padding: 3px;">Protocol</span>
</a><a class="badge_new badge_new_review ui-corner-all" rel="nofollow" href="download/63/">
<span style="padding: 3px;">Review</span> </a>
</td>
<td>
<table cellspacing="0" cellpadding="0" style="width: 100%;">
<tbody>
<tr>
<td style="width: 70px; color: rgb(0, 102, 153);">
Title:
</td>
<td colspan="3" style="font-weight: bold; color: rgb(0, 0, 0);">
Approaches to parent involvement for improving the academic performance of elementary
school age children
</td>
</tr>
<tr>
<td style="color: rgb(0, 102, 153);">
Authors:
</td>
<td colspan="3">
Chad Nye, Jamie Schwartz, Herbert Turner
</td>
</tr>
<tr>
<td style="color: rgb(0, 102, 153);">
Published:
</td>
<td colspan="3">
21.06.2006
</td>
</tr>
<tr>
<td style="color: rgb(0, 102, 153);">
Group:
</td>
<td style="text-align: left;">
Education
</td>
</tr>
</tbody>
</table>
</td>
<td style="width: 50px; text-align: right; padding-right: 20px;">
<input type="checkbox" value="1" name="export_ris_checkbox[13]" class="checkbox browse_checkbox">
</td>
</tr>
<tr>
<td style="padding-left: 5px; width: 26px;">
10.
</td>
<td style="width: 120px; padding-left: 0px; padding-right: 15px;">
<a class="badge_new badge_new_title_proposal ui-corner-all" rel="nofollow" href="download/351/">
<span style="padding: 3px;">Title proposal</span> </a><a class="badge_new badge_new_protocol ui-corner-all"
rel="nofollow" href="download/352/"><span style="padding: 3px;">Protocol</span>
</a>
</td>
<td>
<table cellspacing="0" cellpadding="0" style="width: 100%;">
<tbody>
<tr>
<td style="width: 70px; color: rgb(0, 102, 153);">
Title:
</td>
<td colspan="3" style="font-weight: bold; color: rgb(0, 0, 0);">
Behavioral stuttering interventions in school age children 4-18 years of age
</td>
</tr>
<tr>
<td style="color: rgb(0, 102, 153);">
Authors:
</td>
<td colspan="3">
Carl Herder, Courtney Howard, Chad Nye, Jamie Schwartz, Herbert Turner, Martine
Vanryckeghem
</td>
</tr>
<tr>
<td style="color: rgb(0, 102, 153);">
Published:
</td>
<td colspan="3">
12.04.2007
</td>
</tr>
<tr>
<td style="color: rgb(0, 102, 153);">
Group:
</td>
<td style="text-align: left;">
Education
</td>
</tr>
</tbody>
</table>
</td>
<td style="width: 50px; text-align: right; padding-right: 20px;">
<input type="checkbox" value="1" name="export_ris_checkbox[71]" class="checkbox browse_checkbox">
</td>
</tr>
<tr>
<td style="padding-left: 5px; width: 26px;">
21.
</td>
<td style="width: 120px; padding-left: 0px; padding-right: 15px;">
<a class="badge_new badge_new_protocol ui-corner-all" rel="nofollow" href="download/92/">
<span style="padding: 3px;">Protocol</span> </a><a class="badge_new badge_new_review ui-corner-all"
rel="nofollow" href="download/93/"><span style="padding: 3px;">Review</span>
</a><a class="badge_new badge_new_abstract ui-corner-all" rel="nofollow" href="download/94/">
<span style="padding: 3px;">User abstract</span> </a>
</td>
<td>
<table cellspacing="0" cellpadding="0" style="width: 100%;">
<tbody>
<tr>
<td style="width: 70px; color: rgb(0, 102, 153);">
Title:
</td>
<td colspan="3" style="font-weight: bold; color: rgb(0, 0, 0);">
Cognitive-behavioural interventions for children who have been sexually abused
</td>
</tr>
<tr>
<td style="color: rgb(0, 102, 153);">
Authors:
</td>
<td colspan="3">
Geraldine Macdonald, Julian Higgins, Paul Ramchandani
</td>
</tr>
<tr>
<td style="color: rgb(0, 102, 153);">
Published:
</td>
<td colspan="3">
06.11.2006
</td>
</tr>
<tr>
<td style="color: rgb(0, 102, 153);">
Group:
</td>
<td style="text-align: left;">
Social Welfare
</td>
</tr>
</tbody>
</table>
</td>
<td style="width: 50px; text-align: right; padding-right: 20px;">
<input type="checkbox" value="1" name="export_ris_checkbox[19]" class="checkbox browse_checkbox">
</td>
</tr>
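A minimal sketch of one regex-based way to do the filtering asked for above, written in Python (the question does not name a language, so that choice is an assumption). Because each record row contains a nested table with its own `<tr>` elements, the sketch does not try to match `<tr>...</tr>` pairs. Instead it splits the page at the start of each record row, which it recognises by the inline style of the numbering cell in the pasted sample, and keeps the chunks that contain the Review span. The names `review_rows`, `row_title` and `SAMPLE` are hypothetical, and the pattern would need adjusting if the real page styles that cell differently. Splitting on a zero-width match needs Python 3.7 or later.

```python
import re

# Start of a record row, keyed on the numbering cell's inline style as it
# appears in the pasted sample (an assumption about the rest of the page).
ROW_START = re.compile(r'(?=<tr>\s*<td style="padding-left: 5px;)')
# The badge text the question asks to filter on.
REVIEW_BADGE = re.compile(r'<span[^>]*>\s*Review\s*</span>')
# The cell that follows the "Title:" label cell inside the nested table.
TITLE_CELL = re.compile(r'Title:\s*</td>\s*<td[^>]*>(.*?)</td>', re.S)


def review_rows(html):
    """Return the HTML chunk of every record row that has a Review badge."""
    chunks = ROW_START.split(html)  # zero-width split, Python 3.7+
    return [c for c in chunks
            if c.startswith("<tr>") and REVIEW_BADGE.search(c)]


def row_title(chunk):
    """Return the whitespace-normalised title of one row chunk, or None."""
    m = TITLE_CELL.search(chunk)
    return " ".join(m.group(1).split()) if m else None


# Shortened, hypothetical stand-in for the rows pasted in the question.
SAMPLE = """
<tr>
<td style="padding-left: 5px; width: 26px;">7.</td>
<td><a href="download/63/"><span style="padding: 3px;">Review</span> </a></td>
<td><table><tbody><tr>
<td>Title:</td>
<td colspan="3">Approaches to parent
involvement</td>
</tr></tbody></table></td>
</tr>
<tr>
<td style="padding-left: 5px; width: 26px;">10.</td>
<td><a href="download/352/"><span style="padding: 3px;">Protocol</span> </a></td>
<td><table><tbody><tr>
<td>Title:</td>
<td colspan="3">Behavioral stuttering interventions</td>
</tr></tbody></table></td>
</tr>
"""

for chunk in review_rows(SAMPLE):
    print(row_title(chunk))
```

Matching on the exact text `Review` inside the span keeps rows that only have a "User abstract" or "Protocol" badge out of the result. The last chunk also carries whatever markup follows the final row, so trim it if the row HTML itself is needed.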
That only works if the HTML is strictly structured, the way XML is. HTML can still contain opening tags without matching closing tags, which violates XML's well-formedness requirement. AFAIK, HTML is still in transition toward a rigid structure.
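To illustrate the point about well-formedness, here is a rough sketch that uses Python's standard-library `html.parser`, which reports tags as events and does not require every opening tag to be closed. It is not taken from the hidden answers. The class `ReviewRowParser`, the helper `review_titles` and the `row_depth` parameter are hypothetical names, and the sketch assumes the Review link carries the `badge_new_review` class as in the pasted rows. A record row is treated as finished when the next `<tr>` at the same table depth begins, so a missing `</tr>` does not break it. A missing `</table>` still would.

```python
from html.parser import HTMLParser


class ReviewRowParser(HTMLParser):
    """Collect the titles of record rows that contain a Review badge link."""

    def __init__(self, row_depth=0):
        super().__init__()
        # <table> nesting level at which record rows live: 0 for the pasted
        # fragment; the full page would need a larger value (assumption).
        self.row_depth = row_depth
        self.table_depth = 0
        self.current = None        # record row currently being read
        self.cell_text = []        # text pieces of the most recent <td>
        self.expect_title = False  # True right after the "Title:" label cell
        self.titles = []

    def _finish_row(self):
        if self.current and self.current["review"]:
            self.titles.append(self.current["title"])
        self.current = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1
        elif tag == "tr" and self.table_depth == self.row_depth:
            self._finish_row()  # also ends a row whose </tr> is missing
            self.current = {"review": False, "title": None}
        elif tag == "a" and self.current is not None:
            classes = (dict(attrs).get("class") or "").split()
            if "badge_new_review" in classes:
                self.current["review"] = True
        elif tag == "td":
            self.cell_text = []

    def handle_endtag(self, tag):
        if tag == "table":
            self.table_depth -= 1
        elif tag == "td" and self.current is not None:
            text = " ".join("".join(self.cell_text).split())
            if self.expect_title:
                self.current["title"] = text
                self.expect_title = False
            elif text == "Title:":
                self.expect_title = True

    def handle_data(self, data):
        self.cell_text.append(data)

    def close(self):
        super().close()
        self._finish_row()  # flush the last row


def review_titles(html, row_depth=0):
    """Return the titles of all record rows that have a Review badge."""
    parser = ReviewRowParser(row_depth)
    parser.feed(html)
    parser.close()
    return parser.titles
```

Calling `review_titles(page_html)` on the pasted fragment would return the titles of the rows that have a Review link. For the full page, `row_depth` has to match how deeply the listing table is nested there, which the pasted fragment does not show.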