mmalik15

How to extract certain rows of a table from HTML using regex

I need to extract certain records from an HTML page. The records are rows of a table, and I need only the rows that contain a "Review" button. The page is at http://campbellcollaboration.org/lib/index.php?noframe&go=browse_small&sort=title&view=all
Alternatively, I have pasted the HTML of three rows below, from which I need to filter the records.

I want only the rows that contain <span style="padding: 3px;">Review</span>

Thanks.
<tr>
    <td style="padding-left: 5px; width: 26px;">
        7.
    </td>
    <td style="width: 120px; padding-left: 0px; padding-right: 15px;">
        <a class="badge_new badge_new_title_proposal ui-corner-all" rel="nofollow" href="download/61/">
            <span style="padding: 3px;">Title proposal</span> </a><a class="badge_new badge_new_protocol ui-corner-all"
                rel="nofollow" href="download/62/"><span style="padding: 3px;">Protocol</span>
        </a><a class="badge_new badge_new_review ui-corner-all" rel="nofollow" href="download/63/">
            <span style="padding: 3px;">Review</span> </a>
    </td>
    <td>
        <table cellspacing="0" cellpadding="0" style="width: 100%;">
            <tbody>
                <tr>
                    <td style="width: 70px; color: rgb(0, 102, 153);">
                        Title:
                    </td>
                    <td colspan="3" style="font-weight: bold; color: rgb(0, 0, 0);">
                        Approaches to parent involvement for improving the academic performance of elementary
                        school age children
                    </td>
                </tr>
                <tr>
                    <td style="color: rgb(0, 102, 153);">
                        Authors:
                    </td>
                    <td colspan="3">
                        Chad Nye, Jamie Schwartz, Herbert Turner
                    </td>
                </tr>
                <tr>
                    <td style="color: rgb(0, 102, 153);">
                        Published:
                    </td>
                    <td colspan="3">
                        21.06.2006
                    </td>
                </tr>
                <tr>
                    <td style="color: rgb(0, 102, 153);">
                        Group:
                    </td>
                    <td style="text-align: left;">
                        Education
                    </td>
                </tr>
            </tbody>
        </table>
    </td>
    <td style="width: 50px; text-align: right; padding-right: 20px;">
        <input type="checkbox" value="1" name="export_ris_checkbox[13]" class="checkbox browse_checkbox">
    </td>
</tr>
<tr>
    <td style="padding-left: 5px; width: 26px;">
        10.
    </td>
    <td style="width: 120px; padding-left: 0px; padding-right: 15px;">
        <a class="badge_new badge_new_title_proposal ui-corner-all" rel="nofollow" href="download/351/">
            <span style="padding: 3px;">Title proposal</span> </a><a class="badge_new badge_new_protocol ui-corner-all"
                rel="nofollow" href="download/352/"><span style="padding: 3px;">Protocol</span>
            </a>
    </td>
    <td>
        <table cellspacing="0" cellpadding="0" style="width: 100%;">
            <tbody>
                <tr>
                    <td style="width: 70px; color: rgb(0, 102, 153);">
                        Title:
                    </td>
                    <td colspan="3" style="font-weight: bold; color: rgb(0, 0, 0);">
                        Behavioral stuttering interventions in school age children 4-18 years of age
                    </td>
                </tr>
                <tr>
                    <td style="color: rgb(0, 102, 153);">
                        Authors:
                    </td>
                    <td colspan="3">
                        Carl Herder, Courtney Howard, Chad Nye, Jamie Schwartz, Herbert Turner, Martine
                        Vanryckeghem
                    </td>
                </tr>
                <tr>
                    <td style="color: rgb(0, 102, 153);">
                        Published:
                    </td>
                    <td colspan="3">
                        12.04.2007
                    </td>
                </tr>
                <tr>
                    <td style="color: rgb(0, 102, 153);">
                        Group:
                    </td>
                    <td style="text-align: left;">
                        Education
                    </td>
                </tr>
            </tbody>
        </table>
    </td>
    <td style="width: 50px; text-align: right; padding-right: 20px;">
        <input type="checkbox" value="1" name="export_ris_checkbox[71]" class="checkbox browse_checkbox">
    </td>
</tr>
<tr>
    <td style="padding-left: 5px; width: 26px;">
        21.
    </td>
    <td style="width: 120px; padding-left: 0px; padding-right: 15px;">
        <a class="badge_new badge_new_protocol ui-corner-all" rel="nofollow" href="download/92/">
            <span style="padding: 3px;">Protocol</span> </a><a class="badge_new badge_new_review ui-corner-all"
                rel="nofollow" href="download/93/"><span style="padding: 3px;">Review</span>
        </a><a class="badge_new badge_new_abstract ui-corner-all" rel="nofollow" href="download/94/">
            <span style="padding: 3px;">User abstract</span> </a>
    </td>
    <td>
        <table cellspacing="0" cellpadding="0" style="width: 100%;">
            <tbody>
                <tr>
                    <td style="width: 70px; color: rgb(0, 102, 153);">
                        Title:
                    </td>
                    <td colspan="3" style="font-weight: bold; color: rgb(0, 0, 0);">
                        Cognitive-behavioural interventions for children who have been sexually abused
                    </td>
                </tr>
                <tr>
                    <td style="color: rgb(0, 102, 153);">
                        Authors:
                    </td>
                    <td colspan="3">
                        Geraldine Macdonald, Julian Higgins, Paul Ramchandani
                    </td>
                </tr>
                <tr>
                    <td style="color: rgb(0, 102, 153);">
                        Published:
                    </td>
                    <td colspan="3">
                        06.11.2006
                    </td>
                </tr>
                <tr>
                    <td style="color: rgb(0, 102, 153);">
                        Group:
                    </td>
                    <td style="text-align: left;">
                        Social Welfare
                    </td>
                </tr>
            </tbody>
        </table>
    </td>
    <td style="width: 50px; text-align: right; padding-right: 20px;">
        <input type="checkbox" value="1" name="export_ris_checkbox[19]" class="checkbox browse_checkbox">
    </td>
</tr>

kaufmed
"Recall the HTML is really just XML,"

That holds only if the HTML is strictly structured, as XML is. HTML can still contain opening tags without matching closing tags (the <input> elements in the sample rows above are never closed, for example), which violates XML's well-formedness requirement. AFAIK, HTML is still in transition toward a rigid structure.
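
To illustrate the point, here is a minimal Python sketch rather than a definitive solution. It is not the regex approach the question asks for; it swaps in Python's standard-library `html.parser`, which tolerates unclosed tags. `SAMPLE` is an abbreviated, hypothetical stand-in for the pasted rows (shortened titles, a wrapping `<table>` added), and `ReviewRowFinder`, `is_well_formed_xml` and `review_rows` are names invented for this example. The sketch first checks whether a strict XML parser accepts the markup, then collects only the outer rows that contain a "Review" badge.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Abbreviated stand-in for the rows in the question (assumption: titles are
# shortened and a wrapping <table> is added for the example).
SAMPLE = """
<table>
<tr>
  <td>7.</td>
  <td><a href="download/62/"><span style="padding: 3px;">Protocol</span></a>
      <a href="download/63/"><span style="padding: 3px;">Review</span></a></td>
  <td><table><tbody><tr><td>Title:</td><td>Approaches to parent involvement</td></tr></tbody></table></td>
  <td><input type="checkbox" value="1" name="export_ris_checkbox[13]"></td>
</tr>
<tr>
  <td>10.</td>
  <td><a href="download/352/"><span style="padding: 3px;">Protocol</span></a></td>
  <td><table><tbody><tr><td>Title:</td><td>Behavioral stuttering interventions</td></tr></tbody></table></td>
  <td><input type="checkbox" value="1" name="export_ris_checkbox[71]"></td>
</tr>
</table>
"""


def is_well_formed_xml(markup):
    """Return True only if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class ReviewRowFinder(HTMLParser):
    """Collect the text of outermost <tr> rows that contain a 'Review' span."""

    def __init__(self):
        super().__init__()
        self.tr_depth = 0        # nesting level of <tr>; 1 means an outer row
        self.in_span = False     # True while inside a <span>
        self.has_review = False  # current outer row contains a Review badge
        self.cells = []          # text fragments of the current outer row
        self.rows = []           # matching rows found so far

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            if self.tr_depth == 0:
                # A new outer row starts: reset the per-row state.
                self.cells, self.has_review = [], False
            self.tr_depth += 1
        elif tag == "span":
            self.in_span = True

    def handle_endtag(self, tag):
        if tag == "tr" and self.tr_depth > 0:
            self.tr_depth -= 1
            if self.tr_depth == 0 and self.has_review:
                self.rows.append(self.cells)
        elif tag == "span":
            self.in_span = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self.tr_depth == 0:
            return
        self.cells.append(text)
        if self.in_span and text == "Review":
            self.has_review = True


def review_rows(markup):
    """Parse the markup leniently and return the rows with a Review badge."""
    finder = ReviewRowFinder()
    finder.feed(markup)
    finder.close()
    return finder.rows


if __name__ == "__main__":
    print("Well-formed XML:", is_well_formed_xml(SAMPLE))
    for row in review_rows(SAMPLE):
        print(row)
```

The unclosed `<input>` tags make the strict XML parse fail, while the lenient parser still walks the rows. Tracking the `<tr>` nesting depth is what keeps the inner "Title:" table from being mistaken for a separate record, which is also the part that tends to be awkward to express in a single regex.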