fakir420
asked on
Parse HTML Table
I need example code to parse the following HTML table into a list or array. I'm not sure of the best way, as I'm new to Python, coming from VB. So far I've been able to get the HTML and split out just the table portion, but I now need to parse the table into (preferably) a comma-delimited list of the table's rows and columns. I would really like to see an example that strips out all the unwanted text and keeps just the column headings and column data text. Any comments in the code to show me what is going on would also help a lot.
Here is the output I'd like to get - this is the header row and one data row of the table:
Date|Time|From|Duration (hh:mm:ss)
Jun 02, 2006|02:53 PM|13012684637|00:01:00
<---------------Start Table--------------->
<table width="100%" id="received_calls" border="0" cellpadding="1" cellspacing="2">
<tr class="tableheader2">
<td align=left colspan=5 class="tableheader2">Received Calls</td>
</tr>
<tr>
<TD nowrap class="tableheader9" width="20%" >Date</TD>
<TD nowrap class="tableheader9" width="20%" >Time</TD>
<TD nowrap class="tableheader9" width="35%" >
<div class="iconButton"></div><div class="iconButton"></div>From</TD>
<TD nowrap class="tableheader9" width="25%" >Duration (hh:mm:ss)</TD>
</tr>
<tr class="tablebody1">
<TD nowrap >Jun 02, 2006 </TD>
<TD nowrap >03:56 PM </TD>
<TD nowrap ><!-- forward call setion -->
<!-- from caller setion -->
<div class="iconButton"><a href="javascript:void(0)" onclick="return overlib('<CENTER>13012684637<br><br> <b>CELL PHONE MD</b>', STICKY, CSSCLASS);" onmouseover="return overlib('<CENTER>13012684637<br><br> <b>CELL PHONE MD</b>', CSSCLASS);" onmouseout="return nd();"><img src="/static/common-web/images/activity_icons/icon_caller_id.gif" border="0" ></a></div>
<div class="iconButton"><img src="/static/common-web/images/activity_icons/icon_filler.gif" border="0" ></div>
<div class="phoneNumber">13012684637</div>
</TD>
<TD nowrap >
00:02:00</TD>
</tr>
<tr >
<TD nowrap >Jun 02, 2006 </TD>
<TD nowrap >02:53 PM </TD>
<TD nowrap ><!-- forward call setion -->
<!-- from caller setion -->
<div class="iconButton"><a href="javascript:void(0)" onclick="return overlib('<CENTER>13012684637<br><br> <b>CELL PHONE MD</b>', STICKY, CSSCLASS);" onmouseover="return overlib('<CENTER>13012684637<br><br> <b>CELL PHONE MD</b>', CSSCLASS);" onmouseout="return nd();"><img src="/static/common-web/images/activity_icons/icon_caller_id.gif" border="0" ></a></div>
<div class="iconButton"><img src="/static/common-web/images/activity_icons/icon_filler.gif" border="0" ></div>
<div class="phoneNumber">13012684637</div>
</TD>
<TD nowrap >
00:01:00</TD>
</tr>
<tr class="tablebody1">
<TD nowrap >Jun 02, 2006 </TD>
<TD nowrap >12:29 PM </TD>
<TD nowrap ><!-- forward call setion -->
<!-- from caller setion -->
<div class="iconButton"><a href="javascript:void(0)" onclick="return overlib('<CENTER>13012684637<br><br> <b>CELL PHONE MD</b>', STICKY, CSSCLASS);" onmouseover="return overlib('<CENTER>13012684637<br><br> <b>CELL PHONE MD</b>', CSSCLASS);" onmouseout="return nd();"><img src="/static/common-web/images/activity_icons/icon_caller_id.gif" border="0" ></a></div>
<div class="iconButton"><img src="/static/common-web/images/activity_icons/icon_filler.gif" border="0" ></div>
<div class="phoneNumber">13012684637</div>
</TD>
<TD nowrap >
00:01:00</TD>
</tr>
<tr >
<TD nowrap >Jun 02, 2006 </TD>
<TD nowrap >11:55 AM </TD>
<TD nowrap ><!-- forward call setion -->
<!-- from caller setion -->
<div class="iconButton"><a href="javascript:void(0)" onclick="return overlib('<CENTER>15408774179<br><br> <b>PCS PHONE VA</b>', STICKY, CSSCLASS);" onmouseover="return overlib('<CENTER>15408774179<br><br> <b>PCS PHONE VA</b>', CSSCLASS);" onmouseout="return nd();"><img src="/static/common-web/images/activity_icons/icon_caller_id.gif" border="0" ></a></div>
<div class="iconButton"><img src="/static/common-web/images/activity_icons/icon_filler.gif" border="0" ></div>
<div class="phoneNumber">15408774179</div>
</TD>
<TD nowrap >
00:06:00</TD>
</tr>
<tr class="tablebody1">
<TD nowrap >Jun 02, 2006 </TD>
<TD nowrap >11:41 AM </TD>
<TD nowrap ><!-- forward call setion -->
<!-- from caller setion -->
<div class="iconButton"><a href="javascript:void(0)" onclick="return overlib('<CENTER>13012684637<br><br> <b>CELL PHONE MD</b>', STICKY, CSSCLASS);" onmouseover="return overlib('<CENTER>13012684637<br><br> <b>CELL PHONE MD</b>', CSSCLASS);" onmouseout="return nd();"><img src="/static/common-web/images/activity_icons/icon_caller_id.gif" border="0" ></a></div>
<div class="iconButton"><img src="/static/common-web/images/activity_icons/icon_filler.gif" border="0" ></div>
<div class="phoneNumber">13012684637</div>
</TD>
<TD nowrap >
00:01:00</TD>
</tr>
<tr >
<TD nowrap >Jun 02, 2006 </TD>
<TD nowrap >11:41 AM </TD>
<TD nowrap ><!-- forward call setion -->
<!-- from caller setion -->
<div class="iconButton"><a href="javascript:void(0)" onclick="return overlib('<CENTER>13012684637<br><br> <b>CELL PHONE MD</b>', STICKY, CSSCLASS);" onmouseover="return overlib('<CENTER>13012684637<br><br> <b>CELL PHONE MD</b>', CSSCLASS);" onmouseout="return nd();"><img src="/static/common-web/images/activity_icons/icon_caller_id.gif" border="0" ></a></div>
<div class="iconButton"><img src="/static/common-web/images/activity_icons/icon_filler.gif" border="0" ></div>
<div class="phoneNumber">13012684637</div>
</TD>
<TD nowrap >
00:01:00</TD>
</tr>
<tr class="tablebody1">
<TD nowrap >Jun 02, 2006 </TD>
<TD nowrap >11:40 AM </TD>
<TD nowrap ><!-- forward call setion -->
<!-- from caller setion -->
<div class="iconButton"><a href="javascript:void(0)" onclick="return overlib('<CENTER>13012684637<br><br> <b>CELL PHONE MD</b>', STICKY, CSSCLASS);" onmouseover="return overlib('<CENTER>13012684637<br><br> <b>CELL PHONE MD</b>', CSSCLASS);" onmouseout="return nd();"><img src="/static/common-web/images/activity_icons/icon_caller_id.gif" border="0" ></a></div>
<div class="iconButton"><img src="/static/common-web/images/activity_icons/icon_filler.gif" border="0" ></div>
<div class="phoneNumber">13012684637</div>
</TD>
<TD nowrap >
00:01:00</TD>
</tr>
<tr >
<TD nowrap >Jun 01, 2006 </TD>
<TD nowrap >05:49 PM </TD>
<TD nowrap ><!-- forward call setion -->
<!-- from caller setion -->
<div class="iconButton"><a href="javascript:void(0)" onclick="return overlib('<CENTER>18666894118<br><br> <b>800 SERVICE</b>', STICKY, CSSCLASS);" onmouseover="return overlib('<CENTER>18666894118<br><br> <b>800 SERVICE</b>', CSSCLASS);" onmouseout="return nd();"><img src="/static/common-web/images/activity_icons/icon_caller_id.gif" border="0" ></a></div>
<div class="iconButton"><img src="/static/common-web/images/activity_icons/icon_filler.gif" border="0" ></div>
<div class="phoneNumber">18666894118</div>
</TD>
<TD nowrap >
00:01:00</TD>
</tr>
<tr class="tablebody1">
<TD nowrap >Jun 01, 2006 </TD>
<TD nowrap >04:48 PM </TD>
<TD nowrap ><!-- forward call setion -->
<!-- from caller setion -->
<div class="iconButton"><a href="javascript:void(0)" onclick="return overlib('<CENTER>13012684637<br><br> <b>CELL PHONE MD</b>', STICKY, CSSCLASS);" onmouseover="return overlib('<CENTER>13012684637<br><br> <b>CELL PHONE MD</b>', CSSCLASS);" onmouseout="return nd();"><img src="/static/common-web/images/activity_icons/icon_caller_id.gif" border="0" ></a></div>
<div class="iconButton"><img src="/static/common-web/images/activity_icons/icon_filler.gif" border="0" ></div>
<div class="phoneNumber">13012684637</div>
</TD>
<TD nowrap >
00:01:00</TD>
</tr>
<tr >
<TD nowrap >Jun 01, 2006 </TD>
<TD nowrap >04:48 PM </TD>
<TD nowrap ><!-- forward call setion -->
<!-- from caller setion -->
<div class="iconButton"><a href="javascript:void(0)" onclick="return overlib('<CENTER>13012684637<br><br> <b>CELL PHONE MD</b>', STICKY, CSSCLASS);" onmouseover="return overlib('<CENTER>13012684637<br><br> <b>CELL PHONE MD</b>', CSSCLASS);" onmouseout="return nd();"><img src="/static/common-web/images/activity_icons/icon_caller_id.gif" border="0" ></a></div>
<div class="iconButton"><img src="/static/common-web/images/activity_icons/icon_filler.gif" border="0" ></div>
<div class="phoneNumber">13012684637</div>
</TD>
<TD nowrap >
00:01:00</TD>
</tr>
</table>
<---------------End Table--------------->
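One way this could be approached is with Python's standard-library `html.parser` module. The sketch below is only an illustration under a few assumptions: the names `TableTextParser` and `table_to_lines` are made up for this example, the code targets Python 3, and it assumes every row of interest has exactly four cells, so the single-cell "Received Calls" title row is dropped. The `sample` string is a shortened stand-in for the table above.

```python
from html.parser import HTMLParser


class TableTextParser(HTMLParser):
    """Collect the visible text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None outside <tr>
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag names, so <TD> arrives here as "td".
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only text between tags reaches this method; attribute values
        # (the onclick pop-up text) and HTML comments do not.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse whitespace/newlines to one space.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_lines(html, columns=4, sep="|"):
    """Return one delimited string per row that has exactly `columns` cells."""
    parser = TableTextParser()
    parser.feed(html)
    parser.close()
    return [sep.join(row) for row in parser.rows if len(row) == columns]


# Shortened stand-in for the table in the question (assumed sample data).
sample = """
<table id="received_calls">
<tr class="tableheader2"><td colspan=5>Received Calls</td></tr>
<tr>
<TD nowrap>Date</TD><TD nowrap>Time</TD>
<TD nowrap>
<div class="iconButton"></div>From</TD>
<TD nowrap>Duration (hh:mm:ss)</TD>
</tr>
<tr>
<TD nowrap >Jun 02, 2006 </TD>
<TD nowrap >02:53 PM </TD>
<TD nowrap ><!-- from caller setion -->
<div class="iconButton"><a href="javascript:void(0)" onclick="return overlib('<CENTER>13012684637<br>', STICKY);"><img src="icon.gif" border="0" ></a></div>
<div class="phoneNumber">13012684637</div>
</TD>
<TD nowrap >
00:01:00</TD>
</tr>
</table>
"""

lines = table_to_lines(sample)
print("\n".join(lines))
```

Passing `sep=","` would give comma-separated lines instead, but the dates themselves contain commas ("Jun 02, 2006"), so for real CSV output it is probably safer to hand the row lists to the standard `csv` module, which quotes such fields.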
ASKER CERTIFIED SOLUTION
ASKER
Thanks, this works perfectly!