I have some HTML files that each contain a table, delimited in the code with
<tbody>.....</tbody>
I would like to extract the value of each column in each row. The rows are delimited with <tr>.... </tr>
and each row follows this pattern:
<td class="img"><a href="/page/abcd/"><img src="/static/asjsjs.png" alt="abcd123" /></a></td><td class="name"><a href="/gsgs/asdatacot/">abcddd</a></td><td class="date">Mar 30, 2008</td><td>dhhfdf</td><td class="pages">104</td></tr>
From each line like the one above I need to extract:
1) the 1st href link: /page/abcd/
2) the 2nd href link: /gsgs/asdatacot/ and its name "abcddd"
3) the date: Mar 30, 2008
4) the column after the date: dhhfdf
5) the number of "pages": 104
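
A minimal sketch of one way to pull out these five items with a plain regex, assuming every row has exactly the layout of the sample line above (same cell order, same class names, double-quoted attributes). The names extract_rows, $row_re and the hash keys are made up for illustration, and a proper HTML parser would likely cope better if the markup varies:

```perl
use strict;
use warnings;

# One capture group per wanted field. The /x flag allows the pattern to be
# split over several lines; /s lets . match newlines inside a row.
my $row_re = qr{
    <td\s+class="img">\s*<a\s+href="([^"]*)">.*?</a>\s*</td>\s*
    <td\s+class="name">\s*<a\s+href="([^"]*)">([^<]*)</a>\s*</td>\s*
    <td\s+class="date">([^<]*)</td>\s*
    <td>([^<]*)</td>\s*
    <td\s+class="pages">(\d+)</td>
}xs;

# Returns one hash reference per row found in the given HTML text.
sub extract_rows {
    my ($html) = @_;
    my @rows;
    while ($html =~ /$row_re/g) {
        push @rows, {
            img_href  => $1,   # 1) first href
            name_href => $2,   # 2) second href
            name      => $3,   #    and its link text
            date      => $4,   # 3) date
            extra     => $5,   # 4) column after the date
            pages     => $6,   # 5) number of pages
        };
    }
    return @rows;
}

# Sample row copied from the question; real input would be the file contents.
my $sample = '<td class="img"><a href="/page/abcd/"><img src="/static/asjsjs.png" alt="abcd123" /></a></td><td class="name"><a href="/gsgs/asdatacot/">abcddd</a></td><td class="date">Mar 30, 2008</td><td>dhhfdf</td><td class="pages">104</td></tr>';

for my $r (extract_rows($sample)) {
    print join("\t", @{$r}{qw(img_href name_href name date extra pages)}), "\n";
}
```

To run it over a whole file, the file contents could be read into one string and passed to extract_rows, which would return one entry per matching row.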
What is the best way to do that with Perl?
Thanks!