catalini

Extract data from an HTML table

I have some HTML files that each contain a table, delimited in the markup by

<tbody>.....</tbody>

I would like to extract the values in every column of every row. The rows are delimited by <tr>....</tr>, and each row follows this pattern:

<td class="img"><a href="/page/abcd/"><img src="/static/asjsjs.png" alt="abcd123" /></a></td><td class="name"><a href="/gsgs/asdatacot/">abcddd</a></td><td class="date">Mar 30, 2008</td><td>dhhfdf</td><td class="pages">104</td></tr>

From each row like the one above, I need to extract:

1) the 1st href link: /page/abcd/
2) the 2nd href link: /gsgs/asdatacot/ and its name: abcddd
3) the date: Mar 30, 2008
4) the column after the date: dhhfdf
5) the number of "pages": 104

What is the best way to do this with Perl?

Thanks!
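
For illustration, the extraction described above can be sketched in Python (the question is tagged with both Perl and Python) using only the standard library's `html.parser`. This is a rough sketch under assumptions, not a definitive solution: the class name `RowExtractor` and the dictionary keys (`img_href`, `name_href`, `name`, `date`, `other`, `pages`) are hypothetical names chosen here, and it assumes every row has exactly one `<td>` without a class attribute, as in the sample row. The same event-driven idea should carry over to a Perl HTML parser such as HTML::TreeBuilder.

```python
from html.parser import HTMLParser


class RowExtractor(HTMLParser):
    """Collect one dict per table row from its <td> cells (hypothetical helper)."""

    # Map a <td> class to the output key for its text; "" means "no class".
    TEXT_KEYS = {"name": "name", "date": "date", "pages": "pages", "": "other"}

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows
        self.row = None        # row currently being built
        self.in_td = False     # True while inside a <td>
        self.cell_class = ""   # class of the current <td>
        self.text = []         # text fragments of the current <td>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "td":
            if self.row is None:
                self.row = {}
            self.in_td = True
            self.cell_class = attrs.get("class") or ""
            self.text = []
        elif tag == "a" and self.in_td:
            # Only the "img" and "name" cells carry links we care about.
            if self.cell_class == "img":
                self.row["img_href"] = attrs.get("href")
            elif self.cell_class == "name":
                self.row["name_href"] = attrs.get("href")

    def handle_data(self, data):
        if self.in_td:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self.in_td:
            key = self.TEXT_KEYS.get(self.cell_class)
            if key:
                self.row[key] = "".join(self.text).strip()
            self.in_td = False
        elif tag == "tr" and self.row:
            # A closing </tr> ends the row, even if the opening <tr> was not seen.
            self.rows.append(self.row)
            self.row = None


# The sample row from the question, split across lines for readability.
sample = (
    '<td class="img"><a href="/page/abcd/">'
    '<img src="/static/asjsjs.png" alt="abcd123" /></a></td>'
    '<td class="name"><a href="/gsgs/asdatacot/">abcddd</a></td>'
    '<td class="date">Mar 30, 2008</td><td>dhhfdf</td>'
    '<td class="pages">104</td></tr>'
)

parser = RowExtractor()
parser.feed(sample)
parser.close()

for row in parser.rows:
    print(row)
```

A parser is used here rather than a single regular expression because it keeps working if attribute order or whitespace in the markup changes; for a whole file, the file contents would be passed to `feed` in place of `sample`.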
Tags: Perl, Python

Last comment: catalini, 8/22/2022 (Mon)