Extract data from an HTML table

Asked by catalini
Tags: Perl, Python
I have some HTML files, each containing a table somewhere in the markup, delimited by

<tbody>.....</tbody>

I would like to extract the value of every column in every row. The rows are delimited by <tr>.... </tr>

and each row follows this pattern:

<td class="img"><a href="/page/abcd/"><img src="/static/asjsjs.png" alt="abcd123" /></a></td><td class="name"><a href="/gsgs/asdatacot/">abcddd</a></td><td class="date">Mar 30, 2008</td><td>dhhfdf</td><td class="pages">104</td></tr>

From each row like the one above I need to extract:

1) the 1st href link: /page/abcd/
2) the 2nd href link: /gsgs/asdatacot/ and its name "abcddd"
3) the date: Mar 30, 2008
4) the column after the date: dhhfdf
5) the number of "pages": 104

What is the best way to do that with Perl?

thanks!!!!
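
A minimal sketch of one possible approach, not a definitive answer. The question asks for Perl; since it is also tagged Python, this sketch uses Python's standard-library html.parser, and the same event-driven idea (react to start tags, text, and end tags) should carry over to a Perl HTML parsing module. The names RowExtractor and extract_rows and the output keys (img_href, name_href, name, date, other, pages) are hypothetical, chosen only for illustration. The sketch assumes every row holds the five cells in the order shown above, and it tolerates a row fragment that has a closing </tr> but no opening <tr>, as in the sample.

```python
from html.parser import HTMLParser

# Sample row copied from the question (note: it has no opening <tr>).
SAMPLE = (
    '<td class="img"><a href="/page/abcd/">'
    '<img src="/static/asjsjs.png" alt="abcd123" /></a></td>'
    '<td class="name"><a href="/gsgs/asdatacot/">abcddd</a></td>'
    '<td class="date">Mar 30, 2008</td><td>dhhfdf</td>'
    '<td class="pages">104</td></tr>'
)


class RowExtractor(HTMLParser):
    """Hypothetical helper: collects one dict per row ended by </tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, one dict each
        self.cells = []   # cells of the row currently being read
        self.cell = None  # cell currently open, or None outside <td>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "td":
            self.cell = {"href": None, "text": ""}
        elif tag == "a" and self.cell is not None and self.cell["href"] is None:
            # Keep only the first link found inside the cell.
            self.cell["href"] = attrs.get("href")

    def handle_data(self, data):
        if self.cell is not None:
            self.cell["text"] += data

    def handle_endtag(self, tag):
        if tag == "td" and self.cell is not None:
            self.cell["text"] = self.cell["text"].strip()
            self.cells.append(self.cell)
            self.cell = None
        elif tag == "tr":
            # Assumption: img, name, date, unnamed column, pages, in that order.
            # Rows with fewer than five cells are skipped.
            if len(self.cells) >= 5:
                img, name, date, other, pages = self.cells[:5]
                self.rows.append({
                    "img_href": img["href"],
                    "name_href": name["href"],
                    "name": name["text"],
                    "date": date["text"],
                    "other": other["text"],
                    "pages": pages["text"],  # kept as text, not converted
                })
            self.cells = []


def extract_rows(html):
    """Return a list of dicts, one per table row found in the markup."""
    parser = RowExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    for row in extract_rows(SAMPLE):
        print(row)
```

To process a whole file, read it into a string and pass it to extract_rows. A parser is used here rather than a single regular expression because it should be less sensitive to changes in attribute order or whitespace, though a regex may be enough if the markup is as regular as the sample.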