Extract data from an HTML table

catalini asked
Last Modified: 2012-05-05
I have some HTML files that each contain a table at some point, delimited in the code with


I would like to extract the values in each column of each row. The rows are delimited with <tr> ... </tr>

and each row follows this pattern:

<td class="img"><a href="/page/abcd/"><img src="/static/asjsjs.png" alt="abcd123" /></a></td><td class="name"><a href="/gsgs/asdatacot/">abcddd</a></td><td class="date">Mar 30, 2008</td><td>dhhfdf</td><td class="pages">104</td></tr>

From each line like the one above, I need to extract:

1) the 1st href link: /page/abcd/
2) the 2nd href link: /gsgs/asdatacot/ and its name: "abcddd"
3) the date: Mar 30, 2008
4) the column after the date: dhhfdf
5) the number of "pages": 104

What is the best way to do that with Perl?
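
One possible approach, sketched here with core Perl only: split the file into <tr> ... </tr> rows, then match each row against a single regular expression that mirrors the pattern above. The sub names (parse_row, extract_rows) and the hash keys are made up for illustration. The sketch assumes every row has exactly the cell order and class attributes shown in the sample line. If the markup varies, a real HTML parser from CPAN (for example HTML::TreeBuilder) is likely to be more robust than a regex.

```perl
use strict;
use warnings;

# Parse the cells of one row that follows the pattern from the question.
# Returns a hash reference, or undef (in scalar context) if the row does not match.
# The hash keys are illustrative names, not anything defined by the HTML.
sub parse_row {
    my ($row) = @_;
    return unless $row =~ m{
        <td\s+class="img">\s*<a\s+href="([^"]*)">.*?</a>\s*</td>\s*       # 1) first link
        <td\s+class="name">\s*<a\s+href="([^"]*)">(.*?)</a>\s*</td>\s*    # 2) second link and its name
        <td\s+class="date">(.*?)</td>\s*                                  # 3) date
        <td>(.*?)</td>\s*                                                 # 4) column after the date
        <td\s+class="pages">\s*(\d+)\s*</td>                              # 5) number of pages
    }xsi;
    return {
        img_link  => $1,
        name_link => $2,
        name      => $3,
        date      => $4,
        other     => $5,
        pages     => $6,
    };
}

# Find every <tr> ... </tr> block in the HTML and parse the ones that match.
# Rows that do not fit the pattern (for example a header row) are skipped.
sub extract_rows {
    my ($html) = @_;
    my @rows;
    while ($html =~ m{<tr\b[^>]*>(.*?)</tr>}gsi) {
        my $fields = parse_row($1);
        push @rows, $fields if $fields;
    }
    return @rows;
}

# Small demonstration using the sample row from the question.
my $sample = <<'HTML';
<table>
<tr><td class="img"><a href="/page/abcd/"><img src="/static/asjsjs.png" alt="abcd123" /></a></td><td class="name"><a href="/gsgs/asdatacot/">abcddd</a></td><td class="date">Mar 30, 2008</td><td>dhhfdf</td><td class="pages">104</td></tr>
</table>
HTML

for my $r (extract_rows($sample)) {
    print join(' | ', @{$r}{qw(img_link name_link name date other pages)}), "\n";
}
```

To run it against real files, read each file into a single string (for example by setting `local $/;` before reading from the filehandle) and pass that string to extract_rows.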
