• C

How to extract data from <p class=""> in multiple HTML files to a CSV file

I have approximately 2400 HTML pages from which I need to extract the values of the <p class=" "> elements.
Each page includes the customer's company name, address, city, state, zip, phone, and fax.
I need these on a single row, separated by commas. Is there a program that will do this?
If so, I could really use it. Thanks for your time.

example:

<p class="NomEmpresa">Company</p>
</td>
<td valign="Top">
<p class="NomEmpresa" align="right">company number</p>
</td>
<tr>
<td colspan="2">
<p class="Adresa">add1</p>
</td>
</tr>
<tr>
<td colspan="2">
<p><span class="Adresa">csz</span></p>
</td>
</tr>
<tr>
<td colspan="2">
<p><span class="Adresa"></a></span></p>
</td>
</tr>
<tr>
<td colspan="2">
<p><span class="Adresa">Tel.: Fax: </span></p>
</td>
</tr>
<tr>
<td colspan="2">
<p><span class="Adresa">E-mail: </span></p>
christpher7 Asked:

grg99 Commented:
Tee hee-- Perl it is.

If it HAS to be in C, it will take a few more lines.
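
For comparison, here is a rough sketch of what a C version might look like. It is not from the original answer: the function names (append_match, build_row, process_list) and the buffer sizes are made up for illustration, and it uses a plain substring search for the class="Adresa"> marker instead of a regular expression. It assumes the same FileList.txt convention as the Perl script in this thread (one HTML file name per line), that no HTML line is longer than 4095 characters, and that the values contain no commas or quotes needing CSV escaping.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Text that precedes each value; same marker the Perl pattern looks for. */
static const char MARKER[] = "class=\"Adresa\">";

/*
 * Append the first value on `line` (the text between the marker and the
 * next '<', at least one character long) to `row`, comma-separated.
 * Returns 1 if a value was appended, 0 otherwise.
 */
int append_match(const char *line, char *row, size_t row_size)
{
    const char *start = strstr(line, MARKER);
    if (start == NULL)
        return 0;
    start += strlen(MARKER);
    if (*start == '\0')
        return 0;

    /* Require one character before the closing '<', like (.+?)< does. */
    const char *end = strchr(start + 1, '<');
    if (end == NULL)
        return 0;

    size_t len = (size_t)(end - start);
    size_t used = strlen(row);
    size_t need = len + (used > 0 ? 1 : 0);   /* value plus optional comma */
    if (used + need + 1 > row_size)
        return 0;                             /* would not fit; skip it */

    if (used > 0)
        row[used++] = ',';
    memcpy(row + used, start, len);
    row[used + len] = '\0';
    return 1;
}

/* Scan one open HTML file and build its comma-separated row. */
void build_row(FILE *html, char *row, size_t row_size)
{
    char line[4096];

    assert(row != NULL && row_size > 0);
    row[0] = '\0';
    while (fgets(line, sizeof line, html) != NULL)
        append_match(line, row, row_size);
}

/*
 * Read file names from `list` (one per line) and write one row per file
 * to `out`. Returns 0 on success, 1 if a listed file cannot be opened.
 */
int process_list(FILE *list, FILE *out)
{
    char name[1024];
    char row[8192];

    while (fgets(name, sizeof name, list) != NULL) {
        name[strcspn(name, "\r\n")] = '\0';   /* strip the line ending */
        if (name[0] == '\0')
            continue;                         /* ignore blank lines */

        FILE *html = fopen(name, "r");
        if (html == NULL) {
            fprintf(stderr, "No file %s\n", name);
            return 1;
        }
        build_row(html, row, sizeof row);
        fclose(html);
        fprintf(out, "%s\n", row);
    }
    return 0;
}
```

To use it, a main() would open FileList.txt with fopen, pass it to process_list along with stdout, and redirect the program's output to a .csv file. Like the Perl script, it picks up only the first class="Adresa"> value on each line.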



grg99 Commented:
Try this, works fine:

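use strict; use warnings;

# Capture the text that follows class="Adresa"> up to the next tag.
my $Pat = qr/class="Adresa">(.+?)</;

# FileList.txt holds one HTML file name per line.
open( my $FL, '<', 'FileList.txt' ) or die "No FileList.txt: $!\n";

while ( my $Fn = <$FL> ) {
   chomp $Fn;   # drop the trailing newline from the file name
   open( my $F, '<', $Fn ) or die "No file $Fn: $!\n";
   my @fields;

   # Collect the first match on each line of the page.
   while ( my $line = <$F> ) { push @fields, $1 if $line =~ $Pat; }

   close( $F );  print join( ',', @fields ), "\n";   # one CSV row per page
}
close( $FL );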

efn Commented:
> Try this, works fine:

I believe it, but since this is the C area, you might confide what language it's written in!  (Perl, I think.)

christpher7 Author Commented:
Thanks grg99.

Question has a verified solution.