kavlins

Perl script to rip texts from html file

This request is the same as my old, resolved EE question: https://www.experts-exchange.com/questions/24537501/I-need-a-PERL-script-to-eliminate-certain-fields-from-webpage-then-add-Total-bytes-and-save-result-to-a-file.html
The script was working great until recently. The problem occurred after updating the application to the latest version. I think something has changed within the HTML page, i.e. the tables or something similar. I have attached files below showing the results I used to get and the current ones.

Problems I am facing now:
    1) Endpoints missing
    2) Bytes calculations wrong


error-received-when-running-scri.txt
NTF.txt
Data-before-updates.htm
Results-before-updates.xls
Data-after-updates.htm
Results-after-updates.xls
FishMonger

This is a prime example of why using a series of regexes to parse HTML is very fragile and, in almost all cases, the wrong approach.

You should look at using a module designed for parsing html, such as HTML::Parser.
http://search.cpan.org/~gaas/HTML-Parser-3.64/Parser.pm

Other module choices can be found here.
http://search.cpan.org/modlist/World_Wide_Web/HTML
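
As a rough illustration of the parser-based approach, here is a minimal sketch using HTML::Parser's version 3 handler API. It assumes the data sits in ordinary table rows with explicit closing tags and no nested tables. The `table_rows` helper, the sample HTML, and the Endpoint/Bytes columns are all hypothetical and not taken from the attached files, so the real page will likely need adjustments.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: return one array ref of cell texts per table row.
# Assumes well-formed <tr>/<td>/<th> pairs and no nested tables.
sub table_rows {
    my ($html) = @_;
    my (@rows, $row, $cell);

    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag) = @_;
            if ($tag eq 'tr') {
                $row = [];                       # begin a new row
            }
            elsif (($tag eq 'td' || $tag eq 'th') && $row) {
                $cell = '';                      # begin collecting cell text
            }
        }, 'tagname' ],
        text_h => [ sub {
            my ($text) = @_;
            $cell .= $text if defined $cell;     # ignore text outside cells
        }, 'dtext' ],
        end_h => [ sub {
            my ($tag) = @_;
            if (($tag eq 'td' || $tag eq 'th') && defined $cell) {
                $cell =~ s/^\s+|\s+$//g;         # trim surrounding whitespace
                push @$row, $cell;
                undef $cell;
            }
            elsif ($tag eq 'tr' && $row) {
                push @rows, $row if @$row;       # skip rows without cells
                undef $row;
            }
        }, 'tagname' ],
    );

    $parser->parse($html);
    $parser->eof;
    return @rows;
}

# Made-up sample input, only to show the helper in use.
my $html = <<'HTML';
<table>
  <tr><th>Endpoint</th><th>Bytes</th></tr>
  <tr><td>host-a</td><td>1,200</td></tr>
  <tr><td>host-b</td><td>300</td></tr>
</table>
HTML

my @rows = table_rows($html);
shift @rows;                                     # drop the header row

my $total = 0;
for my $r (@rows) {
    (my $bytes = $r->[1]) =~ tr/,//d;            # strip thousands separators
    $total += $bytes;
    print "$r->[0]\t$bytes\n";
}
print "Total bytes\t$total\n";
```

Because the handlers react to tags rather than to the exact text layout, a change in whitespace or attribute order on the page should not break the extraction, although a change in the table structure itself still would.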
kavlins

ASKER

I am a novice in Perl, so it will take me some time to grasp those...
ASKER CERTIFIED SOLUTION
kavlins