• Status: Solved

Read one value from a file

Hello,

I have a file called test.html.
I want to read the file, split it into three parts, and then split the middle part to get one value into a variable.

The following is in the file:

---more html---

      <p> <small> </small>
      <table cellpadding=0 cellspacing=0 height="99">
        <tr>
          <td nowrap valign=top height="48" width="300"> <br>
            Wednesday, May 3, 2000 <!-- conversion result starts  -->
            <p><b><font color="#153168"> <font color="#bc0a17">1</font>
              Dutch Guilder = <font color="#bc0a17"> 0.4125</font>
              US Dollar </font></b> <!-- conversion result ends  --> <br>
              1 US Dollar (USD) = 2.4242 Dutch Guilder
              (NLG)
            <p> <small>Median price was 0.4122 / 0.4125 (bid/ask).<BR>
    Minimum price was 0.4098 / 0.4099 <BR>
    Maximum price was 0.4163 / 0.4168 <BR>
</small>
          </td>
          <td colspan="2" height="48">&nbsp;</td>
        </tr>
        <tr>
--- more html------

I now want to split the file into three parts:

the part above <!-- conversion result starts  -->

the part between <!-- conversion result starts  -->  and <!-- conversion result ends  -->

and the part below <!-- conversion result ends  -->

Now I want to split the middle part.

Here I have two known variables:
$test1 = "Dutch Guilder";
$test2 = "US Dollar";

I want to split this part until I have the value (0.4125) that comes before $test2.
That value has to be stored in a third variable, $test3.

Can someone help me?
Asked by: mmcw
1 Solution
 
maneshr commented:
Try this.

If the HTML tags in your file are more complicated, you can use the following piece of code instead.

REPLACE ...
$html_text =~ s/<[^>]*>//g; ##  Remove any HTML tags

WITH...

use HTML::Parse;
use HTML::FormatText;
$html_text = HTML::FormatText->new->format(parse_html($html_text));


=====================guilder.pl
#!/usr/local/bin/perl


##  Read the ENTIRE html file in a variable.
$/="";
open(HTM,"/tmp/more.html") || die $!;
$file=<HTM>;
close(HTM);
$/="\n";

##  Remove the \n chars from that variable.
$file=~ s/\n//g;
##  Extract the relevant part ($1 = before, $2 = between the markers, $3 = after).
$file=~ /(.*)<\!-- conversion result starts  -->(.*)<\!-- conversion result ends  -->(.*)/;

##  Store the extracted part in a variable.
$html_text=$2;
$html_text =~ s/<[^>]*>//g; ##  Remove any HTML tags
$html_text=~ s/\s+/ /g;     ##  Squeeze multiple white spaces to a single space

##  Get the currency value: the number that follows "Dutch Guilder =".
$html_text=~ /Dutch Guilder\s+=\s+(\d+\.?\d+)/;

$test3=$1;
print $test3;
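
The same steps can also be driven by the $test1 and $test2 variables from the question rather than a hard-coded currency name. The following is only a minimal sketch: the helper name extract_rate and the inline sample are assumptions made for illustration, and it reads the whole input as one string (slurp mode) instead of stripping newlines first.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): returns the number
# shown between "$from =" and "$to" inside the marked block, or nothing.
sub extract_rate {
    my ($html, $from, $to) = @_;

    # Split into three parts around the two marker comments.
    # The /s flag lets "." match newlines, so they need not be removed first.
    my ($before, $middle, $after) = $html =~
        /\A(.*?)<!-- conversion result starts\s*-->(.*?)<!-- conversion result ends\s*-->(.*)\z/s
        or return;

    $middle =~ s/<[^>]*>//g;   # remove HTML tags
    $middle =~ s/\s+/ /g;      # squeeze runs of whitespace to one space

    # \Q...\E quotes any regex metacharacters in the currency names.
    my ($rate) = $middle =~ /\Q$from\E\s*=\s*([\d.,]+)\s*\Q$to\E/
        or return;
    return $rate;
}

# Sample taken from the HTML fragment in the question.
my $sample = <<'HTML';
Wednesday, May 3, 2000 <!-- conversion result starts  -->
<p><b><font color="#153168"> <font color="#bc0a17">1</font>
  Dutch Guilder = <font color="#bc0a17"> 0.4125</font>
  US Dollar </font></b> <!-- conversion result ends  --> <br>
  1 US Dollar (USD) = 2.4242 Dutch Guilder
HTML

my $test1 = "Dutch Guilder";
my $test2 = "US Dollar";
my $test3 = extract_rate($sample, $test1, $test2);
print "$test3\n";   # prints 0.4125
```

Anchoring the match on both currency names means the number is only accepted when it sits directly between them, which is what the question asks for.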
 
mmcw (Author) commented:
The extraction part does not work!

##  Extract the relevant part.
$file=~ /(.*)<\!-- conversion result starts  -->(.*)<\!-- conversion result ends  -->(.*)/;

The result is the same as when I do not extract at all!

Can you help me?
 
mmcw (Author) commented:
By the way, I had to change the first line of the code from $/=""; to undef $/; to get the whole HTML file.

Otherwise you only get the part up to </head>.

 
mmcw (Author) commented:
The extraction works once you change the first line of the code. See my last comment.

Thank you.

Could you take a look at the other question I asked?

http://www.experts-exchange.com/jsp/qShow.jsp?ta=perl&qid=10337436

You told me you had the answer!
 
maneshr commented:

Glad to know that undef $/; worked for you.

I have a question for you, though.

Which version of Perl are you using? What platform are you running the script on? Which web server are you running?

The reason I want to know is that $/=""; works fine on my Solaris box.


"Could you take a look at the other question I asked?"

Please check my comments there.
 
mmcw (Author) commented:
After changing the first line, the script works!
 
ozo commented:
$/=""; #sets paragraph mode, so it will separate records at empty lines.
 
mmcw (Author) commented:
Question: sometimes the number looks like this: 0.4125

Then the script works fine.

But sometimes the number looks like this: 1,926.22

Then the script does not work.

I have checked the script. It works up to this line:
$html_text =~ /$$base_currency\s+=\s+(\d+\.?\d+)/;

After that line the result is "".

Can you fix it so that it works for both number formats?
 
ozo commented:
/([\d,]+\.?\d+)/
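
As a rough sketch of how that pattern slots into the extraction step: the helper name parse_amount and the second sample string are assumptions for illustration only (the thread does not say which currency produced 1,926.22). The commas are removed afterwards so the value can be used as a number.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the number after "<currency> =" and return it
# with thousands separators removed, or nothing when there is no match.
sub parse_amount {
    my ($text, $currency) = @_;
    my ($raw) = $text =~ /\Q$currency\E\s+=\s+([\d,]+\.?\d+)/ or return;
    (my $number = $raw) =~ tr/,//d;   # "1,926.22" becomes "1926.22"
    return $number;
}

print parse_amount("1 Dutch Guilder = 0.4125 US Dollar", "Dutch Guilder"), "\n";   # prints 0.4125
print parse_amount("1 Some Currency = 1,926.22 Other Unit", "Some Currency"), "\n"; # prints 1926.22
```

Like the original pattern, this still needs at least two characters in the number, so a single-digit value such as "5" would not match.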