modify financial data script


Please take a Good look at the original question kindly answered by expert Ozo.
http://www.experts-exchange.com/Programming/Languages/Scripting/Perl/Q_26302816.html

I need to modify the script so that the date format of the output file is changed to "20031013" instead of say "10/13/2003", time "20:00" instead of "2000" and also simply append a blank 0 field to represend nil volume.

Desired output:
Date,time,open,high,low,close,volume
20060316,20:00,1.7572,1.7582,1.7561,1.7568,0

Existing output:
Date,time,open,high,low,close
10/13/2003,2000,1.66225,1.66235,1.66175,1.66175

I also need to change the script so that I can easily adjust the bar output time. At the moment I have two scripts, one outputs in 1 minute bars and the other outputs in 4 hour bars. I need some kind of variable I can easily adjust. Say "60" minutes for 1 hour bars and "360" for the 4 hour bars.
You can chose either of these script to modify, which every one is easier for you.
Included below, The first script is the 1 minute bar output:
#!/usr/bin/perl
use strict;
use warnings;
print "Date,Time,Open,High,Low,Close\n";
my $prev="";
my($open,$high,$low,$close);
while( <> ){
    my @d=split/,/;
   
   my($y,$m,$d,$h,$M,$b,$a)=/(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d)[^,]*,([^,]+),([^,]+)/;
     my $datetime=sprintf"$m/$d/$y,%02d%02d",$h,$M;
     my $mean=($b+$a)/2;

    # my $datetime=sprintf"%s/%s/%s,%s%s",(split/\W/,$d[3])[1,2,0,3,4];
    # my $mean=($d[4]+$d[5])/2;
    # my $mean=$d[4];
    if( $datetime ne $prev ){
         print "$prev,$open,$high,$low,$close\n" if $prev;
         $prev = $datetime;
         $open=$high=$low=$close=$mean;
    }
    $high = $mean if $mean > $high;
    $low = $mean if $mean < $low;
    $close = $mean;
}
print "$prev,$open,$high,$low,$close\n" if $prev;

Open in new window

#!/usr/bin/perl
use strict;
use warnings;
print "Date,Time,Open,High,Low,Close\n";

my $prev="";
my($open,$high,$low,$close);
while( <> ){

    my($y,$m,$d,$h,$M,$b,$a)=/(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d)[^,]*,([^,]+),([^,]+)/;

    #my($y,$m,$d,$h,$b,$a)=/(\d{4})-(\d\d)-(\d\d) (\d\d).*,(.*),(.*)/;
    #my $mean=($b+$a)/2;
    my $datetime=sprintf"$m/$d/$y,%02d00",int($h/4)*4;
    my $mean=$b;


    if( $datetime ne $prev ){
         print "$prev,$open,$high,$low,$close\n" if $prev;
         $prev = $datetime;
         $open=$high=$low=$close=$mean;
    }
    $high = $mean if $mean > $high;
    $low = $mean if $mean < $low;
    $close = $mean;
}
print "$prev,$open,$high,$low,$close\n" if $prev;

Open in new window

cakesterAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

mwochnickCommented:
change line 11 of the first files as follows to get the reformatted date/time
my $datetime=sprintf"$y$m$d,%02d:%02d",$h,$M;

Open in new window

mwochnickCommented:
I don't see a variable for volume but to append a zero for the volume change lines 18 and 26 of the first file as follows
print "$prev,$open,$high,$low,$close,0\n" if $prev;

Open in new window

mwochnickCommented:
for the second file try this - it uses a variable barsize
#!/usr/bin/perl
use strict;
use warnings;
print "Date,Time,Open,High,Low,Close,Volume\n";

my $prev="";
my($open,$high,$low,$close);
# change barsize is the hours of the bar - note cannot be 0
my $barsize=4;

while( <> ){

    my($y,$m,$d,$h,$M,$b,$a)=/(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d)[^,]*,([^,]+),([^,]+)/;

    #my($y,$m,$d,$h,$b,$a)=/(\d{4})-(\d\d)-(\d\d) (\d\d).*,(.*),(.*)/;
    #my $mean=($b+$a)/2;
    my $datetime=sprintf"$y$m$d,%02d00",int($h/$barsize)*$barsize;
    my $mean=$b;


    if( $datetime ne $prev ){
         print "$prev,$open,$high,$low,$close,0\n" if $prev;
         $prev = $datetime;
         $open=$high=$low=$close=$mean;
    }
    $high = $mean if $mean > $high;
    $low = $mean if $mean < $low;
    $close = $mean;
}
print "$prev,$open,$high,$low,$close,0\n" if $prev;

Open in new window

Learn SQL Server Core 2016

This course will introduce you to SQL Server Core 2016, as well as teach you about SSMS, data tools, installation, server configuration, using Management Studio, and writing and executing queries.

cakesterAuthor Commented:

mwochnick,

Thanks. I need to vary the bar size in minutes, anything down to 1 minute as I often use 5,15 and 30 min bars.
I doubt its as simple as this?

my $datetime=sprintf"$y$m$d,%02d:%02d",int($m/$barsize)*$barsize;

note i added in the time change format?

C
cakesterAuthor Commented:

Also when I change line 17 to
my $datetime=sprintf"$y$m$d,%02d:%02d",int($m/$barsize)*$barsize;

I get the following message:

Use of uninitialized value in sprintf at myscript.pl line 17, <> line 1.
" " line  2
" " line  3
etc

Do I need to encase the semicolon in something?

C
cakesterAuthor Commented:
ok I fixed the time format with the following line:
  my $datetime=sprintf"$y$m$d,%02d:00",int($h/$barsize)*$barsize;

Just need the bar size adjustable in mnutes.

C
mwochnickCommented:
you were close on the minutes, to do the bar in minutes add a barsize variable in the first program the same way I did in the second and then make the line as change the sprinf line as follows
my $datetime=sprintf"$y$m$d,%02d:%02d",$h,int($M/$barsize)*$barsize;

Open in new window

note that this will work up to 60 minutes - you'll have to use the 2nd program for over an hour or have more logic if you want to combine the two
cakesterAuthor Commented:
mwochnick,



I need be able to adjust the the bar size in minutes as I have to produce bars of say 90 minutes and 300 minutes and other unusual configurations. I am not bothered if one of the scripts is disused. I only put both of them up in the question on the off chance that one was easier to modify than the other.

Can you tell me how to modify either of the scripts to to produce bars of any minute combination? I thought perhaps the one minute script was a good choice as it already outputs 1 minute bars?

C
mwochnickCommented:
Essentially what you need to do is in the first program
1. convert the hours to minutes - stick in a temp variable
2  add the minutes to the temp variable
3. utilize the barsize technique of any minute size against the total number of minutes
this will give the varying barsize in minutes
you will need to do some logic around the output if you want hours and minutes in the output
I need a little time to work on the code
cakesterAuthor Commented:
mwochnick,

Thanks for your help. I believe the bit at the end with the mean, high, low and >  < signs is the barsize technique. I was wondering how he preserves the open and close price of the bar as this is important.

Sorry i am not a perl person but, is there a function in perl that can look at a range of time say from 10:00 to 11:30 and say that is 90 minute, then do that in reverse so that the script will know which ticks to include in any given bar?

C
mwochnickCommented:
try this  -- note I did not test it so it could have mistakes - and let me know how it goes
#!/usr/bin/perl
use strict;
use warnings;
print "Date,Time,Open,High,Low,Close,Volume\n";

my $barsize=90;
my $prev="";
my($open,$high,$low,$close);
while( <> ){
    my @d=split/,/;
   
   my($y,$m,$d,$h,$M,$b,$a)=/(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d)[^,]*,([^,]+),([^,]+)/;
   
   my $datetime="";; 
   if( $barsize > 59) {
     my $tempminutes = $h*60 + $M;
     $datetime=sprintf"$y$m$d,%02d:00",int($tempminutes/$barsize)*$barsize;
   }
   else {
     $datetime=sprintf"$y$m$d,%02d:%02d",$h,int($M/$barsize)*$barsize;
   }
   
   
   my $mean=($b+$a)/2;

    # my $datetime=sprintf"%s/%s/%s,%s%s",(split/\W/,$d[3])[1,2,0,3,4];
    # my $mean=($d[4]+$d[5])/2;
    # my $mean=$d[4];
    if( $datetime ne $prev ){
         print "$prev,$open,$high,$low,$close,0\n" if $prev;
         $prev = $datetime;
         $open=$high=$low=$close=$mean;
    }
    $high = $mean if $mean > $high;
    $low = $mean if $mean < $low;
    $close = $mean;
}
print "$prev,$open,$high,$low,$close,0\n" if $prev;

Open in new window

cakesterAuthor Commented:

I gave it a go and run a test using the small sample csv file attached. I set minutes to 240 to match the 4H script I have to do a like for like test.
The time format has some problems along with the data
 output4H.txt newoutput.txt
sample.csv
mwochnickCommented:
the output values differ between the two because they are calculated differently

in the minute based program - the first listing the out put is based on an average being calculated based on the values in the last two columns of the input file

in the hour based program the output sets the average based on the second to last column of values in the input file

which is correct?
mwochnickCommented:
here's some updated code - it currently is using the average algorithm from the hours program - to change you just need to follow the instructions at line 27
#!/usr/bin/perl
use strict;
use warnings;
print "Date,Time,Open,High,Low,Close,Volume\n";

# set the barzize in minutes
my $interval=240;
my $prev="";
my($open,$high,$low,$close);
while( <> ){
   my @d=split/,/;
   
   my($y,$m,$d,$h,$M,$b,$a)=/(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d)[^,]*,([^,]+),([^,]+)/;

   # format the data and time for output
   my $datetime="";
   if( $interval > 59) {
     # interval of an hour or greater
     my $tempminutes = $h*60 + $M;
	 my $intervalHRS = int(int($tempminutes/$interval)*$interval/60);
	 my $intervalMIN = (int($tempminutes/$interval)*$interval) % 60;
     $datetime=sprintf"$y$m$d,%02d:%02d",$intervalHRS,$intervalMIN;
   }
   else {
     # inteval of less than an hour
     $datetime=sprintf"$y$m$d,%02d:%02d",$h,int($M/$interval)*$interval;
   }
   
   # code from minutes program to use 
   # uncomment line 30 - delete the # character at the beginning of the line
   # and comment the 33 - add a # at the beginning of the line
   #my $mean=($b+$a)/2;
   #or
   #code from hours
   my $mean=$b;

   if( $datetime ne $prev ){
     print "$prev,$open,$high,$low,$close,0\n" if $prev;
     $prev = $datetime;
     $open=$high=$low=$close=$mean;
   }
   $high = $mean if $mean > $high;
   $low = $mean if $mean < $low;
   $close = $mean;
}
print "$prev,$open,$high,$low,$close,0\n" if $prev;

Open in new window

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
mwochnickCommented:
oops - instruction are at line 29
cakesterAuthor Commented:

In my test just above (with the three files posted) I only compaired the old 4 hour script against your new 'minute adjustable' script to give you a like for like against what the data output looks like. I never included anything to do with the other old 1 minute output script, thats completely left out and probably with not use again anyhow.
I.E the code you posted in ID:35330169 is what made newoutput.txt and the trusty old 4 hour script I have output to the output4H.txt file. Both scripts used the sample.csv file.
As you can see in the newoutput.txt file, there are some problems with the time and price data complaired with the 4 old 4 hour output.

The "  my $mean=$b;" is fine and I dont need the a+b/2 thing anymore.

I will try your new code you just posted and get back to you soon. Thanks...

C
cakesterAuthor Commented:
mwochnick,

the newest changes you posted are looking good on the 4 hour data. I will test it on others the weekend. Thanks so much for your help so far.

C
cakesterAuthor Commented:


Yes the script worked well. Thanks.
I was wondering in the line:

my($y,$m,$d,$h,$M,$b,$a)=/(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d)[^,]*,([^,]+),([^,]+)/;

How does the script know to skip the first two fields I.E the code number then the gbp\usd field and go straight on and split up the date time, and price field?

I may come across a situation in the future where I need to parse the data in a different way.
e.g

date, time, X, X, X, ASK, BID, X, X

where the date and time wont always be next to the two price values. But the file will still always be a csv
mwochnickCommented:
You should really ask that last question as a new question - per EE guidelines
add you input file to it
add the two lines below to the question and you'll have an answer quickly
   my @d=split/,/;
   my($y,$m,$d,$h,$M,$b,$a)=/(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d)[^,]*,([^,]+),([^,]+)/;
cakesterAuthor Commented:

Ok mwochnick I will do that. Please look out for my new question.
I have another question relating to updating one text file with another, taking the latest information and appending it to the other file so look out for that one too :)

Thanks again for you help on this problem

C
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Perl

From novice to tech pro — start learning today.