Link to home
Start Free TrialLog in
Avatar of cakester
cakester

asked on

Converting data (financial)

I have some market data in a CSV that I want to convert to the standard OHLC (Open High Low Close) format with a minimum timeframe of 1 minute, preferably 4 hour bars.
I have been searching around for a utility to do the job as after some extensive searching around there are a lot of other people trying to do the same thing as me.
I have tried utilities like MSConv, Meta2ASCII and Convert2MS but they don’t seem to be able to handle the HH:MM:SS timestamp. I think they expect a minimum of one minute data.

The output I need is like this:
=============================
Date,Time,Open,High,Low,Close
06/16/2010,1213,1.48001,1.48534,1.47252,1.47288
=============================
time in this example is 12.13pm

The same question has been asked in this thread and a coder has posted some code I am unfamiliar with.
See the Authors comments at the bottom..
https://www.experts-exchange.com/questions/22055783/Generate-OHLC-Bars-from-Tick-Data.html?sfQueryTermInfo=1+ohlc


I am looking for a solution to my problem either by some code or script that can be run locally on my PC or a recommended utility that I can buy that must do the job perfectly.
I am not a perl or C# programmer but have had some scripting experience and therefore can make logical modifications to a similar script to do the job if someone has some code like this.

Perhaps the code in the above thread can be adapted outside of a database, just to be run on my PC to process the data as a oneoff.
The tick data files can be as large as 2GB so keep this in mind........

Sample data file attached.


C
1184145144,D,GBP/USD,2010-06-13 17:01:24,1.457400,1.458100
1184145152,D,GBP/USD,2010-06-13 17:01:33,1.457300,1.457900
1184145177,D,GBP/USD,2010-06-13 17:01:40,1.457400,1.458100
1184145189,D,GBP/USD,2010-06-13 17:01:51,1.457300,1.458000
1184145204,D,GBP/USD,2010-06-13 17:01:56,1.457400,1.458100
1184145226,D,GBP/USD,2010-06-13 17:02:02,1.457200,1.457900
1184145281,D,GBP/USD,2010-06-13 17:02:26,1.457100,1.457800
1184145302,D,GBP/USD,2010-06-13 17:02:36,1.457200,1.457900
1184145333,D,GBP/USD,2010-06-13 17:02:43,1.457400,1.458100
1184145340,D,GBP/USD,2010-06-13 17:02:44,1.457200,1.457900
1184145379,D,GBP/USD,2010-06-13 17:02:53,1.457400,1.458100
1184145484,D,GBP/USD,2010-06-13 17:03:50,1.457300,1.458000
1184145490,D,GBP/USD,2010-06-13 17:03:56,1.457400,1.458100
1184145692,D,GBP/USD,2010-06-13 17:05:46,1.457200,1.457900
1184145720,D,GBP/USD,2010-06-13 17:06:03,1.457100,1.457800
1184145728,D,GBP/USD,2010-06-13 17:06:06,1.457200,1.457900
1184145735,D,GBP/USD,2010-06-13 17:06:17,1.457100,1.457800
1184145745,D,GBP/USD,2010-06-13 17:06:20,1.457200,1.457900
1184145750,D,GBP/USD,2010-06-13 17:06:23,1.457100,1.457800
1184145756,D,GBP/USD,2010-06-13 17:06:26,1.457200,1.457900
1184145762,D,GBP/USD,2010-06-13 17:06:32,1.457400,1.458100
1184145769,D,GBP/USD,2010-06-13 17:06:33,1.457200,1.457900
1184145787,D,GBP/USD,2010-06-13 17:06:37,1.457300,1.458000
1184145803,D,GBP/USD,2010-06-13 17:07:47,1.457200,1.457900
1184145861,D,GBP/USD,2010-06-13 17:09:36,1.457100,1.457800
1184145866,D,GBP/USD,2010-06-13 17:09:49,1.457200,1.457900
1184145871,D,GBP/USD,2010-06-13 17:09:49,1.457100,1.457800
1184146148,D,GBP/USD,2010-06-13 17:12:29,1.457200,1.457900
1184146155,D,GBP/USD,2010-06-13 17:12:29,1.457100,1.457800
1184146225,D,GBP/USD,2010-06-13 17:12:54,1.457200,1.457900
1184146251,D,GBP/USD,2010-06-13 17:13:01,1.457100,1.457800
1184146301,D,GBP/USD,2010-06-13 17:13:51,1.457000,1.457700
1184146319,D,GBP/USD,2010-06-13 17:13:51,1.457100,1.457800
1184146410,D,GBP/USD,2010-06-13 17:14:35,1.457200,1.457900

Open in new window

smgbpusd.zip
Avatar of ozo
ozo
Flag of United States of America image

what are the last two columns in the sample data?
#!/usr/bin/perl
use strict;
use warnings;
print "Date,Time,Open,High,Low,Close\n";
#06/16/2010,1213,1.48001,1.48534,1.47252,1.47288
my $prev;
my($open,$high,$low,$close);
while( <DATA> ){
    chomp;
    my @d=split/,/;
    my $datetime=sprintf"%s/%s/%s,%s%s",(split/\W/,$d[3])[1,2,0,3,4];
    if( $datetime ne $prev ){
         print "$prev,$open,$high,$low,$close\n" if $prev;
       $prev = $datetime;
         $open=$high=$low=$close=$d[4];
    }
    $high = $d[5] if $d[5] > $high;
    $low = $d[4] if $d[4] < $low;
    $close = $d[5];
}
print "$prev,$open,$high,$low,$close\n" if $prev;
__DATA__
1184145144,D,GBP/USD,2010-06-13 17:01:24,1.457400,1.458100
1184145152,D,GBP/USD,2010-06-13 17:01:33,1.457300,1.457900
1184145177,D,GBP/USD,2010-06-13 17:01:40,1.457400,1.458100
1184145189,D,GBP/USD,2010-06-13 17:01:51,1.457300,1.458000
1184145204,D,GBP/USD,2010-06-13 17:01:56,1.457400,1.458100
1184145226,D,GBP/USD,2010-06-13 17:02:02,1.457200,1.457900
1184145281,D,GBP/USD,2010-06-13 17:02:26,1.457100,1.457800
1184145302,D,GBP/USD,2010-06-13 17:02:36,1.457200,1.457900
1184145333,D,GBP/USD,2010-06-13 17:02:43,1.457400,1.458100
1184145340,D,GBP/USD,2010-06-13 17:02:44,1.457200,1.457900
1184145379,D,GBP/USD,2010-06-13 17:02:53,1.457400,1.458100
1184145484,D,GBP/USD,2010-06-13 17:03:50,1.457300,1.458000
1184145490,D,GBP/USD,2010-06-13 17:03:56,1.457400,1.458100
1184145692,D,GBP/USD,2010-06-13 17:05:46,1.457200,1.457900
1184145720,D,GBP/USD,2010-06-13 17:06:03,1.457100,1.457800
1184145728,D,GBP/USD,2010-06-13 17:06:06,1.457200,1.457900
1184145735,D,GBP/USD,2010-06-13 17:06:17,1.457100,1.457800
1184145745,D,GBP/USD,2010-06-13 17:06:20,1.457200,1.457900
1184145750,D,GBP/USD,2010-06-13 17:06:23,1.457100,1.457800
1184145756,D,GBP/USD,2010-06-13 17:06:26,1.457200,1.457900
1184145762,D,GBP/USD,2010-06-13 17:06:32,1.457400,1.458100
1184145769,D,GBP/USD,2010-06-13 17:06:33,1.457200,1.457900
1184145787,D,GBP/USD,2010-06-13 17:06:37,1.457300,1.458000
1184145803,D,GBP/USD,2010-06-13 17:07:47,1.457200,1.457900
1184145861,D,GBP/USD,2010-06-13 17:09:36,1.457100,1.457800
1184145866,D,GBP/USD,2010-06-13 17:09:49,1.457200,1.457900
1184145871,D,GBP/USD,2010-06-13 17:09:49,1.457100,1.457800
1184146148,D,GBP/USD,2010-06-13 17:12:29,1.457200,1.457900
1184146155,D,GBP/USD,2010-06-13 17:12:29,1.457100,1.457800
1184146225,D,GBP/USD,2010-06-13 17:12:54,1.457200,1.457900
1184146251,D,GBP/USD,2010-06-13 17:13:01,1.457100,1.457800
1184146301,D,GBP/USD,2010-06-13 17:13:51,1.457000,1.457700
1184146319,D,GBP/USD,2010-06-13 17:13:51,1.457100,1.457800
1184146410,D,GBP/USD,2010-06-13 17:14:35,1.457200,1.457900
Avatar of cakester
cakester

ASKER

Ozo:

The last two columns are the Bid and Ask prices. The Bid price (the lower) is what the market maker will pay you for your stock/currency etc and the Ask price (the higher) is what you pay him to buy the stock/currency etc.

The gap in the middle is called the spread, and thats where he makes his money.

I believe (though am not sure..) that the charting information is built up from taking the Mean value of the Bid &Ask price. Then you would compile all the Ticks in the Minute or 4 Hour period to make their respective OHLC bars.

Once processed I will check the bars against some reliable data to make sure I got it right, but I believe its the Mean value.

The first Column is just a serial number and can be ignored.

C
my $prev;
my($open,$high,$low,$close);
while( <DATA> ){
    my @d=split/,/;
    my $datetime=sprintf"%s/%s/%s,%s%s",(split/\W/,$d[3])[1,2,0,3,4];
    my $mean=($d[4]+$d[5])/2;
    if( $datetime ne $prev ){
         print "$prev,$open,$high,$low,$close\n" if $prev;
         $prev = $datetime;
         $open=$high=$low=$close=$mean;
    }
    $high = $mean if $mean > $high;
    $low = $mean if $mean < $low;
    $close = $mean;
}
print "$prev,$open,$high,$low,$close\n" if $prev;
#!/usr/bin/perl
use strict;
use warnings;
print "Date,Time,Open,High,Low,Close\n";
#06/16/2010,1213,1.48001,1.48534,1.47252,1.47288
my $prev;
my($open,$high,$low,$close);
while( <DATA> ){
    my @d=split/,/;
    my $datetime=sprintf"%s/%s/%s,%s%s",(split/\W/,$d[3])[1,2,0,3,4];
    my $mean=($d[4]+$d[5])/2;
    if( $datetime ne $prev ){
         print "$prev,$open,$high,$low,$close\n" if $prev;
         $prev = $datetime;
         $open=$high=$low=$close=$mean;
    }
    $high = $mean if $mean > $high;
    $low = $mean if $mean < $low;
    $close = $mean;
}
print "$prev,$open,$high,$low,$close\n" if $prev;

-----------------------------------------------------

The full script as above?
How do I specify the input and output file?
And what is the purpose of the "#06/16/2010,1213,1.48001,1.48534,1.47252,1.47288" ?

C
while( <DATA> ){ #reads from
__DATA__
1184145144,D,GBP/USD,2010-06-13 17:01:24,1.457400,1.458100
1184145152,D,GBP/USD,2010-06-13 17:01:33,1.457300,1.457900

while( <> ){ #reads from @ARGV

#06/16/2010,1213,1.48001,1.48534,1.47252,1.47288" was just a reference so I could look at the output format you wanted

I am running activePerl on windose.. how do I input say from rawdata.txt to the <DATA> variable, then back out to say GBPUSD_OHLC1min.txt once processed?

Is this the 1 minute bar script and/or where can I change the value to 240 miunutes if I want to output 4 hour bars?

Thanks

C
use
while( <> ){
and run the program with < rawdata.txt > GBPUSD_OHLC1min.txt
for 4 hour intervals, it may be easiest to change
    my($y,$m,$d,$h,$b,$a)=/(\d{4})-(\d\d)-(\d\d) (\d\d).*,(.*),(.*)/;
    my $datetime=sprintf"$m/$d/$y,%02d00",int($h/4)*4;
    my $mean=($b+$a)/2;
ozo:

you mean run it like this

# perl myscript.pl < rawdata.txt > GBPUSD_OHLC1min.txt

C
yes

Ozo,

I gave it a go and it come up with the following errors:

C:\a\m>perl myscript.pl < smgbpusd.csv > gbp1min.txt
Use of uninitialized value in sprintf at myscript.pl line 10, <> line 1.
Use of uninitialized value in sprintf at myscript.pl line 10, <> line 1.
Use of uninitialized value in sprintf at myscript.pl line 10, <> line 1.
Use of uninitialized value in sprintf at myscript.pl line 10, <> line 1.
Argument "Ask\n" isn't numeric in addition (+) at myscript.pl line 11, <> line 1
.
Argument "Bid" isn't numeric in addition (+) at myscript.pl line 11, <> line 1.
Use of uninitialized value $prev in string ne at myscript.pl line 12, <> line 1.
------------------------

I have attached the output file and the final script I used.

C

gbp1min.txt
myscript.pl.txt
Do you have a line in smgbpusd.csv that does not look like
1184145144,D,GBP/USD,2010-06-13 17:01:24,1.457400,1.458100


I removed the "ID,D,Currency,DateTime,Bid,Ask" line from the input file, this cleared up most of the errors but, I am still getting the following:

"Use of uninitialized value $prev in string ne at myscript.pl.txt line 12, <> line 1." ??

so is line 12 of th script the problem or line 1 of the input file?

line 1 looks just like line 2:

1184145144,D,GBP/USD,2010-06-13 17:01:24,1.457400,1.458100
1184145152,D,GBP/USD,2010-06-13 17:01:33,1.457300,1.457900
Ozo,

did you see my comment above

C
my $prev='';
Ozo,

Adding the "  to    my $prev=; generates lots of errors. Do you mean to encapsulate something between two "" 
?

C
#!/usr/bin/perl
use strict;
use warnings;
print "Date,Time,Open,High,Low,Close\n";

my $prev";
my($open,$high,$low,$close);
while( <> ){
    my @d=split/,/;
    my $datetime=sprintf"%s/%s/%s,%s%s",(split/\W/,$d[3])[1,2,0,3,4];
    my $mean=($d[4]+$d[5])/2;
    if( $datetime ne $prev ){
         print "$prev,$open,$high,$low,$close\n" if $prev;
         $prev = $datetime;
         $open=$high=$low=$close=$mean;
    }
    $high = $mean if $mean > $high;
    $low = $mean if $mean < $low;
    $close = $mean;
}
print "$prev,$open,$high,$low,$close\n" if $prev;

Open in new window

my $prev='';
#or
my $prev="";

my $prev="";

This does not work either:
--------------------
String found where operator expected at myscript.pl.txt line 5, near "$prev"""
        (Missing operator before ""?)
syntax error at myscript.pl.txt line 5, near "$prev"""
Execution of myscript.pl.txt aborted due to compilation errors.
----------------------------------
Ozo,

sorry my mistake. It works ok.
Thanks
I will test 4 hour now

Ozo,

Great, that seems to be ok too.
One last thing. If I want to use only the Bid or Ask price (instead of averaging the two)  to build the bars, then what do I have to modifty?

C
my $mean=$d[4];
#or
my $mean=$d[5];

178689660,GBP/USD,2010-06-13 17:02:44,1.457200,1.457900,"D"
178689660,D,GBP/USD,2010-06-13 17:02:44,1.457200,1.457900,D
178689660,D,GBP/USD,2010-06-13 17:02:44,1.457200,1.457900

Looking through a few years worth of the data in different files, I have found that the D field jumps around a bit.
As I am not using it, and just needs removing then I can clean the data myself.

How do I adjust the script parsing to assume the data will now be in the following format with the D removed:

178689660,GBP/USD,2010-06-13 17:02:44,1.457200,1.457900 ?

The 4 Hour:
my($y,$m,$d,$h,$b,$a)=/(\d{4})-(\d\d)-(\d\d) (\d\d).*,(.*),(.*)/;

The 1 minute:
my $datetime=sprintf"%s/%s/%s,%s%s",(split/\W/,$d[3])[1,2,0,3,4];


Thanks
C

P.S.
Sometimes the GBP/USD also appears like "GBP/USD" but in the same place, will this be a problem?

C
It looks like bid and ask always follows the date regardless of where D is
so
my($y,$m,$d,$h,$M,$b,$a)=/(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d)[^,]*,([^,]+),([^,]+)/;
should work  for all cases
and for one minute intervals
  my $datetime=sprintf"$m/$d/$y,%02d%02d",$h,$M;
  my $mean=($b+$a)/2;

adjusted and tested the 4 hour. seems to work well thanks.

the 1 minute comes up with the following errors:

Global symbol "$m" requires explicit package name at myscript.pl line 9.
Global symbol "$d" requires explicit package name at myscript.pl line 9.
Global symbol "$y" requires explicit package name at myscript.pl line 9.
Global symbol "$h" requires explicit package name at myscript.pl line 9.
Global symbol "$M" requires explicit package name at myscript.pl line 9.
---------------

out of interest, how does the my $mean=($b+$a)/2; reference the sprintf line, as $b and $a dont appear in it?

C



#!/usr/bin/perl
use strict;
use warnings;
print "Date,Time,Open,High,Low,Close\n";
my $prev="";
my($open,$high,$low,$close);
while( <> ){
    my @d=split/,/;
     my $datetime=sprintf"$m/$d/$y,%02d%02d",$h,$M;
     my $mean=($b+$a)/2;


    # my $datetime=sprintf"%s/%s/%s,%s%s",(split/\W/,$d[3])[1,2,0,3,4];
    # my $mean=($d[4]+$d[5])/2;
    # my $mean=$d[4];
    if( $datetime ne $prev ){
         print "$prev,$open,$high,$low,$close\n" if $prev;
         $prev = $datetime;
         $open=$high=$low=$close=$mean;
    }
    $high = $mean if $mean > $high;
    $low = $mean if $mean < $low;
    $close = $mean;
}
print "$prev,$open,$high,$low,$close\n" if $prev;

Open in new window

ozo, hi did you see the above?

C

OZO,

I adjusted and tested the 4 hour. seems to work well thanks.

the 1 minute comes up with the following errors:

Global symbol "$m" requires explicit package name at myscript.pl line 9.
Global symbol "$d" requires explicit package name at myscript.pl line 9.
Global symbol "$y" requires explicit package name at myscript.pl line 9.
Global symbol "$h" requires explicit package name at myscript.pl line 9.
Global symbol "$M" requires explicit package name at myscript.pl line 9.
---------------

out of interest, how does the my $mean=($b+$a)/2; reference the sprintf line, as $b and $a dont appear in it?

C


Ozo,

Thanks for your help so far, much appreciated. Can you please attend to my last comment then I can go ahead and grade the question and close.

When you made the adjustments after I explained the 'Jumping' "D" problem, the amended script you gave me for the 4 hour worked well:
my($y,$m,$d,$h,$M,$b,$a)=/(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d)[^,]*,([^,]+),([^,]+)/;

The 1 minute script came up with some orrors i can't fix:

  my $datetime=sprintf"$m/$d/$y,%02d%02d",$h,$M;
  my $mean=($b+$a)/2;

This is the output:


Global symbol "$m" requires explicit package name at myscript.pl line 9.
Global symbol "$d" requires explicit package name at myscript.pl line 9.
Global symbol "$y" requires explicit package name at myscript.pl line 9.
Global symbol "$h" requires explicit package name at myscript.pl line 9.
Global symbol "$M" requires explicit package name at myscript.pl line 9.

Attached is the 1 minute script as it stands.



#!/usr/bin/perl
use strict;
use warnings;
print "Date,Time,Open,High,Low,Close\n";
my $prev="";
my($open,$high,$low,$close);
while( <> ){
    my @d=split/,/;
     my $datetime=sprintf"$m/$d/$y,%02d%02d",$h,$M;
     my $mean=($b+$a)/2;


    # my $datetime=sprintf"%s/%s/%s,%s%s",(split/\W/,$d[3])[1,2,0,3,4];
    # my $mean=($d[4]+$d[5])/2;
    # my $mean=$d[4];
    if( $datetime ne $prev ){
         print "$prev,$open,$high,$low,$close\n" if $prev;
         $prev = $datetime;
         $open=$high=$low=$close=$mean;
    }
    $high = $mean if $mean > $high;
    $low = $mean if $mean < $low;
    $close = $mean;
}
print "$prev,$open,$high,$low,$close\n" if $prev;

Open in new window

ASKER CERTIFIED SOLUTION
Avatar of ozo
ozo
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
I think this will fix the problem...
#!/usr/bin/perl
use strict;
use warnings;
print "Date,Time,Open,High,Low,Close\n";
my $prev="";
my($open,$high,$low,$close);
while( <> ){
    my @d=split/,/;

    my ($y,$m,$d,$h,$M,$b,$a) = ($d[3] =~ /(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d)[^,]*,([^,]+),([^,]+)/);
    my $datetime=sprintf"$m/$d/$y,%02d%02d",$h,$M;
    my $mean=($b+$a)/2;
    if( $datetime ne $prev ){
         print "$prev,$open,$high,$low,$close\n" if $prev;
         $prev = $datetime;
         $open=$high=$low=$close=$mean;
    }
    $high = $mean if $mean > $high;
    $low = $mean if $mean < $low;
    $close = $mean;
}
print "$prev,$open,$high,$low,$close\n" if $prev;

Open in new window


Ozo,
That works great. Thanks a lot for your attention to detail and corrections throughout  this question.

wilcoxon, thanks for your last minute injection into this thread.


C
thanks again