Converting data (financial)

cakester
cakester used Ask the Experts™
on
I have some market data in a CSV that I want to convert to the standard OHLC (Open High Low Close) format with a minimum timeframe of 1 minute, preferably 4 hour bars.
I have been searching around for a utility to do the job as after some extensive searching around there are a lot of other people trying to do the same thing as me.
I have tried utilities like MSConv, Meta2ASCII and Convert2MS but they don’t seem to be able to handle the HH:MM:SS timestamp. I think they expect a minimum of one minute data.

The output I need is like this:
=============================
Date,Time,Open,High,Low,Close
06/16/2010,1213,1.48001,1.48534,1.47252,1.47288
=============================
time in this example is 12.13pm

The same question has been asked in this thread and a coder has posted some code I am unfamiliar with.
See the Authors comments at the bottom..
http://www.experts-exchange.com/Database/MySQL/Q_22055783.html?sfQueryTermInfo=1+ohlc


I am looking for a solution to my problem either by some code or script that can be run locally on my PC or a recommended utility that I can buy that must do the job perfectly.
I am not a perl or C# programmer but have had some scripting experience and therefore can make logical modifications to a similar script to do the job if someone has some code like this.

Perhaps the code in the above thread can be adapted outside of a database, just to be run on my PC to process the data as a oneoff.
The tick data files can be as large as 2GB so keep this in mind........

Sample data file attached.


C
1184145144,D,GBP/USD,2010-06-13 17:01:24,1.457400,1.458100
1184145152,D,GBP/USD,2010-06-13 17:01:33,1.457300,1.457900
1184145177,D,GBP/USD,2010-06-13 17:01:40,1.457400,1.458100
1184145189,D,GBP/USD,2010-06-13 17:01:51,1.457300,1.458000
1184145204,D,GBP/USD,2010-06-13 17:01:56,1.457400,1.458100
1184145226,D,GBP/USD,2010-06-13 17:02:02,1.457200,1.457900
1184145281,D,GBP/USD,2010-06-13 17:02:26,1.457100,1.457800
1184145302,D,GBP/USD,2010-06-13 17:02:36,1.457200,1.457900
1184145333,D,GBP/USD,2010-06-13 17:02:43,1.457400,1.458100
1184145340,D,GBP/USD,2010-06-13 17:02:44,1.457200,1.457900
1184145379,D,GBP/USD,2010-06-13 17:02:53,1.457400,1.458100
1184145484,D,GBP/USD,2010-06-13 17:03:50,1.457300,1.458000
1184145490,D,GBP/USD,2010-06-13 17:03:56,1.457400,1.458100
1184145692,D,GBP/USD,2010-06-13 17:05:46,1.457200,1.457900
1184145720,D,GBP/USD,2010-06-13 17:06:03,1.457100,1.457800
1184145728,D,GBP/USD,2010-06-13 17:06:06,1.457200,1.457900
1184145735,D,GBP/USD,2010-06-13 17:06:17,1.457100,1.457800
1184145745,D,GBP/USD,2010-06-13 17:06:20,1.457200,1.457900
1184145750,D,GBP/USD,2010-06-13 17:06:23,1.457100,1.457800
1184145756,D,GBP/USD,2010-06-13 17:06:26,1.457200,1.457900
1184145762,D,GBP/USD,2010-06-13 17:06:32,1.457400,1.458100
1184145769,D,GBP/USD,2010-06-13 17:06:33,1.457200,1.457900
1184145787,D,GBP/USD,2010-06-13 17:06:37,1.457300,1.458000
1184145803,D,GBP/USD,2010-06-13 17:07:47,1.457200,1.457900
1184145861,D,GBP/USD,2010-06-13 17:09:36,1.457100,1.457800
1184145866,D,GBP/USD,2010-06-13 17:09:49,1.457200,1.457900
1184145871,D,GBP/USD,2010-06-13 17:09:49,1.457100,1.457800
1184146148,D,GBP/USD,2010-06-13 17:12:29,1.457200,1.457900
1184146155,D,GBP/USD,2010-06-13 17:12:29,1.457100,1.457800
1184146225,D,GBP/USD,2010-06-13 17:12:54,1.457200,1.457900
1184146251,D,GBP/USD,2010-06-13 17:13:01,1.457100,1.457800
1184146301,D,GBP/USD,2010-06-13 17:13:51,1.457000,1.457700
1184146319,D,GBP/USD,2010-06-13 17:13:51,1.457100,1.457800
1184146410,D,GBP/USD,2010-06-13 17:14:35,1.457200,1.457900

Open in new window

smgbpusd.zip
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
ozo
Most Valuable Expert 2014
Top Expert 2015

Commented:
what are the last two columns in the sample data?
ozo
Most Valuable Expert 2014
Top Expert 2015

Commented:
#!/usr/bin/perl
use strict;
use warnings;
print "Date,Time,Open,High,Low,Close\n";
#06/16/2010,1213,1.48001,1.48534,1.47252,1.47288
my $prev;
my($open,$high,$low,$close);
while( <DATA> ){
    chomp;
    my @d=split/,/;
    my $datetime=sprintf"%s/%s/%s,%s%s",(split/\W/,$d[3])[1,2,0,3,4];
    if( $datetime ne $prev ){
         print "$prev,$open,$high,$low,$close\n" if $prev;
       $prev = $datetime;
         $open=$high=$low=$close=$d[4];
    }
    $high = $d[5] if $d[5] > $high;
    $low = $d[4] if $d[4] < $low;
    $close = $d[5];
}
print "$prev,$open,$high,$low,$close\n" if $prev;
__DATA__
1184145144,D,GBP/USD,2010-06-13 17:01:24,1.457400,1.458100
1184145152,D,GBP/USD,2010-06-13 17:01:33,1.457300,1.457900
1184145177,D,GBP/USD,2010-06-13 17:01:40,1.457400,1.458100
1184145189,D,GBP/USD,2010-06-13 17:01:51,1.457300,1.458000
1184145204,D,GBP/USD,2010-06-13 17:01:56,1.457400,1.458100
1184145226,D,GBP/USD,2010-06-13 17:02:02,1.457200,1.457900
1184145281,D,GBP/USD,2010-06-13 17:02:26,1.457100,1.457800
1184145302,D,GBP/USD,2010-06-13 17:02:36,1.457200,1.457900
1184145333,D,GBP/USD,2010-06-13 17:02:43,1.457400,1.458100
1184145340,D,GBP/USD,2010-06-13 17:02:44,1.457200,1.457900
1184145379,D,GBP/USD,2010-06-13 17:02:53,1.457400,1.458100
1184145484,D,GBP/USD,2010-06-13 17:03:50,1.457300,1.458000
1184145490,D,GBP/USD,2010-06-13 17:03:56,1.457400,1.458100
1184145692,D,GBP/USD,2010-06-13 17:05:46,1.457200,1.457900
1184145720,D,GBP/USD,2010-06-13 17:06:03,1.457100,1.457800
1184145728,D,GBP/USD,2010-06-13 17:06:06,1.457200,1.457900
1184145735,D,GBP/USD,2010-06-13 17:06:17,1.457100,1.457800
1184145745,D,GBP/USD,2010-06-13 17:06:20,1.457200,1.457900
1184145750,D,GBP/USD,2010-06-13 17:06:23,1.457100,1.457800
1184145756,D,GBP/USD,2010-06-13 17:06:26,1.457200,1.457900
1184145762,D,GBP/USD,2010-06-13 17:06:32,1.457400,1.458100
1184145769,D,GBP/USD,2010-06-13 17:06:33,1.457200,1.457900
1184145787,D,GBP/USD,2010-06-13 17:06:37,1.457300,1.458000
1184145803,D,GBP/USD,2010-06-13 17:07:47,1.457200,1.457900
1184145861,D,GBP/USD,2010-06-13 17:09:36,1.457100,1.457800
1184145866,D,GBP/USD,2010-06-13 17:09:49,1.457200,1.457900
1184145871,D,GBP/USD,2010-06-13 17:09:49,1.457100,1.457800
1184146148,D,GBP/USD,2010-06-13 17:12:29,1.457200,1.457900
1184146155,D,GBP/USD,2010-06-13 17:12:29,1.457100,1.457800
1184146225,D,GBP/USD,2010-06-13 17:12:54,1.457200,1.457900
1184146251,D,GBP/USD,2010-06-13 17:13:01,1.457100,1.457800
1184146301,D,GBP/USD,2010-06-13 17:13:51,1.457000,1.457700
1184146319,D,GBP/USD,2010-06-13 17:13:51,1.457100,1.457800
1184146410,D,GBP/USD,2010-06-13 17:14:35,1.457200,1.457900

Author

Commented:
Ozo:

The last two columns are the Bid and Ask prices. The Bid price (the lower) is what the market maker will pay you for your stock/currency etc and the Ask price (the higher) is what you pay him to buy the stock/currency etc.

The gap in the middle is called the spread, and thats where he makes his money.

I believe (though am not sure..) that the charting information is built up from taking the Mean value of the Bid &Ask price. Then you would compile all the Ticks in the Minute or 4 Hour period to make their respective OHLC bars.

Once processed I will check the bars against some reliable data to make sure I got it right, but I believe its the Mean value.

The first Column is just a serial number and can be ignored.

C
OWASP: Avoiding Hacker Tricks

Learn to build secure applications from the mindset of the hacker and avoid being exploited.

ozo
Most Valuable Expert 2014
Top Expert 2015

Commented:
my $prev;
my($open,$high,$low,$close);
while( <DATA> ){
    my @d=split/,/;
    my $datetime=sprintf"%s/%s/%s,%s%s",(split/\W/,$d[3])[1,2,0,3,4];
    my $mean=($d[4]+$d[5])/2;
    if( $datetime ne $prev ){
         print "$prev,$open,$high,$low,$close\n" if $prev;
         $prev = $datetime;
         $open=$high=$low=$close=$mean;
    }
    $high = $mean if $mean > $high;
    $low = $mean if $mean < $low;
    $close = $mean;
}
print "$prev,$open,$high,$low,$close\n" if $prev;

Author

Commented:
#!/usr/bin/perl
use strict;
use warnings;
print "Date,Time,Open,High,Low,Close\n";
#06/16/2010,1213,1.48001,1.48534,1.47252,1.47288
my $prev;
my($open,$high,$low,$close);
while( <DATA> ){
    my @d=split/,/;
    my $datetime=sprintf"%s/%s/%s,%s%s",(split/\W/,$d[3])[1,2,0,3,4];
    my $mean=($d[4]+$d[5])/2;
    if( $datetime ne $prev ){
         print "$prev,$open,$high,$low,$close\n" if $prev;
         $prev = $datetime;
         $open=$high=$low=$close=$mean;
    }
    $high = $mean if $mean > $high;
    $low = $mean if $mean < $low;
    $close = $mean;
}
print "$prev,$open,$high,$low,$close\n" if $prev;

-----------------------------------------------------

The full script as above?
How do I specify the input and output file?
And what is the purpose of the "#06/16/2010,1213,1.48001,1.48534,1.47252,1.47288" ?

C
ozo
Most Valuable Expert 2014
Top Expert 2015

Commented:
while( <DATA> ){ #reads from
__DATA__
1184145144,D,GBP/USD,2010-06-13 17:01:24,1.457400,1.458100
1184145152,D,GBP/USD,2010-06-13 17:01:33,1.457300,1.457900

while( <> ){ #reads from @ARGV

#06/16/2010,1213,1.48001,1.48534,1.47252,1.47288" was just a reference so I could look at the output format you wanted

Author

Commented:

I am running activePerl on windose.. how do I input say from rawdata.txt to the <DATA> variable, then back out to say GBPUSD_OHLC1min.txt once processed?

Is this the 1 minute bar script and/or where can I change the value to 240 miunutes if I want to output 4 hour bars?

Thanks

C
ozo
Most Valuable Expert 2014
Top Expert 2015

Commented:
use
while( <> ){
and run the program with < rawdata.txt > GBPUSD_OHLC1min.txt
ozo
Most Valuable Expert 2014
Top Expert 2015

Commented:
for 4 hour intervals, it may be easiest to change
    my($y,$m,$d,$h,$b,$a)=/(\d{4})-(\d\d)-(\d\d) (\d\d).*,(.*),(.*)/;
    my $datetime=sprintf"$m/$d/$y,%02d00",int($h/4)*4;
    my $mean=($b+$a)/2;

Author

Commented:
ozo:

you mean run it like this

# perl myscript.pl < rawdata.txt > GBPUSD_OHLC1min.txt

C
ozo
Most Valuable Expert 2014
Top Expert 2015

Commented:
yes

Author

Commented:

Ozo,

I gave it a go and it come up with the following errors:

C:\a\m>perl myscript.pl < smgbpusd.csv > gbp1min.txt
Use of uninitialized value in sprintf at myscript.pl line 10, <> line 1.
Use of uninitialized value in sprintf at myscript.pl line 10, <> line 1.
Use of uninitialized value in sprintf at myscript.pl line 10, <> line 1.
Use of uninitialized value in sprintf at myscript.pl line 10, <> line 1.
Argument "Ask\n" isn't numeric in addition (+) at myscript.pl line 11, <> line 1
.
Argument "Bid" isn't numeric in addition (+) at myscript.pl line 11, <> line 1.
Use of uninitialized value $prev in string ne at myscript.pl line 12, <> line 1.
------------------------

I have attached the output file and the final script I used.

C

gbp1min.txt
myscript.pl.txt
ozo
Most Valuable Expert 2014
Top Expert 2015

Commented:
Do you have a line in smgbpusd.csv that does not look like
1184145144,D,GBP/USD,2010-06-13 17:01:24,1.457400,1.458100

Author

Commented:


I removed the "ID,D,Currency,DateTime,Bid,Ask" line from the input file, this cleared up most of the errors but, I am still getting the following:

"Use of uninitialized value $prev in string ne at myscript.pl.txt line 12, <> line 1." ??

so is line 12 of th script the problem or line 1 of the input file?

line 1 looks just like line 2:

1184145144,D,GBP/USD,2010-06-13 17:01:24,1.457400,1.458100
1184145152,D,GBP/USD,2010-06-13 17:01:33,1.457300,1.457900

Author

Commented:
Ozo,

did you see my comment above

C
ozo
Most Valuable Expert 2014
Top Expert 2015

Commented:
my $prev='';

Author

Commented:
Ozo,

Adding the "  to    my $prev=; generates lots of errors. Do you mean to encapsulate something between two "" 
?

C
#!/usr/bin/perl
use strict;
use warnings;
print "Date,Time,Open,High,Low,Close\n";

my $prev";
my($open,$high,$low,$close);
while( <> ){
    my @d=split/,/;
    my $datetime=sprintf"%s/%s/%s,%s%s",(split/\W/,$d[3])[1,2,0,3,4];
    my $mean=($d[4]+$d[5])/2;
    if( $datetime ne $prev ){
         print "$prev,$open,$high,$low,$close\n" if $prev;
         $prev = $datetime;
         $open=$high=$low=$close=$mean;
    }
    $high = $mean if $mean > $high;
    $low = $mean if $mean < $low;
    $close = $mean;
}
print "$prev,$open,$high,$low,$close\n" if $prev;

Open in new window

ozo
Most Valuable Expert 2014
Top Expert 2015

Commented:
my $prev='';
#or
my $prev="";

Author

Commented:

my $prev="";

This does not work either:
--------------------
String found where operator expected at myscript.pl.txt line 5, near "$prev"""
        (Missing operator before ""?)
syntax error at myscript.pl.txt line 5, near "$prev"""
Execution of myscript.pl.txt aborted due to compilation errors.
----------------------------------

Author

Commented:
Ozo,

sorry my mistake. It works ok.
Thanks
I will test 4 hour now

Author

Commented:

Ozo,

Great, that seems to be ok too.
One last thing. If I want to use only the Bid or Ask price (instead of averaging the two)  to build the bars, then what do I have to modifty?

C
ozo
Most Valuable Expert 2014
Top Expert 2015

Commented:
my $mean=$d[4];
#or
my $mean=$d[5];

Author

Commented:

178689660,GBP/USD,2010-06-13 17:02:44,1.457200,1.457900,"D"
178689660,D,GBP/USD,2010-06-13 17:02:44,1.457200,1.457900,D
178689660,D,GBP/USD,2010-06-13 17:02:44,1.457200,1.457900

Looking through a few years worth of the data in different files, I have found that the D field jumps around a bit.
As I am not using it, and just needs removing then I can clean the data myself.

How do I adjust the script parsing to assume the data will now be in the following format with the D removed:

178689660,GBP/USD,2010-06-13 17:02:44,1.457200,1.457900 ?

The 4 Hour:
my($y,$m,$d,$h,$b,$a)=/(\d{4})-(\d\d)-(\d\d) (\d\d).*,(.*),(.*)/;

The 1 minute:
my $datetime=sprintf"%s/%s/%s,%s%s",(split/\W/,$d[3])[1,2,0,3,4];


Thanks
C

Author

Commented:
P.S.
Sometimes the GBP/USD also appears like "GBP/USD" but in the same place, will this be a problem?

C
ozo
Most Valuable Expert 2014
Top Expert 2015

Commented:
It looks like bid and ask always follows the date regardless of where D is
so
my($y,$m,$d,$h,$M,$b,$a)=/(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d)[^,]*,([^,]+),([^,]+)/;
should work  for all cases
and for one minute intervals
  my $datetime=sprintf"$m/$d/$y,%02d%02d",$h,$M;
  my $mean=($b+$a)/2;

Author

Commented:

adjusted and tested the 4 hour. seems to work well thanks.

the 1 minute comes up with the following errors:

Global symbol "$m" requires explicit package name at myscript.pl line 9.
Global symbol "$d" requires explicit package name at myscript.pl line 9.
Global symbol "$y" requires explicit package name at myscript.pl line 9.
Global symbol "$h" requires explicit package name at myscript.pl line 9.
Global symbol "$M" requires explicit package name at myscript.pl line 9.
---------------

out of interest, how does the my $mean=($b+$a)/2; reference the sprintf line, as $b and $a dont appear in it?

C



#!/usr/bin/perl
use strict;
use warnings;
print "Date,Time,Open,High,Low,Close\n";
my $prev="";
my($open,$high,$low,$close);
while( <> ){
    my @d=split/,/;
     my $datetime=sprintf"$m/$d/$y,%02d%02d",$h,$M;
     my $mean=($b+$a)/2;


    # my $datetime=sprintf"%s/%s/%s,%s%s",(split/\W/,$d[3])[1,2,0,3,4];
    # my $mean=($d[4]+$d[5])/2;
    # my $mean=$d[4];
    if( $datetime ne $prev ){
         print "$prev,$open,$high,$low,$close\n" if $prev;
         $prev = $datetime;
         $open=$high=$low=$close=$mean;
    }
    $high = $mean if $mean > $high;
    $low = $mean if $mean < $low;
    $close = $mean;
}
print "$prev,$open,$high,$low,$close\n" if $prev;

Open in new window

Author

Commented:
ozo, hi did you see the above?

C

Author

Commented:

OZO,

I adjusted and tested the 4 hour. seems to work well thanks.

the 1 minute comes up with the following errors:

Global symbol "$m" requires explicit package name at myscript.pl line 9.
Global symbol "$d" requires explicit package name at myscript.pl line 9.
Global symbol "$y" requires explicit package name at myscript.pl line 9.
Global symbol "$h" requires explicit package name at myscript.pl line 9.
Global symbol "$M" requires explicit package name at myscript.pl line 9.
---------------

out of interest, how does the my $mean=($b+$a)/2; reference the sprintf line, as $b and $a dont appear in it?

C

Author

Commented:

Ozo,

Thanks for your help so far, much appreciated. Can you please attend to my last comment then I can go ahead and grade the question and close.

When you made the adjustments after I explained the 'Jumping' "D" problem, the amended script you gave me for the 4 hour worked well:
my($y,$m,$d,$h,$M,$b,$a)=/(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d)[^,]*,([^,]+),([^,]+)/;

The 1 minute script came up with some orrors i can't fix:

  my $datetime=sprintf"$m/$d/$y,%02d%02d",$h,$M;
  my $mean=($b+$a)/2;

This is the output:


Global symbol "$m" requires explicit package name at myscript.pl line 9.
Global symbol "$d" requires explicit package name at myscript.pl line 9.
Global symbol "$y" requires explicit package name at myscript.pl line 9.
Global symbol "$h" requires explicit package name at myscript.pl line 9.
Global symbol "$M" requires explicit package name at myscript.pl line 9.

Attached is the 1 minute script as it stands.



#!/usr/bin/perl
use strict;
use warnings;
print "Date,Time,Open,High,Low,Close\n";
my $prev="";
my($open,$high,$low,$close);
while( <> ){
    my @d=split/,/;
     my $datetime=sprintf"$m/$d/$y,%02d%02d",$h,$M;
     my $mean=($b+$a)/2;


    # my $datetime=sprintf"%s/%s/%s,%s%s",(split/\W/,$d[3])[1,2,0,3,4];
    # my $mean=($d[4]+$d[5])/2;
    # my $mean=$d[4];
    if( $datetime ne $prev ){
         print "$prev,$open,$high,$low,$close\n" if $prev;
         $prev = $datetime;
         $open=$high=$low=$close=$mean;
    }
    $high = $mean if $mean > $high;
    $low = $mean if $mean < $low;
    $close = $mean;
}
print "$prev,$open,$high,$low,$close\n" if $prev;

Open in new window

Most Valuable Expert 2014
Top Expert 2015
Commented:
    my($y,$m,$d,$h,$M,$b,$a)=/(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d)[^,]*,([^,]+),([^,]+)/;
     my $datetime=sprintf"$m/$d/$y,%02d%02d",$h,$M;
     my $mean=($b+$a)/2;
I think this will fix the problem...
#!/usr/bin/perl
use strict;
use warnings;
print "Date,Time,Open,High,Low,Close\n";
my $prev="";
my($open,$high,$low,$close);
while( <> ){
    my @d=split/,/;

    my ($y,$m,$d,$h,$M,$b,$a) = ($d[3] =~ /(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d)[^,]*,([^,]+),([^,]+)/);
    my $datetime=sprintf"$m/$d/$y,%02d%02d",$h,$M;
    my $mean=($b+$a)/2;
    if( $datetime ne $prev ){
         print "$prev,$open,$high,$low,$close\n" if $prev;
         $prev = $datetime;
         $open=$high=$low=$close=$mean;
    }
    $high = $mean if $mean > $high;
    $low = $mean if $mean < $low;
    $close = $mean;
}
print "$prev,$open,$high,$low,$close\n" if $prev;

Open in new window

Author

Commented:

Ozo,
That works great. Thanks a lot for your attention to detail and corrections throughout  this question.

wilcoxon, thanks for your last minute injection into this thread.


C

Author

Commented:
thanks again

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial