Solved

Generate OHLC Bars from Tick Data

Posted on 2006-11-09
8
3,409 Views
Last Modified: 2008-01-09
I have a database full of tick data from stocks.  Tick data is basically a timestamped record of all transactions that take place on a certain stock.

Here's a sample from one of the tables:

+-------+------------+---------+------+
| ID    | Timestamp  | Price   | Size |
+-------+------------+---------+------+
| 40001 | 1157722663 |    1308 |  100 |
| 40002 | 1157722663 |    1308 |    2 |
| 40003 | 1157722665 |    1308 |   11 |
| 40004 | 1157722666 | 1307.75 |    1 |
| 40005 | 1157722666 |    1308 |    7 |
| 40006 | 1157722667 | 1308.25 |    1 |
| 40007 | 1157722667 | 1308.25 |    6 |
| 40008 | 1157722668 |    1308 |   10 |
| 40009 | 1157722668 |    1308 |    7 |
| 40010 | 1157722668 |    1308 |    7 |
+-------+------------+---------+------+

What I am trying to do is generate a condensed form of this data known as OHLC bars.  This should be familiar for anyone who has looked at stock charts, but I'll explain it for those who haven't.  If I were generating 1 minute bars, I would take the first price from that minute and call it Open, the last price would be the Close, and the High and Low are the max and min for that range.  There is also a fifth parameter, Volume, that is simply the sum of all of the sizes in that range.  I have written a query that partially works but could not optimize it:

SELECT
 FLOOR(a.Timestamp / 60) * 60 AS Timestamp,
 a.Price AS Open,
 MAX(a.Price) AS High,
 MIN(a.Price) AS Low,
 (SELECT b.Price FROM AAPL_ticks b WHERE FLOOR(b.Timestamp / 60) * 60 = FLOOR(a.Timestamp / 60) * 60 LIMIT 1) AS Close,
 SUM(a.Size) AS Volume FROM AAPL_ticks a GROUP BY FLOOR(a.Timestamp / 60)

This query groups it in 60 second increments.  When I run this query, I get the correct output, but it takes forever (somewhere in the range of 1000 seconds for 390 bars).  I have narrowed down the problem to the subquery that calculates the Close. When I take this statement out the query processes thousands of bars in a few seconds.  I've tried writing a stored procedure for this but failed at that as well.  I've been working on this for a while and have finally given up, so any assistance would be very helpful.  Thanks in advance!

- Paccc
0
Comment
Question by:Paccc
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 5
  • 3
8 Comments
 
LVL 35

Expert Comment

by:Raynard7
ID: 17912243
Hi,
I have built these kind of things before - but normally I would do them in a different way

I am not sure why you are flooring your values and dividing them by 60 then multiplying by 60.  I would just do a modular division by the start time and subtract the remainder from the value - this way you can set any increment.

Normally with these queries you would build them with a start and end time - ie select * from table where time between 123032475 and 29387502 for example.

As you have the start time you would do something like

select b.price from aapl_tics b where (b.timestamp - @startTime mod 60) = 0

for everything that is 1 minute from the start time.

The processor intensive bits here seem to be all the divisions and multiplications - that are repeated on both sides of the equation, and also in the top select

The more divisions that you can remove the easier it is.  Also division of large numbers is much slower than the division of small numbers (less processes) so if you can put a - in to make the numbers smaller you will find it should run much faster.
0
 
LVL 35

Expert Comment

by:Raynard7
ID: 17912255
by my comment

>>for everything that is 1 minute from the start time.

I meant everything that is at 1 minute intervals from the start time
0
 
LVL 35

Expert Comment

by:Raynard7
ID: 17912267
Additionally - since timestamp is a datatype in mysql you would be best not using it - it makes things hard to read and could potentially confuse the sql engine.
0
Get Database Help Now w/ Support & Database Audit

Keeping your database environment tuned, optimized and high-performance is key to achieving business goals. If your database goes down, so does your business. Percona experts have a long history of helping enterprises ensure their databases are running smoothly.

 

Author Comment

by:Paccc
ID: 17912359
I'm not sure I understand your suggestion fully.  Firstly, I like the idea of using modulus, I don't know why I didn't think of the originally, and changing the field name would be a good idea as well, thanks for that tip.  Anyway, I'm selecting all available data, not a range of date/times.  Second, I don't understand how I can get the close price from the SQL statement you have.  I tried making changes to my query and it still runs rediculously slow.  This is my new query below:

SELECT
 a.Timestamp - (a.Timestamp mod 60) AS Timestamp,
 a.Price AS Open,
 MAX(a.Price) AS High,
 MIN(a.Price) AS Low,
 (SELECT b.Price FROM AAPL_ticks b WHERE b.Timestamp - (a.Timestamp mod 60) = 0 LIMIT 1) AS Close,
 SUM(a.Size) AS Volume FROM AAPL_ticks a GROUP BY a.Timestamp - (a.Timestamp mod 60)

Thanks.

Paccc
0
 
LVL 35

Expert Comment

by:Raynard7
ID: 17912385
hi,

I guess I am a little confused as this is only half a query

but if a.timestamp is the start time then you want to subtract it from the later time then modulo divide by 60

ie

(b.timestamp - a.timestamp) mod 60 = 0

this is because each 60 seconds the difference between the values will be a multiple of 60 - and exact multiples of 60 mod 60 = 0 - so this way you know that there is one each 60 seconds.

If you are doing (a.timestamp mod 60) you would always get the same result - and also as you are consistantly dividing a very large number it probably will be time consuming
0
 

Author Comment

by:Paccc
ID: 17912428
The close price is the last transaction in that range, it may not be a multiple of 60, for instance, the trade that occurs at 51 seconds after the minute might be the last trade for that minute and would thus be the Close price.  I still do not see how your statement would work.  Thanks for the help.

Paccc
0
 
LVL 35

Accepted Solution

by:
Raynard7 earned 250 total points
ID: 17912465
Ok, sorry - I assumed that you were taking a dump of the last settlement at each second

Well - unfortunatley you are probably never going to get a query that has to check this for each minute for each row to run quickly on any database.

I would suggest that you would be best stinking with the stored procedue idea.  

Create a procedure that goes through the recordset of your table ordered by id and then the timestamp value.
You would need to pass this the start and end time and the interval.

firstly create a temporary table to store the values for the summary for any point of time.
then create a value that holds the time that you are currently up to (@presentTime)
create another variables that holds the information for the previous row's data - time, id and price
you would create a value to see what time you were up to
then run a query to return all the value between the start time and end time ordered by id then timestamp value
You would then set @presentTime with the begin time, and set the data values equal to the first row.
the process would then be to
loop through the records - each row storing the data for the row into set variables and storing the previous data from those into a different set of variables, each time asking, is the timestamp for the current row - @presentTime > interval
if it is then you insert into your temporary table the values from the previous row - add Interval to your @present time and loop
if it is not then you just loop

at the end of this process you would have a temporary table full of the most recent rows for each minute if applicable.

this way you only need to cycle through the table once - and the mathematics is limited - with a query using floor and max and min all the time would prove to be very inefficient.
0
 

Author Comment

by:Paccc
ID: 17913336
Thanks Raynard7 for all your help.  I finally got it working using a stored procedure.  It came out pretty ugly since I had to come up with a workaround in order to use dynamic SQL on the cursor.  I will post the code here in the hopes that it might benfit someone else:

CREATE DEFINER=`root`@`localhost` PROCEDURE `OHLC_BARS`(IN Symbol VARCHAR(20),IN Period INT)
BEGIN
  DROP TEMPORARY TABLE IF EXISTS `tmp_data`;
  CREATE TEMPORARY TABLE `tmp_data` (
    `Timestamp` INT UNSIGNED NOT NULL,
    `Price` DOUBLE NOT NULL,
    KEY `Timestamp` (`Timestamp`)
  ) ENGINE=MEMORY DEFAULT CHARSET=latin1;
  SET @query_str := CONCAT("INSERT INTO `tmp_data` (Timestamp,Price) SELECT `Timestamp` - (`Timestamp` MOD ",Period,") as Timestamp, Price FROM ",Symbol,
                    " ORDER BY Timestamp");
  PREPARE query_str FROM @query_str;
  EXECUTE query_str;
  DEALLOCATE PREPARE query_str;
BEGIN
  DECLARE done INT DEFAULT 0;
  DECLARE bar_ts,this_ts INT DEFAULT 0;
  DECLARE old_price,this_price DOUBLE DEFAULT 0;
  DECLARE cur CURSOR FOR SELECT * FROM `tmp_data`;
  DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET done=1;

  DROP TEMPORARY TABLE IF EXISTS `tmp_ohlc`;
  CREATE TEMPORARY TABLE `tmp_ohlc` (
    `Timestamp` INT UNSIGNED NOT NULL,
    `Open` DOUBLE NOT NULL,
    `High` DOUBLE NOT NULL,
    `Low` DOUBLE NOT NULL,
    `Close` DOUBLE,
    `Volume` INT(10) UNSIGNED NOT NULL,
    KEY `Timestamp` (`Timestamp`)
  ) ENGINE=MEMORY DEFAULT CHARSET=latin1;

  SET @query_str := CONCAT("INSERT INTO `tmp_ohlc` (Timestamp,Open,High,Low,Volume) SELECT a.Timestamp - (a.Timestamp mod ",
                    Period,") AS Timestamp, a.Price AS Open, MAX(a.Price) AS High, MIN(a.Price) as Low, SUM(a.Size) AS Volume FROM ",
                    Symbol," a GROUP BY a.Timestamp - (a.Timestamp mod ",
                    Period,") ORDER BY a.Timestamp");
  PREPARE query_str FROM @query_str;
  EXECUTE query_str;
  DEALLOCATE PREPARE query_str;

  OPEN cur;

calc:LOOP
    FETCH cur INTO this_ts,this_price;
    IF done THEN
      LEAVE calc;
    END IF;
    IF NOT old_price THEN SET old_price = this_price; END IF;
    IF this_ts > bar_ts THEN
      UPDATE `tmp_ohlc` SET Close=old_price WHERE `Timestamp` = bar_ts;
      SET bar_ts = this_ts;
    END IF;
    SET old_price = this_price;
  END LOOP;
  UPDATE `tmp_ohlc` SET Close=old_price WHERE `Timestamp` = bar_ts;

  CLOSE cur;

  SELECT * FROM `tmp_ohlc`;
  DROP TEMPORARY TABLE IF EXISTS `tmp_ohlc`;
END;
DROP TEMPORARY TABLE IF EXISTS `tmp_data`;
END
0

Featured Post

Free Tool: Path Explorer

An intuitive utility to help find the CSS path to UI elements on a webpage. These paths are used frequently in a variety of front-end development and QA automation tasks.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

I use MySQL for many of my development projects in a Windows environment. To manage my databases (and perform queries) for years I used a tool called MySQL administrator.  This tool has since been replaced by MySQL Workbench. So I decided to m…
Foreword This is an old article.  Instead of using the MySQL extension that was used in the original code examples, please choose one of the currently supported database extensions instead.  More information is available here: MySQLi / PDO (http://…
Nobody understands Phishing better than an anti-spam company. That’s why we are providing Phishing Awareness Training to our customers. According to a report by Verizon, only 3% of targeted users report malicious emails to management. With compan…

751 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question