Solved

How to increase the speed of a FOR loop binning function

Posted on 2014-11-30
5
195 Views
Last Modified: 2015-03-12
Hi All,

I have built a function that is too slow. I was wondering if anyone knew a way to speed up/ vectorize a for loop function by a factor of 5 or 10? Her name is "MakeVolumeBinIdx"

The purpose of my program is to bin my input data according to a fixed Volume size. So each row should have (almost) the same size volume .

My Input data is made of Date,Time, Price,Volume which is tick data (transaction per transaction) of a stock.
My output: Date,Time, Open,High,Low,Close,Volume (the volume should be almost equal for each candles or bins)

I have included the function's code, the program, the data set and the R files as well.
Also, I have a Hp laptop that runs on intel i7 and windows8.

Thank you in advance :)

The function I need major help with
MakeVolumeBinIdx<-function(data1.volume,volBinSize){
  ### PURPOSE: Find the indexes for a given size of volume bin, that is 
  ###            Find the indexes of the data.frame where the sum of the volume is equal
  ###             to the input volBinSize.
  ### INPUT: a vector of trades volume and the desired volume bin size
  ### OUTPUT: a vector indicating where each row belongs to which volume bin
  
  #create index
  volBin<-1
  sumVol<-0
  Volume<-data1.volume
  volBinIdx <- numeric(length(Volume))
  
  #create cutting for each volume bin
  for(i in seq_len(length(Volume))){
    sumVol<-sumVol + Volume[i]  
    if (sumVol<= volBinSize) {
      volBinIdx[i] <- volBin
    } else {
      volBinIdx[i] <-  volBin <- volBin + 1
      sumVol <- Volume[i]
    }
  }
  
  #clean environment
  rm(Volume,i,sumVol,volBinSize,volBin)
  
  return(volBinIdx)
}

Open in new window


My Program
##### put all functions neededin-memory
source("FT_functions_SO.R")

## read data in
data1<-read.table("XYZ_EE.txt",sep=",",stringsAsFactor=F,header=F,
                      colClasses=c("character","character","numeric","numeric"))

#Name Columns
colnames(data1)<-c("Date","Time","Price","Volume")

#Add columns for total amount traded
data1["TT"]<-data1[,"Price"]*data1[,"Volume"]


##### time volBinIdx5K
start.time.volBinIdx5k<-Sys.time()

volBinIdx5k<-MakeVolumeBinIdx(data1$Volume,5000)
## Purpose: find indexes where cumulative volume equal 5,000 shares
## Input: data1$Volume and the size of the volume bin
## Output: vector with indexes for each row signifying which row belong to which 
##          volume bin

##### time it took volBinIdx5k
end.time.volBinIdx5k<-Sys.time()
time.volBinIdx5k<-end.time.volBinIdx5k-start.time.volBinIdx5k
time.volBinIdx5k


##### time MakeBinCandles
start.time.MakeBindcandles.5k<-Sys.time()

data1.5k.dfm<-MakeBinCandles(data1,volBinIdx5k)
## Purpose: Create candles based on volume bins
## Input: data1
## Output: data.frame: Date,Time,OHLC,volume,HighIdx,LowIdx,MF,TT, VWAP

##### time it took MakeBinCandles for 5000 shares
end.time.MakeBinCandles.5k<-Sys.time()
time.MakeBinCandles.5k<-end.time.MakeBinCandles.5k-start.time.MakeBindcandles.5k
time.MakeBinCandles.5k

Open in new window


The functions:
### FT_functions_EE.R

MakeVolumeBinIdx<-function(data1.volume,volBinSize){
  ### PURPOSE: Find the indexes for a given size of volume bin, that is 
  ###            Find the indexes of the data.frame where the sum of the volume is equal
  ###             to the input volBinSize.
  ### INPUT: a vector of trades volume and the desired volume bin size
  ### OUTPUT: a vector indicating where each row belongs to which volume bin
  
  #create index
  volBin<-1
  sumVol<-0
  Volume<-data1.volume
  volBinIdx <- numeric(length(Volume))
  
  #create cutting for each volume bin
  for(i in seq_len(length(Volume))){
    sumVol<-sumVol + Volume[i]  
    if (sumVol<= volBinSize) {
      volBinIdx[i] <- volBin
    } else {
      volBinIdx[i] <-  volBin <- volBin + 1
      sumVol <- Volume[i]
    }
  }
  
  #clean environment
  rm(Volume,i,sumVol,volBinSize,volBin)
  
  return(volBinIdx)
}


MakeBinCandles<-function(data,volBinIdxk){
  ### PURPOSE: Create candles based on bins
  ### INPUT: a new data.frame containing only Date,Time, Price,Volume,TT, AND
  ###          a vector containing the output of MakeVolumeBinIdx
  ### OUTPUT: a data.frame  with Date,Time, OHLC, Volume,
  ###           HighIdx,lowIdx, TT,VWAP (=volume weighted average price)
  
  library(dplyr)
  
  data.return<-data %>%
    mutate(volBinIdxk=volBinIdxk) %>%
    group_by(volBinIdxk) %>%
    summarize(Date=head(Date,1),
              Time=head(Time,1),
              Open=head(Price,1),
              High=max(Price),
              Low=min(Price),
              Close=tail(Price,1),
              Volume=sum(Volume),
              # HighIdx=which.max(Price),
              # LowIdx=which.min(Price),
              TT=sum(TT,na.rm=T),
              VWAP=TT/Volume) %>%
    select(-volBinIdxk) %>%
    as.data.frame()
  
  return(data.return)
  
}


MakeBinCandlesXts<-function(data){
  ### PURPOSE: Turn bin candles from data.frame into xts object
  ### INPUT: data frame outputed by MakeBinCandles()
  ### OUTPUT: xts object
  library(xts)
  data$Date<-strptime(paste(data$Date,data$Time),"%m/%d/%Y %H:%M:%S")
  
  data<-data[,-2] # if I don't remove it, all columns become characters
  data.xts<-xts(data[,-1],order.by=as.POSIXct(data[,1]))
  
  return (data.xts)
  
}

Open in new window

XYZ-EE.txt
FT-functions-EE.txt
FT-program-EE.txt
0
Comment
Question by:pgmerLA
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
  • 2
5 Comments
 
LVL 62

Expert Comment

by:gheist
ID: 40495971
installed.packages() and sessionInfo() please
is it recent version of R (like 3.x?)
rstudio - is it NOT  running inside it? same with plain R? (again - is it the latest one i.e build 1091?)
Not everybody uses Windows (e.g me)
Any libraries loaded in your code not shown here?
0
 
LVL 14

Accepted Solution

by:
daveslater earned 500 total points
ID: 40496426
instad of using a for loop use a binay chop method;
start 1/2 down the loop and from there go either up/down depending on the result
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40496516
I've been looking at this question for several hours and I'm not sure I understand it.  Wouldn't the Volume just be the average of the volume items for that stock for that day?
= sum(volume)/count(stockID)

Using a stats program like R, I would try to express the output as simply as possible, using R functions.
0
 
LVL 62

Expert Comment

by:gheist
ID: 40496703
R is not deep into multiprocessing.
Depends on speed you want, you might also schedule tasks to parallel library (default on recent versions of R)
like detectCores number of them.
0
 
LVL 45

Expert Comment

by:aikimark
ID: 40496911
Stats based on your sample file:
TradingDate	Open	High	Low	Close	Avg
9/12/2014	79.84	79.95	79.17	79.35	3.95737617753338
9/14/2014	79.27	79.33	79.04	79.15	3.17172646427476
9/15/2014	79.15	79.48	79.1 	79.33	3.6664891173994
9/16/2014	79.33	79.58	79.13	79.48	3.43944172041261

Open in new window

0

Featured Post

[Webinar] Learn How Hackers Steal Your Credentials

Do You Know How Hackers Steal Your Credentials? Join us and Skyport Systems to learn how hackers steal your credentials and why Active Directory must be secure to stop them. Thursday, July 13, 2017 10:00 A.M. PDT

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

A Guide to the PMT, FV, IPMT and PPMT Functions In MS Excel we have the PMT, FV, IPMT and PPMT functions, which do a fantastic job for interest rate calculations.  But what if you don't have Excel ? This article is for programmers looking to re…
Have you ever thought of installing a power system that generates solar electricity to power your house? Some may say yes, while others may tell me no. But have you noticed that people around you are now considering installing such systems in their …
Although Jacob Bernoulli (1654-1705) has been credited as the creator of "Binomial Distribution Table", Gottfried Leibniz (1646-1716) did his dissertation on the subject in 1666; Leibniz you may recall is the co-inventor of "Calculus" and beat Isaac…
I've attached the XLSM Excel spreadsheet I used in the video and also text files containing the macros used below. https://filedb.experts-exchange.com/incoming/2017/03_w12/1151775/Permutations.txt https://filedb.experts-exchange.com/incoming/201…

729 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question