How to increase the speed of a FOR loop binning function

Hi All,

I have built a function that is too slow. Does anyone know a way to speed up or vectorize a for-loop function by a factor of 5 or 10? The function is called "MakeVolumeBinIdx".

The purpose of my program is to bin my input data by a fixed volume size, so each output row should have (almost) the same volume.

My input data is tick data (transaction by transaction) for a stock: Date, Time, Price, Volume.
My output: Date, Time, Open, High, Low, Close, Volume (the volume should be almost equal for each candle or bin).

I have included the function's code, the program, the data set and the R files as well.
Also, I have an HP laptop with an Intel i7 running Windows 8.

Thank you in advance :)

The function I need major help with
MakeVolumeBinIdx<-function(data1.volume,volBinSize){
  ### PURPOSE: Find the indexes for a given size of volume bin, that is 
  ###            Find the indexes of the data.frame where the sum of the volume is equal
  ###             to the input volBinSize.
  ### INPUT: a vector of trades volume and the desired volume bin size
  ### OUTPUT: a vector indicating where each row belongs to which volume bin
  
  #create index
  volBin<-1
  sumVol<-0
  Volume<-data1.volume
  volBinIdx <- numeric(length(Volume))
  
  #create cutting for each volume bin
  for(i in seq_len(length(Volume))){
    sumVol<-sumVol + Volume[i]  
    if (sumVol<= volBinSize) {
      volBinIdx[i] <- volBin
    } else {
      volBinIdx[i] <-  volBin <- volBin + 1
      sumVol <- Volume[i]
    }
  }
  
  #clean environment
  rm(Volume,i,sumVol,volBinSize,volBin)
  
  return(volBinIdx)
}



My Program
##### load all the functions needed into memory
source("FT_functions_EE.R")

## read data in
data1<-read.table("XYZ_EE.txt",sep=",",stringsAsFactors=FALSE,header=FALSE,
                  colClasses=c("character","character","numeric","numeric"))

#Name Columns
colnames(data1)<-c("Date","Time","Price","Volume")

#Add columns for total amount traded
data1["TT"]<-data1[,"Price"]*data1[,"Volume"]


##### time volBinIdx5K
start.time.volBinIdx5k<-Sys.time()

volBinIdx5k<-MakeVolumeBinIdx(data1$Volume,5000)
## Purpose: find the indexes where the cumulative volume reaches 5,000 shares
## Input: data1$Volume and the size of the volume bin
## Output: vector with one index per row indicating which volume bin
##          that row belongs to

##### time it took volBinIdx5k
end.time.volBinIdx5k<-Sys.time()
time.volBinIdx5k<-end.time.volBinIdx5k-start.time.volBinIdx5k
time.volBinIdx5k


##### time MakeBinCandles
start.time.MakeBinCandles.5k<-Sys.time()

data1.5k.dfm<-MakeBinCandles(data1,volBinIdx5k)
## Purpose: Create candles based on volume bins
## Input: data1
## Output: data.frame: Date,Time,OHLC,volume,HighIdx,LowIdx,MF,TT, VWAP

##### time it took MakeBinCandles for 5000 shares
end.time.MakeBinCandles.5k<-Sys.time()
time.MakeBinCandles.5k<-end.time.MakeBinCandles.5k-start.time.MakeBinCandles.5k
time.MakeBinCandles.5k



The functions:
### FT_functions_EE.R

MakeVolumeBinIdx<-function(data1.volume,volBinSize){
  ### PURPOSE: Find the indexes for a given size of volume bin, that is 
  ###            Find the indexes of the data.frame where the sum of the volume is equal
  ###             to the input volBinSize.
  ### INPUT: a vector of trades volume and the desired volume bin size
  ### OUTPUT: a vector indicating where each row belongs to which volume bin
  
  #create index
  volBin<-1
  sumVol<-0
  Volume<-data1.volume
  volBinIdx <- numeric(length(Volume))
  
  #create cutting for each volume bin
  for(i in seq_len(length(Volume))){
    sumVol<-sumVol + Volume[i]  
    if (sumVol<= volBinSize) {
      volBinIdx[i] <- volBin
    } else {
      volBinIdx[i] <-  volBin <- volBin + 1
      sumVol <- Volume[i]
    }
  }
  
  #clean environment
  rm(Volume,i,sumVol,volBinSize,volBin)
  
  return(volBinIdx)
}


MakeBinCandles<-function(data,volBinIdxk){
  ### PURPOSE: Create candles based on bins
  ### INPUT: a new data.frame containing only Date,Time, Price,Volume,TT, AND
  ###          a vector containing the output of MakeVolumeBinIdx
  ### OUTPUT: a data.frame  with Date,Time, OHLC, Volume,
  ###           HighIdx,lowIdx, TT,VWAP (=volume weighted average price)
  
  library(dplyr)
  
  data.return<-data %>%
    mutate(volBinIdxk=volBinIdxk) %>%
    group_by(volBinIdxk) %>%
    summarize(Date=head(Date,1),
              Time=head(Time,1),
              Open=head(Price,1),
              High=max(Price),
              Low=min(Price),
              Close=tail(Price,1),
              Volume=sum(Volume),
              # HighIdx=which.max(Price),
              # LowIdx=which.min(Price),
              TT=sum(TT,na.rm=T),
              VWAP=TT/Volume) %>%
    select(-volBinIdxk) %>%
    as.data.frame()
  
  return(data.return)
  
}


MakeBinCandlesXts<-function(data){
  ### PURPOSE: Turn bin candles from data.frame into xts object
  ### INPUT: data frame output by MakeBinCandles()
  ### OUTPUT: xts object
  library(xts)
  data$Date<-strptime(paste(data$Date,data$Time),"%m/%d/%Y %H:%M:%S")
  
  data<-data[,-2] # if I don't remove it, all columns become characters
  data.xts<-xts(data[,-1],order.by=as.POSIXct(data[,1]))
  
  return (data.xts)
  
}


XYZ-EE.txt
FT-functions-EE.txt
FT-program-EE.txt
Asked by pgmerLA

gheist commented:
Please post the output of installed.packages() and sessionInfo().
Is it a recent version of R (like 3.x)?
Are you running it inside RStudio? Is it just as slow in plain R? (Again, is it the latest one, i.e. build 1091?)
Not everybody uses Windows (e.g. me).
Are any libraries loaded in your code that are not shown here?
daveslater commented:
Instead of using a for loop, use a binary chop method:
start halfway down the vector and from there go either up or down depending on the result.
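
A minimal sketch of how the binary chop idea might look in R. It is not the poster's code and has not been benchmarked on the attached data. The function name MakeVolumeBinIdxFast is made up for illustration. It assumes the volumes are non-negative with no NA values, so that the cumulative volume never decreases and findInterval() can binary-search it. The bin numbering is meant to match the original loop, including its quirk that a first trade larger than the bin size lands in bin 2.

```r
# Hypothetical replacement for MakeVolumeBinIdx(); assumes non-negative volumes, no NAs.
MakeVolumeBinIdxFast <- function(volume, volBinSize) {
  n <- length(volume)
  if (n == 0L) return(numeric(0))

  cumVol <- cumsum(volume)
  base <- c(0, cumVol[-n])   # volume traded before each row

  # For every row: the last row that a bin starting at this row would cover.
  # findInterval() binary-searches the sorted cumulative volume.
  lastRow <- pmax(seq_len(n), findInterval(base + volBinSize, cumVol))
  nextStart <- lastRow + 1L

  # Bin 1 covers the rows whose running total stays within volBinSize
  # (possibly none, exactly as in the original loop).
  s <- findInterval(volBinSize, cumVol) + 1L

  # Hop from bin start to bin start: one iteration per bin, not per row.
  isStart <- logical(n)
  while (s <= n) {
    isStart[s] <- TRUE
    s <- nextStart[s]
  }

  1 + cumsum(isStart)
}

volume <- c(100, 200, 300, 400, 500)
MakeVolumeBinIdxFast(volume, 500)
# returns 1 1 2 3 4
```

The remaining while loop runs once per bin rather than once per trade, so any gain should grow with the number of trades per bin. It is worth checking the result against the original function on real data before relying on it.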

aikimark commented:
I've been looking at this question for several hours and I'm not sure I understand it.  Wouldn't the Volume just be the average of the volume items for that stock for that day?
= sum(volume)/count(stockID)

Using a stats program like R, I would try to express the output as simply as possible, using R functions.
gheist commented:
R does not do much multiprocessing on its own.
Depending on the speed you want, you could also hand tasks to the parallel library (included by default in recent versions of R),
running as many of them as detectCores() reports.
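
A rough sketch of what using the parallel library could look like, under some assumptions. The binning loop depends on the running total from the previous row, so a single call cannot be split across cores; extra cores only help when there are several independent jobs, for example several bin sizes. BinLoop, BinOneSize and BinManySizes are made-up names; BinLoop is a condensed stand-in for the poster's MakeVolumeBinIdx.

```r
library(parallel)

# Hypothetical stand-in for MakeVolumeBinIdx(): the same loop, condensed.
BinLoop <- function(volume, volBinSize) {
  volBin <- 1
  sumVol <- 0
  idx <- numeric(length(volume))
  for (i in seq_along(volume)) {
    sumVol <- sumVol + volume[i]
    if (sumVol > volBinSize) {   # bin is full: start a new one with this row
      volBin <- volBin + 1
      sumVol <- volume[i]
    }
    idx[i] <- volBin
  }
  idx
}

# Worker: one bin size per job. The binning function is passed in
# explicitly so the worker processes do not need it exported.
BinOneSize <- function(size, vol, binFun) binFun(vol, size)

# Hypothetical helper: run one independent job per bin size.
BinManySizes <- function(volume, sizes, cores = 1L) {
  if (cores <= 1L) {
    return(lapply(sizes, BinOneSize, vol = volume, binFun = BinLoop))
  }
  cl <- makeCluster(cores)       # socket cluster, also usable on Windows
  on.exit(stopCluster(cl))
  parLapply(cl, sizes, BinOneSize, vol = volume, binFun = BinLoop)
}

volume <- c(100, 200, 300, 400, 500)
res <- BinManySizes(volume, sizes = c(500, 1000), cores = 1L)
res[[2]]
# returns 1 1 1 1 2
```

With cores = 1L this is a plain lapply(). Passing something like detectCores() - 1L would start that many worker processes, which only pays off when each job is large enough to outweigh the start-up and data-transfer cost.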
aikimark commented:
Stats based on your sample file:
TradingDate  Open   High   Low    Close  Avg
9/12/2014    79.84  79.95  79.17  79.35  3.95737617753338
9/14/2014    79.27  79.33  79.04  79.15  3.17172646427476
9/15/2014    79.15  79.48  79.1   79.33  3.6664891173994
9/16/2014    79.33  79.58  79.13  79.48  3.43944172041261

