?
Solved

How to increase the speed of a FOR loop binning function

Posted on 2014-11-30
5
Medium Priority
?
209 Views
Last Modified: 2015-03-12
Hi All,

I have built a function that is too slow. I was wondering if anyone knew a way to speed up/ vectorize a for loop function by a factor of 5 or 10? Her name is "MakeVolumeBinIdx"

The purpose of my program is to bin my input data according to a fixed Volume size. So each row should have (almost) the same size volume .

My Input data is made of Date,Time, Price,Volume which is tick data (transaction per transaction) of a stock.
My output: Date,Time, Open,High,Low,Close,Volume (the volume should be almost equal for each candles or bins)

I have included the function's code, the program, the data set and the R files as well.
Also, I have a Hp laptop that runs on intel i7 and windows8.

Thank you in advance :)

The function I need major help with
MakeVolumeBinIdx<-function(data1.volume,volBinSize){
  ### PURPOSE: Find the indexes for a given size of volume bin, that is 
  ###            Find the indexes of the data.frame where the sum of the volume is equal
  ###             to the input volBinSize.
  ### INPUT: a vector of trades volume and the desired volume bin size
  ### OUTPUT: a vector indicating where each row belongs to which volume bin
  
  #create index
  volBin<-1
  sumVol<-0
  Volume<-data1.volume
  volBinIdx <- numeric(length(Volume))
  
  #create cutting for each volume bin
  for(i in seq_len(length(Volume))){
    sumVol<-sumVol + Volume[i]  
    if (sumVol<= volBinSize) {
      volBinIdx[i] <- volBin
    } else {
      volBinIdx[i] <-  volBin <- volBin + 1
      sumVol <- Volume[i]
    }
  }
  
  #clean environment
  rm(Volume,i,sumVol,volBinSize,volBin)
  
  return(volBinIdx)
}

Open in new window


My Program
##### put all functions neededin-memory
source("FT_functions_SO.R")

## read data in
data1<-read.table("XYZ_EE.txt",sep=",",stringsAsFactor=F,header=F,
                      colClasses=c("character","character","numeric","numeric"))

#Name Columns
colnames(data1)<-c("Date","Time","Price","Volume")

#Add columns for total amount traded
data1["TT"]<-data1[,"Price"]*data1[,"Volume"]


##### time volBinIdx5K
start.time.volBinIdx5k<-Sys.time()

volBinIdx5k<-MakeVolumeBinIdx(data1$Volume,5000)
## Purpose: find indexes where cumulative volume equal 5,000 shares
## Input: data1$Volume and the size of the volume bin
## Output: vector with indexes for each row signifying which row belong to which 
##          volume bin

##### time it took volBinIdx5k
end.time.volBinIdx5k<-Sys.time()
time.volBinIdx5k<-end.time.volBinIdx5k-start.time.volBinIdx5k
time.volBinIdx5k


##### time MakeBinCandles
start.time.MakeBindcandles.5k<-Sys.time()

data1.5k.dfm<-MakeBinCandles(data1,volBinIdx5k)
## Purpose: Create candles based on volume bins
## Input: data1
## Output: data.frame: Date,Time,OHLC,volume,HighIdx,LowIdx,MF,TT, VWAP

##### time it took MakeBinCandles for 5000 shares
end.time.MakeBinCandles.5k<-Sys.time()
time.MakeBinCandles.5k<-end.time.MakeBinCandles.5k-start.time.MakeBindcandles.5k
time.MakeBinCandles.5k

Open in new window


The functions:
### FT_functions_EE.R

MakeVolumeBinIdx<-function(data1.volume,volBinSize){
  ### PURPOSE: Find the indexes for a given size of volume bin, that is 
  ###            Find the indexes of the data.frame where the sum of the volume is equal
  ###             to the input volBinSize.
  ### INPUT: a vector of trades volume and the desired volume bin size
  ### OUTPUT: a vector indicating where each row belongs to which volume bin
  
  #create index
  volBin<-1
  sumVol<-0
  Volume<-data1.volume
  volBinIdx <- numeric(length(Volume))
  
  #create cutting for each volume bin
  for(i in seq_len(length(Volume))){
    sumVol<-sumVol + Volume[i]  
    if (sumVol<= volBinSize) {
      volBinIdx[i] <- volBin
    } else {
      volBinIdx[i] <-  volBin <- volBin + 1
      sumVol <- Volume[i]
    }
  }
  
  #clean environment
  rm(Volume,i,sumVol,volBinSize,volBin)
  
  return(volBinIdx)
}


MakeBinCandles<-function(data,volBinIdxk){
  ### PURPOSE: Create candles based on bins
  ### INPUT: a new data.frame containing only Date,Time, Price,Volume,TT, AND
  ###          a vector containing the output of MakeVolumeBinIdx
  ### OUTPUT: a data.frame  with Date,Time, OHLC, Volume,
  ###           HighIdx,lowIdx, TT,VWAP (=volume weighted average price)
  
  library(dplyr)
  
  data.return<-data %>%
    mutate(volBinIdxk=volBinIdxk) %>%
    group_by(volBinIdxk) %>%
    summarize(Date=head(Date,1),
              Time=head(Time,1),
              Open=head(Price,1),
              High=max(Price),
              Low=min(Price),
              Close=tail(Price,1),
              Volume=sum(Volume),
              # HighIdx=which.max(Price),
              # LowIdx=which.min(Price),
              TT=sum(TT,na.rm=T),
              VWAP=TT/Volume) %>%
    select(-volBinIdxk) %>%
    as.data.frame()
  
  return(data.return)
  
}


MakeBinCandlesXts<-function(data){
  ### PURPOSE: Turn bin candles from data.frame into xts object
  ### INPUT: data frame outputed by MakeBinCandles()
  ### OUTPUT: xts object
  library(xts)
  data$Date<-strptime(paste(data$Date,data$Time),"%m/%d/%Y %H:%M:%S")
  
  data<-data[,-2] # if I don't remove it, all columns become characters
  data.xts<-xts(data[,-1],order.by=as.POSIXct(data[,1]))
  
  return (data.xts)
  
}

Open in new window

XYZ-EE.txt
FT-functions-EE.txt
FT-program-EE.txt
0
Comment
Question by:pgmerLA
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
  • 2
5 Comments
 
LVL 62

Expert Comment

by:gheist
ID: 40495971
installed.packages() and sessionInfo() please
is it recent version of R (like 3.x?)
rstudio - is it NOT  running inside it? same with plain R? (again - is it the latest one i.e build 1091?)
Not everybody uses Windows (e.g me)
Any libraries loaded in your code not shown here?
0
 
LVL 14

Accepted Solution

by:
daveslater earned 2000 total points
ID: 40496426
instad of using a for loop use a binay chop method;
start 1/2 down the loop and from there go either up/down depending on the result
0
 
LVL 46

Expert Comment

by:aikimark
ID: 40496516
I've been looking at this question for several hours and I'm not sure I understand it.  Wouldn't the Volume just be the average of the volume items for that stock for that day?
= sum(volume)/count(stockID)

Using a stats program like R, I would try to express the output as simply as possible, using R functions.
0
 
LVL 62

Expert Comment

by:gheist
ID: 40496703
R is not deep into multiprocessing.
Depends on speed you want, you might also schedule tasks to parallel library (default on recent versions of R)
like detectCores number of them.
0
 
LVL 46

Expert Comment

by:aikimark
ID: 40496911
Stats based on your sample file:
TradingDate	Open	High	Low	Close	Avg
9/12/2014	79.84	79.95	79.17	79.35	3.95737617753338
9/14/2014	79.27	79.33	79.04	79.15	3.17172646427476
9/15/2014	79.15	79.48	79.1 	79.33	3.6664891173994
9/16/2014	79.33	79.58	79.13	79.48	3.43944172041261

Open in new window

0

Featured Post

Free Tool: Site Down Detector

Helpful to verify reports of your own downtime, or to double check a downed website you are trying to access.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This article seeks to propel the full implementation of geothermal power plants in Mexico as a renewable energy source.
Lithium-ion batteries area cornerstone of today's portable electronic devices, and even though they are relied upon heavily, their chemistry and origin are not of common knowledge. This article is about a device on which every smartphone, laptop, an…
This is a video describing the growing solar energy use in Utah. This is a topic that greatly interests me and so I decided to produce a video about it.
Although Jacob Bernoulli (1654-1705) has been credited as the creator of "Binomial Distribution Table", Gottfried Leibniz (1646-1716) did his dissertation on the subject in 1666; Leibniz you may recall is the co-inventor of "Calculus" and beat Isaac…
Suggested Courses

770 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question