# How to increase the speed of a FOR loop binning function

Hi All,

I have built a function that is too slow. I was wondering if anyone knew a way to speed up/ vectorize a for loop function by a factor of 5 or 10? Her name is "MakeVolumeBinIdx"

The purpose of my program is to bin my input data according to a fixed Volume size. So each row should have (almost) the same size volume .

My Input data is made of Date,Time, Price,Volume which is tick data (transaction per transaction) of a stock.
My output: Date,Time, Open,High,Low,Close,Volume (the volume should be almost equal for each candles or bins)

I have included the function's code, the program, the data set and the R files as well.
Also, I have a Hp laptop that runs on intel i7 and windows8.

The function I need major help with
``````MakeVolumeBinIdx<-function(data1.volume,volBinSize){
### PURPOSE: Find the indexes for a given size of volume bin, that is
###            Find the indexes of the data.frame where the sum of the volume is equal
###             to the input volBinSize.
### INPUT: a vector of trades volume and the desired volume bin size
### OUTPUT: a vector indicating where each row belongs to which volume bin

#create index
volBin<-1
sumVol<-0
Volume<-data1.volume
volBinIdx <- numeric(length(Volume))

#create cutting for each volume bin
for(i in seq_len(length(Volume))){
sumVol<-sumVol + Volume[i]
if (sumVol<= volBinSize) {
volBinIdx[i] <- volBin
} else {
volBinIdx[i] <-  volBin <- volBin + 1
sumVol <- Volume[i]
}
}

#clean environment
rm(Volume,i,sumVol,volBinSize,volBin)

return(volBinIdx)
}
``````

My Program
``````##### put all functions neededin-memory
source("FT_functions_SO.R")

colClasses=c("character","character","numeric","numeric"))

#Name Columns
colnames(data1)<-c("Date","Time","Price","Volume")

data1["TT"]<-data1[,"Price"]*data1[,"Volume"]

##### time volBinIdx5K
start.time.volBinIdx5k<-Sys.time()

volBinIdx5k<-MakeVolumeBinIdx(data1\$Volume,5000)
## Purpose: find indexes where cumulative volume equal 5,000 shares
## Input: data1\$Volume and the size of the volume bin
## Output: vector with indexes for each row signifying which row belong to which
##          volume bin

##### time it took volBinIdx5k
end.time.volBinIdx5k<-Sys.time()
time.volBinIdx5k<-end.time.volBinIdx5k-start.time.volBinIdx5k
time.volBinIdx5k

##### time MakeBinCandles
start.time.MakeBindcandles.5k<-Sys.time()

data1.5k.dfm<-MakeBinCandles(data1,volBinIdx5k)
## Purpose: Create candles based on volume bins
## Input: data1
## Output: data.frame: Date,Time,OHLC,volume,HighIdx,LowIdx,MF,TT, VWAP

##### time it took MakeBinCandles for 5000 shares
end.time.MakeBinCandles.5k<-Sys.time()
time.MakeBinCandles.5k<-end.time.MakeBinCandles.5k-start.time.MakeBindcandles.5k
time.MakeBinCandles.5k
``````

The functions:
``````### FT_functions_EE.R

MakeVolumeBinIdx<-function(data1.volume,volBinSize){
### PURPOSE: Find the indexes for a given size of volume bin, that is
###            Find the indexes of the data.frame where the sum of the volume is equal
###             to the input volBinSize.
### INPUT: a vector of trades volume and the desired volume bin size
### OUTPUT: a vector indicating where each row belongs to which volume bin

#create index
volBin<-1
sumVol<-0
Volume<-data1.volume
volBinIdx <- numeric(length(Volume))

#create cutting for each volume bin
for(i in seq_len(length(Volume))){
sumVol<-sumVol + Volume[i]
if (sumVol<= volBinSize) {
volBinIdx[i] <- volBin
} else {
volBinIdx[i] <-  volBin <- volBin + 1
sumVol <- Volume[i]
}
}

#clean environment
rm(Volume,i,sumVol,volBinSize,volBin)

return(volBinIdx)
}

MakeBinCandles<-function(data,volBinIdxk){
### PURPOSE: Create candles based on bins
### INPUT: a new data.frame containing only Date,Time, Price,Volume,TT, AND
###          a vector containing the output of MakeVolumeBinIdx
### OUTPUT: a data.frame  with Date,Time, OHLC, Volume,
###           HighIdx,lowIdx, TT,VWAP (=volume weighted average price)

library(dplyr)

data.return<-data %>%
mutate(volBinIdxk=volBinIdxk) %>%
group_by(volBinIdxk) %>%
High=max(Price),
Low=min(Price),
Close=tail(Price,1),
Volume=sum(Volume),
# HighIdx=which.max(Price),
# LowIdx=which.min(Price),
TT=sum(TT,na.rm=T),
VWAP=TT/Volume) %>%
select(-volBinIdxk) %>%
as.data.frame()

return(data.return)

}

MakeBinCandlesXts<-function(data){
### PURPOSE: Turn bin candles from data.frame into xts object
### INPUT: data frame outputed by MakeBinCandles()
### OUTPUT: xts object
library(xts)
data\$Date<-strptime(paste(data\$Date,data\$Time),"%m/%d/%Y %H:%M:%S")

data<-data[,-2] # if I don't remove it, all columns become characters
data.xts<-xts(data[,-1],order.by=as.POSIXct(data[,1]))

return (data.xts)

}
``````
XYZ-EE.txt
FT-functions-EE.txt
FT-program-EE.txt
###### Who is Participating?

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Commented:
rstudio - is it NOT  running inside it? same with plain R? (again - is it the latest one i.e build 1091?)
Not everybody uses Windows (e.g me)
Commented:
instad of using a for loop use a binay chop method;
start 1/2 down the loop and from there go either up/down depending on the result

Experts Exchange Solution brought to you by

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Commented:
I've been looking at this question for several hours and I'm not sure I understand it.  Wouldn't the Volume just be the average of the volume items for that stock for that day?
= sum(volume)/count(stockID)

Using a stats program like R, I would try to express the output as simply as possible, using R functions.
Commented:
R is not deep into multiprocessing.
Depends on speed you want, you might also schedule tasks to parallel library (default on recent versions of R)
like detectCores number of them.
Commented:
Stats based on your sample file:
``````TradingDate	Open	High	Low	Close	Avg
9/12/2014	79.84	79.95	79.17	79.35	3.95737617753338
9/14/2014	79.27	79.33	79.04	79.15	3.17172646427476
9/15/2014	79.15	79.48	79.1 	79.33	3.6664891173994
9/16/2014	79.33	79.58	79.13	79.48	3.43944172041261
``````
###### It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Math / Science

From novice to tech pro — start learning today.