I have written a function that is too slow. Does anyone know a way to speed up or vectorize its for loop by a factor of 5 to 10? The function is called "MakeVolumeBinIdx".

The purpose of my program is to bin my input data according to a fixed volume size, so that each row has (almost) the same volume.

My input data is tick data (transaction by transaction) for a stock, with the columns Date, Time, Price, Volume.
My output: Date, Time, Open, High, Low, Close, Volume (the volume should be almost equal for each candle, or bin).

I have included the function's code, the program, the data set and the R files as well.
Also, I have an HP laptop with an Intel i7, running Windows 8.

Thank you in advance :)

The function I need major help with

MakeVolumeBinIdx<-function(data1.volume,volBinSize){
  ### PURPOSE: Find the indexes for a given size of volume bin, that is
  ### Find the indexes of the data.frame where the sum of the volume is equal
  ### to the input volBinSize.
  ### INPUT: a vector of trades volume and the desired volume bin size
  ### OUTPUT: a vector indicating where each row belongs to which volume bin

  #create index
  volBin<-1
  sumVol<-0
  Volume<-data1.volume
  volBinIdx <- numeric(length(Volume))

  #create cutting for each volume bin
  for(i in seq_len(length(Volume))){
    sumVol<-sumVol + Volume[i]
    if (sumVol<= volBinSize) {
      volBinIdx[i] <- volBin
    } else {
      volBinIdx[i] <- volBin <- volBin + 1
      sumVol <- Volume[i]
    }
  }

  #clean environment
  rm(Volume,i,sumVol,volBinSize,volBin)
  return(volBinIdx)
}

##### put all functions needed in memory
source("FT_functions_SO.R")

## read data in
data1<-read.table("XYZ_EE.txt",sep=",",stringsAsFactor=F,header=F,
                  colClasses=c("character","character","numeric","numeric"))

#Name Columns
colnames(data1)<-c("Date","Time","Price","Volume")

#Add columns for total amount traded
data1["TT"]<-data1[,"Price"]*data1[,"Volume"]

##### time volBinIdx5K
start.time.volBinIdx5k<-Sys.time()

volBinIdx5k<-MakeVolumeBinIdx(data1$Volume,5000)
## Purpose: find indexes where cumulative volume equal 5,000 shares
## Input: data1$Volume and the size of the volume bin
## Output: vector with indexes for each row signifying which row belong to which
## volume bin

##### time it took volBinIdx5k
end.time.volBinIdx5k<-Sys.time()
time.volBinIdx5k<-end.time.volBinIdx5k-start.time.volBinIdx5k
time.volBinIdx5k

##### time MakeBinCandles
start.time.MakeBindcandles.5k<-Sys.time()

data1.5k.dfm<-MakeBinCandles(data1,volBinIdx5k)
## Purpose: Create candles based on volume bins
## Input: data1
## Output: data.frame: Date,Time,OHLC,volume,HighIdx,LowIdx,MF,TT, VWAP

##### time it took MakeBinCandles for 5000 shares
end.time.MakeBinCandles.5k<-Sys.time()
time.MakeBinCandles.5k<-end.time.MakeBinCandles.5k-start.time.MakeBindcandles.5k
time.MakeBinCandles.5k

### FT_functions_EE.R

MakeVolumeBinIdx<-function(data1.volume,volBinSize){
  ### PURPOSE: Find the indexes for a given size of volume bin, that is
  ### Find the indexes of the data.frame where the sum of the volume is equal
  ### to the input volBinSize.
  ### INPUT: a vector of trades volume and the desired volume bin size
  ### OUTPUT: a vector indicating where each row belongs to which volume bin

  #create index
  volBin<-1
  sumVol<-0
  Volume<-data1.volume
  volBinIdx <- numeric(length(Volume))

  #create cutting for each volume bin
  for(i in seq_len(length(Volume))){
    sumVol<-sumVol + Volume[i]
    if (sumVol<= volBinSize) {
      volBinIdx[i] <- volBin
    } else {
      volBinIdx[i] <- volBin <- volBin + 1
      sumVol <- Volume[i]
    }
  }

  #clean environment
  rm(Volume,i,sumVol,volBinSize,volBin)
  return(volBinIdx)
}

MakeBinCandles<-function(data,volBinIdxk){
  ### PURPOSE: Create candles based on bins
  ### INPUT: a new data.frame containing only Date,Time, Price,Volume,TT, AND
  ### a vector containing the output of MakeVolumeBinIdx
  ### OUTPUT: a data.frame with Date,Time, OHLC, Volume,
  ### HighIdx,lowIdx, TT,VWAP (=volume weighted average price)
  library(dplyr)
  data.return<-data %>%
    mutate(volBinIdxk=volBinIdxk) %>%
    group_by(volBinIdxk) %>%
    summarize(Date=head(Date,1),
              Time=head(Time,1),
              Open=head(Price,1),
              High=max(Price),
              Low=min(Price),
              Close=tail(Price,1),
              Volume=sum(Volume),
              # HighIdx=which.max(Price),
              # LowIdx=which.min(Price),
              TT=sum(TT,na.rm=T),
              VWAP=TT/Volume) %>%
    select(-volBinIdxk) %>%
    as.data.frame()
  return(data.return)
}

MakeBinCandlesXts<-function(data){
  ### PURPOSE: Turn bin candles from data.frame into xts object
  ### INPUT: data frame outputed by MakeBinCandles()
  ### OUTPUT: xts object
  library(xts)
  data$Date<-strptime(paste(data$Date,data$Time),"%m/%d/%Y %H:%M:%S")
  data<-data[,-2] # if I don't remove it, all columns become characters
  data.xts<-xts(data[,-1],order.by=as.POSIXct(data[,1]))
  return (data.xts)
}

Please post the output of installed.packages() and sessionInfo().
Is it a recent version of R (like 3.x)?
Are you running this inside RStudio? Is it just as slow in plain R? (And again, is it the latest version, i.e. build 1091?)
Not everybody uses Windows (e.g. me).
Are any libraries loaded in your code that are not shown here?

I've been looking at this question for several hours and I'm not sure I understand it. Wouldn't the Volume just be the average of the volume items for that stock for that day? = sum(volume)/count(stockID)

Using a stats program like R, I would try to express the output as simply as possible, using R functions.
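
As a rough sketch of what "using R functions" could look like here: the loop in MakeVolumeBinIdx resets its running sum at every bin boundary, which is what makes it hard to vectorize exactly. If bins that are only approximately equal are acceptable (as the question suggests), one option is to label each row by which multiple of the bin size its cumulative volume has reached. The helper name MakeVolumeBinIdxApprox is made up for this illustration, and its output is not identical to the original function's, because the running sum is never reset.

```r
# Hypothetical helper, not part of the original post.
# Approximate, fully vectorized binning: a row's bin is determined by which
# multiple of volBinSize its cumulative volume has reached.
# Assumes volumes are non-negative.
MakeVolumeBinIdxApprox <- function(volume, volBinSize) {
  # Running total of traded volume up to and including each row
  cumVol <- cumsum(as.numeric(volume))
  # Rows whose running total lies in ((k-1)*size, k*size] get raw label k
  rawBin <- ceiling(cumVol / volBinSize)
  # Renumber labels consecutively, since one large trade can skip labels
  match(rawBin, unique(rawBin))
}

vol <- c(2000, 2000, 2000, 1000, 3000, 12000, 500)
MakeVolumeBinIdxApprox(vol, 5000)
# gives 1 1 2 2 2 3 3, whereas the original loop gives 1 1 2 2 3 4 5
```

Whether this trade-off is acceptable depends on how strictly each candle has to start from a fresh running sum; if the exact reset behaviour matters, the loop has to stay sequential (for example, moved into compiled code).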

R does not do much multiprocessing on its own.
Depending on the speed you need, you could also distribute tasks with the parallel package (included by default in recent versions of R),
for example across as many workers as detectCores() reports.
