Cartillo
asked on
Process the data more faster
Hi Experts,
I would like to request Experts help. The attached script has been used to crosscheck data from Validation sheet and updates all unique number and dupe data at Total Sheet. The problem is the macro takes more than 3 hours to process 1,200,000,data. Is that any possible to speed up the process? Hope Experts will help me to refine or suggest with other approach to speed up the process. Attached as well the workbook for Experts perusal.
I would like to request Experts help. The attached script has been used to crosscheck data from Validation sheet and updates all unique number and dupe data at Total Sheet. The problem is the macro takes more than 3 hours to process 1,200,000,data. Is that any possible to speed up the process? Hope Experts will help me to refine or suggest with other approach to speed up the process. Attached as well the workbook for Experts perusal.
Sub Total()
Dim WSV As Worksheet
Dim WST As Worksheet
Dim Rng As Range
Dim RowV As Long
Dim ColV As Long
Dim RowT As Long
Dim ColT As Long
Dim MaxCol As Long
Dim Typee As String
Dim Totall As Long
Dim GTotal As Long
Dim FirstNumber As Long
Dim I As Long
Dim J As Long
Set WSV = Sheets("Validation")
WSV.Copy after:=ActiveWorkbook.Worksheets(ActiveWorkbook.Worksheets.Count)
Set WSV = ActiveSheet
Set Rng = WSV.Range("A1:BW65536")
Set WST = Sheets("Total")
RowT = 2
ColT = 1
GTotal = 0
Do
Rng.SpecialCells(xlCellTypeBlanks).Delete Shift:=xlShiftToLeft
If Rng.Cells(2, 1) = "" Then
Rng.Range("2:2").EntireRow.Delete
End If
FirstNumber = Rng.Cells(2, 1)
Totall = 0
With Rng
Set C = .Find(What:=FirstNumber, LookIn:=xlValues, lookat:=xlWhole)
If Not C Is Nothing Then
firstAddress = C.Address
firstType = Mid(Rng.Cells(C.row, (C.Column + 1)), 1, 5)
Do
Totall = Totall + 1
GTotal = GTotal + 1
Typee = Mid(Rng.Cells(C.row, (C.Column + 1)), 1, 5)
Set C = .FindNext(C)
NewType = Mid(Rng.Cells(C.row, (C.Column + 1)), 1, 5)
Loop While Not C Is Nothing And C.Address <> firstAddress And firstType = NewType
WST.Cells(RowT, ColT) = FirstNumber
WST.Cells(RowT, ColT + 1) = Typee
WST.Cells(RowT, ColT + 2) = Totall
DoEvents
'Clear Cells from Range
Set C = .Find(What:=FirstNumber, LookIn:=xlValues, lookat:=xlWhole)
If Not C Is Nothing Then
firstAddress = C.Address
firstType = Mid(Rng.Cells(C.row, (C.Column + 1)), 1, 5)
Do
Rng.Cells(C.row, C.Column) = ""
Rng.Cells(C.row, (C.Column + 1)).Value = ""
Set C = .FindNext(C)
On Error Resume Next
NewType = Mid(Rng.Cells(C.row, (C.Column + 1)), 1, 5)
Loop While Not C Is Nothing And C.Address <> firstAddress And firstType = NewType
Err.Clear
End If
ColT = ColT + 5
If ColT > 29 Then
ColT = 1
RowT = RowT + 1
End If
End If
End With
Loop While FirstNumber <> 0
Application.DisplayAlerts = False
WSV.Delete
Application.DisplayAlerts = True
WST.Activate
MsgBox ("Total completed for " & GTotal & " unique items")
End Sub
SpeedData.xls
ASKER
Hi peter57r,
Thanks for the feedback. Any chance for you to revise the script with “Dictionary.Scripting” method instead?
Thanks for the feedback. Any chance for you to revise the script with “Dictionary.Scripting” method instead?
I have no idea what “Dictionary.Scripting” is.
Cartillo - this looks familiar, but can you explain what needs doing?
ASKER
Hi StephenJR,
Thanks for the feedback. If you noticed my workbook, “Total” sheet is consist of unique data from “Validation” sheet, extracted unique data from “Number” column and automatically count total “Type” data that were duped (based on first 5 alphabetic). The attached script is able to perform this, but its hung massively when its running with a huge data (you may tested with the given sample data). Therefore, if we deploys “Dictionary.Scripting” object method, we would able to reduce the number of hours that we’ve spent into half. Hope you’ll consider my request.
Thanks for the feedback. If you noticed my workbook, “Total” sheet is consist of unique data from “Validation” sheet, extracted unique data from “Number” column and automatically count total “Type” data that were duped (based on first 5 alphabetic). The attached script is able to perform this, but its hung massively when its running with a huge data (you may tested with the given sample data). Therefore, if we deploys “Dictionary.Scripting” object method, we would able to reduce the number of hours that we’ve spent into half. Hope you’ll consider my request.
See how this goes. It deletes some columns from your starting sheet so make a copy first if you don't want this, and the output is in two columns.
Sub x()
Dim oDic As Object, vOut(), vIn(), i As Long, j As Long, n As Long
Sheets("Validation").Activate
On Error Resume Next
Rows(2).SpecialCells(xlCellTypeBlanks).EntireColumn.Delete shift:=xlToLeft
On Error GoTo 0
vIn = Range("A1").CurrentRegion.Value
ReDim vOut(1 To UBound(vIn, 1) * UBound(vIn, 2), 1 To 2)
Set oDic = CreateObject("Scripting.Dictionary")
With oDic
For i = 1 To UBound(vIn, 1)
For j = 1 To UBound(vIn, 2) - 1 Step 2
If Not .Exists(vIn(i, j)) Then
n = n + 1
vOut(n, 1) = vIn(i, j)
vOut(n, 2) = Left(vIn(i, j + 1), 5)
.Add vIn(i, j), n
End If
Next j
Next i
End With
Sheets("Total").Range("A1").Resize(n, 2) = vOut
End Sub
ASKER
Hi StephenJR,
Is that possible to copy the data evenly in other columns (have 6 columns altogether). This is because I need to crosschecked more than million data (at this moment I have 1,250,589 data). By doing so I'm able to copy all the data from Validation sheet. Also add number of data "Type" that were duped in the Validation sheet. E.g "ATZDI" duped 2x at Validation sheet. Therefore, "2" will be inputted at "Total Type" column automatically. Hope you will help me to add feature.
Is that possible to copy the data evenly in other columns (have 6 columns altogether). This is because I need to crosschecked more than million data (at this moment I have 1,250,589 data). By doing so I'm able to copy all the data from Validation sheet. Also add number of data "Type" that were duped in the Validation sheet. E.g "ATZDI" duped 2x at Validation sheet. Therefore, "2" will be inputted at "Total Type" column automatically. Hope you will help me to add feature.
Try this:
Sub x()
Dim oDic As Object, vOut(), vIn(), i As Long, j As Long, n As Long, p As Long
Sheets("Validation").Activate
On Error Resume Next
Rows(2).SpecialCells(xlCellTypeBlanks).EntireColumn.Delete shift:=xlToLeft
On Error GoTo 0
vIn = Range("A1").CurrentRegion.Value
ReDim vOut(1 To UBound(vIn, 1) * UBound(vIn, 2), 1 To 2)
Set oDic = CreateObject("Scripting.Dictionary")
With oDic
For i = 2 To UBound(vIn, 1)
For j = 1 To UBound(vIn, 2) - 1 Step 2
If Not .Exists(vIn(i, j)) Then
n = n + 1
vOut(n, 1) = vIn(i, j)
vOut(n, 2) = Left(vIn(i, j + 1), 5)
.Add vIn(i, j), n
End If
Next j
Next i
End With
p = WorksheetFunction.RoundUp(n / 3, 0)
With Sheets("Total").Range("A2")
.CurrentRegion.Clear
.Offset(-1).Resize(, 6).Value = Array("Number", "Type", "Number", "Type", "Number", "Type")
.Resize(n, 2) = vOut
.Offset(p).Resize(n - p, 2).Cut .Offset(, 2)
.Offset(p, 2).Resize(n - 2 * p, 2).Cut .Offset(, 4)
End With
End Sub
ASKER
Hi,
Thanks a lot. Its really super fast. Can we add the number of time the registered "Type" at Total sheet at Validation sheet. I add the sample data for better view. Hope you will consider this request.
SpeedData.xls
Thanks a lot. Its really super fast. Can we add the number of time the registered "Type" at Total sheet at Validation sheet. I add the sample data for better view. Hope you will consider this request.
SpeedData.xls
Try this:
Sub x()
Dim oDic As Object, vOut(), vIn(), i As Long, j As Long, n As Long, p As Long
With Sheets("Validation")
On Error Resume Next
.Rows(2).SpecialCells(xlCellTypeBlanks).EntireColumn.Delete shift:=xlToLeft
On Error GoTo 0
vIn = .Range("A1").CurrentRegion.Value
End With
ReDim vOut(1 To UBound(vIn, 1) * UBound(vIn, 2), 1 To 3)
Set oDic = CreateObject("Scripting.Dictionary")
With oDic
For i = 2 To UBound(vIn, 1)
For j = 1 To UBound(vIn, 2) - 1 Step 2
If Not .Exists(vIn(i, j)) Then
n = n + 1
vOut(n, 1) = vIn(i, j)
vOut(n, 2) = Left(vIn(i, j + 1), 5)
vOut(n, 3) = 1
.Add vIn(i, j), n
Else
vOut(.Item(vIn(i, j)), 3) = vOut(.Item(vIn(i, j)), 3) + 1
End If
Next j
Next i
End With
p = WorksheetFunction.RoundUp(n / 3, 0)
With Sheets("Total").Range("A2")
.CurrentRegion.Clear
.Offset(-1).Resize(, 9).Value = Array("Number", "Type", "Count", "Number", "Type", "Count", "Number", "Type", "Count")
.Resize(n, 3) = vOut
.Offset(p).Resize(n - p, 3).Cut .Offset(, 3)
.Offset(p, 3).Resize(n - 2 * p, 3).Cut .Offset(, 6)
End With
End Sub
ASKER
Hi StephenJR,
Thanks a lot for the great help. If I intent to add more data rows, currently its limited to 3 data sets, what should I do? is that any chances the system automatically set a number of data rows based on total data availability at Validation sheet instead of adding this data rows manually?
Thanks a lot for the great help. If I intent to add more data rows, currently its limited to 3 data sets, what should I do? is that any chances the system automatically set a number of data rows based on total data availability at Validation sheet instead of adding this data rows manually?
Cartillo - do you mean extra columns, so you might want the results spread out over more columns? We could probably build this into the code and you could change when you needed to, or the user could input a figure when the macro is run.
ASKER
Hi StephenJR,
Please help to add this function. I'm having millions of data and it's requiring more data rows to allow data spread of smoothly without any missing of data.
Please help to add this function. I'm having millions of data and it's requiring more data rows to allow data spread of smoothly without any missing of data.
Cartillo - please clarify what you mean by extra rows. I don't think I understand what you are asking. Are you talking about data on the Validation sheet or the Total sheet? If the latter which of the two choices above do you prefer?
ASKER
StephenJR,
Sorry for confusion, I'm referring to Total sheet. Currently we have 3 sets of data (9 rows). Can we make it this rows being added easily based on data availability at Validation sheet. Let say,its able to create 7 sets (21 rows) or 9 rows (27 rows) automatically based on data availability at Validation sheet. Hope I am not confusing you again.
Sorry for confusion, I'm referring to Total sheet. Currently we have 3 sets of data (9 rows). Can we make it this rows being added easily based on data availability at Validation sheet. Let say,its able to create 7 sets (21 rows) or 9 rows (27 rows) automatically based on data availability at Validation sheet. Hope I am not confusing you again.
Cartillo- ok I understand, but above I gave you two choices for how you might want to select the number of columns (columns, not rows!).
1) In the code so you change as appropriate
2) User input when the code is run
1) In the code so you change as appropriate
2) User input when the code is run
ASKER
Hi,
Sorry, should be columns not rows. 2nd (user input) is the preferred ones.
Sorry, should be columns not rows. 2nd (user input) is the preferred ones.
Hopefully this is it. You enter the number of sets, e.g. 3 or 7 or 9:
Sub x()
Dim oDic As Object, vOut(), vIn(), i As Long, j As Long, n As Long, p As Long, nCol As Long
nCol = Application.InputBox("How many sets of columns for the results (each set has three columns)?", Type:=1)
With Sheets("Validation")
On Error Resume Next
.Rows(2).SpecialCells(xlCellTypeBlanks).EntireColumn.Delete shift:=xlToLeft
On Error GoTo 0
vIn = .Range("A1").CurrentRegion.Value
End With
ReDim vOut(1 To UBound(vIn, 1) * UBound(vIn, 2), 1 To 3)
Set oDic = CreateObject("Scripting.Dictionary")
With oDic
For i = 2 To UBound(vIn, 1)
For j = 1 To UBound(vIn, 2) - 1 Step 2
If Not .Exists(vIn(i, j)) Then
n = n + 1
vOut(n, 1) = vIn(i, j)
vOut(n, 2) = Left(vIn(i, j + 1), 5)
vOut(n, 3) = 1
.Add vIn(i, j), n
Else
vOut(.Item(vIn(i, j)), 3) = vOut(.Item(vIn(i, j)), 3) + 1
End If
Next j
Next i
End With
p = WorksheetFunction.RoundUp(n / nCol, 0)
With Sheets("Total")
.UsedRange.Clear
With .Range("A1")
.Resize(, 3).Value = Array("Number", "Type", "Count")
.Offset(1).Resize(n, 3) = vOut
For j = 1 To nCol - 1
.Offset(, j * 4).Resize(, 3).Value = Array("Number", "Type", "Count")
.Offset(p + 1, (j - 1) * 4).Resize(n - (p * j), 3).Cut .Offset(1, j * 4)
Next j
End With
End With
End Sub
ASKER
Hi StephenJR,
Thanks a lot. I have tested with huge data and it shows an error as "object define error" at this line: ".Offset(1).Resize(n, 3) = vOut"
I have attached the data that I have used (in text file), can't sent it in excel format since the file is too big. I'm not sure what when wrong.
Sdata.zip
Thanks a lot. I have tested with huge data and it shows an error as "object define error" at this line: ".Offset(1).Resize(n, 3) = vOut"
I have attached the data that I have used (in text file), can't sent it in excel format since the file is too big. I'm not sure what when wrong.
Sdata.zip
If you are using xl 2003 and n is more than 65,000-odd you will run out of rows. Can you move to xl2007? Otherwise will have to come up with a workaround.
ASKER
Hi StephenJR,
The data has been arranged in such a way that after a column reached at 65536 row, the new column will takes over the following data from Validation sheet. From my test the code failed to work after reaching 9,000 rows. What's the workaround that we can use?
The data has been arranged in such a way that after a column reached at 65536 row, the new column will takes over the following data from Validation sheet. From my test the code failed to work after reaching 9,000 rows. What's the workaround that we can use?
Cartillo,
I have been following the threads and due to your persistance of using Scripting.dictionnary I looked it up and looked for the diffrent options there is which are quite intressting.
Meantime I would like to point out to 1 issue:
The issue I mentioned in the previous question re the item 573510 having several Types to which you did not answer is actually repeating in the code proposed by StephanJR. Are you satissfied with compiling all the 573510 and giving you a total of 37 in your initial example or youneed a breakdown by Type that is:
573110 AZNXX 1
573510 BAZWD 3
573510 BBLOU 3
573510 BBLXF 4
573510 BBMBR 3
573510 BBMIB 3
573510 BAZKS 2
573510 BAZKR 2
573510 BAZKO 1
573510 BAZDD 1
573510 BAZCU 2
etc ...
???
As the present code and all suggested ones give you a total by number.
Rgds/gowflow
I have been following the threads and due to your persistance of using Scripting.dictionnary I looked it up and looked for the diffrent options there is which are quite intressting.
Meantime I would like to point out to 1 issue:
The issue I mentioned in the previous question re the item 573510 having several Types to which you did not answer is actually repeating in the code proposed by StephanJR. Are you satissfied with compiling all the 573510 and giving you a total of 37 in your initial example or youneed a breakdown by Type that is:
573110 AZNXX 1
573510 BAZWD 3
573510 BBLOU 3
573510 BBLXF 4
573510 BBMBR 3
573510 BBMIB 3
573510 BAZKS 2
573510 BAZKR 2
573510 BAZKO 1
573510 BAZDD 1
573510 BAZCU 2
etc ...
???
As the present code and all suggested ones give you a total by number.
Rgds/gowflow
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
StephenJR
I am impressed on how efficient and fast processing is the Scripting.Dictionary technique. I never heared of it until Cartillo mentioned it and looked around for info but could hardly find something good.
kindly suggest something where I can find more info on this.
Tks/Rgds
gowflow
I am impressed on how efficient and fast processing is the Scripting.Dictionary technique. I never heared of it until Cartillo mentioned it and looked around for info but could hardly find something good.
kindly suggest something where I can find more info on this.
Tks/Rgds
gowflow
gowflow,
You can start at this very site, with Patrick's excellent article:
https://www.experts-exchange.com/Software/Office_Productivity/Office_Suites/MS_Office/A_3391-Using-the-Dictionary-Class-in-VBA.html?sfQueryTermInfo=1+30+dictionari
I'm not sure how much of the speed increase is due to the Dictionary and how much is due to transferring data from the sheet to an array, processing the array and then writing back to the sheet. The latter also speeds things up considerably.
You can start at this very site, with Patrick's excellent article:
https://www.experts-exchange.com/Software/Office_Productivity/Office_Suites/MS_Office/A_3391-Using-the-Dictionary-Class-in-VBA.html?sfQueryTermInfo=1+30+dictionari
I'm not sure how much of the speed increase is due to the Dictionary and how much is due to transferring data from the sheet to an array, processing the array and then writing back to the sheet. The latter also speeds things up considerably.
Definitivley array handling is the key issue. I guess dictionnary get's its advantage when you need to sort as understand it has the key capability that accesses the item rightaway when you need it. Tks your link and we never cease to learn !
gowflow
gowflow
ASKER
Hi Gowflow,
Thanks for highlighting the "573510" issue which has several Type data, I'm keep on overlooked on this matter.
Hi StephenJR,
Apology for this mistake, is that any chance for us to get this data as well? Hope you will assist.
Thanks for highlighting the "573510" issue which has several Type data, I'm keep on overlooked on this matter.
Hi StephenJR,
Apology for this mistake, is that any chance for us to get this data as well? Hope you will assist.
Cartillo,
By the way the second version of the code I proposed in the previous question dealt with this issue.
StephenJR
Idea is to loop thru Numbers and also loop within the Number found for same occurence of the first 5 digits of the type to which you apply the total. So double loop
Rgds/gowflow
By the way the second version of the code I proposed in the previous question dealt with this issue.
StephenJR
Idea is to loop thru Numbers and also loop within the Number found for same occurence of the first 5 digits of the type to which you apply the total. So double loop
Rgds/gowflow
ASKER
Hi gowflow,
Yes, you're right but if we use "Scripting.Dictionary" together that will speedup the process extremely.
Yes, you're right but if we use "Scripting.Dictionary" together that will speedup the process extremely.
Cartillo,
I totally agree I checked the routine displayed by StephenJR and indeed its amazingly faster. Am looking at the thread that StephenJR displayed about scritiingDictionnary and its indeed a good method.
Rgds/gowflow
I totally agree I checked the routine displayed by StephenJR and indeed its amazingly faster. Am looking at the thread that StephenJR displayed about scritiingDictionnary and its indeed a good method.
Rgds/gowflow
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
By the way I tested this code with over 60000 row on 18 sets of columns and its amazing how fast it run it finishes in less than 1minute !!!
StepehJR your a genius in finding this process !!!
Rgds/gowflow
StepehJR your a genius in finding this process !!!
Rgds/gowflow
Cartillo,
Also a minor change in the For i loop start it at 2 instead of 1 as 1 would be calculating the header which is useless !
also in your original post in Validation sheet you have 2 empty columns for every set of 3 columns so to have the data aligned you should replace
.Offset(, (r - 1) * 4).Resize(, 3).Value = Array("Number", "Type", "Count")
.Offset(1, (r - 1) * 4).Resize(p, 3) = vOut2
By this
.Offset(, (r - 1) * 5).Resize(, 3).Value = Array("Number", "Type", "Count")
.Offset(1, (r - 1) * 5).Resize(p, 3) = vOut2
Rgds/gowflow
gowflow
Also a minor change in the For i loop start it at 2 instead of 1 as 1 would be calculating the header which is useless !
also in your original post in Validation sheet you have 2 empty columns for every set of 3 columns so to have the data aligned you should replace
.Offset(, (r - 1) * 4).Resize(, 3).Value = Array("Number", "Type", "Count")
.Offset(1, (r - 1) * 4).Resize(p, 3) = vOut2
By this
.Offset(, (r - 1) * 5).Resize(, 3).Value = Array("Number", "Type", "Count")
.Offset(1, (r - 1) * 5).Resize(p, 3) = vOut2
Rgds/gowflow
gowflow
Cartillo,
Any news ? Is your issue solved ?
gowflow
Any news ? Is your issue solved ?
gowflow
ASKER
Hi,
Thanks a lot for the great help
Thanks a lot for the great help
Application.ScreenUpdating
after the Dim statements and
Application.ScreenUpdating
before the msgbox statement at the end,
that just doing this will reduce the time by half.