Link to home
Start Free TrialLog in
Avatar of Cartillo
CartilloFlag for Malaysia

asked on

Process the data more faster

Hi Experts,

I would like to request Experts help. The attached script has been used to crosscheck data from Validation sheet and updates all unique number and dupe data at Total Sheet. The problem is the macro takes more than  3 hours to process 1,200,000,data. Is that any possible to speed up the process? Hope Experts will help me to refine or suggest with other approach to speed up the process. Attached as well the workbook for Experts perusal.



Sub Total()
Dim WSV As Worksheet
Dim WST As Worksheet
Dim Rng As Range
Dim RowV As Long
Dim ColV As Long
Dim RowT As Long
Dim ColT As Long
Dim MaxCol As Long
Dim Typee As String
Dim Totall As Long
Dim GTotal As Long
Dim FirstNumber As Long
Dim I As Long
Dim J As Long

Set WSV = Sheets("Validation")
WSV.Copy after:=ActiveWorkbook.Worksheets(ActiveWorkbook.Worksheets.Count)
Set WSV = ActiveSheet
Set Rng = WSV.Range("A1:BW65536")

Set WST = Sheets("Total")

RowT = 2
ColT = 1
GTotal = 0

Do
Rng.SpecialCells(xlCellTypeBlanks).Delete Shift:=xlShiftToLeft
If Rng.Cells(2, 1) = "" Then
    Rng.Range("2:2").EntireRow.Delete
End If
FirstNumber = Rng.Cells(2, 1)
    Totall = 0
    With Rng
        Set C = .Find(What:=FirstNumber, LookIn:=xlValues, lookat:=xlWhole)
        If Not C Is Nothing Then
                firstAddress = C.Address
                firstType = Mid(Rng.Cells(C.row, (C.Column + 1)), 1, 5)
                Do
                    Totall = Totall + 1
                    GTotal = GTotal + 1
                    Typee = Mid(Rng.Cells(C.row, (C.Column + 1)), 1, 5)
                    Set C = .FindNext(C)
                    NewType = Mid(Rng.Cells(C.row, (C.Column + 1)), 1, 5)
                Loop While Not C Is Nothing And C.Address <> firstAddress And firstType = NewType
                WST.Cells(RowT, ColT) = FirstNumber
                WST.Cells(RowT, ColT + 1) = Typee
                WST.Cells(RowT, ColT + 2) = Totall
                DoEvents
                'Clear Cells from Range
                Set C = .Find(What:=FirstNumber, LookIn:=xlValues, lookat:=xlWhole)
                If Not C Is Nothing Then
                    firstAddress = C.Address
                    firstType = Mid(Rng.Cells(C.row, (C.Column + 1)), 1, 5)
                    Do
                        Rng.Cells(C.row, C.Column) = ""
                        Rng.Cells(C.row, (C.Column + 1)).Value = ""
                        Set C = .FindNext(C)
                        On Error Resume Next
                        NewType = Mid(Rng.Cells(C.row, (C.Column + 1)), 1, 5)
                    Loop While Not C Is Nothing And C.Address <> firstAddress And firstType = NewType
                Err.Clear
                End If
                ColT = ColT + 5
                If ColT > 29 Then
                    ColT = 1
                    RowT = RowT + 1
                End If
        End If
    End With
Loop While FirstNumber <> 0

Application.DisplayAlerts = False
WSV.Delete
Application.DisplayAlerts = True
WST.Activate
MsgBox ("Total completed for " & GTotal & " unique items")

End Sub

Open in new window

SpeedData.xls
Avatar of peter57r
peter57r
Flag of United Kingdom of Great Britain and Northern Ireland image

I suspect that just by adding the lines..
Application.ScreenUpdating = False  
after the Dim statements and
Application.ScreenUpdating = True
before the msgbox statement at the end,
that just doing this will reduce the time by half.
Avatar of Cartillo

ASKER

Hi peter57r,

Thanks for the feedback. Any chance for you to revise the script with  “Dictionary.Scripting” method instead?  
I have no idea what  “Dictionary.Scripting” is.
Cartillo - this looks familiar, but can you explain what needs doing?
Hi StephenJR,

Thanks for the feedback. If you noticed my workbook, “Total” sheet is consist of unique data from “Validation” sheet, extracted unique data from “Number” column and automatically count total “Type” data that were duped (based on first 5 alphabetic). The attached script is able to perform this,  but its hung massively when its running with a huge data (you may tested with the given sample data). Therefore, if we deploys “Dictionary.Scripting” object method, we would able to reduce the number of hours that we’ve spent into half. Hope you’ll consider my request.  
See how this goes. It deletes some columns from your starting sheet so make a copy first if you don't want this, and the output is in two columns.
Sub x()

Dim oDic As Object, vOut(), vIn(), i As Long, j As Long, n As Long

Sheets("Validation").Activate
On Error Resume Next
Rows(2).SpecialCells(xlCellTypeBlanks).EntireColumn.Delete shift:=xlToLeft
On Error GoTo 0
vIn = Range("A1").CurrentRegion.Value
ReDim vOut(1 To UBound(vIn, 1) * UBound(vIn, 2), 1 To 2)

Set oDic = CreateObject("Scripting.Dictionary")

With oDic
    For i = 1 To UBound(vIn, 1)
        For j = 1 To UBound(vIn, 2) - 1 Step 2
            If Not .Exists(vIn(i, j)) Then
                n = n + 1
                vOut(n, 1) = vIn(i, j)
                vOut(n, 2) = Left(vIn(i, j + 1), 5)
                .Add vIn(i, j), n
            End If
        Next j
    Next i
End With

Sheets("Total").Range("A1").Resize(n, 2) = vOut

End Sub

Open in new window

Hi StephenJR,

Is that possible to copy the data evenly in other columns (have 6 columns altogether). This is because I need to crosschecked more than million data (at this moment I have 1,250,589 data). By doing so I'm able to copy all the data from Validation sheet. Also add number of data "Type" that were duped in the Validation sheet. E.g "ATZDI" duped 2x at Validation sheet. Therefore, "2" will be inputted at "Total Type" column automatically. Hope you will help me to add feature.
Try this:
Sub x()

Dim oDic As Object, vOut(), vIn(), i As Long, j As Long, n As Long, p As Long

Sheets("Validation").Activate
On Error Resume Next
Rows(2).SpecialCells(xlCellTypeBlanks).EntireColumn.Delete shift:=xlToLeft
On Error GoTo 0
vIn = Range("A1").CurrentRegion.Value
ReDim vOut(1 To UBound(vIn, 1) * UBound(vIn, 2), 1 To 2)

Set oDic = CreateObject("Scripting.Dictionary")

With oDic
    For i = 2 To UBound(vIn, 1)
        For j = 1 To UBound(vIn, 2) - 1 Step 2
            If Not .Exists(vIn(i, j)) Then
                n = n + 1
                vOut(n, 1) = vIn(i, j)
                vOut(n, 2) = Left(vIn(i, j + 1), 5)
                .Add vIn(i, j), n
            End If
        Next j
    Next i
End With

p = WorksheetFunction.RoundUp(n / 3, 0)

With Sheets("Total").Range("A2")
    .CurrentRegion.Clear
    .Offset(-1).Resize(, 6).Value = Array("Number", "Type", "Number", "Type", "Number", "Type")
    .Resize(n, 2) = vOut
    .Offset(p).Resize(n - p, 2).Cut .Offset(, 2)
    .Offset(p, 2).Resize(n - 2 * p, 2).Cut .Offset(, 4)
End With

End Sub

Open in new window

Hi,

Thanks a lot. Its really super fast. Can we add the number of time the registered "Type" at Total sheet at Validation sheet. I add the sample data for better view. Hope you will consider this request.  
SpeedData.xls
Try this:
Sub x()

Dim oDic As Object, vOut(), vIn(), i As Long, j As Long, n As Long, p As Long

With Sheets("Validation")
    On Error Resume Next
    .Rows(2).SpecialCells(xlCellTypeBlanks).EntireColumn.Delete shift:=xlToLeft
    On Error GoTo 0
    vIn = .Range("A1").CurrentRegion.Value
End With

ReDim vOut(1 To UBound(vIn, 1) * UBound(vIn, 2), 1 To 3)

Set oDic = CreateObject("Scripting.Dictionary")

With oDic
    For i = 2 To UBound(vIn, 1)
        For j = 1 To UBound(vIn, 2) - 1 Step 2
            If Not .Exists(vIn(i, j)) Then
                n = n + 1
                vOut(n, 1) = vIn(i, j)
                vOut(n, 2) = Left(vIn(i, j + 1), 5)
                vOut(n, 3) = 1
                .Add vIn(i, j), n
            Else
                vOut(.Item(vIn(i, j)), 3) = vOut(.Item(vIn(i, j)), 3) + 1
            End If
        Next j
    Next i
End With

p = WorksheetFunction.RoundUp(n / 3, 0)

With Sheets("Total").Range("A2")
    .CurrentRegion.Clear
    .Offset(-1).Resize(, 9).Value = Array("Number", "Type", "Count", "Number", "Type", "Count", "Number", "Type", "Count")
    .Resize(n, 3) = vOut
    .Offset(p).Resize(n - p, 3).Cut .Offset(, 3)
    .Offset(p, 3).Resize(n - 2 * p, 3).Cut .Offset(, 6)
End With

End Sub

Open in new window

Hi StephenJR,

Thanks a lot for the great help. If I intent to add more data rows, currently its limited to 3 data sets, what should I do? is that any chances the system automatically set a number of data rows based on total data availability at Validation sheet instead of adding this data rows manually?
Cartillo - do you mean extra columns, so you might want the results spread out over more columns? We could probably build this into the code and you could change when you needed to, or the user could input a figure when the macro is run.
Hi StephenJR,

Please help to add this function. I'm having millions of data and it's requiring more data rows to allow data spread of smoothly without any missing of data.
Cartillo - please clarify what you mean by extra rows. I don't think I understand what you are asking. Are you talking about data on the Validation sheet or the Total sheet? If the latter which of the two choices above do you prefer?
StephenJR,

Sorry for confusion, I'm referring to Total sheet. Currently we have 3 sets of data (9 rows). Can we make it this rows being added easily based on data availability at Validation sheet. Let say,its able to create 7 sets (21 rows) or 9 rows (27 rows) automatically based on data availability at Validation sheet. Hope I am not confusing you again.  
Cartillo- ok I understand, but above I gave you two choices for how you might want to select the number of columns (columns, not rows!).

1) In the code so you change as appropriate
2) User input when the code is run
Hi,

Sorry, should be columns not rows. 2nd (user input) is the preferred ones.
Hopefully this is it. You enter the number of sets, e.g. 3 or 7 or 9:
Sub x()

Dim oDic As Object, vOut(), vIn(), i As Long, j As Long, n As Long, p As Long, nCol As Long

nCol = Application.InputBox("How many sets of columns for the results (each set has three columns)?", Type:=1)

With Sheets("Validation")
    On Error Resume Next
    .Rows(2).SpecialCells(xlCellTypeBlanks).EntireColumn.Delete shift:=xlToLeft
    On Error GoTo 0
    vIn = .Range("A1").CurrentRegion.Value
End With

ReDim vOut(1 To UBound(vIn, 1) * UBound(vIn, 2), 1 To 3)

Set oDic = CreateObject("Scripting.Dictionary")

With oDic
    For i = 2 To UBound(vIn, 1)
        For j = 1 To UBound(vIn, 2) - 1 Step 2
            If Not .Exists(vIn(i, j)) Then
                n = n + 1
                vOut(n, 1) = vIn(i, j)
                vOut(n, 2) = Left(vIn(i, j + 1), 5)
                vOut(n, 3) = 1
                .Add vIn(i, j), n
            Else
                vOut(.Item(vIn(i, j)), 3) = vOut(.Item(vIn(i, j)), 3) + 1
            End If
        Next j
    Next i
End With

p = WorksheetFunction.RoundUp(n / nCol, 0)

With Sheets("Total")
    .UsedRange.Clear
    With .Range("A1")
        .Resize(, 3).Value = Array("Number", "Type", "Count")
        .Offset(1).Resize(n, 3) = vOut
        For j = 1 To nCol - 1
            .Offset(, j * 4).Resize(, 3).Value = Array("Number", "Type", "Count")
            .Offset(p + 1, (j - 1) * 4).Resize(n - (p * j), 3).Cut .Offset(1, j * 4)
        Next j
    End With
End With

End Sub

Open in new window

Hi StephenJR,

Thanks a lot. I have tested with huge data and it shows an error as "object define error" at this line: ".Offset(1).Resize(n, 3) = vOut"

I have attached the data that I have used (in text file), can't sent it in excel format since the file is too big. I'm not sure what when wrong.
Sdata.zip
If you are using xl 2003 and n is more than 65,000-odd you will run out of rows. Can you move to xl2007? Otherwise will have to come up with a workaround.
Hi StephenJR,

The data has been arranged in such a way that after a column reached at 65536 row, the new column will takes over the following data from Validation sheet. From my test the code failed to work after reaching 9,000 rows. What's the workaround that we can use?
Cartillo,

I have been following the threads and due to your persistance of using Scripting.dictionnary I looked it up and looked for the diffrent options there is which are quite intressting.

Meantime I would like to point out to 1 issue:

The issue I mentioned in the previous question re the item 573510 having several Types to which you did not answer is actually repeating in the code proposed by StephanJR. Are you satissfied with compiling all the 573510 and giving you a total of 37 in your initial example or youneed a breakdown by Type that is:
573110      AZNXX      1
573510      BAZWD      3
573510      BBLOU      3
573510      BBLXF      4
573510      BBMBR      3
573510      BBMIB      3
573510      BAZKS      2
573510      BAZKR      2
573510      BAZKO      1
573510      BAZDD      1
573510      BAZCU      2
etc ...
???
As the present code and all suggested ones give you a total by number.

Rgds/gowflow
ASKER CERTIFIED SOLUTION
Avatar of StephenJR
StephenJR
Flag of United Kingdom of Great Britain and Northern Ireland image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
StephenJR

I am impressed on how efficient and fast processing is the Scripting.Dictionary technique. I never heared of it until Cartillo mentioned it and looked around for info but could hardly find something good.

kindly suggest something where I can find more info on this.
Tks/Rgds
gowflow
gowflow,
You can start at this very site, with Patrick's excellent article:

https://www.experts-exchange.com/Software/Office_Productivity/Office_Suites/MS_Office/A_3391-Using-the-Dictionary-Class-in-VBA.html?sfQueryTermInfo=1+30+dictionari

I'm not sure how much of the speed increase is due to the Dictionary and how much is due to transferring data from the sheet to an array, processing the array and then writing back to the sheet. The latter also speeds things up considerably.
Definitivley array handling is the key issue. I guess dictionnary get's its advantage when you need to sort as understand it has the key capability that accesses the item rightaway when you need it. Tks your link and we never cease to learn !
gowflow
Hi Gowflow,

Thanks for highlighting the "573510" issue which has several Type data, I'm keep on overlooked on this matter.

Hi  StephenJR,

Apology for this mistake, is that any chance for us to get this data as well? Hope you will assist.
Cartillo,

By the way the second version of the code I proposed in the previous question dealt with this issue.

StephenJR
Idea is to loop thru Numbers and also loop within the Number found for same occurence of the first 5 digits of the type to which you apply the total. So double loop

Rgds/gowflow
Hi gowflow,

Yes, you're right but if we use "Scripting.Dictionary" together that will speedup the process extremely.
Cartillo,
I totally agree I checked the routine displayed by StephenJR and indeed its amazingly faster. Am looking at the thread that StephenJR displayed about scritiingDictionnary and its indeed a good method.

Rgds/gowflow
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
By the way I tested this code with over 60000 row on 18 sets of columns and its amazing how fast it run it finishes in less than 1minute !!!

StepehJR your a genius in finding this process !!!
Rgds/gowflow
Cartillo,

Also a minor change in the For i loop start it at 2 instead of 1 as 1 would be calculating the header which is useless !

also in your original post in Validation sheet you have 2 empty columns for every set of 3 columns so to have the data aligned you should replace
.Offset(, (r - 1) * 4).Resize(, 3).Value = Array("Number", "Type", "Count")
.Offset(1, (r - 1) * 4).Resize(p, 3) = vOut2

By this
.Offset(, (r - 1) * 5).Resize(, 3).Value = Array("Number", "Type", "Count")
.Offset(1, (r - 1) * 5).Resize(p, 3) = vOut2

Rgds/gowflow

gowflow
Cartillo,
Any news ? Is your issue solved ?
gowflow
Hi,

Thanks a lot for the great help