Solved

Process the data more faster

Posted on 2011-03-02
36
690 Views
Last Modified: 2012-05-11
Hi Experts,

I would like to request Experts help. The attached script has been used to crosscheck data from Validation sheet and updates all unique number and dupe data at Total Sheet. The problem is the macro takes more than  3 hours to process 1,200,000,data. Is that any possible to speed up the process? Hope Experts will help me to refine or suggest with other approach to speed up the process. Attached as well the workbook for Experts perusal.



Sub Total()
Dim WSV As Worksheet
Dim WST As Worksheet
Dim Rng As Range
Dim RowV As Long
Dim ColV As Long
Dim RowT As Long
Dim ColT As Long
Dim MaxCol As Long
Dim Typee As String
Dim Totall As Long
Dim GTotal As Long
Dim FirstNumber As Long
Dim I As Long
Dim J As Long

Set WSV = Sheets("Validation")
WSV.Copy after:=ActiveWorkbook.Worksheets(ActiveWorkbook.Worksheets.Count)
Set WSV = ActiveSheet
Set Rng = WSV.Range("A1:BW65536")

Set WST = Sheets("Total")

RowT = 2
ColT = 1
GTotal = 0

Do
Rng.SpecialCells(xlCellTypeBlanks).Delete Shift:=xlShiftToLeft
If Rng.Cells(2, 1) = "" Then
    Rng.Range("2:2").EntireRow.Delete
End If
FirstNumber = Rng.Cells(2, 1)
    Totall = 0
    With Rng
        Set C = .Find(What:=FirstNumber, LookIn:=xlValues, lookat:=xlWhole)
        If Not C Is Nothing Then
                firstAddress = C.Address
                firstType = Mid(Rng.Cells(C.row, (C.Column + 1)), 1, 5)
                Do
                    Totall = Totall + 1
                    GTotal = GTotal + 1
                    Typee = Mid(Rng.Cells(C.row, (C.Column + 1)), 1, 5)
                    Set C = .FindNext(C)
                    NewType = Mid(Rng.Cells(C.row, (C.Column + 1)), 1, 5)
                Loop While Not C Is Nothing And C.Address <> firstAddress And firstType = NewType
                WST.Cells(RowT, ColT) = FirstNumber
                WST.Cells(RowT, ColT + 1) = Typee
                WST.Cells(RowT, ColT + 2) = Totall
                DoEvents
                'Clear Cells from Range
                Set C = .Find(What:=FirstNumber, LookIn:=xlValues, lookat:=xlWhole)
                If Not C Is Nothing Then
                    firstAddress = C.Address
                    firstType = Mid(Rng.Cells(C.row, (C.Column + 1)), 1, 5)
                    Do
                        Rng.Cells(C.row, C.Column) = ""
                        Rng.Cells(C.row, (C.Column + 1)).Value = ""
                        Set C = .FindNext(C)
                        On Error Resume Next
                        NewType = Mid(Rng.Cells(C.row, (C.Column + 1)), 1, 5)
                    Loop While Not C Is Nothing And C.Address <> firstAddress And firstType = NewType
                Err.Clear
                End If
                ColT = ColT + 5
                If ColT > 29 Then
                    ColT = 1
                    RowT = RowT + 1
                End If
        End If
    End With
Loop While FirstNumber <> 0

Application.DisplayAlerts = False
WSV.Delete
Application.DisplayAlerts = True
WST.Activate
MsgBox ("Total completed for " & GTotal & " unique items")

End Sub

Open in new window

SpeedData.xls
0
Comment
Question by:Cartillo
  • 13
  • 11
  • 9
  • +2
36 Comments
 
LVL 77

Expert Comment

by:peter57r
Comment Utility
I suspect that just by adding the lines..
Application.ScreenUpdating = False  
after the Dim statements and
Application.ScreenUpdating = True
before the msgbox statement at the end,
that just doing this will reduce the time by half.
0
 

Author Comment

by:Cartillo
Comment Utility
Hi peter57r,

Thanks for the feedback. Any chance for you to revise the script with  “Dictionary.Scripting” method instead?  
0
 
LVL 77

Expert Comment

by:peter57r
Comment Utility
I have no idea what  “Dictionary.Scripting” is.
0
 
LVL 24

Expert Comment

by:StephenJR
Comment Utility
Cartillo - this looks familiar, but can you explain what needs doing?
0
 
LVL 9

Expert Comment

by:Ramanhp
Comment Utility
0
 

Author Comment

by:Cartillo
Comment Utility
Hi StephenJR,

Thanks for the feedback. If you noticed my workbook, “Total” sheet is consist of unique data from “Validation” sheet, extracted unique data from “Number” column and automatically count total “Type” data that were duped (based on first 5 alphabetic). The attached script is able to perform this,  but its hung massively when its running with a huge data (you may tested with the given sample data). Therefore, if we deploys “Dictionary.Scripting” object method, we would able to reduce the number of hours that we’ve spent into half. Hope you’ll consider my request.  
0
 
LVL 24

Expert Comment

by:StephenJR
Comment Utility
See how this goes. It deletes some columns from your starting sheet so make a copy first if you don't want this, and the output is in two columns.
Sub x()

Dim oDic As Object, vOut(), vIn(), i As Long, j As Long, n As Long

Sheets("Validation").Activate
On Error Resume Next
Rows(2).SpecialCells(xlCellTypeBlanks).EntireColumn.Delete shift:=xlToLeft
On Error GoTo 0
vIn = Range("A1").CurrentRegion.Value
ReDim vOut(1 To UBound(vIn, 1) * UBound(vIn, 2), 1 To 2)

Set oDic = CreateObject("Scripting.Dictionary")

With oDic
    For i = 1 To UBound(vIn, 1)
        For j = 1 To UBound(vIn, 2) - 1 Step 2
            If Not .Exists(vIn(i, j)) Then
                n = n + 1
                vOut(n, 1) = vIn(i, j)
                vOut(n, 2) = Left(vIn(i, j + 1), 5)
                .Add vIn(i, j), n
            End If
        Next j
    Next i
End With

Sheets("Total").Range("A1").Resize(n, 2) = vOut

End Sub

Open in new window

0
 

Author Comment

by:Cartillo
Comment Utility
Hi StephenJR,

Is that possible to copy the data evenly in other columns (have 6 columns altogether). This is because I need to crosschecked more than million data (at this moment I have 1,250,589 data). By doing so I'm able to copy all the data from Validation sheet. Also add number of data "Type" that were duped in the Validation sheet. E.g "ATZDI" duped 2x at Validation sheet. Therefore, "2" will be inputted at "Total Type" column automatically. Hope you will help me to add feature.
0
 
LVL 24

Expert Comment

by:StephenJR
Comment Utility
Try this:
Sub x()

Dim oDic As Object, vOut(), vIn(), i As Long, j As Long, n As Long, p As Long

Sheets("Validation").Activate
On Error Resume Next
Rows(2).SpecialCells(xlCellTypeBlanks).EntireColumn.Delete shift:=xlToLeft
On Error GoTo 0
vIn = Range("A1").CurrentRegion.Value
ReDim vOut(1 To UBound(vIn, 1) * UBound(vIn, 2), 1 To 2)

Set oDic = CreateObject("Scripting.Dictionary")

With oDic
    For i = 2 To UBound(vIn, 1)
        For j = 1 To UBound(vIn, 2) - 1 Step 2
            If Not .Exists(vIn(i, j)) Then
                n = n + 1
                vOut(n, 1) = vIn(i, j)
                vOut(n, 2) = Left(vIn(i, j + 1), 5)
                .Add vIn(i, j), n
            End If
        Next j
    Next i
End With

p = WorksheetFunction.RoundUp(n / 3, 0)

With Sheets("Total").Range("A2")
    .CurrentRegion.Clear
    .Offset(-1).Resize(, 6).Value = Array("Number", "Type", "Number", "Type", "Number", "Type")
    .Resize(n, 2) = vOut
    .Offset(p).Resize(n - p, 2).Cut .Offset(, 2)
    .Offset(p, 2).Resize(n - 2 * p, 2).Cut .Offset(, 4)
End With

End Sub

Open in new window

0
 

Author Comment

by:Cartillo
Comment Utility
Hi,

Thanks a lot. Its really super fast. Can we add the number of time the registered "Type" at Total sheet at Validation sheet. I add the sample data for better view. Hope you will consider this request.  
SpeedData.xls
0
 
LVL 24

Expert Comment

by:StephenJR
Comment Utility
Try this:
Sub x()

Dim oDic As Object, vOut(), vIn(), i As Long, j As Long, n As Long, p As Long

With Sheets("Validation")
    On Error Resume Next
    .Rows(2).SpecialCells(xlCellTypeBlanks).EntireColumn.Delete shift:=xlToLeft
    On Error GoTo 0
    vIn = .Range("A1").CurrentRegion.Value
End With

ReDim vOut(1 To UBound(vIn, 1) * UBound(vIn, 2), 1 To 3)

Set oDic = CreateObject("Scripting.Dictionary")

With oDic
    For i = 2 To UBound(vIn, 1)
        For j = 1 To UBound(vIn, 2) - 1 Step 2
            If Not .Exists(vIn(i, j)) Then
                n = n + 1
                vOut(n, 1) = vIn(i, j)
                vOut(n, 2) = Left(vIn(i, j + 1), 5)
                vOut(n, 3) = 1
                .Add vIn(i, j), n
            Else
                vOut(.Item(vIn(i, j)), 3) = vOut(.Item(vIn(i, j)), 3) + 1
            End If
        Next j
    Next i
End With

p = WorksheetFunction.RoundUp(n / 3, 0)

With Sheets("Total").Range("A2")
    .CurrentRegion.Clear
    .Offset(-1).Resize(, 9).Value = Array("Number", "Type", "Count", "Number", "Type", "Count", "Number", "Type", "Count")
    .Resize(n, 3) = vOut
    .Offset(p).Resize(n - p, 3).Cut .Offset(, 3)
    .Offset(p, 3).Resize(n - 2 * p, 3).Cut .Offset(, 6)
End With

End Sub

Open in new window

0
 

Author Comment

by:Cartillo
Comment Utility
Hi StephenJR,

Thanks a lot for the great help. If I intent to add more data rows, currently its limited to 3 data sets, what should I do? is that any chances the system automatically set a number of data rows based on total data availability at Validation sheet instead of adding this data rows manually?
0
 
LVL 24

Expert Comment

by:StephenJR
Comment Utility
Cartillo - do you mean extra columns, so you might want the results spread out over more columns? We could probably build this into the code and you could change when you needed to, or the user could input a figure when the macro is run.
0
 

Author Comment

by:Cartillo
Comment Utility
Hi StephenJR,

Please help to add this function. I'm having millions of data and it's requiring more data rows to allow data spread of smoothly without any missing of data.
0
 
LVL 24

Expert Comment

by:StephenJR
Comment Utility
Cartillo - please clarify what you mean by extra rows. I don't think I understand what you are asking. Are you talking about data on the Validation sheet or the Total sheet? If the latter which of the two choices above do you prefer?
0
 

Author Comment

by:Cartillo
Comment Utility
StephenJR,

Sorry for confusion, I'm referring to Total sheet. Currently we have 3 sets of data (9 rows). Can we make it this rows being added easily based on data availability at Validation sheet. Let say,its able to create 7 sets (21 rows) or 9 rows (27 rows) automatically based on data availability at Validation sheet. Hope I am not confusing you again.  
0
 
LVL 24

Expert Comment

by:StephenJR
Comment Utility
Cartillo- ok I understand, but above I gave you two choices for how you might want to select the number of columns (columns, not rows!).

1) In the code so you change as appropriate
2) User input when the code is run
0
 

Author Comment

by:Cartillo
Comment Utility
Hi,

Sorry, should be columns not rows. 2nd (user input) is the preferred ones.
0
Better Security Awareness With Threat Intelligence

See how one of the leading financial services organizations uses Recorded Future as part of a holistic threat intelligence program to promote security awareness and proactively and efficiently identify threats.

 
LVL 24

Expert Comment

by:StephenJR
Comment Utility
Hopefully this is it. You enter the number of sets, e.g. 3 or 7 or 9:
Sub x()

Dim oDic As Object, vOut(), vIn(), i As Long, j As Long, n As Long, p As Long, nCol As Long

nCol = Application.InputBox("How many sets of columns for the results (each set has three columns)?", Type:=1)

With Sheets("Validation")
    On Error Resume Next
    .Rows(2).SpecialCells(xlCellTypeBlanks).EntireColumn.Delete shift:=xlToLeft
    On Error GoTo 0
    vIn = .Range("A1").CurrentRegion.Value
End With

ReDim vOut(1 To UBound(vIn, 1) * UBound(vIn, 2), 1 To 3)

Set oDic = CreateObject("Scripting.Dictionary")

With oDic
    For i = 2 To UBound(vIn, 1)
        For j = 1 To UBound(vIn, 2) - 1 Step 2
            If Not .Exists(vIn(i, j)) Then
                n = n + 1
                vOut(n, 1) = vIn(i, j)
                vOut(n, 2) = Left(vIn(i, j + 1), 5)
                vOut(n, 3) = 1
                .Add vIn(i, j), n
            Else
                vOut(.Item(vIn(i, j)), 3) = vOut(.Item(vIn(i, j)), 3) + 1
            End If
        Next j
    Next i
End With

p = WorksheetFunction.RoundUp(n / nCol, 0)

With Sheets("Total")
    .UsedRange.Clear
    With .Range("A1")
        .Resize(, 3).Value = Array("Number", "Type", "Count")
        .Offset(1).Resize(n, 3) = vOut
        For j = 1 To nCol - 1
            .Offset(, j * 4).Resize(, 3).Value = Array("Number", "Type", "Count")
            .Offset(p + 1, (j - 1) * 4).Resize(n - (p * j), 3).Cut .Offset(1, j * 4)
        Next j
    End With
End With

End Sub

Open in new window

0
 

Author Comment

by:Cartillo
Comment Utility
Hi StephenJR,

Thanks a lot. I have tested with huge data and it shows an error as "object define error" at this line: ".Offset(1).Resize(n, 3) = vOut"

I have attached the data that I have used (in text file), can't sent it in excel format since the file is too big. I'm not sure what when wrong.
Sdata.zip
0
 
LVL 24

Expert Comment

by:StephenJR
Comment Utility
If you are using xl 2003 and n is more than 65,000-odd you will run out of rows. Can you move to xl2007? Otherwise will have to come up with a workaround.
0
 

Author Comment

by:Cartillo
Comment Utility
Hi StephenJR,

The data has been arranged in such a way that after a column reached at 65536 row, the new column will takes over the following data from Validation sheet. From my test the code failed to work after reaching 9,000 rows. What's the workaround that we can use?
0
 
LVL 29

Expert Comment

by:gowflow
Comment Utility
Cartillo,

I have been following the threads and due to your persistance of using Scripting.dictionnary I looked it up and looked for the diffrent options there is which are quite intressting.

Meantime I would like to point out to 1 issue:

The issue I mentioned in the previous question re the item 573510 having several Types to which you did not answer is actually repeating in the code proposed by StephanJR. Are you satissfied with compiling all the 573510 and giving you a total of 37 in your initial example or youneed a breakdown by Type that is:
573110      AZNXX      1
573510      BAZWD      3
573510      BBLOU      3
573510      BBLXF      4
573510      BBMBR      3
573510      BBMIB      3
573510      BAZKS      2
573510      BAZKR      2
573510      BAZKO      1
573510      BAZDD      1
573510      BAZCU      2
etc ...
???
As the present code and all suggested ones give you a total by number.

Rgds/gowflow
0
 
LVL 24

Accepted Solution

by:
StephenJR earned 300 total points
Comment Utility
Try this:
Sub x()

Dim oDic As Object, vOut(), vIn(), vOut2()
Dim i As Long, j As Long, k As Long, n As Long, p As Long, nCol As Long, r As Long

nCol = Application.InputBox("How many sets of columns for the results (each set has three columns)?", Type:=1)

With Sheets("Validation")
    On Error Resume Next
    .Rows(2).SpecialCells(xlCellTypeBlanks).EntireColumn.Delete shift:=xlToLeft
    On Error GoTo 0
    vIn = .Range("A1").CurrentRegion.Value
End With

ReDim vOut(1 To UBound(vIn, 1) * UBound(vIn, 2), 1 To 3)

Set oDic = CreateObject("Scripting.Dictionary")

With oDic
    For i = 1 To UBound(vIn, 1)
        For j = 1 To UBound(vIn, 2) - 1 Step 2
            If Not .Exists(vIn(i, j)) Then
                n = n + 1
                vOut(n, 1) = vIn(i, j)
                vOut(n, 2) = Left(vIn(i, j + 1), 5)
                vOut(n, 3) = 1
                .Add vIn(i, j), n
            Else
                vOut(.Item(vIn(i, j)), 3) = vOut(.Item(vIn(i, j)), 3) + 1
            End If
        Next j
    Next i
End With

p = WorksheetFunction.RoundUp(n / nCol, 0)

With Sheets("Total")
    .Activate
    .UsedRange.Clear
    With .Range("A1")
        .Resize(, 3).Value = Array("Number", "Type", "Count")
        For r = 1 To nCol
            ReDim vOut2(1 To p, 1 To 3)
            For i = (r - 1) * p + 1 To (r - 1) * p + p
                k = k + 1
                For j = 1 To 3
                    vOut2(k, j) = vOut(i, j)
                Next j
            Next i
            .Offset(, (r - 1) * 4).Resize(, 3).Value = Array("Number", "Type", "Count")
            .Offset(1, (r - 1) * 4).Resize(p, 3) = vOut2
            k = 0
        Next r
    End With
End With

End Sub

Open in new window

0
 
LVL 29

Expert Comment

by:gowflow
Comment Utility
StephenJR

I am impressed on how efficient and fast processing is the Scripting.Dictionary technique. I never heared of it until Cartillo mentioned it and looked around for info but could hardly find something good.

kindly suggest something where I can find more info on this.
Tks/Rgds
gowflow
0
 
LVL 24

Expert Comment

by:StephenJR
Comment Utility
gowflow,
You can start at this very site, with Patrick's excellent article:

http://www.experts-exchange.com/Software/Office_Productivity/Office_Suites/MS_Office/A_3391-Using-the-Dictionary-Class-in-VBA.html?sfQueryTermInfo=1+30+dictionari

I'm not sure how much of the speed increase is due to the Dictionary and how much is due to transferring data from the sheet to an array, processing the array and then writing back to the sheet. The latter also speeds things up considerably.
0
 
LVL 29

Expert Comment

by:gowflow
Comment Utility
Definitivley array handling is the key issue. I guess dictionnary get's its advantage when you need to sort as understand it has the key capability that accesses the item rightaway when you need it. Tks your link and we never cease to learn !
gowflow
0
 

Author Comment

by:Cartillo
Comment Utility
Hi Gowflow,

Thanks for highlighting the "573510" issue which has several Type data, I'm keep on overlooked on this matter.

Hi  StephenJR,

Apology for this mistake, is that any chance for us to get this data as well? Hope you will assist.
0
 
LVL 29

Expert Comment

by:gowflow
Comment Utility
Cartillo,

By the way the second version of the code I proposed in the previous question dealt with this issue.

StephenJR
Idea is to loop thru Numbers and also loop within the Number found for same occurence of the first 5 digits of the type to which you apply the total. So double loop

Rgds/gowflow
0
 

Author Comment

by:Cartillo
Comment Utility
Hi gowflow,

Yes, you're right but if we use "Scripting.Dictionary" together that will speedup the process extremely.
0
 
LVL 29

Expert Comment

by:gowflow
Comment Utility
Cartillo,
I totally agree I checked the routine displayed by StephenJR and indeed its amazingly faster. Am looking at the thread that StephenJR displayed about scritiingDictionnary and its indeed a good method.

Rgds/gowflow
0
 
LVL 29

Assisted Solution

by:gowflow
gowflow earned 200 total points
Comment Utility
Cartillo,

Do not want to interveen in StephenJR's routine, but if you need to have the output broken down by Type to handle the case of multiple types for one same number ie 573510 then do the following in StephenJR's code.

Replace the routine starting with
==========
With oDic
...
End With
=============

by the following routine:
=================
With oDic
    For i = 1 To UBound(vIn, 1)
        For j = 1 To UBound(vIn, 2) - 1 Step 2
            If Not .Exists(vIn(i, j)) And Not .Exists(Left(vIn(i, j + 1), 5)) Then
                n = n + 1
                vOut(n, 1) = vIn(i, j)
                vOut(n, 2) = Left(vIn(i, j + 1), 5)
                vOut(n, 3) = 1
                .Add Left(vIn(i, j + 1), 5), n
            Else
                vOut(.Item(Left(vIn(i, j + 1), 5)), 3) = vOut(.Item(Left(vIn(i, j + 1), 5)), 3) + 1
            End If
        Next j
    Next i
End With
=================

It should give you the expect result.
gowflow
0
 
LVL 29

Expert Comment

by:gowflow
Comment Utility
By the way I tested this code with over 60000 row on 18 sets of columns and its amazing how fast it run it finishes in less than 1minute !!!

StepehJR your a genius in finding this process !!!
Rgds/gowflow
0
 
LVL 29

Expert Comment

by:gowflow
Comment Utility
Cartillo,

Also a minor change in the For i loop start it at 2 instead of 1 as 1 would be calculating the header which is useless !

also in your original post in Validation sheet you have 2 empty columns for every set of 3 columns so to have the data aligned you should replace
.Offset(, (r - 1) * 4).Resize(, 3).Value = Array("Number", "Type", "Count")
.Offset(1, (r - 1) * 4).Resize(p, 3) = vOut2

By this
.Offset(, (r - 1) * 5).Resize(, 3).Value = Array("Number", "Type", "Count")
.Offset(1, (r - 1) * 5).Resize(p, 3) = vOut2

Rgds/gowflow

gowflow
0
 
LVL 29

Expert Comment

by:gowflow
Comment Utility
Cartillo,
Any news ? Is your issue solved ?
gowflow
0
 

Author Closing Comment

by:Cartillo
Comment Utility
Hi,

Thanks a lot for the great help
0

Featured Post

Threat Intelligence Starter Resources

Integrating threat intelligence can be challenging, and not all companies are ready. These resources can help you build awareness and prepare for defense.

Join & Write a Comment

Drop Down List with Unique/Distinct Values (enhancing the Combo-Box with a few steps and a little code) David miller (dlmille) Intro Have you ever created a data validation list from a database field or spreadsheet column (e.g., Zip Codes or Co…
Introduction This Article is a follow-up to my Mappit! Addin Article (http://www.experts-exchange.com/A_2613.html), it was inspired by an email posting I made to EUSPRIG (http://www.eusprig.org/index.htm), I will briefly cover: 1) An overvie…
Viewers will learn the basics of slicers and timelines for both PivotTables and standard Excel tables in Excel 2013.
This Micro Tutorial demonstrate the bugs in Microsoft Excel for Mac with Pivot Charts.

763 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

7 Experts available now in Live!

Get 1:1 Help Now