Solved

Excel VBA filtering

Posted on 2013-06-26
20
250 Views
Last Modified: 2013-07-24
Hi everyone,

    I have Excel worksheets that have 14 columns (and can have over 100,000 rows). I usually have 5-10 of these worksheets in a single workbook, so I would like to have a module where I can copy and paste in the name of the worksheet that I am working with.
    The details of what I would like to do with each worksheet are below.

  First, Column 7 (G) has to be checked for the presence of 'OK'. This column will be either 'OK', 'FAIL', 'NOTEST', or 'HIDATA' (no single quotes). Only OK is acceptable, all rows that have other than 'OK' can be deleted. (One exception; row 1, the header row for the worksheet, has in this column the single word 'status' in it, and I would like to keep the header row.  

   Next, Column 8(H) needs to be checked for single zero's (by that I mean a lone 0, not a 0 as in 10 or 101 etc.). If there is a lone 0, then it needs to be changed to a 1. Column 9 (I) then needs to checked. If it is non-zero, then the log to base 2 ratio of column 9/8 (I/H) needs to be calculated in column 10. If there is a lone zero in column 9, then it should be changed to a 1 and the log base 2 calculation done in column 10.

 Then, the same has to be done for Column 9 (I). That is, check for a lone zero. If present, change to a 1, check column 8 (H) for non-zero, then calculate log to base 2 ratio of columns 9/8 (I/H) in Column 10 (J). Again, if after changing to a 1 in column 9 and there is a lone zero in column 8, then that has to be changed to a 1 and the calculation done in column 10.

   When this is finished, adjust all of column 10 to a single decimal place, such as 50.1, 0.1 etc.

   Columns 8 and 9 then have to be checked again. This time for any row in which the value in both columns 8 and 9 are less than or equal to 3, the entire row is to be deleted.

   Column 10 then needs to be checked. For any row, that has a value in column 10 of less than 1, the entire row is to be deleted.

Finally, but not necessary if this is too difficult to do, the whole sheet (excluding the header) beginning in column 1 row 2 is to be sorted from high value to low value on column 10.

   I have attached an Excel workbook with 2 worksheets. The fist worksheet (before) is an example, although real sheet could have over 100,000 rows, and the second worksheet (after) is what the before sheet would look like after having done the above.

  Thank you in advance for your efforts.
EE-example-before-after-workshee.xlsm
0
Comment
Question by:vbaabv
  • 9
  • 8
20 Comments
 
LVL 26

Expert Comment

by:MacroShadow
Comment Utility
I think you should break this up into several questions.

This will answer your first question.

Sub DeleteNonOK()
    Last = Cells(Rows.Count, "G").End(xlUp).Row
    For i = Last To 2 Step -1
        If (Cells(i, "G").Value) <> "OK" Then
            Cells(i, "G").EntireRow.Delete
        End If
    Next i
End Sub

Open in new window

0
 

Author Comment

by:vbaabv
Comment Utility
That is a good start. It filters the status column and will get rid of quite a few rows.
0
 
LVL 68

Expert Comment

by:Qlemo
Comment Utility
Please note that usually, the Experts help in learning how to do something, or to get around an error, bug, or change unwanted behaviour, or simplify the logic, but in the end you should be able to code that yourself - with help, if needed. Learning how to do things is the main point.

Starting with changing the logic: Checking H and I can be done in one go - we do not need to replace zero by one for checking if the content is below 3, and there is no other calculation done in those columns. Those rows get deleted, so we should do that as early as possible. And since we already are going thru the rows and cells, let's perform the calculations and other checks at the same time:
Option Explicit
Sub ProcessSheet(sheet As Worksheet)
  Dim CurRow As Range
  Dim Last As Integer
  Last = sheet.Cells(Rows.Count, "G").End(xlUp).Row
  Set CurRow = sheet.[2:2]
  While CurRow.Cells(, 1) <> ""
    With CurRow
      Set CurRow = CurRow.Offset(1)
      If .Cells(, "G").Value <> "OK" _
      Or (.Cells(, "H") < 3 And .Cells(, "I") < 3) Then
        .Delete
      Else
        If .Cells(, "H") = 0 Then .Cells(, "H") = 1
        If .Cells(, "I") = 0 Then .Cells(, "I") = 1
        .Cells(, "J") = Round(Log(.Cells(, "I") / .Cells(, "H")) / Log(2), 1)
        If .Cells(, "J") < 1# Then .Delete
      End If
    End With
  Wend
  
  With sheet.Sort
    .SortFields.Clear
    .SortFields.Add Key:=Range("J:J"), _
      SortOn:=xlSortOnValues, Order:=xlDescending, DataOption:=xlSortNormal
    .SetRange Range("A:N")
    .Header = xlYes
    .MatchCase = False
    .Orientation = xlTopToBottom
    .Apply
  End With
End Sub

Sub Process()
  ProcessSheet Worksheets("before")
End Sub

Open in new window

This code should be located in the main module (ThisWorkbook), and the last sub enlists the Worksheets to process.
Please note that the "after" does not fit with your description - log2 values below 1 are still in it.
0
 

Author Comment

by:vbaabv
Comment Utility
Hi Qlemo,

     Thank you very much for your code. When I tried it out I get a Compile error that there are the wrong number of arguments or invalid property assignment, and Sub Proscess () gets highlighted yellow. Inbetween the parentheses after ProcessSheet Worksheets, I have added "NLP7", "NLP205A", "NAC049", "NLP7-NLP205A" because these are the names of my worksheets.
0
 
LVL 68

Expert Comment

by:Qlemo
Comment Utility
You'll need to write one line per sheet:
Sub Process()
  ProcessSheet Worksheets("NLP7")
  ProcessSheet Worksheets("NLP205A")
  ProcessSheet Worksheets("NAC049")
  ProcessSheet Worksheets("NLP7-NLP205A")
End Sub

Open in new window

0
 

Author Comment

by:vbaabv
Comment Utility
Thank you, howerver, after cutting and pasting you code, I get an 'overflow' error.
0
 
LVL 68

Expert Comment

by:Qlemo
Comment Utility
In the VBA editor? Or when processing one of the sheets?
0
 

Author Comment

by:vbaabv
Comment Utility
I cut and pasted the latest Sub Process() in the place of the earlier Sub Process():
Sub Process()
  ProcessSheet Worksheets("before")
End Sub

And ran in the VBA editor, by clicking the green arrow. I then chose ThisWorkbook.Process and clicked Run, and it comes up with a window stating 'Overflow'.
0
 
LVL 68

Expert Comment

by:Qlemo
Comment Utility
One of the sheets' values, probably - I reckon the log function gets "invalid" data.
To check, you can either put your cursor on the first ProcessSheet call, and press F9 - that sets a breakpoint. Do the same for the other calls. Then (while in the sub Process with your cursor) press F5. Execution should now halt before each worksheet is processed - press F5 again to continue.
Or put only one ProcessSheet line into the sub, until you get the error when executing.

That way you should have located the failing worksheet. You should then check the column values of H and I for the "ok" rows (until a non-"ok" one - this one has not been processed, the bug has to be above that row).
0
What Security Threats Are You Missing?

Enhance your security with threat intelligence from the web. Get trending threat insights on hackers, exploits, and suspicious IP addresses delivered to your inbox with our free Cyber Daily.

 

Author Comment

by:vbaabv
Comment Utility
I keep getting: Run-time error '6': Overflow, even if I use only 1 sheet.
0
 
LVL 68

Expert Comment

by:Qlemo
Comment Utility
If you look at the sheet, has it been processed to a certain extend? For test, you can fill column J with something obvious (e,g, 1234), and the run the Process sub again. Processed rows will change the value in "J".
0
 

Author Comment

by:vbaabv
Comment Utility
It does not appear that there has been any processing. There are the same number of rows (33,260) as in original and when I sort for H, I can see many 0's in columns H and I.
0
 
LVL 68

Expert Comment

by:Qlemo
Comment Utility
But it works with the example sheet you provided? If yes, I have no better idea than to set a breakpoint (with F9) on line 5 (last = ...), call the Process sub with a single sheet, and then step thru with F8 until you get the error. (I do not assume you want to share the original workbook for testing myself.)
0
 

Author Comment

by:vbaabv
Comment Utility
I set the breakpoint and stepped through with F8 and I get the Overflow error when it tries going through line 5.

I have looked over the code and it all looks fine to me. The whole sub is very concise and logical.  I really like the way it has been written.
 
I have attached the original workbook. Unless you know what the treatment conditions are, they data is useless, so not really a problem with attaching it.
0
 
LVL 68

Expert Comment

by:Qlemo
Comment Utility
Ooops, even without having the original workbook I see now - you have more than 32767 rows, and I've used an Integer for the rowcount, which can only hold numbers up to 32767. If we replace the corresponding DIM, it should be fine. Didn't know that restriction still exists in Excel VBA ... So line 4 should be:
  Dim Last As Long

Open in new window

0
 

Author Comment

by:vbaabv
Comment Utility
Hi Qlemo,

  This is fantastic !  Just one small problem. In column J, you have selected for everything 1 or greater. However, what I want is a little bit trickier. I would like everything that is 1 and greater and also everything that is -1 and lesser. So, in other words, eliminate values between 1 and -1, (not including 1 and -1).  

   I think this can be done by changing the following line:
If .Cells(, "J") < 1# Then .Delete  
to :
If .Cells(, "J") < 1# AND .Cells(, "J") > -1 Then .Delete
?
0
 
LVL 68

Accepted Solution

by:
Qlemo earned 500 total points
Comment Utility
Correct. The # after a number means "decimal" (or floating point) value. It shouldn't be necessary, though.
0
 

Author Closing Comment

by:vbaabv
Comment Utility
Great job, Qlemo !

 This is fantastic !  I just had to first check to see how results from the sub compared to my manual processing and there were a few discrepancies in one of the worksheets. There were 7 extra rows from the Sub process than in my manual processing, and I found out that I had for some reason deleted those 7 rows when I should not have. So, your sub produced more accurate results. This sub will be extremely helpful and save a lot of time.  Thanks again, great job coding !

Matthew
0

Featured Post

How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

Join & Write a Comment

Workbook link problems after copying tabs to a new workbook? David Miller (dlmille) Intro Have you either copied sheets to a new workbook, and after having saved and opened that workbook, you find that there are links back to the original sou…
This article is the result of a quest to better understand Task Scheduler 2.0 and all the newer objects available in vbscript in this version over  the limited options we had scripting in Task Scheduler 1.0.  As I started my journey of knowledge I f…
The viewer will learn how to use a discrete random variable to simulate the return on an investment over a period of years, create a Monte Carlo simulation using the discrete random variable, and create a graph to represent the possible returns over…
This Micro Tutorial will demonstrate in Microsoft Excel how to add style and sexy appeal to horizontal bar charts.

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now