Solved

PROBLEM WITH SORTING LARGE QUANTITY OF DATA

Posted on 2011-09-21
66
291 Views
Last Modified: 2012-05-12
I have an Excel 2010 file which was created by dmilie to sort data I have in Excel.  It worked great when I had a certain amoiunt of data but now that I have much more it is not putting the border around approximately half the blocks and I think it's miscounting.  I was wondering if sokeone could help me with this.

I have attached a sample file, sorry it's qutie big but it's one of the smallest one's I have.  It has not been sorted yet, I thought you might want to try it for yourself.  I press CTRL I to initiate the count and CTRL Y to sort the columns in numerical order and highlight the yellow blocks.

Thank you,

Blueyes727.
EXAMPLE.xlsm
0
Comment
Question by:blueyes727
  • 26
  • 19
  • 15
  • +1
66 Comments
 
LVL 41

Expert Comment

by:dlmille
Comment Utility
BlueEyes727 - I'm looking at this.

Which sheet is giving you the problem?  Sheet1, test, or Sheet3?  I take it its probably Sheet3, but just checking.

Dave
0
 

Author Comment

by:blueyes727
Comment Utility
Yes it is sheet 3.

It worked fine but once the data started getting larger I started having these problems.

Thank you,

Blueyes727
0
 
LVL 41

Expert Comment

by:dlmille
Comment Utility
I just wanted to point out that none of your key macros are in the Example.xlsm file.  It might be hard for other folks to help without setting up the question (in spite of the fact there's a solution in the related question) - just thought I'd share that for your future questions...

I loaded Sheet3 from your Example.xlsm file into the latest file from the related question, "COUNT-2-r7.xlsm" to create a test case...

I ran the "Initialize" macro (CTRL-i) to generate counts (needed for sorting).  It took 2.72 minutes on my machine.

I then ran "SortYellow" macro (CTRL-y) to sort the output.  It took 40.43 minutes on my machine...

You're going to have to tell me where in Sheet3 you're not getting proper blocks, as it appears to be correct to me - I spent only 5 minutes going through it, but looked at all 15000 or so rows of data...

PS - if the file I'm attaching looks right, then ensure you've run the steps (I believe you probably have) above to generate the sorted output.  Then, it could be potentially due to memory leaks.  I think the code is pretty tight, so you might start this up with a fresh reboot.

If you run on yours and get different results (when this file looks right), please upload your results - but also ZIP the file to make it smaller...  17 MB is pretty large to be passing back and forth.

I'll ask moderator for alternatives...

See attached,

Dave
COUNT-2-r7.zip
0
 

Author Comment

by:blueyes727
Comment Utility
Thank you, I'll do that and let you know.

Blueyes727
0
 

Author Comment

by:blueyes727
Comment Utility
I'll zip big files from now on.

Thank you,

Blueyes727
0
 
LVL 41

Expert Comment

by:dlmille
Comment Utility
BlueEyes727 - you can login here, with your account (only need to login once, then you get a cookie to autologin from exchange :)

Here, you can upload large files, or whatever - note the question reference, etc...

http://www.ee-stuff.com


Cheers,

Dave
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
disclaimer: I only have Excel2003, so I'm only seeing a partial view of the data.

I see some #REF and #VALUE items in the HOLDFORMULASHT worksheet cells.  I don't know if that is a result of the 2007->2003 conversion or might be a clue to the sorting problem.

@blueyes727
Do you need to see all the combinations or do you need to know the combination that has the lowest sort (key) values?

Is there some threshold beyond which the sort (key) values would allow the process to stop early?

========
@dlmille

It doesn't matter for this problem, but vbBlack = 0.  So you are using a valid color as the missing/not-supplied value for your parameter.

You've done a magnificent job with these questions.  This wasn't an easy problem to solve.
0
 

Author Comment

by:blueyes727
Comment Utility
I'm still testing the file to make sure it's working ok seems to be working perfectly but I just want to make absolutely sure.

The 40 minutes it takes to sort is really slowing me down but that is only because of all the work you have done to help me speed up the process so if I have to live with it so be it..  I'm way ahead of where I was when I furst started.

Thank you,

Blueyes727
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
@blueyes727

did you see my questions?
0
 

Author Comment

by:blueyes727
Comment Utility
Yes, sorry I was not sure if the dlmille had answered you because I have no clue what he said in that post.  :)

I do not think it's all the combinations, it's only for example, if I had A, B, C, D, E, I would need ABC, ABD, ABE, BCD, BCE, CDE.

I really do not have any problems with the combinations, the problem is after the combinations have been generated I must get a total of the white and yellow blocks then sort all those and add a border around the columns which are all the same color.  That's the problem I'm having now.  When I did not have too many combinations it was not a problem but now that |I have much more it's taking a very lomg time to sort those combinations and I thought it was missing some colored blocks and miscounting but I'm testing the file I was sent and it seems to be working ok.  It's still taking a very long time to sort but I might have to live with that, II really do not know it depends of what you guy's tell me..
 
Thank you,

Blueyes727
0
 

Author Comment

by:blueyes727
Comment Utility
Forgot to mention I have a seperate file to generate the combinations.

Blueyes727
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
@blueyes727

I know that you might need to enumerate all the possible combinations.  I wanted to know if you needed all those possible combinations in your worksheet.  Once you've done your sorting, do you need to see any of the combinations other than the lowest?

What Dave has done is exactly what you asked for.  I'm looking at the request and thinking "Why?!?  What could you possibly want with 15000+ rows?"  I'm trying to ascertain your actual information extraction problem and get around your implementation and tool decisions.  Too many times, members will ask "How do I get X to work" questions when their best solutions would result from a "How do/should I solve this X problem" question.  The later question gives experts the context to use the most appropriate algorithm, tool(s), data structures, and code.
0
 

Author Comment

by:blueyes727
Comment Utility
Now I get it and you're absolutely correct.  I do not need the combinations which have no values (i.e. the yellow count is zero) and that is a lot of them.

Can't see the forest for the trees.  Go figure.

Thank you,

Blueyes727
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
What do you mean by "yellow count"?  (sets where there is at least one all-yellow column or something else?) -- same question for "white count"

The way Dave's code is presenting the data, every combination has a non-zero yellow count.

Even removing "zero yellow count" sets, there is still a lot of combination sets to look at.  Please describe your problem and its context for me.  I'm sure the letters mean something to you as well as the cell color.  I don't know if there is some correlation of these two data.

Hypothetically, your problem might be stated as "I need to know the dates when all the combination cells are yellow."  This is a very different problem description, since you don't really need to see all the columns' worth of combinations, just the pertinent dates associated with each combination.

Also, you might know that when the "white count" or "yellow count" for a combination exceeds some threshold, it is not worth inclusion in the output set of data.
0
 

Author Comment

by:blueyes727
Comment Utility
Yes that is what I mean, sets where there is at least one all-yellow column.

Sets which do not have one all-yellow column are not needed.  Forget the all-white columns at this point.  If a set does not have at least one all-yellow column I do not need it.

Blueyes727
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
@Blueyes727

Where does the yellow color originate?
Is there a relationship between the dates and the yellow color?
What do you do with these combinations once they are generated, filtered, and sorted?  Thousands of sets is too much for the human mind to retain or process.  This can't (shouldn't) be the final process.  What's next?

Don't be too brief with your responses.  I'm trying to catch up with this problem.
0
 

Author Comment

by:blueyes727
Comment Utility
I keep all the files and based on information I need in the future I will have to go back to them for reference.  I do not use all the data at once but need all of it for future reference as I do not know which sets I will need in the future.

If the sets which have no all-yellow columns can be removed all the better, I do not need those.  If they cannot then I will live with it.

Blueyes727
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
@Blueyes727

Take a look at this workbook.  I extracted four combination sets from Dave's workbook.  This gives the same information that you had without the need to look at data you do not need.  Of course, I expect that each workbook will still retain the source cells that are used to create the combinations.

In addition to using yellow cell color, I also included the count (4 in this case).  This gives you a quick navigation to each "all yellow" column with Ctrl+rightarrow or Ctrl+leftarrow keyboard combination.

Also, notice that there is a Sortkey column that is computed from the allyellowcount * 1000 + allwhitecount.
SortProblemCounterExample.xls
0
 

Author Comment

by:blueyes727
Comment Utility
I'm not sure what happened but all I see in the file is the following in rows 2, 3, 4 and 5:

A;E;F;O
C;D;Q;R
A;K;L;P
B;K;P;T

I do not see anything else in the other worksheets.

Blueyes727
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
there was only data in Sheet1.  If you are on one of these (data) rows, you should be able to navigate to the next allyellow column with Ctrl+rightarrow or Ctrl+leftarrow.

Did you scroll to the right on sheet1?
0
 

Author Comment

by:blueyes727
Comment Utility
This will not do at all sorry.  I do not even know what I am looking at.  We are deviating way off course with this.

It's ok, do not worry about it, I will work with what I have.

Thank you,

Blueyes727
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
@Blueyes727

>>I do not even know what I am looking at.
In column A, you are looking at the concatenated values of the letter codes.

In columns B through EK, you are looking at the allyellowcount cells.  Since the original letter and color data still exists, you could recreate the combination once you located the exception.  I'm only showing you a very small snippet of the sorted combination data.

Column EL is the allyellowcount for the row

Column EM is the allwhitecount for the row

Column EN is a calculated sort value for the row

>>We are deviating way off course with this.
I'm trying to discuss your actual data and information needs.  I thought that the example workbook would help us.  Please continue this discussion.
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
Depending on your system, this whole process could only require 1-2 minutes, not 40 minutes.
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
Also, note that the data in my SortProblemCounterExample workbook can be filtered on any of the columns, allowing you to 'play' with the results.
0
 

Author Comment

by:blueyes727
Comment Utility
I know you are trying to help me but I really do not know what I am looking at, it makes no sense to me.  I do not know if it's because you saved this in a format other than Excel 2010 but this is surely not what |I need.

All I need is for dimilie to take the last file he sent me and set it to delete the blocks which have a yellow total of zero before it does the sorting.  You might be wondering what I am talking about but if I where talking to him he would know.

I know for sure you are tyring to help me and I am very grateful but this is not what I need and I do not even know how we got here.  This is way over my head, I know you are trying to help  me but I really, realy don't have a clue where you're going with this and I really do not have the time to learn any of this.  If I did I would be trying to do this myself.  That is why I have a subsciption to this site, so I can get what I need done and learn this when I have the time and time is something I do not have much of these day's.

I really do not know how else to explain this.  I don't know what you're tying to do, I don't know what I'm looking at, I don't know what I'm supposed to see.   All I see is the number 4  on the right here and there and a bunch of numbers at the end.  I don't need numbers I need the cells with the colors.

Maybe dmilie does not wish to help me anymore and that is ok, just let me know and we will call it a day.

Thank you,

Blueyes727
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
the 4 is there to make navigation faster and easier.  You can't use the keyboard shortcut to stop on a cell.  If you notice, these 4 cells have a yellow color to help make them stand out visually.  

If you scroll down in Dave's workbook to the place where the yellow counts change from 3 to 4, you will see the AEFO through BKPT combination rows that I used to create what you are seeing -- the last two yellowcount=3 sets and the first two yellowcount=4 sets.

I'm not trying to circumvent Dave's work in getting you a solution.  He is very close.  While awaiting a solution from Dave, why not continue our discussion?
0
 

Author Comment

by:blueyes727
Comment Utility
I wish I could, I really do but you do not seem to realize I do not understand a word of what you are explaining to me.  It's like we are speaking 2 different languages.  I try to do what you are saying,   I press CTRL rightarrow or CTRLleftarrow and see nothing except for 4's here and there and sometimes I'l lsee numbers at the end which do not make any sense to me.  I do no tknow what to do with this file and I'm trying to figure this out while I'm working.

It's very frustrating for me because I know you are trying to help and I can't wrap my head around this.  Seriously.

Blueyes727
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
Each of these four rows is the same as a set of combined data rows.  Where you see four yellow cells in your worksheet with a border, you will find a single yellow cell with the number 4.
0
 
LVL 41

Expert Comment

by:dlmille
Comment Utility
BrightEyes727 -  I'm here and monitoring.  I see the post telling me what you'd like to eliminate and I also have a few ideas.  No worries...  I'm working on it.

There are some days I'm not as active on E-E as others, and that's what I'm going through right now.  But this particular puzzle has been on my mind, as has aikimark's probing...

In the meantime, aikimark is trying to understand what you're trying to accomplish, so he can help as well.

Cheers,

Dave
0
 

Author Comment

by:blueyes727
Comment Utility
I just need the blocks with a yellow total of "zero" to be deleted before the sort so once the sort is done I will have blocks of 1 or more yellow columns.

That is all I need, nothing more.

Thank you,

Blueyes727
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
@Blueyes727

>>I just need...That is all I need, nothing more.

It is acceptable to say you don't understand why you need this and understandable if you have been asked by someone else to do this.  Dave and I want to solve this problem.  I will help Dave and will gladly work with you on this and future questions.  However, if you don't participate in the resolution of your own problem the experts will 'learn' to avoid questions.

There are very many times when I enter an active question thread and have a perspective that leads to a much better solution than those being pursued.
0
 

Author Comment

by:blueyes727
Comment Utility
I don't understand, that really is all I need.

When you and I first started talking about this you made me realize that and I said that was all I needed.

I'm just re-stating what I said in my previous post.

blueyes727
0
Highfive + Dolby Voice = No More Audio Complaints!

Poor audio quality is one of the top reasons people don’t use video conferencing. Get the crispest, clearest audio powered by Dolby Voice in every meeting. Highfive and Dolby Voice deliver the best video conferencing and audio experience for every meeting and every room.

 
LVL 41

Expert Comment

by:dlmille
Comment Utility
blueeyes727 - I'm on the verge of a breakthru, re: timing.  I'll also eliminate zero-yellow valued blocks from the output.

My solution will generate a separate worksheet for the output, in a tab called "Output".  Is that OK?

Cheers,

Dave
0
 

Author Comment

by:blueyes727
Comment Utility
yes.

b;lueys727
0
 
LVL 41

Expert Comment

by:dlmille
Comment Utility
Do you need an option to initialize, then sort yellow - or do you do analysis between those?

Dave
0
 

Author Comment

by:blueyes727
Comment Utility
I'm not sure what you mean.  In the last file you gave me "COUNT-2-r7" (which worked great) I had 3 options.  I would press CTRL I to initialize, that would do the count, then CTRL Y to sort the yellow blocks or CTRL W for the white blocks.  That works fine for me.
0
 
LVL 41

Expert Comment

by:dlmille
Comment Utility
Just was wondering if you needed a ctrl sequence that would do Initialize THEN do the Yellow Blocks right after...

I'm almost done - running a few tests...

Dave
0
 

Author Comment

by:blueyes727
Comment Utility
ok, thank you.

Blueyes727
0
 
LVL 41

Expert Comment

by:dlmille
Comment Utility
It turns out Excel is not all that great at sorting.  I've had something bugging me since the start of this, but dismissed it (and it came back when I found out how long this processing was taking - its important to me that you run many of these a day and waiting 40 minutes on a run can be frustrating, even if its "saving" you time).

As a result, I reverted back and tested sorting once again in VBA - it only takes a few seconds.  The next problem was the need to generate to another tab.  All prior work I'd done had been working "in place".  That was the paradigm - how to sort "in place".  The first time I sorted, I sorted the ranges "in place", with the Excel sort, it was also a sort "in place".  While faster, its obviously not fast enough.

As I broke down the problem into its fundamental elements,

1)  "Initialize" takes 2-3 minutes on the 15K rows - now, that's counting colors and its an area I'm going to be exploring, next - how to speed that up.
2)  Sorting via Excel (and probably my VBA 'in place' methods, was taking the majority of the 40 minutes.  Well, by restructuring (literally) the array I was working with, and leveraging a quicksort, that takes about 1/2 a second.
3)  I looked at the blocking code and timed it - it also takes a chunk of time - and I'll be looking at how to speed that up - it currently takes about 6 minutes.
4) Now that I stepped "outside the box" - allowing myself to output to a different place, I can generate the output tab results in about 10 seconds.

So, give me a bit more to look at the Initialize and BOXing step.  No promises, here, but I have a few ideas I want to try...

Solution forthcoming, soon!

Dave
0
 
LVL 41

Expert Comment

by:dlmille
Comment Utility
>>Excel is not all that great at sorting

let me rephrase that.  The way I was using Excel to sort was not all that great...

:)
0
 

Author Comment

by:blueyes727
Comment Utility
Thank you so much, that is great.  Looking forward to trying it.

Blueyes727
0
 
LVL 41

Expert Comment

by:dlmille
Comment Utility
Here's what I have right now, if you want to try it out.  Its NOT the final solution :)

Dave
COUNT-2-r9.xlsm
0
 

Author Comment

by:blueyes727
Comment Utility
I'll try it out for sure.

Thank you,

Blueyes727
0
 
LVL 41

Accepted Solution

by:
dlmille earned 500 total points
Comment Utility
@aikimark - if you're still monitoring - I'd appreciate if you'd do a few tests on this.  I ran it through the ringer, but just trying to put it "to bed"...
-----------------------------------------------------------------------------------------
@blueeyes727

I'm very curious at this point.  If you can share, please let us know what this process is all about.  What are you using the data for?

Its been fun - and pushed my thought processes a bit on how to "skin this cat".  Unfortunately, there's no faster way to count colors (I have some ideas/tricks to try, but need to deliver this and move on - if I come up with something, I'll certainly advise at some point in the future!).  

WHAT HAS BEEN DONE:

1) We've greatly improved the time it takes to Initialize (worst file = 4 minutes probably about the same) sort and generate / block output (worst file = 40 mins down to about 1.5 minutes - significant!)
2) We don't touch the original tab that you're Initializing - so all your single/double row data or extra lines won't be affected.  The output will be all blocks >=3 rows high, and will have a nice 3 blank row separation between each.  Also, no blocks with yellow count = 0 are generated in the output.
3) There's a status bar notification so you'll know what's happening (see lower left corner at the bottom of excel, if your status bar is active, you'll see progress notifications).
3) We now have a CommandBar menu (more on that) to run each of the options you've asked for in the past questions.

Hopefully, you can get a lot more work done with lower blood pressure, in the days coming!

HOW TO RUN THE MACROS - the Addins Command Bar:

When you open the file, you'll find 4 of the test file (tabs) in the workbook.  You should also notice you have an ADDINS-MENU on the Ribbon.  
 Getting to the Add-ins CommandBar Area
Click Addins and you should see the "CountColor!" menu.  From there, you have the option to Initialize, Sort (will be based on Yellow), Block Yellow, Block White, and Block Both Yellow and White...
 Running CountColor! Commands from the CommandBar
A COUPLE POINTS ON HOW THE COMMANDS WORK:

If you try to Sort before you Initialize, Excel will go ahead and initialize first. (if it finds the summary column header on the far right, it will assume its initialized!).  If, for some reason, you change the data on the far right, or delete some of the data on the far right summary columns, there could be an error.

The command blockYellow and Sort command are the same, if you are on the source sheet.  If you are on the output sheet, blockYellow will not sort first, as its already sorted.  Again, the output sheet will never be re-sorted (e.g., if you hit BlockWhite, you get that without sorting.  But if you do BlockWhite from the source sheet you will get sorting by WHITE then blocked by WHITE.  Another example, if you hit BlockWhite on the source sheet, it will sort with White as the dominant factor, then on the output sheet, if you select BlockYellow, it will just highlight yellow blocks, not resort.

If you try to Sort on the Output tab, you'll be alerted to do it on a source tab (where the original data is).

You can do ANY command from a source tab, but on the Output tab, you can block Yellow, block White, and Block Both Yellow and White (it will not resort).


Please see attached file.

blueeyes727 - Please feel free to advise any issues - even after this post is closed.

Cheers,

Dave
finalCount.zip
0
 

Author Comment

by:blueyes727
Comment Utility
Amazing.  I'll let you know how it went later on today.

Thank you so much.

Blueyes727.
0
 

Author Comment

by:blueyes727
Comment Utility
Amazing, utterly amazing.  I'm still testing it but wow, amazing.

Thank you so much.  You have no idea how much time and grief you are saving me.

I'll get back to you later on today.  Just thought I would let you know how it's been going up until now.

Blueyes727

0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
@Dave

I run Excel 2003, so I can't test the code.  With the paucity of informational feedback from the OP, I stepped back from the problem.

However, I'll share my thoughts with you for what I expect will be another follow-up question on this (almost a Columbo experience..."Just one more thing...").

* Convert the source rows into an integer array or collection of vectors.  I tried using a byte data type but the direct data transfer to Excel doesn't like bytes.
My Fast Data Push article: http://www.experts-exchange.com/A_2253.html

* The color of a cell will translate into a numeric value = 2^x (I used 1 for white and 2 for yellow in my test).  I only access the .Interior.Color property once for the source cells.

* You don't really need to count -- AND the (current combination) array elements in your iteration loop.  If they are the same, you will get a non-zero result (all 1s=1, all 2s=2).

* The resulting vector from your iteration loop will tell you how many of each value you have.  With [0,1,2] possible values, I created a (0 to 2) array for my count array and used the numeric values in my vector to index into the array.

* I've already described the sort value formula in a prior comment

========
The workbook I posted reflects the vector.  I used a 4 since that was the group size.  It isn't actually a sum of anything.  I do have to set the cell color property of the allyellow cells, but that isn't any more expensive than setting the border operation in your code.
0
 
LVL 41

Expert Comment

by:dlmille
Comment Utility
@aikimark - a 2003 version as FYI only...

Dave
finalCount.xls
0
 

Author Comment

by:blueyes727
Comment Utility
Hello,

Well I have worked with it all day, I've had no problems, it works perfectly.  I can't thank you enough for this.  I'm sure you have no idea how much you are helping people here.  Let me say, you are helping in a bit way.  This is good.

Thank you,

Blueyes727
0
 

Author Comment

by:blueyes727
Comment Utility
I meant you are helping in a big way... : )

Blueyes727
0
 
LVL 41

Expert Comment

by:dlmille
Comment Utility
I'm so glad :)

So - what is it you're using it for, if you don't mind sharing?

Dave
0
 

Author Comment

by:blueyes727
Comment Utility
Oh don't go there, everything goes in Excel over here.  Different files have different dtata, letters mean different things depending on the file they use.  It's a freaking nightmare.  They have to sort all this info just to see what's on one line.  I really do not get it.  I told them there are programs out there that will work a lot better but they just do not want to try.  One day they said, but for now it woudl be too much work and they do not have the time.  I think it's just an excuse, they're Excel freaks, I really think they have an Office DVD on an altar somewhere and they worship it every day...or at least a few times a week.  

They used to have someone who was really good with Excel and that person moved on an now they're scrambling because something happened with the file they had to count this.  Believe it or not it was on someone's computer and not on the server they said.  OMG!!!  I see stupid people they're everywhere!!!  I told them to get their IT guy's to check it had to be there somewhere, they said the guy's checked and it was not.  I still do not believe that but that's another story.

So now they have a file to sort and a file to count until the cow's come home.  I'm sure they will not need to sort 100,000 lines, they will not even come close to this but I just wanted to make sure if it ever happens they need to, some poor dude will not have to go through what I went through to get here.  I should of asked for this a long time ago.

I really can not thank you enough for this, you have saved me a lot of time and effort.

Thanks again,

Blueyes727
0
 

Author Closing Comment

by:blueyes727
Comment Utility
The people you have working over there are awesome.

Thank you for everything,

Blueyes727
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
Thanks, Dave
0
 
LVL 10

Expert Comment

by:broro183
Comment Utility
hi everyone,

It's a long thread that is definitely worth the read with the challenging question and the thoughtful approaches.

Dave,
I whole-heartedly agree with Aikimark - "You've done a magnificent job with these questions.  This wasn't an easy problem to solve." :-)

I know the thread is over now but I'll throw my 2 cents in anyway...
- from back near the start of the thread - if you want to minimise the file sizes when using excel 2007, I recommend the binary format (ie .xlsb) because it can result in much smaller file sizes while still saving the macros.  For example the "COUNT-2-r9" file saved with the following file sizes for me:
.xlsm = 20.4meg, .xls = 62.7meg, or .xlsb = 11.2meg.
Ooopps, I take that back as I can't figure out how to upload the file type!
(I thought we just needed to change the file extension and zip them to ensure they will upload on E-E BUT even that doesn't seem to work.)

- The fewer "hits on the sheet" the faster the code will go, so I have used variant arrays (see the link I've added to Aikimark's article) and in some places the Array function. There are more parts of the code that code benefit from conversion to variant arrays which I didn't get to. (Perhaps not as fast as a byte array mentioned by Akimark which is not (yet?) functional, but we can see that the variant array will work)

- I've read at some time in the past (but I can't find the source now) that unioning ranges can get painfully slow when there are ~400 non-contiguous ranges so I've modified the boxYellowColumns Sub to union "NumOfBlocksToUnion" based on the public constant of that name, do the borders and then reset the unioned range. I'm using my laptop for doing other things at the same time as testing, so I can't give very accurate/consistent results but it does seem to be faster when some blocks (between 50-400) are unioned (see some times from my laptop on the Notes sheet of the attached file).
Feel free to test out different values for the  "NumOfBlocksToUnion".

- I tried to make the GenOutput faster by using a Union approach too, but couldn't quite get it to work. So I have commented out my attempt in case my error (probably relating to adding the 3 blank rows b/n each set) is obvious to someone else.

- I'm a fan of With statements so I have made a few other changes, without testing the performance impact, so hopefully, they don't slow the code down!

- Another thing that I didn't test for performance but I suspect should make a difference was moving the "Redim Preserve..." out of the loop so that it is only done once.

Please see the attached file for code that is slightly modified from Dave's version. My times are consistently slower than the ones mentioned earlier in the thread and I'm guessing that it is due to my laptop's lack of grunt (Celeron with Vista on SP2) &/or the fact that I was doing other things while testing. I think I have made the code slightly faster & would be interested to hear how the speed compares to the previous file on other people's machines. NB: I have only done a simple test on the right hand 3 "initialise" columns & haven't attempted any detailed validation. I know the problem is solved, but could someone please run a test & let me know the times from each of the different files?


Aikimark,
I realise your code is not a finished product but I am interested in seeing it, would you mind posting it with the warning that it is "incomplete"?
(I had originally hoped that creating an array of the interior.color's would be as easy as the one liner approach that we can use to assign a range's values or formulae to a variant array but it seems that's not the case :( .)

thanks
Rob finalCount-v18--.xls-.zip
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
@Rob

When my pursuit of a 'summary line' solution met with resistance, I deleted the code.  I'll see if I can reconstitute it from memory.  The code didn't do any of the combination stuff.  It was a proof-of-concept on the bit-masking of the encoded color values.  Dave or Blueyes727 might be able to tell us how many non-yellow groups were omitted out of the 10626 possible combinations (24 | 4).  Do you still want to see the bit masking?
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
@Rob

I suspect that the users would have been well served by a more compact view of the data with the ability to reconstitute any group on demand -- a detailed view of that group's rows.
0
 
LVL 10

Expert Comment

by:broro183
Comment Utility
hi Akimark,

I thought that the code may have been an off the cuff draft :-)

Yes please, if it's not too much trouble to redo, I would love to see the bit masking (see my signature). I agree a more compact view would probably be helpful - I zoomed the screen to 10% in attempt to see more information at once and try to identify any patterns & it just becomes an abstract picture!

Thanks
Rob
__________________
Rob Brockett
Always learning & the best way to learn is to experience...
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
This should allow you to play and step through the code.  In my original test, I assigned the intAndResult() array to the next available row, but that worksheet only had one test group comprised of RandBetween(1,2) values rather than the colors.  Once you create an integer array of bit (color) indicators, you would do the bitAND operations during your combination generation step.  This code simulates some of the pre-processing as it steps through the rngGroup cells, converting the cell color into an intColorMask bitmask value.

Option Explicit

Public Sub BitMaskTest()
    Dim rngGroup As Range
    Dim rngCell As Range
    Dim intColorMask() As Integer
    Dim intAndResult() As Integer
    Dim lngColorCount(0 To 2) As Long
    Dim lngRow As Long
    Dim lngCol As Long
    On Error Resume Next
    Set rngGroup = Application.InputBox(prompt:="Select Group Cells", Type:=8)
    If rngGroup Is Nothing Then
        Exit Sub
    End If
    ReDim intColorMask(1 To rngGroup.Rows.Count, 1 To rngGroup.Columns.Count)
    ReDim intAndResult(1 To rngGroup.Columns.Count)
    
    For lngRow = 1 To rngGroup.Rows.Count
        For lngCol = 1 To rngGroup.Columns.Count
            Select Case rngGroup.Cells(lngRow, lngCol).Interior.Color
                Case vbYellow
                    intColorMask(lngRow, lngCol) = 2
                Case vbWhite
                    intColorMask(lngRow, lngCol) = 1
                Case Else
            End Select
        Next
    Next
    For lngCol = 1 To UBound(intColorMask, 2)
        intAndResult(lngCol) = 3
        For lngRow = 1 To UBound(intColorMask, 1)
            intAndResult(lngCol) = intAndResult(lngCol) And intColorMask(lngRow, lngCol)
        Next
    Next
    For lngCol = 1 To UBound(intColorMask, 2)
        lngColorCount(intAndResult(lngCol)) = lngColorCount(intAndResult(lngCol)) + 1
    Next
    rngGroup.Rows(rngGroup.Rows.Count).Offset(1).Insert xlShiftDown
    Set rngCell = rngGroup.Rows(rngGroup.Rows.Count).Cells(1, 1).Offset(1, 0)
    For lngCol = 1 To UBound(intColorMask, 2)
        Select Case intAndResult(lngCol)
            Case 1  'all white
                rngCell.Offset(0, lngCol - 1).Interior.Color = vbWhite
            Case 2  'all yellow
                With rngCell.Offset(0, lngCol - 1)
                    .Interior.Color = vbYellow
                    .Value = 3
                End With
            Case Else   'reset to default cell interior color (= no color)
                rngCell.Offset(0, lngCol - 1).Interior.ColorIndex = xlColorIndexNone
        End Select
    Next
    Debug.Print "Mixed count: " & lngColorCount(0), "All White Count: " & lngColorCount(1), "All Yellow Count: " & lngColorCount(2)
End Sub

Open in new window


When run against the three groups in Dave's finalCount.xls workbook, SmallTest tab, the following should appear in the Immediate window.
Mixed count: 104            All White Count: 35         All Yellow Count: 1
Mixed count: 100            All White Count: 39         All Yellow Count: 1
Mixed count: 82             All White Count: 50         All Yellow Count: 8

Open in new window

These three groups would be assigned sort values of 1035, 1039, and 8050 respectively.

Note:  The 3 assigned to the all yellow column is the result of Dave's test sheet only have groups of three, rather than 4.
0
 
LVL 41

Expert Comment

by:dlmille
Comment Utility
@Rob - thanks for putting this under the "microscope" - good comments!  I, too, fill the sheet data into a dynamic structure, so "hit the sheet" as you call it, minimally.  Counting and loading the structure at the same time could have saved time, but I saw diminishing returns on that (effort, versus improvements already made) - thus the BOXing routine could have been faster (having loaded in the colors, as opposed to testing against the sheet in the loop).

I found in reality that building the union cost alot of time, hence didn't go that route, either.  I don't use variant datatypes unless I need the flexibility: of the larger scale the datatype gives me, the data is unknown, etc.  Otherwise, I type the variables to the data to be used - using variant otherwise should not improve performance.

genOutput was very fast by my estimation, so didn't see the need to generate in any other fashion.  The array structure was sorted, so it was merely the act of copy/paste and given output formats were required, I took Excel to be expeditious enough to do that as opposed to reconstructing it.

The fact of the matter is that an "OK" operation was made inordinate due to significant #'s of rows - from a test of a few hundred to tens of thousands.  Reducing the processing time from 40 minutes to 1.5 seemed enough though I did try more iterations with different approaches (boredom or the geek in me, I guess, lol).  

I did try using With at key points, trying to see if that would improve performance, and unfortunately, it did not, in this case.  Probably because the With was usually reset at each loop iteration - it wasn't standing outside the loop as in other times when I'd used with to performance effect.  Without the performance improvement or in order to reduce the prefix addressing an object, I'm not a big fan of the With statement - just to use it at any opportunity...

While I doubted moving redim/preserve out of the loop would have made much difference, it is a relatively "costly" operation, and if one could "count ahead" to effectively dimension the array at the start, perhaps that could improve performance.  I tested this and the results were the exactly the same.

One thing I did as I worked on tackling the "time" issue, was to put time stamps on every routine, so I knew those that I needed to tackle and those that were marginal return, as a result there were areas perhaps you went that I didn't as I didn't see payout in that effort (e.g., such as genOutput).

Putting all your modifications to work on my machine, for an apples-to-apples comparison had these results:

Initialize:  1.98 minutes (119 seconds)*** vs original 3.1 minutes (186 seconds) - WOW - I've got to lookin to this, and good thing I did! See footnote.
SortYellow: .97 minutes (58 seconds) vs original 1.13 minutes (68 seconds)

*** Your approach to counting colors using SearchFormat technique appears to have paid off, improving initialize processing by roughly 33%.  Well done!  I will definately remember this one - great learning, Rob!  I like it alot!

Interestingly enough, when I just replace my cntColorCol() function with your new, improved, superb (!) cntColorv2, please find the following timing:

Initialize:  1.7 minutes (102) seconds versus (see above).

Here's one to think on (when I was trying to come up with a new BOXing routine, counting colors was on the "brain" - I noticed the autofilter is almost "instantaneous" at filtering to cells of a specified color - could this be somehow leveraged?

Rob - I always enjoy and welcome your input/critique of my programming techniques, though I don't laud them as anything special, lol.  You bring a refreshing perspective and a "perhaps this can help" additude that I greatly appreciate.  You make one think - I learned something significant, wrt color counting - and that's great!

ALL - I've incorporated Rob's cntColorv2 function which appears superior.  It improved the original finalCount.xlsm file to just under 2 minutes (as opposed to just over 3), and that could help - so blueyes727 - this is the file to use, for enhanced performance.

This one's called finalCount rRob.xls (version "Rob" :)

Regards,

Dave
finalCount-rRob.zip
0
 
LVL 10

Expert Comment

by:broro183
Comment Utility
hi Dave & Aikimark,

Thank you for your replies :-)

Work's "financial year end" is keeping me busy this week so I probably won't get to play with this any more for a week or two - by which time it'll be old news and the OP will probably have finished all his work with the time savings! ;-)

Aikimark,
Your code makes sense to me, I just need to sit down & go through the second half, around the ".insert" section, to understand it properly.

Dave,
ditto, I like reading yours & Aikimark's code too as it opens me to new ideas as well. PLUS, I find that it much easier to suggest the small "perhaps this can help" ideas than it is to do the hard yards that you've done by starting from scratch.

I'm curious and I wonder, just how feeble is my laptop...?
What is the total time for your computer take to run the "Block Both Yellow & White" option on the "Original Sheet3" (without pre-running the initialise code)  "as is" in the v18 file I uploaded?
(this is the option I used when testing - ie the times in rows 43, 45:46 of the Notes sheet. These times differ dramatically from the "original" file timed in row 44. However my machine may have been misbehaving!)


hmmm, yes I think it could be leveraged...
I like your idea about the autofilter, the more native functionality* we can use the greater the chance it can be made to hum along even quicker. *I hope the SearchFormat option works in 2003, I have a nagging feeling that it may be new in 2007.
How are these as possible structures / air code...


1)
Use the arrays created near the start of Aikimark's code to test the sum of the bitmasks for four sequential "rows" & then block (if appropriate), or add to a Range, based on the indices of the current array elements being processed.

2)
'turn off  app settings eg page breaks, calc's & display etc
'apply the filter
'populate a range variable with the specialcells(xlvisible) cells in the filtered column
'test that there aren't more than 8,192 separate areas (for 2007 & earlier)
for each smallBlock in Rng.areas
if smallBlock.rows.count >=4 then
'add the rng to a new variable or block it immediately or add it to a union ';-)
'that is populated with "x" discontiguous ranges & then flushed by blocking the ranges in one og, after which the union can occur again for another "x" times before blocking multiple ranges in one go.
'add to a counter (if needed)
end if
next small block
filter next column '& repeat ie loop back to the top
'reset app settings

All good fun, but I better wander off now,
'til next time...

Rob :-)
0
 
LVL 41

Expert Comment

by:dlmille
Comment Utility
Rob - timings (my last set of timings was run on a Quad CPU).  I just ran your code, "as-is" on my laptop - a Lenovo Thinkpad W500, Vista, Duo T9400 @ 2.53 Ghz processor, Excel 2007:

Initialize: 3.1 minutes (186 seconds)
Sort/Block Both White/Yellow w/ sort on Yellow: 3.2 minutes (192 seconds)

Cheers,

Dave
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
I think the fastest approach is to create, filter, and sort the combinations in memory and then create the blueyes727-desired output from the in-memory data structures.
0
 
LVL 10

Expert Comment

by:broro183
Comment Utility
hi everyone,

It's probably not relevant to this thread now that it is solved etc, but in case someone wanders past when searching how to count colours - here's a link to a page on Bob Phillip's site that I've just come across which shows some examples: http://www.xldynamic.com/source/xld.ColourCounter.html (seems similar to some of the code in this thread).

Rob
0

Featured Post

What Security Threats Are You Missing?

Enhance your security with threat intelligence from the web. Get trending threat insights on hackers, exploits, and suspicious IP addresses delivered to your inbox with our free Cyber Daily.

Join & Write a Comment

What is a Form List Box? (skip if you know this) The forms List Box is the alternative to the ActiveX list box. If you are using excel 2007, you first make sure you have a developer tab (click the Orb)->"Excel Options"->Popular->"Show Developer tab…
Introduction This Article briefly covers methods of calculating the NPV and IRR variants in Excel as well as the limitations in calculating and interpreting IRR results. Paraphrasing Richard Shockley, author of my favourite finance reference tex…
The viewer will learn how to use the =DISCRINV command to create a discrete random variable, use this command to model a set of probabilities and outcomes in a Monte Carlo simulation, and learn how to find the standard deviation of a set of probabil…
This Micro Tutorial will demonstrate the scrolling table in Microsoft Excel using the INDEX function.

728 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

8 Experts available now in Live!

Get 1:1 Help Now