Link to home
Start Free TrialLog in
Avatar of WeThotUWasAToad
WeThotUWasAToad

asked on

Modify Excel AGGREGATE() formula to include dates

Hello,

This post is a follow-up to two previous threads located here:

Extract certain items from one list to create another list in Excel (closed 2017-10-28)

Understanding a few parts of an array-entered formula in Excel (closed 2017-11-21)

Briefly, the solution to those threads consists of an Excel formula which selectively extracts certain items from a source list and displays the extracted items in a new list while maintaining the correct order of the extracted items. Following is the solution formula and a screenshot from the attached file:

=IFERROR(INDEX(B$1:B$13,AGGREGATE(15,6,ROW(C$4:C$13)/(C$4:C$13=F$2),ROWS($4:4))),"")

Open in new window

User generated image
Now I'm wondering how to modify the above formula to allow for a date column in the source list (which may or may not be in order chronologically) and then display the results in a list which is ordered by date as shown here (eg Abigail was listed first in the previous result lists but is now listed last because her associated date is later than the others):

User generated image
As shown in the next screenshot, the attached file contains solutions from the two previous threads (AutoFilter 1 and AGGREGATE() 1) and accompanying notes (which are not directly related to the current thread) as well as a new sheet tab, DisplayByDate from which the previous screenshot was taken.

User generated image
Thanks

AutoFilter_array-formula-evaluation.xlsm
Avatar of byundt
byundt
Flag of United States of America image

If you want to return the names where column D matches the value in cell H2 in order by the dates in column B, the trick is to first find the date, then use those dates to get the corresponding names.

Regular formula for date (does not need to be array-entered):
=IFERROR(AGGREGATE(15,6,B$4:B$13/(D$4:D$13=H$2),ROWS($4:4)),"")

Regular formula using that date to return the corresponding name (does not need to be array-entered):
=IF(G4="","",LOOKUP(2,1/((B$4:B$13=G4)*(D$4:D$13=H$2)),C$4:C$13))

The AGGREGATE formula shown above tries to find the nth earliest date where the column D is satisfied. You can break the formula apart similar to the one previously posted.

The LOOKUP formula works differently. LOOKUP has a number of interesting properties:
1. Contrary to the documentation, you do not need to sort the second parameter in ascending order.
2. LOOKUP ignores all data that is a different type from the first parameter. In other words, it will ignore error values.
3. If the search array is not sorted, LOOKUP returns the last match for data of the same data type as the first parameter.
4. If a third parameter is used, LOOKUP returns the corresponding value from the range or array of values in the third parameter as the match in the second parameter.

Putting it together, the LOOKUP formula searches for 2, a value it will never find because the Boolean expression returns an array of only 1 or error values. The error values correspond to a failure to match either the date or the column D value. The value of 1 (should be unique) occurs when both the date and column D value are satisfied.

Note: the LOOKUP formula will return error values if there are ties in the dates--only the last corresponding name will be returned.
AutoFilter_array-formula-evaluation.xlsm
The workaround for ties in the dates forces me back to the INDEX/AGGREGATE construction:
=IF(G4="","",INDEX(C$4:C$13,AGGREGATE(15,6,ROW(C$4:C$13)/((D$4:D$13=H$2)*(B$4:B$13=G4)),COUNTIF(G$4:G4,G4))-ROW(C$4)+1))

This formula is very similar to the one previously proposed in question 2. The difference is the COUNTIF, which determines whether you want the first, second, third, etc. instance of a given date.
AutoFilter_array-formula-evaluation.xlsm
Avatar of WeThotUWasAToad
WeThotUWasAToad

ASKER

Brad,

Thanks for your solutions and explanations. It's fun (most of the time) to be on the steep part of a learning curve which is how I would describe my experience with these recent threads.

I've been going through the details of your latest attached file and I confess, I am struggling mightily on this one.

I'm following the same process as with previous formulas (ie trying to break them down into their smallest components and then reassemble them to see what's happening). Also, since this go-round has some new functions — not new to me necessarily but new to this topic — I created some additional tabs (see attachment):

[DisplayByDate_from Brad] — renamed the DisplayByDate tab from your most recent attachment
[DisplayByDate_Date] — to evaluate the date formula in column G
[DisplayByDate_LOOKUP()] — to evaluate the ...LOOKUP()... formula in column H
[DisplayByDate_COUNTIF()] — to evaluate the ...INDEX()...COUNTIF()... formula in column P

I feel like I understand the [DisplayByDate_Date] formula pretty well but [DisplayByDate_LOOKUP()] is giving me fits. Right now, I'm just trying to understand the first factor of the denominator in the quotient acting as the lookup_vector argument for the LOOKUP() function.

For example, how do the 7th & 10th terms in the array resulting from B$4:B$13=G4, ie:

        {FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;FALSE;FALSE;TRUE}

evaluate as TRUE since B$4:B$13 has a non-zero value in all 10 rows but column G has a value of zero in all but the first three rows?

Also, can you elaborate and/or give a simple example of the following paragraph — not so much the 1st sentence which I think I understand, but the 2nd & 3rd sentences:
Putting it together, the LOOKUP formula searches for 2, a value it will never find because the Boolean expression returns an array of only 1 or error values. The error values correspond to a failure to match either the date or the column D value. The value of 1 (should be unique) occurs when both the date and column D value are satisfied.
That idea of searching for a value you know you will never find is an odd concept (at least for me to get my head around). I understand that the Boolean expression can never return anything other than 1 or zero, but why does it matter what it returns if you are never going to come up with a 2 anyway?

By the way, this is quite a coincidence but if you look at the numbering in column A down around row 50 or so in the tab [AGGREGATE() 1] in the file I attached to my OP in this thread — so prior to your posts using LOOKUP() — you will find another LOOKUP(2,...) formula which you gave me several years ago to scan up to find the first value you come to. I've never understood how it works but it's a great formula! :P (hehe) (Maybe I should dig into that as my simpler example.)

I've got some other detailed questions but before posting them, here is a much broader question I'm wondering about:

Is there a better way to approach a formula like this than beginning with the smallest items in the deepest parts of the nest and then working your way out from there (which is the method I typically use to attack a formula)?

Thanks,

Steve

Note that I've renamed this latest iteration of the file:

AutoFilter_ListByDate.xlsm
ASKER CERTIFIED SOLUTION
Avatar of byundt
byundt
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Your workbook asked why INDEX uses a range beginning with row 1 while the other parts of the formula start with row 4?
=IFERROR(INDEX(B$1:B$13,AGGREGATE(15,6,ROW(C$4:C$13)/(C$4:C$13=F$2),ROWS($4:4))),"")

Open in new window

If you look at the third parameter in the AGGREGATE, it is returning row numbers that satisfy the criterion C$4:C$13=F$2. AGGREGATE then returns the n smallest such row numbers as you copy the formula down, where n is determined by ROWS($4:4).

Since AGGREGATE is returning a row number, I need the range in INDEX to begin with row 1.

I could have avoided that issue (and made the formula a little longer) by converting the row number to an index number. You do that by subtracting 3 for a range starting in row 4. If I put a 3 in the formula and the Asker's real workbook started on a different row, the Asker would occasionally complain that the formula didn't work. I avoid that problem with the following formula using -ROW(C$4)+1 to subtract 3 from the results of the AGGREGATE.
=IFERROR(INDEX(B$4:B$13,AGGREGATE(15,6,ROW(C$4:C$13)/(C$4:C$13=F$2),ROWS($4:4))-ROW(C$4)+1),"")

Open in new window

For example, how do the 7th & 10th terms in the array resulting from B$4:B$13=G4, ie:

        {FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;FALSE;FALSE;TRUE}
You should note that (B$4:B$13=G4) is an array expression. It compares each cell in the range B4:B13 to cell G4, returning TRUE if they match and FALSE otherwise. Each element in the array is the result of comparing one cell from column B to cell G4. The seventh and tenth elements of that array evaluate as TRUE because the dates in B10 and B13 both equal G4.
Brad, sincere apologies for taking so long to comment. Between the holidays and some health issues, I have not yet spent the time needed to post a well-thought-through response. Steve
I'm still working on this.
This thread is still open.
Sorry for the extended absence. I have been dealing with some health issues.
No problem. I'm still here if you need more help.
Brad, sincerest apologies for leaving this open so long. It's by far the most involved Excel topic I've ever tried to tackle and after struggling to get my head around it for a while I just began neglecting it. I feel bad closing it because you obviously spent a lot of time explaining things but perhaps I can revisit it at some point. Thanks.