<

Calculating Distinct Counts in Access

Published on
81,206 Points
63,006 Views
22 Endorsements
Last Modified:
Awarded
Editor's Choice
Community Pick
by Patrick G. Matthews

1. Introduction


All major SQL dialects allow for a Count aggregate function, and Access's native Jet SQL is no exception.  As in other database platforms, Access's Count() (as well as the "domain aggregate" DCount()) function returns the number of records pertaining to a grouped result set and meeting the criteria specified in the query (in the case of Count()) or in DCount() in the third argument of the function.

Consider these example data:

Category  Num
-------------
x         1
x         1
x         2
x         3
y         1
y         1
z         1
z         1
z         2

Open in new window


In this case, it is very simple to get a count of items for each Category:

SELECT Category, Count(*) AS Items
FROM SomeTable
GROUP BY Category

returns:
Category  Num
-------------
x         4
y         2
z         3

Open in new window


Suppose, however, that instead of getting the count of each item for a given category, you instead wanted to know the count of distinct items for each category.  SQL Server's Transact SQL and some other SQL dialects support the Count(Distinct ColumnName) syntax:

SELECT Category, Count(Distinct Num) AS DistinctItems
FROM SomeTable
GROUP BY Category

returns:
Category  Num
-------------
x         3
y         1
z         2

Open in new window


However, Access's Jet SQL does not support this syntax.  This article will demonstrate three techniques for computing distinct counts.  The first approach uses two queries for each distinct count to be performed.  The second uses sub-queries to accomplish the same thing in one SQL statement.  Lastly, the third uses Visual Basic for Applications (VBA) code to create a user-defined function, thus simplifying the operation further but at the cost of some execution speed.

2. Distinct Counts Using 'Two-Step' Queries


One way to generate distinct counts in Access is to utilize what I refer to as a "two-step" process:
First you create a "Step1" query that contains all of the distinct rows defined by the columns that you want to count and the columns you wish to group on
Then you create a "Step2" query that performs the aggregations using the "Step1" query as its source

Using the simple example from Section 1, to determine the count of distinct values in the Category column using the two-step process, we would first create a "Step1" query such as:

SELECT Category
FROM SomeTable
GROUP BY Category

Open in new window


If we save that as "Step1", we would then get out final result by creating a query that uses Step1 as its source:

SELECT Count(*) AS DistinctCategories
FROM Step1

Open in new window


Using the same example, if we wanted to know the count of distinct Num values for each Category, we could use this SQL for Step1:

SELECT Category, Num
FROM SomeTable
GROUP BY Category, Num

Open in new window


The SQL for Step2 then becomes:

SELECT Category, Count(Num) AS DistinctNum
FROM Step1
GROUP BY Category

Open in new window


In the attached sample file (see Section 5), you will see four sample queries that demonstrate this technique:
Approach1_AllAccounts_Step1 and Approach1_AllAccounts_Step2, distinct count of Account values
Approach1_ProjectsByAcct_Step1 and Approach1_ProjectsByAcct_Step2, distinct count of projects for each Account
Approach1_TasksAndUsersByProject_Step1 and Approach1_TasksAndUsersByProject_Step2, distinct count of task/user combinations for each project
Approach1_UsersByAcctAndProject_Step1 and Approach1_UsersByAcctAndProject_Step2, distinct count of users for each account/project combination

3. Distinct Counts Using Sub-queries


In addition to the "two-step" process described above, you can generate distinct counts from a single SQL statement by using sub-queries.  The sub-query defines a derived table that has only distinct rows in it, and then the main query performs the aggregation on that derived table.  That derived table should include in its SELECT clause all of the columns for which you want a distinct count, as well as all of the columns that you wish to group on.

Indeed, that is exactly what Access is doing implicitly in the "two-step" approach outlined in Section 2: when Access runs the "Step2" query, it is in effect treating the corresponding "Step1" query as a derived table.

Using the simple example from Section 1, suppose we needed to determine the count of distinct values in the Category column.  To determine that using a sub-query:

SELECT Count(z.Category) AS DistinctItems
FROM
    (SELECT Category
    FROM SomeTable
    GROUP BY Category) AS z

Open in new window


Using the same example, if we wanted to know the count of distinct Num values for each Category, we could use:

SELECT z.Category, Count(z.Num)
FROM
    (SELECT s.Category, s.Num
    FROM SomeTable s
    GROUP BY s.Category, s.Num) AS z
GROUP BY z.Category

Open in new window


Please note that if you enter SQL similar to the above into Access's query designer, after saving the query Access will adjust the SQL statement so that they will look more like this:

SELECT Count(z.Category) AS DistinctItems
FROM [SELECT Category FROM SomeTable GROUP BY Category]. AS z

SELECT z.Category, Count(z.Num)
FROM [SELECT s.Category, s.Num FROM SomeTable s GROUP BY s.Category, s.Num]. AS z
GROUP BY z.Category

Open in new window


Note: In the SQL examples using the sub-query approach, you may have noticed that I used an alias ("z") for the derived table returned by the sub-query.  While aliases are usually optional and are thus used mainly to improve your SQL's readability, in this case Jet SQL actually does require the alias.  Jet will auto-create an alias if needed, but it can only handle one; therefore, it is always a best practice to declare your own alias when using sub-queries so you have full knowledge and control.

In the attached sample file (see Section 5), you will see four sample queries that demonstrate this technique:
Approach2_AllAccounts, distinct count of Account values
Approach2_ProjectsByAcct, distinct count of projects for each Account
Approach2_TasksAndUsersByProject, distinct count of task/user combinations for each project
Approach2_UsersByAcctAndProject, distinct count of users for each account/project combination

4. DCountDistinct Function


While you can generate distinct counts using "two-step" queries (Section 2) or sub-queries (Section 3), you can also accomplish this by using VBA to create a user-defined function.  Thus, I have created the DCountDistinct() function.  This function provides similar functionality to the native DCount() domain aggregate function, but instead of counting all rows from the specified result set, DCountDistinct returns the number of distinct entries meeting the specified criteria.

The source code for DCountDistinct is as follows:

Function DCountDistinct(CountCols As String, Tbl As String, Optional Criteria As String = "")
    
    ' Function by Patrick G. Matthews
    
    ' Requires reference to Microsoft DAO library
    
    ' This function provides "domain aggregate"-type functionality similar to
    ' COUNT(DISTINCT ColumnName) in Transact-SQL (i.e., SQL Server's dialect).  The function
    ' can be used in queries, forms, and reports in the Access UI and/or in VBA code once added
    ' to a regular VBA module
    
    ' CountColumns is a comma-delimited list of columns for which we are counting the distinct
    '   values (or combinations of values if 2+ columns are specified).  Place field names in
    '   square brackets if they do not meet the customary rules for naming DB objects
    ' Tbl is the table/query the data are pulled from.  Place table name in square brackets
    '   if they do not meet the customary rules for naming DB objects
    ' Criteria (optional) are the criteria to be applied in the grouping.  Be sure to use And
    '   or Or as needed to build the right logic, and to encase text values in single quotes
    '   and dates in #
    
    ' Just like DCount, if there are no items found matching the criteria, DCountDistinct
    ' returns zero
    
    Dim rs As DAO.Recordset
    Dim SQL As String
    
    ' Enable error handler
    
    On Error GoTo ErrHandler
    
    ' Build query that counts the number of distinct items (or combinations of items, if
    ' CountCols specifies 2+ columns) that meet the specified criteria.  If no criteria are
    ' specified, then count goes against entire table
    
    SQL = "SELECT Count(1) AS Result " & _
        "FROM (" & _
            "SELECT DISTINCT " & CountCols & " " & _
            "FROM " & Tbl & " " & _
            IIf(Criteria <> "", "WHERE " & Criteria, "") & _
        ")"
        
    ' Open recordset with result
    
    Set rs = CurrentDb.OpenRecordset(SQL)
    
    ' Set return value
    
    DCountDistinct = rs!Result
    
    rs.Close
    
    GoTo Cleanup
    
ErrHandler:
    DCountDistinct = CVErr(Err.Number)
    
Cleanup:
    Set rs = Nothing
    
End Function

Open in new window


DCountDistinct takes the following arguments:
CountCols indicates the columns for which you want to count distinct values.  To pass more than one column, use a comma to delimit the column names.  You must place the column names in square brackets if the column names do not meet the usual requirements for database object names
Tbl is the name of the table or query from which you are counting distinct items
Criteria is an optional argument for specifying criteria to apply to the results.  I recommend placing all column names in square brackets.  Be sure to use single-quotes as text qualifiers and # as date qualifiers.  Use logical And and/or Or to pass multiple conditions

To use this function in your project, simply add the code to a regular VBA module, and make sure to set a reference to the latest version of Microsoft Data Access Objects (DAO) installed on your computer.  Please note that using DCountDistinct() will probably be slower than using the "two-step" or sub-query approaches explored in Sections 2 and 3, but in small to medium data sets you are unlikely to notice the difference, and the convenience the function offers may outweigh the increased runtime.

Using the simple example from Section 1, we can determine the count of distinct values in the Category column using DCountDistinct():

SELECT DCountDistinct("Category", "SomeTable") AS DistinctItems

Open in new window


Using the same example, if we wanted to know the count of distinct Num values for each Category, we could use:

SELECT Category, DCountDistinct("Num", "SomeTable", "[Category] = '" & Category & "'")
FROM SomeTable
GROUP BY Category

Open in new window


In the attached sample file (see Section 5), you will see four sample queries that demonstrate this technique:
Approach3_AllAccounts, distinct count of Account values
Approach3_ProjectsByAcct, distinct count of projects for each Account
Approach3_TasksAndUsersByProject, distinct count of task/user combinations for each project
Approach3_UsersByAcctAndProject, distinct count of users for each account/project combination

5. Sample File


DCountDistinct.mdb

The attached sample file is a Microsoft Access 2003-format database containing a single table, "Sample", along with several queries demonstrating the count distinct approaches described above.  The file's VBA Project also contains the source code for the DCountDistinct function.


=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
If you liked this article and want to see more from this author,  please click here.

If you found this article helpful, please click the Yes button near the:

      Was this article helpful?

label that is just below and to the right of this text.   Thanks!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
22
Comment
3 Comments
LVL 61

Expert Comment

by:Kevin Cross
Patrick:

Very nice article!

Seeing a number of these questions in the SQL topic areas, I am certain this will come in handy for those new to count(distinct) and possibly MS Access.  As someone who was familiar with this previously, I was still thrilled to get a nice new function in the DCountDistinct() which gives a nice alternative in Jet SQL to sub-queries or multiple-step processes.  As with the other domain functions, it seems well worth the minor overhead of adding VBA to the mix. :)

Thank you very much!
Voted a big YES above.

Respectfully yours,

Kevin
0
LVL 19

Expert Comment

by:david251
Great Article!  Thanks

-David251
0
LVL 1

Expert Comment

by:csehz
Patrick thanks for this article very much, your sample database was the biggest help to me to apply in my work

0

Featured Post

Cloud Class® Course: CompTIA Healthcare IT Tech

This course will help prep you to earn the CompTIA Healthcare IT Technician certification showing that you have the knowledge and skills needed to succeed in installing, managing, and troubleshooting IT systems in medical and clinical settings.

Join & Write a Comment

In Microsoft Access, learn how to use Dlookup and other domain aggregate functions and one method of specifying a string value within a string. Specify the first argument, which is the expression to be returned: Specify the second argument, which …
With Microsoft Access, learn how to specify relationships between tables and set various options on the relationship. Add the tables: Create the relationship: Decide if you’re going to set referential integrity: Decide if you want cascade upda…

Keep in touch with Experts Exchange

Tech news and trends delivered to your inbox every month