- Experts Exchange Approved
Introduction
This article introduces the reader to the concept of a table (or query) of numbers, commonly called a Tally Table. I detail different methods to create and populate a Tally Table in Access and show the new table used as part of a problem solution.
For years I've used an OneRow table as a part of my Access solutions, eliminating many duplicate values in a very efficient manner. In the past couple of years, I have seen a similar special table in SQL Server and MySQL articles. I have now participated in an EE Access question whose solution lent itself to a Tally Table.
Definition: A Tally Table is a record source (database table or query) that is a contiguous sequence of integer values in consecutive rows. This is sometimes called a number table.
Why use a Tally Table: There are many situations where you need a sequence of values. While it is easy to write loops in VBA and TSQL, there are no looping statements in Access SQL syntax. While a Tally Table is a sequence of integer values, such values can be transformed into other Access/VBA data types (character, floating point, date, etc.)
If you are interested in using a Tally Table for a non-Access database, start by reading these articles.
SQL Server - http://www.experts-exchang
SQL Server - http://www.sqlservercentra
MySQL - http://www.experts-exchang
SQLite - http://www.experts-exchang
_________________________
Creating the Tally Table
Since an Access Select statement requires a From clause, we can't use the TSQL method of creating the number sequence strictly within the stored procedure, with no physical storage in a table:
Once we have some minimal sequence in a table, we can use table joining 'magic' (matrix product growth) in a query to produce a much larger sequence of numbers. I will show you a trick to create a query-only source of numbers, but I do a little cheating with my From clause.
Note: In the following code examples, I am starting the numbering sequence after highest current number, using 1 if the table is empty. This is an efficiency consideration -- appending new numbers rather than trying to repopulate the entire table. Alternatively, this code can be changed to delete all the rows before populating the (now) empty table.
_________________________
Create the TallyTable with code
_________________________
Create the TallyTable with SQL
Note: You can also rewrite the first 'populating query' below, converting them into a Make Table query.
_________________________
Populating the Tally Table with code
_________________________
Populating the Tally Table with SQL
When using queries to generate a number sequence, there still must be a table with one or more rows. To keep this example simple, I'm populating my source table with ten numbers (1-10). You can use a non-empty system table if you don't already have a non-empty table.
Note: The following SQL will not run, as shown, in Access97 since it doesn't support the From (Select...) form of SQL. Access97 developers can work around that limitation by saving the Unioned Select statements as a separate query and then include the name of that query in your From clause.
I'm using the MSysObjects table in this example.
QBTallyTableBase10 query:
However, there is a performance penalty for generating all the numbers from such 'query magic'. I created a TallyTable2 with the same ten numbers and compare the performance of 1K and 1M number retrievals below.
Populate an already defined TallyTable2 from the magic query.
Note: You can also create and populate the TallyTable2 with a Make-Table form of the query:
Make-Table Example 1 -- Single query
Make-Table Example 2 -- Using QBTallyTableBase10
Now that we have a small set of (ten) numbers, we use the multiplicative property of table joins to produce much larger sets.
QueryBasedTallyTable_BIG
QueryBasedTallyTable -- 1K numbers using 10-row table
QueryBasedTallyTable_BIG_
QueryBasedTallyTable_AllQ
Note: You could have populated the TallyTable table with the numbers produced by these queries. We will leave our TallyTable2 table with only 10 rows. It is up to you to decide whether you want to use the code, define and populate queries, or a make-table query to populate your full-sized TallyTable.
As I mentioned earlier, there is a performance penalty when calculating all the numbers in memory on-the-fly. Since later usage examples only use the queries based on TallyTable2, you ought to understand why I didn't use the all-query number source.
QueryBasedTallyTable performance comparisons
_________________________
Date ranges problem - TallyTable Example 1
This article got its genesis from this recent question about looping in MS-Access SQL.
http://www.experts-exchang
The solution needs to produce pairs of dates for reporting periods. The first column is the start of each logical week in the period and will be one of the following:
- Starting date
- Monday
- First day of the month
The second column is the end of each logical week in the period and will be one of the following:
- Ending date
- Sunday
- Last day of the month
An example of the desired data is posted here:
http://www.experts-exchang
During my preparation for this article, I did some performance tuning to the queries, so the following (optimized) queries are different than those posted in the question. There are two configurations, one based on the TallyTable table and the other based on the QueryBasedTallyTable query.
Notes:
- All the queries have two parameters ([start date] and [end date])
- The use of the Distinct keyword is used for performance. Back in the 90s, I discovered that ordered queries act as if they were indexed. This may be caused by caching or some other query optimization 'magic'. If you find that your queries don't execute as quickly as you'd like, try adding Distinct keyword, Order By clause, Or Group By clause.
- If you need to update the resulting query results, do not use the Distinct or Group By performance-enhancing trick. Their use results in a non-updatable record source.
Now that we have our TallyTable and its query equivalent, QueryBasedTallyTable, we can construct our date sources for the two columns.
JustDatesFromTallyTable
Now that we have our numbers, we need a source of dates for our columns.
ReportingPeriods_Fast
Now that we have a source of dates, we have a solution to the problem.
JustDatesFromQBTallyTable
Let's not forget the query-based Tally Table we created earlier. We need to produce dates from these numbers as well.
ReportingPeriodsQB_Fast
Now we can produce the two-column query for the question, based on the query dates.
How is the performance?
The table-versus-query performance comes down to a space/time trade-off. The TallyTable takes up space in your database, but the numbers are available immediately. The query requires a very small (ten row) table, but the numbers must be calculated when used. The test involved opening a recordset from a parameterized query and moving to the last record. This was repeated 10 times. Different configurations of date ranges and recordset types were timed. The performance test was run at least three times. In addition, for two of the tests I added an index to the table and a primary key index to the table. Although the indexes improved performance, we doubled the size of the TallyTable. The time measurement is ticks. These times represent a 10 iteration execution.
This is the test of the date range posted in the question (2/22/2011 - 7/9/2011).
There is no index on TallyTable used in this test.
I thought it would be worth comparing a larger date range (2/22/2011 - 8/15/2012), which is 3.6 times the test above. This is good practice in both capacity and performance testing. Since my tests indicated that the performance degradation was non-linear, I decided to evaluate the influence of indexes.
_________________________
Simulating Production Data Volume - TallyTable Example 2
It is helpful to stress test your database designs and applications using large sets of data. When you add your TallyTable (or query) to existing data sources in your queries you can easily multiply its row count. If you had a 1,000 row table, but new that the production table would be 10 million rows, you could add a TallyTable (or query) to your test-table query and limit the TallyTable to 10,000 rows.
_________________________
Parameterized Top N - TallyTable Example 3
If you have tried to create a Select query with a Top N keyword and ever wished you could parameterize it, a TallyTable might be the solution.
This can be implemented a few different ways, depending on the source and nature of the data.
- Join your TallyTable with an ordinalcount column and limit the TallyTable rows with a parameter.
- Add (select count(*) ) as an ordinalcount column with direct parameter comparison
- Add DCount() as an ordinalcount column with direct parameter comparison
In order to present a reasonable test, I had to make some assumptions about the data.
- Either there is no autonumber field or the table has some deleted rows. This is the reason why I didn't test an inner join between the TallyTable and my performance test table. Yes, it is bad design/practice to have a table without an autonumber field.
- If you have an autonumber field on a read-only table (no deletes), you could parameterize the Top N by comparing on this field
- If you have an autonumber field on a table with deleted rows, you should use that field in your ordinal count calculations, since long integer comparisons are much faster than text field comparisons (used in this test).
- I cast my ordinal count column as a long integer data type for faster comparison
- I explicitly defined my parameter as a long integer data type for faster comparison.
- The data might be a query instead of a table
Note: Beware of lousy performance due to RBAR (row-by-agonizing-row) operations. This is most pronounced in the Word Count example queries below.
There is a very clever DistinctDCount() alternative -- not shown or tested in this section, since I only care about row counts.
http://www.experts-exchang
SQL Highlights:
To get reasonably good performance from my queries, I made sure that the query parameters were explicitly typed.
In the performance figures, the Select refers to a column defined like this:
In the performance figures, the DCount refers to a column defined like this.. Note that I have also added a cast to the column data as a long integer. It was surprising that this made such a performance difference.
ParameterizedTop_N performance tests - no index on the TallyTable
In case you were curious, the fastest result was with a primary key index on the TallyTable
_________________________
Word/character counts - TallyTable Example 4
There are many times when I need to know how many delimited words there are in a text field. Usually, I have written a small user-defined function that does the string inspection, since the VBA environment has looping and parsing functions (Split and Replace). It can also be done using a TallyTable.
I tested two different queries to find the delimiter positions. One used the MID() function and the other used the InStr() function.
DelimiterPositions
DelimiterPositions_Instr
Word count
Once you know the delimiter positions, you can use a Group By query to get the count of words. Since I added delimiters to the front and back of the text data in my DelimiterPositions data, I had to subtract one from the Count() value.
Parsed/Split text data
Since I've already calculated the delimiter positions, It is an easy step to parse the text.
Note: Although I could have done this parsing by joining the DelimiterPositions_Instr query to itself, it was not nearly as good a performer as the ParsedData query above.
VBA Function Splitting
If you find that you need a more sophisticated word count or splitting process, you should read these articles:
http://www.experts-exchang
http://www.experts-exchang
If this article has been helpful, please click the Yes link below.
by: aikimark on 2011-05-23 at 03:33:11ID: 27629
I caught an error on the first of the delimiter position queries. The length parameter of the Mid() function was a literal. It should have been equal to the length of the [Delim] parameter. I also simplified the query, although it still does not perform as well as the InStr version of the query.
DelimiterPositions
Select allOpen in new window