In database applications, it's often useful to retrieve a single item of information from a table. For example the full name of a company from its ID number, the current price of a product, or the number of visits of a patient. When this is needed in a query, several SQL techniques are available to obtain and use this information. In the interface or from Visual Basic for Access, domain lookup functions are used: DLookup, DSum, DCount, etc.
The present article shows that DLookup is versatile, but comparatively slow. If the area of application is restricted and focused on very simple table lookups, it becomes possible to write functions with much better performance. However that possibility is subject to the following constraints:
» The table is in a Jet (Access) database.
» The search doesn't involve wild-cards or any calculations.
» The field(s) in the criteria are (or should be) indexed.
If these conditions are met, a single item of information can be retrieved from a table up to 100 times faster than by using DLookup. This sort of speed opens up new possibilities, and offers solutions to problems typically considered too complex for Access.
Incidentally, the first sections show how plain DLookup can be implemented in stand alone Visual Basic or in VB.NET -- for that matter in any program using the Jet Engine as database manager -- when Access built-in functions are not available. The other domain functions can then be simulated easily as well. This idea is implemented in a companion article.
The help file provides: DLookup( expr, domain [, criteria> ] )
The three arguments are in fact three SQL clauses, and the same information is obtained by running the following query:
SELECT [Top 1] expr
The "Top 1" restriction is neither necessary nor actually included, but only implied since DLookup returns a single value. The expression argument can really be any legal definition of a field (without the aliased field name), and the criteria can contain subqueries. Both require square brackets when the field name is improper, unlike the domain.
The syntax for the other domain lookup functions is almost identical, they simply wrap the expression into one of the aggregate functions of SQL. They can in fact all be simulated by DLookup:
We are not going to writer faster versions for aggregate functions here; this article focuses on looking up single field values as quickly as possible. However, there are fast function similar to DSum(), DMax() and DMin() in the second demo, along with some others that are not available in Access -- and that might have been called DNext(), DPrev(), and DList() if they were.
Searching for faster lookup techniques
If you are looking for the fastest version only, this section can be skipped without harm; it describes in steps how speed is achieved by reducing versatility. The first attempt is quite obvious.
Function MyDLookup(Expr As String, Domain As String, Optional Criteria = Null) Dim strSQL As String strSQL _ = " SELECT " & Expr _ & " FROM " & Domain _ & " WHERE " + Criteria With CurrentDb.OpenRecordset(strSQL, dbOpenDynaset) If .RecordCount Then MyDLookup = .Fields(0) Else MyDLookup = Null End WithEnd Function
This almost emulates DLookup. The '+' concatenation operator ensures that the entire WHERE clause is nulled when no criteria is provided. The query is opened as recordset, and the first field, the expression is returned if and only if a record is retrieved.
Benchmarking this function shows that it's noticeably slower than the built-in version. There must be a problem. The function is tight and "by the book", but the experienced developer will quickly suspect the call to CurrentDb (on line 10).
In all current versions of Access since 97 at least, CurrentDb doesn't return the current database object. Instead it's a new object with all collections refreshed. I will forgo the discussion of reasons and consequences of this behaviour, but point out that it's slow and unnecessary, except -sometimes- when tables or queries are being created and deleted.
If we use an existing database object like DBEngine(0)(0), then the function will triple in speed and become almost 50% faster than DLookup.
With DBEngine(0)(0).OpenRecordset(strSQL, dbOpenDynaset) ' or, if an open database object exists: With MyDb.OpenRecordset(strSQL, dbOpenDynaset)
It seems odd that DLookup() is outperformed so easily. Moreover, this implementation provides some record source flexibility, because the MyDLookup domain argument can accept a multi-table Join clause, which DLookup rejects. [Note: this isn't really a great improvement, especially as the criteria argument of both functions accept sub-queries.]
It must be that DLookup performs some additional tasks to improve robustness rather than speed, like general error management and recovery: verifying that domain is a table or a query, adding square brackets if needed, refreshing both collections for a second attempt if domain was not found... It also runs in a different name space, but that's irrelevant at the moment.
By using the same technique as DLookup, we achieved an unexpected small improvement. Let's try another approach: opening a recordset and using the .FindFirst method. Let's also simplify the problem by looking for a field and not an expression.
Function MyDLookup(Field As String, Domain As String, Criteria As String) Dim rec As DAO.Recordset Set rec = DBEngine(0)(0).OpenRecordset(Domain, dbOpenDynaset) rec.FindFirst Criteria If Not rec.NoMatch Then MyDLookup = rec(Field)End Function
Benchmarking this function shows something quite different: the relative execution time rises very quickly with the table size being searched. It is slightly faster than DLookup on tiny tables (100 records), but dramatically slower on any serious data, even when the field in the criteria is indexed.
And that's our first lead. What happens when running a select query compared to using the "find first" method? Jet optimizes the query before even looking at the first record. The recordset's method, on the other hand, doesn't use an index even if it is available.
We can't close this exploration without trying parametric queries. This approach is logical for programmers switching from most other SQL engines: trying to emulate stored procedures. This method is in fact recommended for "pass-through" queries, not very useful for Jet databases, but let's try.
For the test to be valid, the query definition should be kept open between calls to the function, which means it cannot be generic. Instead, this function specialises in a single task.
Function CompanyName(Key) Static qdf As QueryDef Dim strSQL As String If qdf Is Nothing Then strSQL _ = " SELECT CompanyName" _ & " FROM Customers" _ & " WHERE CustomerID = ?" Set qdf = CurrentDb.CreateQueryDef("", strSQL) End If With qdf .Parameters(0) = Key With .OpenRecordset If .EOF _ Then CompanyNameQdf = Null _ Else CompanyNameQdf = .Fields(0) End With End WithEnd Function
When tested, the function performs much like the function (2), 'MyDLookup', without the versatility. Although pass-though queries can be very efficient in other circumstances, they do not seem useful for Jet databases, as could be expected.
Many of the functions presented up to here leave room for improvement, especially by using more static objects and by optimising for specific scenarios (looking in the same table most of the time, searching in a specific key order, calling the function from a program loop allowing very specific assumptions, etc.). If you use any of them, do spend some time optimising. In this article, however, we keep looking for yet another solution.
Despite some noticeable improvements, the functions above all play in the same leage. To find the highly specialised tool, performing one task only, but right, in the least amount of time, we need two things: lower level techniques and more information about the target.
We can guess that all lookup techniques that remain stable with regard to table size use an index. We know an index exists because we performed all lookups on the key field of the table. Let's formalise this restriction: we are now looking for a field in a table, and the criteria must be defined on an index. Let's also try to use the index explicitly.
An Index property exists for recordsets, correlated to the Seek method, but only for "table-type" recordsets. This means opening the table directly, without any interface layer. When one double-clicks "Products", one doesn't "open the table"; instead the query "TABLE Products" is opened, shorthand for "SELECT * FROM Products". The data displayed is in fact a Dynaset -- a dynamic set of records freshly read from the table. A Dynaset creates a number of illusions for the benefit of the user: each record is assigned a row number, stable until the recordset is refreshed; there is an actual row count; searching, filtering and sorting is possible on any field; etc.
Raw tables -- recordsets opened with the "open table" option -- do not even have a record count. You would need to create a loop to count them! The records are unsorted, unless you choose one of the indexes, making them appear to be ordered. If an index is selected, you can navigate the index (next, previous) and also seek a key value.
Function DSeek( _ Field As String, Table As String, Index As String, Key1, _ Optional Key2, Optional Key3, Optional Key4, Optional Key5 _ ) As Variant With DBEngine(0)(0)(Table).OpenRecordset(dbOpenTable) .Index = Index .Seek "=", Key1, Key2, Key3, Key4, Key5 If Not .NoMatch Then DSeek = .Fields(Field) End WithEnd Function
At this point it's good to read the help file on the Seek method.
Any index on a table can be used with Seek, and if the index has multiple fields, one value must be provided for each (at least when using the '=' operator). The table must be opened in "table mode", which is the default for the "open recordset" method of a "table definition".
The Seek method is only available in the "Microsoft Jet workspace", which might include some ODBC and ISAM databases (but not ODBCDirect). I have not tested external data sources in view of fast table lookups. If you choose to do so, it's probably easier to use the ADO library, which expose more properties and methods of external database drivers.
A peek intoADO
ODBC drivers might expose Indexes and implement the Seek method, and ADO will inform of their availability through the supports property. If Supports(adSeek+adIndex) returns true for a recordset opened as "table direct" with a server-side "key set" cursor, you can create a function similar to DSeek above.
Function DSeekADO( _ Field As String, Table As String, Index As String, _ ParamArray Keys() _ ) As Variant With New ADODB.Recordset .CursorLocation = adUseServer .Open Table, CurrentProject.Connection, _ CursorType:=adOpenKeyset, _ LockType:=adLockReadOnly, _ Options:=adCmdTableDirect .Index = Index .Seek Keys(), adSeekFirstEQ If Not .EOF Then DSeekADO = .Fields(Field) End WithEnd Function
[/step]The function doesn't in fact call any external driver; it simply uses the current project's connection, meaning the current Jet database. It does however demonstrates the ADO syntax of the Seek method, and provides a plausible candidate for benchmarking against the previous functions.
It turns out that the overhead is much higher, similar to the problem when using CurrentDb, so that this attempt is only marginally faster than DLookup (about 20%) when tested on a Jet database. Jumping ahead to the next section and function (6), it is possible to create dedicated lookup functions with ADO, with good results, but the performance will not match that of DAO. Only one example is included in the attached benchmarking demo file, roughly five times slower than the equivalent DAO function.
To put it simply, if you have a Jet database, use DAO. If you have an external database, ADO seems to deal better with ODBC drivers, but bear in mind that a better solution to our goal -- a fast lookup function, is likely to make use of stored procedures and similar techniques rather than raw index manipulation.
This opens an entirely new topic, but we have already wandered far enough from the intended scope of this article. I will drop ADO and dicussing external databases at this point, and return to fast lookup functions in a Jet database.
Dedicated Fast Lookup Functions
The best performance is achieved by imposing the most restrictions. We are now trying to obtain the value of a single field in a record, by matching a value to an indexed field of the table, and we need to remove any remaining overhead.
Our very first attempt used "CurrentDb.OpenRecordset". Using "CurrentDb" was a serious drag on performance, but so is "OpenRecordset", which needs to be eliminated as well.
Function CompanyName(Key) Static srec As DAO.Recordset If srec Is Nothing Then Set srec = DBEngine(0)(0)("Customers").OpenRecordset srec.Index = "PrimaryKey" End If With srec .Seek "=", Key If Not .NoMatch Then CompanyName = !CompanyName End WithEnd Function
The name of the function, CompanyName, is that of the field, !CompanyName -- notice the difference. I often recycle field names as fast lookup function names.
The first time the function is called, the static recordset is set to a "table-type" recordset of the table Companies (the "open recordset" method defaults to "open table" instead of "open dynaset"). This object will remain open between function calls -- that's the meaning of "static". The second time, the function jumps over the initialization.
For every call, the passed "Key" is sought in the current index. It doesn't matter if the key field is numeric or text, but the "Key" variable should be of the same type as the key field. See one typical trap and it's solution as Warning just below the code snippet 7 below.
How does the function perform compared to DLookup? It's up to 100× faster! We actually need to benchmark it against a Joined query.
Consider these three queries:
1. Jet performs the JOIN:
SELECT O.OrderID, O.CustomerID, C.CompanyName
FROM Orders O LEFT JOIN Customers C ON O.CustomerID = C.CustomerID
ORDER BY O.OrderID;
SELECT OrderID, CustomerID
(The lookup function is called from the loop, not the query.)
3. Embedded call to the function CompanyName()
SELECT OrderID, CustomerID, CompanyName(CustomerID) As CompanyName
Opening the queries 2. and 3., and browsing their records, is of course slightly faster than doing the same with 1., provided the function isn't called. So the test should loop through all orders, several times, and actually obtain the customer's company name for each order. In cases 1. and 3., this means reading the name into a dummy variable -- this forces Jet to evaluate either the join or the calculated field. In case 2. this means calling the function "externally" from code.
The raw measure is the number of turns in the loop during one second, which includes writing the company name into a variable. The order of magnitude on my old laptop is 1e5, meaning that execution times are expressed in µs.
Their relative raw performance is roughly equal to their numbering in the list: the second is two times slower than the first, and the third three times.
This is impressive! Of course Jet performs better when joining two tables on key fields than anything we can do from the outside. The overhead is in the function call itself. But the point is that this solution's scores in the same order of magnitude as plain SQL joins.
If we subtract the timing of an empty loop (writing a field from the Orders table, and not the linked Company Name), and compare the remaining time (a plausible measure of the real time spent), the comparison remains honourable: around 7 times between cases 1. and 3.
For measuring speed, one can either set a distance and record the time, or set a time and measure the distance. Due to the vast differences in speed, the second option is preferable. All raw results are "lookups per second" ("lps"), obtained by running a tight infinite loop with a timer-based exit condition. To avoid cached results, a small set of valid keys, plus one invalid key, are being used in turn. Plotted on a logarithmic scale, these are the results of one benchmarking session, performed using the first attached database, on tables ranging from ten to one million records. The scores (in "lps") have been normalised against the average performance of DLookup -- by sliding the ruler, for those of us who remember slide-rules...
The numbers are those of the functions. The group on the left are the early attempts, the worst being the one using "CurrentDb.OpenRecordset" for each call; I left it here to represent all carelessly written user defined functions. Likewise, the "find first" method illustrates all methods which start to fail on large data sets (the dots are averages for different table sizes, results above 10 000 records are below 0.1). The real improvements occur when using the "seek" method, and then when using static tables -- resulting in the "fast table lookup" functions.
The relative position of "SQL", an attempt to score joins on the same chart with lookup functions, is rather arbitrary. The dots do represent actual benchmark results (roughly 2 and 3 times the score of function 6, from the last test), or different interpretations of these results (6.7× as example). In reality, tests in other circumstances can place Jet SQL techniques way below the fastest functions... On the other hand, the line should also extend much farther to the right. For example, Jet can optimise sorting when ordering by a field, even if the field isn't indexed at first, but not when ordering by the result of a function -- no matter how fast. This being said, the factor 3 is a good rule of thumb.
If you wish to do your own benchmarking or explore how I arrived at the figures presented here, please play with this Access 2000 format test file (more information and guidelines are included inside). The benchmarking itself is somewhat technical, but it also demonstrates all the functions in this article in context. The "method numbers" are those of the snippets: 1 to 6, zero being plain DLookup().
Warning: This isn't a demonstration database, but a playground for experienced users who wish to evaluate the numbers published here and perform their own benchmarking. You do not need to study it if you only want to use fast table lookup functions. FastLookupBe.mdb
Several finer points need to be addressed and understood for the implementation to be successful.
Alll but one of the functions tested above return Empty on failure. Due to the special behaviour of 'Empty', the result could be mistaken for a succesful lookup (for example the value zero). In a database environment, it is more coherent to return Null on failure, using pseudo-code like:
If <success> Then Result = <field value> Else Result = Null
In other cases, it might be simpler to initialise the return value to Null at the top of the function.
Another problem is implied by the section title "Dedicated Functions". The example is indeed highly specialised: it returns the company name of a customer identified by his ID. It is possible to write a generic function, using the same declaration as DSeek above, but that requires a mechanism to manage open tables, which is the main problem.
Static variables are fine, but static object variables can be the cause of vicious bugs. Once the function "company name" (above) as been called, there is no way to close the table other than resetting the Visual Basic project (or to use a special nonsense value in an argument to close the static object from within the function). An open table is for example locked for design changes, which is only a minor problem, but there are more consequences. The table was opened by a database object, and the entire database thus remains open as well, probably with further depending objects.
Open database objects should all be available through a global collection, just like the Application's interface collections: Forms, Reports, Modules... We can create a new collection for open tables.
' Collection to hold open tablesGlobal Tables As New Collection' GetTable() returns a table-type recordset by table name'Function GetTable( _ TableName As String, _ Optional Index As String = "PrimaryKey" _ ) As DAO.Recordset Dim recT As Recordset ' do we already have a recordset? For Each recT In Tables If recT.Name = TableName Then Exit For Next recT ' if not, open it and add it to the collection If recT Is Nothing Then Set recT = CurrentDb(TableName).OpenRecordset Tables.Add recT, recT.Name End If If Len(Index) Then recT.Index = Index Set GetTable = recTEnd Function
Technical note: this new collection belongs to the VBA project, and not to the Application; also, being a VB collection, it isn't zero-based like all built-in Access and DAO collections.
Having solved the (potential) problems of static open tables embedded in functions, the collection can now be used to create a generic "fast lookup" function, with a declaration line identical to DSeek (function 5 above):
' Seek a record from a table, and read a field value'Function SeekValue(Field As String, TableName As String, Key1, _ Optional Key2, Optional Key3, Optional Key4, Optional Key5) With GetTable(TableName) .Seek "=", Key1, Key2, Key3, Key4, Key5 If .NoMatch Then SeekValue = Null Else SeekValue = .Fields(Field) End WithEnd Function
The reason is that combo boxes and list boxes always return String values. Most of the time, they are converted back to whatever is needed, but not in this case. The String "22" will not match the Asset ID 22... Use instead:
...or any other method to force the conversion to numeric.
Front-end / Back-end architecture
Up to now, we have been using loosely CurrentDb or DBEngine(0)(0). This will not work when a back-end database is used. A linked table cannot be opened in a table-type recordset from the front-end, so a new database object is required. This creates a potentially funny -- or disastrous -- pitfall.
Let's say the Tables collection has been implemented as suggested above, and used by "fast lookup" functions. Some tables are thus open. If the user or the application then re-links the tables to another back-end, the tables in the collection will still refer to the old file.
For this reason, I usually wrap the back-end database object and the Tables collection into a class module. A side advantage is that an open link to the back-end is maintained even if no form is currently active. This can speed up some operations.
The module below can easily be added to any front-end database; simply designate one of your main linked tables (one you know will always be present) in the constant declaration; it will be used only to obtain the path of the database file (assuming all tables are in the same back-end database).
'-------------------------------------------------------------------------------' "Fast Lookup" Back-end demo -- simple class module' Written for Experts Exchange, www.experts-exchange.com' Copyright Markus G Fischer, Geneva 2009'-------------------------------------------------------------------------------'' The sole purpose of this class is to always maintain open a database object' pointing to the back-end and thereby to provide the Table() function.'' It is sufficient to change the constant MAIN to one of your linked tables,' and this class will work in your application. Use it by adding:'' Global BE As New claBackEnd_1 (the exact name of the class)'' to a normal module, and BE.Table(<name>) will return a table-type recordset' which can be used to implement "fast lookup" functions.'Option Compare DatabaseOption ExplicitConst MAIN As String = "Orders" ' a table namePublic Db As DAO.Database ' the database objectPrivate colTables As Collection ' collection of tables' Table maintains a collection of open tables. When called the first time, a' new recordset is created for the table, the second time the existing object' is returned. The index is set to the Index argument.'Public Function Table( _ TableName As String, _ Optional IndexName As String = "" _ ) As DAO.Recordset Dim recT As DAO.RecordsetOn Error Resume Next ' is the table alread in the collection? For Each recT In colTables If recT.Name = TableName Then Exit For Next recT ' if not, create a recordset for it If recT Is Nothing Then Set recT = Db(TableName).OpenRecordset colTables.Add recT, TableName End If ' set index and return object If Len(IndexName) Then recT.Index = IndexName Set Table = recTEnd Function' If needed, all tables can be closed at once'Public Sub CloseTables() Set colTables = New CollectionEnd Sub' When the instance is initialized, the path to the back-end is extracted from' the MAIN table, and used to open it in a new database object.'Private Sub Class_Initialize() Dim strPath As String strPath = Mid(CurrentDb(MAIN).Connect, 11) If Len(Dir(strPath)) Then Set Db = OpenDatabase(strPath) Set colTables = New Collection End IfEnd Sub' When the instance is terminated, the database is closed.'Private Sub Class_Terminate() Set colTables = Nothing If Not Db Is Nothing Then Db.Close Set Db = NothingEnd Sub
[/step]Your "fast lookup" functions can use BE.Table(name), similar to GetTable() above, and SeekValue() can be implemented as well. Since the database object is exposed, you can also replace CurrentDb.Execute, for example, with BE.Db.Execute. Of course, CurrentDb sees the linked tables and the front-end's queries, while BE.Db only sees the actual back-end tables and any queries stored with them, which can be a neat trick or an annoyance, depending.
If you wish to test the module or score lookup functions using a back-end (for example over a network), please download this demo file. The file used as back-end is in fact the test database, used for benchmarking. Any other database containing the old NorthWind demo tables would work as well. Both are in Access 2000 format.
The front-end "application", contains two versions of a back-end wrapping class module, and a demo form implementing various fast lookup functions. FastLookupFE.mdb
The link below is a repetition of the "benchmarking test" database from above. We are only interested in the tables it contains: when you open the front-end for the first time, simply link all tables from the back-end (there is a link opening the relevant dialogue on the first page of the welcoming form). FastLookupBe.mdb
Do we need all that?
The entire exercise, to find a way to do something minuscule hundred times faster, may seem futile. Let's just buy a faster computer!
It is my opinion that fast computers are no excuse for neglecting simple optimisation. If by saving a couple of seconds here and there you save one minute over the day for a dozen users, that's one hour a week. Even if the life span of the application is only a couple of years, that's one hundred hours. But that's not the point of this article.
It isn't either that "DLookup is slow". That's an absurd statement, like saying "my Ferrari is slow". Compared to an aeroplane, it is, but compared to other cars, it isn't. DLookup is as fast as it gets. It can routinely evaluate and return an expression in under a millisecond. Consider also this expression, entered on a form based on Orders:
It doesn't get any cleaner than that. DLookup is executed in the Application's name space, to which the local object's names are added. They can be used in both the expression and the criteria arguments. In effect, this becomes a parametric query; no need to worry about Null or quotes in the criteria breaking the syntax... [And if you are puzzled by the syntax, please try it for yourself!]
What's more, it can obtain data from any source, like cross-tab queries to name one example, without much room for optimisation. Used right, DLookup is a Ferrari and a Jeep at the same time.
So, what is the point?
When a lookup function isn't used as an element of the interface but as a building block in a series of calculations, time might become an issue. If 10 000 lookups are being used to build a chart, the difference between 10 seconds and 0.1 second becomes quite noticeable when waiting for it to appear...
In some queries, adding one more table can be a problem (due to the limit in the number of tables, the result in the execution plan, the 'updateable' status, etc.). If the lookup function will compete against Jet for efficiency, it should again be as fast as possible.
Finally, a fast lookup function locates a record using an index, and in this article, we stopped there: it's either a match or it isn't. But why stop? Once the record is found, we can peek at the next or previous record in the index. We can also seek using inequality operators or partial keys. We can chain lookups or build indexes specifically for a lookup function. This opens a whole world of possibilities -- making up for many of the missing SQL keywords in Access SQL -- some of which are demonstrated in the second demo file.
A few related articles demonstrating Fast Lookup functions are in preparation.
I have developed Fast Lookup functions several years ago, and used them extensively in several branches or database development. They were for example invaluable in a multi-currency financial database, in a costing tool for large projects with full archival of adjustments, and in several scientific of academic applications.
When I learn new features in other SQL variants, I often think "this can be simulated using a Fast Lookup". Finding the next, previous, last or first record; parsing a hierarchical structure; ranking; keyword searches; fuzzy searches; calculated indexes... these ideas are explored in related articles.
These functions have been among the best tricks in my bag for years, and I believe you will find them as useful as I do. The present article only covers the basics, in detail, in the hope that you will avoid some of the pitfalls any advanced technique entails.
I would like to thank aikimark, for his very thorough editorial reviews, which vastly improved the technical accuracy, the clarity, and the style of the article. All remaining errors are naturally my own.