Solved

# Union query is very sloooooow

Posted on 2006-05-29
559 Views
I have a union query.  It is based upon two queries, each brings fields from multiple tables, some of which are calculated.
One query it is based upon returns about 1500 records in a few seconds.
The other query returns about 33000 records in, say, 20 seconds.
The union (SELECT * FROM AuftragCostEigenes  UNION ALL SELECT * FROM AuftragCostAkkordant) takes over five minutes to complete.
There is no sorting in either base query.

As the total time for the individual queries is dramatically less than the union query does anyone have any ideas why / suggestions that could improve matters.
0
Question by:AndyAinscow

LVL 44

Expert Comment

Try putting a sort in both base queries.  Just out of curiosity, remove the ALL and see what that does to the time.
0

LVL 58

Expert Comment

[Ray, the order by clause of both queries is skipped in a UNION query. Also, removing the ALL can only make things worse (it will have compare all 34500 records to determine uniqueness).]

Hello AndyAinscow

I'm not going to ask *why* you need to combine 1500 records with 33000 others in a UNION, but I should as this seems a clear sign of bad design. However, when working with medium-sized to large recordsets, you will prefer temp tables:

DROP TABLE zttblTempUnion;
SELECT * INTO zttblTempUnion FROM AuftragCostEigenes;
INSERT INTO zttblTempUnion SELECT * FROM AuftragCostAkkordant;
CREATE INDEX ...?

Cheers!
(°v°)
0

LVL 44

Expert Comment

harfang:  Maybe partly so, but I get this from Help:

If you want to specify sorting in a union query, add a single ORDER BY clause to the end of the last SELECT statement. In the ORDER BY clause, specify the field name to sort, which must come from the first SELECT statement. For information on the ORDER BY clause, click .

0

LVL 43

Author Comment

OK.
Adding sort and removing the ALL make little difference, a couple of seconds here and there.

ALL - in any case that is REQUIRED else any duplicate records are suppressed by the UNION (I got burnt by that many years ago - I found out why the result was not correct by comparing every single record from the individual queries and the union).

<I'm not going to ask *why* you need to combine 1500 records with 33000 others in a UNION>
Actually the UNION query is in a stripped down version.  The end result is to calculate the cost of a 'piece of work'.  The work can combine materials purchased, machine hours, employee time (both own employees and contactors with different valuations to cover social and other costs....) and so on.
Maybe a different table structure would be better for this (but might be worse for other things) - BUT the table structure exists and I have to work with it.

I suspect having to fill a temporary table is a really bad option due to database bloating  (If I really do need to go that way, which I might, then I expect I would initially use a disconnected recordset).

What I am doing here is a worse case to test performance, in reality there would be a lot of pre-filtering.
0

LVL 43

Author Comment

To re-iterate.
I am really puzzled by the discrepancy in performance.  A few seconds plus a few more seconds for the individual queries wheras the union takes an order of magnitude longer.
0

LVL 44

Expert Comment

Have you tried compact and repair?  You may have a limiting memory size that will accomodate either of the two queries without paging to virtual memory on the disk, but with the union query, you may require that disk access which as we all know is somewhat slooowweerrr.  I have half a gig memory,and with a 6k records table and a 12k records table the union all query ran in about 4 seconds.
0

LVL 43

Author Comment

compact and repair - yes did that before I asked the question, both on BE and FE.

Memory - its running (both DB's local) on my laptop, Win XP with 256 MB.  According to task manager roughly half of memory is in use by OP system and running progs.

I'll be on line again tomorrow, its getting on for midnight here now.
0

LVL 43

Author Comment

ps.
Each record isn't that big, just a few numeric fields, a date and a small text string.
0

LVL 84

Expert Comment

harfang is on the mark, and you are correct in regards to bloating. Often devs will use a temporary db for their temp tables, and perform a compact/repair on that db as needed.
0

LVL 58

Expert Comment

AndyAinscow

The best answer I can provide is that the Union basically prevent the use of any indexes. Being faced with a union query would be the human equivalent of getting two address books and being asked: "Please combine all Swiss addresses from the first with all Company addressed from the second!" Each of the individual queries might be doable, thanks to proper indexes and flags, but you will not be able to just use pointers to the information, you will need to copy everything down.

That is what's happening. Instead of a dynamic keyset, you get a snapshot. Worse, you get loosly typed fields, and all records will have to be checked for possible conversion (because Field1 in the first query is a 10-character text field, but in the second, it's a date field, for instance).

So Jet will execute each query speedily, but it will also have to parse all records, copy the fields, do the conversions, check for errors, and decide what to show you first. All that whithout indexes.

In essence, I'm convinced that a Union query is not the proper tool in your case. You should perform the price calculations for each different cost center, and add them up only in the end. If this is for a report, you should use subqueries for  each cost center, and not one gigantic query for all details of different nature. Just running that union query is sloooow. Imagine what it will be like to create groups and subgroups on it (without indexes)...

If you really need it for some reason, use a temp table, even in a temp db, which you can delete and recreate at each startup if you like. But you should probably find a way to avoid it alltogether.

(°v°)
0

LVL 44

Expert Comment

C'mon Markus:  (because Field1 in the first query is a 10-character text field, but in the second, it's a date field, for instance)  -  you of all people should know that the only type conversion allowed in UNION queries is text to number and vice versa.  Your example, according to MS Help would not even run.  Must be getting late over there!
0

LVL 58

Expert Comment

Huh? If you mean "allowed if you want a meaningful result", I agree, but the mechanism is simply this: Jet will cast most mixed type column to string – even just date and number. This works in Northwind.mdb:

SELECT Discontinued From Products
UNION SELECT OrderDate FROM Orders
UNION SELECT CompanyName FROM Customers

Mixing a boolean, a date and text is silly, of course, but it's still the type of things that Jet has to manage in Union queries... I believe the only rule is that the number of columns must match.

Cheers!
(°v°)
0

LVL 43

Author Comment

Harfang - what you say makes sense re the indexs.
I always make certain fields are of correct types and in the same order for UNION queries.
Dynamic vs Snaphot sets - no problem, it is for a report so a snapshot is suitable.

I don't think using a temp table (DB) is suitable (write to disc) I may well have to use a disconnected set (or other in memory collection).

I'l look into the subquery / subreport.  It may be suitable for this particular case.

ps.  We are each by a big lake at opposite ends of the same country.
0

LVL 58

Expert Comment

(°v°)
0

LVL 38

Expert Comment

Andy,
Based on the following excerpt from this link and other links, there are few things you can try that might improve your union query performance.

The purpose of a UNION is to take several dissimilar tables, or query results, and create a single unified result. To perform this step, SQL needs to know how you want the result table to look. You use the first SELECT fields list for this determination. Therefore, the fields list in the first SELECT must allow for all the fields that will be needed, including those fields that are not in the first statement but will appear later.

1. try placing the longest most complicated select statement as the first select statement.
3. eliminate columns with aggregate functions (sum, count, etc.) unless they are absolutely needed in the final result....perhaps you could leave the calcs/aggregates out of the union query, but include the calcs/aggregates as part of a select query that uses the union query as a "raw data" source.
0

LVL 58

Expert Comment

PDB,

I would suggest the exact opposite of your point 3. The UNION should come at the latest possible moment, meaning that all aggregate functions and groupings should be made in the individual queries. Using aggregate functions based on a union query is particularily bad if you have groupings, because no indexes are available.

Just my opinion, of course,
(°v°)
0

LVL 38

Expert Comment

Harfang,
You are probably correct (as usual), but we will not know for sure unless Andy gives it a try.

PDB
0

LVL 58

Expert Comment

PDB, thanks ;) and yes, of course, each situation is different. Cheers!
(°v°)
0

LVL 43

Author Comment

Can't complain but always interested in possible extra clients.

PDB.
1)  Next to no difference noted, I tried that earlier just didn't report the results here.
2)  I'll give it a try but I don't expect any benefit (maybe I'll be pleasantly suprised)
3)  I agree with harfang re the aggragate/union order of action, I would be astonished if that made an improvement.
0

LVL 43

Author Comment

OK, tried the SELECT a, b, c instead of SELECT * - no improvement.
0

LVL 43

Author Comment

I've just tried creating a new DB and importing the tables from the BE and the queries from the FE.
The performance is still catastrophic.
0

LVL 43

Author Comment

PDB
I've just checked your link out - no new info (for me) that I can see there.

I think I'm going to have to do something in memory with individual queries rather than the union.
0

LVL 43

Author Comment

I've had another look at the tables the queries are based upon.  Adding some more indexs might help speed things up (not all fields it filters on are indexed).
0

LVL 44

Expert Comment

Markus:  MS Access Help says:

Note   Each SELECT statement must return the same number of fields, in the same order. The corresponding fields are required to have compatible data types with one exception: You can use a Number and a Text field as corresponding fields.

AndyAinscow:  I'm still suspicious of memory.  Try cutting the larger table in two and running the UNION.  Eventually you should get to a point where there will be a remarkable difference in execution speed.  The OS can be quite a memory hog, and all the TSR programs we have in the start menu these days blows me away.
0

LVL 58

Expert Comment

Ray,

This might be a vesion issue. A2k, Jet SQL 4.0, DAO 3.6 and ADO 2.1 all state:
All queries in a UNION operation must request the same number of fields; however, the fields do not have to be of the same size or data type.

Which version's help file were you referring to, and did you try if it's enforced?
(°v°)
0

LVL 58

Expert Comment

Hello AndyAinscow

If you want to play with temp tables, you can use this module to create and manage a linked temp database. This avoids bloating (most of it anyway). I created the module a few months back and cleaned it up for publishing here.

Have fun!

'-------------------------------------------------------------------------------
' Class Module  claTempDB
' Author        Markus G Fischer, aka harfang (°v°)
' Created       September 2005
' Updated       May 2006, MGF: cleanup, comments
' Copyright     Markus G Fischer, Geneva 2005-2006
' Distribution  free if proper credit is given
'-------------------------------------------------------------------------------
'
' Purpose
' ¯¯¯¯¯¯¯
' Implement a local temp database for use with temp tables. This prevents
' "bloating" of the front-end while providing all the benefits of locally
' stored temp tables.
'
'
' Implementation
' ¯¯¯¯¯¯¯¯¯¯¯¯¯¯
' The temp database is created at each startup in the users temp folder, as
' returned by Environ("TEMP"), using the same name as the current database,
' with two leading tilde, e.g. "~~MyFE.mdb".
'
'
' Usage
' ¯¯¯¯¯
'
' • Create a global variable for the class module:
'   ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
'   Global Temp As claTempDB
'
'
' • Create temporary tables with MakeTable()
'   ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
'   Temp.MakeTable(TableName, Query [,Index [,Index ...]])
'
'   TableName:  name of the temp table, e.g. "zttblAllDetails"
'   Query:      SQL string, name of a query, or name of a table
'   Index...:   optional name(s) of fields to index
'
'   Creates:    the table in the temp database, and
'               a linked table in the current database
'   Returns:    the number of records transfered, or -1 on failure
'
'   WARNING:    if the table exists, it is overwritten!
'
'   Example:
'
'       strSQL _
'           = " SELECT CompanyName, CustomerID As ID FROM Customers" _
'           & " UNION SELECT CompanyName, SupplierID FROM Suppliers"
'
'       If Temp.MakeTable("zttblAll", strSQL, "CompanyName", "ID") > 0 Then
'
'           Temp.DB.TableDefs("zttblAll").Fields("ID").Required = True
'           DoCmd.OpenTable "zttblAll"
'
'       End If
'
' • Delete them with DropTable()
'   ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
'   Temp.DropTable(TableName)
'
'   TableName:  name of the temp table to delete
'
'   Deletes:    the table in both the temp and the current databases
'   Returns:    True on success or table not found, False on error
'
'
' • Access the temp database with Temp.DB
'   ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
'   Any more complex operations can be executed through either the linked
'   tables in the current database, or in the temp database, using the DB
'   property. For example:
'
'   Debug.Print Temp.DB.Name, Temp.DB.TableDefs.Count
'   Set recTable = Temp.DB.OpenRecordset("zttblAny", dbOpenTable)
'
'
' • Delete the temp database by releasing the global variable
'   ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
'   Set Temp = Nothing
'
'
' Disclaimer
' ¯¯¯¯¯¯¯¯¯¯
' The method DropTable(), called by MakeTable(), actually deletes the named
' table in the arguments. You should therefore use a very strict naming
' convention, for example by using the suffix "zttbl" – meaning non-data (z)
' temporary (t) table (tbl) – for all temporary tables.
'

Option Compare Database
Option Explicit

Dim mstrFileName    As String           ' path and name of temp database
Dim mdbTemp         As DAO.Database     ' open database object

Property Get DB() As DAO.Database
Set DB = mdbTemp
End Property

Private Function Drop(pdb As Database, pstrTable As String) As Boolean
'
' Drop a table from a database, does not fail if it did not exist.
'
On Error Resume Next

pdb.Execute "DROP TABLE " & pstrTable
If Err.Number = 3376 Then Err.Clear   ' does not exist
Drop = (Err.Number = 0)

End Function

Public Function DropTable(TableName As String) As Boolean
'
' Drops a table from both the temp and the current database
'
If Drop(CurrentDb, TableName) Then
If Drop(mdbTemp, TableName) Then
DropTable = True
Exit Function
End If
End If
If Err Then MsgBox Err.Description, vbExclamation, "DropTable()": Err.Clear

End Function

Public Function MakeTable( _
ByVal TableName As String, _
ByVal Query As String, _
ParamArray IndexFields() _
) As Long
'
' Creates a temp table in the temp database, a linked table in the current
' database, and basic indexed on selected fields.
'
Dim tdf As TableDef
Dim intF As Integer

On Error GoTo Failure

' [Table Name] brackets
If InStr(TableName, " ") And Left(Trim(TableName), 1) <> "[" Then
TableName = "[" & TableName & "]"
End If

' Subquery for SQL strings, and brackets for [Query names]
If InStr(Query, "SELECT ") Then
Query = "(" & Query & ") As TMP"
ElseIf InStr(Query, " ") And Left(Trim(Query), 1) <> "[" Then
Query = "[" & Query & "]"
End If

' [Index Field] brackets
For intF = 0 To UBound(IndexFields)
If InStr(IndexFields(intF), " ") _
And Left(Trim(IndexFields(intF)), 1) <> "[" Then
IndexFields(intF) = "[" & IndexFields(intF) & "]"
End If
Next intF

' build full make-table query
Query _
= " SELECT * INTO " & TableName _
& " IN '" & mstrFileName & "'" _
& " FROM " & Query

' drop existing tables
If Not DropTable(TableName) Then GoTo Failure

With CurrentDb

' make the temp table
.Execute Query
MakeTable = .RecordsAffected

Set tdf = .CreateTableDef(TableName)
tdf.Connect = ";DATABASE=" & mstrFileName
tdf.SourceTableName = TableName
.TableDefs.Append tdf

End With

' index all selected fields
For intF = 0 To UBound(IndexFields)
Query _
= " CREATE INDEX " & IndexFields(intF) _
& " ON " & TableName _
& " (" & IndexFields(intF) & ")"
Debug.Print Query
mdbTemp.Execute Query
Next intF

Exit Function

Failure:
If Err Then MsgBox Err.Description, vbExclamation, "MakeTable()": Err.Clear
MakeTable = -1

End Function

Private Sub Class_Initialize()
'
' Generate the temp database name and create, overwriting if needed.
'
mstrFileName = CurrentDb.Name
mstrFileName = Mid(mstrFileName, InStrRev(mstrFileName, "\") + 1)
mstrFileName = Environ("TEMP") & "\~~" & mstrFileName
If Len(Dir\$(mstrFileName)) Then Kill mstrFileName
Set mdbTemp = DBEngine.CreateDatabase(mstrFileName, dbLangGeneral)

End Sub

Private Sub Class_Terminate()
'
' Close the temp database and delete it.
'
On Error Resume Next

If Not mdbTemp Is Nothing Then
mdbTemp.Close
Set mdbTemp = Nothing
End If
If Len(Dir\$(mstrFileName)) Then Kill mstrFileName

End Sub
'-------------------------------------------------------------------------------

You could also add this to a normal module (the above  is a class module):

Global Temp As New claTempDB

Function TempCleanup()

Dim intT As Integer
Dim strTable As String

Set Temp = Nothing
With CurrentDb
For intT = .TableDefs.Count - 1 To 0 Step -1
strTable = .TableDefs(intT).Name
If Left(strTable, 5) = "zttbl" Then _
.TableDefs.Delete strTable
Next intT
End With

End Function

(°v°)
0

LVL 43

Author Comment

harfang - interesting piece of code.

(For this case of little use - only the data/queries is in access DBs.  The user interface is coded in C++.  Don't bother trying to convert it, I already have C++ code to cope with creating/modifying/linking tables and DBs.)

What I have noticed is something I can't explain.
In the test UNION query and its two base queries.
Each query on its own runs rapidly but in the UNION the performance is terrible.
Now I have had a look at the tables and added an index to one field that one query was filtering on.
No noticable difference to the individual queries BUT the union query is dramatically faster (not really acceptable but a lot better).

The only explanation I can come up with is that the jet engine isn't running each query individually and filling a 'pipe' filled with the output (which was how I viewed it).
0

LVL 58

Accepted Solution

Andy,

I think you are right. The default mechanism of the UNION query involves removing duplicates, so that Jet could very well run all queries simultaneously. The modifier ALL turns the removal off, but perhaps not the central mechanism.

Information about query performance in an Access database
http://support.microsoft.com/?kbid=209126

Use Microsoft Jet's ShowPlan to write more efficient queries
http://builder.com.com/5102-6388-5064388.html

The ShowPlan feature might give you the answer to your last question...

Happy hacking!
(°v°)
0

LVL 43

Author Comment

This looks as if it could be relevant.
Avoid calculated fields in subqueries. If you add a query that contains a calculated field to another query, the expression in the calculated field may slow performance in the top-level query.
0

LVL 43

Author Comment

Very interesting.  (Why can't M\$ have tools like this directly available from the access interface, rather than having to set registry keys...)
The info in the link would be better if the listings it refers to where actually there.  I'll have to study that more closely.
0

LVL 44

Expert Comment

Markus:  A2K with SP3.  To be honest, I did not test the 'allegation'.
0

LVL 43

Author Comment

Thanks to everyone.

I have made collections in memory to store and manipulate the results of individual queries to provide the information I require.  It takes a few seconds to run which is acceptable given what it is doing.
0

## Featured Post

In the previous article, Using a Critera Form to Filter Records (http://www.experts-exchange.com/A_6069.html), the form was basically a data container storing user input, which queries and other database objects could read. The form had to remain op…
I see at least one EE question a week that pertains to using temporary tables in MS Access.  But surprisingly, I was unable to find a single article devoted solely to this topic. I don’t intend to describe all of the uses of temporary tables in t…
Familiarize people with the process of utilizing SQL Server functions from within Microsoft Access. Microsoft Access is a very powerful client/server development tool. One of the SQL Server objects that you can interact with from within Microsoft Ac…
Show developers how to use a criteria form to limit the data that appears on an Access report. It is a common requirement that users can specify the criteria for a report at runtime. The easiest way to accomplish this is using a criteria form that a…