Link to home
Start Free TrialLog in
Avatar of Cbobchin
CbobchinFlag for United States of America

asked on

How can I speed up this query?

Hi,

I have a query that is based on a view that takes forever to run (and usually errors out due to TEMPDB running out of space.

In this query the view gets joined to itself and is used to filter by the maxof routercallsequence#

The base query selecting the view by date returns the days worth of data in about 4 - minutes. Certainly a usable time as the base table contains several million records and the query returns about 45k records.

So in essence I'm trying to join 45k records to themselves. there are no indices on the view and I am trying to get them added.

I can get to the tables via Access so if an Access solution applies so be it.

Thanks

Craig
Avatar of GRayL
GRayL
Flag of Canada image

Put your 45k records into a temp table in Access and go from there.  Can you post the query that produces the 45k records and the self-join query.  Can you explain what this means >there are no indices on the view and I am trying to get them added.<??
I'm not much of an SQL guy so I'm going to look at this from a hardware perspective.  Do you know the specs of the server this is running on?  More specifically are there nice fast drives running on the server?  I've seen really nice servers that are running with a couple terabytes of space but they have relatively slow SATA drives in them so getting information off them can take a while since it's not hard for the hard disks to be your bottleneck.

Avatar of Cbobchin

ASKER

GRayL,

Wierd i posted the SQL statement I'm using but it didn't show up. Trying it again.

As for indices The SQL server view does not have any indexes and I'm trying to get the DBAdmin to add some to speed up the queries I'll be running from the view.



SELECT SurveyID, DateTime, Variable5, Var5_1st12, DNIS, InstrumentPortNumber, AgentPeripheralNumber, ANI,  Max(RouterCallKeySequenceNumber) AS MaxOfRouterCallKeySequenceNumber
FROM dbo.vw_TCD INNER JOIN dbo.vw_TCD AS dbo.vw_TCD_1 ON dbo.vw_TCD.Var5_1st12 = dbo.vw_TCD_1.Var5_1st12
Where dbo.vw_TCD.DateTime >= '04/15/2009 00:00:00' and dbo.vw_TCD.DateTime <= '04/15/2009 23:59:59'
GROUP BY SurveyID, DateTime, Variable5, Var5_1st12, DNIS, InstrumentPortNumber, AgentPeripheralNumber, ANI;

Open in new window

i1akey, unfortunately I don't know, nor have any influence on the hardware. so it is a moot point.
Am I to under stand this is the query that reduces the record count from several million to 45 K.  If so, stick the result in a temp table named myTemp as below:

SELECT
SurveyID,
DateTime,
Variable5,
Var5_1st12,
DNIS,
InstrumentPortNumber,
AgentPeripheralNumber,
ANI,  
Max(RouterCallKeySequenceNumber) AS MaxOfRouterCallKeySequenceNumber  INTO myTemp
FROM dbo.vw_TCD INNER JOIN dbo.vw_TCD AS dbo.vw_TCD_1 ON dbo.vw_TCD.Var5_1st12 = dbo.vw_TCD_1.Var5_1st12
Where dbo.vw_TCD.DateTime >= '04/15/2009 00:00:00' and dbo.vw_TCD.DateTime <= '04/15/2009 23:59:59'
GROUP BY SurveyID, DateTime, Variable5, Var5_1st12, DNIS, InstrumentPortNumber, AgentPeripheralNumber, ANI;
The portion that filters down from several million to 45k is this:

I've tried putting it into a temp table in Access  and just joining the temp tables but I get the same results..

SELECT SurveyID, DateTime, Variable5, Var5_1st12, DNIS, InstrumentPortNumber, AgentPeripheralNumber, ANI,  Max(RouterCallKeySequenceNumber) AS MaxOfRouterCallKeySequenceNumber
FROM dbo.vw_TCD 
Where dbo.vw_TCD.DateTime >= '04/15/2009 00:00:00' and dbo.vw_TCD.DateTime <= '04/15/2009 23:59:59'
GROUP BY SurveyID, DateTime, Variable5, Var5_1st12, DNIS, InstrumentPortNumber, AgentPeripheralNumber, ANI;
 
Access query code:
 
SELECT [4-15 termcall detail_1].SurveyID, [4-15 termcall detail_1].DateTime, [4-15 termcall detail_1].Variable5, [4-15 termcall detail_1].Var5_1st12, [4-15 termcall detail_1].DNIS, [4-15 termcall detail_1].InstrumentPortNumber, [4-15 termcall detail_1].AgentPeripheralNumber, [4-15 termcall detail_1].ANI, Max([4-15 termcall detail_1].RouterCallKeySequenceNumber) AS MaxOfRouterCallKeySequenceNumber
FROM [4-15 termcall detail] INNER JOIN [4-15 termcall detail] AS [4-15 termcall detail_1] ON [4-15 termcall detail].Var5_1st12 = [4-15 termcall detail_1].Var5_1st12
GROUP BY [4-15 termcall detail_1].SurveyID, [4-15 termcall detail_1].DateTime, [4-15 termcall detail_1].Variable5, [4-15 termcall detail_1].Var5_1st12, [4-15 termcall detail_1].DNIS, [4-15 termcall detail_1].InstrumentPortNumber, [4-15 termcall detail_1].AgentPeripheralNumber, [4-15 termcall detail_1].ANI, [4-15 termcall detail].Var5_1st12;

Open in new window

Avatar of chapmandew
are the indexes on the fields you're joining on and the fields in which you're setting criteria on in your where clause?
Chapmandew, No indexes are on the view. I'm trying to get some added.
What is the point of joining a table to itself using the same fields in the ON clause?  All that does is create a 45k recordset identical to either of the two tables in the join.  Try removing the join and do the query and group by on just the table, no join.  Is the Max() function any different and more importantly, is the execution a light-year faster?  
Reason for joining the table to itself is that there can be multiple records that are the same expect for the Routercallsequence number. of those that are duplicated I  need to find the records that have the highest sequence number. as well as the single records that are not duplicates.
IMHO, you do not need the JOIN.  Did you try it without?
NO....don't add them to the views just yet.  How complex are the views?  Are there indexes on the underlying tables?
GRayL, yes I did try it and it did not work.

Chap, there are some indexes on the underlying table but not on the field I'm joining on which only exists in the view (it's the 1st 12 characters of another field).
What happened?
Will you show us the query you tried it with?
I know for sure that if you join a table to itself on a common field, if it contains unique values, you will wind up with a record set identical to the original table.  If any of the common fields have two or more records, you will wind up with a table which multiples of  those records, however none of the values will have changed.  If you compare Max(fldOfInterest) in the original table against Max(fldOfInterest) in the JOINed tables you will wind up with the same value.  
Question, how come you don't get ambiguous column error? Your columns are not qualified with the aliases.

GRayl, the queries i've posted are the ones I'm using. What i'm getting is still the multiple records. I want to filter it down to just one record (with the max(routercallsequencenumber))

As to why i'm not getting the ambiguous collumn error I don't know.
If you won't answer my question, I can't help you.  Bye.
>>the queries i've posted are the ones I'm using. <<
Unfortunately, you have posted quite a few and we would just like to see the current one.

It would also help to see some sample data, current output and desired output.

>>What i'm getting is still the multiple records<<
Define "multiple records", what columns make a "multiple record".
GRayL I was off line most of Fri. and I did not take my laptop home with me. So sorry for not getting back to you immediately.

I can post some sample Data and  current and hoped for output.

It will take me a bit of time but I'll post it later this morning.
I somehow suspect that you may have seen the last of GRayL in this thread.
acperkins, I'm sorry to hear that.  in any event, here is the sample data:

1st: sample Raw Data:

                 The current output  from the SQL query I posted above never completes or if it does (when I choose  like 5 records) is the same as the current output.

I had an idea over the weekend that doing a subquery in the where clause might allow me to get the max router sequence number and then filter on that in the main query to get the final results.

What do you think?

Raw Data Which is the same as current output:
 
SurveyID	DateTime	Var5_1st12	AgentPeripheralNumber	InstrumentPortNumber	RouterCallKeySequenceNumber
0032BBDC-290311DE-B143AD29-22247AEC	4/15/2009 8:53	149123111961	45677	65677	3
0032C334-290611DE-84E8AD29-22247AEC	4/15/2009 9:12	149123133286	45433	65433	3
0034A8CA-290911DE-99118017-7702D3C8	4/15/2009 9:35	149123155989	46296	66296	3
0037E064-292811DE-91B38017-7702D3C8	4/15/2009 13:15	149123380683	46316	66316	3
00380715-291C11DE-80EA8017-7702D3C8	4/15/2009 11:47	149123291565	45564	65564	3
003C977C-291B11DE-924EAD29-22247AEC	4/15/2009 11:42	149123284308	45359	65359	3
003C977C-291B11DE-924EAD29-22247AEC	4/15/2009 11:47	149123284308	46298	66298	4
0040D6E1-28F911DE-B845AD29-22247AEC	4/15/2009 7:39	149123489820	46049	66049	3
00441960-294211DE-B1C8AD29-22247AEC	4/15/2009 16:29	149123530242	45260	65260	3
00456D64-292611DE-BA608017-7702D3C8	4/15/2009 13:12	149123366024	45187	65187	3
00456D64-292611DE-BA608017-7702D3C8	4/15/2009 13:05	149123366024	45510	65510	4
0046922D-294511DE-BD3FAD29-22247AEC	4/15/2009 16:44	149123538980	46296	66296	3
0047FEEF-291311DE-9D51AD29-22247AEC	4/15/2009 10:47	149123229093	45732	65732	3
0048C95E-290F11DE-83A5AD29-22247AEC	4/15/2009 10:17	149123200689	45222	65222	3
0049B3AD-28F411DE-A31FAD29-22247AEC	4/15/2009 7:01	149123266440	45357	65357	3
0053650D-291911DE-9E4E8017-7702D3C8	4/15/2009 11:27	149123270353	45463	65463	3
0053D65F-293011DE-99CFAD29-22247AEC	4/15/2009 14:09	149123438529	45408	65408	3
00561AAD-292C11DE-81438017-7702D3C8	4/15/2009 13:49	149123409958	55211	75211	3
0057AB8E-295411DE-A8EBAD29-22247AEC	4/15/2009 18:36	149123565536	46001	66001	3
0057AB8E-295411DE-A8EBAD29-22247AEC	4/15/2009 18:35	149123565536	45700	65700	4
0063469E-293D11DE-AB088017-7702D3C8	4/15/2009 15:56	149123512371	46240	66240	3
00635140-292511DE-94E2AD29-22247AEC	4/15/2009 12:52	149123358621	45403	65403	3
00641A22-28F911DE-A4778017-7702D3C8	4/15/2009 7:42	149123489840	45041	65041	3
0065F187-291A11DE-A9798017-7702D3C8	4/15/2009 11:34	149123277168	47361	67361	3
0067B5D6-290D11DE-B556AD29-22247AEC	4/15/2009 10:03	149123186245	45362	65362	3
006C666D-291111DE-904CAD29-22247AEC	4/15/2009 10:28	149123215045	46099	66099	3
 
Desired Output:
SurveyID	DateTime	Var5_1st12	AgentPeripheralNumber	InstrumentPortNumber	RouterCallKeySequenceNumber
0032BBDC-290311DE-B143AD29-22247AEC	4/15/2009 8:53	149123111961	45677	65677	3
0032C334-290611DE-84E8AD29-22247AEC	4/15/2009 9:12	149123133286	45433	65433	3
0034A8CA-290911DE-99118017-7702D3C8	4/15/2009 9:35	149123155989	46296	66296	3
0037E064-292811DE-91B38017-7702D3C8	4/15/2009 13:15	149123380683	46316	66316	3
00380715-291C11DE-80EA8017-7702D3C8	4/15/2009 11:47	149123291565	45564	65564	3
003C977C-291B11DE-924EAD29-22247AEC	4/15/2009 11:47	149123284308	46298	66298	4
0040D6E1-28F911DE-B845AD29-22247AEC	4/15/2009 7:39	149123489820	46049	66049	3
00441960-294211DE-B1C8AD29-22247AEC	4/15/2009 16:29	149123530242	45260	65260	3
00456D64-292611DE-BA608017-7702D3C8	4/15/2009 13:05	149123366024	45510	65510	4
0046922D-294511DE-BD3FAD29-22247AEC	4/15/2009 16:44	149123538980	46296	66296	3
0047FEEF-291311DE-9D51AD29-22247AEC	4/15/2009 10:47	149123229093	45732	65732	3
0048C95E-290F11DE-83A5AD29-22247AEC	4/15/2009 10:17	149123200689	45222	65222	3
0049B3AD-28F411DE-A31FAD29-22247AEC	4/15/2009 7:01	149123266440	45357	65357	3
0053650D-291911DE-9E4E8017-7702D3C8	4/15/2009 11:27	149123270353	45463	65463	3
0053D65F-293011DE-99CFAD29-22247AEC	4/15/2009 14:09	149123438529	45408	65408	3
00561AAD-292C11DE-81438017-7702D3C8	4/15/2009 13:49	149123409958	55211	75211	3
0057AB8E-295411DE-A8EBAD29-22247AEC	4/15/2009 18:35	149123565536	45700	65700	4
0063469E-293D11DE-AB088017-7702D3C8	4/15/2009 15:56	149123512371	46240	66240	3
00635140-292511DE-94E2AD29-22247AEC	4/15/2009 12:52	149123358621	45403	65403	3
00641A22-28F911DE-A4778017-7702D3C8	4/15/2009 7:42	149123489840	45041	65041	3
0065F187-291A11DE-A9798017-7702D3C8	4/15/2009 11:34	149123277168	47361	67361	3
0067B5D6-290D11DE-B556AD29-22247AEC	4/15/2009 10:03	149123186245	45362	65362	3
006C666D-291111DE-904CAD29-22247AEC	4/15/2009 10:28	149123215045	46099	66099	3

Open in new window

ASKER CERTIFIED SOLUTION
Avatar of GRayL
GRayL
Flag of Canada image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
If the query is as simple as that, then you may be better off using a derived table as it should be more performant.  If on the other hand, as I suspect, you still need to group more columns then you may want to indicate which are those columns or show how the query results need to change.
Actually it seems to work (at least in Access with a limited amount of Data). But when I converted it to SQL server going against the defined view, it is now going on 2 hrs of run time and still no results.

Are you running it with the 45k record dataset or the original table with millions of records.  If the latter, anytime you have an sub query in a query, it is going to take time.  
I ran it on both. On the smaller data set in ran very quickly. On the SQL server dataset it ran for over 15 hours and never gave me a response.




With millions of records in a table, unless the server has gobs of memory, I imagine it has to page to disk regardless of the OS, for those huge tables.  The instant that starts to happen, do the brakes ever go on.  I think the method of significantly reducing the db size before applying a query with a sub-query is the way to go.
BTW, that was what I suggested in my first post.
With only 14 million rows and such a small resultset there is no earthly reason why that simple query on a well indexed table should not return your results in under 5 minutes.  There has to be something else going on.  Either the hardware/software for SQL Server is subpar, the network is lousy or something else.
There may be no 'earthly' reason but the 'devil' is in the details.  Why do you think a query with a subquery should run on a table with over 14 meg records in 'minutes' on a server, whose credentials are totally unknown in terms of memory, and cpu power?
I believe I indicated what I perceived were the problems.  Hopefully the intended recipient of my comments (the author) grasped my intent.  Otherwise I will be happy to spell it out.
AC, no I did not grasp the intent, but I'd love to hear it.
You need to troubleshoot the problem.  At the moment you have not got a clue as to where the problem lies.  Is it SQL Server or is it in the network due to the size of the resultset returned or is it the way you are retrieving that data using ODBC (?) from MS Access?

In order to do that you need to isolate the problem.  First you need to take MS Access out of the picture and execute that query from within Query Analyzer or SSMS.  Does it take a long time.  If the answer is yes, then examine the execution plan and determine what is the problem.  More than likely the table is not indexed appropriately.  If it is indexed, then perhaps it has been a while since the statistics have been updated.

After doing all of that and it is still slow, see if the problem is the size of the resultset.  Add a TOP 100 to the query (without any ORDER BY) or a SET ROWCOUNT 100 prior to executing the query.  This may help you identify if the network is the problem.

Once you have done all that and have the time down to an acceptable amount only then should you add MS Access to the mix.
Cbobchin:  I'm all ears.  Waiting to hear the results of your tests.  Don't forget to compare results where first the queries are executed on the server containing the data, and then remotely, elsewhere in the network all by SQL Server.  There may very well be a network speed problem.
I'm a working on it. I've had a couple of other items come up that were a bit higher priority today.
Well one thing I can tell you all, is that the base table this whole thing is built on has just shy of 700 million records.

I'm running a query that will tell me how many records are in the view I had created for filtering some of those down.

I also know that the 3 main fields I use in the selection criteria are not indexed, though i'm working on getting them indexed.

more info as it becomes available.
Well the filtered recordset that I use for my view has almost 6 million records in it.

So I'm trying to select 1 day's worth of records from a lot of records on fields that are unindexed. No wonder my query to select just 10 records has been running over 17 hours now and still has no results.
Rather than a 'filtered' recordset, have you tried to copy the filtered result to a temp table and then apply the query with the subquery to the temp table?
No, because I don't have those permissions on the database.
Is a filtered table of 6 million records from 700 million in reality still a 700 million record table when it comes to processing a query?  Can you summarize where we are at now?  You started off saying several million records from which a day's worth reduced to 45K records.  Now we're at 700 million from which a day reduces to 6 million?
The base table is 700 million records. The 6 million records is from 1/1/09 through yesterday based on my base filter (NOT using the subselect). A day's worth of records is still in the range of 30-50k.
Of no fault of GRayL, the solution is only partially complete. Turns out the database needs some minor work and the fields I was told initially held the correct data turns out not to.