Solved

Optimizing a Simple Select

Posted on 2013-01-03
414 Views
Last Modified: 2013-01-24
I have a very simple select statement:

SELECT COUNT(*) as varTotal
FROM tbl_stat_master
WHERE stat_account = 173
AND stat_created 
BETWEEN '2012-11-01 00:00:00' 
AND '2012-12-01 00:00:00'



It's running on a table that has 36 million records. I'd like it to be faster.

I have two indexes setup, one on stat_account and one on stat_created.

So first of all, is there something I should do to make the query faster? Like maybe COUNT(*) isn't a good idea, or there something better than BETWEEN?

And secondly, should I change my indexes? Make a combo index that includes account and created?

Or maybe you have another idea?

Thank you!
Question by:cnxmax
10 Comments
 
LVL 45

Assisted Solution

by:Kent Olsen
Kent Olsen earned 167 total points
ID: 38741631
Hi cnx,

That query is very reasonable, and a very popular kind of query in the real world.  :)

The BETWEEN clause is just a shorthand way of writing the condition.  SQL actually translates it to "WHERE (A >= value1 AND A <= value2)".

A composite index over stat_account and stat_created will probably help, depending on the database design.  It can't hurt to create that index and try it.
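
For example, something like this (MySQL-style DDL; the index name is just illustrative):

-- stat_account first, so the equality predicate narrows the index
-- to one account and the date range becomes a contiguous scan within it
CREATE INDEX idx_account_created
    ON tbl_stat_master (stat_account, stat_created);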


Good Luck,
Kent
 
LVL 13

Assisted Solution

by:Ryan
Ryan earned 167 total points
ID: 38741636
Consider converting DateTime to integer, or double (days since 1/1/1900 for example). It may not make a difference, just thinking aloud.
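
For instance, a rough way to get a days-since-1900 integer (MySQL-style DATEDIFF shown; SQL Server's DATEDIFF takes a datepart argument first, e.g. DATEDIFF(day, '1900-01-01', stat_created)):

-- whole days elapsed since 1900-01-01 for each row
SELECT DATEDIFF(stat_created, '1900-01-01') AS days_since_1900
FROM tbl_stat_master;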

Also make sure indices aren't fragmented.
 
LVL 9

Assisted Solution

by:sognoct
sognoct earned 111 total points
ID: 38742026
Create a clustered index on stat_created and stat_account.

Create the clustered index before creating any nonclustered indexes.

I think that using <= and >= is better than BETWEEN (no need for the engine to translate).

As far as I know, COUNT(*), COUNT(0), and COUNT(1) have the same processing speed ... so use COUNT(*).

If stat_created is a datetime ... you should convert the values in the WHERE clause to datetime with a CONVERT, otherwise the system may have to convert the values from the table to the same format (slower).
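
For example (MySQL-style CONVERT shown; adjust the cast syntax for your engine):

SELECT COUNT(*) AS varTotal
FROM tbl_stat_master
WHERE stat_account = 173
  AND stat_created BETWEEN CONVERT('2012-11-01 00:00:00', DATETIME)
                       AND CONVERT('2012-12-01 00:00:00', DATETIME)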

 
LVL 13

Assisted Solution

by:Ryan
Ryan earned 167 total points
ID: 38742151
If you're calling this query multiple times with different criteria, then I think the engine can optimize the repeated calls if you use @parameters rather than string literals.  It also covers sognoct's suggestion about type casting.

I think those are imperceptible differences in this case though.
 
LVL 45

Assisted Solution

by:Kent Olsen
Kent Olsen earned 167 total points
ID: 38742580
Ok.  Before this train gets too far off the track.

Do NOT create a clustered index.  Yet.  That will relocate all 36,000,000 rows in a way that may make this query faster, but may also have a negative impact on other queries.  You need to make that decision before you go down this path.

There is NO difference between BETWEEN and the <= and >= syntax.  They evaluate to exactly the same execution plan.  Use whichever reads more clearly in your SQL.


The other posters are correct about the dates though.  You might want to add a column with a date in numeric form.
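
One possible form (hypothetical column and index names, MySQL-style DDL):

-- integer date column, plus a composite index so the count can use it
ALTER TABLE tbl_stat_master ADD COLUMN stat_created_int INT;

CREATE INDEX idx_account_created_int
    ON tbl_stat_master (stat_account, stat_created_int);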

Kent
 
LVL 9

Assisted Solution

by:sognoct
sognoct earned 111 total points
ID: 38743073
Kdo is right.

You can try it on a copy of the db and evaluate whether there is a performance issue for other queries ...
It depends on how many deletes and inserts are done on this table. I guess this is a statistics table in which stat_account is the id of the stat creator and stat_created is when it was created, so maybe there are no DELETE queries if there are 36 million records. If I'm right, you can (in the debug db) try removing the current indexes and adding the clustered pair ...

For better performance, INT can roughly halve SELECT time; you can see the differences in this article:
INT vs DATETIME vs TIMESTAMP

As for BETWEEN, it's true the performance is the same (you can check with the "explain" command).
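
e.g. (MySQL syntax; on SQL Server you'd compare execution plans instead):

EXPLAIN
SELECT COUNT(*) AS varTotal
FROM tbl_stat_master
WHERE stat_account = 173
  AND stat_created >= '2012-11-01 00:00:00'
  AND stat_created <= '2012-12-01 00:00:00';

Run it once with BETWEEN and once with >=/<= and the plans should come out the same.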
 
LVL 32

Assisted Solution

by:awking00
awking00 earned 55 total points
ID: 38744101
Also, be sure to gather statistics on your table and indexes.
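
For example (the first statement is the MySQL form, the commented one is SQL Server's):

ANALYZE TABLE tbl_stat_master;
-- or, on SQL Server:
-- UPDATE STATISTICS tbl_stat_master;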
 

Author Comment

by:cnxmax
ID: 38744215
Thank you for all your responses! I have a lot of good information to go off of.

To clarify a few things:

We never do deletes.

We insert about 1,000 records per minute during peak times and average about 100/minute overall.

We're considering purging old data (might be a good separate question on the most efficient way to delete records older than x days on a regular basis).




A few questions...

To type cast, is this the best way:

CONVERT('2012-11-01 00:00:00', datetime)





Also, if I transform the dates to integers for existing and new incoming records, what would be the best way to handle a value like: '2012-11-01 12:59:02'

I'm assuming since I only want month granularity I can just use:

201211

And put an index on that column? That seems to me like it would probably lead to a big performance boost.



@MrBullwinkle-- you said "I think the engine can optimize the repeat calls if you use @parameters rather than a string literal". I'm not sure what @parameters are?



Thanks again for all your help!
 
LVL 45

Accepted Solution

by:
Kent Olsen earned 167 total points
ID: 38744276
Hi Max,

If it were my application, I'd store the integer date as yyyymmdd (year * 10000 + month * 100 + day).  You may only need month granularity now, but having the full date gives you more options, particularly for items where you need to search date windows.

And integer operations (compare, join, calculate) are the fastest of all operations.  Your queries will be noticeably faster!
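
A rough sketch, assuming an integer column named stat_created_int (hypothetical name) has already been added; MySQL-style date functions shown:

-- Backfill existing rows with yyyymmdd
-- (on 36 million rows you'd probably want to do this in batches)
UPDATE tbl_stat_master
   SET stat_created_int = YEAR(stat_created) * 10000
                        + MONTH(stat_created) * 100
                        + DAY(stat_created);

-- The November 2012 count then becomes an integer range scan
SELECT COUNT(*) AS varTotal
FROM tbl_stat_master
WHERE stat_account = 173
  AND stat_created_int >= 20121101
  AND stat_created_int <  20121201;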

And since this is a transaction logging system, don't use a clustered index.  It may offer advantages for some reporting, but you could see a performance hit during inserts when the cluster needs some minor reorganizing.


Good Luck,
Kent
 
LVL 13

Assisted Solution

by:Ryan
Ryan earned 167 total points
ID: 38744320
I'm not sure what language you're using, but an example in VB.NET would be
' Requires Imports System.Data.SqlClient
Dim retVal As Object    ' first column of the first row returned

Using cmd As New SqlCommand()
    ' @ID is a named parameter; its value is bound below rather than
    ' concatenated into the SQL string
    cmd.CommandText = "SELECT * FROM tblTest WHERE ID=@ID"
    cmd.Connection = cnn    ' cnn is an already-open SqlConnection
    cmd.Parameters.Add(New SqlParameter("@ID", myID))

    retVal = cmd.ExecuteScalar()
    cmd.Connection.Close()
End Using

Return retVal



Notice how I don't have to cast my parameter's type in the query. It's handled automatically based on the type of the variable myID.


It does sound like converting those date fields to an integer field would be a huge gain for you.  You just need to change all the calls that reference that field to convert to whatever format you're using, whether it's Int(datevalue.Format("YYYYMM")) or something else.  I would suggest using "YYYYMMDD" just because an integer can handle it (it's no more overhead), and you won't kick yourself later if you want the granularity of a day.  If there's any chance you'll want finer than that, you could do "YYMMDDHHmm", and you'd be OK with year ranges from 2000 to 2021.  The max Int value is 2147483647 (2^31 - 1).
