Optimizing a Simple Select

I have a very simple select statement:

SELECT COUNT(*) as varTotal
FROM tbl_stat_master
WHERE stat_account = 173
AND stat_created 
BETWEEN '2012-11-01 00:00:00' 
AND '2012-12-01 00:00:00'

Open in new window

It's running on a table that has 36 Million records. I'd like to it be faster.

I have two indexes setup, one on stat_account and one on stat_created.

So first of all, is there something I should do to make the query faster? Like maybe COUNT(*) isn't a good idea, or there something better than BETWEEN?

And secondly, should I change my indexes? Make a combo index that includes account and created?

Or maybe you have another idea?

Thank you!
Who is Participating?

Improve company productivity with a Business Account.Sign Up

Kent OlsenConnect With a Mentor Data Warehouse Architect / DBACommented:
Hi Max,

If it were my application, I'd store the integer date as yyyymmdd (year * 10000 + month * 100 + day).  You may only need month granularity now, but having the full date gives you more options, particularly for items where you need to search date windows.

And integer operations (compare, join, calculate) are the fastest of all operations.  Your queries will be noticeably faster!

And since this is a transaction logging system, don't use a clustered index.  It may offer advantages for some reporting, but you could see a performance hit during inserts when the cluster needs some minor reorganizing.

Good Luck,
Kent OlsenConnect With a Mentor Data Warehouse Architect / DBACommented:
Hi cnx,

That query is very reasonable, and a very popular kind of query in the real world.  :)

The BETWEEN clause is just a shorthand way of writing the condition.  SQL actually translates that to "WHERE (A <= value1 AND A >= value2).

A composite index over stat_account and stat_created will probably help, depending on the database design.  It can't hurt to create that index and try it.

Good Luck,
RyanConnect With a Mentor Project Engineer, ElectricalCommented:
Consider converting DateTime to integer, or double (days since 1/1/1900 for example). It may not make a difference, just thinking aloud.

Also make sure indices aren't fragmented.
Build your data science skills into a career

Are you ready to take your data science career to the next step, or break into data science? With Springboard’s Data Science Career Track, you’ll master data science topics, have personalized career guidance, weekly calls with a data science expert, and a job guarantee.

sognoctConnect With a Mentor Commented:
create clustered index with stat_created and also stat_account

Create the clustered index before creating any nonclustered indexes.

I think that using <= and => is better than between (no need for engine to translate)

as much as I know count(*), count(0), count(1) have the same processing speed ... so use count(*)

if stat_created is a datetime ... you must convert the value of the where in datetime format with a convert, or system will have to convert the values from the table in the same format (slower)
RyanConnect With a Mentor Project Engineer, ElectricalCommented:
If you're calling this query multiple times with different criteria, then I think the engine can optimize the repeat calls if you use @parameters rather than a string literal.  It'll also meet the suggestion by sognoct for the type casting.

I think those are imperceptible differences in this case though.
Kent OlsenConnect With a Mentor Data Warehouse Architect / DBACommented:
Ok.  Before this train gets too far off the track.

Do NOT create a clustered index.  Yet.  That will relocate all 35,000,000 rows in a way that may make this query faster, but may also have a negative impact on other queries.  You need to make that decision before you go down this path.

There is NO difference in using BETWEEN or the <= and >= syntax.  They evaluate into exactly the same execution.  Use the one that makes the most sense to your SQL.

The other posters are correct about the dates though.  You might want to add a column with a date in numeric form.

sognoctConnect With a Mentor Commented:
Kdo is right

can try it on a db copy and evalutate if there is a performance issue to other queries ...
it depends on how much delete and insert are done on this table, I guess this is a statistic table in which stat_account is the id of the stat creator and stat_created is when it was created  so maybe there are no DELETE queries if there are 36 Million records. If I'm right you can (in the debug db) try to remove current indexes and add the clustered couple ...

For getting better performance INT can give half time in SELECT, you can watch differences in this article:

For the BETWEEN is true performance are the same (can evaluate with "explain" command)
awking00Connect With a Mentor Commented:
Also, be sure to gather statistics on your table and indexes.
cnxmaxAuthor Commented:
Thank you for all your responses! I have a lot of good information to go off of.

To clarify a few things:

We never do deletes.

We insert about 1000/records per minute during peak times and average about 100/minute over all.

We're considering purging old data (might be a good separate question on the most efficient way to delete records older than x days on a regular basis).

A few questions...

To type cast, is this the best way:

CONVERT('2012-11-01 00:00:00' , datetime)

Open in new window

Also, if I transform the dates to integers for existing and new incoming records, what would be the best way to do: '2012-11-01 12:59:02'

I'm assuming since I only want month granularity I can just use:


And put an index on that column? That seems to me like it would probably lead to a big performance boost.

@MrBullwinkle-- you said "I think the engine can optimize the repeat calls if you use @parameters rather than a string literal". I'm not sure what @parameters are?

Thanks again for all your help!
RyanConnect With a Mentor Project Engineer, ElectricalCommented:
I'm not sure what language you're using, but an example in VB.NET would be
			Using cmd As New SqlCommand()
				cmd.CommandText = "SELECT * FROM tblTest WHERE ID=@ID"
				cmd.Parameters.Add(New SqlParameter("@ID", myID))
			End Using
		return retVal

Open in new window

Notice how I don't have to cast my parameters type to the query. It does it automatically based on the type of the variable myID.

It does sound like converting those date fields to an integer field would be a huge gain for you.  You just need to change all the calls that reference that to convert to whatever you're doing.  If its Int(datevalue.Format("YYYYMM")) or something else.  I would suggest using "YYYYMMDD" just because integer can handle it (its no more overhead), and you won't kick yourself later if you want the granularity of a day.  If there's any chance you'll want less than that, you could do   "YYMMDDHHmm", and you'd be ok with year ranges from 2000 to 2021.  Max Int value is 2147483647 (2^(8*4-1))
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.