Solved

Optimizing a Simple Select

Posted on 2013-01-03
10
404 Views
Last Modified: 2013-01-24
I have a very simple select statement:

SELECT COUNT(*) as varTotal
FROM tbl_stat_master
WHERE stat_account = 173
AND stat_created 
BETWEEN '2012-11-01 00:00:00' 
AND '2012-12-01 00:00:00'

Open in new window


It's running on a table that has 36 Million records. I'd like to it be faster.

I have two indexes setup, one on stat_account and one on stat_created.

So first of all, is there something I should do to make the query faster? Like maybe COUNT(*) isn't a good idea, or there something better than BETWEEN?

And secondly, should I change my indexes? Make a combo index that includes account and created?

Or maybe you have another idea?

Thank you!
0
Comment
Question by:cnxmax
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
  • 3
  • 2
  • +2
10 Comments
 
LVL 45

Assisted Solution

by:Kent Olsen
Kent Olsen earned 167 total points
ID: 38741631
Hi cnx,

That query is very reasonable, and a very popular kind of query in the real world.  :)

The BETWEEN clause is just a shorthand way of writing the condition.  SQL actually translates that to "WHERE (A <= value1 AND A >= value2).

A composite index over stat_account and stat_created will probably help, depending on the database design.  It can't hurt to create that index and try it.


Good Luck,
Kent
0
 
LVL 13

Assisted Solution

by:Ryan
Ryan earned 167 total points
ID: 38741636
Consider converting DateTime to integer, or double (days since 1/1/1900 for example). It may not make a difference, just thinking aloud.

Also make sure indices aren't fragmented.
0
 
LVL 9

Assisted Solution

by:sognoct
sognoct earned 111 total points
ID: 38742026
create clustered index with stat_created and also stat_account

Create the clustered index before creating any nonclustered indexes.

I think that using <= and => is better than between (no need for engine to translate)

as much as I know count(*), count(0), count(1) have the same processing speed ... so use count(*)

if stat_created is a datetime ... you must convert the value of the where in datetime format with a convert, or system will have to convert the values from the table in the same format (slower)
0
Get Database Help Now w/ Support & Database Audit

Keeping your database environment tuned, optimized and high-performance is key to achieving business goals. If your database goes down, so does your business. Percona experts have a long history of helping enterprises ensure their databases are running smoothly.

 
LVL 13

Assisted Solution

by:Ryan
Ryan earned 167 total points
ID: 38742151
If you're calling this query multiple times with different criteria, then I think the engine can optimize the repeat calls if you use @parameters rather than a string literal.  It'll also meet the suggestion by sognoct for the type casting.

I think those are imperceptible differences in this case though.
0
 
LVL 45

Assisted Solution

by:Kent Olsen
Kent Olsen earned 167 total points
ID: 38742580
Ok.  Before this train gets too far off the track.

Do NOT create a clustered index.  Yet.  That will relocate all 35,000,000 rows in a way that may make this query faster, but may also have a negative impact on other queries.  You need to make that decision before you go down this path.

There is NO difference in using BETWEEN or the <= and >= syntax.  They evaluate into exactly the same execution.  Use the one that makes the most sense to your SQL.


The other posters are correct about the dates though.  You might want to add a column with a date in numeric form.

Kent
0
 
LVL 9

Assisted Solution

by:sognoct
sognoct earned 111 total points
ID: 38743073
Kdo is right

can try it on a db copy and evalutate if there is a performance issue to other queries ...
it depends on how much delete and insert are done on this table, I guess this is a statistic table in which stat_account is the id of the stat creator and stat_created is when it was created  so maybe there are no DELETE queries if there are 36 Million records. If I'm right you can (in the debug db) try to remove current indexes and add the clustered couple ...

For getting better performance INT can give half time in SELECT, you can watch differences in this article:
INT vs DATETIME vs TIMESTAMP

For the BETWEEN is true performance are the same (can evaluate with "explain" command)
0
 
LVL 32

Assisted Solution

by:awking00
awking00 earned 55 total points
ID: 38744101
Also, be sure to gather statistics on your table and indexes.
0
 

Author Comment

by:cnxmax
ID: 38744215
Thank you for all your responses! I have a lot of good information to go off of.

To clarify a few things:

We never do deletes.

We insert about 1000/records per minute during peak times and average about 100/minute over all.

We're considering purging old data (might be a good separate question on the most efficient way to delete records older than x days on a regular basis).




A few questions...

To type cast, is this the best way:

CONVERT('2012-11-01 00:00:00' , datetime)

Open in new window




Also, if I transform the dates to integers for existing and new incoming records, what would be the best way to do: '2012-11-01 12:59:02'

I'm assuming since I only want month granularity I can just use:

201211

And put an index on that column? That seems to me like it would probably lead to a big performance boost.



@MrBullwinkle-- you said "I think the engine can optimize the repeat calls if you use @parameters rather than a string literal". I'm not sure what @parameters are?



Thanks again for all your help!
0
 
LVL 45

Accepted Solution

by:
Kent Olsen earned 167 total points
ID: 38744276
Hi Max,

If it were my application, I'd store the integer date as yyyymmdd (year * 10000 + month * 100 + day).  You may only need month granularity now, but having the full date gives you more options, particularly for items where you need to search date windows.

And integer operations (compare, join, calculate) are the fastest of all operations.  Your queries will be noticeably faster!

And since this is a transaction logging system, don't use a clustered index.  It may offer advantages for some reporting, but you could see a performance hit during inserts when the cluster needs some minor reorganizing.


Good Luck,
Kent
0
 
LVL 13

Assisted Solution

by:Ryan
Ryan earned 167 total points
ID: 38744320
I'm not sure what language you're using, but an example in VB.NET would be
			Using cmd As New SqlCommand()
				cmd.CommandText = "SELECT * FROM tblTest WHERE ID=@ID"
				cmd.Connection=cnn
				cmd.Parameters.Add(New SqlParameter("@ID", myID))
				
				retVal=cmd.ExecuteScalar()
				cmd.Connection.Close()
			End Using
		return retVal

Open in new window


Notice how I don't have to cast my parameters type to the query. It does it automatically based on the type of the variable myID.


It does sound like converting those date fields to an integer field would be a huge gain for you.  You just need to change all the calls that reference that to convert to whatever you're doing.  If its Int(datevalue.Format("YYYYMM")) or something else.  I would suggest using "YYYYMMDD" just because integer can handle it (its no more overhead), and you won't kick yourself later if you want the granularity of a day.  If there's any chance you'll want less than that, you could do   "YYMMDDHHmm", and you'd be ok with year ranges from 2000 to 2021.  Max Int value is 2147483647 (2^(8*4-1))
0

Featured Post

Ransomware-A Revenue Bonanza for Service Providers

Ransomware – malware that gets on your customers’ computers, encrypts their data, and extorts a hefty ransom for the decryption keys – is a surging new threat.  The purpose of this eBook is to educate the reader about ransomware attacks.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Shadow IT is coming out of the shadows as more businesses are choosing cloud-based applications. It is now a multi-cloud world for most organizations. Simultaneously, most businesses have yet to consolidate with one cloud provider or define an offic…
Password hashing is better than message digests or encryption, and you should be using it instead of message digests or encryption.  Find out why and how in this article, which supplements the original article on PHP Client Registration, Login, Logo…
Video by: Steve
Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…

710 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question