Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

Optimizing a Simple Select

Posted on 2013-01-03
10
Medium Priority
?
417 Views
Last Modified: 2013-01-24
I have a very simple select statement:

SELECT COUNT(*) as varTotal
FROM tbl_stat_master
WHERE stat_account = 173
AND stat_created 
BETWEEN '2012-11-01 00:00:00' 
AND '2012-12-01 00:00:00'

Open in new window


It's running on a table that has 36 Million records. I'd like to it be faster.

I have two indexes setup, one on stat_account and one on stat_created.

So first of all, is there something I should do to make the query faster? Like maybe COUNT(*) isn't a good idea, or there something better than BETWEEN?

And secondly, should I change my indexes? Make a combo index that includes account and created?

Or maybe you have another idea?

Thank you!
0
Comment
Question by:cnxmax
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
  • 3
  • 2
  • +2
10 Comments
 
LVL 46

Assisted Solution

by:Kent Olsen
Kent Olsen earned 668 total points
ID: 38741631
Hi cnx,

That query is very reasonable, and a very popular kind of query in the real world.  :)

The BETWEEN clause is just a shorthand way of writing the condition.  SQL actually translates that to "WHERE (A <= value1 AND A >= value2).

A composite index over stat_account and stat_created will probably help, depending on the database design.  It can't hurt to create that index and try it.


Good Luck,
Kent
0
 
LVL 13

Assisted Solution

by:Ryan
Ryan earned 668 total points
ID: 38741636
Consider converting DateTime to integer, or double (days since 1/1/1900 for example). It may not make a difference, just thinking aloud.

Also make sure indices aren't fragmented.
0
 
LVL 9

Assisted Solution

by:sognoct
sognoct earned 444 total points
ID: 38742026
create clustered index with stat_created and also stat_account

Create the clustered index before creating any nonclustered indexes.

I think that using <= and => is better than between (no need for engine to translate)

as much as I know count(*), count(0), count(1) have the same processing speed ... so use count(*)

if stat_created is a datetime ... you must convert the value of the where in datetime format with a convert, or system will have to convert the values from the table in the same format (slower)
0
Application Discovery Service in AWS

In the era of the cloud, customers migrating away from their existing on-premise infrastructure. This requires lots of planning, strategies, and effort to identify their existing resources and determine how best to migrate.  Datacenter migrations happen in four phases -

 
LVL 13

Assisted Solution

by:Ryan
Ryan earned 668 total points
ID: 38742151
If you're calling this query multiple times with different criteria, then I think the engine can optimize the repeat calls if you use @parameters rather than a string literal.  It'll also meet the suggestion by sognoct for the type casting.

I think those are imperceptible differences in this case though.
0
 
LVL 46

Assisted Solution

by:Kent Olsen
Kent Olsen earned 668 total points
ID: 38742580
Ok.  Before this train gets too far off the track.

Do NOT create a clustered index.  Yet.  That will relocate all 35,000,000 rows in a way that may make this query faster, but may also have a negative impact on other queries.  You need to make that decision before you go down this path.

There is NO difference in using BETWEEN or the <= and >= syntax.  They evaluate into exactly the same execution.  Use the one that makes the most sense to your SQL.


The other posters are correct about the dates though.  You might want to add a column with a date in numeric form.

Kent
0
 
LVL 9

Assisted Solution

by:sognoct
sognoct earned 444 total points
ID: 38743073
Kdo is right

can try it on a db copy and evalutate if there is a performance issue to other queries ...
it depends on how much delete and insert are done on this table, I guess this is a statistic table in which stat_account is the id of the stat creator and stat_created is when it was created  so maybe there are no DELETE queries if there are 36 Million records. If I'm right you can (in the debug db) try to remove current indexes and add the clustered couple ...

For getting better performance INT can give half time in SELECT, you can watch differences in this article:
INT vs DATETIME vs TIMESTAMP

For the BETWEEN is true performance are the same (can evaluate with "explain" command)
0
 
LVL 32

Assisted Solution

by:awking00
awking00 earned 220 total points
ID: 38744101
Also, be sure to gather statistics on your table and indexes.
0
 

Author Comment

by:cnxmax
ID: 38744215
Thank you for all your responses! I have a lot of good information to go off of.

To clarify a few things:

We never do deletes.

We insert about 1000/records per minute during peak times and average about 100/minute over all.

We're considering purging old data (might be a good separate question on the most efficient way to delete records older than x days on a regular basis).




A few questions...

To type cast, is this the best way:

CONVERT('2012-11-01 00:00:00' , datetime)

Open in new window




Also, if I transform the dates to integers for existing and new incoming records, what would be the best way to do: '2012-11-01 12:59:02'

I'm assuming since I only want month granularity I can just use:

201211

And put an index on that column? That seems to me like it would probably lead to a big performance boost.



@MrBullwinkle-- you said "I think the engine can optimize the repeat calls if you use @parameters rather than a string literal". I'm not sure what @parameters are?



Thanks again for all your help!
0
 
LVL 46

Accepted Solution

by:
Kent Olsen earned 668 total points
ID: 38744276
Hi Max,

If it were my application, I'd store the integer date as yyyymmdd (year * 10000 + month * 100 + day).  You may only need month granularity now, but having the full date gives you more options, particularly for items where you need to search date windows.

And integer operations (compare, join, calculate) are the fastest of all operations.  Your queries will be noticeably faster!

And since this is a transaction logging system, don't use a clustered index.  It may offer advantages for some reporting, but you could see a performance hit during inserts when the cluster needs some minor reorganizing.


Good Luck,
Kent
0
 
LVL 13

Assisted Solution

by:Ryan
Ryan earned 668 total points
ID: 38744320
I'm not sure what language you're using, but an example in VB.NET would be
			Using cmd As New SqlCommand()
				cmd.CommandText = "SELECT * FROM tblTest WHERE ID=@ID"
				cmd.Connection=cnn
				cmd.Parameters.Add(New SqlParameter("@ID", myID))
				
				retVal=cmd.ExecuteScalar()
				cmd.Connection.Close()
			End Using
		return retVal

Open in new window


Notice how I don't have to cast my parameters type to the query. It does it automatically based on the type of the variable myID.


It does sound like converting those date fields to an integer field would be a huge gain for you.  You just need to change all the calls that reference that to convert to whatever you're doing.  If its Int(datevalue.Format("YYYYMM")) or something else.  I would suggest using "YYYYMMDD" just because integer can handle it (its no more overhead), and you won't kick yourself later if you want the granularity of a day.  If there's any chance you'll want less than that, you could do   "YYMMDDHHmm", and you'd be ok with year ranges from 2000 to 2021.  Max Int value is 2147483647 (2^(8*4-1))
0

Featured Post

Survive A High-Traffic Event with Percona

Your application or website rely on your database to deliver information about products and services to your customers. You can’t afford to have your database lose performance, lose availability or become unresponsive – even for just a few minutes.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Microsoft Access is a place to store data within tables and represent this stored data using multiple database objects such as in form of macros, forms, reports, etc. After a MS Access database is created there is need to improve the performance and…
Backups and Disaster RecoveryIn this post, we’ll look at strategies for backups and disaster recovery.
In this video, Percona Solution Engineer Rick Golba discuss how (and why) you implement high availability in a database environment. To discuss how Percona Consulting can help with your design and architecture needs for your database and infrastr…
In this video, Percona Solutions Engineer Barrett Chambers discusses some of the basic syntax differences between MySQL and MongoDB. To learn more check out our webinar on MongoDB administration for MySQL DBA: https://www.percona.com/resources/we…

670 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question