How do I figure out the most efficient SQL query for a table with a clustered composite primary key?
Posted on 2012-04-04
I'm taking a class in SQL and am a relative newbie at constructing high-performing SQL queries against large data sets. I'm currently looking at a problem where I'm having difficulty figuring out the best approach/query.
Consider the following:
There is a table defined as:
CREATE TABLE [Foo](
[load_id] [int] NOT NULL,
[master_id] [varchar](20) NOT NULL,
[group_id] [varchar](20) NOT NULL,
[section_id] [varchar](20) NOT NULL,
[datetime_stamp] [datetime] NULL,
[... other data columns ...]
CONSTRAINT [pk_Foo] PRIMARY KEY CLUSTERED
( [load_id] ASC,
This is a big table with N million records
The table gets appended with new data multiple times a day.
Every time new records get appended to the table, approximately 32K records are added.
All 32K records have the same load_id.
The load_id is incremented by one each time we load a batch of 32K records
(the first 32K entries have load_id=1, the next 32K have load_id=2, etc.).
The datetime_stamp field shows the time at which the entries
were loaded and is the same for all 32K entries in a single load.
Q: What's the most efficient way to return the first load of Foos for the current day?
Today, records were loaded into this table at 9am, 12pm and 3pm.
At 5pm today we want to know what foos were loaded at 9am since that is the first load that occurred today.
For any given day, there can be a different number of loads, and the times at which the loads occur will vary.
So what I think I want to do here is query for the requested day and return all the records from that day's first load, i.e. the lowest load_id for that day. I think the key to a good solution lies in the order in which the conditions appear in the query's WHERE clause.
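As a rough sketch of that idea (not a confirmed answer), the query could look something like the following. It assumes SQL Server (T-SQL), reads "first load" as the lowest load_id whose datetime_stamp falls on the requested day, and uses two illustrative variables, @day_start and @day_end, that are not part of the original problem:

```sql
-- Sketch only: assumes SQL Server 2008 or later (DECLARE with an initializer).
-- @day_start and @day_end are illustrative names, not part of the assignment.
DECLARE @day_start datetime = DATEADD(day, DATEDIFF(day, 0, GETDATE()), 0); -- midnight today
DECLARE @day_end   datetime = DATEADD(day, 1, @day_start);                  -- midnight tomorrow

SELECT f.*
FROM [Foo] AS f
WHERE f.[load_id] = (
    -- Lowest load_id stamped within the requested day = that day's first load.
    -- A half-open range on the bare column avoids wrapping datetime_stamp
    -- in a function.
    SELECT MIN(x.[load_id])
    FROM [Foo] AS x
    WHERE x.[datetime_stamp] >= @day_start
      AND x.[datetime_stamp] <  @day_end
);
```

The outer filter on load_id can use the clustered key, because load_id is its leading column. The inner lookup on datetime_stamp is the part that would still have to scan the table if that column has no index.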
I'm a little confused about the effects of a clustered composite key and how it would affect the performance of the query.
datetime_stamp is not indexed, so I believe filtering on it will probably cause a full table scan (??)
If it wasn't a clustered index, I'd probably search on load_id first, followed by the date.
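If changing the schema is an option, one way the scan on datetime_stamp might be avoided is a nonclustered index on that column. This is a minimal sketch, assuming SQL Server; the index name ix_Foo_datetime_stamp is made up for illustration:

```sql
-- Hypothetical index name; assumes we are allowed to add an index to [Foo].
-- In SQL Server, a nonclustered index on a table that has a clustered index
-- carries the clustering key columns (load_id among them) in every index row,
-- so a query that needs only datetime_stamp and load_id can be answered
-- from this index without touching the base table.
CREATE NONCLUSTERED INDEX [ix_Foo_datetime_stamp]
    ON [Foo] ([datetime_stamp] ASC);
```

With such an index, finding the lowest load_id for a date range could be a range seek on the index rather than a scan of the whole table. The trade-off is extra storage and some added cost on every batch append.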
There's something that I'm missing here. I've been scratching my head for a while.
What is the most efficient query here?
Thanks in advance,