Posted on 2014-03-11
Last Modified: 2014-04-01
I have the following table

CompanyId  int(2)          Primary key
CustomerId char(20)     Primary key
SaleDate        DateTime
SalesAmount   Decimal(12,2)
SalesStatus      char(1)

This table has more than 8989998788 records

I am using the below where condition to fetch the  records

select customerId , SaleDate,SalesAmount from  Customer_sales
where CustomerId >'12345'
and CustomerId <'879999'

Should I have to include the CompanyId   in the where condition since it is a composite primary key ?   Will this improve the query performance ?

And,  Does where clause in the same order how the composite primary key builded ?
 (ie)  where CompanyId =12 and CustomerId >'12345'
Question by:Varshini S
LVL 39

Assisted Solution

lcohan earned 142 total points
ID: 39921694
"Should I have to include the CompanyId   in the where condition since it is a composite primary key ?   Will this improve the query performance ?"

Only if you put the CompanyId first in the where clause like you did above.
CompanyId =12 and CustomerId >'12345'

where CompanyId =12 and CustomerId between '12345' and '879999'

As much costly as it may look I would also try to add an index on CustomerId then try the
just the where CustomerId between '12345' and '879999' to see the difference.
LVL 21

Assisted Solution

by:Dale Burrell
Dale Burrell earned 71 total points
ID: 39921698
Whether you include the CompanyId depends on the requirements for the data you are trying to obtain.

It may improve performance, anything the reduces the number of records returned will usually be quicker. But if the query is running slow a good index will probably do more for you.

There is no order in a where clause - all parts are equal and will be executed in whatever order SQL Server sees fit.
LVL 40

Assisted Solution

by:Kyle Abrahams
Kyle Abrahams earned 71 total points
ID: 39921699

You need to include both columns unless customerId is first in the key.

The actual order (companyID = 12 and customer >'12345') vs (customer>'12345' and companyID = 12) shouldn't matter.
Use Case: Protecting a Hybrid Cloud Infrastructure

Microsoft Azure is rapidly becoming the norm in dynamic IT environments. This document describes the challenges that organizations face when protecting data in a hybrid cloud IT environment and presents a use case to demonstrate how Acronis Backup protects all data.

LVL 13

Expert Comment

ID: 39921705
If you have queries that do not use the CompanyId but only CustomerId, consider a normal non-clustered index on just CustomerId. The system is able to figure it out if you rearrange the order in the WHERE clause if you do want to include CompanyId.

You have a bizarrely high number of records.  I assume 9 trillion is an exaggeration.

If you have an excessive number of records, like a few hundred million or more, consider looking into partitioning the table on CompanyId.

If there are a lot of inserts and updates, be sure to schedule a regular check of the index and rebuild as needed.
LVL 39

Assisted Solution

lcohan earned 142 total points
ID: 39921709
"There is no order in a where clause - all parts are equal and will be executed in whatever order SQL Server sees fit. "

I found that In SQL (unfortunately) comparing to other databases the order of columns in WHERE must match the INDEX columns order for best performance.

Also I see you use the quotes around the Customer - isn't it that a INT type column rather than text?

Auch... I just noticed that CustomerId is char(20) in PK and that's definitely not helping in this case if you can add the CompanyId prior to Customer that should definitely help.

I bet if you check the query plan with that structure and have the single quotes around the customer id as a number you will see implicit cast that SQL has to do to match the data type.
LVL 13

Assisted Solution

magarity earned 72 total points
ID: 39921741
In the links you've given, the first one is a guy who says he's heard people claim order matters but he's unable to reproduce it or notice that himself. The second is a forum with a lot of people saying that's been hashed over already and proven not the case. The third link is about Oracle.
CustomerId is listed in the table definition in the original question as a character string 20 so it needs to be in quotes.
LVL 69

Assisted Solution

by:Scott Pletcher
Scott Pletcher earned 72 total points
ID: 39921846
For an index lookup to be effective, you must include the first column in the index.  Other index columns, in order, should follow whenever available.

For your specific example, if the clustering key is:
( CompanyId, CustomerId )
then you must provide the CompanyId, or your query will have to scan the entire table (unless an appropriate nonclustered index is available, of course).

As an analogy, think of a phone book, which is effectively clustered on ( last_name, first_name ).  If I provide you a last name, you can do an effective lookup.  But you can't quickly tell me everyone with a first name of John.  An RDBMS (SQL / Oracle / whatever) index is roughly the same restriction.
LVL 48

Accepted Solution

PortletPaul earned 72 total points
ID: 39922036
>>"The third link is about Oracle."

I see MySQL, Oracle, PostgreSQL, and SQL Server provided in that reference (i.e. the discussion is generic about index design - not queries) and it contains the following (emphasis added):
Full Table Scan

The database does not use the index because it cannot use single columns from a concatenated index arbitrarily. A closer look at the index structure makes this clear.


The ordering of a two-column index is therefore like the ordering of a telephone directory: it is first sorted by surname, then by first name. That means that a two-column index does not support searching on the second column alone; that would be like searching a telephone directory by first name.
It then goes on to make this assertion:
Even though the two-index solution delivers very good select performance as well, the single-index solution is preferable. It not only saves storage space, but also the maintenance overhead for the second index. The fewer indexes a table has, the better the insert, delete and update performance.

So that article is arguing that you wouldn't need both employees_pk and subsidiary_id in an index and suggests singular field indexes would be preferred. However this is not relevant for this question as CompanyId, CustomerId form the primary key and (in SQL Server) they will form the clustered index, and hence both fields should be used in the where clause to take advantage of that index.  If only wanting to filter by CustomerId then an additional index is needed as already discussed by others.

The article does not appear to make any claim as the sequence required within a where clause.

Featured Post

Migrating Your Company's PCs

To keep pace with competitors, businesses must keep employees productive, and that means providing them with the latest technology. This document provides the tips and tricks you need to help you migrate an outdated PC fleet to new desktops, laptops, and tablets.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Load balancing is the method of dividing the total amount of work performed by one computer between two or more computers. Its aim is to get more work done in the same amount of time, ensuring that all the users get served faster.
I have a large data set and a SSIS package. How can I load this file in multi threading?
Via a live example, show how to extract insert data into a SQL Server database table using the Import/Export option and Bulk Insert.
Via a live example, show how to set up a backup for SQL Server using a Maintenance Plan and how to schedule the job into SQL Server Agent.

809 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question