Link to home
Start Free TrialLog in
Avatar of davidcahan
davidcahanFlag for United States of America

asked on

Index Creation on Table with 30 Million Rows

Have a table with approx. 30 million rows in it.  It has 4 columns (3 int's, one date) plus a PK.  I need to be able to join on the 4 columns when doing an insert to make sure the row isn't already there.  I also need to be able to group on any combination of the 4, sort on any one and possible group by 2 1 or more.

I originally had an index on each column but the index's alone were eating up 32GB of space.  I switched to an additional index on the PK with the 4 columns included and the index size dropped 12GB but I've noticed a significant decrease in select performance.  Now I'm thinking of creating an index using all 4 columns (as opposed the index of the PK with the 4 columns included, I hope that makes sense).  But I've always thought that the only way that will give me good performance is if I'm using all 4 columns to join, sort, group, etc.

Am I basically stuck with a choice between performance and index size?  that if I want good performance I need to create a sep. index on each and just eat the disk space?
Avatar of Anthony Perkins
Anthony Perkins
Flag of United States of America image

>>but I've noticed a significant decrease in select performance. <<
Have you considered updating stats?  
Have you checked to see if the indexes need to be defragged?
ASKER CERTIFIED SOLUTION
Avatar of knightEknight
knightEknight
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of davidcahan

ASKER

Will creating on all 4 adversely affect performance when I only end up using one in some of my where clauses?
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Of course, if you run a query with a WHERE clause that has a single column, and that column is not indexed, that query will be slow.
I'm guessing that the most selectable column in my case would be DateTime.  We keep track of each and everytime a file is played and in a day each file can play over 700 times.  If a unit has total files that play and each one plays 700 times that's a lot of different datetime's.

Wouldn't you agree?
sorry, that should have said "if a unit has 20 total files that play...".  

also, let me show you some common statements I'm writing against the table, maybe that will help. (i'm not writing syntactically correct SQL.  Just trying to give you a general idea)

Select count(*) from table
group by ContentID,UnitID,ZoneID,Date
having count(*) > 1

Insert into table
Select a,b,c,d from importtable left join table on a = contentid and b = unitid and c = zoneid and d=date
where table.pk is null

select unitname, contentname, date
      from table with(nolock)
      inner join units
      on table.unitid =  um_id
      inner join content
      on table.contentid = cm_id
      where table.date between '2010-01-01' and '2010-12-31'
      and accountid = 174

Besides selectability you must first consider usage.  If the datetime column is always or frequently used in your WHERE clauses then it is a good candidate for your clustered index.  The only caution I would add (if this is the case) is that if you are adding millions of new rows per day to this table, the inserts will get slower over time, even though the selects will still perform well (again, assuming the datetime column is in the WHERE clauses).
Do indexes come into play with JOIN's?
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial