Applying Indexes to a table

apresto (Italy) asked:

Hi All.

I have written an application that reads a number of CSV files and inserts the data from them into a table (roughly 60 columns). In total there will eventually be 46 million records.

Now, I want to apply a clustered index to an int field, but I don't know whether to apply it to the empty shell pre-insert, or to apply it in bulk once the insertions have finished.

I know it will take a long time either way: if I do it during insertion, the insert processes will take longer to finish; if I do it at the end, it will take about 10 hours to index. I just want to make sure the integrity of the table remains 100%. I'm worried that if the indexes are written during insertion, something might go wrong (paranoia).

Any suggestions?
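(For context, each per-file load looks something like the minimal sketch below; dbo.BigTable_Staging, the file path, and the options are illustrative assumptions, not the actual application code:)

-- Load one CSV into a staging heap; repeat per file.
BULK INSERT dbo.BigTable_Staging
FROM 'C:\imports\file_001.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    FIRSTROW = 2,   -- skip the CSV header row
    TABLOCK         -- allows minimal logging when the target is a heap
);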
ASKER CERTIFIED SOLUTION
fanopoe (United States of America)
SOLUTION
lahousden (United States of America)
apresto (ASKER) replied:
Thanks for the responses, guys. I have about 500 GB to play with, so I'm not too worried about space; I'm just worried about integrity.

@fanopoe

What do you mean when you say integrity checks? Will they take a long time? What do they do?

Do you think it is wisest to insert into an empty shell table and then move the data to the indexed table, given that the data is coming from CSV?

@lahousden

I was thinking about doing it upfront so that when the load is done, the table is simply ready to use. I haven't applied the non-clustered indexes yet; I was going to do that later. I was worried about the clustered index, as it takes by far the longest. A non-clustered index takes between 3 and 5 hours with this volume of data.
lahousden replied:

If space is not an issue, then applying the index afterwards will almost certainly deliver a more compact organisation of your clustered index, with some small performance advantages (at least until you start adding new rows that require page splits at the leaf level).
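A minimal sketch of that load-then-index order (the table and column names are hypothetical, and SORT_IN_TEMPDB is an optional assumption to keep the sort work out of the data file):

-- Build the clustered index only after the bulk load has finished.
-- dbo.BigTable and RecordId are placeholder names, not from the thread.
CREATE CLUSTERED INDEX CX_BigTable_RecordId
    ON dbo.BigTable (RecordId)
    WITH (SORT_IN_TEMPDB = ON);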
fanopoe replied:

When I say integrity checks, I mean do a manual check for any anomalies that would blow up your indexes; i.e., if you're going to put a PK on a column (or columns), run a query to test for dupes/nulls. Any other indexes that could fail should be manually checked too. Then you can either delete the offending data, or filter it before putting it into your indexed table.
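As a concrete example of such a check, assuming a hypothetical staging table dbo.BigTable_Staging with a candidate key column RecordId:

-- Rows that would break a PK or unique index: NULL keys...
SELECT COUNT(*) AS NullKeys
FROM dbo.BigTable_Staging
WHERE RecordId IS NULL;

-- ...and duplicate keys.
SELECT RecordId, COUNT(*) AS Occurrences
FROM dbo.BigTable_Staging
GROUP BY RecordId
HAVING COUNT(*) > 1;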