SQL Server 2005 - Fuzzy Grouping Task in SSIS causing TempDB to grow to ridiculously large sizes... need help

First off, I'm a SQL Server newbie and I may have gotten in over my head a little here, but you gotta learn somehow.

I have read everything I can get my hands on about the new Fuzzy Grouping feature in SSIS and I have created a package that looks for duplicates in one of my DB tables.  The table has 6 fields and about half a million rows.  I need the package to use Fuzzy Grouping to look for "near duplicates" in the table and copy the "duplicates" to another table where they can be reviewed and eventually have the IDs resolved so that only one entry for each actual "entity" exists in the table.

The package I created works great in my test environment (much smaller table), but when it is run on the production server with the large table, the package takes almost a day to run, and the last time I ran it the combined size of the TempDB files was several hundred gigabytes!

I read on MSDN that the size of TempDB can become "quite large", but that's about as descriptive as they get.  I'm sure there is some basic step that I am missing that will keep the size of TempDB from growing out of control, but like I said, I'm new at this stuff, and I may have tried to "run before I really knew how to walk", so to speak.  Regardless, I need to make this work somehow and if anyone can offer some advice I would greatly appreciate it.

Thanks.
naj2576 asked:

Vadim Rapp commented:
I'm sure you are not missing anything. From what BOL says about fuzzy grouping, and from how often disk space is mentioned, including the recommendation not to run it on a production server (obviously because it can eat up all the disk space), it's clear that what you saw is typical. Remember, it's a brand new feature, so it's not very surprising.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.