SQL Server 2005 - Fuzzy Grouping Task in SSIS causing TempDB to grow to ridiculously large sizes... need help

Posted on 2006-03-27
Last Modified: 2006-11-18

First off, I'm a SQL Server newbie, and I may have gotten in over my head a little here, but you gotta learn somehow.

I have read everything I can get my hands on about the new Fuzzy Grouping feature in SSIS, and I have created a package that looks for duplicates in one of my DB tables.  The table has 6 fields and about half a million rows.  I need the package to use Fuzzy Grouping to look for "near duplicates" in the table and copy them to another table, where they can be reviewed and eventually have their IDs resolved so that only one entry for each actual "entity" exists in the table.

The package I created works great in my test environment (much smaller table), but when it is run on the production server against the large table, it takes almost a day to finish, and the last time I ran it the combined size of the TempDB files was several hundred gigabytes!

I read on MSDN that the size of TempDB can become "quite large", but that's about as descriptive as they get.  I'm sure there is some basic step that I am missing that will keep the size of TempDB from growing out of control, but like I said, I'm new at this stuff, and I may have tried to "run before I really knew how to walk", so to speak.  Regardless, I need to make this work somehow and if anyone can offer some advice I would greatly appreciate it.

Question by:naj2576

    Accepted Solution

    I'm sure you are not missing anything. BOL mentions disk space several times in its description of Fuzzy Grouping, and it even recommends not running it on a production server (obviously because it can eat up all the disk space), so it's clear that what you saw is typical. Remember, it's a brand new feature, so that's not very surprising.
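
For a rough sense of why the temporary data can dwarf the source table, here is a small Python sketch. It is only an illustration, not the actual SSIS internals: it assumes the matching index stores overlapping character substrings (q-grams) of every column value, which is a common way to build fuzzy-matching indexes. The functions `qgrams` and `index_rows`, the choice of q = 3, and the sample rows are all hypothetical.

```python
def qgrams(text, q=3):
    """Return the overlapping q-character substrings of text (lowercased)."""
    text = text.lower()
    if len(text) < q:
        # Values shorter than q are kept as a single token (empty values give none)
        return [text] if text else []
    return [text[i:i + q] for i in range(len(text) - q + 1)]


def index_rows(rows, q=3):
    """Build one (token, row_id) index entry per q-gram of every column value."""
    entries = []
    for row_id, row in enumerate(rows):
        for value in row:
            for token in qgrams(value, q):
                entries.append((token, row_id))
    return entries


# Hypothetical sample rows, not taken from the table in the question
sample = [
    ("Jonathan Smith", "12 Main Street"),
    ("Jonathon Smith", "12 Main St."),
]

entries = index_rows(sample)
print(len(sample), "input rows ->", len(entries), "index entries")
```

Under this assumption, every input row turns into dozens of index entries, so half a million rows with 6 fields each could plausibly produce temporary structures many times larger than the table itself. How much space the real transformation uses depends on its internals and settings, which this sketch does not model.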
