How can I calculate the amount of disk space needed for a SQL 2005 full text index?

Posted on 2012-09-04
Last Modified: 2012-09-05
I have a table with 1626078 rows and a column of VARCHAR (MAX). I would like to take a guess and the amount of space I need to ask for when creating the full text index on the VARCHAR (MAX) column.

I did (8002 * 1626078) and it looks like my FTI will take approx 12.5GB. I got the 8002 value from 8000 characters + 2 bytes for the termination. Then multiplied that by the amount of rows in my table.

I am using SQL 2005 so if I am not mistaken the FTI goes in a separate file.

Did I miss anything? Am I heading into disaster?

Thanks experts!
Question by:sqlagent007
    LVL 29

    Expert Comment

    by:Rich Weissler
    A few thoughts:
    First, the maximum amount that could be stored in VARCHAR(MAX) is 2 GB, not a page worth (little less than 8k).
    Second, the Full Text Index, in 2005 does go to a separate directory... but that isn't a bad thing.

    You can grab a random sample of the table (select top (n) percent [...] order by <something randomish>), put the table on a test box, and perform to a full text index on that.  From that, you can establish a rough estimate.  With the information provided in your question, the estimate for the FTI isn't possible.
    LVL 68

    Assisted Solution

    I would use the actual length of your data, not assume 8000 bytes per row (which could be (way) too low or (way) too high, depending on your specific data).

    SELECT SUM(CAST(LEN(varchar_max_column) AS bigint))
    FROM dbo.tablename

    That will typically be an *over*estimate, since noise words are eliminated, and duplicates words are not physically fully repeated in the index.
    LVL 29

    Accepted Solution

    Also remember that the full text index will also be compressed.  ("Rather than construct a B-tree structure based on a value stored in a particular row, MSFTESQL builds an inverted, stacked, compressed index structure based on individual tokens from the text being indexed.")

    You can proceed with an estimate of the space consumed by the one column, but use that as an extreme upper limit rather than a realistic estimate of the space which will be consumed by the full text catalog.
    LVL 1

    Author Closing Comment

    Thanks experts! That did the trick, the index was very compressed and only took up 50mb.

    Featured Post

    How your wiki can always stay up-to-date

    Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
    - Increase transparency
    - Onboard new hires faster
    - Access from mobile/offline

    Join & Write a Comment

    I've encountered valid database schemas that do not have a primary key.  For example, I use LogParser from Microsoft to push IIS logs into a SQL database table for processing and analysis.  However, occasionally due to user error or a scheduled task…
    In this article I will describe the Copy Database Wizard method as one possible migration process and I will add the extra tasks needed for an upgrade when and where is applied so it will cover all.
    This video is in connection to the article "The case of a missing mobile phone (". It will help one to understand clearly the steps to track a lost android phone.
    Excel styles will make formatting consistent and let you apply and change formatting faster. In this tutorial, you'll learn how to use Excel's built-in styles, how to modify styles, and how to create your own. You'll also learn how to use your custo…

    746 members asked questions and received personalized solutions in the past 7 days.

    Join the community of 500,000 technology professionals and ask your questions.

    Join & Ask a Question

    Need Help in Real-Time?

    Connect with top rated Experts

    18 Experts available now in Live!

    Get 1:1 Help Now