Complex SQL Query

Posted on 2011-10-03
Last Modified: 2012-05-12
Greetings Expert,

    Background: We have a large table of documents (1,000,000+ entries) with a variable number of text columns and bit columns. The variable number of columns is implemented by using a DocTextFields table and a DocCheckboxFields table where only the entries that contain data are listed. There can be over 100 of these variable number of columns. Please see attached Stored-Procedure.xlsx file for more detail about how these tables are defined.

    We are looking for an efficient way to pull back the data so the following is true (see Stored-Procedure.xlsx for an exact example of how we would like the resulting table to look):

        1. The FieldId of the text field or the checkbox field should be the column header of the returned table.

        2. Only the requested fields are included. Typically maybe 10 of the 100 fields are needed.

        3. Only the requested DocIds are included. The stored procedure needs to be in a form where we can add ROW_NUMBER() OVER and pull back only 25 docs at a time.

        4. It needs to be possible to sort by the TextFields or the CheckboxFields

    We are using SQL Server 2008 and have a fair amount of flexibility as to how this should be done. Any help would be greatly appreciated.

    NOTE: I have included Schema-Script.sql and Data-Script.sql for an easy way to create a test database and try out various approaches.

    Please let me know if you need additional information.


Question by:karstenweber
    LVL 44

    Expert Comment

    Have you considered normalizing your data design?

    Author Comment

    Thanks aikimark for the feedback.

    I'm not quite sure I understand what you mean by normalizing the data design. As far as I'm concerned, one of the complications with our approach is that the data is so highly normalized. Instead of storing everything in a huge non-normalized table (similar to an Excel sheet), we are only storing the fields that are not empty (like storing only the cells in an Excel sheet that contain data).

    Can you give us some more details on what you were thinking?


    LVL 44

    Expert Comment

    >>with a variable number of text columns and bit columns
    This part of the description raised my concerns that this table is not as normalized as it could/should be.  I don't see these reflected in the two SQL files you posted.
    LVL 75

    Expert Comment

    by:Anthony Perkins
    >>we are only storing the fields that are not empty (like storing only the cells in an Excel sheet that contain data).<<
    Right, you are going to have to look into using PIVOT to convert the rows into columns.
    LVL 44

    Expert Comment


    I thought there was some other table that we weren't seeing.  Maybe I've got an entirely wrong perspective on this problem.  (wouldn't be the first time)
    LVL 75

    Expert Comment

    by:Anthony Perkins
    >>I thought there was some other table that we weren't seeing.<<
    I have to confess I did too.  That is until I looked at the schema.
    LVL 18

    Accepted Solution

    Hi Karsten,

    You are correct, this data is highly normalised and makes sense if the front end allows for a high degree of configurability.

    Before I tackle the question, could I just suggest you trim your example code files down by taking the "chaff" out when including them as it just takes us longer to work out what the meat of the code is. For instance your schema code boils down to just a couple of small table definitions, i.e.
    CREATE TABLE DocCheckboxFields (DocId uniqueidentifier, FieldId uniqueidentifier)

    Open in new window

    is the only relevant piece in the first 37 lines.

    So on with your question!
    Dynamic SQL is the only way I can see this working. Dynamic SQL is technique where you build SQL script using SQL. i.e.
       Declare @SQL nVarChar(Max)
        Select Top 1 @SQL = 'Select [' + A.FieldId + '] = B.Text From DocTextFields'
                                  + ' Where DocId = ' + @DocIdParameter
            From FieldsIncluded
        Exec sp_ExecuteSQL @SQL

    Open in new window

    Dynamic SQL is very very flexible BUT it is also very, very inefficient. Use it sparingly and only if there are no other alternatives.

    To take that snippet further, you can loop through a table like FieldsIncluded to build the SQL script. Again Loops are bad in SQL performance terms but this is a scenario where I can see no other choice. I'd recommend creating a table variable, assuming the number of columns in the output is less than 1000. Because a table variable is assured to be in memory it keeps the loop as fast as possible. See code snippet below.

    What I've shown you is just a pointer in what I think is the right direction.
    It covers point 1 extensively.
    Point 2 is a matter of applying the right WHERE clauses to my examples.
    Point 3 is a matter of "partitioning" or "paging". I have some code where I have implemented that but it's not to hand easily.
    I'll try and post that up later.
    Point 4, depending on the end user viewer you use, it may be sufficient to pass it off to that tool, other wise you can build your sort order into the dynamic sql based on data or parameters passed to the routine.

    Hope that gets you started!


    Declare @FieldsIncluded Table(RowNo Int, FieldId UniqueIdentifier)
    Declare @Row Int, @MaxRow Int, @SQL nVarChar(4000)
    Insert Into @FieldsIncluded(RowNo, FieldId)
        Select Row_Number() Over(FieldId), Field_Id From FieldsIncluded
    Select @Row = 0, @MaxRow = Count(*) From @FieldsIncluded
    Set @SQL = 'Select [DocId]'
    While @Row < @MaxRow
        Select @SQL = @SQL + ', [' + FieldId + '] = Text' 
            From @FieldsIncluded Where RowNo = @Row
    Select @SQL = @SQL + ' From DocTextFields Where DocId = @DocIdParameter'
    Exec sp_ExecuteSQL @SQL

    Open in new window


    Author Comment

    Thanks chrismc. I'll take a look at your detailed response as soon as I get a chance.


    Author Closing Comment

    Thanks chrismc for your input. Much appreciated.

    Featured Post

    How your wiki can always stay up-to-date

    Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
    - Increase transparency
    - Onboard new hires faster
    - Access from mobile/offline

    Join & Write a Comment

    If you have heard of RFC822 date formats, they can be quite a challenge in SQL Server. RFC822 is an Internet standard format for email message headers, including all dates within those headers. The RFC822 protocols are available in detail at:   ht…
    Occasionally there is a need to clean table columns, especially if you have inherited legacy data. There are obviously many ways to accomplish that, including elaborate UPDATE queries with anywhere from one to numerous REPLACE functions (even within…
    Using examples as well as descriptions, and references to Books Online, show the documentation available for date manipulation functions and by using a select few of these functions, show how date based data can be manipulated with these functions.
    Via a live example, show how to set up a backup for SQL Server using a Maintenance Plan and how to schedule the job into SQL Server Agent.

    732 members asked questions and received personalized solutions in the past 7 days.

    Join the community of 500,000 technology professionals and ask your questions.

    Join & Ask a Question

    Need Help in Real-Time?

    Connect with top rated Experts

    17 Experts available now in Live!

    Get 1:1 Help Now