Complex SQL Query

Greetings Expert,

    Background: We have a large table of documents (1,000,000+ entries) with a variable number of text columns and bit columns. The variable number of columns is implemented by using a DocTextFields table and a DocCheckboxFields table where only the entries that contain data are listed. There can be over 100 of these variable number of columns. Please see attached Stored-Procedure.xlsx file for more detail about how these tables are defined.

    We are looking for an efficient way to pull back the data so the following is true (see Stored-Procedure.xlsx for an exact example of how we would like the resulting table to look):

        1. The FieldId of the text field or the checkbox field should be the column header of the returned table.

        2. Only the requested fields are included. Typically maybe 10 of the 100 fields are needed.

        3. Only the requested DocIds are included. The stored procedure needs to be in a form where we can add ROW_NUMBER() OVER and pull back only 25 docs at a time.

        4. It needs to be possible to sort by the TextFields or the CheckboxFields

    We are using SQL Server 2008 and have a fair amount of flexibility as to how this should be done. Any help would be greatly appreciated.

    NOTE: I have included Schema-Script.sql and Data-Script.sql for an easy way to create a test database and try out various approaches.

    Please let me know if you need additional information.

Regards,

Karsten
Schema-Script.sql
Data-Script.sql
Stored-Procedure.xlsx
karstenweberAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

aikimarkCommented:
Have you considered normalizing your data design?
karstenweberAuthor Commented:
Thanks aikimark for the feedback.

I'm not quite sure I understand what you mean by normalizing the data design. As far as I'm concerned, one of the complications with our approach is that the data is so highly normalized. Instead of storing everything in a huge non-normalized table (similar to an Excel sheet), we are only storing the fields that are not empty (like storing only the cells in an Excel sheet that contain data).

Can you give us some more details on what you were thinking?

Thanks,

Karsten
aikimarkCommented:
>>with a variable number of text columns and bit columns
This part of the description raised my concerns that this table is not as normalized as it could/should be.  I don't see these reflected in the two SQL files you posted.
Protecting & Securing Your Critical Data

Considering 93 percent of companies file for bankruptcy within 12 months of a disaster that blocked access to their data for 10 days or more, planning for the worst is just smart business. Learn how Acronis Backup integrates security at every stage

Anthony PerkinsCommented:
>>we are only storing the fields that are not empty (like storing only the cells in an Excel sheet that contain data).<<
Right, you are going to have to look into using PIVOT to convert the rows into columns.
aikimarkCommented:
@ac

I thought there was some other table that we weren't seeing.  Maybe I've got an entirely wrong perspective on this problem.  (wouldn't be the first time)
Anthony PerkinsCommented:
>>I thought there was some other table that we weren't seeing.<<
I have to confess I did too.  That is until I looked at the schema.
chrismcCommented:
Hi Karsten,

You are correct, this data is highly normalised and makes sense if the front end allows for a high degree of configurability.

Before I tackle the question, could I just suggest you trim your example code files down by taking the "chaff" out when including them as it just takes us longer to work out what the meat of the code is. For instance your schema code boils down to just a couple of small table definitions, i.e.
   
CREATE TABLE DocCheckboxFields (DocId uniqueidentifier, FieldId uniqueidentifier)

Open in new window

is the only relevant piece in the first 37 lines.

So on with your question!
Dynamic SQL is the only way I can see this working. Dynamic SQL is technique where you build SQL script using SQL. i.e.
   Declare @SQL nVarChar(Max)
    Select Top 1 @SQL = 'Select [' + A.FieldId + '] = B.Text From DocTextFields'
                              + ' Where DocId = ' + @DocIdParameter
        From FieldsIncluded
    Exec sp_ExecuteSQL @SQL

Open in new window

Dynamic SQL is very very flexible BUT it is also very, very inefficient. Use it sparingly and only if there are no other alternatives.

To take that snippet further, you can loop through a table like FieldsIncluded to build the SQL script. Again Loops are bad in SQL performance terms but this is a scenario where I can see no other choice. I'd recommend creating a table variable, assuming the number of columns in the output is less than 1000. Because a table variable is assured to be in memory it keeps the loop as fast as possible. See code snippet below.

What I've shown you is just a pointer in what I think is the right direction.
It covers point 1 extensively.
Point 2 is a matter of applying the right WHERE clauses to my examples.
Point 3 is a matter of "partitioning" or "paging". I have some code where I have implemented that but it's not to hand easily.
I'll try and post that up later.
Point 4, depending on the end user viewer you use, it may be sufficient to pass it off to that tool, other wise you can build your sort order into the dynamic sql based on data or parameters passed to the routine.

Hope that gets you started!

Cheers
Chris
   


Declare @FieldsIncluded Table(RowNo Int, FieldId UniqueIdentifier)
Declare @Row Int, @MaxRow Int, @SQL nVarChar(4000)

Insert Into @FieldsIncluded(RowNo, FieldId)
    Select Row_Number() Over(FieldId), Field_Id From FieldsIncluded

Select @Row = 0, @MaxRow = Count(*) From @FieldsIncluded
Set @SQL = 'Select [DocId]'
While @Row < @MaxRow
Begin
    Select @SQL = @SQL + ', [' + FieldId + '] = Text' 
        From @FieldsIncluded Where RowNo = @Row
End
Select @SQL = @SQL + ' From DocTextFields Where DocId = @DocIdParameter'
Exec sp_ExecuteSQL @SQL

Open in new window

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
karstenweberAuthor Commented:
Thanks chrismc. I'll take a look at your detailed response as soon as I get a chance.

Karsten
karstenweberAuthor Commented:
Thanks chrismc for your input. Much appreciated.
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.