?
Solved

Complex SQL Query

Posted on 2011-10-03
9
Medium Priority
?
512 Views
Last Modified: 2012-05-12
Greetings Expert,

    Background: We have a large table of documents (1,000,000+ entries) with a variable number of text columns and bit columns. The variable number of columns is implemented by using a DocTextFields table and a DocCheckboxFields table where only the entries that contain data are listed. There can be over 100 of these variable number of columns. Please see attached Stored-Procedure.xlsx file for more detail about how these tables are defined.

    We are looking for an efficient way to pull back the data so the following is true (see Stored-Procedure.xlsx for an exact example of how we would like the resulting table to look):

        1. The FieldId of the text field or the checkbox field should be the column header of the returned table.

        2. Only the requested fields are included. Typically maybe 10 of the 100 fields are needed.

        3. Only the requested DocIds are included. The stored procedure needs to be in a form where we can add ROW_NUMBER() OVER and pull back only 25 docs at a time.

        4. It needs to be possible to sort by the TextFields or the CheckboxFields

    We are using SQL Server 2008 and have a fair amount of flexibility as to how this should be done. Any help would be greatly appreciated.

    NOTE: I have included Schema-Script.sql and Data-Script.sql for an easy way to create a test database and try out various approaches.

    Please let me know if you need additional information.

Regards,

Karsten
Schema-Script.sql
Data-Script.sql
Stored-Procedure.xlsx
0
Comment
Question by:karstenweber
  • 3
  • 3
  • 2
  • +1
9 Comments
 
LVL 46

Expert Comment

by:aikimark
ID: 36909661
Have you considered normalizing your data design?
0
 

Author Comment

by:karstenweber
ID: 36910378
Thanks aikimark for the feedback.

I'm not quite sure I understand what you mean by normalizing the data design. As far as I'm concerned, one of the complications with our approach is that the data is so highly normalized. Instead of storing everything in a huge non-normalized table (similar to an Excel sheet), we are only storing the fields that are not empty (like storing only the cells in an Excel sheet that contain data).

Can you give us some more details on what you were thinking?

Thanks,

Karsten
0
 
LVL 46

Expert Comment

by:aikimark
ID: 36910811
>>with a variable number of text columns and bit columns
This part of the description raised my concerns that this table is not as normalized as it could/should be.  I don't see these reflected in the two SQL files you posted.
0
Free recovery tool for Microsoft Active Directory

Veeam Explorer for Microsoft Active Directory provides fast and reliable object-level recovery for Active Directory from a single-pass, agentless backup or storage snapshot — without the need to restore an entire virtual machine or use third-party tools.

 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 36911371
>>we are only storing the fields that are not empty (like storing only the cells in an Excel sheet that contain data).<<
Right, you are going to have to look into using PIVOT to convert the rows into columns.
0
 
LVL 46

Expert Comment

by:aikimark
ID: 36911477
@ac

I thought there was some other table that we weren't seeing.  Maybe I've got an entirely wrong perspective on this problem.  (wouldn't be the first time)
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 36911556
>>I thought there was some other table that we weren't seeing.<<
I have to confess I did too.  That is until I looked at the schema.
0
 
LVL 18

Accepted Solution

by:
chrismc earned 2000 total points
ID: 36911726
Hi Karsten,

You are correct, this data is highly normalised and makes sense if the front end allows for a high degree of configurability.

Before I tackle the question, could I just suggest you trim your example code files down by taking the "chaff" out when including them as it just takes us longer to work out what the meat of the code is. For instance your schema code boils down to just a couple of small table definitions, i.e.
   
CREATE TABLE DocCheckboxFields (DocId uniqueidentifier, FieldId uniqueidentifier)

Open in new window

is the only relevant piece in the first 37 lines.

So on with your question!
Dynamic SQL is the only way I can see this working. Dynamic SQL is technique where you build SQL script using SQL. i.e.
   Declare @SQL nVarChar(Max)
    Select Top 1 @SQL = 'Select [' + A.FieldId + '] = B.Text From DocTextFields'
                              + ' Where DocId = ' + @DocIdParameter
        From FieldsIncluded
    Exec sp_ExecuteSQL @SQL

Open in new window

Dynamic SQL is very very flexible BUT it is also very, very inefficient. Use it sparingly and only if there are no other alternatives.

To take that snippet further, you can loop through a table like FieldsIncluded to build the SQL script. Again Loops are bad in SQL performance terms but this is a scenario where I can see no other choice. I'd recommend creating a table variable, assuming the number of columns in the output is less than 1000. Because a table variable is assured to be in memory it keeps the loop as fast as possible. See code snippet below.

What I've shown you is just a pointer in what I think is the right direction.
It covers point 1 extensively.
Point 2 is a matter of applying the right WHERE clauses to my examples.
Point 3 is a matter of "partitioning" or "paging". I have some code where I have implemented that but it's not to hand easily.
I'll try and post that up later.
Point 4, depending on the end user viewer you use, it may be sufficient to pass it off to that tool, other wise you can build your sort order into the dynamic sql based on data or parameters passed to the routine.

Hope that gets you started!

Cheers
Chris
   


Declare @FieldsIncluded Table(RowNo Int, FieldId UniqueIdentifier)
Declare @Row Int, @MaxRow Int, @SQL nVarChar(4000)

Insert Into @FieldsIncluded(RowNo, FieldId)
    Select Row_Number() Over(FieldId), Field_Id From FieldsIncluded

Select @Row = 0, @MaxRow = Count(*) From @FieldsIncluded
Set @SQL = 'Select [DocId]'
While @Row < @MaxRow
Begin
    Select @SQL = @SQL + ', [' + FieldId + '] = Text' 
        From @FieldsIncluded Where RowNo = @Row
End
Select @SQL = @SQL + ' From DocTextFields Where DocId = @DocIdParameter'
Exec sp_ExecuteSQL @SQL

Open in new window

0
 

Author Comment

by:karstenweber
ID: 36911761
Thanks chrismc. I'll take a look at your detailed response as soon as I get a chance.

Karsten
0
 

Author Closing Comment

by:karstenweber
ID: 36914615
Thanks chrismc for your input. Much appreciated.
0

Featured Post

Granular recovery for Microsoft Exchange

With Veeam Explorer for Microsoft Exchange you can choose the Exchange Servers and restore points you’re interested in, and Veeam Explorer will present the contents of those mailbox stores for browsing, searching and exporting.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Ready to get certified? Check out some courses that help you prepare for third-party exams.
One of the most important things in an application is the query performance. This article intends to give you good tips to improve the performance of your queries.
Familiarize people with the process of utilizing SQL Server functions from within Microsoft Access. Microsoft Access is a very powerful client/server development tool. One of the SQL Server objects that you can interact with from within Microsoft Ac…
Via a live example, show how to extract insert data into a SQL Server database table using the Import/Export option and Bulk Insert.
Suggested Courses

862 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question