How to build a "crosstab" query from a single table?

I have a table that I want to build a "crosstab" query from.  MSAccess has this ability using the "PIVOT" command, but I need a standard SQL way of doing it (I am using Interbase SQL Server).

The table has the following fields (Unique on RID_QID):

     RID (Respondent ID)
     QID (Question ID)

I want to realize a query that has the RID field as the row header, the QID as the column header and the answer as the cell data.

Obviously, the trick is that the columns are dynamic depending on what records are in the table.

Thanks in Advance,
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

There is no pivot construct in ANSI SQL - this is an enhancement that Access provided. The solution is to return a query that can be programmatically constructed into a crosstab in whatever receives the data. A stored procedure could also do this, or any 3GL could populate some grid object with the returned data.

In the simple scenario above, you would simply order by RID then by QID. Each record that comes back would be a new row (since you have implied that RID is unique for a given QID since you can only have one answer at the intersection). Each new QID for a given RID would result in a new column that would be added if the QID did not already exist as a column.

There are kludges to get around this. For instance, you could hardcode how the columns are handled if you know the values in advance. That's not a good solution, but you did ask for a standard (i.e. ANSI) SQL way to do it and based on my searches a few months ago on the same problem, there are none.


Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
derrickhAuthor Commented:
OK, I have also determined since I posted what you have said above.  However, I am further constrained by the fact that I am dealing with very large result sets with hundreds or thousands of RIDs (respondents) and 100-200 QIDs (questions).

Considering this, i need to do the work on the SQL server.  The database and application are almost complete, but this query is required and I am not an expert on stored procedure language (and the server-side objects' behvior) for Interbase.  I will be happy to post even more points if you can help me put together the stored procedure.  It has to be as efficient as possible because of the large data size.

Now we're on to a different question. The question now is "how can I achieve the equivalent of a crosstab query in an efficient manner given that a cross-platform ANSI-SQL solution does not exist".

Honestly, I would do this in whatever target is going to receive the data. I would have a query along the lines of:

FROM results

This will order by RID giving you the rows, and then by QID which drives the columns. In whatever application is building up the results, you would simply track changes in RID and QID values to determine if you have to add a row or QID. The drag is that QID is contained within RID for this example so you would have to check and see if a QID column is already created - if so, you would simply use that column again to insert the value. If not, you would add the column to whatever "thing" you're using to contain the information, whether it be a grid control, array, etc.

This is typically what reporting engines and pivot tables do if they support this function. For instance, in Oracle this exact thing is done to create a crosstab/matrix report. It orders the data this way and builds up the contents client-side.

Grabbing the data server side with one query makes sense - doing the assembly of that data is probably best done client-side where it will be most efficient. It is generally CPU-bound and I don't see the benefits of putting that on the server - let the server give you the data as fast as it can, and then assemble on your client application.

Powerful Yet Easy-to-Use Network Monitoring

Identify excessive bandwidth utilization or unexpected application traffic with SolarWinds Bandwidth Analyzer Pack.

Having said all of that, I'll talk a bit about specific solutions - then I'm pretty much spent on this topic I'm afraid :) We've gotten a bit away from the original question I have to make a few guesses. You did mention you're using interbase however.

Some tools make it convenient to do client-side crosstab assembly. Delphi 5 enterprise supports decision cubes/crosstabs. C++ Builder also provides this functionality.

In a tool like Visual C++, you could choose an unbound grid control, and simply add rows and elements using its standard interface. It's quite easy to do and you can easily check for existing columns as part of the interface before you create a new column based on a new QID.

There are a number of reporting tools that also support crosstab queries. builds an ActiveX control that you can deploy by Web or include in your applications that will build dynamic crosstabs, i.e. pivot tables with run-time manipulation. They are quite nice. Discoverer 3i for Oracle also supports them if my memory serves me correctly. There are some cheaper stand-alone reporting tools (I haven't used them) like ReportSmith that create crosstab reports.

Excel supports pivot tables and you can easily set this up to interact with Oracle through ODBC. This would achieve what you want and allow you to export to a number of formats/presentations.

As far as a stored procedure goes, I think you will end up with a very specific and kludgy solution. I would recommend against it. Unless you can make hard assumptions about the RID values to drive your columns, I can't think of a good solution. If the RID and QID columns are driven from lookup tables and you're happy to have blanks/NULLs in the crosstab where no RID/QID intersection exists, you could do something of the form:

select (... defaults ... )
where ...
and not exists ( query below )
select ( ... values ... )
where ... (query reused in not exists above)
order by ...

The stored procedure could then take these values and concatenate them for each RID.

I think it may be time to close this question and create a new and more specific one. The final answer will not reflect the original question and altering the original question will make this thread more confusing.

derrickhAuthor Commented:
Let me get one more clarification and we'll close things up.  I still think its relevent to the question of how to build a crosstab query, since it is not obvious.  

I understand that you're recommending to do the construction on the client (I'm using JBuilder BTW).  I originally considered this solution, but was tenetive since I could end up having 2 copies of the large dataset on the client.  The way I see it, I would have:


   2) The constructed dataset (Adding columns as needed (QIDs) and Rows as needed (RIDs), then binding this constructed dataset to a gridcontrol.  In JBuilder, constructing a dataset then binding it to a Grid is preferable since the dataset can be easily cached using JBuilder's JDataStore object.  An unbound grid cannot cache its data.

So you are suggesting to do things on the client side so not to bog down the server with a UNION query like you presented, right?  I will probably pursue this solution, but for my own edification, I would like to try it both ways and see this difference in performance.

Can you please provide me with a little more information that will help me construct the Stored Procedure Query on the server?  How does the query you included create a table with columns defined in the QID column of the original table?  This is the part that really confuses me.  The only way I could think of was to do a "SELECT QID DISTINCT" first then string concatinating the results in the final query.  Does your suggestion get around this?

Let me give you a few more pieces of information to clarify what I need the stored procedure to return.

1) The query will not be scoped.  That is, I need it to return the entire table's data.

2) There should be no other tables involved.

3) I don't mind NULLs in intersection points that don't have an associated record in the table.

It is unclear to me what should be put into the "defaults", "query below" & "values" tags in your SQL.

Thanks for your help,
Lots of questions :)

First, I am suggesting that you do the assembly client-side. Unless there are things in your situation I haven't thought about, I would rather take the assembly of data client-side leaving the database to do what it does best - query data and return the minimal amount to get the job done. This keeps network traffic to a minimum and keeps the database from doing a bunch of logic that it may not be optimized for.

The UNION query may not be all that slow on the database - you don't even need it since just ordering by RID,QID will bring the data back the way you want it for JBuilder assembly.

I'm not sure why you would go for a data-bound control unless you have to do some manipulation that goes back to the server - this could be a drag. If you don't need to manipulate data in the control I would recommend an unbound control that you programmatically populate yourself.

Because you're returning the whole table, I'm going to assume there are more than a few records in the table. Minimizing the transfer of data is a benefit that I would go for - if you build up the crosstab matrix server-side, you have to bring all of the information back for an x*y matrix of information - some/much of which are empty cells.

Here is a possible query that you could use if you insist on getting the data back for easy construction of a crosstab - note that it is not very efficient:

desc demo;
name   null?   type
rid            NUMBER(2)
qid            NUMBER(2)
answer         char(1)

select * from demo;
RID   QID   Answer
1     2     y
1     2     y
2     3     n
3     2     n
4     4     y
5     3     n

SELECT a.rid,b.qid, '-'
WHERE NOT EXISTS ( SELECT 'x' FROM demo c WHERE c.rid = a.rid and c.qid = b.qid )
SELECT rid,qid,answer
FROM demo;

RID     QID     Answer
1      2      y
1      3      -
1      4      -
2      2      -
2      3      n
2      4      -
3      2      n
3      3      -
3      4      -
4      2      -
4      3      -
4      4      y
5      2      -
5      3      n
5      4      -

Notice that you now have a full cross-tab-ready array based on all of the distinct RID and QID values. This is dead easy to populate into a table and you should be able to base your data-bound grid on this query as well.

I can't give you a stored procedure solution that I can guarantee will work with your database (I'm on Oracle) but the query above should make it straight forward for you to progress.

derrickhAuthor Commented:

Thanks for the info, but I'm afraid I didn't make my self clear.  The structure of the returned query (using revided data from your example) should look like:


RID   QID   Answer
1     1     y
1     2     y
2     2     n
3     3     n
4     3     y
5     4     n


RID    1        2        3        4
1       y         y        -         -
2       -          -        n        n
3       -          -        n        -
4       -          -        y        -
5       -          -        -        n

It seems that the result set you list only achieves the insertion of null records, but doesn't change the resulting query STRUCTURE.

As for network efficiency, I think traffic will be lower if I can do this on the server because I expect to have a low percentage of NULLs (most questions will be filled in) and also because the table structure above eliminates having to return a QID for each element of data since the QIDs are now column headers.

I swear we're almost done here.  :-)

I understood you perfectly. The structure that my query returned is 100% conducive to _constructing_ the crosstab result. Your code would have a very easy time assembling the results into a matrix format because all values come back for _all_ cells. In other words, whereas the standard select returns 6 results, the query I provided provides 5x4 results for a matrix consisting of 5 unique RIDs and 4 unique QIDs.

As I pointed out, there is no standard query to create a crosstab query on the server. You would do the assembly server-side. I argue "why bother" when the assembly client-side is faster and arguably easier.

I cannot see anyway you can construct a query that will allow you to leverage a data-bound grid. You can perhaps create a stored procedure that will return OUT parameters or return a VARCHAR2 string for each row, but I don't think this is effective.

derrickhAuthor Commented:
I just finished implementing a client side crosstab DataSet (JBuilder's object for representing 2 dimensional data) by first adding columns for each unique QID in the dataset, then processing the simple SQL statement "SELECT RID,QID,ANSWER FROM RESPONSEDATA".  

Unfortunately, this was what I was trying to avoid doing on the client side.  I don't care if there are missing elements in the grid, since the column values are initialized to NULL and I only set the items in a row for which there are rows in the raw data table.  I thought that if I could do basically the same thing on the server(get the rows in one query, construct a temporary table and then populate and return it), it would be faster (I wasn't too worried about tying up the server CPU for this particular application).

Finding an implementation a server side Stored Procedure to do this has proven very elusive, so I guess after I get this project stablized, I will have to spend some time learning Interbase's Stored Procedure language and do it myself.

I appreciate your effort given to this question.

Keep in mind that a temp table probably isn't your way to do things. You will probably be hitting the disk for the temp table create/update. Also, when you insert the information into the temp table, how do you know how many columns you would have? You don't know how many QIDs/RIDs you have in advance - you could hypothetically create the table at runtime (DDL) then populate it (DML) but I suspect that would be very slow.

Good luck,

It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today

From novice to tech pro — start learning today.