How to build a "crosstab" query from a single table?

Posted on 2000-01-17
Medium Priority
Last Modified: 2008-03-10
I have a table that I want to build a "crosstab" query from.  MSAccess has this ability using the "PIVOT" command, but I need a standard SQL way of doing it (I am using Interbase SQL Server).

The table has the following fields (Unique on RID_QID):

     RID (Respondent ID)
     QID (Question ID)

I want to realize a query that has the RID field as the row header, the QID as the column header and the answer as the cell data.

Obviously, the trick is that the columns are dynamic depending on what records are in the table.

Thanks in Advance,
Question by:derrickh
  • 6
  • 4

Accepted Solution

s_franklin earned 1200 total points
ID: 2376397
There is no pivot construct in ANSI SQL - this is an enhancement that Access provided. The solution is to return a query that can be programmatically constructed into a crosstab in whatever receives the data. A stored procedure could also do this, or any 3GL could populate some grid object with the returned data.

In the simple scenario above, you would simply order by RID then by QID. Each record that comes back would be a new row (since you have implied that RID is unique for a given QID since you can only have one answer at the intersection). Each new QID for a given RID would result in a new column that would be added if the QID did not already exist as a column.

There are kludges to get around this. For instance, you could hardcode how the columns are handled if you know the values in advance. That's not a good solution, but you did ask for a standard (i.e. ANSI) SQL way to do it and based on my searches a few months ago on the same problem, there are none.


Author Comment

ID: 2376751
OK, I have also determined since I posted what you have said above.  However, I am further constrained by the fact that I am dealing with very large result sets with hundreds or thousands of RIDs (respondents) and 100-200 QIDs (questions).

Considering this, i need to do the work on the SQL server.  The database and application are almost complete, but this query is required and I am not an expert on stored procedure language (and the server-side objects' behvior) for Interbase.  I will be happy to post even more points if you can help me put together the stored procedure.  It has to be as efficient as possible because of the large data size.


Expert Comment

ID: 2377234
Now we're on to a different question. The question now is "how can I achieve the equivalent of a crosstab query in an efficient manner given that a cross-platform ANSI-SQL solution does not exist".

Honestly, I would do this in whatever target is going to receive the data. I would have a query along the lines of:

FROM results

This will order by RID giving you the rows, and then by QID which drives the columns. In whatever application is building up the results, you would simply track changes in RID and QID values to determine if you have to add a row or QID. The drag is that QID is contained within RID for this example so you would have to check and see if a QID column is already created - if so, you would simply use that column again to insert the value. If not, you would add the column to whatever "thing" you're using to contain the information, whether it be a grid control, array, etc.

This is typically what reporting engines and pivot tables do if they support this function. For instance, in Oracle this exact thing is done to create a crosstab/matrix report. It orders the data this way and builds up the contents client-side.

Grabbing the data server side with one query makes sense - doing the assembly of that data is probably best done client-side where it will be most efficient. It is generally CPU-bound and I don't see the benefits of putting that on the server - let the server give you the data as fast as it can, and then assemble on your client application.

A proven path to a career in data science

At Springboard, we know how to get you a job in data science. With Springboard’s Data Science Career Track, you’ll master data science  with a curriculum built by industry experts. You’ll work on real projects, and get 1-on-1 mentorship from a data scientist.


Expert Comment

ID: 2377370
Having said all of that, I'll talk a bit about specific solutions - then I'm pretty much spent on this topic I'm afraid :) We've gotten a bit away from the original question I have to make a few guesses. You did mention you're using interbase however.

Some tools make it convenient to do client-side crosstab assembly. Delphi 5 enterprise supports decision cubes/crosstabs. C++ Builder also provides this functionality.

In a tool like Visual C++, you could choose an unbound grid control, and simply add rows and elements using its standard interface. It's quite easy to do and you can easily check for existing columns as part of the interface before you create a new column based on a new QID.

There are a number of reporting tools that also support crosstab queries.
www.activereports.com builds an ActiveX control that you can deploy by Web or include in your applications that will build dynamic crosstabs, i.e. pivot tables with run-time manipulation. They are quite nice. Discoverer 3i for Oracle also supports them if my memory serves me correctly. There are some cheaper stand-alone reporting tools (I haven't used them) like ReportSmith that create crosstab reports.

Excel supports pivot tables and you can easily set this up to interact with Oracle through ODBC. This would achieve what you want and allow you to export to a number of formats/presentations.

As far as a stored procedure goes, I think you will end up with a very specific and kludgy solution. I would recommend against it. Unless you can make hard assumptions about the RID values to drive your columns, I can't think of a good solution. If the RID and QID columns are driven from lookup tables and you're happy to have blanks/NULLs in the crosstab where no RID/QID intersection exists, you could do something of the form:

select (... defaults ... )
where ...
and not exists ( query below )
select ( ... values ... )
where ... (query reused in not exists above)
order by ...

The stored procedure could then take these values and concatenate them for each RID.

I think it may be time to close this question and create a new and more specific one. The final answer will not reflect the original question and altering the original question will make this thread more confusing.


Author Comment

ID: 2377720
Let me get one more clarification and we'll close things up.  I still think its relevent to the question of how to build a crosstab query, since it is not obvious.  

I understand that you're recommending to do the construction on the client (I'm using JBuilder BTW).  I originally considered this solution, but was tenetive since I could end up having 2 copies of the large dataset on the client.  The way I see it, I would have:


   2) The constructed dataset (Adding columns as needed (QIDs) and Rows as needed (RIDs), then binding this constructed dataset to a gridcontrol.  In JBuilder, constructing a dataset then binding it to a Grid is preferable since the dataset can be easily cached using JBuilder's JDataStore object.  An unbound grid cannot cache its data.

So you are suggesting to do things on the client side so not to bog down the server with a UNION query like you presented, right?  I will probably pursue this solution, but for my own edification, I would like to try it both ways and see this difference in performance.

Can you please provide me with a little more information that will help me construct the Stored Procedure Query on the server?  How does the query you included create a table with columns defined in the QID column of the original table?  This is the part that really confuses me.  The only way I could think of was to do a "SELECT QID DISTINCT" first then string concatinating the results in the final query.  Does your suggestion get around this?

Let me give you a few more pieces of information to clarify what I need the stored procedure to return.

1) The query will not be scoped.  That is, I need it to return the entire table's data.

2) There should be no other tables involved.

3) I don't mind NULLs in intersection points that don't have an associated record in the table.

It is unclear to me what should be put into the "defaults", "query below" & "values" tags in your SQL.

Thanks for your help,

Expert Comment

ID: 2377794
Lots of questions :)

First, I am suggesting that you do the assembly client-side. Unless there are things in your situation I haven't thought about, I would rather take the assembly of data client-side leaving the database to do what it does best - query data and return the minimal amount to get the job done. This keeps network traffic to a minimum and keeps the database from doing a bunch of logic that it may not be optimized for.

The UNION query may not be all that slow on the database - you don't even need it since just ordering by RID,QID will bring the data back the way you want it for JBuilder assembly.

I'm not sure why you would go for a data-bound control unless you have to do some manipulation that goes back to the server - this could be a drag. If you don't need to manipulate data in the control I would recommend an unbound control that you programmatically populate yourself.

Because you're returning the whole table, I'm going to assume there are more than a few records in the table. Minimizing the transfer of data is a benefit that I would go for - if you build up the crosstab matrix server-side, you have to bring all of the information back for an x*y matrix of information - some/much of which are empty cells.

Here is a possible query that you could use if you insist on getting the data back for easy construction of a crosstab - note that it is not very efficient:

desc demo;
name   null?   type
rid            NUMBER(2)
qid            NUMBER(2)
answer         char(1)

select * from demo;
RID   QID   Answer
1     2     y
1     2     y
2     3     n
3     2     n
4     4     y
5     3     n

SELECT a.rid,b.qid, '-'
WHERE NOT EXISTS ( SELECT 'x' FROM demo c WHERE c.rid = a.rid and c.qid = b.qid )
SELECT rid,qid,answer
FROM demo;

RID     QID     Answer
1      2      y
1      3      -
1      4      -
2      2      -
2      3      n
2      4      -
3      2      n
3      3      -
3      4      -
4      2      -
4      3      -
4      4      y
5      2      -
5      3      n
5      4      -

Notice that you now have a full cross-tab-ready array based on all of the distinct RID and QID values. This is dead easy to populate into a table and you should be able to base your data-bound grid on this query as well.

I can't give you a stored procedure solution that I can guarantee will work with your database (I'm on Oracle) but the query above should make it straight forward for you to progress.


Author Comment

ID: 2378141

Thanks for the info, but I'm afraid I didn't make my self clear.  The structure of the returned query (using revided data from your example) should look like:


RID   QID   Answer
1     1     y
1     2     y
2     2     n
3     3     n
4     3     y
5     4     n


RID    1        2        3        4
1       y         y        -         -
2       -          -        n        n
3       -          -        n        -
4       -          -        y        -
5       -          -        -        n

It seems that the result set you list only achieves the insertion of null records, but doesn't change the resulting query STRUCTURE.

As for network efficiency, I think traffic will be lower if I can do this on the server because I expect to have a low percentage of NULLs (most questions will be filled in) and also because the table structure above eliminates having to return a QID for each element of data since the QIDs are now column headers.

I swear we're almost done here.  :-)


Expert Comment

ID: 2378253
I understood you perfectly. The structure that my query returned is 100% conducive to _constructing_ the crosstab result. Your code would have a very easy time assembling the results into a matrix format because all values come back for _all_ cells. In other words, whereas the standard select returns 6 results, the query I provided provides 5x4 results for a matrix consisting of 5 unique RIDs and 4 unique QIDs.

As I pointed out, there is no standard query to create a crosstab query on the server. You would do the assembly server-side. I argue "why bother" when the assembly client-side is faster and arguably easier.

I cannot see anyway you can construct a query that will allow you to leverage a data-bound grid. You can perhaps create a stored procedure that will return OUT parameters or return a VARCHAR2 string for each row, but I don't think this is effective.


Author Comment

ID: 2378434
I just finished implementing a client side crosstab DataSet (JBuilder's object for representing 2 dimensional data) by first adding columns for each unique QID in the dataset, then processing the simple SQL statement "SELECT RID,QID,ANSWER FROM RESPONSEDATA".  

Unfortunately, this was what I was trying to avoid doing on the client side.  I don't care if there are missing elements in the grid, since the column values are initialized to NULL and I only set the items in a row for which there are rows in the raw data table.  I thought that if I could do basically the same thing on the server(get the rows in one query, construct a temporary table and then populate and return it), it would be faster (I wasn't too worried about tying up the server CPU for this particular application).

Finding an implementation a server side Stored Procedure to do this has proven very elusive, so I guess after I get this project stablized, I will have to spend some time learning Interbase's Stored Procedure language and do it myself.

I appreciate your effort given to this question.


Expert Comment

ID: 2378953
Keep in mind that a temp table probably isn't your way to do things. You will probably be hitting the disk for the temp table create/update. Also, when you insert the information into the temp table, how do you know how many columns you would have? You don't know how many QIDs/RIDs you have in advance - you could hypothetically create the table at runtime (DDL) then populate it (DML) but I suspect that would be very slow.

Good luck,


Featured Post

Easily Design & Build Your Next Website

Squarespace’s all-in-one platform gives you everything you need to express yourself creatively online, whether it is with a domain, website, or online store. Get started with your free trial today, and when ready, take 10% off your first purchase with offer code 'EXPERTS'.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Creating a Cordova application which allow user to save to/load from his Dropbox account the application database.
Creating a Cordova application which allow user to save to/load from his Dropbox account the application database.
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…
In this video, Percona Solution Engineer Rick Golba discuss how (and why) you implement high availability in a database environment. To discuss how Percona Consulting can help with your design and architecture needs for your database and infrastr…

600 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question