Limits of SQL - summarizing reports

I am working on a reporting tool called Ruport.  It is open source / free software.  It is in Ruby. It is designed to allow people to do "reporting" by  dumping data and having it summarize it rather than writing tricky sql to do the same.  It is at the 0.50 stage and now there are two branches, a stable branch and a development branch.  It is still VERY much in development.  We have three developers now.  I pay one, the others are volunteers.

I would like some input into what you think tricky sql is, that ruport could do.  Some of my ideas for things I find sql either a pain to do or inefficient to do are:
a. aging reports
b. getting running balance columns
c. difference reports (i.e. you print a report that lists deliquent accounts; but you have one from last week.  You want to know which ones on the report were not on the report last week.)
d. summary reports, i.e. you dump a list of sales, and ruport can produce a summary of those sales by month
e. when you want the minimum value of a column in a table, and the rest of values for the fields of that row

Some features or Ruport:
1. pulling data from more than one source - i.e. from sql, from mysql, from a .csv file
2. we can graph pretty easily
3. we do html, pdf, e-mail, and .csv easily
4. plugin for active record / rudimentary support for rails

Here is an open thread where I am exploring the limits of sql using some reporting examples that I need to do:

So, in summary, can you (A) help me define what is commonly needed in reporting (cause we have to be able to generalize it) that is hard to do in sql?
Also, I realize that dumping data to Ruport and calculating will break down if someone thinks they are going to dump 1,000,000 rows and tell it to add.
It is used right now on files that are within the 20,000 - 50,000 range and runs within a minute I think (much faster on 'nix than 'indows)
So what is a reasonable requirement here?  We are thinking of at some point dumping into a database, rather than a .csv file and writing
the summary features to take advantage of the speed of a database, which we will never match in Ruby.



LVL 13
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

I am very skeptical.

If it is a complex rule to do the report, then it is a complex rule and thats all there is to do it. No matter what platform you use, you're going to have to define a complex rule.

But I can guarantee if it is complex, then it is going to run better on a database server platform then it is on a 'client' platform, assumming it has lots of summaries, joins etc.

A database sever is optimised to run summaries, joins etc. A standalone client application will never come close speedwise.

>> (A) help me define what is commonly needed in reporting (cause we have to be able to generalize it) that is hard to do in sql?

The two things that come immediately to mind are:

1. A 'windowing' report where you might want to ad up the 'last three' records, rolling all the way through a table
2. A crosstab, the bane of every query writers existence. Crosstabs should ~always~ be done in the reporting layer, not in the database.

>> (B) ...dumping data to Ruport and calculating will break down with 1,000,000 rows. It is used on files that are 20,000 - 50,000 records . Runs within a minute.

Not sure what you're asking here but you have correctly identified that a client app will never ever match a databases server in this particular area (I just added up my 7 million row fact table and it took 28 seconds on unoptimised hardware)

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
gsgiAuthor Commented:
My tool is about ease of use, not speed.
If it takes 8 hours for it to run a sophisticated report and e-mail it to me at night once a week or once a month, and I can define this report in my tool easily, without complicated mind numbing sql, then I will choose this route.  This concept is not for everyone and I recognize this and agree with it.  For me, I'll trade the comuting time, for the time it takes to optimize, troubleshoot and test a good sql solution. I really really do appreciate your feedback!  -gsgi
gsgiAuthor Commented:
Could you give me a brief example of "add up the 'last three' records, rolling all the way through a table" ... thanks.
Do you mean like, the last three records of a particular fk in a table where last is based on a date? -gsgi

Become a CompTIA Certified Healthcare IT Tech

This course will help prep you to earn the CompTIA Healthcare IT Technician certification showing that you have the knowledge and skills needed to succeed in installing, managing, and troubleshooting IT systems in medical and clinical settings.

Yeah thats what I meant, like a rolling seven day report... you can do it in SQL but its a bit intensive.

For some reports, there is just no simple way to define it.

Given a particular 'complex' report,

I could create it with some 'mind numbing' SQL. You can do the same report in a proprietary reporting tool. Either way the rules to do it are very complex. Its just the development platform that you use. If I was to find the complex report developed in your particular tool I would probably think it was mind numbing!

The other thing is if you can develop your report in SQL then ~any~ presentation tool can use it (Crystal, Cognos, Reporting Services or Ruport, whatever). The report definition is not confined to a particular platform. SQL is a standard language.

To go further, I have a lot of consulting with various reporting tools. These reporting tools attempt to simplify report building enough so that an end user can just drag and drop elements to create a report.

This works great for simple reports.

But when we get a complex report, there is just no way an end user can build it in any of these tools. I could certainly build it in the reporting tool but it is still complex and requires a higher understanding of databases and tables and joins anyway.

But I do wholeheartedly agree with your point - in my book the simplest solution is always the best. If you can in fact define a report more simply in a reporting tool as opposed to another then by all means that is a good reason to use it.

I have just found many times in the past that various reporting tools have attempted to make report building an easy task by 'abstracting' the database and this works great for simple reports but not for anything with any complexity....

One thing that can make your SQL a lot less painful is if you are working from a 'star schema' rather than a standard OLTP schema.
earth man2Commented:
In postgresql and oracle you can use set returning functions in plpgsql / PL/SQL to divide and conquer complex SQL.

In oracle you have a v powerful set of analytic functions which enable you to do complex sql in one pass and the ability to drag in external data at the same time.

You have to be careful to define the scope of your project, else you are creating yet another SQL engine.
gsgiAuthor Commented:
Here is the scope of my project:  I am a consultant (an expert) in a particular medical database software system.  I handle many aspects, from citrix installs, to database reporting, to WAN configuration for my clients.  This is why I do not have time to spend writing and debugging and testing crazy sql.  I am so busy.  Having this tool helps me because I don't need to even access the clients database directly - I add nothing to their database, which IT professionals at some places protect like you are going to delete all of their tables.  When I do a database dump, I get basic data, with simple quieries, then my tool manipulates that and produces a report.  My clients use Oracle or SQL server, and my tool doesn't care.  Also, if I need to optimize something, I use my tool to check that the answers I am getting back are correct.  So if it takes three hours to add up 7 million rows and do an 'aging' on them, that only takes 50 secs in an indexed database - well I only run it a few times, to verify that some join or where clause or null somewhere in my query hasn't dropped a few rows :-) from the calculation.  This has happened to me and I hate it.  

Often, reports in excel have many columns and rows that do not come from one query.

Provider    Jones    Smith    Willis   Burns
January                 2,000    3,000   4,000
February    1,000      500       600  

Patients Seen
Provider    Jones    Smith    Willis   Burns
January                   20        30       40
February    100        50        60  

Summary    Revenue   Patients Seen  Appts Scheduled   Missed Appts  % of Missed to Scheduled
February     1,000              100            

January       2,000                20
February        500                50

See how, in this report, Jones didn't even start until February?  It is hard to get sql to say Jones January 0 for data that doesn't exist without complex logic.
My tool just does it (after a lot of bitching from my developer).  Also, you can now see that as this spreadsheet grows, each column is from a separate query.  Because I am using many simple queries, I like it better than one monster.  Also the last column, 6, is a calculation from columns 4 and 5.

Thanks for your input.  I really appreciate the feedback and I appreciate your time.


Actually to get Jones 0 revenue for january, all you need to do is an outer join to a calendar table. Also the calculation you have up there is quite trivial too (if I understand it right).

earthman2 comment about Oracle is true. It has some more advanced functions than SQL Server (windowing and stuff)

I think it just comes down to 'you can't be an expert in everything'. You create solutions in the technologies you are used to which is fine as long as the client gets somethingh delivered! But I would have some reservations about putting all of the reporting business logic into a particular reporting tool as opposed to lower down (in the database)

Basically I've seen a lot of reporting tools all claiming to make things easier than SQL but for anything more than a trivial report, it ends up being just as complicated to develop it in the reporting tool as it is in SQL, so really it comes down to which platform you prefer to develop in.

Time does become an issue if you are deploying a production report that needs to run every day. 5 hour vs 5 min is just not acceptable for a daily production report.

Interesting discussion, I can tell we are at opposite sides here! (but can still keep it civil :) )
gsgiAuthor Commented:
Yeah, well I am not particularly surprised because your reaction is very much the same reaction we received when we showed it to the Ruby users group in NYC.  Those that spent every day professionally in a database didn't see much value in it.  Other programmers were a little more interested.  What is interesting to me, is that as I have been developing and playing with the tool, I have been writing more complex sql, and summarizing less in my tool than I expected.  So this goes to your point.  Maybe developing the tool is just teaching me to be a better SQL user :-)

At any rate, could you please read my other post, it has gone stale:

Thanks, I really enjoy the feedback.

It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.