Link to home
Start Free TrialLog in
Avatar of gsgi
gsgiFlag for United States of America

asked on

Limits of SQL - summarizing reports

I am working on a reporting tool called Ruport.  It is open source / free software.  It is in Ruby. It is designed to allow people to do "reporting" by  dumping data and having it summarize it rather than writing tricky sql to do the same.  It is at the 0.50 stage and now there are two branches, a stable branch and a development branch.  It is still VERY much in development.  We have three developers now.  I pay one, the others are volunteers.

I would like some input into what you think tricky sql is, that ruport could do.  Some of my ideas for things I find sql either a pain to do or inefficient to do are:
a. aging reports
b. getting running balance columns
c. difference reports (i.e. you print a report that lists deliquent accounts; but you have one from last week.  You want to know which ones on the report were not on the report last week.)
d. summary reports, i.e. you dump a list of sales, and ruport can produce a summary of those sales by month
e. when you want the minimum value of a column in a table, and the rest of values for the fields of that row

Some features or Ruport:
1. pulling data from more than one source - i.e. from sql, from mysql, from a .csv file
2. we can graph pretty easily
3. we do html, pdf, e-mail, and .csv easily
4. plugin for active record / rudimentary support for rails

Here is an open thread where I am exploring the limits of sql using some reporting examples that I need to do:
https://www.experts-exchange.com/questions/21984358/Can-sql-do-this.html

So, in summary, can you (A) help me define what is commonly needed in reporting (cause we have to be able to generalize it) that is hard to do in sql?
and
(B)
Also, I realize that dumping data to Ruport and calculating will break down if someone thinks they are going to dump 1,000,000 rows and tell it to add.
It is used right now on files that are within the 20,000 - 50,000 range and runs within a minute I think (much faster on 'nix than 'indows)
So what is a reasonable requirement here?  We are thinking of at some point dumping into a database, rather than a .csv file and writing
the summary features to take advantage of the speed of a database, which we will never match in Ruby.

thanks,

-gsgi

ASKER CERTIFIED SOLUTION
Avatar of nmcdermaid
nmcdermaid

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of gsgi

ASKER

My tool is about ease of use, not speed.
If it takes 8 hours for it to run a sophisticated report and e-mail it to me at night once a week or once a month, and I can define this report in my tool easily, without complicated mind numbing sql, then I will choose this route.  This concept is not for everyone and I recognize this and agree with it.  For me, I'll trade the comuting time, for the time it takes to optimize, troubleshoot and test a good sql solution. I really really do appreciate your feedback!  -gsgi
Avatar of gsgi

ASKER

Could you give me a brief example of "add up the 'last three' records, rolling all the way through a table" ... thanks.
Do you mean like, the last three records of a particular fk in a table where last is based on a date? -gsgi

SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of earth man2
In postgresql and oracle you can use set returning functions in plpgsql / PL/SQL to divide and conquer complex SQL.

In oracle you have a v powerful set of analytic functions which enable you to do complex sql in one pass and the ability to drag in external data at the same time.

You have to be careful to define the scope of your project, else you are creating yet another SQL engine.
Avatar of gsgi

ASKER

Here is the scope of my project:  I am a consultant (an expert) in a particular medical database software system.  I handle many aspects, from citrix installs, to database reporting, to WAN configuration for my clients.  This is why I do not have time to spend writing and debugging and testing crazy sql.  I am so busy.  Having this tool helps me because I don't need to even access the clients database directly - I add nothing to their database, which IT professionals at some places protect like you are going to delete all of their tables.  When I do a database dump, I get basic data, with simple quieries, then my tool manipulates that and produces a report.  My clients use Oracle or SQL server, and my tool doesn't care.  Also, if I need to optimize something, I use my tool to check that the answers I am getting back are correct.  So if it takes three hours to add up 7 million rows and do an 'aging' on them, that only takes 50 secs in an indexed database - well I only run it a few times, to verify that some join or where clause or null somewhere in my query hasn't dropped a few rows :-) from the calculation.  This has happened to me and I hate it.  

Often, reports in excel have many columns and rows that do not come from one query.

Revenue
Provider    Jones    Smith    Willis   Burns
January                 2,000    3,000   4,000
February    1,000      500       600  

Patients Seen
Provider    Jones    Smith    Willis   Burns
January                   20        30       40
February    100        50        60  

Summary    Revenue   Patients Seen  Appts Scheduled   Missed Appts  % of Missed to Scheduled
Jones          
January      
February     1,000              100            

Smith
January       2,000                20
February        500                50

See how, in this report, Jones didn't even start until February?  It is hard to get sql to say Jones January 0 for data that doesn't exist without complex logic.
My tool just does it (after a lot of bitching from my developer).  Also, you can now see that as this spreadsheet grows, each column is from a separate query.  Because I am using many simple queries, I like it better than one monster.  Also the last column, 6, is a calculation from columns 4 and 5.

Thanks for your input.  I really appreciate the feedback and I appreciate your time.

-Greg

Avatar of nmcdermaid
nmcdermaid

Actually to get Jones 0 revenue for january, all you need to do is an outer join to a calendar table. Also the calculation you have up there is quite trivial too (if I understand it right).

earthman2 comment about Oracle is true. It has some more advanced functions than SQL Server (windowing and stuff)


I think it just comes down to 'you can't be an expert in everything'. You create solutions in the technologies you are used to which is fine as long as the client gets somethingh delivered! But I would have some reservations about putting all of the reporting business logic into a particular reporting tool as opposed to lower down (in the database)


Basically I've seen a lot of reporting tools all claiming to make things easier than SQL but for anything more than a trivial report, it ends up being just as complicated to develop it in the reporting tool as it is in SQL, so really it comes down to which platform you prefer to develop in.


Time does become an issue if you are deploying a production report that needs to run every day. 5 hour vs 5 min is just not acceptable for a daily production report.


Interesting discussion, I can tell we are at opposite sides here! (but can still keep it civil :) )
Avatar of gsgi

ASKER

Yeah, well I am not particularly surprised because your reaction is very much the same reaction we received when we showed it to the Ruby users group in NYC.  Those that spent every day professionally in a database didn't see much value in it.  Other programmers were a little more interested.  What is interesting to me, is that as I have been developing and playing with the tool, I have been writing more complex sql, and summarizing less in my tool than I expected.  So this goes to your point.  Maybe developing the tool is just teaching me to be a better SQL user :-)

At any rate, could you please read my other post, it has gone stale: https://www.experts-exchange.com/questions/21984358/Can-sql-do-this.html

Thanks, I really enjoy the feedback.

-gsgi