How to Unravel a Tricky Query

Introduction

If you browse through the Oracle zones or any of the other database-related zones, you'll come across some complicated solutions, and sometimes you'll just have to wonder how anyone came up with them. The simple answer is: practice. Longtime database developers have written thousands of queries and, as with any other skill, the more you do it the easier it becomes. More importantly, those thousands of queries were written to address a variety of different problems. So, with experience, you'll have had a chance to see lots of different solutions, patterns of syntax, and techniques that work well for problems of a certain flavor.

That's precisely why I participate in Experts-Exchange. I want exposure to more problems, from more people, across more industries. The more problems I try to solve, the better prepared I'll be for the challenges in my own job. That brings us to the birth of this article.

Recently, I was asked to write a query to show top activity in an event log. My customer wanted to see which 5 events had occurred most often. He also needed to see the 10 most recent events that weren't in the top 5. Multiple filtering requirements but only one result set. That's sort of tricky!

Usually I just write the solution and don't give much thought to how I came to it. When people ask "How did you do that?" I can explain what my solution does and why it works, but I typically don't recall all of the minor choices I made along the way. This time I decided to pay attention to my own thought processes and take note of why each decision was made during discovery and construction. The goal was to come up with a semi-reusable system for others to use when facing a tough problem. Obviously, developing other queries will involve its own quirks and trials. In this article I'm hoping to show how I approached the problem and broke it into digestible pieces. In addition, I use some analytic syntax that is often overlooked by developers. Finally, I make extensive use of inline views (nested queries) as a tool for modular development, and this task definitely benefited from that approach.

Look at the data

If you work with a set of data all the time you might be able to skip this step as you'll already have an idea of the table contents. In my case, I hadn't touched this particular database in months and had never seen this particular table. So let's take a look and see what's there. The sample data I've provided in the attachment isn't an exact copy; but structurally it's similar to what I had to work with. We can see it's a pretty simple structure: values and times. My customer's real table had other columns but they weren't relevant to the problem so I didn't include them here.

This is an intentionally simplified set but it still contains some useful information.

Everything is populated. Checking the table definition, I can see that's enforced by constraints on the column definitions. If that weren't the case, I'd have to go back to the customer to determine if and how rows with missing dates or ids should be included in the results.
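A quick way to confirm this is sketched below. It assumes the table is named EVENT_LOG, as in the sample output, and that it lives in your own schema; if it belongs to another owner you would query ALL_TAB_COLUMNS with an owner filter instead.

```sql
-- COUNT(column) skips NULLs, so the differences are the numbers of missing values.
-- Both should be 0 if the columns are constrained as described.
SELECT COUNT(*)                     AS total_rows,
       COUNT(*) - COUNT(event_id)   AS missing_ids,
       COUNT(*) - COUNT(event_date) AS missing_dates
  FROM event_log;

-- Cross-check against the data dictionary (assumes the table is in the current schema).
-- NULLABLE = 'N' means the column cannot hold NULLs.
SELECT column_name, data_type, nullable
  FROM user_tab_columns
 WHERE table_name = 'EVENT_LOG'
 ORDER BY column_id;
```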

The dates have values down to the second. Many systems truncate results to the day, but not this one. So whatever my final solution, I need to preserve the full date/time integrity.
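If you'd rather not eyeball the output, a check along these lines shows whether the time portion is really in use. This is only a sketch and assumes EVENT_DATE is a DATE (or TIMESTAMP) column rather than a string.

```sql
-- TRUNC() resets the time portion to midnight, so any row where the stored value
-- differs from its truncated value carries a real time of day.
SELECT COUNT(*) AS total_rows,
       COUNT(CASE WHEN event_date <> TRUNC(event_date) THEN 1 END) AS rows_with_time
  FROM event_log;
```

If rows_with_time comes back at or near total_rows, the times are meaningful and shouldn't be truncated away in the final query.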

Look for obvious skew. If there are lots of repeated values then sorting will lead to ties and we'll need to address them with some rule. We'll see exactly that situation later.
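A simple pair of aggregates is usually enough to spot skew. The queries below are a sketch against the sample EVENT_LOG table; the column aliases are my own.

```sql
-- How often does each event occur? Equal counts near the top mean ties to resolve.
SELECT event_id, COUNT(*) AS occurrences
  FROM event_log
 GROUP BY event_id
 ORDER BY occurrences DESC, event_id;

-- Do any events share exactly the same timestamp? These would tie when sorting by date.
SELECT event_date, COUNT(*) AS events_at_same_time
  FROM event_log
 GROUP BY event_date
HAVING COUNT(*) > 1
 ORDER BY event_date;
```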

Are there any unexpected or magic values? For example: columns called "no", "num", "number", etc. that have non-numeric values in them. What about the reverse: any string columns that hold numeric values? Are there any magic values, such as IDs of -1? If so, they may not have been mentioned in the requirements, but do they need special handling?
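Checking the extremes of each column is a cheap way to surface magic values. This sketch assumes EVENT_ID is a numeric column and treats zero or negative ids as suspicious, which is only a guess at what "magic" might look like in your data.

```sql
-- Look at the range of each column and count ids that look like placeholders.
SELECT MIN(event_id)   AS lowest_id,
       MAX(event_id)   AS highest_id,
       COUNT(CASE WHEN event_id <= 0 THEN 1 END) AS non_positive_ids,
       MIN(event_date) AS earliest_event,
       MAX(event_date) AS latest_event
  FROM event_log;
```

Dates far in the past or future (placeholder values such as a year of 1900 or 9999) deserve the same question to the customer as odd ids.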

SQL> select * from event_log;

  EVENT_ID EVENT_DATE
---------- -------------------
       228 2012-03-07 15:13:53
         9 2012-03-07 01:20:25
        76 2012-03-07 22:27:24
        15 2012-03-07 07:32:07
       106 2012-03-08 15:35:01
       104 2012-03-07 00:55:58
       105 2012-03-06 01:06:36
        85 2012-03-08 08:59:29
       284 2012-03-08 03:42:35
        39 2012-03-09 05:49:49
        64 2012-03-06 12:51:51
        33 2012-03-06 01:00:19
        37 2012-03-06 13:00:27
       ....
