
How to Unravel a Tricky Query

Introduction


If you browse through the Oracle zones, or any of the other database-related zones, you'll come across some complicated solutions, and sometimes you'll wonder how anyone came up with them.  The simple answer is: practice.  Long-time database developers have written thousands of queries and, like any other skill, the more you do it the easier it becomes.  More importantly, those thousands of queries were written to address a variety of different problems.  So, with experience, you'll have seen lots of different solutions, patterns of syntax, and techniques that work well for problems of a certain flavor.

That's precisely why I participate in Experts-Exchange.  I want exposure to more problems, from more people, across more industries.  The more problems I try to solve the better prepared I'll be for the challenges in my own job; and that brings us to the birth of this article.

Recently, I was asked to write a query to show top activity in an event log.  My customer wanted to see which 5 events had occurred the most often.  He also needed to see the 10 most recent events that weren't in the top 5.  Multiple filtering requirements but only one result set.  That's sort of tricky!

Usually I just write the solution and don't give much thought to how I came to it.  When people ask "How did you do that?" I can explain what my solution does and why it works, but I typically don't recall all of the minor choices I made along the way to the final answer.  This time I decided to pay attention to my own thought processes and take note of why each decision was made during discovery and construction.  The goal is, hopefully, a semi-reusable system for others to use when facing a tough problem.  Obviously the steps and trials of developing other queries will have their own quirks; in this article I'm hoping to show how I approached this problem and broke it into digestible pieces.  In addition, I use some analytic syntax that is often overlooked by developers.  Finally, I make extensive use of inline views (nested queries) as a tool for modular development, and this task definitely benefited from that approach.

Look at the data


If you work with a set of data all the time you might be able to skip this step, as you'll already have an idea of the table contents.  In my case, I hadn't touched this particular database in months and had never seen this particular table.  So let's take a look and see what's there.  The sample data I've provided in the attachment isn't an exact copy, but structurally it's similar to what I had to work with.  We can see it's a pretty simple structure: values and times.  My customer's real table had other columns, but they weren't relevant to the problem so I didn't include them here.

This is an intentionally simplified set but it still contains some useful information.

  • Everything is populated.  Checking the table definition, I can see that's enforced by constraints on the column definitions.  If that weren't the case, I'd have to go back to the customer to determine if/how missing dates or ids should be included in the results.
  • The dates have values down to the second.  Many systems truncate to the day, but not this one.  So whatever my final solution, I need to preserve the full date/time precision.
  • Look for obvious skew.  If there are lots of repeated values, then sorting will lead to ties and we'll need some rule to break them.  We'll see exactly that situation later.
  • Are there any unexpected or magic values?  For example: columns called "no", "num", "number", etc. that have non-numeric values in them.  What about the reverse: any string columns that hold only numeric values?  Are there magic values such as IDs of -1?  If so, maybe they weren't specified in the requirements, but do they need special handling?
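Checks like these are easy to script.  Here's a rough sketch against the sample table (the count threshold and the -1 check are just illustrations, not part of the customer's requirements):

```sql
-- Quick profile of the sample table (a sketch; adjust names/thresholds).
-- Any NULLs hiding in either column?
SELECT COUNT(*) - COUNT(event_id)   AS null_ids,
       COUNT(*) - COUNT(event_date) AS null_dates
  FROM event_log;

-- Overall date range: do the values really go down to the second?
SELECT MIN(event_date), MAX(event_date) FROM event_log;

-- Obvious skew, plus suspicious "magic" values such as negative ids.
SELECT   event_id, COUNT(*)
    FROM event_log
GROUP BY event_id
  HAVING COUNT(*) > 10 OR event_id < 0
ORDER BY 2 DESC;
```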


SQL> select * from event_log;

  EVENT_ID EVENT_DATE
---------- -------------------
       228 2012-03-07 15:13:53
         9 2012-03-07 01:20:25
        76 2012-03-07 22:27:24
        15 2012-03-07 07:32:07
       106 2012-03-08 15:35:01
       104 2012-03-07 00:55:58
       105 2012-03-06 01:06:36
        85 2012-03-08 08:59:29
       284 2012-03-08 03:42:35
        39 2012-03-09 05:49:49
        64 2012-03-06 12:51:51
        33 2012-03-06 01:00:19
        37 2012-03-06 13:00:27
        ....

Look at the requirements


There are three rules I need to follow in the final answer.  

Top 5 by count
Top 10 by time
Top 10 does not include any of the top 5

At this point I did a mental evaluation that all of this could probably be done with just one query in a single pass of the table.  How did I come to that conclusion?  That's sort of hard to say, but it basically boils down to the exposure I mentioned previously.  I've never written this exact query before, but I've written lots of counting queries and lots of sorting and ranking queries.  At this point I didn't know exactly how I would do them both together, but nothing jumped out at me suggesting it would need multiple passes over the same table.

So, already I've started tuning a query that I haven't even written yet!  But, there's no reason to trust me on that, and maybe I was just having a good day for insight.  Maybe on a different day it wouldn't have just clicked for me like that.  So, let's examine another approach.  In my own head I was able to skip this step but it's always a good idea to test our assumptions.

With any large request I always try to break it into pieces.  This query made that easy, as each of the first two requirements can be thought of as a query in itself and the third requirement as a where clause.  My goal in breaking up the request is to give myself pieces that I can put together.  For example, the table doesn't have a count built in for the event_ids, so I'll have to build that.  Similarly, there is no ordering until I create it, and there are no top 5/10 identifiers either, so I'll need to give myself those too.  These may seem obvious, but I do the same breakdown on all queries.  As intermediate steps I might create redundant sub-queries or extra layers; I can always simplify later if necessary, but I can't construct a solution without the pieces.  So let's dig into this one.

First attempt


Looking at the first requirement, getting the counts is pretty easy.

SQL> SELECT   event_id, COUNT(*)
  2      FROM event_log
  3  GROUP BY event_id
  4  ORDER BY 2 DESC;

  EVENT_ID   COUNT(*)
---------- ----------
         9         15
         3         15
         1         12
         8         12
         5         12
        30         12
        38         11
       106         11
        61         11
        54         10
        47         10
        12         10
        65         10
         7         10
        22          9
        34          9
       100          9
        29          9
        73          9
         6          9
         ...



Adding a ROWNUM stop key gives us the top 5.

SQL> SELECT event_id, cnt
  2    FROM (SELECT   event_id, COUNT(*) cnt
  3              FROM event_log
  4          GROUP BY event_id
  5          ORDER BY 2 DESC)
  6   WHERE ROWNUM <= 5;

  EVENT_ID        CNT
---------- ----------
         3         15
         9         15
         1         12
         5         12
        30         12



Uh oh!  This looks like it works, but 4 ids have a count of 12.  So, top 5 causes us to lose one of them; in this case it was event_id 8, but we have no rule established to enforce that.  It could have been any of them.  Regardless of the final implementation, this will always be an issue.  So, back to the customer for a refinement of the requirements: in the event of a tie, how do we decide which events qualify for the top 5 and which get bumped out?  In my case the answer came back: the events with the most recent occurrence win in a tie.

Ah ha!  Not only does that solve the issue of which records qualify for the top 5, it also makes a connection to the second requirement: in both cases we need to keep track of the most recent occurrence of each event_id.  Let's not jump ahead though; first we'll finish getting this requirement correct.  So, I add MAX(event_date) to find the most recent occurrence of each event_id.  I'll bump the ROWNUM cutoff up to 6 so I can see all of the event_ids I'm interested in and confirm correct operation.

SQL> SELECT event_id, cnt, maxdate
  2    FROM (SELECT   event_id, COUNT(*) cnt, MAX(event_date) maxdate
  3              FROM event_log
  4          GROUP BY event_id
  5          ORDER BY 2 DESC,  3 desc)
  6   WHERE ROWNUM <= 6;

  EVENT_ID        CNT MAXDATE
---------- ---------- -------------------
         3         15 2012-03-09 04:12:36
         9         15 2012-03-08 17:46:10
        30         12 2012-03-10 02:19:51
         1         12 2012-03-09 22:35:27
         5         12 2012-03-09 08:51:27
         8         12 2012-03-09 07:08:20



Conveniently enough, we see event_id 8 is the last one, so it would have been the one removed anyway.  Now we have a rule in the code to enforce this behavior as a repeatable result.  Setting the ROWNUM filter back to 5, we have the first requirement completed.

SQL> SELECT event_id, cnt, maxdate
  2    FROM (SELECT   event_id, COUNT(*) cnt, MAX(event_date) maxdate
  3              FROM event_log
  4          GROUP BY event_id
  5          ORDER BY 2 DESC,  3 desc)
  6   WHERE ROWNUM <= 5;

  EVENT_ID        CNT MAXDATE
---------- ---------- -------------------
         3         15 2012-03-09 04:12:36
         9         15 2012-03-08 17:46:10
        30         12 2012-03-10 02:19:51
         1         12 2012-03-09 22:35:27
         5         12 2012-03-09 08:51:27



For the second requirement (top 10 most recent results) that's simply ordering the results and applying a ROWNUM stop key to return only 10.

SQL> SELECT event_id, event_date
  2    FROM (SELECT   event_id, event_date
  3              FROM event_log
  4          ORDER BY event_date DESC)
  5   WHERE ROWNUM <= 10;

  EVENT_ID EVENT_DATE
---------- -------------------
        67 2012-03-10 08:20:37
       254 2012-03-10 06:43:31
        30 2012-03-10 02:19:51
         1 2012-03-09 22:35:27
       134 2012-03-09 19:05:10
        16 2012-03-09 16:17:44
        68 2012-03-09 15:33:54
       200 2012-03-09 13:35:05
       146 2012-03-09 12:23:27
       190 2012-03-09 12:09:28



But here we can see the third requirement creeping in.  IDs 1 and 30 are in the top 10 by time and in the top 5 by count.  So we need to exclude the top 5 before applying the top-10 cutoff.  That means the event ids from the first part must be applied as a filter to the second part.

SQL> SELECT event_id, event_date
  2    FROM (SELECT   event_id, event_date
  3              FROM event_log
  4             WHERE event_id NOT IN (SELECT event_id
  5                                      FROM (SELECT   event_id, COUNT(*) cnt, MAX(event_date) maxdate
  6                                                FROM event_log
  7                                            GROUP BY event_id
  8                                            ORDER BY 2 DESC, 3 DESC)
  9                                     WHERE ROWNUM <= 5)
 10          ORDER BY event_date DESC)
 11   WHERE ROWNUM <= 10;

  EVENT_ID EVENT_DATE
---------- -------------------
        67 2012-03-10 08:20:37
       254 2012-03-10 06:43:31
       134 2012-03-09 19:05:10
        16 2012-03-09 16:17:44
        68 2012-03-09 15:33:54
       200 2012-03-09 13:35:05
       146 2012-03-09 12:23:27
       190 2012-03-09 12:09:28
        79 2012-03-09 12:01:51
       168 2012-03-09 11:09:08

10 rows selected.



So, now we have all three requirements satisfied.  Simply union them together and we'll have the query done.  Except the first query ended up with 3 columns, and here we only have two because we're missing the counts.  It's not actually a requirement to show the counts for the most recent events, so I'll simply stub in a dummy column with NULLs and it'll be done.

SQL> SELECT event_id, cnt, maxdate
  2    FROM (SELECT   event_id, COUNT(*) cnt, MAX(event_date) maxdate
  3              FROM event_log
  4          GROUP BY event_id
  5          ORDER BY 2 DESC, 3 DESC)
  6   WHERE ROWNUM <= 5
  7   union all
  8   SELECT event_id, null, event_date
  9    FROM (SELECT   event_id, event_date
 10              FROM event_log
 11             WHERE event_id NOT IN (SELECT event_id
 12                                      FROM (SELECT   event_id, COUNT(*) cnt, MAX(event_date) maxdate
 13                                                FROM event_log
 14                                            GROUP BY event_id
 15                                            ORDER BY 2 DESC, 3 DESC)
 16                                     WHERE ROWNUM <= 5)
 17          ORDER BY event_date DESC)
 18   WHERE ROWNUM <= 10;

  EVENT_ID        CNT MAXDATE
---------- ---------- -------------------
         3         15 2012-03-09 04:12:36
         9         15 2012-03-08 17:46:10
        30         12 2012-03-10 02:19:51
         1         12 2012-03-09 22:35:27
         5         12 2012-03-09 08:51:27
        67            2012-03-10 08:20:37
       254            2012-03-10 06:43:31
       134            2012-03-09 19:05:10
        16            2012-03-09 16:17:44
        68            2012-03-09 15:33:54
       200            2012-03-09 13:35:05
       146            2012-03-09 12:23:27
       190            2012-03-09 12:09:28
        79            2012-03-09 12:01:51
       168            2012-03-09 11:09:08



And, as you might expect, upon seeing these results the customer informed me that, while it wasn't officially a requirement, showing blank counts isn't desirable.  So this path ended up being a bit of a dead end.  I'm glad I came up with a different approach for the real task.  But before abandoning this, I'll run it again with autotrace on to capture some performance statistics; if my other approach isn't very efficient we may want to come back to this.  I ran it several times until the statistics stabilized with these results.
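In SQL*Plus, capturing just the statistics looks something like this (the TRACEONLY option suppresses the result rows; it requires the PLUSTRACE role or equivalent privileges):

```sql
-- Show execution statistics only, without echoing the result set.
SET AUTOTRACE TRACEONLY STATISTICS

-- ... run the query a few times until the numbers stabilize ...

-- Turn it back off when finished.
SET AUTOTRACE OFF
```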

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       2002  consistent gets
          0  physical reads
          0  redo size
        721  bytes sent via SQL*Net to client
        407  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
        570  sorts (memory)
          0  sorts (disk)
         15  rows processed




Second attempt


Earlier I noted an "Ah ha!" moment when the first two requirements tied together and I needed times included with my counts for part 1.  From the end of the last attempt we now see we also need counts with the times for part 2.  When I need to do set-based operations like COUNT while still maintaining other row-level data, that leads me toward analytics instead of aggregates.  I still need a count, so the COUNT analytic will correspond to the COUNT aggregate.  To do the date sorting I'll use ROW_NUMBER to number the dates within each event_id, with row number 1 being the most recent date for each.  I could use RANK or DENSE_RANK too, but they allow ties in their numbering, which is extra functionality I don't really need here.

SQL> SELECT event_id,
  2         event_date,
  3         ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY event_date DESC) datern,
  4         COUNT(*) OVER (PARTITION BY event_id) cnt
  5    FROM event_log;

  EVENT_ID EVENT_DATE              DATERN        CNT
---------- ------------------- ---------- ----------
         0 2012-03-08 06:21:59          1          5
         0 2012-03-07 17:47:54          2          5
         0 2012-03-07 05:56:34          3          5
         0 2012-03-07 02:42:24          4          5
         0 2012-03-06 18:33:37          5          5
         1 2012-03-09 22:35:27          1         12
         1 2012-03-08 22:20:09          2         12
         1 2012-03-08 03:28:43          3         12
         1 2012-03-08 02:23:03          4         12
         1 2012-03-08 00:21:47          5         12
         1 2012-03-07 11:16:54          6         12
         1 2012-03-07 07:39:22          7         12
         1 2012-03-07 02:17:49          8         12
         1 2012-03-07 01:58:17          9         12
         1 2012-03-06 19:53:21         10         12
         1 2012-03-06 17:11:25         11         12
         1 2012-03-05 17:36:21         12         12
         2 2012-03-08 07:56:31          1          4
         2 2012-03-07 22:11:33          2          4
         2 2012-03-07 04:12:57          3          4
         2 2012-03-06 09:25:23          4          4
         3 2012-03-09 04:12:36          1         15
         3 2012-03-08 22:18:34          2         15
         3 2012-03-08 08:23:16          3         15
         3 2012-03-07 16:07:27          4         15
         ...



That gives me some of the pieces I need.  I have counts for each event_id, and I have event_dates numbered with 1 being the most recent for each event_id.  I also need to sort the counts so I can find the 5 largest, remembering to include the dates as the tie breaker.  Restricting the DATERN column to 1 means I'll keep only one row for each event_id, and that row will be the one with the most recent event_date.  Adding another ROW_NUMBER call lets me order the counts so I can later find the top 5 among them.

SQL> SELECT event_id,
  2         event_date,
  3         datern,
  4         cnt,
  5         ROW_NUMBER() OVER (ORDER BY cnt DESC, event_date DESC) cntrn
  6    FROM (SELECT event_id,
  7                 event_date,
  8                 ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY event_date DESC) datern,
  9                 COUNT(*) OVER (PARTITION BY event_id) cnt
 10            FROM event_log)
 11   WHERE datern = 1;

  EVENT_ID EVENT_DATE              DATERN        CNT      CNTRN
---------- ------------------- ---------- ---------- ----------
         3 2012-03-09 04:12:36          1         15          1
         9 2012-03-08 17:46:10          1         15          2
        30 2012-03-10 02:19:51          1         12          3
         1 2012-03-09 22:35:27          1         12          4
         5 2012-03-09 08:51:27          1         12          5
         8 2012-03-09 07:08:20          1         12          6
       106 2012-03-08 15:35:01          1         11          7
        38 2012-03-08 15:22:04          1         11          8
        61 2012-03-08 09:57:17          1         11          9
        54 2012-03-09 00:18:46          1         10         10
        65 2012-03-08 17:19:22          1         10         11
        12 2012-03-08 07:54:54          1         10         12
        47 2012-03-08 06:22:32          1         10         13
         7 2012-03-08 05:12:03          1         10         14
        26 2012-03-09 06:42:13          1          9         15
       100 2012-03-08 23:24:27          1          9         16
        34 2012-03-08 19:07:45          1          9         17
        33 2012-03-08 14:47:33          1          9         18
         6 2012-03-08 13:20:39          1          9         19
        73 2012-03-08 08:57:17          1          9         20
        22 2012-03-08 05:05:41          1          9         21
        29 2012-03-08 04:27:15          1          9         22
        77 2012-03-07 18:43:47          1          9         23
        25 2012-03-09 10:45:12          1          8         24
        21 2012-03-09 10:21:11          1          8         25
        39 2012-03-09 05:49:49          1          8         26
        83 2012-03-09 00:55:41          1          8         27
        40 2012-03-09 00:44:21          1          8         28
       112 2012-03-08 19:51:43          1          8         29
        10 2012-03-08 16:43:33          1          8         30
        56 2012-03-08 10:02:15          1          8         31
        23 2012-03-08 06:51:25          1          8         32
        24 2012-03-08 06:16:38          1          8         33
        72 2012-03-07 18:31:40          1          8         34
        ...



Now I have everything I need.  With DATERN already filtered, I have the most recent record for each event_id.  With CNTRN I know which rows have the biggest counts.  The tricky part now is splitting them into two groups, one to satisfy each of the first two requirements.  I'll add a new column that checks the count ranking and creates a group for the top 5 and another group for everything else.

SQL> SELECT   event_id,
  2           event_date,
  3           datern,
  4           cnt,
  5           cntrn,
  6           CASE WHEN cntrn <= 5 THEN cntrn END countgroup
  7      FROM (SELECT event_id,
  8                   event_date,
  9                   datern,
 10                   cnt,
 11                   ROW_NUMBER() OVER (ORDER BY cnt DESC, event_date DESC) cntrn
 12              FROM (SELECT event_id,
 13                           event_date,
 14                           ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY event_date DESC) datern,
 15                           COUNT(*) OVER (PARTITION BY event_id) cnt
 16                      FROM event_log)
 17             WHERE datern = 1)
 18  ORDER BY countgroup nulls last;

  EVENT_ID EVENT_DATE              DATERN        CNT      CNTRN COUNTGROUP
---------- ------------------- ---------- ---------- ---------- ----------
         3 2012-03-09 04:12:36          1         15          1          1
         9 2012-03-08 17:46:10          1         15          2          2
        30 2012-03-10 02:19:51          1         12          3          3
         1 2012-03-09 22:35:27          1         12          4          4
         5 2012-03-09 08:51:27          1         12          5          5
         8 2012-03-09 07:08:20          1         12          6
       106 2012-03-08 15:35:01          1         11          7
        38 2012-03-08 15:22:04          1         11          8
        61 2012-03-08 09:57:17          1         11          9
        54 2012-03-09 00:18:46          1         10         10
        65 2012-03-08 17:19:22          1         10         11
        12 2012-03-08 07:54:54          1         10         12
        47 2012-03-08 06:22:32          1         10         13
         7 2012-03-08 05:12:03          1         10         14
        26 2012-03-09 06:42:13          1          9         15
       100 2012-03-08 23:24:27          1          9         16
        34 2012-03-08 19:07:45          1          9         17
        33 2012-03-08 14:47:33          1          9         18
         6 2012-03-08 13:20:39          1          9         19
        73 2012-03-08 08:57:17          1          9         20
        22 2012-03-08 05:05:41          1          9         21
        29 2012-03-08 04:27:15          1          9         22
        77 2012-03-07 18:43:47          1          9         23
        25 2012-03-09 10:45:12          1          8         24
        21 2012-03-09 10:21:11          1          8         25
        39 2012-03-09 05:49:49          1          8         26
        83 2012-03-09 00:55:41          1          8         27
        40 2012-03-09 00:44:21          1          8         28
       112 2012-03-08 19:51:43          1          8         29
        10 2012-03-08 16:43:33          1          8         30
        56 2012-03-08 10:02:15          1          8         31
        23 2012-03-08 06:51:25          1          8         32
        24 2012-03-08 06:16:38          1          8         33
        72 2012-03-07 18:31:40          1          8         34
        64 2012-03-07 12:52:13          1          8         35
        ...



Sorting by the new COUNTGROUP column has given us the first requirement and, as a bonus, the third requirement as well.  Since only the top 5 are in that group, if we simply order the NULL group by date, the second requirement will be fulfilled, and the filtering of the top 5 is implicit in the null vs. not-null grouping.

SQL> SELECT   event_id,
  2           event_date,
  3           datern,
  4           cnt,
  5           cntrn,
  6           countgroup
  7      FROM (SELECT event_id,
  8                   event_date,
  9                   datern,
 10                   cnt,
 11                   cntrn,
 12                   CASE WHEN cntrn <= 5 THEN cntrn END countgroup
 13              FROM (SELECT event_id,
 14                           event_date,
 15                           datern,
 16                           cnt,
 17                           ROW_NUMBER() OVER (ORDER BY cnt DESC, event_date DESC) cntrn
 18                      FROM (SELECT event_id,
 19                                   event_date,
 20                                   ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY event_date DESC)
 21                                       datern,
 22                                   COUNT(*) OVER (PARTITION BY event_id) cnt
 23                              FROM event_log)
 24                     WHERE datern = 1))
 25  ORDER BY countgroup NULLS LAST, event_date DESC;

  EVENT_ID EVENT_DATE              DATERN        CNT      CNTRN COUNTGROUP
---------- ------------------- ---------- ---------- ---------- ----------
         3 2012-03-09 04:12:36          1         15          1          1
         9 2012-03-08 17:46:10          1         15          2          2
        30 2012-03-10 02:19:51          1         12          3          3
         1 2012-03-09 22:35:27          1         12          4          4
         5 2012-03-09 08:51:27          1         12          5          5
        67 2012-03-10 08:20:37          1          4         94
       254 2012-03-10 06:43:31          1          1        188
       134 2012-03-09 19:05:10          1          1        189
        16 2012-03-09 16:17:44          1          7         36
        68 2012-03-09 15:33:54          1          6         57
       200 2012-03-09 13:35:05          1          1        190
       146 2012-03-09 12:23:27          1          6         58
       190 2012-03-09 12:09:28          1          1        191
        79 2012-03-09 12:01:51          1          7         37
       168 2012-03-09 11:09:08          1          2        146
        25 2012-03-09 10:45:12          1          8         24
        87 2012-03-09 10:30:38          1          4         95
        21 2012-03-09 10:21:11          1          8         25
        41 2012-03-09 07:47:26          1          5         76
         8 2012-03-09 07:08:20          1         12          6
        93 2012-03-09 07:04:00          1          7         38
       172 2012-03-09 06:58:36          1          2        147
        52 2012-03-09 06:45:10          1          2        148
        26 2012-03-09 06:42:13          1          9         15
        15 2012-03-09 05:57:06          1          6         59
        39 2012-03-09 05:49:49          1          8         26
        27 2012-03-09 05:12:15          1          6         60
        98 2012-03-09 04:44:38          1          6         61
        82 2012-03-09 04:41:59          1          7         39
        ...


And finally, since the top 5 by count are already in place, we simply need to cut off the results after the next 10.  So we remove our calculation columns and apply a ROWNUM stop key for 15 rows, and that gives us the final solution for the customer.

SQL> SELECT event_id, event_date, cnt
  2    FROM (SELECT   event_id,
  3                   event_date,
  4                   datern,
  5                   cnt,
  6                   cntrn,
  7                   countgroup
  8              FROM (SELECT event_id,
  9                           event_date,
 10                           datern,
 11                           cnt,
 12                           cntrn,
 13                           CASE WHEN cntrn <= 5 THEN cntrn END countgroup
 14                      FROM (SELECT event_id,
 15                                   event_date,
 16                                   datern,
 17                                   cnt,
 18                                   ROW_NUMBER() OVER (ORDER BY cnt DESC, event_date DESC) cntrn
 19                              FROM (SELECT event_id,
 20                                           event_date,
 21                                           ROW_NUMBER()
 22                                               OVER (PARTITION BY event_id ORDER BY event_date DESC)
 23                                               datern,
 24                                           COUNT(*) OVER (PARTITION BY event_id) cnt
 25                                      FROM event_log)
 26                             WHERE datern = 1))
 27          ORDER BY countgroup NULLS LAST, event_date DESC)
 28   WHERE ROWNUM <= 15;

  EVENT_ID EVENT_DATE                 CNT
---------- ------------------- ----------
         3 2012-03-09 04:12:36         15
         9 2012-03-08 17:46:10         15
        30 2012-03-10 02:19:51         12
         1 2012-03-09 22:35:27         12
         5 2012-03-09 08:51:27         12
        67 2012-03-10 08:20:37          4
       254 2012-03-10 06:43:31          1
       134 2012-03-09 19:05:10          1
        16 2012-03-09 16:17:44          7
        68 2012-03-09 15:33:54          6
       200 2012-03-09 13:35:05          1
       146 2012-03-09 12:23:27          6
       190 2012-03-09 12:09:28          1
        79 2012-03-09 12:01:51          7
       168 2012-03-09 11:09:08          2

15 rows selected.



Before declaring victory, though, turn on autotrace and run it a few more times to check its efficiency.  We may need to backtrack and revive the other method.

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
        750  bytes sent via SQL*Net to client
        407  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
         15  rows processed



This way is a lot better!  7 vs. 2002 consistent gets (I/O operations) and 3 vs. 570 in-memory sorts.  The query text is 29 bytes larger, but that's insignificant.

Other than the table/column names and some minor formatting this is the same result I submitted to my customer.

Conclusion


As mentioned above, it's not going to be possible to simply copy these steps verbatim for every query you need to write.  Hopefully, though, you can take this as an outline of how to approach a query, along with some syntax tips for getting the results you need.  And last, even though in my real task I started with the second attempt described above, it was still worthwhile exploring other options to ensure the best one is being provided to our customers.

There is no secret to learning good SQL, or if there is, I don't know it.  I simply write and test lots of queries, noting what works and what doesn't; and I remember to test for efficiency, noting what performs well and what doesn't.  I'm sure I could have eventually gotten the first approach to do what I needed, but it was already doing hundreds of times the I/O and memory operations of the second version, so there was no incentive to add even more processing to that option.  You might notice I did not measure time in this example.  There are only 1000 rows in the test data, so both options run quickly.  I measure the gets and sorts because those better reflect the ability to scale as the data volume increases.  If my real data is many gigabytes of rows, I definitely don't want to use a query that does roughly 300 times the I/O scanning all of that data.


As a final note:  The syntax used above is Oracle specific, but most of it should be portable to other platforms.  The ROWNUM filter could be replaced with a TOP clause in SQL Server or a FETCH FIRST clause in DB2.  The analytics should also be supported in recent versions of both.  You will not be able to use that analytic syntax in Access.  MySQL doesn't yet support analytics, but you might be able to simulate the functionality with embedded procedural logic.
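As a sketch of those top-N alternatives (here `ranked_events` is a hypothetical view standing in for the article's inline views, with an `event_name` and a count column `cnt`):

```sql
-- Oracle: restrict an ordered subquery to its first N rows with ROWNUM.
-- ROWNUM must be applied *outside* the ORDER BY, hence the inline view.
SELECT *
  FROM (SELECT event_name, cnt
          FROM ranked_events
         ORDER BY cnt DESC)
 WHERE ROWNUM <= 5;

-- SQL Server: the equivalent with a TOP clause.
SELECT TOP 5 event_name, cnt
  FROM ranked_events
 ORDER BY cnt DESC;

-- DB2 (and Oracle 12c onward): the ANSI-style FETCH FIRST clause.
SELECT event_name, cnt
  FROM ranked_events
 ORDER BY cnt DESC
 FETCH FIRST 5 ROWS ONLY;
```

All three produce the same five rows; only the limiting syntax differs between platforms.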

I hope you found this helpful.

Thanks for reading!
Questions and comments are welcome, as always.
sample-data.sql
10 Comments
 
LVL 38

Expert Comment

by:younghv
sdstuber,
Yet another fine EE Article. Nicely done.

"Yes" vote above.
0
 
LVL 74

Author Comment

by:sdstuber
Thanks for the comment and the vote.  I'm happy to hear you enjoyed it.
0
 
 

Administrative Comment

by:mbizup
sdstuber,

We've awarded this Editor's Choice - it is a great article.

Keep 'em coming!


mbizup
EE Page Editor
0

 
LVL 74

Author Comment

by:sdstuber
Thank you!
0
 

Expert Comment

by:bkcpzh
Very annoying to read articles with an "I" in almost every sentence. Try to write your article in a more scientific way. What you're saying is very useful, though.
0
 
LVL 74

Author Comment

by:sdstuber
Thank you for the comment, but when writing about a real example that "I" encountered, it seems appropriate to relate it from my experience.

Also, I prefer to write in a conversational style when discussing a topic without rigor.  Contrast that with some of the other topics I've written about: different styles depending on the nature of the discussion.

Thanks again though. I'll take your position into consideration when writing my next article.
0
 
LVL 61

Expert Comment

by:mbizup
<< Try to write your article in a more scientific way.  >>

From a different point of view, the style here actually appealed to me - and made the article an enjoyable read.

I agree that style should be dependent on the topic, on the target audience and on the mood/tone the Author wants to set.  The style for a  formal research paper should be different than that for an article aimed at people who are kicked back, feet up on the desk, after work with a beer and burger, reading the article more for the fun of it than out of necessity (picturing some of the Experts here answering questions in their spare time :-)  ).

Those are two opposite extremes of course, but just examples of why you might want to use conversational vs formal styles.
0
 
LVL 49

Expert Comment

by:PortletPaul
I liked it (including the use of I)! However I also like brevity :-) and a synopsis at the top of a lengthy article can be really handy.

Be Familiar with the data
look for exceptions (or "magic" values)
look at the constraints (eg. can there be nulls?)

Think Hard about the Requirements
Consider alternatives
Confirm requirements from early results - this can lead to 'unstated' requirements

Perform Trials
consider the results: are they meeting the requirements?
discuss results (can lead to 'unstated requirements')
iterate (more trials)

Consider Performance!
don't just guess, get facts
use those facts to help decide: Am I finished?
query duration isn't necessarily the best measure
'gets and sorts' statistics will better reflect scalability

I especially liked this, now I know why I joined in
"why I participate in Experts-Exchange.  I want exposure to more problems, from more people, across more industries.  The more problems I try to solve the better prepared I'll be for the challenges in my own job;"

Thank you sdstuber
0
 
LVL 74

Author Comment

by:sdstuber
Thanks for the vote and comment.  I would do a "bullet list" like you have if I were to talk about this topic in a slide presentation.  In an article, particularly conversational, I prefer a simple introduction: "Here's a high level summary of what I'm going to talk about"  and then get into it.

If you're not interested in my intro, an outline isn't going to sell it.
If you are interested, then you don't need it. Including one would be redundant since the actual text is pretty short.

Thanks again for reading!
0
 

Expert Comment

by:jknj72
Great article. I find myself in these situations on a regular basis and need to think more about the big picture when trying to figure out an approach to a solution.
This site and the experts that take the time to help are a true blessing, and I am grateful for that. I look forward to the day I can do the same.

Thanks sdstuber!
0
