Solved

Median Function in QMF SQL Query for DB2

Posted on 2010-08-29
9
2,529 Views
Last Modified: 2012-05-10
According to this link, DB2 does not offer the "Median" function that is available in MS-Excel for instance.

Does anyone know of an alternative method that I could run in my query to produce the "Median" result?  The SQL below better demonstrates what I'm looking to achieve if "Median" was actually a function that was available to me.

Select FIELD1, Median([FIELD2])
From TABLE1
Group by FIELD1
0
Comment
Question by:KP_SoCal
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
  • 3
  • 3
9 Comments
 
LVL 45

Expert Comment

by:Kent Olsen
ID: 33554539
Hi KP,

It will vary a bit depending on which flavor of DB2 that you use, but this is one way:

  with a (rn, field1)
  as
  (
    SELECT row_number () over (), FIELD1 FROM mytable
  )
  SELECT field1 FROM a WHERE rn = (SELECT cast (max(rn) / 2 as int) FROM a);


Good Luck,
Kent
0
 
LVL 35

Accepted Solution

by:
Gary Patterson earned 250 total points
ID: 33554690
Are you running DB2 on z/os (IBM mainframe), or i/os (AS/400)?  

What OS version?

Your documentation link is to the z/os version of DB2, not the i/os (OS/400, i5/os, whatever you want to call it) version, but your question is tagged for AS/400.  DB2 functionality varies from version to version, and platform to platform, so please be specific.

A median() function isn't available on either platform, as far as I know, so you'll have to come up with a way to derive it yourself.  

Here's the basic process for calculating the median of a set of non-null values.  This solution suits itself well to the use of arrays in a high-level language more so than to SQL, IMO, so I'd probably write a user-defined function in RPG (my language of choice for this sort of thing), COBOL, Java, or SQL Procedure Language.  Of course, there are lots of ways to derive a median, but if you have to do it frequently, you might want to consider just creating a user-defined function in the interest of ease-of use.  SQL solutions to this problem are cumbersome, and less-then-intuitive, if you ask me.

1) Sort the list of non-null values (ascending or descending order).
2) Count the number of values in the list.
3) If the number of values in the list is odd, select the middle value.  (n= number of values in the list.  Median value is the value found at list position (n+1)/2.
4) If the number of values in the list is even, calculate the average of the two middle values found (n=number of value in the list.  Median = ((Value at position n/2) + (Value at position n/2+1)) / 2.

Examples:

List A: 1,5,7,19,22 (odd number of values)  
n=5
5+1/2 = 3
Median = 3rd value on list = 7
List B: 1,5,6,19,22,23 (even number of values)
n=6
n/2 = 3rd value on list = 6
n/2+1 = 4th value on list = 19
Median = (6+19)/2 = 12.5
- Gary Patterson
0
 
LVL 35

Expert Comment

by:Gary Patterson
ID: 33554724
Kent's solution, unfortunately (and unusually!), has a couple of problems :

1) It derives a single median value from an entire data set, not a series of medians from each partition of a partitioned data set, as requested (GROUP BY Field1).
2) It only works for lists with an odd number of values, and I believe it actually selects the wrong value (by one position) even in that case (though that is easy to fix by making it max(rn)+1 / 2).

http://mathworld.wolfram.com/StatisticalMedian.html

- Gary Patterson
0
DevOps Toolchain Recommendations

Read this Gartner Research Note and discover how your IT organization can automate and optimize DevOps processes using a toolchain architecture.

 
LVL 45

Expert Comment

by:Kent Olsen
ID: 33554760
Hi Gary,

Good catch on all of that.  That's what I get for trying to think after a day of hard labor.  :)  


If he's got the OLAP extensions, this gets a lot easier.  :)



Kent

  with a (rn, field2)
  as
  (
    SELECT row_number () over (partition by FIELD1), FIELD2 FROM mytable
  )
  SELECT field2 FROM a WHERE rn = (SELECT cast ((max(rn) + 1) / 2 as int) FROM a);

Open in new window

0
 

Author Comment

by:KP_SoCal
ID: 33554931
Guys, thanks for the quick responses.  I'm not sure which DB2 I'm running.  I'm querying the server from my PC via QMF.  My iSeries version is V5R4.  I'm not sure if this information helps.

I was able to run Kent's SQL (listed below), but I didn't get the expected results.  When grouping FIELD1, I have a total of 5 records.  The SQL only returned 4 records.  

I attached an Excel spreadsheet that better illustrates the results I'm looking to achieve.  In my spreadsheet, you'll see that I actually need to group the data on FIELD0 and FIELD1. Hope this makes sense.  I really appreciate the input.  Thanks! ;-)


with a (rn, field2)
  as
  (
    SELECT row_number () over (partition by FIELD1), FIELD2 FROM mytable
  )
  SELECT field2 FROM a WHERE rn = (SELECT cast ((max(rn) + 1) / 2 as int) FROM a);

Open in new window

Results.xls
ScreenPrint.bmp
0
 
LVL 45

Assisted Solution

by:Kent Olsen
Kent Olsen earned 250 total points
ID: 33555273
Hi KP,

The spreadsheet is averaging the two middle values if there are an even number for that group.  That seems like an odd way to compute a median.

That can be added to the query below, but I'm most familiar with a median being an actual value in the list.

The query below returns one more field, and joins the filter back to the CTE.


Kent

with a (rn, field1, field2)
as
(
  SELECT row_number () over (partition by FIELD1), field1, FIELD2 FROM mytable
)
SELECT field1, field2 
FROM a 
WHERE rn = (SELECT cast ((max(rn) + 1) / 2 as int) FROM a aa WHERE a.field1 = aa.field1);

Open in new window

0
 

Author Closing Comment

by:KP_SoCal
ID: 33555754
Thanks guys for all your help on this!  I'll test this out tomorrow.  If I run into any more snags, I'll create a separate post.  Thanks again.

KP
0
 

Author Comment

by:KP_SoCal
ID: 33555762
As an FYI, here's a really great article on the subject as well, though it's primarily related to Access.

http://www.experts-exchange.com/Microsoft/Development/MS_Access/A_2529.html
0
 
LVL 35

Expert Comment

by:Gary Patterson
ID: 33558261
That's closer, but is only correct for data sets with an odd number of members.  

For data sets with an even number of members, you need to calculate the average of the values found at max(rn)/2 and max(rn)/2+1.

- Gary Patterson
0

Featured Post

Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
SQL - format decimal in a string 5 51
table joins in qry 17 88
IBM i Break Messages 1 27
SQL query 7 18
Recursive SQL in UDB/LUW (you can use 'recursive' and 'SQL' in the same sentence) A growing number of database queries lend themselves to recursive solutions.  It's not always easy to spot when recursion is called for, especially for people una…
'Between' is such a common word we rarely think about it but in SQL it has a very specific definition we should be aware of. While most database vendors will have their own unique phrases to describe it (see references at end) the concept in common …
In a recent question (https://www.experts-exchange.com/questions/29004105/Run-AutoHotkey-script-directly-from-Notepad.html) here at Experts Exchange, a member asked how to run an AutoHotkey script (.AHK) directly from Notepad++ (aka NPP). This video…

733 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question