Link to home
Start Free TrialLog in
Avatar of larrydare
larrydare

asked on

Fill in NULL values in Outer Join

Problem Statement:
I am performing an outer join between 2 tables.    On the rows where are no matching values,  NULLs are in the column.   All this is expected.  

Now what I would like to happen is that instead of a NULL,  the previous value in the column is used.

Example
Table 1  - CHANGEDATA
Input_month |  Team_Name | ChangeNumber
----------------------------------------
1/1/2008            CC          12
2/1/2008            CC         3
3/1/2008            CC         0
4/1/2008            CC         0
5/1/2008            CC         4
6/1/2008            CC         3
7/1/2008            CC         0
8/1/2008            CC         2
9/1/2008            CC         4
10/1/2008           CC        10
11/1/2008           CC         0
12/1/2008           CC         0

Table 2 - BASEDATA
Input_month | Team_Name | Baseline
----------------------------------------
1/1/2008            CC       100
4/1/2008            CC       111
6/1/2008            CC       120
10/1/2008           CC       125

Doing an outer join yields the following
Input_month |  Team_Name | ChangeNumber | Baseline
--------------------------------------------------
1/1/2008            CC          12            100
2/1/2008            CC         3
3/1/2008            CC         0
4/1/2008            CC         0            111
5/1/2008            CC         4
6/1/2008            CC         3            120
7/1/2008            CC         0
8/1/2008            CC         2
9/1/2008            CC         4
10/1/2008           CC        10            125
11/1/2008           CC         0
12/1/2008           CC         0


Now here is what I would like:
Input_month |  Team_Name | ChangeNumber | Baseline
--------------------------------------------------
1/1/2008            CC          12            100
2/1/2008            CC         3            100
3/1/2008            CC         0            100
4/1/2008            CC         0            111
5/1/2008            CC         4            111
6/1/2008            CC         3            120
7/1/2008            CC         0            120
8/1/2008            CC         2            120
9/1/2008            CC         4            120
10/1/2008           CC        10            125
11/1/2008           CC         0            125
12/1/2008           CC         0            125

Where if Baseline is NULL, I would like the previous non null value.

NON WORKING SOLUTION.
My original solution was to use a User Defined Function to return the Baseline number
So the query would look like this
select a.input_month, a.Team_name, a.ChangeNumber,
     NVL(b.baseline, LastValue(a.input_month, a.Team_Name)) "Baseline"
from CHANGEDATA a, BASEDATA b
where a.input_month = b.input_month (+)
and a.team_name = b.team_name (+)

This query only yielded one row,  the first one.  
The problem is that the User Defined Function, LastValue  querys the table.  If LastValue just returns a hardcoded value,  I get all the rows back,  but the Baseline value does not show the last value.

The following is the LastValue Function:
create or replace function lastvalue(nmonth date,nteam varchar2) return number
as
newbaseline number;
begin
select baseline into newbaseline
from
(select baseline, max(input_month)
from
BASEDATA
where input_month <= nmonth
and TeamName = nteam
group by baseline);
return newbaseline;
end lastvalue;
/

If you run this function from a single row select,   it does return the proper value,  but when you expect multiple rows,   I only get one.


Question:
Is there a different SQL syntax that will give me what I want?
OR
What am I doing wrong in LastValue definition that is causing it to return only one row?


Example Data Creation Statement:
CREATE TABLE CHANGEDATA
(INPUT_MONTH DATE,
 TEAM_NAME VARCHAR2(10),
 CHANGENUMBER NUMBER)
/
INSERT INTO CHANGEDATE VALUES (TO_DATE('JAN-01-2008','MON-DD-YYYY'),'CC',12);
INSERT INTO CHANGEDATE VALUES (TO_DATE('FEB-01-2008','MON-DD-YYYY'),'CC',3);
INSERT INTO CHANGEDATE VALUES (TO_DATE('MAR-01-2008','MON-DD-YYYY'),'CC',0);
INSERT INTO CHANGEDATE VALUES (TO_DATE('APR-01-2008','MON-DD-YYYY'),'CC',0);
INSERT INTO CHANGEDATE VALUES (TO_DATE('MAY-01-2008','MON-DD-YYYY'),'CC',4);
INSERT INTO CHANGEDATE VALUES (TO_DATE('JUN-01-2008','MON-DD-YYYY'),'CC',3);
INSERT INTO CHANGEDATE VALUES (TO_DATE('JUL-01-2008','MON-DD-YYYY'),'CC',0);
INSERT INTO CHANGEDATE VALUES (TO_DATE('AUG-01-2008','MON-DD-YYYY'),'CC',2);
INSERT INTO CHANGEDATE VALUES (TO_DATE('SEP-01-2008','MON-DD-YYYY'),'CC',4);
INSERT INTO CHANGEDATE VALUES (TO_DATE('OCT-01-2008','MON-DD-YYYY'),'CC',10);
INSERT INTO CHANGEDATE VALUES (TO_DATE('NOV-01-2008','MON-DD-YYYY'),'CC',0);
INSERT INTO CHANGEDATE VALUES (TO_DATE('DEC-01-2008','MON-DD-YYYY'),'CC',0);

CREATE TABLE BASEDATA
(INPUT_MONTH DATE,
TEAM_NAME VARCHAR2(10),
BASELINE NUMBER)
/
INSERT INTO BASEDATA VALUES (TO_DATE('JAN-01-2008','MON-DD-YYYY'),'CC',100);
INSERT INTO BASEDATA VALUES (TO_DATE('APR-01-2008','MON-DD-YYYY'),'CC',111);
INSERT INTO BASEDATA VALUES (TO_DATE('JUN-01-2008','MON-DD-YYYY'),'CC',120);
INSERT INTO BASEDATA VALUES (TO_DATE('OCT-01-2008','MON-DD-YYYY'),'CC',125;

NORMAL OUTER JOIN
select a.input_month, a.Team_name, a.ChangeNumber,b.baseline
from CHANGEDATA a, BASEDATA b
where a.input_month = b.input_month (+)
and a.team_name = b.team_name (+)

Avatar of Sean Stuber
Sean Stuber

Is it a true statement that the populated baselines will always be equal to or greater than the previous values?
last_value is a built in analytic function to Oracle.

And I would write it like this...

SELECT   a.input_month, a.team_name, a.changenumber, nvl(b.baseline,
         last_value(baseline ignore nulls) over(order by a.input_month asc rows unbounded preceding))
    FROM changedata a, basedata b
   WHERE a.input_month = b.input_month(+) AND a.team_name = b.team_name(+)
ORDER BY 1

however, the "ignore nulls" clause isn't available in 9i.

So, you can substitute Max instead, but only if your data will be increasing over time


SELECT   a.input_month, a.team_name, a.changenumber, nvl(b.baseline,
         max(baseline) over(order by a.input_month  rows unbounded preceding))
    FROM changedata a, basedata b
   WHERE a.input_month = b.input_month(+) AND a.team_name = b.team_name(+)
ORDER BY 1
ASKER CERTIFIED SOLUTION
Avatar of Sean Stuber
Sean Stuber

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
for 9i, my preference is the last one, it doesn't require the assumption of increasing baselines over time.

after 9i,  I like the last_value version with the "ignore nulls" option
Avatar of larrydare

ASKER

Thanks so much...