larrydare
asked on
Fill in NULL values in Outer Join
Problem Statement:
I am performing an outer join between 2 tables. On the rows where are no matching values, NULLs are in the column. All this is expected.
Now what I would like to happen is that instead of a NULL, the previous value in the column is used.
Example
Table 1 - CHANGEDATA
Input_month | Team_Name | ChangeNumber
-------------------------- ---------- ----
1/1/2008 CC 12
2/1/2008 CC 3
3/1/2008 CC 0
4/1/2008 CC 0
5/1/2008 CC 4
6/1/2008 CC 3
7/1/2008 CC 0
8/1/2008 CC 2
9/1/2008 CC 4
10/1/2008 CC 10
11/1/2008 CC 0
12/1/2008 CC 0
Table 2 - BASEDATA
Input_month | Team_Name | Baseline
-------------------------- ---------- ----
1/1/2008 CC 100
4/1/2008 CC 111
6/1/2008 CC 120
10/1/2008 CC 125
Doing an outer join yields the following
Input_month | Team_Name | ChangeNumber | Baseline
-------------------------- ---------- ---------- ----
1/1/2008 CC 12 100
2/1/2008 CC 3
3/1/2008 CC 0
4/1/2008 CC 0 111
5/1/2008 CC 4
6/1/2008 CC 3 120
7/1/2008 CC 0
8/1/2008 CC 2
9/1/2008 CC 4
10/1/2008 CC 10 125
11/1/2008 CC 0
12/1/2008 CC 0
Now here is what I would like:
Input_month | Team_Name | ChangeNumber | Baseline
-------------------------- ---------- ---------- ----
1/1/2008 CC 12 100
2/1/2008 CC 3 100
3/1/2008 CC 0 100
4/1/2008 CC 0 111
5/1/2008 CC 4 111
6/1/2008 CC 3 120
7/1/2008 CC 0 120
8/1/2008 CC 2 120
9/1/2008 CC 4 120
10/1/2008 CC 10 125
11/1/2008 CC 0 125
12/1/2008 CC 0 125
Where if Baseline is NULL, I would like the previous non null value.
NON WORKING SOLUTION.
My original solution was to use a User Defined Function to return the Baseline number
So the query would look like this
select a.input_month, a.Team_name, a.ChangeNumber,
NVL(b.baseline, LastValue(a.input_month, a.Team_Name)) "Baseline"
from CHANGEDATA a, BASEDATA b
where a.input_month = b.input_month (+)
and a.team_name = b.team_name (+)
This query only yielded one row, the first one.
The problem is that the User Defined Function, LastValue querys the table. If LastValue just returns a hardcoded value, I get all the rows back, but the Baseline value does not show the last value.
The following is the LastValue Function:
create or replace function lastvalue(nmonth date,nteam varchar2) return number
as
newbaseline number;
begin
select baseline into newbaseline
from
(select baseline, max(input_month)
from
BASEDATA
where input_month <= nmonth
and TeamName = nteam
group by baseline);
return newbaseline;
end lastvalue;
/
If you run this function from a single row select, it does return the proper value, but when you expect multiple rows, I only get one.
Question:
Is there a different SQL syntax that will give me what I want?
OR
What am I doing wrong in LastValue definition that is causing it to return only one row?
Example Data Creation Statement:
CREATE TABLE CHANGEDATA
(INPUT_MONTH DATE,
TEAM_NAME VARCHAR2(10),
CHANGENUMBER NUMBER)
/
INSERT INTO CHANGEDATE VALUES (TO_DATE('JAN-01-2008','MO N-DD-YYYY' ),'CC',12) ;
INSERT INTO CHANGEDATE VALUES (TO_DATE('FEB-01-2008','MO N-DD-YYYY' ),'CC',3);
INSERT INTO CHANGEDATE VALUES (TO_DATE('MAR-01-2008','MO N-DD-YYYY' ),'CC',0);
INSERT INTO CHANGEDATE VALUES (TO_DATE('APR-01-2008','MO N-DD-YYYY' ),'CC',0);
INSERT INTO CHANGEDATE VALUES (TO_DATE('MAY-01-2008','MO N-DD-YYYY' ),'CC',4);
INSERT INTO CHANGEDATE VALUES (TO_DATE('JUN-01-2008','MO N-DD-YYYY' ),'CC',3);
INSERT INTO CHANGEDATE VALUES (TO_DATE('JUL-01-2008','MO N-DD-YYYY' ),'CC',0);
INSERT INTO CHANGEDATE VALUES (TO_DATE('AUG-01-2008','MO N-DD-YYYY' ),'CC',2);
INSERT INTO CHANGEDATE VALUES (TO_DATE('SEP-01-2008','MO N-DD-YYYY' ),'CC',4);
INSERT INTO CHANGEDATE VALUES (TO_DATE('OCT-01-2008','MO N-DD-YYYY' ),'CC',10) ;
INSERT INTO CHANGEDATE VALUES (TO_DATE('NOV-01-2008','MO N-DD-YYYY' ),'CC',0);
INSERT INTO CHANGEDATE VALUES (TO_DATE('DEC-01-2008','MO N-DD-YYYY' ),'CC',0);
CREATE TABLE BASEDATA
(INPUT_MONTH DATE,
TEAM_NAME VARCHAR2(10),
BASELINE NUMBER)
/
INSERT INTO BASEDATA VALUES (TO_DATE('JAN-01-2008','MO N-DD-YYYY' ),'CC',100 );
INSERT INTO BASEDATA VALUES (TO_DATE('APR-01-2008','MO N-DD-YYYY' ),'CC',111 );
INSERT INTO BASEDATA VALUES (TO_DATE('JUN-01-2008','MO N-DD-YYYY' ),'CC',120 );
INSERT INTO BASEDATA VALUES (TO_DATE('OCT-01-2008','MO N-DD-YYYY' ),'CC',125 ;
NORMAL OUTER JOIN
select a.input_month, a.Team_name, a.ChangeNumber,b.baseline
from CHANGEDATA a, BASEDATA b
where a.input_month = b.input_month (+)
and a.team_name = b.team_name (+)
I am performing an outer join between 2 tables. On the rows where are no matching values, NULLs are in the column. All this is expected.
Now what I would like to happen is that instead of a NULL, the previous value in the column is used.
Example
Table 1 - CHANGEDATA
Input_month | Team_Name | ChangeNumber
--------------------------
1/1/2008 CC 12
2/1/2008 CC 3
3/1/2008 CC 0
4/1/2008 CC 0
5/1/2008 CC 4
6/1/2008 CC 3
7/1/2008 CC 0
8/1/2008 CC 2
9/1/2008 CC 4
10/1/2008 CC 10
11/1/2008 CC 0
12/1/2008 CC 0
Table 2 - BASEDATA
Input_month | Team_Name | Baseline
--------------------------
1/1/2008 CC 100
4/1/2008 CC 111
6/1/2008 CC 120
10/1/2008 CC 125
Doing an outer join yields the following
Input_month | Team_Name | ChangeNumber | Baseline
--------------------------
1/1/2008 CC 12 100
2/1/2008 CC 3
3/1/2008 CC 0
4/1/2008 CC 0 111
5/1/2008 CC 4
6/1/2008 CC 3 120
7/1/2008 CC 0
8/1/2008 CC 2
9/1/2008 CC 4
10/1/2008 CC 10 125
11/1/2008 CC 0
12/1/2008 CC 0
Now here is what I would like:
Input_month | Team_Name | ChangeNumber | Baseline
--------------------------
1/1/2008 CC 12 100
2/1/2008 CC 3 100
3/1/2008 CC 0 100
4/1/2008 CC 0 111
5/1/2008 CC 4 111
6/1/2008 CC 3 120
7/1/2008 CC 0 120
8/1/2008 CC 2 120
9/1/2008 CC 4 120
10/1/2008 CC 10 125
11/1/2008 CC 0 125
12/1/2008 CC 0 125
Where if Baseline is NULL, I would like the previous non null value.
NON WORKING SOLUTION.
My original solution was to use a User Defined Function to return the Baseline number
So the query would look like this
select a.input_month, a.Team_name, a.ChangeNumber,
NVL(b.baseline, LastValue(a.input_month, a.Team_Name)) "Baseline"
from CHANGEDATA a, BASEDATA b
where a.input_month = b.input_month (+)
and a.team_name = b.team_name (+)
This query only yielded one row, the first one.
The problem is that the User Defined Function, LastValue querys the table. If LastValue just returns a hardcoded value, I get all the rows back, but the Baseline value does not show the last value.
The following is the LastValue Function:
create or replace function lastvalue(nmonth date,nteam varchar2) return number
as
newbaseline number;
begin
select baseline into newbaseline
from
(select baseline, max(input_month)
from
BASEDATA
where input_month <= nmonth
and TeamName = nteam
group by baseline);
return newbaseline;
end lastvalue;
/
If you run this function from a single row select, it does return the proper value, but when you expect multiple rows, I only get one.
Question:
Is there a different SQL syntax that will give me what I want?
OR
What am I doing wrong in LastValue definition that is causing it to return only one row?
Example Data Creation Statement:
CREATE TABLE CHANGEDATA
(INPUT_MONTH DATE,
TEAM_NAME VARCHAR2(10),
CHANGENUMBER NUMBER)
/
INSERT INTO CHANGEDATE VALUES (TO_DATE('JAN-01-2008','MO
INSERT INTO CHANGEDATE VALUES (TO_DATE('FEB-01-2008','MO
INSERT INTO CHANGEDATE VALUES (TO_DATE('MAR-01-2008','MO
INSERT INTO CHANGEDATE VALUES (TO_DATE('APR-01-2008','MO
INSERT INTO CHANGEDATE VALUES (TO_DATE('MAY-01-2008','MO
INSERT INTO CHANGEDATE VALUES (TO_DATE('JUN-01-2008','MO
INSERT INTO CHANGEDATE VALUES (TO_DATE('JUL-01-2008','MO
INSERT INTO CHANGEDATE VALUES (TO_DATE('AUG-01-2008','MO
INSERT INTO CHANGEDATE VALUES (TO_DATE('SEP-01-2008','MO
INSERT INTO CHANGEDATE VALUES (TO_DATE('OCT-01-2008','MO
INSERT INTO CHANGEDATE VALUES (TO_DATE('NOV-01-2008','MO
INSERT INTO CHANGEDATE VALUES (TO_DATE('DEC-01-2008','MO
CREATE TABLE BASEDATA
(INPUT_MONTH DATE,
TEAM_NAME VARCHAR2(10),
BASELINE NUMBER)
/
INSERT INTO BASEDATA VALUES (TO_DATE('JAN-01-2008','MO
INSERT INTO BASEDATA VALUES (TO_DATE('APR-01-2008','MO
INSERT INTO BASEDATA VALUES (TO_DATE('JUN-01-2008','MO
INSERT INTO BASEDATA VALUES (TO_DATE('OCT-01-2008','MO
NORMAL OUTER JOIN
select a.input_month, a.Team_name, a.ChangeNumber,b.baseline
from CHANGEDATA a, BASEDATA b
where a.input_month = b.input_month (+)
and a.team_name = b.team_name (+)
Is it a true statement that the populated baselines will always be equal to or greater than the previous values?
last_value is a built in analytic function to Oracle.
And I would write it like this...
SELECT a.input_month, a.team_name, a.changenumber, nvl(b.baseline,
last_value(baseline ignore nulls) over(order by a.input_month asc rows unbounded preceding))
FROM changedata a, basedata b
WHERE a.input_month = b.input_month(+) AND a.team_name = b.team_name(+)
ORDER BY 1
however, the "ignore nulls" clause isn't available in 9i.
So, you can substitute Max instead, but only if your data will be increasing over time
SELECT a.input_month, a.team_name, a.changenumber, nvl(b.baseline,
max(baseline) over(order by a.input_month rows unbounded preceding))
FROM changedata a, basedata b
WHERE a.input_month = b.input_month(+) AND a.team_name = b.team_name(+)
ORDER BY 1
And I would write it like this...
SELECT a.input_month, a.team_name, a.changenumber, nvl(b.baseline,
last_value(baseline ignore nulls) over(order by a.input_month asc rows unbounded preceding))
FROM changedata a, basedata b
WHERE a.input_month = b.input_month(+) AND a.team_name = b.team_name(+)
ORDER BY 1
however, the "ignore nulls" clause isn't available in 9i.
So, you can substitute Max instead, but only if your data will be increasing over time
SELECT a.input_month, a.team_name, a.changenumber, nvl(b.baseline,
max(baseline) over(order by a.input_month rows unbounded preceding))
FROM changedata a, basedata b
WHERE a.input_month = b.input_month(+) AND a.team_name = b.team_name(+)
ORDER BY 1
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
for 9i, my preference is the last one, it doesn't require the assumption of increasing baselines over time.
after 9i, I like the last_value version with the "ignore nulls" option
after 9i, I like the last_value version with the "ignore nulls" option
ASKER
Thanks so much...