Link to home
Start Free TrialLog in
Avatar of nQuote
nQuoteFlag for United States of America

asked on

Complex SQL query

I have two tables Tbl1 and Tbl2. I have a SQL as follows:

select Tbl1.Col1, Tbl1.Col2, Tbl2.ValA
from Tbl1
let outer join
(
select ColA, ValA, Date1, Date2
from Tbl2
where Condition=Xyz
)Tbl2
on Tbl1.ColA = Tbl2.ColA and
Tbl1.ColDate between Tbl2.Date1 and Tbl2.Date2

There could be a situation where ColA matches but there is no row in Tbl2 where ColDate is between Date1 and Date2. In that case, I have to find the next higher Date1 and Date2 combination in Tbl2 and find the corresponding Tbl2.ValA value for it.

I know that I can write the values into a temporary table first and then update but it seems cumbersome. I cannot use a cursor and traverse through it as the number of rows is huge. Any help would be appreciated.
Avatar of Brian Crowe
Brian Crowe
Flag of United States of America image

what does "next higher Date1 and Date2 combination" mean?
Avatar of nQuote

ASKER

The values might be like this:

Date1         Date2       ValA
3/4/2016  4/5/2017       A1
6/7/2017  8/8/2017       B1
9/6/2018  4/2/2019       C1

So, for example, ColDate in Tbl1 might be less than 3/4/2016. In that case, it has to take the ValA for Date1=3/4/2016 and Date2=4/5/2017. Similarly, there might be a situation where ColDate in Tbl1 might be 4/10/2017. In that case, it should take the ValA for Date1=6/7/2017 and Date2=8/8/2017.
Assuming the date ranges in Table2 are not overlapping and that Tbl2.Date2 is always greater than or equal to Tbl2.Date1 then you can do something like the following...

SELECT Tbl1.Col1, Tbl1.Col2, Tbl2.ValA,
   ROW_NUMBER() OVER(PARTITION BY 
FROM Tbl1
OUTER APPLY
(
   SELECT TOP (1) Tbl2.ColA, Tbl2.ValA, Tbl2.Date1, Tbl2.Date2
   FROM Tbl2
   WHERE Tbl2.Condition = 'XYZ'
      AND Tbl2.ColA = Tbl1.ColA
      AND Tbl2.Date2 >= Tbl1.ColDate
   ORDER BY Tbl2.ColA
) AS Tbl2

Open in new window

Avatar of slightwv (䄆 Netminder)
slightwv (䄆 Netminder)

TOP(1) isn't Oracle syntax and the ROW_NUMBER seems incomplete.


Based on the sample data, here is what I came up with.  If it isn't correct, please adjust the test case and provide expected results based on the update test case.

/*
drop table tab1 purge;
create table tab1(cola char(1), coldate date);
insert into tab1 values('a',to_date('3/4/2016','MM/DD/YYYY')); 
insert into tab1 values('a',to_date('4/10/2017','MM/DD/YYYY')); 
commit;

drop table tab2 purge;
create table tab2(cola char(1), Date1 date, Date2 date, ValA char(2));
insert into tab2 values('a',to_date('3/4/2016','MM/DD/YYYY'), to_date('4/5/2017','MM/DD/YYYY'),'A1');
insert into tab2 values('a',to_date('6/7/2017','MM/DD/YYYY'), to_date('8/8/2017','MM/DD/YYYY'),'B1');
insert into tab2 values('a',to_date('9/6/2018','MM/DD/YYYY'), to_date('4/2/2019','MM/DD/YYYY'),'C1');
commit;

*/

select cola, coldate, date1, date2, prev_date2, vala
from
tab1 t1
join (
	select cola, date1, date2, vala, lag(date2) over(partition by cola order by date2) prev_date2
	from tab2
) t2
using(cola)
where
	t1.coldate between t2.date1 and t2.date2
	or
	t1.coldate between t2.prev_date2 and t2.date1
;

Open in new window

Avatar of nQuote

ASKER

slightvw, I am assuming that by "join" you are meaning left outer join,right?
No.  Don't see the need for a left join given what you posted.
Avatar of nQuote

ASKER

My actual join is left outer join. My actual SQL is far bigger. I just posted the relevant parts. Also, shouldn't "t1.coldate between t2.prev_date2 and t2.date1" be "t1.coldate between t2.date1 and t2.prev_date2"?
>>Also, shouldn't "t1.coldate between t2.prev_date2 and t2.date1" be "t1.coldate between t2.date1 and t2.prev_date2"?

Don't think so.  Using LAG it looks at the previous row's value so prev_date2 should be before the current rows date1.

I can only go by the test data you posted and I posted the complete test case I used.  It seems to produce the results you wanted.
Avatar of nQuote

ASKER

FYI: My Oracle version is 11G. One thing I should have mentioned is ColA is there in Tbl1 and the sub-SELECT Tbl2. I have a lot of columns selected in Tbl1 along with ColA. I don't know if that makes a difference. When I am doing:
using(Tbl2.ColA)
it is giving an error saying:
"Only simple column names allowed here"

When I do just:
using(ColA)
I get an error saying "missing right parenthesis"
I checked all parentheses thoroughly. They all match.
Avatar of nQuote

ASKER

I changed the Tbl2.ColA as ColA2 and used:
using(ColA2)
Now I am just getting the parenthesis error.
Then use the old join syntax:
join ... on t1.cola=t2.cola

Then you'll need to use the alias in the select piece so cola would become t1.cola or t2.cola, whichever one you want, like in your original query.
Avatar of nQuote

ASKER

Just want to mention that, in the actual SQL, after the left outer join with Tbl2, there is one more left outer join between Tbl2 and another table,Tbl3. I don't think that makes a difference but just letting you know.
With the left outer join, you just might not get a ValA value.
Avatar of nQuote

ASKER

Ok,let me restate my original SQL. I'll include Tbl3 as it might be useful.

select Tbl1.Col1, Tbl1.Col2, Tbl2.ValA
from Tbl1
left outer join
(
select ColA, ValA, Date1, Date2
from Tbl2
where Condition=Xyz
)Tbl2
on Tbl1.ColA = Tbl2.ColA and
(Tbl1.ColDate between Tbl2.Date1 and Tbl2.Date2)
left outer join Tbl3 on Tbl2.ValA = Tbl3.ValA

I rewrote the SQL by following your answer. I added the Tbl3.ValB. Earlier I just wanted to keep it little simpler. As you can see, Tbl2.ValA is being used to join with Tbl3.ValA to get Tbl3.ValB:

select Tbl1.Col1, Tbl1.Col2, Tbl3.ValB
from Tbl1
left outer join
(
select ColA, ValA, Date1, Date2, lag(Date2) over(partition by ColA order by Date2) As Date2_2
from Tbl2
where Condition=Xyz
) Tbl2
on Tbl1.ColA = Tbl2.ColA and
left outer join Tbl3 on Tbl2.ValA = Tbl3.ValA
where (Tbl1.ColDate between Tbl2.Date1 and Tbl2.Date2
or Tbl1.ColDate between Tbl2.Date2_2 and Tbl1.Date1)

This runs fine but the final count is not correct. I am looking further.
Then add tables and data to my test case and provide the expected results from the data in the test case.
Avatar of nQuote

ASKER

Here is my actual SQL with table names altered. Actual number of columns I am retrieving is much larger but I mentioned just two relevant ones here.
The main important tables here are Tbl1 and Tbl2 which really is an alias of a sub-SELECT. The count should always be the count of Tbl1, which is the
driving table.

Really appreciate your help. Please let me know if additional sample data will help.

SQL I initially wrote:

SELECT
      Tbl1.ColA, Tbl6.ColVal
   FROM Tbl1
   LEFT OUTER JOIN Tbl5
   ON (Tbl1.ColVal1_1 = Tbl5.ColVal1_2 AND Tbl5.ColFgl = 'Y')
   LEFT OUTER JOIN
   (
         SELECT Tbl3.ColA, Tbl3.ColB, Tbl4.ColDate2, Tbl4.ColDate3
           FROM Tbl3
     INNER JOIN Tbl4
             ON Tbl3.ColC = Tbl4.ColC
          WHERE Tbl4.ColFlg2 = 'Y'
   ) Tbl2
   ON Tbl1.ColA = Tbl2.ColA AND Tbl1.ColDate1 BETWEEN Tbl2.ColDate2 ANDTbl2.ColDate3
   LEFT OUTER JOIN Tbl6 ON Tbl2.ColB = Tbl6.ColB

SQL I wrote using your suggestion:

SELECT
      Tbl1.ColA, Tbl6.ColVal
   FROM Tbl1
   LEFT OUTER JOIN Tbl5
   ON (Tbl1.ColVal1_1 = Tbl5.ColVal1_2 AND Tbl5.ColFlg = 'Y')
   LEFT OUTER JOIN
   (
         SELECT Tbl3.ColA, Tbl3.ColB, Tbl4.ColDate2, Tbl4.ColDate3, LAG(Tbl4.ColDate3)
                OVER(PARTITION BY Tbl3.ColA ORDER BY Tbl4.ColDate3) As ColDate3_2
           FROM Tbl3
     INNER JOIN Tbl4
             ON Tbl3.ColC = Tbl4.ColC
          WHERE Tbl4.ColFlg2 = 'Y'
   ) Tbl2
   ON Tbl1.ColA = Tbl2.ColA
   LEFT OUTER JOIN Tbl6 ON Tbl2.ColB = Tbl6.ColB
   WHERE (Tbl1.ColDate BETWEEN Tbl2.ColDate2 AND Tbl2.ColDate3
          OR Tbl1.ColDate BETWEEN Tbl2.ColDate3_2 AND Tbl2.ColDate2)


The query doesn't help.  What can I do with that and how will I know if any changes I make actually work?

I need data and expected results you want from the data.

In #a43090271 look at how I created tables, then inserted sample data.  That way I can actually run a query to manke sure it provides the results you want.

Without tables and data there really isn't anything I can do.
Avatar of nQuote

ASKER

Thank you for your response. I'll provide actual data for all tables.
Remember that I don't need or want "real" data that may be sensitive.  Just something showing me what you are trying to achieve.

Don't forget to post the results you want from whatever data you provide.  Not a description of them, the actual results you want.
Avatar of nQuote

ASKER

Here are sample data values.I can put more detail/data if needed.

Tbl1:
ColA  ColDate
10      6/8/2017
20      3/2/2016
30      9/5/2018

Tbl2 (this is an alias of a sub-SELECT):
ColA  ColDate2  ColDate3  ColB
10      3/4/2016    4/5/2017  1
20      6/7/2017    8/8/2017  2
30      9/6/2018    4/2/2019  3

Tbl6:
ColB  ColVal
1        Abc
2        Def
3        Ghi

The final values I am expecting from the SQL:

Tbl1.ColA  Tbl6.ColVal
10              Def
20              Abc
30              Ghi

Explanation:
1) For Tbl1.ColA=10, corresponding ColDate is 6/8/2017. It falls in the range 6/7/2017 and 8/8/2017 so the ColB value to retrieve is 2 and the corresponding Tbl6.ColVal value to retrieve is Def.

2) For Tbl1.ColA=20, corresponding ColDate is 3/2/2016. It is less than all the ranges in Tbl2. As a result, the corresponding ColB value it should use is 1 which is the next highest range. The corresponding Tbl6.ColVal value to retrieve is Abc.

3) For Tbl1.ColA=30, corresponding ColDate is 9/5/2018. It falls in between the 2nd and 3rd ranges in Tbl2 but doesn't fall in any of the three ranges. As a result, the corresponding ColB value it should use is 3 which is the next highest range. The corresponding Tbl6.ColVal value to retrieve is Ghi.
So you are saying the ColA column in tbl2 doesn't really matter and you cannot use it to join to tbl1?
Avatar of nQuote

ASKER

No it matters. It is the first column I am using to join with Tbl1. By the way, Tbl2 is an alias of a sub query which has Tbl3 which has the actual column ColA. I am putting my new SQL here again:

SELECT
      Tbl1.ColA, Tbl6.ColVal
   FROM Tbl1
   LEFT OUTER JOIN Tbl5
   ON (Tbl1.ColVal1_1 = Tbl5.ColVal1_2 AND Tbl5.ColFlg = 'Y')
   LEFT OUTER JOIN
   (
         SELECT Tbl3.ColA, Tbl3.ColB, Tbl4.ColDate2, Tbl4.ColDate3, LAG(Tbl4.ColDate3)
                OVER(PARTITION BY Tbl3.ColA ORDER BY Tbl4.ColDate3) As ColDate3_2
           FROM Tbl3
     INNER JOIN Tbl4
             ON Tbl3.ColC = Tbl4.ColC
          WHERE Tbl4.ColFlg2 = 'Y'
   ) Tbl2
   ON Tbl1.ColA = Tbl2.ColA
   LEFT OUTER JOIN Tbl6 ON Tbl2.ColB = Tbl6.ColB
   WHERE (Tbl1.ColDate BETWEEN Tbl2.ColDate2 AND Tbl2.ColDate3
          OR Tbl1.ColDate BETWEEN Tbl2.ColDate3_2 AND Tbl2.ColDate2)
The query you posted doesn't really mean anything since I don't have all the tables used in it...  and it doesn't "work" so I really don't need to look at it.

If I join tbl1 to tbl2 using ColA, how do I meet this requirement:
1) For Tbl1.ColA=10, corresponding ColDate is 6/8/2017. It falls in the range 6/7/2017 and 8/8/2017 so the ColB value to retrieve is 2 and the corresponding Tbl6.ColVal value to retrieve is Def.


If I join using '10' from both tables, how am I supposed to access tbl2 where ColA=20 which is the range you want?
I had to again assume quite a bit until I get an answer to the above question about the the joins.

This produces the expected results but I'm not confident in the complete requirements.

I don't have 11g to test against but I'm pretty sure I didn't use any 12c SQL in here.
/*

drop table tab1 purge;
create table tab1(cola number, coldate date);
insert into tab1 values(10,to_date('6/8/2017','MM/DD/YYYY'));
insert into tab1 values(20,to_date('3/2/2016','MM/DD/YYYY'));
insert into tab1 values(30,to_date('9/5/2018','MM/DD/YYYY'));
commit;

drop table tab2 purge;
create table tab2(cola number, coldate2 date, coldate3 date, colb number);
insert into tab2 values(10,to_date('3/4/2016','MM/DD/YYYY'),to_date('4/5/2017','MM/DD/YYYY'),1);
insert into tab2 values(20,to_date('6/7/2017','MM/DD/YYYY'),to_date('8/8/2017','MM/DD/YYYY'),2);
insert into tab2 values(30,to_date('9/6/2018','MM/DD/YYYY'),to_date('4/2/2019','MM/DD/YYYY'),3);
commit;

drop table tab6 purge;
create table tab6(colb number, colval char(3));
insert into tab6 values(1,'Abc');
insert into tab6 values(2,'Def');
insert into tab6 values(3,'Ghi');
commit;

The final values I am expecting from the SQL:
Tbl1.ColA  Tbl6.ColVal
10              Def
20              Abc
30              Ghi

*/

with base as (
	select
		t1.cola t1_cola, coldate, t2.cola t2_cola, coldate2, coldate3, colb
		,case
			when coldate < min(coldate2) over()								then 1
			when t1.cola=t2.cola and coldate between coldate2 and coldate3	then 2
			when t1.cola=t2.cola and coldate < coldate2						then 3
			when coldate between coldate2 and coldate3						then 4
			when coldate2 > coldate											then 5
		end range_found
	from tab1 t1
	cross join tab2 t2
),
which_range as (
	select  t1_cola, colb, row_number() over(partition by t1_cola order by range_found) rn
	from base
)
select t1_cola, colval
from which_range wr
	join tab6 t6 on wr.colb=t6.colb and rn=1
order by t1_cola
;

Open in new window

Avatar of nQuote

ASKER

Thank you very much. I was going to review your question and was going to give a complete answer. That's why I was taking some time. I'll try this solution and also try to give a complete answer to your earlier question. Once again, really appreciate your help.
Avatar of nQuote

ASKER

To answer your question, "If I join using '10' from both tables, how am I supposed to access tbl2 where ColA=20 which is the range you want?", that is what I am trying to do. The join is Tbl1.ColA=Tbl2.ColA and ColDate BETWEEN ColDate2  AND ColDate3. If first part of the join is successful but the second part fails, I need to go to the next higher range. For ColA=10, the range to go to is 6/7/2017 - 8/8/2017 and the corresponding ColB/ColVal value to get is 2/Def. We might have to do this using some other technique but let's see.

I am running your SQL with the WITH but it is running for a long time, probably due to the CROSS JOIN. I'll let it run. Let's see how long it takes.

Instead of doing a CROSS JOIN, how about doing a LEFT OUTER JOIN between T1 and T2 on ColA?
Avatar of nQuote

ASKER

I ran your SQL. I am seeing two issues:
1) The counts do not match. For example, for ColA in (1,2,3,4,5), the original SQL returns 20 rows but I am getting a much higher number of rows.
2) The ColVal value is empty except for the one row in each set. So, for example, for ColA=1 only one row has the ColVal value. The others are blank.
>>For ColA=10, the range to go to is 6/7/2017 - 8/8/2017

But the 6/7/2017 range isn't for ColA=10 so even a left join won't pull it in.  It is for ColA=20.  Given what you are telling me, I see no way to do anything other than a cross join to pull in ALL possible combinations.

>>We might have to do this using some other technique but let's see.

That is where I added conditions to the case statement when the colA values match.  If not match on them, then only check date values.

>> but I am getting a much higher number of rows.

Likely due to the cross join and some missing case statements or a logic error.

>>The counts do not match.

I can only go by what you provide me.  I need more sample data and expected results that shows where things don't work.
Avatar of nQuote

ASKER

slightwv, now I see the incorrect information I gave you. Here are the updated values of the tables. We can have duplicate values for ColA (I am sorry I didn't see this earlier):
NOTE: There might be other ColA values in the table but I am showing you the example with the 10 value.

Tbl1:
ColA  ColDate
10      6/8/2017
10      3/2/2016
10      9/5/2018

Tbl2 (this is an alias of a sub query):
ColA  ColDate2  ColDate3  ColB
10      3/4/2016    4/5/2017  1
10      6/7/2017    8/8/2017  2
10      9/6/2018    4/2/2019  3

Tbl6:
ColB  ColVal
1        Abc
2        Def
3        Ghi

The final values I am expecting from the SQL:

Tbl1.ColA  Tbl6.ColVal
10              Def
10              Abc
10              Ghi

Explanation:
1) For the first record with Tbl1.ColA=10, corresponding ColDate is 6/8/2017. It falls in the range 6/7/2017 and 8/8/2017 so the ColB value to retrieve is 2 and the corresponding Tbl6.ColVal value to retrieve is Def.

2) For the second record with Tbl1.ColA=10, corresponding ColDate is 3/2/2016. It is less than all the ranges in Tbl2. As a result, the corresponding ColB value it should use is 1 which is the next highest range. The corresponding Tbl6.ColVal value to retrieve is Abc.

3) For the third record with Tbl1.ColA=10, corresponding ColDate is 9/5/2018. It falls in between the 2nd and 3rd ranges in Tbl2 but doesn't fall in any of the three ranges. As a result, the corresponding ColB value it should use is 3 which is the next highest range. The corresponding Tbl6.ColVal value to retrieve is Ghi.
I'm betting this is still wrong because you are likely missing some requirements in the data since you seem to be changing them between posts.

Still not seeing the need for a left join between tbl1 and tbl2 but seem to really want one so add it.  The results based on this test case are the same.

You haven't provided sample data and expected results for where there isn't a row in tbl2 that isn't covered by an existing rule.  Except where coldate is > than the max coldate3 in tbl3.  Not sure what you want then.

You didn't add coldate to your latest expected results so you have no way to know what colb goes with what row but I knew what you meant.

This produces the expected results given the sample data.

/*
drop table tab1 purge;
create table tab1(cola number, coldate date);
insert into tab1 values(10,to_date('6/8/2017','MM/DD/YYYY'));
insert into tab1 values(10,to_date('3/2/2016','MM/DD/YYYY'));
insert into tab1 values(10,to_date('9/5/2018','MM/DD/YYYY'));
commit;

drop table tab2 purge;
create table tab2(cola number, coldate2 date, coldate3 date, colb number);
insert into tab2 values(10,to_date('3/4/2016','MM/DD/YYYY'),to_date('4/5/2017','MM/DD/YYYY'),1);
insert into tab2 values(10,to_date('6/7/2017','MM/DD/YYYY'),to_date('8/8/2017','MM/DD/YYYY'),2);
insert into tab2 values(10,to_date('9/6/2018','MM/DD/YYYY'),to_date('4/2/2019','MM/DD/YYYY'),3);
commit;

drop table tab6 purge;
create table tab6(colb number, colval char(3));
insert into tab6 values(1,'Abc');
insert into tab6 values(2,'Def');
insert into tab6 values(3,'Ghi');
commit;

*/

with base as (
	select
		t1.cola t1_cola, coldate, t2.cola t2_cola, coldate2, coldate3, colb
		,case
			when coldate < min(coldate2) over()								then 1
			--when coldate < coldate2 										then 1
			when coldate between coldate2 and coldate3						then 2
			when coldate2 > coldate											then 3
else 4
		end range_found
	from tab1 t1
	join tab2 t2 on t1.cola=t2.cola
),
which_range as (
	select t1_cola, coldate, t2_cola, coldate2, coldate3, colb , range_found,row_number() over(partition by coldate order by range_found, coldate2) rn
	from base
)
select t1_cola, coldate, colval
from which_range wr
	join tab6 t6 on wr.colb=t6.colb and rn=1
order by coldate
/

Open in new window

Avatar of nQuote

ASKER

Thank you for your post. Let me review everything again and I'll try to give more sample data for all tables. The only problem is the data is confidential so I have to totally change them and post.
Avatar of nQuote

ASKER

The LEFT OUTER JOIN is needed between Tbl1 and Tbl2 because there are few rows (less than 5) in Tbl1 out of 500k+ rows which do not exist in Tbl2.
>>Tbl1 and Tbl2 because there are few rows (less than 5) in Tbl1 out of 500k+ rows which do not exist in Tbl2.

Then what value gets assigned to colval?

>>The only problem is the data is confidential so I have to totally change them and post.

I understand and have to do the same thing.  Changing real data should be simple global search/replace?
Avatar of nQuote

ASKER

For those few rows, there won't be a ColVal value.
OK, if I add a row to tbl1 with nothing int tbl2:
insert into tab1 values(20,to_date('9/5/2050','MM/DD/YYYY'));


Then I get this SQL:
with base as (
	select
		t1.cola t1_cola, coldate, t2.cola t2_cola, coldate2, coldate3, colb
		,case
			when coldate < min(coldate2) over()								then 1
			--when coldate < coldate2 										then 1
			when coldate between coldate2 and coldate3						then 2
			when coldate2 > coldate											then 3
else 4
		end range_found
	from tab1 t1
	left join tab2 t2 on t1.cola=t2.cola
),
which_range as (
	select t1_cola, coldate, t2_cola, coldate2, coldate3, colb , range_found,row_number() over(partition by coldate order by range_found, coldate2) rn
	from base
)
select t1_cola, coldate, colval
from which_range wr
	left join tab6 t6 on wr.colb=t6.colb
where rn=1
order by coldate
/

Open in new window


My results:
   T1_COLA COLDATE             COL
---------- ------------------- ---
        10 03/02/2016 00:00:00 Abc
        10 06/08/2017 00:00:00 Def
        10 09/05/2018 00:00:00 Ghi
        20 09/05/2050 00:00:00

Open in new window

Just had a thought that you don't need the left join on tbl6 if you get those values first with a join to tbl2.

Updated SQL:
with 
get_tbl2_colvals as (
	select cola, coldate2, coldate3, colval
	from tab2 t2
	join tab6 t6 on t2.colb=t6.colb
),
base as (
	select
		t1.cola t1_cola, coldate, t2.cola t2_cola, coldate2, coldate3, colval
		,case
			when coldate < min(coldate2) over()								then 1
			--when coldate < coldate2 										then 1
			when coldate between coldate2 and coldate3						then 2
			when coldate2 > coldate											then 3
else 4
		end range_found
	from tab1 t1
	left join get_tbl2_colvals t2 on t1.cola=t2.cola
),
which_range as (
	select t1_cola, coldate, t2_cola, coldate2, coldate3, colval, range_found, row_number() over(partition by coldate order by range_found, coldate2) rn
	from base
)
select t1_cola, coldate, colval
from which_range wr
where rn=1
order by coldate
/

Open in new window

Avatar of nQuote

ASKER

slightwv, thank you very much for your posts. I am going to try out the solution. A big difficulty is that, due to confidentiality, I have to change data and table names back and forth. This is probably causing some of the problems. Anyway, will let you know tonight or tomorrow.
Avatar of nQuote

ASKER

I tried your latest SQL. Here is one piece of information I found with the data. For a certain ColA value, let's say 10, there are 2 rows in Tbl1 whereas there are 6 rows in Tbl2 (which is really a sub query constituting Tbl3 as the driving table so 6 rows in Tbl3). I did a LEFT OUTER JOIN and INNER JOIN between Tbl1 and Tbl2. Both return the same number of rows so I am thinking that this does not make a difference. I also tried doing DISTINCT in the inner SELECTs. The one for get_tbl2_colvals returns lesser number of rows with DISTINCT.

Here are two issues I found with the result (I tried with a set of 5 different ColA values):
1) The number of rows returned is 8 whereas there are 10 in Tbl1. No data was returned for one of the 5 ColA values.
2) Some of the ColVal values are empty.

I am researching more.
Avatar of nQuote

ASKER

The latest SQL returns a much smaller number of rows. This solution might need a temporary table.
>>The latest SQL returns a much smaller number of rows.

That doesn't tell me much.  Does thatseem correct or incorrect?

>>This solution might need a temporary table.

Usually not necessary in Oracle.
Avatar of nQuote

ASKER

The driving table has 400k+ rows. This SQL returns about 1k.
Still doesn't tell me much.  Remember, I don't know anything about your system or data.

I'm assuming you are expecting more than 1,000 rows?

You'll need to add additional sample data that you feel should be returned that the query isn't returning...

So, please add additional insert statements to the test case I posted then provide the expected results from the new data.
Avatar of nQuote

ASKER

Ok. I'll present a completely detailed scenario. One question:
"when coldate < min(coldate2) over() 1"
What does this mean? We are not partitioning over anything, right? In that case, what does it mean?
That is the MIN over the entire result set.  I forgot to change that.  It was a left-over from the cross join I had when the provided data wasn't correct.

Should probably be partitioned by colA:
                  when coldate < min(coldate2) over(partition by t1.cola)                  then 1
Avatar of nQuote

ASKER

slightvw, thank you very much for your help. I really cannot thank you enough.
------------------------------------------------------------------------------------------------------------------------------
/*
--This has 4 rows for cola=10.
drop table tab1 purge;
create table tab1(cola number, coldate_start date, coldate_end date);
insert into tab1 values(10,to_date('06/05/2018','MM/DD/YYYY'),to_date('07/04/2018','MM/DD/YYYY'));
insert into tab1 values(10,to_date('07/05/2018','MM/DD/YYYY'),to_date('08/04/2018','MM/DD/YYYY'));
insert into tab1 values(10,to_date('08/05/2018','MM/DD/YYYY'),to_date('09/04/2018','MM/DD/YYYY'));
insert into tab1 values(10,to_date('09/05/2018','MM/DD/YYYY'),to_date('12/31/2099','MM/DD/YYYY'));
commit;

--This is actually a subquery. This has 80+ rows for this cola=10. I am only showing few of the 80+ rows.
--NOTE: For all rows in tab2, coldate3-13=coldate2.

drop table tab2 purge;
create table tab2(cola number, colb number, coldate2 date, coldate3 date);
insert into tab2 values(10,1,to_date('02/04/2018','MM/DD/YYYY'),to_date('02/17/2018','MM/DD/YYYY'));
insert into tab2 values(10,1,to_date('06/10/2018','MM/DD/YYYY'),to_date('06/23/2018','MM/DD/YYYY'));
insert into tab2 values(10,2,to_date('06/24/2018','MM/DD/YYYY'),to_date('07/07/2018','MM/DD/YYYY'));
insert into tab2 values(10,2,to_date('08/05/2018','MM/DD/YYYY'),to_date('08/18/2018','MM/DD/YYYY'));
insert into tab2 values(10,3,to_date('09/16/2018','MM/DD/YYYY'),to_date('09/29/2018','MM/DD/YYYY'));
commit;

drop table tab6 purge;
create table tab6(colb number, colval char(3));
insert into tab6 values(1,'Abc');
insert into tab6 values(2,'Def');
insert into tab6 values(3,'Ghi');
commit;
*/

Result I am expecting (these are for the four rows in tab1):

cola   coldate_start   coldate_end   colval
10     06/05/2018      07/04/2018    Def
10     07/05/2018      08/04/2018    Def
10     08/05/2018      09/04/2018    Ghi
10     09/05/2018      12/31/2099    Ghi

Explanation:
--I am showing borderline cases so that you can understand better.
--For tab1, for the 1st row, the col_end_date(07/04/2018) falls in the 3rd row of tab2. So the colb value it'll get is 2.
--For tab1, the 2nd row, the col_end_date(08/04/2018) doesn't fall anywhere. As a result, it'll use the 4th row in tab2
which is the next highest range. So the colb value it'll get is 2.
--For tab 1, the 3rd row, the col_end_date(09/04/2018) doesn't fall anywhere. As a result, it'll use the 5th row in tab2
which is the next highest range. So the colb value it'll get is 3.
--For tab 1, the 4th row, the col_end_date is 12/31/2099. This is a special case. For this one, the coldate_start needs to be used.
--It'll look for a range in tab2 first. If it doesn't find one, it'll use the next highest range which is 09/16/2018-09/29/2018.
So the colb value it'll get is 3.

NOTE:
1) I spent a lot of time with another person and she/he gave me this requirement about the unique case for the last row.
2) I am not including data for tab3 and tab4 because they are within the subquery which is tab2. I am also not including data for tab5 because it is a LEFT OUTER JOIN with the driving table (tab1) and has no bearing on the problem I am facing. Please let me know if you want me to include them as well.
3) I tried my best to find any abnormalities in the data but so far this is all I have found. I am still researching.
4) If this doesn't solve the problem, I'll try to post the actual dates for the 4 rows and 80+ rows in tab1 and tab2.
>>the col_end_date is 12/31/2099. This is a special case

For this "special case",  will it ALWAYS be 12/31/2099?
Is that a hard-coded "special" value?
ASKER CERTIFIED SOLUTION
Avatar of slightwv (䄆 Netminder)
slightwv (䄆 Netminder)

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of nQuote

ASKER

I am trying your solution out. I am trying to take chunks of it and seeing why the counts are coming different. Will let you know.
Avatar of nQuote

ASKER

slightwv, can you please tell me if this line is still correct:
           when coldate < min(coldate2) over() then 1
or should it be?
            when coldate < min(coldate2) over(partition by t1.cola)                  then 1
I think the one with partition by is what you need.

There have been so many variations, I cannot keep them all straight and tend to just copy one from above and tweak it.

Best to try and understand what the SQL is doing then when there is a copy/paste issue, you can fix it yourself.
Avatar of nQuote

ASKER

slightwv, thank you for your response. I am trying to find a solution in a different way. I'll post it soon. In the mean time, I have two questions. I was wondering if you can help me with these:

1) If have the following SQL (this is part of cursor) :
select max(Tbl2.Dt_Key) into vDt_Main_Key
  from Tbl1
 inner join Tbl2 on Tbl1.Dt_Key=Tbl2.Dt_Main_Key
 where Tbl1.ColA=vValA
     and Tbl2.Cal_Dt between vDtOne and vDtTwo;

select Loc_Col into vLoc_Col
  from Tbl1.ColA=vValA
   and Tbl1.Dt_Key=vDt_Main_Key;

I would like to combine them into one SQL using like PARTITION OVER...ORDER BY. As you can see, the driving table in both SQLs is Tbl1.

2) After I find the values in a cursor, I need to do a SELECT of the values. However, something like:
   select vValA, vDt_Main_Key, vLoc_Col from dual
  as the last line in a cursor (before closing the cursor) is giving an error. There'll be multiple rows. How do I get around this problem?

Really appreciate your help.
1) Not sure you can combine the two queries.  If the first query ALWAYS returns the same Loc_Col for ALL rows in the first query, then you don't need a window clause.  Just a simple max with a group by might work.

I cannot say for sure because, again, I don't know your tables or data.

2) I don't have any idea.  I don't know the larger piece of the code and what you are trying to do.

You mention closing a cursor so I assume you have PL/SQL code and some loop?  Are you sure you even need that?


To try to get you to understand, what if I tell you I was working on a Nuclear Reactor and accidentally pulled out a red and blue wire and asked you where I re-attach them?  Would you be able to tell me?
Avatar of nQuote

ASKER

slightwv, again thank you very much for your response. I really apolgize for putting you through this. Your help is enormously appreciated.

Please disregard my previous post (that'll probably be a last resort as it'll require a temp table). The SQL below is the one I am trying to resolve. I made some headway by using a different table but I am still stuck. I was given some requirement changes by someone. I put in two methods using which I am trying to resolve the issue. If you could read both methods first that would be great. If there is a simple solution to METHOD2 that would be great. If there is no solution in METHOD1 and no simple solution to METHOD2, I'll use a temp table in METHOD2.

METHOD 1:
---------
/*
drop table Tbl1 purge;
create table Tbl1  (ColKey number, Srart_Dt date, End_Dt date);
insert into Tbl1  values(10,to_date('06/05/2018','MM/DD/YYYY'),to_date('07/04/2018','MM/DD/YYYY'));
insert into Tbl1  values(10,to_date('07/05/2018','MM/DD/YYYY'),to_date('08/04/2018','MM/DD/YYYY'));
insert into Tbl1  values(10,to_date('08/05/2018','MM/DD/YYYY'),to_date('09/04/2018','MM/DD/YYYY'));
insert into Tbl1  values(10,to_date('09/05/2018','MM/DD/YYYY'),to_date('12/31/2099','MM/DD/YYYY'));
commit;

drop table Tbl2 purge;
create table Tbl2  (ColKey number, Dt_Key number, P_P_Yr number, P_P number, Loc_Key number);
insert into Tbl2  values(10, 51, 2019, 11, 123);
insert into Tbl2  values(10, 52, 2019, 12, 234);
insert into Tbl2  values(10, 53, 2019, 13, 435);
insert into Tbl2  values(10, 54, 2019, 14, 435);
insert into Tbl2  values(10, 55, 2019, 15, 456);
commit;

drop table Tbl3_Dt purge;
create table Tbl3_Dt (Dt_Main_Key number, Cal_Dt date);
insert into Tbl3_Dt values(51, to_date('07/05/2018','MM/DD/YYYY'));
insert into Tbl3_Dt values(52, to_date('08/05/2018','MM/DD/YYYY'));
insert into Tbl3_Dt values(53, to_date('09/05/2018','MM/DD/YYYY'));
insert into Tbl3_Dt values(54, to_date('10/05/2018','MM/DD/YYYY'));
insert into Tbl3_Dt values(55, to_date('11/05/2018','MM/DD/YYYY'));

drop table Tbl4 purge;
create table Tbl4 (Loc_Key number, Loc_Cd char(3));
insert into Tbl4 values(123,'Abc');
insert into Tbl4 values(234,'Def');
insert into Tbl4 values(435,'Ghi');
insert into Tbl4 values(436,'Jkl');
commit;
*/

with first_sql as
(
     select ColKey,Start_Dt,End_Dt,Dt_Key,Cal_Dt,Loc_Key
       from
          (
           select R1.ColKey,Start_Dt,End_Dt,Cal_Dt,Dt_Main_Key,Another_Dt_F,Loc_Key
             from
                 (
                  select Tbl1.ColKey,Tbl1.Start_Dt,Tbl1.End_Dt,Tbl3_Dt.Cal_Dt,Tbl3_Dt.Main_Key,Tbl2.Another_Dt_F
                    from Tbl1
         left outer join Tbl3_Dt
                      on (case when Tbl1.End_Dt <> '31-DEC-2099' then Tbl1.End_Dt else Tbl1.Start_Dt end) = Tbl3_Dt.Cal_Dt
                 ) R1
         left outer join Tbl2
                      on R1.Another_Dt_F = Tbl2.P_P_Yr || Tbl2.P_P
                     and R.ColKey = Tbl2.ColKey
           ) R2
       where (R2.Loc_Key is NULL or R2.Loc_Key < 1)
),
second_sql as
(
     select Tbl2.ColKey, max(Tbl3_Dt.Dt_Main_Key) As Dt_Main_Key
       from Tbl2
 inner join Tbl3_Dt on Tbl2.Dt_Key=Tbl3_Dt.Dt_Main_Key
 inner join first_sql on Tbl2.ColKey = second_sql.ColKey
      where Tbl3_Dt.Cal_Dt between first_sql.Start_Dt and first_sql.End_Dt
   group by Tbl2.ColKey
),
third_sql as
(
     select Tbl4.Loc_Cd
       from Tbl2
 inner join second_sql on Tbl2.ColKey = second_sql.ColKey
 inner join Tbl4 on Tbl2.Loc_Key = Tbl4.Loc_Key
      where Tbl2.ColKey = second_sql.ColKey
        and Tbl2.Dt_Key = second_sql.Dt_Main_Key
)
select *
  from third_sql

What I am trying to achieve is a SQL like this (obviously I can't figure it out). Loc_Cd calculation is the problem:
    select ColKey,Start_Dt,End_Dt,Loc_Cd
      from third_sql

Additional info:
The following SQL:

  select P_P_Yr,P_P,Loc_Key
    from Tbl2
   where ColKey=10
order by P_P_Yr desc, P_P desc

Returns something like this:

P_P_Yr   P_P   Loc_Key
2018     10     123
2018      9     234
2018      6     435
2018      5     436

The objective is that, for the missing ones, P_P = 7 and 8, the value for Loc_Cd in the final sql
should be the value based on the next highest P_P value that falls within the same date range. So it should be
the one for 6 which is 435 and the corresponding Loc_Cd value of Jkl.

METHOD 2:
---------
Here is the cursor method. I simplified it a lot. The problem I am facing is how to output data like a SELECT.

declare
       cursor myCursor is
              select ColKey,Dt_Key,P_P_Yr,P_P,Loc_Key
                from Tbl2;

r_myCursor myCursor%ROWTYPE;
vVal1 number;
vVal2 number;

begin
       open myCursor;
       loop
           fetch myCursor into r_myCursor;
           exit when myCursor%NOTFOUND;

           select Loc_Key into vVal1
             from Tbl2
            where ColKey=r_myCursor.ColKey;

           select P_P_Yr into vVal2
             from Tbl2
            where ColKey=r_myCursor.ColKey;

            select vVal1, vVal2 from dual;
       end loop;

       close myCursor;
end;

NOTE: The "select vVal, vVal2 from dual" line is failing. It is telling me that "an INTO clause is expected". I need to output 1000+ rows. I was trying to output one line from each iteration of the cursor but apparently that doesn't work. I can create a temp table and insert all rows into the temp table and then do a SELECT after closing the cursor. I was wondering if there is another way to achieve without creating a temp table? Like may be using varray or something similar?


>>as it'll require a temp table

Probably not.  You mentioned that in the original post and we are back around to it 2 weeks later.  I didn't see the need 2 weeks ago and I don't now.

>>If you could read both methods first that would be great.

I scanned the code you posted.  You keep changing what you are asking for.  Now we are up to 4 tables with all new column names.  No expected results from the sample data you provided.  

I don't see an attempt to incorporate my previous SQL into the latest SQL you posted.  I see a mix of CTEs (with) and nested sub-queries.  Choose one approach and stick with it.  I suggest CTEs like I have provided.  Sub-selects makes the SQL hard to read and understand.

It continues to access the same tables over and over.  I would look to see if you REALLY need that.  If you look at what I provided, it doesn't.  Each CTE further refines the data but doesn't go against the base tables every time.

It appears that you want copy/paste/run SQL instead of understanding the SQL I've provided and learn how to adapt it to your requirements.  I'll never understand your tables, data or requirements at the level needed to get you where you need to be.

I have provided a solution that appears to work for the requirements you gave us.  If you understand how it is doing what it does, you should be able to apply the same concepts to your real-world requirements.

I suggest you take a few minutes and read:
https://www.experts-exchange.com/articles/9869/How-to-Unravel-a-Tricky-Query.html

I have no idea what Method 2 is supposed to do other than be the incorrect way to do whatever it is trying to do.  I recommend you forget you've ever thought about it instead of trying to fix it.
Avatar of nQuote

ASKER

slighwv, I really cannot appreciate you enough and my deepest apologies. METHOD 2 is working for my solution (since the number of rows is 3k+, using a cursor is acceptable). It is just that a cursor doesn't allow a SELECT of one row at a time so I have to insert into a temporary table and then SELECT all rows together from the temporary table.

The reason requirements are changing is because somebody is providing them to me.

I'll look at what you posted and follow what you said. Will keep you posted.
A select statement uses a cursor to return the results.  Your method2 appears it can be accomplished in a single select so you shouldn't need your own pl/sql block and a temp table to return the results of a select.  Let the SQL engine do what it does...
Avatar of nQuote

ASKER

slightvw, thank you very much for your help. I was able to resolve my issue using your CTE technique. I really cannot thank you enough. You put in an enormous effort to help me. You didn't have to but you did by taking your valuable time.
No problem.  Happy to help!