Solved

Best way to create an index for a triple union (i.e., how do views work ?)

Posted on 2014-02-08
16
496 Views
Last Modified: 2014-02-12
I have a medium complex query (from one table) that I need to union two times, and I'm looking for the best way to come up with the indexes. I may need to create a view of the entire thing or make a view of each of the three queries and then union these three views

I am limited because of Sql Server-SSIS that I can't have an expression greater than 4,000 chars. This seems really goofy and it's tripped me up on several occasions, requiring significant hoop-jumping to get around this limitation.

I think the easiest thing to do is create one view:
 create view my_big_view as
select *
from
(
select /*+ parallel(pd,8) */ [this and that], sum(paid_amt) - sum(subsidy_amt) as tot_dollars
  from medpartd.phar_trans_det pd
  where pd.aud_srce_sys_cd = '212'
   and pd.oracle_gl_prcs_dt is null
   and pd.lics_copay_ctgy in ('1', '2', '3', '4')   
  group by [this and that]
--
union
--
select /*+ parallel(pd,8) */ [this and that], sum(subsidy_amt)
  where pd.aud_srce_sys_cd = '212'
   and pd.oracle_gl_prcs_dt is null
   and pd.lics_copay_ctgy in ('1', '2', '3', '4')   
  group by [this and that]
--
union
--
select /*+ parallel(pd,8) */ [this and that], sum(paid_amt)
  from medpartd.phar_trans_det pd
  where pd.aud_srce_sys_cd = '212'
   and pd.oracle_gl_prcs_dt is null
   and nvl(pd.lics_copay_ctgy,'0') =  '0'   
  group by [this and that]
)

Open in new window


then I just say
select * from my_big_view
where adjdn_dt < '1-feb-2014'

Open in new window


does that make sense ? If so, I was thinking of an index on
- aud_srce_sys_cd (there's only two, 140, 169)
- oracle_gl_prcs_dt (this is populated for 90% of the records)
- lics_copay_ctgy (values 0,1,2,3,4,5)

-- the above three are inside the view

- adjdn_dt - steady volume throughout the year

-- this last field is outside the view, but I retrieve it IN the view

so the view would not get much data, because oracle_gl_prcs_dt is populated for 90% of the records, so we'd only be pulling 10% records via the view

so I assume Oracle reads all of the data in the view, THEN it applies the final where clause. So would it make sense to have the final field (adjdn_dt) in the index as well ?

Note that I pull "adjdn_dt into the view, so hopefully that when the actual where clause ("where  adjdn_dt < '1-feb-2014')
- those this field (adjdn_dt) has already been pulled via the view (so Oracle has it at the ready to apply the filter to the cache)
0
Comment
Question by:Alaska Cowboy
  • 9
  • 7
16 Comments
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
Comment Utility
>> If so, I was thinking of an index on - aud_srce_sys_cd (there's only two, 140, 169)

For only two entries, I doubt an index would be useful.

The other two columns might be OK but a lot also depends on the amount of data in the tables and the amount of rows processed by the query.

I don't think the 'is null' will allow index use since null values are not indexed.

It's difficult to know what the proper indexes should be on a Q&A site.  Too many factors.

I would just experiment on a test system to see what works and what doesn't.


I'm also not sure I see the need for the unions.  From the SUMs are you sure it isn't possible to obtain these values without the unions?

Can you provide some sample data and expected results?
0
 
LVL 1

Author Comment

by:Alaska Cowboy
Comment Utility
>>For only two entries, I doubt an index would be useful.
- good point, but I was thinking it would quickly cut the search in half.

>>I don't think the 'is null' will allow index use since null values are not indexed.
- also good point, but when it's null that represents 10% of the data, so that's how I figure would be the best way to get my data. Maybe I can populate it with '1-jan-1980' on all new records, then my search looks for '1-jan-1980'

>>It's difficult to know what the proper indexes should be on a Q&A site.  Too many factors.
- good point. One think I'm looking for is how the union works, then if I say
select *
from
(
select * from table_a where [this and that] 
union
select * from table_a where [this other thing and that other thing]
union
select * from table_a where [a third thing and a fourth thing]
)
where paid_dt < paid_dt_parameter

Open in new window


so I assume it gathers all the data into memory and then applies the paid_dt filter ?
0
 
LVL 1

Author Comment

by:Alaska Cowboy
Comment Utility
side but related question, if you don't mind . . .

in the query for this post, this logic works fine:
- sum(pd.tot_bill_paid_amt) - sum(pd.lics_amt) as Paid_Amt,

but this logic returns an Oracle error:
      sum(case 
             when nvl(pd.lics_copay_ctgy,'x') in ('1', '2', '3', '4') then nvl(pd.tot_bill_paid_amt,0) - nvl(pd.lics_amt,0)
             else nvl(pd.tot_bill_paid_amt,0)
          end) as Paid_Amt

Open in new window


it says "Oracle parallel query error - invalid number"
Oracle parallel error
Here's the full query which works:
--- testing
select /*+ parallel(pd,8) */ '121' as Entity, 
      case 
           when nvl(pd.lics_copay_ctgy,'x') in ('1', '2', '3', '4') then
                case substr(pd.pbp_id,1,3)
                     when '003' then 
                           case 
                               when nvl(pd.dedb_stus_indc_2,'x') in ('C5','A4') then  '531107'
                               when nvl(pd.dedb_stus_indc_2,'x') in ('C6','B4') then  '531104'
                               else '531103'
                           end
                     else -- pbp = 002
                           case 
                              when nvl(pd.dedb_stus_indc_2,'x') in ('C6','B4') then  '531102'
                              else '531101'
                           end                          
                end
           else 
                case substr(pd.pbp_id,1,3)
                     when '003' then 
                           case 
                               when nvl(pd.dedb_stus_indc_2,'x') in ('C5','A4') then  '531107'
                               when nvl(pd.dedb_stus_indc_2,'x') in ('C6','B4') then  '531104'
                               else '531103'
                           end
                     else -- pbp = 002
                           case 
                              when nvl(pd.dedb_stus_indc_2,'x') in ('C6','B4') then  '531102'
                              else '531101'
                           end                          
                end             
      end as Account, 
      '00000' as Cost_Center, '0000' as Cost_Pool, '0000' as Site, '100' as Risk, '070' as Product, '250' as Market, 
      rpad('000',32,'0') as Operations, rpad('000',35,'0') as Project, nvl(rtrim(pd.cja_cd),'0995') as CJA_Cd, '0000' as Future1, '0000' as Future2,
      '0000' as GL_Source, -- need to verify !!!!
      to_char(sysdate,'yyyymmdd') as Process_Dt,  
      sum(pd.tot_bill_paid_amt) - sum(pd.lics_amt) as Paid_Amt,       
      'Rx PBM Claims MEDD' as File_Type
  from medpartd.phar_trans_det pd
  where pd.aud_srce_sys_cd = '212'
   and pd.oracle_gl_prcs_dt is null
  group by 
      case 
           when nvl(pd.lics_copay_ctgy,'x') in ('1', '2', '3', '4') then
                case substr(pd.pbp_id,1,3)
                     when '003' then 
                           case 
                               when nvl(pd.dedb_stus_indc_2,'x') in ('C5','A4') then  '531107'
                               when nvl(pd.dedb_stus_indc_2,'x') in ('C6','B4') then  '531104'
                               else '531103'
                           end
                     else -- pbp = 002
                           case 
                              when nvl(pd.dedb_stus_indc_2,'x') in ('C6','B4') then  '531102'
                              else '531101'
                           end                          
                end
           else 
                case substr(pd.pbp_id,1,3)
                     when '003' then 
                           case 
                               when nvl(pd.dedb_stus_indc_2,'x') in ('C5','A4') then  '531107'
                               when nvl(pd.dedb_stus_indc_2,'x') in ('C6','B4') then  '531104'
                               else '531103'
                           end
                     else -- pbp = 002
                           case 
                              when nvl(pd.dedb_stus_indc_2,'x') in ('C6','B4') then  '531102'
                              else '531101'
                           end                          
                end             
      end, 
      nvl(rtrim(pd.cja_cd),'0995'), to_char(sysdate,'yyyymmdd')      

Open in new window

0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
Comment Utility
Please post some same data and expected results.  I'm still thinking you don't need the union and to access the table three times.

You might be able to get away with the data warehousing window functions and only access the table once.

I need to see actual data and your results from that data before I can say for sure

>>so I assume it gathers all the data into memory and then applies the paid_dt filter ?

The optimizer is 'magic'.  It makes it's own determination on how to attack the query.

Generating the plan helps show you how it does what it does.

explain plan for
select ...--your query

Then to see it:
select * from table(dbms_xplan.display);

>>but I was thinking it would quickly cut the search in half.

If the optimizer thinks it will return something around 20% of the rows, it will opt for a full table scan.

The reason being:  The amoujnt of index blocks it needs to read to get the list of the row id's, then the amount of table blocks to retrieve those rows.

>>also good point, but when it's null that represents 10% of the data, so that's how I figure would be the best way to get my data.

That is fine but I'm pretty sure the "and pd.oracle_gl_prcs_dt is null" cannot use an index.

The execution plan above will confirm.

>>it says "Oracle parallel query error - invalid number"

Let's get the main question answered then we can address this error.
0
 
LVL 76

Accepted Solution

by:
slightwv (䄆 Netminder) earned 250 total points
Comment Utility
OK, I had some time to kill and decided to mock up my own based on your original query.

In doing so, I noticed another issue:  You have no order by on the outer select and no pseudo-column on the unions to force the order.

Without an order by, there is no way to ensure the order.

Here is what I mean by a pseudo column (I reversed the data for grins):
select col1 from (
select 2 sortcol, 'World' col1 from dual
union
select 1 sortcol, 'Hello' col1 from dual
)
order by sortCol

Open in new window


OK, back to your question.

I didn't add the pseudo sort columns since you didn't have them.

If you need each value on a different row, it can still be done but complicates things a little.

If you just need the values, you can remove the unions from my revised query.

Here is a mock-up based on your columns and query above.

Please add to it if necessary to provide a better model.

I added the 'set autotrace' for sqlplus to show not only the results but the execution plan and stats.

drop table tab1 purge;
create table tab1(
	col1 char(1),
	paid_amt number,
	subsidy_amt number,
	aud_srce_sys_cd varchar2(5),
	oracle_gl_prcs_dt char(1),
	lics_copay_ctgy char(1)
);

insert into tab1 values('a',4,1,'212', null,'1');
insert into tab1 values('b',4,1,'212', null,null);
commit;

set autotrace on

--your query returns 3 rows
select col1, sum(paid_amt) - sum(subsidy_amt) as tot_dollars
  from tab1
  where aud_srce_sys_cd = '212'
   	and oracle_gl_prcs_dt is null
   	and lics_copay_ctgy in ('1', '2', '3', '4')   
  group by col1
--
union
select col1, sum(subsidy_amt)
  from tab1
  where aud_srce_sys_cd = '212'
   and oracle_gl_prcs_dt is null
   and lics_copay_ctgy in ('1', '2', '3', '4')   
  group by col1
--
union
--
select col1, sum(paid_amt)
  from tab1
  where aud_srce_sys_cd = '212'
 and oracle_gl_prcs_dt is null
   and nvl(lics_copay_ctgy,'0') =  '0'   
  group by col1
/


--what I was thinking, can be 'better' without the unions.
with mydata as(
select col1,
	sum(paid_amt_1_thru_4) paid_1_4_sum,
	sum(subsidy_amt_1_thru_4) sub_1_4_sum,
	sum(paid_amt_0_copay) sum_copay
from
(
select col1,
	case when lics_copay_ctgy in ('1', '2', '3', '4') then paid_amt end paid_amt_1_thru_4,
	case when lics_copay_ctgy in ('1', '2', '3', '4') then subsidy_amt end subsidy_amt_1_thru_4,
	case when nvl(lics_copay_ctgy,'0') =  '0' then paid_amt end paid_amt_0_copay
from tab1
  where aud_srce_sys_cd = '212' and
	oracle_gl_prcs_dt is null
) group by col1
)
select * from
(
select col1, paid_1_4_sum - sub_1_4_sum myresult from mydata
union
select col1, sub_1_4_sum from mydata
union
select col1, sum_copay from mydata
)
where myresult is not null
/

set autotrace off

Open in new window

0
 
LVL 1

Author Comment

by:Alaska Cowboy
Comment Utility
slightwv, thanks for the review and analysis, I will look at this in detail on Monday.
0
 
LVL 1

Author Comment

by:Alaska Cowboy
Comment Utility
slightwv, thank you so much ! this is a great help, as I have never done "with" before - so that's new, although I have seen others use it, I just never investigated it. I used your model to test, and got the results i am looking for (using the test data).

I made a slight modification, and an extra row of data:

insert into tab1 values('c',10,2,'212',null, '2');

and here's the query, the result is this:
CTGY	MYRESULT
LICS	      3
LICS	       11
non-LICS	  4

with mydata as(
select case 
          when nvl(lics_copay_ctgy,'x') in ('1', '2', '3', '4') then 'LICS'
          else 'non-LICS'
       end as lics_copay_ctgy,
	sum(paid_amt_1_thru_4) paid_1_4_sum,
	sum(subsidy_amt_1_thru_4) sub_1_4_sum,
	sum(paid_amt_0_copay) sum_copay
from
(
select lics_copay_ctgy,
	case when lics_copay_ctgy in ('1', '2', '3', '4') then paid_amt end paid_amt_1_thru_4,
	case when lics_copay_ctgy in ('1', '2', '3', '4') then subsidy_amt end subsidy_amt_1_thru_4,
	case when nvl(lics_copay_ctgy,'0') =  '0' then paid_amt end paid_amt_0_copay
from tab1
  where aud_srce_sys_cd = '212' and
	oracle_gl_prcs_dt is null
) group by case 
              when nvl(lics_copay_ctgy,'x') in ('1', '2', '3', '4') then 'LICS'
              else 'non-LICS'
           end
)
select * from
(
select lics_copay_ctgy, paid_1_4_sum - sub_1_4_sum myresult from mydata
union
select lics_copay_ctgy, sub_1_4_sum from mydata
union
select lics_copay_ctgy, sum_copay from mydata
)
where myresult is not null

Open in new window

0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
Comment Utility
So, do we have a winner or do you need to continue working on this?

Does the WITH that I posted work better than your original?

Do you need the 3 rows of output?  If not, we can probably make it even more efficient by removing the unions.
0
Top 6 Sources for Identifying Threat Actor TTPs

Understanding your enemy is essential. These six sources will help you identify the most popular threat actor tactics, techniques, and procedures (TTPs).

 
LVL 1

Author Comment

by:Alaska Cowboy
Comment Utility
yes, we have a winner ! I do need the three rows so it works perfectly. I will close out shortly. Thanks again !
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
Comment Utility
Cool.  Don't forget about the ORDER issue I mentioned above.

As far as the indexes go, generate the execution plan and look for full table scans on larger tables.

I'm not sure what "nvl(lics_copay_ctgy,'x')" is supposed to do.  I don't see where it would change the output of the case statement.

If you have an index on lics_copay_ctgy the NVL will make it unusable.  If you really need that, you can create a function-based index.
0
 
LVL 1

Author Comment

by:Alaska Cowboy
Comment Utility
nvl(lics_copy_ctgy,'x') handles the situation where lics_copy_ctgy = null, and when that happens, the comparison goes wacky, so I translate null to x to do the comparison in the case statement.
0
 
LVL 1

Author Closing Comment

by:Alaska Cowboy
Comment Utility
really great help !
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
Comment Utility
>>handles the situation where lics_copy_ctgy = null

On an '=' I can see it but you are using IN.

Still don't see where it does anything:

drop table tab1 purge;
create table tab1(lics_copay_ctgy char(1));

insert into tab1 values('1');
insert into tab1 values(null);
insert into tab1 values('6');
commit;

select case when lics_copay_ctgy in ('1','2','3') then 'LICS' else 'non-LICS' end junk from tab1;
select case when nvl(lics_copay_ctgy,'x') in ('1','2','3') then 'LICS' else 'non-LICS' end junk from tab1;

Open in new window

0
 
LVL 1

Author Comment

by:Alaska Cowboy
Comment Utility
>>On an '=' I can see it but you are using IN.
- even with an IN, I thought Oracle couldn't handle comparing NULL to a value . . .

but I ran your test, and I guess I learned something else today . . . nice, thanks.
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
Comment Utility
Oracle cannot perform an equal against a null because nothing is '=' to a null.

An IN list populated with values isn't trying to do 'value = null'.

You cannot do:  value in ('6',null,'5') and have a null value hit in the list.
0
 
LVL 1

Author Comment

by:Alaska Cowboy
Comment Utility
ok, excellent, thanks again !
0

Featured Post

Top 6 Sources for Identifying Threat Actor TTPs

Understanding your enemy is essential. These six sources will help you identify the most popular threat actor tactics, techniques, and procedures (TTPs).

Join & Write a Comment

How to Create User-Defined Aggregates in Oracle Before we begin creating these things, what are user-defined aggregates?  They are a feature introduced in Oracle 9i that allows a developer to create his or her own functions like "SUM", "AVG", and…
How to Unravel a Tricky Query Introduction If you browse through the Oracle zones or any of the other database-related zones you'll come across some complicated solutions and sometimes you'll just have to wonder how anyone came up with them.  …
Via a live example, show how to restore a database from backup after a simulated disk failure using RMAN.
This video explains what a user managed backup is and shows how to take one, providing a couple of simple example scripts.

744 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

18 Experts available now in Live!

Get 1:1 Help Now