help needed with index strategy for nasty Oracle query

I have a fairly nasty query that works great, but I think I will ultimately need indexes.

The query is the join of one nasty query and one medium nasty query. Both tables have about 6M rows, growing at 1M per year. it joins insurance members to their claims.

With about 5M records per table, however, the query still comes back in 55s, reliably.

The underlying analytics view is this

select * from
          (select /*+ parallel(b,8) */ b.*, row_number()
          over (partition by ctrt_no, pbp_id, hicn, servc_prov_id, servc_prov_id_qlfr, prsctn_servc_refnc_no, dos, fill_nbr, dspsg_stus_cd order by pde_trans_dt desc, pde_det_skey desc ) rn
                from medpartd.cgdp_rpt_pde b
                where not (to_char(pde_trans_dt,'mmdd') >= '0501'
                           and to_char(b.pde_trans_dt,'yyyy') > to_char(b.dos,'yyyy'))) curr
          where rn = 1
         )

The query (see below) only uses one parameter, the year of the DOS, so to_char('yyyy').

So I have no idea how Oracle builds this view. And even though the query takes 55s reliably, it just makes me nervous with two fairly big tables and a monster query doing two full table scans.

I was thinking of creating an index on DOS
- that gets me to a group of 1M rows which seems to be manageable.

should the index be DOS or actually "to_char(DOS,'yyyy') ?

If I do an index on DOS, then it wouldn't really matter about other fields, since I'm getting data in every day of the year.

But if I make the index YYYY, then I could append two other fields to the index
- to_char(DOS,'YYYY')
- PBP_ID (there are only 4 values, 001, 002, 003, 004
- HICN (this is like an SSN, and one of the main ways we group data).

Notice in the analytics view above I'm doing an order by on pde_trans_dt, pde_det_skey
/************ big query *****************/
select dos_year, pbp_id, ranges, count(*) as num_members, sum(memb_months) as Member_Months,
            sum(tot_num_scripts) as tot_scripts, sum(tot_allow_dollars) as tot_allow,
            round(sum(tot_allow_dollars)/count(*),2) as Avg_Allow,
            round(sum(member_paid)/count(*),2) as Avg_memb_paid,
            round(sum(cost_sharing)/count(*),2) as Avg_cost_sharing, 
            round(sum(suppl_cost_sharing)/count(*),2) as Avg_suppl_cost_sharing, 
            round(sum(reimb_lics)/count(*),2) as Avg_LICS_Reimb, 
            round(sum(reimb_fed_reins)/count(*),2) as Avg_fed_reins_reimb,            
            sum(part_d_scdndry) as Plus_Part_D_Scndry  
     from
     (
      select /*+ parallel(td,4) parallel(mm,4) */ 
             td.dos_year, td.pbp_id, mm.Memb_months, td.tot_num_scripts, tot_allow_dollars, member_paid,
               cost_sharing, suppl_cost_sharing, reimb_lics, reimb_fed_reins, part_d_scdndry,
            case when (td.tot_allow_dollars > 0 and td.tot_allow_dollars < &range1_high) then 'range1'
                    when (td.tot_allow_dollars >= &range1_high and td.tot_allow_dollars < &range2_high) then 'range2'
                    when (td.tot_allow_dollars >= &range2_high and td.tot_allow_dollars < &range3_high) then 'range3'
                    when td.tot_allow_dollars >= &range3_high then 'range4'
               end ranges
              from (
              -- start of HICN level here
              select dos_year, pbp_id, hicn, sum(tot_scripts) as tot_num_scripts,
                    sum(a.tot_allow_dollars) as tot_allow_dollars,
                    sum(a.member_paid_amt) as member_paid,
                    sum(a.Avg_Cost_Sharing) as Cost_Sharing,
                    sum(a.Suppl_Cost_Sharing) as Suppl_Cost_Sharing,
                    sum(a.Reimb_LICS) as Reimb_LICS,
                    sum(a.Reimb_Fed_Reins) as Reimb_Fed_Reins,
                    sum(case a.prcg_exctn_cd
                         when 'M' then Part_D_Scdndry
                         else 0
                    end) as Part_D_Scdndry
             from
             (
             select cu.pbp_id, to_char(cu.dos,'yyyy') as DOS_Year, cu.hicn, cu.prcg_exctn_cd, cu.catas_cov_cd, cu.adjtmt_deln_cd,
             sum(nvl(cu.ingrdt_cost_paid_amt,0) + nvl(cu.dspsg_fee_paid_amt,0) + nvl(cu.tot_atbrtd_sales_tax_amt,0) + nvl(cu.vccn_admn_fee_amt,0)) as tot_allow_dollars,
             sum(case
                when nvl(cu.ingrdt_cost_paid_amt,0) + nvl(cu.dspsg_fee_paid_amt,0)  + nvl(cu.tot_atbrtd_sales_tax_amt,0)  + nvl(cu.vccn_admn_fee_amt,0)  = 0 then 0
                else 1
             end) as tot_scripts,
             case nvl(cu.prcg_exctn_cd,'x')
                  when 'M' then sum(nvl(cu.covr_d_plan_paid_amt,0) + nvl(cu.non_covr_plan_paid_amt,0) + nvl(cu.low_inc_cost_sharg_subsdy_amt,0))
                  else 0
             end as Part_D_Scdndry,
             sum(nvl(cu.covr_d_plan_paid_amt,0) + nvl(cu.non_covr_plan_paid_amt,0) + nvl(cu.low_inc_cost_sharg_subsdy_amt,0)) as member_paid_amt,
             sum(nvl(cu.patn_pay_amt,0) + nvl(cu.oth_troop_amt,0) + nvl(cu.patn_liab_redcn_oth_payer_amt,0)) as Avg_Cost_Sharing,
             sum(nvl(cu.non_covr_plan_paid_amt,0)) as Suppl_Cost_Sharing,
             sum(nvl(cu.low_inc_cost_sharg_subsdy_amt,0)) as Reimb_LICS,
             case
                 when nvl(cu.catas_cov_cd,'x') in ('A','C') then round((.8) * sum(nvl(cu.above_oop_tshld_gros_drug_cost,0)),2)
                 else 0
             end as Reimb_Fed_Reins,
             case ERR_CNT
                  when 0 then 'ok'
                  else
                     case
                       when ERR_1_QLFR in ('708', '694', '700', '705', '706', '707', '738', '740', '762') or
                            ERR_2_QLFR in ('708', '694', '700', '705', '706', '707', '738', '740', '762') or
                            ERR_3_QLFR in ('708', '694', '700', '705', '706', '707', '738', '740', '762') or
                            ERR_4_QLFR in ('708', '694', '700', '705', '706', '707', '738', '740', '762') or
                            ERR_5_QLFR in ('708', '694', '700', '705', '706', '707', '738', '740', '762') or
                            ERR_6_QLFR in ('708', '694', '700', '705', '706', '707', '738', '740', '762') or
                            ERR_7_QLFR in ('708', '694', '700', '705', '706', '707', '738', '740', '762') or
                            ERR_8_QLFR in ('708', '694', '700', '705', '706', '707', '738', '740', '762') or
                            ERR_9_QLFR in ('708', '694', '700', '705', '706', '707', '738', '740', '762') or
                            ERR_10_QLFR in ('708', '694', '700', '705', '706', '707', '738', '740', '762')
                            then 'exclude'
                       else 'ok'
                     end 
             end as P2P_Code             
        from 
        (select "PDE_DET_SKEY","AUD_INSRT_ID","AUD_INSRT_TMSTP","AUD_UPDT_ID","AUD_UPDT_TMSTP","AUD_HASH_CD","AUD_SRCE_SYS_CD","ETL_LOAD_FILE_ID","REC_ID","SEQ_NO","CTRT_NO","PBP_ID","CLM_CNTL_NBR","PDE_TRANS_DT","HICN","CRDHLDR_ID","PATN_DOB","PATN_GNDR_CD","DOS","PAID_DT","PRSCTN_SERVC_REFNC_NO","PRODT_SERVC_ID","SERVC_PROV_ID_QLFR","SERVC_PROV_ID","FILL_NBR","DSPSG_STUS_CD","CPD_CD","DSPS_AS_WRTN_PRODT_SELN_CD","DSPSD_QTY","DAYS_SUPL_QTY","PRSBR_ID_QLFR","PRSBR_ID","DRUG_COV_STUS_CD","ADJTMT_DELN_CD","NON_STD_FRMT_CD","PRCG_EXCTN_CD","CATAS_COV_CD","INGRDT_COST_PAID_AMT","DSPSG_FEE_PAID_AMT","TOT_ATBRTD_SALES_TAX_AMT","BELOW_OOP_TSHLD_GROS_DRUG_COST","ABOVE_OOP_TSHLD_GROS_DRUG_COST","PATN_PAY_AMT","OTH_TROOP_AMT","LOW_INC_COST_SHARG_SUBSDY_AMT","PATN_LIAB_REDCN_OTH_PAYER_AMT","COVR_D_PLAN_PAID_AMT","NON_COVR_PLAN_PAID_AMT","POS_ESTM_REBT_AMT","VCCN_ADMN_FEE_AMT","PRSCTN_ORIGN_CD","ORIG_CLM_RECVD_DT","CLM_ADJDN_BEGAN_TMSTP","TOT_GROS_COV_DRUG_COST_ACM_AMT","TRUE_OOP_ACMLTR_AMT","BRND_GENRC_CD","BEG_BENE_PHSE_INDC","ENDG_BENE_PHSE_INDC","RPTD_GAP_DISC_AMT","TIER","FORMAY_CD","GAP_DISC_PLAN_OVERR_CD","CMS_CALCD_GAP_DISC_AMT","PBP_OF_REC","ALTN_SERVC_PROV_ID_QLFR","ORIG_SBMTG_CTRT","P2P_CTRT_OF_REC","CORRED_HICN_ID","ERR_CNT","ERR_1_QLFR","ERR_2_QLFR","ERR_3_QLFR","ERR_4_QLFR","ERR_5_QLFR","ERR_6_QLFR","ERR_7_QLFR","ERR_8_QLFR","ERR_9_QLFR","ERR_10_QLFR","EXCLN_REAS_CD","SUBGRP_ID","ALTN_SERVC_PROV_ID","PDE_BODY_HDR_SKEY","RN" from
          (select /*+ parallel(b,8) */ b.*, row_number()
          over (partition by ctrt_no, pbp_id, hicn, servc_prov_id, servc_prov_id_qlfr, prsctn_servc_refnc_no, dos, fill_nbr, dspsg_stus_cd order by pde_trans_dt desc, pde_det_skey desc ) rn
                from medpartd.cgdp_rpt_pde b
                where not (to_char(pde_trans_dt,'mmdd') >= '0501'
                           and to_char(b.pde_trans_dt,'yyyy') > to_char(b.dos,'yyyy'))) curr
          where rn = 1
         ) cu  
        group by cu.hicn, cu.prcg_exctn_cd, cu.catas_cov_cd, to_char(cu.dos,'yyyy'), pbp_id, cu.adjtmt_deln_cd,
              case ERR_CNT
                  when 0 then 'ok'
                  else
                     case
                       when ERR_1_QLFR in ('708', '694', '700', '705', '706', '707', '738', '740', '762') or
                            ERR_2_QLFR in ('708', '694', '700', '705', '706', '707', '738', '740', '762') or
                            ERR_3_QLFR in ('708', '694', '700', '705', '706', '707', '738', '740', '762') or
                            ERR_4_QLFR in ('708', '694', '700', '705', '706', '707', '738', '740', '762') or
                            ERR_5_QLFR in ('708', '694', '700', '705', '706', '707', '738', '740', '762') or
                            ERR_6_QLFR in ('708', '694', '700', '705', '706', '707', '738', '740', '762') or
                            ERR_7_QLFR in ('708', '694', '700', '705', '706', '707', '738', '740', '762') or
                            ERR_8_QLFR in ('708', '694', '700', '705', '706', '707', '738', '740', '762') or
                            ERR_9_QLFR in ('708', '694', '700', '705', '706', '707', '738', '740', '762') or
                            ERR_10_QLFR in ('708', '694', '700', '705', '706', '707', '738', '740', '762')
                            then 'exclude'
                       else 'ok'
                     end 
              end) a 
        where dos_year = '&year_yyyy'         
         and nvl(adjtmt_deln_cd,'x') <> 'D' 
         and p2p_code = 'ok'
              group by hicn, dos_year, pbp_id -- end of query to get data by HICN
              ) td
                  left join medpartd.vw_member_months_b4_may mm
                        on mm.hic_nbr = td.hicn
                        and mm.Effective_year = td.dos_year
                        and mm.Plan_id = td.pbp_id
             where dos_year = '&year_yyyy'                       
      ) a      
      group by dos_year, pbp_id, ranges 
      order by pbp_id, ranges

Open in new window

LVL 1
Alaska CowboyAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Alaska CowboyAuthor Commented:
Explain plan below
Description                          Owner       Object Name    Cost           Bytes Card.                                    
SELECT STATEMENT, GOAL = CHOOSE			                26946	3	471
 PX COORDINATOR					
  PX SEND QC (ORDER)	              SYS	:TQ10006	26946	3	471
   SORT GROUP BY			                        26946	3	471
    PX RECEIVE			                                26946	3	471
     PX SEND RANGE	             SYS	:TQ10005	26946	3	471
      SORT GROUP BY			                        26946	3	471
       HASH JOIN RIGHT OUTER			                26945	5926	930382
        PX RECEIVE			                        11958	248	8184
         PX SEND BROADCAST	     SYS	:TQ10003	11958	248	8184
          VIEW	                     MEDPARTD	VW_MEMBER_MONTHS_B4_MAY	11958	248	8184
           HASH GROUP BY		                  	11958	248	13888
            PX RECEIVE			                        11957	248	13888
             PX SEND HASH	      SYS	:TQ10001	11957	248	13888
              PX BLOCK ITERATOR		                 	11957	248	13888
               TABLE ACCESS FULL	MEDPARTD	CGDP_RPT_MTHLY_MEMBSHP_DET	11957	248	13888
        VIEW	BPECK		                                14986	5926	734824
         HASH GROUP BY			                        14986	5926	965938
          PX RECEIVE			                        14986	5926	965938
           PX SEND HASH	               SYS	 :TQ10004	14986	5926	965938
            HASH GROUP BY			                14986	5926	965938
             VIEW	                BPECK	         	14986	5926	965938
              HASH GROUP BY			                 14986	5926	1262238
               PX RECEIVE			                14986	5926	1262238
                PX SEND HASH	       SYS	:TQ10002	14986	5926	1262238
                 HASH GROUP BY		                 	14986	5926	1262238
                  VIEW	BPECK		                        14985	5926	1262238
                   WINDOW SORT PUSHED RANK			14985	5926	835566
                    PX RECEIVE			                14985	5926	835566
                     PX SEND HASH	SYS	:TQ10000	14985	5926	835566
                      WINDOW CHILD PUSHED RANK			14985	5926	835566
                       PX BLOCK ITERATOR			14984	5926	835566
                        TABLE ACCESS FULL	MEDPARTD	CGDP_RPT_PDE	14984	5926	835566

Open in new window

0
Christoffer SwanströmPartnerCommented:
Without having understood all the details of your query I have the impression that there's no great need for optimization here. Full table scans are not inherently bad, in most cases where you need to access a large fraction of the total amount in a table a full table scan is actually faster than access using an index. If your query

select * from
          (select /*+ parallel(b,8) */ b.*, row_number()
          over (partition by ctrt_no, pbp_id, hicn, servc_prov_id, servc_prov_id_qlfr, prsctn_servc_refnc_no, dos, fill_nbr, dspsg_stus_cd order by pde_trans_dt desc, pde_det_skey desc ) rn
                from medpartd.cgdp_rpt_pde b
                where not (to_char(pde_trans_dt,'mmdd') >= '0501'
                           and to_char(b.pde_trans_dt,'yyyy') > to_char(b.dos,'yyyy'))) curr
          where rn = 1
         )

returns less than about 5% or the total rows in the table medpartd.cgdp_rpt_pde then you could consider adding some indexes. Otherwise you could think about partitiniong the table on e.g. dos or pde_trans_dt.
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
slightwv (䄆 Netminder) Commented:
I second the thought that full table scans (FTS) are not always bad.  You know your data and cardinality.

I would break this large query into the smallest pieces and look at those first.  Then slowly add them back in until you have nothing else left to tune.

Keying in on the same query that tosse did.  You have to_char on the pde_trans_dt column.  Using a function on an indexed column will not allow the index to be used.

What indexes do you currently have on cgdp_rpt_pde?

If you are not licensed for partitioning maybe a Function Based Index if you cannot rewrite the SQL to not use TO_CHAR.
0
Big Business Goals? Which KPIs Will Help You

The most successful MSPs rely on metrics – known as key performance indicators (KPIs) – for making informed decisions that help their businesses thrive, rather than just survive. This eBook provides an overview of the most important KPIs used by top MSPs.

Alaska CowboyAuthor Commented:
tosse,

the query you reference returns about 75% of the table, but it has to sort the entire table to pull out the relevant data.

the analytics component takes (for example) three transactions for the same claim, then picks off the most recent one (based on pde_trans_dt and the tiebreaker is seqno). So it's reading the entire d.b.

the not (to_char(pde_trans_dt,'mmdd') >= '0501' and to_char(b.pde_trans_dt,'yyyy') > to_char(b.dos,'yyyy'))' component
throws out transactions received May 1 or after where the PDE_trans_dt is in the year after the DOS.

So my thought on to_char(DOS,'yyyy') as the index as this would allow Oracle to just go straight to one year's worth of data.

slightvw,
>>I would break this large query into the smallest pieces and look at those first.
- yes, that's what I've posted here, the smallest piece, the analytics component

>>Using a function on an indexed column will not allow the index to be used.
- but can't I have a "function index", e.g., to_char(DOS,'yyyy') and then if my query uses the same function the index would be ok ?

>>What indexes do you currently have on cgdp_rpt_pde?
- only one that's not a seqno or FK is
- HICN
- SUBGRP_ID
- DOS

which is used for another program and as you see in the Explain Plan I'm doing two FTS.

>>If you are not licensed for partitioning
- we can do partitioning so that's an option.

I wanted to do my homework before going to the DBA.

Thanks to both.
0
slightwv (䄆 Netminder) Commented:
>>the query you reference returns about 75% of the table

Oracle will likley not use an index even if one exists.  It will almost always be 'cheaper' to do a FTS.

>>we can do partitioning so that's an option.

I'm not an expert on partitioning so you will need to test it but even if you partition on date, you might not benefit by using TO_CHAR on the partition column unless you range partition on to_char(dos,'yyyy').

>>but can't I have a "function index", e.g., to_char(DOS,'yyyy') and then if my query uses the same function the index would be ok ?

You will need to test it.  If you will still hit 75% of the rows then I doubt the index will provide any performance gains.

Do you have a development machine where you can test different scenarios?


0
Alaska CowboyAuthor Commented:
slightwv, ok, thanks. we do have a dev machine to test out.

Like I said, right now it's coming back in 1 minute, reliably, but I am just nervous about it going to 10-20 minutes or longer, which would be unacceptable.

I will do some testing and post back.
0
LowfatspreadCommented:
it may help if you can re-write this bit to NOT USE the not

ie can you actually give it a between date range... or specify the limit e.g 20120430  ?
0
Alaska CowboyAuthor Commented:
Low,

the reason for the NOT is it is actually what's happening in the simplest form.

I have a sample of avoiding the NOT below, but I suppose there are other ways to accomplish it - mine is brute force.

Plus, I can't give it a date range, as the group of transactions received on or after May 1 (where DOS's in the prev year) should be thrown out, for each year that is being considered.

for example, a claim might have three adjustments, say

DOS       Amt         Trans Dt
12-05-2010     50.00        12-07-2010
12-05-2010     55.00        01-07-2011
12-05-2010     45.00        05-07-2011

and I DON'T want the record after May. This applies for multiple years, hence my logic:
not (to_char(pde_trans_dt,'mmdd') >= '0501'
                           and to_char(b.pde_trans_dt,'yyyy') > to_char(b.dos,'yyyy'))

to avoid the NOT, it would have to be
where (to_char(pde_trans_dt,'mmdd') <= '0430'
           and to_char(b.pde_trans_dt,'yyyy') > to_char(b.dos,'yyyy'))  <-- records received before May 1 for last year's DOS's
           or
           (to_char(b.pde_trans_dt,'yyyy') = to_char(b.dos,'yyyy')) <-- records where DOS.year = Trans_Dt.Year



0
LowfatspreadCommented:
in general NOT should be avoided as it makes it more difficult for an efficient plan to be formed...

however it may make it easier for the sql to be maintained/understood.... especially if a tablescan is likely to be the most efficient method anyway.
0
Alaska CowboyAuthor Commented:
Low, ok, I will keep this in mind as I continue to do performance assessment, thanks.
0
Alaska CowboyAuthor Commented:
thank you for the tips and discussion. I will add a few indexes and see what happens.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Oracle Database

From novice to tech pro — start learning today.