How do I tune a sql query?

Below query is running for a long time. It usually runs 3 hours or so. Do I need to add more indexes? Do I need to add hints? Please advise how I tune this query.

delete from tmp_tlf_app_data_stg ttads
where exists (select session_key, hit_key, partition_day, name, value
              from tlf_app_data_fct tadf
              where upper(trim(ttads.session_key)) = upper(trim(tadf.session_key))
              and upper(trim(ttads.hit_key)) = upper(trim(tadf.hit_key))
              and trim(ttads.partition_day) = trim(tadf.partition_day)
              and upper(trim(ttads.name)) = upper(trim(tadf.name))
              and upper(trim(ttads.value)) = upper(trim(tadf.value)))


Just a note,

--tmp_tlf_app_data_stg table has more than 23 millions of records. It's a daily stage table
  and number of records varies everyday
--Composite index was created on tmp_tlf_app_data_stg (session_key, partition_day)
--tlf_app_data_fct table has more than 244.5 millions of records. This table will grow significantly
--Composite index was created on tlf_app_data_fct (session_key, partition_day) and a non-unique index
  was created on tlf_app_data_fct (partition_day)
--Not sure whether indexes are used in a above query, for example in a filter condition
  "where upper(trim(ttads.session_key)) = upper(trim(tadf.session_key))" was used but index was
  not built using upper(trim(session_key))
--I checked explain plain, it looks like it is doing full table scan
IT_ETLAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

sdstuberCommented:
do you need to do the UPPER(TRIM())   ?

if not, remove it.
your indexes simply don't apply to this query with those conditions


If it's required, then you'll need function based indexes to support it.
0
Guy Hengel [angelIII / a3]Billing EngineerCommented:
you CAN use function-based index, including the UPPER(TRIM()) part to support this.
but normally, with the columns indicated, the values should not require this handling.

if they are mixed case, you should think about getting them "uniformized" during the normal process, so they are all upper/lower case in the same way everywhere.

for the rest, you will need a (one single) index with eventually all the columns used in the WHERE clause, the most limiting fields first

"most limiting" means the one with the most distinct values
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
IT_ETLAuthor Commented:
"for the rest, you will need a (one single) index with eventually all the columns used in the WHERE clause, the most limiting fields first"

I can remove upper(trim()) part since transformation is used to trim and to covert data from lowercase to uppercase before populating data to stage and fact tables. A single field SESSION_KEY is mostly used in WHERE clause.

Let's say - SESSION_KEY and HIT_KEY fields are unique on  tmp_tlf_app_data_stg  table. But no primary key or unique index are defined.

Question - Should I create a single index on  tmp_tlf_app_data_stg  (SESSION_KEY) or should I create a composite index on  tmp_tlf_app_data_stg  (SESSION_KEY, HIT_KEY)?

Also, if I would have created index using upper(trim()) functions and then use this index field on WHERE clause, would it help.

For example,

create index idx_tmp_tlf_app_data_stg on tmp_tlf_app_data_stg (upper(trim(session_key)), upper(trim(hit_key)))

select *
from ..........
where upper(trim(ttads.session_key)) = upper(trim(tadf.session_key))
0
Determine the Perfect Price for Your IT Services

Do you wonder if your IT business is truly profitable or if you should raise your prices? Learn how to calculate your overhead burden with our free interactive tool and use it to determine the right price for your IT services. Download your free eBook now!

sdstuberCommented:
create index idx_tmp_tlf_app_data_stg on tmp_tlf_app_data_stg (upper(trim(session_key)))

select *
from ..........
where upper(trim(ttads.session_key)) = upper(trim(tadf.session_key))


this would be the ideal usage if you can't guarantee the data will already be uniform
note, I took out the hitkey because it's not part of your search criteria
0
sdstuberCommented:
also we're assuming  tlf_app_data_fct will be narrow search on  tmp_tlf_app_data_stg

if that's not the case, then an index won't help, and in fact, might hurt.

How many data blocks will be visited by the delete?
How many total data blocks are there?

This might help explain why an index may do more harm than good

http://www.experts-exchange.com/Database/Oracle/A_8393-Why-does-Oracle-ignore-my-index.html
0
PortletPaulfreelancerCommented:
tiny observation:

when using EXISTS you do not need to return multiple fields in the select clause

where exists (select session_key, hit_key, partition_day, name, value ...
I would simply suggest returning the constant 1

where exists (select 1 ...
0
IT_ETLAuthor Commented:
Thanks PortletPaul!

I have created following indexes and above query is running little faster but not as fast as I would like to see.

Index on stage table,

CREATE INDEX ADWSTG.IDX_TMP_TLF_APP_DATA_STG_01 ON ADWSTG.TMP_TLF_APP_DATA_STG (SESSION_KEY)

Indexes on fact table,

CREATE INDEX WEBRPT.IX_TLF_APP_DATA_FCT_01 ON WEBRPT.TLF_APP_DATA_FCT
(SESSION_KEY, PARTITION_DAY)

CREATE INDEX WEBRPT.IDX_TLF_APP_DATA_FCT_02 ON WEBRPT.TLF_APP_DATA_FCT
(SESSION_KEY)



Since HIT_KEY and PARTITION_DAY fields are also used in WHERE clause, so should I create indexes on these fields as well. Would it help?

On stage table,

CREATE INDEX ADWSTG.IDX_TMP_TLF_APP_DATA_STG_02 ON ADWSTG.TMP_TLF_APP_DATA_STG (HIT_KEY)

CREATE INDEX ADWSTG.IDX_TMP_TLF_APP_DATA_STG_03 ON ADWSTG.TMP_TLF_APP_DATA_STG (PARTITION_DAY)

On fact table,

CREATE INDEX WEBRPT.IDX_TLF_APP_DATA_FCT_03 ON WEBRPT.TLF_APP_DATA_FCT
(HIT_KEY)

CREATE INDEX WEBRPT.IDX_TLF_APP_DATA_FCT_04 ON WEBRPT.TLF_APP_DATA_FCT
(PARTITION_DAY)

Not sure creating more single indexes on both tables will help. Please advise.
0
sdstuberCommented:
are your still doing the upper() and trim()  ?  If so, these indexes aren't going to be of any use
any apparent performance gains are more likely due to caching from repeated attempts than these indexes which don't apply to the query as posted.

do you know how to use dbms_xplan?

explain plan for ----your select goes here----;

select * from table(dbms_xplan.display);

Open in new window



copy-paste the results here.

please don't post screen captures or other images of plans, the text plan from dbms_xplan is best
0
IT_ETLAuthor Commented:
I am not using upper(trim()) function to create indexes on stage and fact tables.

Index on stage table,

CREATE INDEX ADWSTG.IDX_TMP_TLF_APP_DATA_STG_01 ON ADWSTG.TMP_TLF_APP_DATA_STG (SESSION_KEY)

Indexes on fact table,

CREATE INDEX WEBRPT.IX_TLF_APP_DATA_FCT_01 ON WEBRPT.TLF_APP_DATA_FCT
(SESSION_KEY, PARTITION_DAY)

CREATE INDEX WEBRPT.IDX_TLF_APP_DATA_FCT_02 ON WEBRPT.TLF_APP_DATA_FCT
(SESSION_KEY)


SQL Statement,

delete from adwstg.tmp_tlf_app_data_stg ttads
where exists (select session_key, hit_key, partition_day, name, value
              from webrpt.tlf_app_data_fct tadf
              where ttads.session_key = tadf.session_key
              and ttads.hit_key = tadf.hit_key
              and ttads.partition_day = tadf.partition_day
              and ttads.name = tadf.name
              and ttads.value = tadf.value)

--tmp_tlf_app_data_stg table has more than 4 plus millions of records. Number of records
   count varies everyday
--tlf_app_data_fct table has more than 244.5 millions of records. This table will grow.

Explain plan for above statement,

DELETE STATEMENT  ALL_ROWS
Cost: 842,259  Bytes: 1,002,132,800  Cardinality: 5,010,664                    
    4 DELETE ADWSTG.TMP_TLF_APP_DATA_STG             
          3 HASH JOIN SEMI  Cost: 842,259  Bytes: 1,002,132,800  Cardinality: 5,010,664  

1  TABLE ACCESS FULL TABLE ADWSTG.TMP_TLF_APP_DATA_STG Cost: 14,996  Bytes: 501,066,400  Cardinality: 5,010,664  
2 TABLE ACCESS FULL TABLE WEBRPT.TLF_APP_DATA_FCT Cost: 260,661  Bytes: 8,737,752,300  Cardinality: 87,377,523
0
Guy Hengel [angelIII / a3]Billing EngineerCommented:
if your statment is giving you full table scans, you have eventually outdated statistics

is the fact table partitioned (on partition_day field) ?
if you want to compare the stage table to exactly 1 partition, you might consider specifying that?

for example, I have partitioned tables with partition names like this:
from webrpt.tlf_app_data_fct tadf partition("2013_10_01")
0
IT_ETLAuthor Commented:
"is the fact table partitioned (on partition_day field) ?
if you want to compare the stage table to exactly 1 partition, you might consider specifying that?"

Table is not partitioned.

Understood your comment................the problem is I have to compare stage table (stage table can have single/multiple partition days) with all the records in fact table not just 1 partition in a fact table. The purpose of doing this is to delete duplicate records.
0
sdstuberCommented:
selecting values in your exists clause doesn't really add value, the optimizer should be excluding those.  If not, or simply to reduce the clutter, just select NULL in the exists.

DELETE FROM adwstg.tmp_tlf_app_data_stg ttads
      WHERE EXISTS
                (SELECT NULL
                   FROM webrpt.tlf_app_data_fct tadf
                  WHERE ttads.session_key = tadf.session_key
                    AND ttads.hit_key = tadf.hit_key
                    AND ttads.partition_day = tadf.partition_day
                    AND ttads.name = tadf.name
                    AND ttads.VALUE = tadf.VALUE);

using an index like this would be ideal

CREATE INDEX webrpt.ix_tlf_app_data_fct_stg_delete
    ON webrpt.tlf_app_data_fct(
        session_key,
        hit_key,
        partition_day,
        name,
        VALUE
    );

creating lots of single-column indexes won't help much because most of the time you'll only be able to use one index.  It is possible for the optimizer to join indexes prior to accessing a table, but when it does that, it's because it can't find the index it really needs.
That's an expensive operation too.
0
IT_ETLAuthor Commented:
Thank you all for posting your comments and suggestions.
0
sdstuberCommented:
why the penalty grade?
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Oracle Database

From novice to tech pro — start learning today.