Link to home
Start Free TrialLog in
Avatar of Guy Hengel [angelIII / a3]
Guy Hengel [angelIII / a3]Flag for Luxembourg

asked on

tuning a query over DBLINK

Input:
  3 Servers, all with Oracle 9.2
  * PROD    --- live production server
  * STANDBY  --- standby database, updated every night by applying the logs, the rest of the day it is open as read_only
  * DW   --- a database that reports using data on the main database.

On PROD (and hence by STANDBY), there is a "big" table (10M rows), partitionned on a time field (creationtime).
+ index on some_field

I have a query that queries this table with a range condition on PROD/STANDBY:

select * from big_table where creationtime >= x and creationtime < y and some_field in ( list of values ) ;
returns in 8:35 minutes (0 rows)

I run the same query over DBLINK:
select * from big_table@PROD where creationtime >= x and creationtime < y and some_field in ( list of values ) ;
returns in 1:44:32 minutes (0 rows)

what could be the cause of this ?
what could be done to optimize this query over the dblink?

IMPORTANT NOTES:
* I cannot add any view, functions, MViews or anything else to the PROD database. It's a custom application database, I have no change permissions to it at all.
* I tried to add the following hints with no (positive) effect:
  - driving_site(big_table)  
  - parallel  
  - all_rows
  - index ( myindex )

Thanks

ASKER CERTIFIED SOLUTION
Avatar of pbocanegra
pbocanegra

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Guy Hengel [angelIII / a3]

ASKER

@pbocanegra:
tnsping only reflects the time to connect, not the query performance as far as I know. if I am wrong there, let me read some article, please.
anyhow:

>The SORT Area
in how far the sort area is considered if there is no ORDER BY or GROUP BY in the query?

> Temporary tablespace
has 6 GB of which 1.5 are free. should be fine so far.

@paquicuba:
I tried with both with and without (additional) index hint, no change.

the explain plan, when run on the PROD, uses the INDEX
the explain plan, when run on the DW, only shows FILTER on REMOTE table
Which user do you use in the DBLINK. Is it the same user that you use when running the query directly on PROD or a different user
same user/schema.
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
>Can you use following meothod in order to get explain plan?
nice information, good to know.

however, this output confirms that the good index (Index Range Scan) of the table is used.

>are the x & y in your query bind variables.
I use simple variables in the code, but for testing I use litteral values (both same execution time)
If the explain plan is same, then the most probably the problem is in the underlying network. Check the wait events for the session. Check specifically for the waits events like 'SQL*Net .....' wait events. If you are getting high values for these events then the problem would be within the underlying network.

Another test that you can run is
On DW system
start sql*plus and connect to the PROD instance using the same tnsnames entry that your DBLINK uses.
Run the query and check the performance.

If it is a pure network problem, then this query performance should also be slow.

Here a report of the wait events, seems to confirm your last guess goes into the right direction:

now, what part to check first, the SQL*Net seconds waiting (1/820) seconds in average, or rather the latch free timeouts for example?

Ma Avr 04                                                                        page
                                System-wide Wait Analysis
                                 for current wait events

                                                                    Average
Event                                 Total  Seconds        Total      Wait
Name                                  Waits  Waiting     Timeouts (in secs)
------------------------------ ------------ -------- ------------ ---------
SQL*Net more data from dblink    82,052,571  115,298            0      .000
db file sequential read          61,054,126   74,980            0      .000
SQL*Net message to dblink        56,003,325      560            0      .000
SQL*Net message from dblink      56,003,324  453,138            0      .010
db file scattered read           52,383,908   93,782            0      .000
direct path read                 11,853,067  417,496            0      .040
log file parallel write           3,022,317   51,753            0      .020
direct path write                 1,422,830  310,557            0      .220
log file sync                     1,319,468   36,425            7      .030
PX Deq: Table Q Normal            1,189,847   24,136        1,019      .020
latch free                          764,165   29,369      673,500      .040
control file sequential read        341,706      209            0      .000
control file parallel write         308,090   28,788            0      .090
PX Deq: Execute Reply               270,819    6,080          937      .020
buffer busy waits                   198,275   20,527          605      .100
PX Deq Credit: send blkd            142,000    5,506          511      .040
db file parallel write               89,905  124,229            0     1.380
free buffer waits                    89,485   86,982       86,507      .970
PX qref latch                        83,901    1,334       79,869      .020
direct path read (lob)               56,558      360            0      .010
PX Deq: Execution Msg                38,633    5,034        2,089      .130
wakeup time manager                  30,527  887,645       30,507    29.080
LGWR wait for redo copy              27,439      121        6,094      .000
PX Deq Credit: need buffer           12,511      747          140      .060
PX Deq: Table Q qref                 12,452      123           44      .010
rdbms ipc reply                      11,562    4,304        1,400      .370
enqueue                               9,742   24,246        8,021     2.490
write complete waits                  8,326    8,133        8,122      .980
SQL*Net more data from client         7,636       19            0      .000
jobq slave wait                       4,824   14,045        4,687     2.910
log buffer space                      3,882      875            1      .230
library cache pin                     2,167    5,243        1,773     2.420
SQL*Net more data to dblink           2,068        1            0      .000
PX Deq: Signal ACK                    1,658       52          852      .030
single-task message                   1,454      130            0      .090
library cache lock                    1,358    3,959        1,343     2.920
PX Deq: Msg Fragment                  1,165       16            3      .010
PX Deq: Parse Reply                   1,010      110           44      .110
undo segment extension                  498        0          498      .000
local write wait                        421      195          153      .460
async disk IO                           400       61            0      .150
process startup                         275       32            0      .120
db file parallel read                   239       32            0      .140
PX Deq: Join ACK                        233        1            0      .000
SQL*Net break/reset to dblink           198        0            0      .000
wait for stopper event to be i          120       13          117      .100
ncreased

PX Deq: Table Q Sample                  116        2            0      .010
log file switch completion              115       67           32      .590
row cache lock                           89      181           59     2.030
db file single write                     67        4            0      .060
log file sequential read                 54        0            0      .010
log file single write                    54        1            0      .020
wait list latch free                     52        1            0      .020
inactive session                         45       44           45      .990
PX Deq: Table Q Get Keys                 42        1            0      .020
library cache load lock                  20        2            0      .100
PX Deq: Txn Recovery Start                5        0            0      .010
reliable message                          3        3            2      .930
buffer deadlock                           2        0            2      .000
refresh controlfile command               1        0            0      .000
control file heartbeat                    1        4            1     3.910

61 ligne(s) s&#920;lectionn&#920;e(s).
Hi

following is the "SQL*Net message from dblink" Reference Note from Metalink. The note id is 34396.1. See if it can help you in isolating the problem

HTH
Vishal

_________START OF THE REFERENCE NOTE_________________-

This is a reference note for the wait event "SQL*Net message from dblink" which includes the following subsections:
Brief definition
Individual wait details (eg: For waits seen in <View:V$SESSION_WAIT>)
Systemwide wait details (eg: For waits seen in <View:V$SYSTEM_EVENT>)
Reducing waits / wait times
See Note 61998.1 for an introduction to Wait Events.

Definition:
Versions:7.0 - 9.2 Documentation: 9.0
The Oracle shadow process is waiting for a message over a database link from a remote process. Note that this wait is also used when waiting for data from "extproc" or from a remote gateway process.
Individual Waits:
  Parameters:
P1 = driver id
P2 = bytes#
P3 = Not used
driver id
In Oracle8i onwards P1RAW can be decoded into ASCII characters to give a clue as to which Net driver is used.
Eg: P1RAW=0x62657100 = 'beq\0' , P1RAW=0x54435000 = 'TCP\0' etc.
In earlier releases the value here is the value of the disconnect function of the Net driver being used (which is not much use).

bytes# (number of bytes received)
The number of bytes we need to receive. (This is often just 1 even though the expected packet will be much larger)
  Wait Time:
This wait blocks until a message is received from the remote connection (or until an abnormal end of file condition occurs on the underlying Net transport layer). There is no Oracle timeout on the wait.
  Finding Blockers:
The blocker is the network plus the remote process. If the remote process is another database instance (accessed via a database link) then the information in <View:V$SESSION> on the REMOTE database will have the same client details as the V$SESSION entry on the local database for the blocked session. You need to look at the remote instance to determine where that session is spending time.
Systemwide Waits:
If systemwide waits for this event are significant it is best to determine where the remote connections are to and switch attention to the remote instance / instances to determine where they are spending time. One can also look at:
Sessions with high values in <View:V$SESSTAT> for:
SQL*Net roundtrips to/from dblink
bytes sent via SQL*Net to dblink
bytes received via SQL*Net from dblink
The Network between the local and remote systems (problems are usually related to time spent ON the remote instance rather than in the network but it is worth checking if the network between the instances is slow / not)
Reducing Waits / Wait times:
It is normally best to see what propertion of the time is actually spent on the remote instance as the most common cause of long waits over a DB link is that the time is actually all spent doing work on the remote instance (either waiting or working).
Trace the session locally and at the remote instance to see what work a user "transaction" actually consists of
Check the execution plans for any distributed queries
See if frequently accessed remote data can be kept in a local snapshot (Materialized View).
 

Ok guys, I found that my database suffered several problems:

TEMP tablespace close to full (and no autogrow)
DATA tablespace close to full , but with LOW file increase
jobs that failed and rerun automatically all the time

Problems all solved now, tips used, performance is up again.

Thanks for the support
I will distribute the points for some new things learned!

pbocanegra (temp tablespace)
paquicuba (DRIVING_SITE hint)
riazpk (dbms_xplan.display)
vishal68 (bind variables / DBLINK)