?
Solved

Optimize SELECT DISTINCT on a nonunique field

Posted on 2005-04-08
11
Medium Priority
?
6,440 Views
Last Modified: 2008-01-09
I have a table with about 7 million records in it.
One of the fields there contains about 30 different values
There is a nonunique index on that field.
When I run a SELECT DISTINCT nonuniquefield FROM table
it's using full table scan (and runs forever)

I tried
select /*+INDEX(nonuniquefield)*/ DISTINCT nonuniquefield FROM table
and
select /*+INDEX(nonuniqueindex)*/ DISTINCT nonuniquefield FROM table
but Oracle still does a full table scan
Is there any way to speed up this SELECT DISTINCT statement?
0
Comment
Question by:bamboo7431
  • 4
  • 3
  • 2
  • +2
11 Comments
 
LVL 25

Expert Comment

by:jrb1
ID: 13740183
how about

select /*+ index_ffs(table,nonuniquefield) parallel_index(table, nonuniquefield)*/
 DISTINCT nonuniquefield FROM table
0
 
LVL 23

Expert Comment

by:paquicuba
ID: 13740191
Full table scan is faster than using indexes in this case, and DISTINCT is a performance killer.

Instead of :   SELECT DISTINCT COL1 FROM TABLEA;   -- Cost    31868
Use: SELECT MAX(ITEM) FROM IA GROUP BY ITEM;     -- Cost      9920

SQL> SELECT COUNT(*) FROM TABLEA
  2  /

  COUNT(*)
----------
  11522620

Elapsed: 00:00:18.07


SQL> EXPLAIN PLAN SET STATEMENT_ID = '1' FOR SELECT DISTINCT COL1 FROM TABLEA;

Explained.

Elapsed: 00:00:00.00
SQL> @EXPLAIN
Enter Statement ID: 1

Query Plan
--------------------------------------------------------------------------

SELECT STATEMENT          Cost = 31868


SQL> EXPLAIN PLAN SET STATEMENT_ID = '2' FOR SELECT MAX(ITEM) FROM IA GROUP BY ITEM;

Explained.

Elapsed: 00:00:00.00
SQL> @EXPLAIN
Enter Statement ID: 2

Query Plan
--------------------------------------------------------------------------------------

SELECT STATEMENT          Cost = 9920
0
 
LVL 23

Expert Comment

by:paquicuba
ID: 13740195
Use: SELECT MAX(<your_column>) FROM <your_table> GROUP BY <your_item>;  
0
What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

 
LVL 4

Author Comment

by:bamboo7431
ID: 13740238
still does a full table scan...
0
 
LVL 25

Expert Comment

by:jrb1
ID: 13740284
Have you analyzed the table recently?
0
 
LVL 25

Expert Comment

by:jrb1
ID: 13740392
I just added a non-unique key on a field with values similar to yours....not quite as many row, but the same idea.  I ran a query pulling the distinct values, and the plan took the Index Fast Full Scan without a hint or anything.  There must be a reason your program is avoiding the index.
0
 
LVL 25

Expert Comment

by:jrb1
ID: 13740400
After I delete the index and rerun the SQL, I'm back to a full table scan.
0
 
LVL 23

Expert Comment

by:paquicuba
ID: 13741804
You cannot force the Optimizer to use an index if it is not present in the WHERE clause. Like it or not you have to use the fastest path for a full table scan.
0
 
LVL 1

Accepted Solution

by:
cybotto earned 200 total points
ID: 13744017
You can try to create a bitmap index instead and specify in it the index hint (note table alias)

SELECT /*+ INDEX (t t_col_bmp_idx) */  DISTINCT nonuniquefield
FROM table t

OR

SELECT /*+ INDEX (t t_col_bmp_idx) */ nonuniquefield
FROM table t
GROUP BY nonuniquefield

OR trying to trick

SELECT /*+ INDEX (t t_col_bmp_idx) */  DISTINCT nonuniquefield
FROM table t
WHERE nonuniquefield <> 'XXXXXXXXXXXXXXXXXXXXXXX'

Creating a b-tree index and gather statistics on this index will kill it, since has only 30 values from 7millions rows.
0
 
LVL 13

Expert Comment

by:riazpk
ID: 13746289
@@ paquicuba
"You cannot force the Optimizer to use an index if it is not present in the WHERE clause. Like it or not you have to use the fastest path for a full table scan"

That is not true. Here is a test case:

SQL> create table test (a number primary key, b number);

Table created.

SQL> insert into test values (1,2);

1 row created.

SQL> ed
Wrote file afiedt.buf

  1* insert into test values (2,2)
SQL> /

1 row created.

SQL> ed
Wrote file afiedt.buf

  1* insert into test values (3,2)
SQL> /

1 row created.

SQL> select * from test;

         A          B
---------- ----------
         1          2
         2          2
         3          2

SQL> set autotrace on
SQL> ed
Wrote file afiedt.buf

  1* analyze table test compute statistics for table for all indexes for all indexed columns
SQL> /

Table analyzed.

SQL> select * From test;

         A          B
---------- ----------
         1          2
         2          2
         3          2


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=3 Bytes=27)
   1    0   TABLE ACCESS (FULL) OF 'TEST' (Cost=2 Card=3 Bytes=27)




Statistics
----------------------------------------------------------
         27  recursive calls
          0  db block gets
         10  consistent gets
          0  physical reads
          0  redo size
        255  bytes sent via SQL*Net to client
        270  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
          3  rows processed

SQL> set autotrace off
SQL> select index_name from user_indexes where table_name='TEST';

INDEX_NAME
------------------------------
SYS_C00124977

SQL> set autotrace on

SQL> ed
Wrote file afiedt.buf

  1* select /*+index(test SYS_C00124977) */ * from test
SQL> /

         A          B
---------- ----------
         1          2
         2          2
         3          2


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=3 Bytes=27)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST' (Cost=2 Card=3 Byt
          es=27)

   2    1     INDEX (FULL SCAN) OF 'SYS_C00124977' (UNIQUE) (Cost=1 Ca
          rd=3)





Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
        256  bytes sent via SQL*Net to client
        270  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          3  rows processed
0
 
LVL 4

Author Comment

by:bamboo7431
ID: 13762413
The
WHERE nonuniquefield <> 'XXXXXXXXXXXXXX'
clause did the trick even without the hint.
0

Featured Post

Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

How to Create User-Defined Aggregates in Oracle Before we begin creating these things, what are user-defined aggregates?  They are a feature introduced in Oracle 9i that allows a developer to create his or her own functions like "SUM", "AVG", and…
Configuring and using Oracle Database Gateway for ODBC Introduction First, a brief summary of what a Database Gateway is.  A Gateway is a set of driver agents and configurations that allow an Oracle database to communicate with other platforms…
Via a live example show how to connect to RMAN, make basic configuration settings changes and then take a backup of a demo database
This video shows how to set up a shell script to accept a positional parameter when called, pass that to a SQL script, accept the output from the statement back and then manipulate it in the Shell.
Suggested Courses

850 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question