Hello,
I have problem with the cost base query optimizer (oracle 8.1.7) in case of
InterMedia Text indexes.
I have a table with a clob column indexed with InterMedia Text as:
create index IND_DOC_DOCUMENT_BUFFER On DOCUMENTS(DOCUMENT_BUFFER)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS(''FILTER CTXSYS.NULL_FILTER SECTION GROUP CTXSYS.AUTO_SECTION_GROUP'
);
There are 60000 xml documents in the table, and all the documents contains the word 'test' within the 'type' tag.
When I execute the following query:
select * from (
select /* FIRST_ROWS */ id from documents where contains(document_buffer,'
test')>0 order by id desc
)
where rownum<=20;
the query optimizer produces the following execution plan:
Query Plan
----------
SELECT STATEMENT Cost = 826
COUNT STOPKEY
VIEW
TABLE ACCESS BY INDEX ROWID DOCUMENTS
INDEX FULL SCAN DESCENDING PK_DOC_ID
In this case the query optimizer detects, that the contains expression is not
selective, so the query plan does not use the text index. The query quickly returns
the first 20 rows.
In contrary, when I use the contains(document_buffer,'
test within type')>0 expression,
the query optimizer choses an other query plan:
Query Plan
----------
SELECT STATEMENT Cost = 58
COUNT STOPKEY
VIEW
SORT ORDER BY STOPKEY
TABLE ACCESS BY INDEX ROWID DOCUMENTS
DOMAIN INDEX IND_DOC_DOCUMENT_BUFFER
In this case the query optimizer is not able to estimate the selectivity of the contains
expression properly, so the query plan use the text index to retrive the selected rows.
The result set (all the 60000 rows) must be sorted before the first 20 record is retrived
and it takes 80 seconds executing the query.
The expressions "contains(document_buffer,
'test')>0"
and
"contains(document_buffer,
'test within type')>0" yields different selectivity
for the query optimizer, Though all the documents contains the word 'test'
within the 'type' tag, and so the two expression should have the same selectivity.
Is there someone who knows how to make the query optimizer to build the proper query paln
in the second case as well? Recently I have to run a query to decide the selectivity of
the expression and use "contains(...) || null >0" expression to avoid using the text index
when the expression is not selective.