?
Solved

Accuracy of NUM_ROWS with "estimate statistics"

Posted on 2003-02-26
7
Medium Priority
?
1,602 Views
Last Modified: 2012-08-14
I'm trying to find a quicker way to obtain row counts on very large tables (25+ million rows).  Using a "select count(*)" query on these monster tables is just too slow.

One idea that has been proposed is to analyze the table using "estimate statistics" and then to query all_tables for the NUM_ROWS value.  My question for you experts out there is: How accurate are the NUM_ROWS values when the ANALYZE command has estimated statistics instead of computing the statistics?  What factors contribute to the row counts being close or far off?

Alternatively, if any of you have any better suggestions for obtaining row counts on very large tables, I would love to hear them.

Thanks!

Chris Miller
0
Comment
Question by:LazarWolfe
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
7 Comments
 
LVL 4

Expert Comment

by:boriskalavsky
ID: 8029163
NUM_ROWS will show number of rows at the time ANALYZE command ran.
0
 
LVL 4

Expert Comment

by:boriskalavsky
ID: 8029212
select count(1) from <table_name>; is a some what faster.
0
 
LVL 5

Expert Comment

by:jpkemp
ID: 8030223
A very rough but fast estimate may be gained through the use of the SAMPLE clause, e.g.:

SELECT COUNT(*) * 100 FROM mytable SAMPLE(1);

This only counts a sample of the rows from the table, and is nondeterministic (may return different values on different invocations).

The syntax is:

SELECT ... FROM {table} SAMPLE(n);

"n" is a percentage. If you specify 100, it will look at all rows (although this would be not very useful!).

Jeff
0
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 5

Expert Comment

by:jpkemp
ID: 8030235
Sorry, "n" may only take values in the range 0.000001 up to but not including 100.

Jeff
0
 

Accepted Solution

by:
yavah earned 400 total points
ID: 8031278
I really can't think about any other way to obtain the NUM_ROWS for a table more than the statistics.

If the database version is 8i or 9i I suggest not to use the ANALYZE command; instead Oracle recommends to use DBMS_STATS package.  

exec DBMS_STATS.GATHER_DATABASE_STATS (....) or
exec DBMS_STATS.GATHER_SCHEMA_STATS (....)
exec DBMS_STATS.GATHER_TABLE_STATS (....)


It is possible to define COMPUTE or ESTIMATE for gathering the statistics.


With DBMS_STATS you can gather statistics for the database, for a schema or for a table (or index).  When it is done for the database or a schema you can automatically update the statistics when new records are inserted or deleted, using 'GATHER STALE' option.

The advantage of using DBMS_STATS instead of ANALYZE is you can execute it in parallel, so it makes it faster; and it's more efficient for partitioned tables.


The difference between COMPUTE and ESTIMATE relies on the porcentage defined for ESTIMATE: 20 or 30 percent is a good sample size and it is pretty close to the result obtained with CPOMPUTE.  If the percentage it's not defined, Oracle uses as a sample only 1024 rows.

You can find more information in the Oracle's Performance Guide manual.

I hope this helps.

MS
0
 
LVL 2

Expert Comment

by:tschofiled
ID: 8032408
I assume your intention here is to build some stats info for the optimizer.

Here is the approach we took :-

1) Copy database (nor export/import) to a test system
2) analyze tables and indexes - your choice of estimate or whatever.
3) Collect the stats info and put it into a table.
4) Run a job as frequently as required (you know your data and its change profile) to count the number of rows in tables -
select /*+ index_ffs (tab, pk_tab) */ count (*) from table tab;
5) Collect the number and apply the dbms_stats.set_table_stats(.....)
dbms_stats.set_index_stats(.....)

This runs on a 7 tbyte partitioned (14) database for the current partition in about 35 minutes - 67 tab partitions and 95 index partitions.

Tony
0
 

Author Comment

by:LazarWolfe
ID: 8034909
Thanks to all for the great comments!

Chris Miller
0

Featured Post

Free Tool: Path Explorer

An intuitive utility to help find the CSS path to UI elements on a webpage. These paths are used frequently in a variety of front-end development and QA automation tasks.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Truncate is a DDL Command where as Delete is a DML Command. Both will delete data from table, but what is the difference between these below statements truncate table <table_name> ?? delete from <table_name> ?? The first command cannot be …
Working with Network Access Control Lists in Oracle 11g (part 2) Part 1: http://www.e-e.com/A_8429.html Previously, I introduced the basics of network ACL's including how to create, delete and modify entries to allow and deny access.  For many…
This video shows information on the Oracle Data Dictionary, starting with the Oracle documentation, explaining the different types of Data Dictionary views available by group and permissions as well as giving examples on how to retrieve data from th…
This video shows how to recover a database from a user managed backup
Suggested Courses

764 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question