• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1616
  • Last Modified:

Accuracy of NUM_ROWS with "estimate statistics"

I'm trying to find a quicker way to obtain row counts on very large tables (25+ million rows).  Using a "select count(*)" query on these monster tables is just too slow.

One idea that has been proposed is to analyze the table using "estimate statistics" and then to query all_tables for the NUM_ROWS value.  My question for you experts out there is: How accurate are the NUM_ROWS values when the ANALYZE command has estimated statistics instead of computing the statistics?  What factors contribute to the row counts being close or far off?

Alternatively, if any of you have any better suggestions for obtaining row counts on very large tables, I would love to hear them.

Thanks!

Chris Miller
0
LazarWolfe
Asked:
LazarWolfe
1 Solution
 
boriskalavskyCommented:
NUM_ROWS will show number of rows at the time ANALYZE command ran.
0
 
boriskalavskyCommented:
select count(1) from <table_name>; is a some what faster.
0
 
jpkempCommented:
A very rough but fast estimate may be gained through the use of the SAMPLE clause, e.g.:

SELECT COUNT(*) * 100 FROM mytable SAMPLE(1);

This only counts a sample of the rows from the table, and is nondeterministic (may return different values on different invocations).

The syntax is:

SELECT ... FROM {table} SAMPLE(n);

"n" is a percentage. If you specify 100, it will look at all rows (although this would be not very useful!).

Jeff
0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
jpkempCommented:
Sorry, "n" may only take values in the range 0.000001 up to but not including 100.

Jeff
0
 
yavahCommented:
I really can't think about any other way to obtain the NUM_ROWS for a table more than the statistics.

If the database version is 8i or 9i I suggest not to use the ANALYZE command; instead Oracle recommends to use DBMS_STATS package.  

exec DBMS_STATS.GATHER_DATABASE_STATS (....) or
exec DBMS_STATS.GATHER_SCHEMA_STATS (....)
exec DBMS_STATS.GATHER_TABLE_STATS (....)


It is possible to define COMPUTE or ESTIMATE for gathering the statistics.


With DBMS_STATS you can gather statistics for the database, for a schema or for a table (or index).  When it is done for the database or a schema you can automatically update the statistics when new records are inserted or deleted, using 'GATHER STALE' option.

The advantage of using DBMS_STATS instead of ANALYZE is you can execute it in parallel, so it makes it faster; and it's more efficient for partitioned tables.


The difference between COMPUTE and ESTIMATE relies on the porcentage defined for ESTIMATE: 20 or 30 percent is a good sample size and it is pretty close to the result obtained with CPOMPUTE.  If the percentage it's not defined, Oracle uses as a sample only 1024 rows.

You can find more information in the Oracle's Performance Guide manual.

I hope this helps.

MS
0
 
tschofiledCommented:
I assume your intention here is to build some stats info for the optimizer.

Here is the approach we took :-

1) Copy database (nor export/import) to a test system
2) analyze tables and indexes - your choice of estimate or whatever.
3) Collect the stats info and put it into a table.
4) Run a job as frequently as required (you know your data and its change profile) to count the number of rows in tables -
select /*+ index_ffs (tab, pk_tab) */ count (*) from table tab;
5) Collect the number and apply the dbms_stats.set_table_stats(.....)
dbms_stats.set_index_stats(.....)

This runs on a 7 tbyte partitioned (14) database for the current partition in about 35 minutes - 67 tab partitions and 95 index partitions.

Tony
0
 
LazarWolfeAuthor Commented:
Thanks to all for the great comments!

Chris Miller
0

Featured Post

Free Tool: Subnet Calculator

The subnet calculator helps you design networks by taking an IP address and network mask and returning information such as network, broadcast address, and host range.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Tackle projects and never again get stuck behind a technical roadblock.
Join Now