Go Premium for a chance to win a PS4. Enter to Win

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 532
  • Last Modified:

how do I get data counts for tables in a database that has 6 terabytes of data

I have a test environment and a production environment. I have production data in my test environment. I want to get a count from both databases and identify the count difference if any between the environments. A script was initially used to do a count against dba_tables. However, soon we discovered that the count isn't accurate until after the table (s) have been analyzed. Is there another method out there besides doing a count on the table themselves? I have tried looking at using statistics. However, it is unclear how to use it to get the information I need. Does anyone know what I can do to get this information?
0
sikyala
Asked:
sikyala
  • 6
  • 4
  • 2
  • +4
4 Solutions
 
sdstuberCommented:
If you want 100% accurate count you'll have to either gather statistics with a full compute (which will likely take a long time) then look at dba_tables  this will essentially do count(*) for each table, as well as gather the other statistic values.

or you'll have to count the tables yourself, which might also take a long time; but you won't have to do the other statistics or indexes or partition level stats.

there's no other method for counting beside counting.

You could attempt using estimates by average row size and number of blocks but that's not going to be reliable.


Note, if you do a count on a unique index of columns constrained to be not null, that will likely be faster than counting the table itself (the optimizer should do this for you though)
so, it might not be as bad as you think.

Also, a lot of that 6TB is probably indexes anyway, so it's not like you'll have to actually read all of it.  This doesn't mean it will done in a few seconds, but it's not necessarily a multi-hour/multi-day operation.
0
 
DavidSenior Oracle Database AdministratorCommented:
It could be time to make the case that having production data in a test (non-production) environment is an unnecessary risk -- as non-production environments typically skimp on the penetration controls and monitoring.  IOW matching row counts may not be worth the bother.

Instead, I advocate for a test environment to hold enough data to test all of the business rules and conditions.  PII must of course be replaced -- tools and scripts exist.

All of the performance plans, metrics, load replication, and other metadata can be moved from production -- giving you a test bed with test data, but prod performance simulation.
0
 
dbmullenCommented:
yes, there is no way to get a count other than counting...  
what I like to do is COUNT and save into a table:

create table your_owner.COUNT_TABLE (DB_NAME VARCHAR2(30), OWNER VARCHAR2(30), TABLE_NAME VARCHAR2(30), ROW_COUNT NUMBER, count_dt date)
;

select 'INSERT INTO ' || OWNER ||'.COUNT_TABLE (db_name, owner, table_name, row_count, count_dt) select sys_context(''USERENV'',''DB_NAME'') DB_NAME, ''' || owner || ''',''' || table_name ||''', count(*), sysdate from ' || owner ||'.' || table_name ||';'
    || chr(10) || 'commit;' paste_me
from dba_tables
where owner = 'your_owner'
order by 1
;

run that on BOTH test and prod, then you can compare them with a simple join
or insert them both into the same table using a link (real or implicit)..  whatever...

it's not going to be fast, and it's really hard to hit 2 moving targets
0
Learn Veeam advantages over legacy backup

Every day, more and more legacy backup customers switch to Veeam. Technologies designed for the client-server era cannot restore any IT service running in the hybrid cloud within seconds. Learn top Veeam advantages over legacy backup and get Veeam for the price of your renewal

 
Geert GruwezOracle dbaCommented:
6 tera
i assume you didn't prevent any new inserts on the production database ?
the test database will probably not get any new inserts ... after the process is finished

a count on a static test system and a count on an active  production system will likely be different
0
 
sikyalaSenior Database AdministratorAuthor Commented:
yes Geert the count will be different. We are trying to find out how different. The production system is used heavily and there is no down time. So inserts, updates, deletes are occurring.

The purpose of the test environment is to rebuild the production environment in a virtual environment. That is why there is production data in the test environment. We are close to building the production environment and other developers feel it is imperative that we know how different the two environments are.  We are at a disadvantage because in both environments doing a select count takes more than 24 hours on most of the tables We were just hoping that there was an easier way to gather this information.
0
 
DavidSenior Oracle Database AdministratorCommented:
I'm still confused.  Wouldn't your ETL or your backup set show success or failure?
0
 
Geert GruwezOracle dbaCommented:
are you doing Physical to Virtual ?
0
 
sdstuberCommented:
>>> yes Geert the count will be different. We are trying to find out how different.


Since you're trying to chase a moving target, that sounds like you're looking for an approximation anyway, so you can probably gather statistics and use those to get a relative difference.  How long that takes will depend on how big your sample sizes are.
0
 
sikyalaSenior Database AdministratorAuthor Commented:
The production database is currently in a physical environment.  The test environment is in a virtual environment with plans to move the production to the virtual environment. There is a lot going on in the current production environment. It is a RAC database with 2 nodes. There are scripts running at various times that are loading data etc. There is a application front end that has things I am not even aware of going on. Backups are running successfully.
0
 
Mark GeerlingsDatabase AdministratorCommented:
As others have said, there is no magic way to get a quick, accurate count of records in a large table.  You can either do the counts manually (select count....) or query the results of an analyze command, but either a manual query or an "analyze table...", or "dbms_stats.gather_table_stats(...)" will take some time if the table is large, and the value you get returned will be as of the point in time when the query or analyze command started.
0
 
sikyalaSenior Database AdministratorAuthor Commented:
ok thanks
0
 
sikyalaSenior Database AdministratorAuthor Commented:
how would I use "dbms_stats.gather_table_stats to gather statistics on all the tables in the database
0
 
sdstuberCommented:
you wouldn't

gather_table_stats is for a single table

if you want to do all tables in the database  use


dbms_stats.gather_database_stats


it can be as simple as


begin
dbms_stats.gather_database_stats;
end;


if you really wanted to use gather_table_stats then
build a loop to get all tables then iterate through them
but that's essentially what the gather_database_stats does for you
0
 
sikyalaSenior Database AdministratorAuthor Commented:
ok awesome. what if I want to exclude certain schemas
0
 
sdstuberCommented:
then don't use gather_database_stats

use dbms_stats.gather_schema_stats  once for each schema
0
 
slightwv (䄆 Netminder) Commented:
Not directly related to the question asked, so no points.  Just a caution about visualization and Oracle.

If your virtual environment is VMWare please read:
Support Position for Oracle Products Running on VMWare Virtualized Environments (Doc ID 249212.1)

In a nutshell:
You will only be supported by Oracle if it is a known bug or the problem can be reproduced on a physical server.

Personal experience:
I once had a 'bug' that was due to one row in one table.  Any time I selected just that specific row, I would get an ORA-3113.  It had to do with how the row crossed database blocks.  There was no way to reproduce the problem in any environment than production.
0
 
sikyalaSenior Database AdministratorAuthor Commented:
thanks. I'll do what works best
0

Featured Post

Concerto Cloud for Software Providers & ISVs

Can Concerto Cloud Services help you focus on evolving your application offerings, while delivering the best cloud experience to your customers? From DevOps to revenue models and customer support, the answer is yes!

Learn how Concerto can help you.

  • 6
  • 4
  • 2
  • +4
Tackle projects and never again get stuck behind a technical roadblock.
Join Now