Solved

how do I get data counts for tables in a database that has 6 terabytes of data

Posted on 2014-03-14
17
503 Views
Last Modified: 2014-03-25
I have a test environment and a production environment. I have production data in my test environment. I want to get a count from both databases and identify the count difference if any between the environments. A script was initially used to do a count against dba_tables. However, soon we discovered that the count isn't accurate until after the table (s) have been analyzed. Is there another method out there besides doing a count on the table themselves? I have tried looking at using statistics. However, it is unclear how to use it to get the information I need. Does anyone know what I can do to get this information?
0
Comment
Question by:sikyala
  • 6
  • 4
  • 2
  • +4
17 Comments
 
LVL 73

Expert Comment

by:sdstuber
ID: 39929722
If you want 100% accurate count you'll have to either gather statistics with a full compute (which will likely take a long time) then look at dba_tables  this will essentially do count(*) for each table, as well as gather the other statistic values.

or you'll have to count the tables yourself, which might also take a long time; but you won't have to do the other statistics or indexes or partition level stats.

there's no other method for counting beside counting.

You could attempt using estimates by average row size and number of blocks but that's not going to be reliable.


Note, if you do a count on a unique index of columns constrained to be not null, that will likely be faster than counting the table itself (the optimizer should do this for you though)
so, it might not be as bad as you think.

Also, a lot of that 6TB is probably indexes anyway, so it's not like you'll have to actually read all of it.  This doesn't mean it will done in a few seconds, but it's not necessarily a multi-hour/multi-day operation.
0
 
LVL 23

Expert Comment

by:David
ID: 39929929
It could be time to make the case that having production data in a test (non-production) environment is an unnecessary risk -- as non-production environments typically skimp on the penetration controls and monitoring.  IOW matching row counts may not be worth the bother.

Instead, I advocate for a test environment to hold enough data to test all of the business rules and conditions.  PII must of course be replaced -- tools and scripts exist.

All of the performance plans, metrics, load replication, and other metadata can be moved from production -- giving you a test bed with test data, but prod performance simulation.
0
 
LVL 10

Assisted Solution

by:dbmullen
dbmullen earned 125 total points
ID: 39932634
yes, there is no way to get a count other than counting...  
what I like to do is COUNT and save into a table:

create table your_owner.COUNT_TABLE (DB_NAME VARCHAR2(30), OWNER VARCHAR2(30), TABLE_NAME VARCHAR2(30), ROW_COUNT NUMBER, count_dt date)
;

select 'INSERT INTO ' || OWNER ||'.COUNT_TABLE (db_name, owner, table_name, row_count, count_dt) select sys_context(''USERENV'',''DB_NAME'') DB_NAME, ''' || owner || ''',''' || table_name ||''', count(*), sysdate from ' || owner ||'.' || table_name ||';'
    || chr(10) || 'commit;' paste_me
from dba_tables
where owner = 'your_owner'
order by 1
;

run that on BOTH test and prod, then you can compare them with a simple join
or insert them both into the same table using a link (real or implicit)..  whatever...

it's not going to be fast, and it's really hard to hit 2 moving targets
0
 
LVL 36

Expert Comment

by:Geert Gruwez
ID: 39933820
6 tera
i assume you didn't prevent any new inserts on the production database ?
the test database will probably not get any new inserts ... after the process is finished

a count on a static test system and a count on an active  production system will likely be different
0
 

Author Comment

by:sikyala
ID: 39934446
yes Geert the count will be different. We are trying to find out how different. The production system is used heavily and there is no down time. So inserts, updates, deletes are occurring.

The purpose of the test environment is to rebuild the production environment in a virtual environment. That is why there is production data in the test environment. We are close to building the production environment and other developers feel it is imperative that we know how different the two environments are.  We are at a disadvantage because in both environments doing a select count takes more than 24 hours on most of the tables We were just hoping that there was an easier way to gather this information.
0
 
LVL 23

Expert Comment

by:David
ID: 39934466
I'm still confused.  Wouldn't your ETL or your backup set show success or failure?
0
 
LVL 36

Expert Comment

by:Geert Gruwez
ID: 39934606
are you doing Physical to Virtual ?
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 39934618
>>> yes Geert the count will be different. We are trying to find out how different.


Since you're trying to chase a moving target, that sounds like you're looking for an approximation anyway, so you can probably gather statistics and use those to get a relative difference.  How long that takes will depend on how big your sample sizes are.
0
6 Surprising Benefits of Threat Intelligence

All sorts of threat intelligence is available on the web. Intelligence you can learn from, and use to anticipate and prepare for future attacks.

 

Author Comment

by:sikyala
ID: 39934692
The production database is currently in a physical environment.  The test environment is in a virtual environment with plans to move the production to the virtual environment. There is a lot going on in the current production environment. It is a RAC database with 2 nodes. There are scripts running at various times that are loading data etc. There is a application front end that has things I am not even aware of going on. Backups are running successfully.
0
 
LVL 34

Assisted Solution

by:Mark Geerlings
Mark Geerlings earned 125 total points
ID: 39934944
As others have said, there is no magic way to get a quick, accurate count of records in a large table.  You can either do the counts manually (select count....) or query the results of an analyze command, but either a manual query or an "analyze table...", or "dbms_stats.gather_table_stats(...)" will take some time if the table is large, and the value you get returned will be as of the point in time when the query or analyze command started.
0
 

Author Comment

by:sikyala
ID: 39934962
ok thanks
0
 

Author Comment

by:sikyala
ID: 39937922
how would I use "dbms_stats.gather_table_stats to gather statistics on all the tables in the database
0
 
LVL 73

Accepted Solution

by:
sdstuber earned 250 total points
ID: 39937947
you wouldn't

gather_table_stats is for a single table

if you want to do all tables in the database  use


dbms_stats.gather_database_stats


it can be as simple as


begin
dbms_stats.gather_database_stats;
end;


if you really wanted to use gather_table_stats then
build a loop to get all tables then iterate through them
but that's essentially what the gather_database_stats does for you
0
 

Author Comment

by:sikyala
ID: 39937965
ok awesome. what if I want to exclude certain schemas
0
 
LVL 73

Assisted Solution

by:sdstuber
sdstuber earned 250 total points
ID: 39938001
then don't use gather_database_stats

use dbms_stats.gather_schema_stats  once for each schema
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39952333
Not directly related to the question asked, so no points.  Just a caution about visualization and Oracle.

If your virtual environment is VMWare please read:
Support Position for Oracle Products Running on VMWare Virtualized Environments (Doc ID 249212.1)

In a nutshell:
You will only be supported by Oracle if it is a known bug or the problem can be reproduced on a physical server.

Personal experience:
I once had a 'bug' that was due to one row in one table.  Any time I selected just that specific row, I would get an ORA-3113.  It had to do with how the row crossed database blocks.  There was no way to reproduce the problem in any environment than production.
0
 

Author Closing Comment

by:sikyala
ID: 39954982
thanks. I'll do what works best
0

Featured Post

What Should I Do With This Threat Intelligence?

Are you wondering if you actually need threat intelligence? The answer is yes. We explain the basics for creating useful threat intelligence.

Join & Write a Comment

Suggested Solutions

Introduction A previously published article on Experts Exchange ("Joins in Oracle", http://www.experts-exchange.com/Database/Oracle/A_8249-Joins-in-Oracle.html) makes a statement about "Oracle proprietary" joins and mixes the join syntax with gen…
How to Unravel a Tricky Query Introduction If you browse through the Oracle zones or any of the other database-related zones you'll come across some complicated solutions and sometimes you'll just have to wonder how anyone came up with them.  …
This videos aims to give the viewer a basic demonstration of how a user can query current session information by using the SYS_CONTEXT function
This video shows how to configure and send email from and Oracle database using both UTL_SMTP and UTL_MAIL, as well as comparing UTL_SMTP to a manual SMTP conversation with a mail server.

757 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

20 Experts available now in Live!

Get 1:1 Help Now