
Solved

How do I get row counts for tables in a database that has 6 terabytes of data?

Posted on 2014-03-14
Medium Priority
529 Views
Last Modified: 2014-03-25
I have a test environment and a production environment, with production data loaded into the test environment. I want to get row counts from both databases and identify any count difference between the environments. A script was initially used to do a count against dba_tables. However, we soon discovered that the count isn't accurate until after the tables have been analyzed. Is there another method besides doing a count on the tables themselves? I have tried looking at using statistics, but it is unclear how to use them to get the information I need. Does anyone know what I can do to get this information?
Question by:sikyala
17 Comments
 
LVL 74

Expert Comment

by:sdstuber
ID: 39929722
If you want a 100% accurate count, you'll have to either gather statistics with a full compute (which will likely take a long time) and then look at dba_tables; this essentially does a count(*) for each table, as well as gathering the other statistic values.

Or you'll have to count the tables yourself, which might also take a long time; but then you won't have to gather the other statistics, or index and partition-level stats.

There's no other method for counting besides counting.

You could attempt an estimate using average row size and number of blocks, but that's not going to be reliable.

Note: if you do a count on a unique index over columns constrained to be NOT NULL, that will likely be faster than counting the table itself (the optimizer should do this for you, though), so it might not be as bad as you think.

Also, a lot of that 6TB is probably indexes anyway, so you won't actually have to read all of it. That doesn't mean it will be done in a few seconds, but it's not necessarily a multi-hour/multi-day operation.
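For illustration, one way to check whether the optimizer is satisfying the count from an index rather than a full table scan; the schema and table names here are hypothetical:

```sql
-- Hypothetical example: counting a table whose primary key columns are NOT NULL.
-- The optimizer can often satisfy count(*) from the (smaller) PK index segment.
EXPLAIN PLAN FOR
  SELECT COUNT(*) FROM big_schema.big_table;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
-- Look for INDEX FAST FULL SCAN in the plan instead of TABLE ACCESS FULL.
```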
 
LVL 23

Expert Comment

by:David
ID: 39929929
It could be time to make the case that having production data in a test (non-production) environment is an unnecessary risk, as non-production environments typically skimp on the penetration controls and monitoring. In other words, matching row counts may not be worth the bother.

Instead, I advocate a test environment that holds just enough data to test all of the business rules and conditions. PII must of course be replaced; tools and scripts exist for that.

All of the performance plans, metrics, load replication, and other metadata can be moved from production, giving you a test bed with test data but production-like performance simulation.
 
LVL 10

Assisted Solution

by:dbmullen
dbmullen earned 500 total points
ID: 39932634
Yes, there is no way to get a count other than counting...
What I like to do is COUNT and save the results into a table:

create table your_owner.count_table (
  db_name    varchar2(30),
  owner      varchar2(30),
  table_name varchar2(30),
  row_count  number,
  count_dt   date
);

-- generates one INSERT ... SELECT COUNT(*) per table; paste and run the output
select 'INSERT INTO your_owner.count_table (db_name, owner, table_name, row_count, count_dt) '
    || 'select sys_context(''USERENV'',''DB_NAME''), ''' || owner || ''',''' || table_name
    || ''', count(*), sysdate from ' || owner || '.' || table_name || ';'
    || chr(10) || 'commit;' paste_me
from dba_tables
where owner = 'your_owner'
order by 1;

Run that on BOTH test and prod; then you can compare the results with a simple join, or insert them both into the same table using a link (real or implicit)... whatever.

It's not going to be fast, and it's really hard to hit two moving targets.
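A minimal sketch of that comparison join, assuming both sets of counts end up in the same count_table; the db_name values 'PROD' and 'TEST' are illustrative:

```sql
select p.owner, p.table_name,
       p.row_count prod_count,
       t.row_count test_count,
       p.row_count - t.row_count diff
from   your_owner.count_table p
join   your_owner.count_table t
       on  t.owner = p.owner
       and t.table_name = p.table_name
where  p.db_name = 'PROD'
and    t.db_name = 'TEST'
and    p.row_count <> t.row_count
order by abs(p.row_count - t.row_count) desc;
```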
 
LVL 38

Expert Comment

by:Geert Gruwez
ID: 39933820
6 terabytes...
I assume you didn't prevent any new inserts on the production database?
The test database will probably not get any new inserts after the process is finished.

A count on a static test system and a count on an active production system will likely be different.
 

Author Comment

by:sikyala
ID: 39934446
Yes Geert, the count will be different. We are trying to find out how different. The production system is used heavily and there is no downtime, so inserts, updates, and deletes are occurring.

The purpose of the test environment is to rebuild the production environment in a virtual environment. That is why there is production data in the test environment. We are close to building the production environment, and other developers feel it is imperative that we know how different the two environments are. We are at a disadvantage because in both environments doing a select count takes more than 24 hours on most of the tables. We were just hoping that there was an easier way to gather this information.
 
LVL 23

Expert Comment

by:David
ID: 39934466
I'm still confused.  Wouldn't your ETL or your backup set show success or failure?
 
LVL 38

Expert Comment

by:Geert Gruwez
ID: 39934606
are you doing Physical to Virtual ?
 
LVL 74

Expert Comment

by:sdstuber
ID: 39934618
>>> yes Geert the count will be different. We are trying to find out how different.


Since you're trying to chase a moving target, that sounds like you're looking for an approximation anyway, so you can probably gather statistics and use those to get a relative difference.  How long that takes will depend on how big your sample sizes are.
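A sketch of that approach, using an estimated sample rather than a full compute; the schema name and the 10% sample size are placeholders to adjust:

```sql
-- Run on each database: gather estimated statistics for one schema.
begin
  dbms_stats.gather_schema_stats(
    ownname          => 'YOUR_SCHEMA',  -- hypothetical schema name
    estimate_percent => 10              -- sample 10%; larger = slower but more accurate
  );
end;
/

-- Then compare the resulting estimates between the two databases.
select table_name, num_rows, last_analyzed
from   dba_tables
where  owner = 'YOUR_SCHEMA'
order  by table_name;
```

Since num_rows is an estimate from the sample, treat differences within the sampling error as noise.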
 

Author Comment

by:sikyala
ID: 39934692
The production database is currently in a physical environment. The test environment is in a virtual environment, with plans to move production to the virtual environment. There is a lot going on in the current production environment. It is a RAC database with 2 nodes. There are scripts running at various times that are loading data, etc. There is an application front end with things going on that I am not even aware of. Backups are running successfully.
 
LVL 35

Assisted Solution

by:Mark Geerlings
Mark Geerlings earned 500 total points
ID: 39934944
As others have said, there is no magic way to get a quick, accurate count of records in a large table. You can either do the counts manually (select count...) or query the results of an analyze. Either way, a manual query, an "analyze table...", or "dbms_stats.gather_table_stats(...)" will take some time if the table is large, and the value you get back will be as of the point in time when the query or analyze command started.
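For reference, a minimal single-table version of that; the schema and table names are placeholders:

```sql
begin
  dbms_stats.gather_table_stats(
    ownname => 'YOUR_SCHEMA',
    tabname => 'YOUR_TABLE'
  );
end;
/

-- num_rows reflects the table as of when the gather started
select num_rows, last_analyzed
from   dba_tables
where  owner = 'YOUR_SCHEMA'
and    table_name = 'YOUR_TABLE';
```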
 

Author Comment

by:sikyala
ID: 39934962
ok thanks
 

Author Comment

by:sikyala
ID: 39937922
How would I use "dbms_stats.gather_table_stats" to gather statistics on all the tables in the database?
 
LVL 74

Accepted Solution

by:
sdstuber earned 1000 total points
ID: 39937947
You wouldn't.

gather_table_stats is for a single table.

If you want to do all tables in the database, use

dbms_stats.gather_database_stats

It can be as simple as

begin
  dbms_stats.gather_database_stats;
end;
/

If you really wanted to use gather_table_stats, then build a loop to get all the tables and iterate through them;
but that's essentially what gather_database_stats does for you.
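A rough sketch of such a loop, for illustration only; the excluded schema list is a guess and gather_database_stats remains the simpler choice:

```sql
begin
  for t in (select owner, table_name
            from   dba_tables
            where  owner not in ('SYS', 'SYSTEM'))  -- skip dictionary schemas
  loop
    dbms_stats.gather_table_stats(
      ownname => t.owner,
      tabname => t.table_name
    );
  end loop;
end;
/
```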
 

Author Comment

by:sikyala
ID: 39937965
Ok, awesome. What if I want to exclude certain schemas?
 
LVL 74

Assisted Solution

by:sdstuber
sdstuber earned 1000 total points
ID: 39938001
Then don't use gather_database_stats.

Use dbms_stats.gather_schema_stats once for each schema.
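One way to script that, sketched with an illustrative exclusion list (adjust the schema names to your environment):

```sql
begin
  for s in (select username
            from   dba_users
            where  username not in ('SYS', 'SYSTEM', 'SCHEMA_TO_SKIP'))
  loop
    dbms_stats.gather_schema_stats(ownname => s.username);
  end loop;
end;
/
```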
 
LVL 77

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39952333
Not directly related to the question asked, so no points. Just a caution about virtualization and Oracle.

If your virtual environment is VMWare please read:
Support Position for Oracle Products Running on VMWare Virtualized Environments (Doc ID 249212.1)

In a nutshell:
You will only be supported by Oracle if it is a known bug or the problem can be reproduced on a physical server.

Personal experience:
I once had a 'bug' that was due to one row in one table. Any time I selected just that specific row, I would get an ORA-3113. It had to do with how the row crossed database blocks. There was no way to reproduce the problem in any environment other than production.
 

Author Closing Comment

by:sikyala
ID: 39954982
Thanks. I'll do what works best.
