Solved

Fetching rows from Oracle in a given range

Posted on 1999-01-08
1,942 Views
Last Modified: 2007-12-19

I have one table with around 10 million records, and since this table is so big it takes a while to populate these records in Crystal Reports. Instead I was thinking of a way to populate, say, the first 1000 records, then the next 1000 records, and so on. So is there any way to populate records within a given range -- say the first 1000 records, then records 1001-2000, then 2001-3000, and so on? I thought ROWNUM might help, but it doesn't.
Can somebody help?
Regards,
Subin
Question by: subingeorge
 
Accepted Solution
by: dslavin (earned 100 total points)
ID: 1083614
Here is an article I wrote for a company newspaper. It contains a method that might do what you need. Please read it, and if you need clarification, I'll be glad to respond.

Hope this helps.

Processing All Records in Large Database Tables
-----------------------------------------------
If you are trying to process all of the records of a table with many records,
then you have probably encountered ORA-01555 ("snapshot too old").

The traditional way to process all of the records in a table is to:
1) Create a PL/SQL stored procedure
2) In the Procedure, create a cursor which selects ALL of the table records
3) Loop through each record and perform the corresponding inserts, updates
   and deletes.
4) If the table has a large number of records, then perform a COMMIT at
   regular intervals to prevent the ROLLBACK SEGMENT from overflowing.
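
   For reference, here is a minimal sketch of that traditional pattern.
   (BIG_TABLE and SOME_COL are illustrative names, not objects from the
   question; the update is just a placeholder for real processing.)

   DECLARE
      -- Cursor over ALL rows of a hypothetical large table
      CURSOR big_cur IS
         SELECT ROWID row_id FROM BIG_TABLE;
      rows_done NUMBER := 0;
   BEGIN
      FOR rec IN big_cur LOOP
         -- Step 3: process each record (a sample update)
         UPDATE BIG_TABLE SET SOME_COL = SOME_COL + 1
            WHERE ROWID = rec.row_id;

         -- Step 4: COMMIT every 1000 rows while the cursor stays open;
         -- this "fetch across commit" is exactly what causes the trouble
         -- described below
         rows_done := rows_done + 1;
         IF MOD(rows_done, 1000) = 0 THEN
            COMMIT;
         END IF;
      END LOOP;
      COMMIT;
   END;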

The problem with this methodology is that you continue to fetch records
from the same table "past" each commit.  Although Oracle allows you to do
this, fetching across a commit is not ANSI-standard SQL.  Also, even though
you can keep the cursor open between commits, the rollback segment is NOT
flushed at every commit: since the cursor is still open against the same
table, the rollback segment keeps those entries until the cursor is closed.

==============================================================================
There are three ways to deal with this problem.  Here they are in order
of increasing complexity:

1) Increase the size of your rollback segments.
   If you have the privilege to choose which rollback segment the current
   transaction uses, or better yet, the privilege to create your own rollback
   segments, then this is the easiest solution.

   In this case you can avoid having to execute a COMMIT after every so-many
   records.
 
   The problem with this method, however, is that you may not have the
   required permissions to alter or create rollback segments.  Even if you
   do, the database may not have enough space to support a large enough
   rollback segment.  Also, you may not know in advance how large a rollback
   segment you will need, so you are reduced to trial and error.
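
   If you do have those privileges, the statements involved look roughly
   like this.  (The segment name, tablespace, and storage figures are
   illustrative only; an actual sizing should come from your DBA.)

   -- Create a large rollback segment (requires DBA-level privileges)
   CREATE ROLLBACK SEGMENT big_rollback
      TABLESPACE rbs
      STORAGE (INITIAL 10M NEXT 10M MINEXTENTS 2 MAXEXTENTS 120);

   ALTER ROLLBACK SEGMENT big_rollback ONLINE;

   -- The first statement of the working transaction binds it to the segment
   SET TRANSACTION USE ROLLBACK SEGMENT big_rollback;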

==============================================================================
2) Grab groups of records in order of ROWID.

   If your cursor is open against a single table, then you can use a simple
   algorithm to group your records by ROWID.  In this case, you can:

     - open a cursor for each smaller subset of records
     - process all of the records in each subset
     - close the cursor
     - perform a COMMIT

   The process above is repeated for each subset of records.  Thus you are
   no longer fetching past the commit.  Each commit is performed after
   the cursor is closed for that subset of processed records.  Here is some
   sample code to show you the algorithm:

   
   DECLARE
      rows_per_commit NUMBER := 1000;   -- Rows to process between COMMITs
      last_ROWID      ROWID  := '00000000.0000.0000';
      max_ROWID       ROWID;

   BEGIN
      -- Get Maximum ROWID from table being processed (EMP)
      select max(ROWID) into max_ROWID from EMP;

      -- Loop through EVERY subset of records until last record is reached
      WHILE (last_ROWID < max_ROWID) LOOP
         DECLARE

            -- Create Cursor for next subset of records:
            -- - Returns ROWID as well (aliased as emp_rowid so the record
            --   field does not collide with the ROWID pseudocolumn)
            -- - Begins at record with ROWID greater than last ROW processed
            -- - Uses ROWNUM to select just 'rows_per_commit' records
            CURSOR emp_cur IS
               SELECT
                  ROWID emp_rowid,
                  EMPNO,
                  DEPTNO,
                  SAL
               FROM
                  EMP
               WHERE
                  ROWID > last_ROWID
               AND
                  ROWNUM <= rows_per_commit;

         BEGIN
            -- Perform COMMIT before each subset of records is processed
            COMMIT;

            -- Force database to use largest rollback segment
            SET TRANSACTION USE ROLLBACK SEGMENT big_rollback;

            -- Loop through all records in subset
            FOR emp_rec IN emp_cur LOOP

               -- Update each record to add COLA adjustment to salary
               update EMP set SAL = SAL*1.03 where ROWID = emp_rec.emp_rowid;

               -- Update value of last_ROWID to point to last record processed
               last_ROWID := emp_rec.emp_rowid;
            END LOOP;
         END;
      END LOOP;

      -- Perform final commit
      COMMIT;
   END;

==============================================================================
3) Grab groups of records in order of Primary Key

   If the cursor you are opening is NOT against a single table, but against a
   group of joined tables and/or views, then you will not be able to use
   Method #2.  After all of the joins are performed, the records are no longer
   returned in any sort of order (not even by ROWID).  You also can't use
   ROWNUM in a consistent fashion, since it is calculated BEFORE any ORDER BY
   clause is applied.  Thus, if you just replace ROWID by, for example, EMPNO,
   and get the next 1000 records by adding 'AND ROWNUM <= 1000', then you
   can't be certain that you will be getting the next largest set of EMPNO's
   (even if you ORDER BY EMPNO).
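
   To make the pitfall concrete, a query like the following does NOT return
   the 1000 smallest EMPNO's above a starting point:

   -- ROWNUM is assigned as rows are fetched, BEFORE the sort: this
   -- grabs an arbitrary 1000 rows with EMPNO > 2000 and then sorts
   -- only those 1000 rows
   SELECT EMPNO, SAL
     FROM EMP
    WHERE EMPNO > 2000
      AND ROWNUM <= 1000
    ORDER BY EMPNO;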

   If your table records are distributed evenly with respect to your primary
   key, then you can just guess at where you need to break for commits.  For
   example:
      Minimum EMPNO  =       1
      Maximum EMPNO  = 100,000
      No. of Records = 100,000

     In this easiest of all cases you know that every EMPNO between 1 and
     100,000 is used.  If you wish to process 1000 records between commits
     then you can use this clause in your cursor statement:

     AND EMPNO BETWEEN low_EMPNO and high_EMPNO

     You would initialize low_EMPNO to 1 and high_EMPNO to 1000.
     Each time through the outer loop you would increase low_EMPNO and
     high_EMPNO by 1000 until all of the records had been processed.
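
      A minimal sketch of that outer loop (shown with a direct UPDATE per
      range for brevity; a cursor over each range works the same way):

      DECLARE
         low_EMPNO  NUMBER := 1;
         high_EMPNO NUMBER := 1000;
         max_EMPNO  NUMBER;
      BEGIN
         SELECT MAX(EMPNO) INTO max_EMPNO FROM EMP;

         WHILE (low_EMPNO <= max_EMPNO) LOOP
            -- Process one evenly sized range of key values, then commit
            UPDATE EMP SET SAL = SAL*1.03
               WHERE EMPNO BETWEEN low_EMPNO AND high_EMPNO;
            COMMIT;

            -- Advance to the next range of 1000 key values
            low_EMPNO  := low_EMPNO  + 1000;
            high_EMPNO := high_EMPNO + 1000;
         END LOOP;
      END;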

   In the worst case, however, your table records will not be evenly
   distributed with respect to your primary key.  If the records are
   sufficiently "clumped" then you could choose a range with more records
   than can be handled by your rollback segment, while other ranges are
   sparse or even empty of records.

   Following is sample code that uses a binary search algorithm to determine
   the "break points" such that each subset of records is approximately
   equal in size no matter how unevenly distributed the table records are:


   DECLARE
      rows_per_commit NUMBER := 1000;  -- records to process between commits
      low_EMPNO NUMBER;     -- low EMPNO in range of records to process
      high_EMPNO NUMBER;    -- high EMPNO in range of records to process
      guess_EMPNO NUMBER;   -- guess at high EMPNO for next range of records
      max_EMPNO NUMBER;     -- max EMPNO in EMP table
      low_guess NUMBER;     -- lowest possible value for guess
      high_guess NUMBER;    -- highest possible value for guess
      count_EMPNO NUMBER;   -- number of records in range
      n_group NUMBER;       -- total number of groups of records to process
      -- (the FOR-loop index i_group is declared implicitly by the loop)

   BEGIN

      -- Set initial high_EMPNO to -1
      high_EMPNO := -1;

      -- Get Maximum EMPNO
      select max(EMPNO) into max_EMPNO from EMP;

      -- Get number of groups of records to process
      select ceil(count(*)/rows_per_commit) into n_group from EMP;

      -- Loop through each group of records
      FOR i_group IN 1..n_group LOOP
         -- Get next low value as just greater than high value of last group
         select min(EMPNO) into low_EMPNO from EMP
            where EMPNO > high_EMPNO;

         -- IF this is the last group, then set high value to max value
         if (i_group = n_group) then
            high_EMPNO := max_EMPNO;

         -- ELSE this is not the last group
         else
            -- Guess high value by assuming that records are evenly distributed
            guess_EMPNO := ceil((max_EMPNO - low_EMPNO)/(n_group - i_group + 1))
                         + low_EMPNO;

            -- Set worst case limits to range of possible guesses
            low_guess := low_EMPNO;
            high_guess := max_EMPNO;
         end if;
 
         -- If this is not the last group, then loop until get high value
         WHILE (i_group < n_group) LOOP

            -- Get number of records between low and high values
            select count(*) into count_EMPNO from EMP where EMPNO between
               low_EMPNO and guess_EMPNO;

            -- If count is within 10% of 'rows_per_commit' then
            if (count_EMPNO between 0.9*rows_per_commit
                              and 1.1*rows_per_commit) then

               -- Set high value and EXIT while loop
               high_EMPNO := guess_EMPNO;
               exit;

            -- Else there are too many records in the range
            elsif (count_EMPNO > 1.1*rows_per_commit) then
               -- Reset highest possible guess
               high_guess := guess_EMPNO;

               -- Use binary search algorithm to reestimate high value
               guess_EMPNO := round((guess_EMPNO - low_guess)/2.0 + low_guess);

            -- Else there are too few records in the range
            else
               -- Reset lowest possible guess
               low_guess := guess_EMPNO;

               -- Use binary search algorithm to reestimate high value
               guess_EMPNO := round((high_guess - guess_EMPNO)/2 + guess_EMPNO);
            end if;
         END LOOP;

         -- Low and High values for range have been determined
         DECLARE
            -- Create Cursor for next group of records:
            -- - Uses low and high values to limit records processed
            CURSOR emp_cur IS
               SELECT
                  EMPNO,
                  DNAME,
                  SAL
               FROM
                  EMP,
                  DEPT
               WHERE
                  EMP.DEPTNO = DEPT.DEPTNO
               AND
                  EMPNO BETWEEN low_EMPNO and high_EMPNO
            ;

         BEGIN
            -- Perform Commit before each subset of records is processed
            COMMIT;

            -- Force database to use largest rollback segment
            SET TRANSACTION USE ROLLBACK SEGMENT big_rollback;

            -- Loop through all records in subset
            FOR emp_rec IN emp_cur LOOP

               -- Update each record to add COLA adjustment to salary
               if (emp_rec.dname = 'Engineering') then
                  update EMP set SAL = SAL*1.03 where EMPNO = emp_rec.EMPNO;
               else
                  update EMP set SAL = SAL*1.10 where EMPNO = emp_rec.EMPNO;
               end if;
            END LOOP;
         END;
      END LOOP;

      -- Perform final commit
      COMMIT;
   END;


   With this binary search algorithm, I seldom had to guess more than about
   four times to get a good value for high_EMPNO.  This algorithm may seem
   rather complex, but I couldn't think of anything simpler.

   If ROWNUM were calculated AFTER an ORDER BY, then I could use an algorithm
   that was very similar to Method #2, but with EMPNO instead of ROWID.
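
   (As an aside: Oracle 8i and later DO allow an ORDER BY inside an inline
   view, with ROWNUM applied to the already-sorted result, so on those
   releases the next batch can be fetched by key like this, where
   :last_EMPNO is the highest key processed so far:)

   -- Oracle 8i+ only: the inline view is sorted first,
   -- then ROWNUM limits the sorted rows
   SELECT EMPNO, SAL
     FROM (SELECT EMPNO, SAL
             FROM EMP
            WHERE EMPNO > :last_EMPNO
            ORDER BY EMPNO)
    WHERE ROWNUM <= 1000;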


Expert Comment
by: vlad_impala
ID: 1083615
As I don't know exactly what you are trying to do, and I don't use Crystal Reports (yet), here are a couple of approaches that may give you some ideas.

You could try using a PL/SQL function that finds the primary key value corresponding to a given record number in the result set, and restricting the SQL query by these values, e.g.
SELECT col1, col2, col3
FROM tab1
WHERE col1 < row_value(2000)
AND col1 >= row_value(1000);
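
One way such a row_value function might be implemented (a sketch only: it assumes an Oracle release that permits ORDER BY in an inline view, and that col1 is the ordering key; row_value is a hypothetical helper, not a built-in):

CREATE OR REPLACE FUNCTION row_value (p_rownum IN NUMBER)
   RETURN NUMBER IS
   v_val NUMBER;
BEGIN
   -- col1 value at position p_rownum when tab1 is ordered by col1:
   -- i.e. the MAX of the first p_rownum ordered values
   SELECT MAX(col1) INTO v_val
     FROM (SELECT col1 FROM tab1 ORDER BY col1)
    WHERE ROWNUM <= p_rownum;
   RETURN v_val;
END row_value;

In practice you would fetch the two boundary values once and bind them into the query, rather than calling the function for every row.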

Or you could write a function that takes the upper and lower limits of the record span you want, plus a given value, and returns 'Y' or 'N' depending on whether the given value is within the limits.  This can be tied back into an SQL query, e.g.
SELECT col1, col2, col3
FROM tab1
WHERE in_range(2000,3000,col1) = 'Y';
(i.e. you are asking whether the value of col1 falls within the set of rows from 2000 to 3000)
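
A rough sketch of such an in_range function (purely illustrative; it builds on the row_value idea above, and the exact boundary handling is a design choice):

CREATE OR REPLACE FUNCTION in_range
   (p_low IN NUMBER, p_high IN NUMBER, p_val IN NUMBER)
   RETURN VARCHAR2 IS
BEGIN
   -- 'Y' if p_val lies between the key values found at row
   -- positions p_low and p_high in the ordered result set
   IF p_val > row_value(p_low) AND p_val <= row_value(p_high) THEN
      RETURN 'Y';
   ELSE
      RETURN 'N';
   END IF;
END in_range;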

Both approaches would need to order by some known criterion (an indexed column or ROWID, for example) to ensure consistency.

Hope this helps
Vlad.
Author Comment
by: subingeorge
ID: 1083616
Thanks, you have given an excellent solution. I am going to use your second method, using ROWID; that should work.
Regards,
 Subin
