Link to home
Start Free TrialLog in
Avatar of Steve Wales
Steve WalesFlag for United States of America

asked on

Generate all of the "missing" months in data

We have a database that's recently been converted from SQL Server 2005 to Oracle 11.2.0.3.

There's a table that contains inventory usage statistics by month.  It's keyed on the part number and the period:

Both are CHAR:

STOCK_CODE CHAR(9)
ACCT_PER CHAR (6) -- Format is YYYYMM

There are a bunch of other columns in the table tracking total of orders, issues, prices etc, but they aren't particularly relevant.  There is one row per item per month where the the item actually has usage.

There was a query used in SQL Server where the developer performed a cross join - joining the table to itself to effectively generate a list of periods - which was then used as a part of a larger query in order to give rolling totals for each month for each inventory item.

The query looked like this:

SELECT DISTINCT A.STOCK_CODE
      , CAST((B.ACCT_PER + '01') AS DATETIME) AS ACCT_PER
FROM INVENT_STATS A 
CROSS JOIN INVENT_STATS B

Open in new window


It ran pretty quickly, scanning 900,000 rows pretty darn quick.  Looking at the actual execution plan, it doesn't seem to need to sort anything because the clustered index returns the rows in order, joins them together and returns the distinct list of stock codes and periods.

In Oracle, because there's no such thing as a clustered index, the explain plan is telling me that it's sorting, which is probably why it runs and runs and runs.  Considering that a cross join is a.num_rows * b.num_rows, that's a lot of sorting it needs to do on a 900K row table.

The goal is that if we have this data in INVENT_STATS:

STOCK_CODE  ACCT_PER
ITEM1        201302
ITEM1        201307
ITEM1        201310

Open in new window


What we get is one row returned for each month of the year (working under the assumption that every month at least something out of inventory is going to be ordered or used somewhere, this generating at least one row per month for at least something in inventory.

That then feeds other parts of reports so that they can get rolling report values on amounts and quantities issued, ordered etc.

I believe that the answer lies somewhere in the use of SELECT ... OVER (PARTITION BY ) in order to get what we need but we can't quite hit the syntax and hoping for some help from here.

(I am still searching EE and other sites for answers, but wanted to get this posted to help expedite a solution for the developer).
Avatar of Sean Stuber
Sean Stuber

In Oracle, because there's no such thing as a clustered index

Not by that name, but conceptually same idea - A table with a clustered index in sql server would be analagous to an index-organized table in oracle.

However if you have indexes on the two columns whether organized (clustered) by one of those indexes or not you could probably do something like this fairly effeciently.

SELECT DISTINCT A.STOCK_CODE
      ,to_date(B.ACCT_PER,'yyyymm') AS ACCT_PER
FROM INVENT_STATS A
CROSS JOIN INVENT_STATS B

A single index on both columns could work too but it would be skip scan for one use.
Avatar of Steve Wales

ASKER

Can't use an IOT, it's a third party solution and modification of the DB is not an option (but yes, I'd forgotten about them).

That query is pretty much what we've started with and it's taking forever.  As mentioned in the question, that seems to be doing an awful lot of work

Output from explain plan:

--------------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           |  1017K|    93M|       |  3728M  (1)|999:59:59 |
|   1 |  HASH UNIQUE           |           |  1017K|    93M|    18T|  3728M  (1)|999:59:59 |
|   2 |   MERGE JOIN CARTESIAN |           |   193G|    16T|       |   726M  (1)|999:59:59 |
|*  3 |    INDEX FAST FULL SCAN| ISTATS_PK |   220K|    14M|       |  1123   (2)| 00:00:14 |
|   4 |    BUFFER SORT         |           |   880K|    20M|       |   726M  (1)|999:59:59 |
|   5 |     TABLE ACCESS FULL  | ISTATS    |   880K|    20M|       |  3301   (1)| 00:00:40 |
--------------------------------------------------------------------------------------------

Open in new window


That MERGE JOIN CARTESIAN with its 16T bytes  seems to be what's taking the time.

I can get a SELECT DISTINCT ACCT_PER back to me with a complete list of the accounting periods in about .20 of a second - which is why I'm thinking that some form of analytical function joining that result set is going to get me the answers I need in an acceptable timeframe.
ASKER CERTIFIED SOLUTION
Avatar of Sean Stuber
Sean Stuber

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
This was my test bed...
I just used a distinct subset of dba_objects  to give me a bunch of "stock_code" values along with other stuff just to give my table some bulk.
Join in 100 months to give me some history, then created the pk.

I think that roughly mirrors the structure you've described, except it should be bigger and more painful.

CREATE TABLE invent_stats
AS
    SELECT owner,
           stock_code,
           subobject_name,
           object_id,
           data_object_id,
           object_type,
           created,
           last_ddl_time,
           timestamp,
           status,
           temporary,
           generated,
           secondary,
           namespace,
           edition_name,
           acct_per
      FROM (SELECT *
              FROM (SELECT owner,
                           object_name stock_code,
                           subobject_name,
                           object_id,
                           data_object_id,
                           object_type,
                           created,
                           last_ddl_time,
                           timestamp,
                           status,
                           temporary,
                           generated,
                           secondary,
                           namespace,
                           edition_name,
                           ROW_NUMBER() OVER(PARTITION BY object_name ORDER BY object_id) rn
                      FROM dba_objects o)
             WHERE rn = 1),
           (    SELECT TO_CHAR(ADD_MONTHS(SYSDATE, -LEVEL), 'yyyymm') acct_per
                  FROM DUAL
            CONNECT BY LEVEL <= 100);

ALTER TABLE invent_stats ADD
CONSTRAINT pk_invent_stats
 PRIMARY KEY (stock_code, acct_per);

Open in new window

Thanks sdstuber, that is exactly what I needed (and I had just come to the same conclusion myself - I was in the process of typing in my reply when the email about your reply came in).

I had come up with the following too:

select stock_code, acct_per
from ( select distinct stock_code from invent_stats),
     ( select distinct acct_per from invent_stats)

Open in new window


The way the data is distributed, instead of doing 900K x 900K rows in a cross join, it's doing 25000 x 250 and is generating the correct result set in practically no time.

The explain plan from my query looks exactly like yours too (at least from steps taken, rows and costs, of course, are different).

Thanks again for your time and effort.

Oh and for the sake of completeness of the answer archive for EE, I used the following question from asktom as the basis for coming up with my query:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:8912311513313