Generate all of the "missing" months in data

We have a database that's recently been converted from SQL Server 2005 to Oracle 11.2.0.3.

There's a table that contains inventory usage statistics by month.  It's keyed on the part number and the period:

Both are CHAR:

STOCK_CODE CHAR(9)
ACCT_PER CHAR(6) -- Format is YYYYMM

There are a bunch of other columns in the table tracking totals of orders, issues, prices, etc., but they aren't particularly relevant.  There is one row per item per month where the item actually has usage.

There was a query used in SQL Server where the developer performed a cross join (joining the table to itself) to effectively generate a list of periods, which was then used as part of a larger query to produce rolling totals per month for each inventory item.

The query looked like this:

SELECT DISTINCT A.STOCK_CODE
      , CAST((B.ACCT_PER + '01') AS DATETIME) AS ACCT_PER
FROM INVENT_STATS A 
CROSS JOIN INVENT_STATS B



It ran pretty quickly, scanning 900,000 rows in short order.  Looking at the actual execution plan, it doesn't need to sort anything because the clustered index returns the rows in order, joins them together and returns the distinct list of stock codes and periods.

In Oracle, because there's no such thing as a clustered index, the explain plan is telling me that it's sorting, which is probably why it runs and runs and runs.  Considering that a cross join produces a.num_rows * b.num_rows rows, that's a lot of sorting to do on a 900K row table.

The goal is that if we have this data in INVENT_STATS:

STOCK_CODE  ACCT_PER
ITEM1        201302
ITEM1        201307
ITEM1        201310



What we want is one row returned for each month of the year (working under the assumption that every month at least something in inventory is going to be ordered or used somewhere, thus generating at least one row per month for at least one item in inventory).

That then feeds other parts of reports so that they can get rolling report values on amounts and quantities issued, ordered etc.

I believe that the answer lies somewhere in the use of SELECT ... OVER (PARTITION BY) to get what we need, but we can't quite nail the syntax and are hoping for some help from here.
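(For reference, something like a partitioned outer join, available in Oracle 10g and later and designed precisely for densifying sparse time series, might also work, though I haven't verified it.  A sketch, where QTY_ISSUED is a hypothetical usage column, not necessarily one of our actual columns:

SELECT s.stock_code,
       c.acct_per,
       NVL(s.qty_issued, 0) AS qty_issued  -- QTY_ISSUED is a placeholder column
  FROM invent_stats s
       PARTITION BY (s.stock_code)         -- repeat the join per stock_code
       RIGHT OUTER JOIN
       (SELECT DISTINCT acct_per FROM invent_stats) c
       ON (s.acct_per = c.acct_per);

The PARTITION BY clause repeats the outer join once per stock code, so every stock code gets a row for every period even when no usage row exists.)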

(I am still searching EE and other sites for answers, but wanted to get this posted to help expedite a solution for the developer).
Steve Wales (Senior Database Administrator) asked:

sdstuber commented:
"In Oracle, because there's no such thing as a clustered index"

Not by that name, but it's conceptually the same idea - a table with a clustered index in SQL Server is analogous to an index-organized table in Oracle.

However, if you have indexes on the two columns, whether the table is organized (clustered) by one of those indexes or not, you could probably do something like this fairly efficiently:

SELECT DISTINCT A.STOCK_CODE
      ,to_date(B.ACCT_PER,'yyyymm') AS ACCT_PER
FROM INVENT_STATS A
CROSS JOIN INVENT_STATS B

A single index on both columns could work too, but it would be a skip scan for one of the uses.
Steve Wales (Author) commented:
Can't use an IOT, it's a third party solution and modification of the DB is not an option (but yes, I'd forgotten about them).

That query is pretty much what we started with, and it's taking forever.  As mentioned in the question, it seems to be doing an awful lot of work.

Output from explain plan:

--------------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           |  1017K|    93M|       |  3728M  (1)|999:59:59 |
|   1 |  HASH UNIQUE           |           |  1017K|    93M|    18T|  3728M  (1)|999:59:59 |
|   2 |   MERGE JOIN CARTESIAN |           |   193G|    16T|       |   726M  (1)|999:59:59 |
|*  3 |    INDEX FAST FULL SCAN| ISTATS_PK |   220K|    14M|       |  1123   (2)| 00:00:14 |
|   4 |    BUFFER SORT         |           |   880K|    20M|       |   726M  (1)|999:59:59 |
|   5 |     TABLE ACCESS FULL  | ISTATS    |   880K|    20M|       |  3301   (1)| 00:00:40 |
--------------------------------------------------------------------------------------------



That MERGE JOIN CARTESIAN with its 16T bytes seems to be what's taking the time.

I can get a SELECT DISTINCT ACCT_PER back to me with a complete list of the accounting periods in about .20 of a second - which is why I'm thinking that some form of analytical function joining that result set is going to get me the answers I need in an acceptable timeframe.
sdstuber commented:
Using a table of 5,111,100 rows (511111 stock codes spread across 100 months)

--------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name            | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| E-Time   |  OMem |  1Mem |  O/1/M   |
--------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                 |        |       |       |   139G(100)|          |       |       |          |
|   1 |  MERGE JOIN CARTESIAN    |                 |     30T|  2029T|       |   139G  (1)|999:59:59 |       |       |          |
|   2 |   VIEW                   |                 |   5566K|   350M|       | 95459   (1)| 00:19:06 |       |       |          |
|   3 |    HASH UNIQUE           |                 |   5566K|   350M|   384M| 95459   (1)| 00:19:06 |  2505K|  1055K|     1/0/0|
|   4 |     INDEX FAST FULL SCAN | PK_INVENT_STATS |   5566K|   350M|       |  8033   (1)| 00:01:37 |       |       |          |
|   5 |   BUFFER SORT            |                 |   5566K|    31M|       |   139G  (1)|999:59:59 |  4096 |  4096 |     1/0/0|
|   6 |    VIEW                  |                 |   5566K|    31M|       | 25029   (1)| 00:05:01 |       |       |          |
|   7 |     HASH UNIQUE          |                 |   5566K|    26M|    63M| 25029   (1)| 00:05:01 |   936K|   936K|     1/0/0|
|   8 |      INDEX FAST FULL SCAN| PK_INVENT_STATS |   5566K|    26M|       |  8033   (1)| 00:01:37 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------



The query only took a few seconds to execute on an old PC acting as a DB server (Pentium D 3.40 GHz with 2 GB of memory).

Including time to pump the 5 million rows to my pc from that old thing and render them on my screen (with lots of scrolling), about 30 seconds.

SELECT *
  FROM (SELECT DISTINCT stock_code FROM invent_stats)
       CROSS JOIN (SELECT DISTINCT TO_DATE(acct_per, 'yyyymm') AS acct_per FROM invent_stats);

Note, I set this up to be intentionally abusive: every stock_code occurs in every month of my test set, so this was a really, really expensive way to just do "select stock_code, acct_per from invent_stats".
I'm assuming that with less overall data and a less fully populated history, yours will be faster.
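As an aside: if even the second scan for the DISTINCT periods is a concern, the month list can be generated without touching the table at all, using a CONNECT BY row generator (the start date and month count below are placeholders; substitute your actual bounds):

SELECT TO_CHAR(ADD_MONTHS(DATE '2013-01-01', LEVEL - 1), 'yyyymm') AS acct_per
  FROM DUAL
CONNECT BY LEVEL <= 12;  -- 12 months starting Jan 2013; adjust as needed

That's the same trick the testbed below uses to fabricate its 100 months of history, and it would cross join against the distinct stock codes in exactly the same way.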

sdstuber commented:
This was my test bed...
I just used a distinct subset of dba_objects to give me a bunch of "stock_code" values, along with other stuff to give my table some bulk.
Then I joined in 100 months to give me some history and created the PK.

I think that roughly mirrors the structure you've described, except it should be bigger and more painful.

CREATE TABLE invent_stats
AS
    SELECT owner,
           stock_code,
           subobject_name,
           object_id,
           data_object_id,
           object_type,
           created,
           last_ddl_time,
           timestamp,
           status,
           temporary,
           generated,
           secondary,
           namespace,
           edition_name,
           acct_per
      FROM (SELECT *
              FROM (SELECT owner,
                           object_name stock_code,
                           subobject_name,
                           object_id,
                           data_object_id,
                           object_type,
                           created,
                           last_ddl_time,
                           timestamp,
                           status,
                           temporary,
                           generated,
                           secondary,
                           namespace,
                           edition_name,
                           ROW_NUMBER() OVER(PARTITION BY object_name ORDER BY object_id) rn
                      FROM dba_objects o)
             WHERE rn = 1),
           (    SELECT TO_CHAR(ADD_MONTHS(SYSDATE, -LEVEL), 'yyyymm') acct_per
                  FROM DUAL
            CONNECT BY LEVEL <= 100);

ALTER TABLE invent_stats ADD
CONSTRAINT pk_invent_stats
 PRIMARY KEY (stock_code, acct_per);


Steve Wales (Author) commented:
Thanks sdstuber, that is exactly what I needed (and I had just come to the same conclusion myself - I was in the process of typing in my reply when the email about your reply came in).

I had come up with the following too:

SELECT stock_code, acct_per
  FROM (SELECT DISTINCT stock_code FROM invent_stats),
       (SELECT DISTINCT acct_per FROM invent_stats);



The way the data is distributed, instead of doing 900K x 900K rows in a cross join, it's doing 25,000 x 250 and generating the correct result set in practically no time.
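To sketch how that grid then feeds the rolling totals (the QTY_ISSUED column and the 12-month window are illustrative assumptions, not our actual report columns): outer join the grid back to the stats table so missing months contribute zero, then apply an analytic SUM.

SELECT g.stock_code,
       g.acct_per,
       SUM(NVL(s.qty_issued, 0))                      -- QTY_ISSUED is a placeholder
           OVER (PARTITION BY g.stock_code
                     ORDER BY g.acct_per
                 ROWS BETWEEN 11 PRECEDING AND CURRENT ROW) AS rolling_12m
  FROM (SELECT stock_code, acct_per
          FROM (SELECT DISTINCT stock_code FROM invent_stats),
               (SELECT DISTINCT acct_per FROM invent_stats)) g
       LEFT JOIN invent_stats s
              ON s.stock_code = g.stock_code
             AND s.acct_per = g.acct_per;

Because the grid guarantees one row per item per month, the window frame always spans exactly 12 calendar months.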

The explain plan from my query looks exactly like yours too (at least from steps taken, rows and costs, of course, are different).

Thanks again for your time and effort.

Oh, and for the sake of completeness of the EE answer archive, I used the following AskTom question as the basis for coming up with my query:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:8912311513313