Solved

Generate all of the "missing" months in data

Posted on 2013-11-22
563 Views
Last Modified: 2013-11-22
We have a database that's recently been converted from SQL Server 2005 to Oracle 11.2.0.3.

There's a table that contains inventory usage statistics by month.  It's keyed on the part number and the period:

Both are CHAR:

STOCK_CODE CHAR(9)
ACCT_PER CHAR(6) -- Format is YYYYMM

There are a bunch of other columns in the table tracking totals of orders, issues, prices, etc., but they aren't particularly relevant.  There is one row per item per month where the item actually has usage.

There was a query used in SQL Server where the developer performed a cross join - joining the table to itself to effectively generate a list of periods - which was then used as part of a larger query to give rolling totals for each month for each inventory item.

The query looked like this:

SELECT DISTINCT A.STOCK_CODE
      , CAST((B.ACCT_PER + '01') AS DATETIME) AS ACCT_PER
FROM INVENT_STATS A 
CROSS JOIN INVENT_STATS B



It ran quickly in SQL Server, scanning 900,000 rows in very little time.  Looking at the actual execution plan, it doesn't seem to need to sort anything because the clustered index returns the rows in order; SQL Server just joins them together and returns the distinct list of stock codes and periods.

In Oracle, because there's no such thing as a clustered index, the explain plan is telling me that it's sorting, which is probably why it runs and runs and runs.  Considering that a cross join produces a.num_rows * b.num_rows rows, that's a lot of sorting to do against a 900K-row table.

The goal is that if we have this data in INVENT_STATS:

STOCK_CODE  ACCT_PER
ITEM1        201302
ITEM1        201307
ITEM1        201310



What we want to get back is one row for each stock code for each month of the year (working under the assumption that every month at least something in inventory is going to be ordered or used somewhere, thus generating at least one row per month for at least one item in inventory).

That result then feeds other parts of the reports so that they can produce rolling values for amounts and quantities issued, ordered, etc.

I believe that the answer lies somewhere in the use of SELECT ... OVER (PARTITION BY ) to get what we need, but we can't quite hit the syntax and are hoping for some help here.
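
For what it's worth, this is roughly the direction we've been trying to sketch out - I'm not at all sure the syntax (Oracle's partitioned outer join, rather than the analytic OVER clause) is right for our case, so treat it as a sketch only:

-- Rough sketch: build the period list from the data itself, then use a
-- partitioned outer join so every stock code gets a row for every period.
WITH periods AS (SELECT DISTINCT acct_per FROM invent_stats)
SELECT s.stock_code,
       p.acct_per
  FROM invent_stats s PARTITION BY (s.stock_code)
       RIGHT OUTER JOIN periods p ON (s.acct_per = p.acct_per);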

(I am still searching EE and other sites for answers, but wanted to get this posted to help expedite a solution for the developer).
Question by: Steve Wales
5 Comments
 
Expert Comment by: sdstuber (ID: 39670179)
"In Oracle, because there's no such thing as a clustered index"

Not by that name, but it's conceptually the same idea - a table with a clustered index in SQL Server would be analogous to an index-organized table in Oracle.
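
For illustration only (column list abbreviated, and QTY_ISSUED is just a stand-in for your real columns), the Oracle equivalent would be declared something like this:

-- Illustrative DDL only: an index-organized table keyed the same way the
-- SQL Server clustered index was, so rows are stored in key order.
CREATE TABLE invent_stats_iot
( stock_code  CHAR(9),
  acct_per    CHAR(6),
  qty_issued  NUMBER,    -- stand-in for the real usage/price columns
  CONSTRAINT pk_invent_stats_iot PRIMARY KEY (stock_code, acct_per)
)
ORGANIZATION INDEX;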

However, if you have indexes on the two columns - whether or not the table is organized (clustered) by one of those indexes - you could probably do something like this fairly efficiently:

SELECT DISTINCT A.STOCK_CODE
      ,to_date(B.ACCT_PER,'yyyymm') AS ACCT_PER
FROM INVENT_STATS A
CROSS JOIN INVENT_STATS B

A single composite index on both columns could work too, but it would be a skip scan for one of the two uses.
 
Author Comment by: Steve Wales (ID: 39670212)
We can't use an IOT - it's a third-party solution and modifying the DB is not an option (but yes, I'd forgotten about them).

That query is pretty much what we started with, and it's taking forever.  As mentioned in the question, it seems to be doing an awful lot of work.

Output from explain plan:

--------------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           |  1017K|    93M|       |  3728M  (1)|999:59:59 |
|   1 |  HASH UNIQUE           |           |  1017K|    93M|    18T|  3728M  (1)|999:59:59 |
|   2 |   MERGE JOIN CARTESIAN |           |   193G|    16T|       |   726M  (1)|999:59:59 |
|*  3 |    INDEX FAST FULL SCAN| ISTATS_PK |   220K|    14M|       |  1123   (2)| 00:00:14 |
|   4 |    BUFFER SORT         |           |   880K|    20M|       |   726M  (1)|999:59:59 |
|   5 |     TABLE ACCESS FULL  | ISTATS    |   880K|    20M|       |  3301   (1)| 00:00:40 |
--------------------------------------------------------------------------------------------



That MERGE JOIN CARTESIAN, with its 16T of bytes, seems to be what's taking the time.

I can get a SELECT DISTINCT ACCT_PER back with a complete list of the accounting periods in about 0.2 seconds - which is why I'm thinking that some form of analytic function joining against that result set is going to get me the answers I need in an acceptable timeframe.
 
Accepted Solution by: sdstuber (earned 500 total points, ID: 39670324)
Using a table of 5,111,100 rows (51,111 stock codes spread across 100 months):

--------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name            | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| E-Time   |  OMem |  1Mem |  O/1/M   |
--------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                 |        |       |       |   139G(100)|          |       |       |          |
|   1 |  MERGE JOIN CARTESIAN    |                 |     30T|  2029T|       |   139G  (1)|999:59:59 |       |       |          |
|   2 |   VIEW                   |                 |   5566K|   350M|       | 95459   (1)| 00:19:06 |       |       |          |
|   3 |    HASH UNIQUE           |                 |   5566K|   350M|   384M| 95459   (1)| 00:19:06 |  2505K|  1055K|     1/0/0|
|   4 |     INDEX FAST FULL SCAN | PK_INVENT_STATS |   5566K|   350M|       |  8033   (1)| 00:01:37 |       |       |          |
|   5 |   BUFFER SORT            |                 |   5566K|    31M|       |   139G  (1)|999:59:59 |  4096 |  4096 |     1/0/0|
|   6 |    VIEW                  |                 |   5566K|    31M|       | 25029   (1)| 00:05:01 |       |       |          |
|   7 |     HASH UNIQUE          |                 |   5566K|    26M|    63M| 25029   (1)| 00:05:01 |   936K|   936K|     1/0/0|
|   8 |      INDEX FAST FULL SCAN| PK_INVENT_STATS |   5566K|    26M|       |  8033   (1)| 00:01:37 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------



The query only took a few seconds to execute on an old PC acting as a DB server (Pentium(R) D CPU 3.40GHz with 2 GB of memory).

Including the time to pump the 5 million rows from that old thing to my PC and render them on my screen (with lots of scrolling), it was about 30 seconds.

SELECT *
  FROM (SELECT DISTINCT stock_code FROM invent_stats)
       CROSS JOIN (SELECT DISTINCT TO_DATE(acct_per, 'yyyymm') AS acct_per FROM invent_stats);

Note: I set this up to be intentionally abusive - every stock_code occurs in every month of my test set, so this was a really, really expensive way to just do "select stock_code, acct_per from invent_stats".
I'm assuming that with less overall data and a less fully populated history, yours will be faster.
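
If you need to carry that grid through to the rolling totals, one option is to outer join it back to the base table and feed it to an analytic SUM - something like this sketch (QTY_ISSUED is just a stand-in for whatever your real usage columns are):

-- Sketch only: the densified stock_code/period grid, outer joined back to
-- the detail rows, with a running total per stock_code across the months.
SELECT sc.stock_code,
       pe.acct_per,
       SUM(NVL(iv.qty_issued, 0))
           OVER (PARTITION BY sc.stock_code ORDER BY pe.acct_per) AS rolling_issued
  FROM (SELECT DISTINCT stock_code FROM invent_stats) sc
       CROSS JOIN (SELECT DISTINCT acct_per FROM invent_stats) pe
       LEFT OUTER JOIN invent_stats iv
            ON  iv.stock_code = sc.stock_code
            AND iv.acct_per   = pe.acct_per;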
 
Expert Comment by: sdstuber (ID: 39670330)
This was my test bed...
I just used a distinct subset of dba_objects to give me a bunch of "stock_code" values, along with other stuff just to give my table some bulk.
I joined in 100 months to give me some history, then created the PK.

I think that roughly mirrors the structure you've described, except it should be bigger and more painful.

CREATE TABLE invent_stats
AS
    SELECT owner,
           stock_code,
           subobject_name,
           object_id,
           data_object_id,
           object_type,
           created,
           last_ddl_time,
           timestamp,
           status,
           temporary,
           generated,
           secondary,
           namespace,
           edition_name,
           acct_per
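      -- deduplicate dba_objects: keep one row per object_name to act as the stock_code list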
      FROM (SELECT *
              FROM (SELECT owner,
                           object_name stock_code,
                           subobject_name,
                           object_id,
                           data_object_id,
                           object_type,
                           created,
                           last_ddl_time,
                           timestamp,
                           status,
                           temporary,
                           generated,
                           secondary,
                           namespace,
                           edition_name,
                           ROW_NUMBER() OVER(PARTITION BY object_name ORDER BY object_id) rn
                      FROM dba_objects o)
             WHERE rn = 1),
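           -- cartesian join against 100 generated months to build the history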
           (    SELECT TO_CHAR(ADD_MONTHS(SYSDATE, -LEVEL), 'yyyymm') acct_per
                  FROM DUAL
            CONNECT BY LEVEL <= 100);

ALTER TABLE invent_stats ADD
CONSTRAINT pk_invent_stats
 PRIMARY KEY (stock_code, acct_per);
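
A quick sanity check of the volumes after building it (just verifying the test data, not part of the solution):

-- confirm the total row count and the stock_code / period spread
SELECT COUNT(*)                   AS total_rows,
       COUNT(DISTINCT stock_code) AS stock_codes,
       COUNT(DISTINCT acct_per)   AS periods
  FROM invent_stats;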

 
Author Closing Comment by: Steve Wales (ID: 39670336)
Thanks sdstuber, that is exactly what I needed (and I had just come to the same conclusion myself - I was in the process of typing in my reply when the email about your reply came in).

I had come up with the following too:

select stock_code, acct_per
from ( select distinct stock_code from invent_stats),
     ( select distinct acct_per from invent_stats)



The way the data is distributed, instead of doing 900K x 900K rows in a cross join, it's doing 25,000 x 250 and generating the correct result set in practically no time.

The explain plan for my query looks exactly like yours too (at least in the steps taken; the rows and costs, of course, are different).

Thanks again for your time and effort.

Oh and for the sake of completeness of the answer archive for EE, I used the following question from asktom as the basis for coming up with my query:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:8912311513313