Solved

Generate all of the "missing" months in data

Posted on 2013-11-22
Medium Priority
576 Views
Last Modified: 2013-11-22
We have a database that's recently been converted from SQL Server 2005 to Oracle 11.2.0.3.

There's a table that contains inventory usage statistics by month.  It's keyed on the part number and the period:

Both are CHAR:

STOCK_CODE CHAR(9)
ACCT_PER CHAR (6) -- Format is YYYYMM

There are a bunch of other columns in the table tracking totals of orders, issues, prices, etc., but they aren't particularly relevant.  There is one row per item per month where the item actually has usage.

There was a query used in SQL Server where the developer performed a cross join - joining the table to itself to effectively generate a list of periods - which was then used as a part of a larger query in order to give rolling totals for each month for each inventory item.

The query looked like this:

SELECT DISTINCT A.STOCK_CODE
      , CAST((B.ACCT_PER + '01') AS DATETIME) AS ACCT_PER
FROM INVENT_STATS A 
CROSS JOIN INVENT_STATS B



It ran pretty quickly, scanning 900,000 rows in short order.  Looking at the actual execution plan, it doesn't need to sort anything because the clustered index returns the rows in order; SQL Server joins them together and returns the distinct list of stock codes and periods.

In Oracle, because there's no such thing as a clustered index, the explain plan is telling me that it's sorting, which is probably why it runs and runs and runs.  Considering that a cross join is a.num_rows * b.num_rows, that's a lot of sorting it needs to do on a 900K row table.

The goal is that if we have this data in INVENT_STATS:

STOCK_CODE  ACCT_PER
ITEM1        201302
ITEM1        201307
ITEM1        201310



What we want back is one row per stock code for each month of the year (working under the assumption that every month at least one item in inventory is going to be ordered or used somewhere, thus generating at least one row per month for at least one item in inventory).

That then feeds other parts of reports so that they can get rolling report values on amounts and quantities issued, ordered etc.
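One idea we've toyed with (a sketch only, untested - it assumes ACCT_PER is always a valid YYYYMM string and that MIN/MAX of ACCT_PER bound the range we care about) is to generate the calendar itself with CONNECT BY LEVEL instead of cross joining the table to itself:

```sql
-- Sketch: generate every period between the earliest and latest ACCT_PER,
-- then cross join that calendar to the distinct stock codes.
WITH bounds AS (
    SELECT TO_DATE(MIN(acct_per), 'yyyymm') AS lo,
           TO_DATE(MAX(acct_per), 'yyyymm') AS hi
      FROM invent_stats
),
periods AS (
    SELECT ADD_MONTHS(lo, LEVEL - 1) AS acct_per
      FROM bounds
    CONNECT BY ADD_MONTHS(lo, LEVEL - 1) <= hi
)
SELECT c.stock_code, p.acct_per
  FROM (SELECT DISTINCT stock_code FROM invent_stats) c
 CROSS JOIN periods p;
```

That would only scan the base table twice (once per distinct list) instead of squaring it, but we haven't proven out the performance.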

I believe the answer lies somewhere in the use of SELECT ... OVER (PARTITION BY ...) to get what we need, but we can't quite hit the syntax and are hoping for some help from here.

(I am still searching EE and other sites for answers, but wanted to get this posted to help expedite a solution for the developer).
Comment
Question by:Steve Wales
5 Comments
 
LVL 74

Expert Comment

by:sdstuber
ID: 39670179
In Oracle, because there's no such thing as a clustered index

Not by that name, but it's conceptually the same idea - a table with a clustered index in SQL Server would be analogous to an index-organized table in Oracle.

However, if you have indexes on the two columns - whether the table is organized (clustered) by one of those indexes or not - you could probably do something like this fairly efficiently.

SELECT DISTINCT A.STOCK_CODE
      ,to_date(B.ACCT_PER,'yyyymm') AS ACCT_PER
FROM INVENT_STATS A
CROSS JOIN INVENT_STATS B

A single composite index on both columns could work too, but it would require a skip scan for one of the two uses.
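For instance (a sketch only - the index names are made up, and this assumes you're free to add indexes to the table):

```sql
-- Two single-column indexes, one to drive each distinct list.
-- Hypothetical names; adjust to your site's naming convention.
CREATE INDEX invent_stats_code_ix ON invent_stats (stock_code);
CREATE INDEX invent_stats_per_ix  ON invent_stats (acct_per);
```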
 
LVL 22

Author Comment

by:Steve Wales
ID: 39670212
Can't use an IOT; it's a third-party solution and modification of the DB is not an option (but yes, I'd forgotten about them).

That query is pretty much what we started with, and it's taking forever.  As mentioned in the question, it seems to be doing an awful lot of work.

Output from explain plan:

--------------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           |  1017K|    93M|       |  3728M  (1)|999:59:59 |
|   1 |  HASH UNIQUE           |           |  1017K|    93M|    18T|  3728M  (1)|999:59:59 |
|   2 |   MERGE JOIN CARTESIAN |           |   193G|    16T|       |   726M  (1)|999:59:59 |
|*  3 |    INDEX FAST FULL SCAN| ISTATS_PK |   220K|    14M|       |  1123   (2)| 00:00:14 |
|   4 |    BUFFER SORT         |           |   880K|    20M|       |   726M  (1)|999:59:59 |
|   5 |     TABLE ACCESS FULL  | ISTATS    |   880K|    20M|       |  3301   (1)| 00:00:40 |
--------------------------------------------------------------------------------------------



That MERGE JOIN CARTESIAN, with its 16T bytes, seems to be what's taking the time.

I can get a SELECT DISTINCT ACCT_PER back with a complete list of the accounting periods in about 0.20 seconds - which is why I'm thinking that some form of analytic function joined to that result set is going to get me the answers I need in an acceptable timeframe.
 
LVL 74

Accepted Solution

by:
sdstuber earned 2000 total points
ID: 39670324
Using a table of 5,111,100 rows (51,111 stock codes spread across 100 months):

--------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name            | E-Rows |E-Bytes|E-Temp | Cost (%CPU)| E-Time   |  OMem |  1Mem |  O/1/M   |
--------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                 |        |       |       |   139G(100)|          |       |       |          |
|   1 |  MERGE JOIN CARTESIAN    |                 |     30T|  2029T|       |   139G  (1)|999:59:59 |       |       |          |
|   2 |   VIEW                   |                 |   5566K|   350M|       | 95459   (1)| 00:19:06 |       |       |          |
|   3 |    HASH UNIQUE           |                 |   5566K|   350M|   384M| 95459   (1)| 00:19:06 |  2505K|  1055K|     1/0/0|
|   4 |     INDEX FAST FULL SCAN | PK_INVENT_STATS |   5566K|   350M|       |  8033   (1)| 00:01:37 |       |       |          |
|   5 |   BUFFER SORT            |                 |   5566K|    31M|       |   139G  (1)|999:59:59 |  4096 |  4096 |     1/0/0|
|   6 |    VIEW                  |                 |   5566K|    31M|       | 25029   (1)| 00:05:01 |       |       |          |
|   7 |     HASH UNIQUE          |                 |   5566K|    26M|    63M| 25029   (1)| 00:05:01 |   936K|   936K|     1/0/0|
|   8 |      INDEX FAST FULL SCAN| PK_INVENT_STATS |   5566K|    26M|       |  8033   (1)| 00:01:37 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------



It only took a few seconds to execute the query on an old PC acting as a DB server (Pentium(R) D CPU 3.40GHz with 2 GB of memory).

Including the time to pump the 5 million rows to my PC from that old thing and render them on my screen (with lots of scrolling), about 30 seconds.

SELECT *
  FROM (SELECT DISTINCT stock_code FROM invent_stats)
       CROSS JOIN (SELECT DISTINCT TO_DATE(acct_per, 'yyyymm') AS acct_per FROM invent_stats);

Note, I set this up to be intentionally abusive: every stock_code occurs in every month of my test set, so this was a really, really expensive way to just do "select stock_code, acct_per from invent_stats".
I'm assuming that with less overall data and a less fully populated history, yours will be faster.
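Once you have the full (stock_code, month) grid, something along these lines should feed the rolling reports - a sketch only, where QTY_ISSUED stands in for whichever usage column you're actually rolling up:

```sql
-- Sketch: outer join the generated grid back to the stats rows, then
-- compute a running total per stock code with an analytic SUM.
-- QTY_ISSUED is a placeholder column name, not from your real table.
SELECT g.stock_code,
       g.acct_per,
       SUM(NVL(s.qty_issued, 0))
           OVER (PARTITION BY g.stock_code ORDER BY g.acct_per) AS rolling_issued
  FROM (SELECT c.stock_code, m.acct_per
          FROM (SELECT DISTINCT stock_code FROM invent_stats) c
               CROSS JOIN
               (SELECT DISTINCT TO_DATE(acct_per, 'yyyymm') AS acct_per
                  FROM invent_stats) m) g
  LEFT OUTER JOIN invent_stats s
    ON  s.stock_code = g.stock_code
    AND TO_DATE(s.acct_per, 'yyyymm') = g.acct_per;
```

The NVL turns the "missing" months into zero-quantity rows so the running total carries forward across gaps.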
 
LVL 74

Expert Comment

by:sdstuber
ID: 39670330
This was my test bed...
I just used a distinct subset of dba_objects to give me a bunch of "stock_code" values, along with other columns just to give my table some bulk.
Then I joined in 100 months to give me some history and created the PK.

I think that roughly mirrors the structure you've described, except it should be bigger and more painful.

CREATE TABLE invent_stats
AS
    SELECT owner,
           stock_code,
           subobject_name,
           object_id,
           data_object_id,
           object_type,
           created,
           last_ddl_time,
           timestamp,
           status,
           temporary,
           generated,
           secondary,
           namespace,
           edition_name,
           acct_per
      FROM (SELECT *
              FROM (SELECT owner,
                           object_name stock_code,
                           subobject_name,
                           object_id,
                           data_object_id,
                           object_type,
                           created,
                           last_ddl_time,
                           timestamp,
                           status,
                           temporary,
                           generated,
                           secondary,
                           namespace,
                           edition_name,
                           ROW_NUMBER() OVER(PARTITION BY object_name ORDER BY object_id) rn
                      FROM dba_objects o)
             WHERE rn = 1),
           (    SELECT TO_CHAR(ADD_MONTHS(SYSDATE, -LEVEL), 'yyyymm') acct_per
                  FROM DUAL
            CONNECT BY LEVEL <= 100);

ALTER TABLE invent_stats ADD
CONSTRAINT pk_invent_stats
 PRIMARY KEY (stock_code, acct_per);


 
LVL 22

Author Closing Comment

by:Steve Wales
ID: 39670336
Thanks sdstuber, that is exactly what I needed (and I had just come to the same conclusion myself - I was in the process of typing my reply when the email about your reply came in).

I had come up with the following too:

select stock_code, acct_per
from ( select distinct stock_code from invent_stats),
     ( select distinct acct_per from invent_stats)



The way the data is distributed, instead of doing 900K x 900K rows in a cross join, it's doing 25000 x 250 and is generating the correct result set in practically no time.

The explain plan from my query looks exactly like yours too (at least in the steps taken; rows and costs, of course, are different).

Thanks again for your time and effort.

Oh and for the sake of completeness of the answer archive for EE, I used the following question from asktom as the basis for coming up with my query:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:8912311513313
