Steve Wales
asked on
Generate all of the "missing" months in data
We have a database that's recently been converted from SQL Server 2005 to Oracle 11.2.0.3.
There's a table that contains inventory usage statistics by month. It's keyed on the part number and the period:
Both are CHAR:
STOCK_CODE CHAR(9)
ACCT_PER CHAR (6) -- Format is YYYYMM
There are a bunch of other columns in the table tracking totals of orders, issues, prices, etc., but they aren't particularly relevant. There is one row per item per month where the item actually has usage.
There was a query used in SQL Server where the developer performed a cross join - joining the table to itself to effectively generate a list of periods - which was then used as a part of a larger query in order to give rolling totals for each month for each inventory item.
The query looked like this:
SELECT DISTINCT A.STOCK_CODE
, CAST((B.ACCT_PER + '01') AS DATETIME) AS ACCT_PER
FROM INVENT_STATS A
CROSS JOIN INVENT_STATS B
It ran pretty quickly, scanning 900,000 rows in short order. Looking at the actual execution plan, it doesn't need to sort anything because the clustered index returns the rows in order, joins them together and returns the distinct list of stock codes and periods.
In Oracle, because there's no such thing as a clustered index, the explain plan is telling me that it's sorting, which is probably why it runs and runs and runs. Considering that a cross join is a.num_rows * b.num_rows, that's a lot of sorting it needs to do on a 900K row table.
The goal is that if we have this data in INVENT_STATS:
STOCK_CODE ACCT_PER
ITEM1 201302
ITEM1 201307
ITEM1 201310
What we get is one row returned for each month of the year (working under the assumption that every month at least one item in inventory is ordered or used somewhere, thus generating at least one row per month).
That then feeds other parts of reports so that they can get rolling report values on amounts and quantities issued, ordered etc.
I believe the answer lies somewhere in the use of SELECT ... OVER (PARTITION BY) to get what we need, but we can't quite nail the syntax and are hoping for some help from here.
(I am still searching EE and other sites for answers, but wanted to get this posted to help expedite a solution for the developer).
ASKER
Can't use an IOT; it's a third-party solution and modifying the DB is not an option (but yes, I'd forgotten about them).
That query is pretty much what we've started with and it's taking forever. As mentioned in the question, it seems to be doing an awful lot of work.
Output from explain plan:
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1017K| 93M| | 3728M (1)|999:59:59 |
| 1 | HASH UNIQUE | | 1017K| 93M| 18T| 3728M (1)|999:59:59 |
| 2 | MERGE JOIN CARTESIAN | | 193G| 16T| | 726M (1)|999:59:59 |
|* 3 | INDEX FAST FULL SCAN| ISTATS_PK | 220K| 14M| | 1123 (2)| 00:00:14 |
| 4 | BUFFER SORT | | 880K| 20M| | 726M (1)|999:59:59 |
| 5 | TABLE ACCESS FULL | ISTATS | 880K| 20M| | 3301 (1)| 00:00:40 |
--------------------------------------------------------------------------------------------
That MERGE JOIN CARTESIAN with its 16T bytes seems to be what's taking the time.
I can get a SELECT DISTINCT ACCT_PER back with a complete list of the accounting periods in about 0.2 seconds - which is why I'm thinking that some form of analytic function joining that result set is going to get me the answers I need in an acceptable timeframe.
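One way to sidestep the table scan for the period list entirely is to generate the months with a CONNECT BY row generator. This is only a sketch; the 201302/201310 bounds are placeholders, and in practice you'd substitute the real MIN/MAX of ACCT_PER (or select them from the table):

```sql
-- Sketch: build the period list without scanning INVENT_STATS at all.
-- The 201302 / 201310 bounds are assumed for illustration; replace them
-- with the actual first and last accounting periods.
SELECT TO_CHAR(ADD_MONTHS(TO_DATE('201302', 'yyyymm'), LEVEL - 1),
               'yyyymm') AS acct_per
  FROM dual
CONNECT BY ADD_MONTHS(TO_DATE('201302', 'yyyymm'), LEVEL - 1)
           <= TO_DATE('201310', 'yyyymm');
```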
ASKER CERTIFIED SOLUTION
This was my test bed...
I just used a distinct subset of dba_objects to give me a bunch of "stock_code" values along with other stuff just to give my table some bulk.
Join in 100 months to give me some history, then created the pk.
I think that roughly mirrors the structure you've described, except it should be bigger and more painful.
CREATE TABLE invent_stats
AS
SELECT owner,
stock_code,
subobject_name,
object_id,
data_object_id,
object_type,
created,
last_ddl_time,
timestamp,
status,
temporary,
generated,
secondary,
namespace,
edition_name,
acct_per
FROM (SELECT *
FROM (SELECT owner,
object_name stock_code,
subobject_name,
object_id,
data_object_id,
object_type,
created,
last_ddl_time,
timestamp,
status,
temporary,
generated,
secondary,
namespace,
edition_name,
ROW_NUMBER() OVER(PARTITION BY object_name ORDER BY object_id) rn
FROM dba_objects o)
WHERE rn = 1),
( SELECT TO_CHAR(ADD_MONTHS(SYSDATE, -LEVEL), 'yyyymm') acct_per
FROM DUAL
CONNECT BY LEVEL <= 100);
ALTER TABLE invent_stats ADD
CONSTRAINT pk_invent_stats
PRIMARY KEY (stock_code, acct_per);
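Since the accepted answer's exact query isn't reproduced here, the following is a sketch of the kind of query this test bed exercises - cross joining the two small DISTINCT sets rather than the full table against itself:

```sql
-- Sketch only: cross join the two small DISTINCT sets instead of the
-- 880K-row table against itself.
SELECT s.stock_code,
       TO_DATE(p.acct_per, 'yyyymm') AS acct_per
  FROM (SELECT DISTINCT stock_code FROM invent_stats) s
 CROSS JOIN
       (SELECT DISTINCT acct_per FROM invent_stats) p;
```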
ASKER
Thanks sdstuber, that is exactly what I needed (and I had just come to the same conclusion myself - I was in the process of typing in my reply when the email about your reply came in).
I had come up with the following too:
select stock_code, acct_per
from ( select distinct stock_code from invent_stats),
( select distinct acct_per from invent_stats)
The way the data is distributed, instead of doing 900K x 900K rows in a cross join, it's doing 25000 x 250 and is generating the correct result set in practically no time.
The explain plan from my query looks exactly like yours too (at least from steps taken, rows and costs, of course, are different).
Thanks again for your time and effort.
Oh and for the sake of completeness of the answer archive for EE, I used the following question from asktom as the basis for coming up with my query:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:8912311513313
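For the downstream rolling totals mentioned in the question, one approach is to LEFT JOIN the generated stock/period grid back to the table and apply an analytic SUM. This is a sketch only; qty_issued is a hypothetical column standing in for the real usage columns:

```sql
-- Sketch: rolling total per item per month over the generated grid.
-- qty_issued is a hypothetical column name.
SELECT g.stock_code,
       g.acct_per,
       SUM(NVL(i.qty_issued, 0))
         OVER (PARTITION BY g.stock_code
               ORDER BY g.acct_per) AS rolling_issued
  FROM (SELECT s.stock_code, p.acct_per
          FROM (SELECT DISTINCT stock_code FROM invent_stats) s
         CROSS JOIN
               (SELECT DISTINCT acct_per FROM invent_stats) p) g
  LEFT JOIN invent_stats i
    ON i.stock_code = g.stock_code
   AND i.acct_per   = g.acct_per;
```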
Not by that name, but conceptually the same idea - a table with a clustered index in SQL Server would be analogous to an index-organized table in Oracle.
However, if you have indexes on the two columns, whether organized (clustered) by one of those indexes or not, you could probably do something like this fairly efficiently:
SELECT DISTINCT A.STOCK_CODE
,to_date(B.ACCT_PER,'yyyymm') AS ACCT_PER
FROM INVENT_STATS A
CROSS JOIN INVENT_STATS B
A single index on both columns could work too, but one of the two uses would require an index skip scan.
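Purely hypothetically (the asker can't alter this third-party schema): with the primary key on (STOCK_CODE, ACCT_PER), the DISTINCT stock_code branch can use the PK's leading column, while a separate single-column index would let the DISTINCT acct_per branch avoid the skip scan:

```sql
-- Hypothetical DDL, not applicable to the asker's locked-down schema:
-- a standalone index on ACCT_PER serves DISTINCT ACCT_PER directly.
CREATE INDEX istats_acct_per_ix ON invent_stats (acct_per);
```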