Getting missing rows of data without burying the servers and a decent completion time

We are trying to get "missing rows" from our fact table and, in the end, join them to our dimension table.

Trying the approach here: https://www.experts-exchange.com/questions/27968510/Oracle-11g-How-to-insert-missing-rows-in-to-a-sequence-based-on-a-date-and-a-group.html
proved way too resource intensive and for a single row takes over an hour to return.

The fact table has 12 Million records and the dimension table has 1200 or so.  The issue we are facing is that there are not enough dimensions in the dimension table to order by it alone and use a LEAD/LAG approach to fill in the missing data with an outer join.

The fact table has orders in it, by date and interval.  One date has to have 24 intervals (hourly) for the dimension table to lookup.  So 21 = X, 20 = Y, etc.

Without having the fact table populated from a vendor API with all 24 periods of time (intervals), we cannot get a full set of data which is causing issues downstream.

We always need every interval for every day, although, some intervals come in at 96 (fifteen minute periods of time).  This is where we need to get the data back with a full 24 or 96 records, join to the dimension table and get the accurate flag associated with the record.

Same issue as described above in the link, just a small twist on it.

Any suggestions would be greatly appreciated.

Thanks
LVL 1
cyimxtckAsked:
Who is Participating?
 
SujithData ArchitectCommented:
Are you looking for something like this?

SQL> select * from dim;

SYSTEM               LOCATION             BRANCH               FLAG                   INTERVAL
-------------------- -------------------- -------------------- -------------------- ----------
CA                   M5                   27                   X                             1
CA                   M5                   27                   Y                             2
CA                   M5                   27                   Z                             3
CA                   M5                   27                   A                             4
CA                   M5                   27                   B                             5

SQL>
SQL> select * from fact;

SYSTEM               LOCATION             BRANCH                 INTERVAL INCEPTION
-------------------- -------------------- -------------------- ---------- ---------
CA                   M5                   27                            1 22-APR-18
CA                   M5                   27                            3 22-APR-18
CA                   M5                   27                            4 22-APR-18
CA                   M5                   27                            5 22-APR-18

SQL>
SQL> with data as (
  2  select  distinct d.system, d.location, d.branch, d.interval, f.inception_date
  3  from    dim d inner join fact f on f.system = d.system and f.location = d.location and f.branch = d.branch
  4  )
  5  select
  6  d.system, d.location, d.branch, d.interval, d.inception_date
  7  from    data d left outer join fact f
  8  on f.system = d.system and f.location = d.location and f.branch = d.branch
  9  and f.interval = d.interval and f.inception_date = d.inception_date
 10  ;

SYSTEM               LOCATION             BRANCH                 INTERVAL INCEPTION
-------------------- -------------------- -------------------- ---------- ---------
CA                   M5                   27                            1 22-APR-18
CA                   M5                   27                            3 22-APR-18
CA                   M5                   27                            4 22-APR-18
CA                   M5                   27                            5 22-APR-18
CA                   M5                   27                            2 22-APR-18

SQL>

Open in new window

0
 
SujithData ArchitectCommented:
@cyimxtck

Can you elaborate on this?
>> We always need every interval for every day, although, some intervals come in at 96 (fifteen minute periods of time).  This is where we need to get the data back with a full 24 or 96 records, join to the dimension table and get the accurate flag associated with the record.

Do you have dimension entries for 24/96 intervals and you tag the IDs to the fact table during the fact load?
What intervals do you expect your output to be? 24 or 96?
0
 
cyimxtckAuthor Commented:
The dimension table has all entries to join to 100% of the time by location.  So we can get to the location and the branch per location, but the date becomes critical to order the data.  It is null in the outer join scenario because the interval is missing so there isn't a way to ORDER by.

Some locations have 24, some have 96 and some have both.  Each location has a branch associated with it.  (i.e. California has both but Pennsylvania doesn't)  So the output is purely dependent upon the location/branch combination.

Does that answer your question?

The branch/location combination never changes; if you have 24 for this combination, it is 24 forever
0
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

 
SujithData ArchitectCommented:
Can you post some sample records from Fact/Dimensions and the expected output?

Do you have a date dimension?
0
 
cyimxtckAuthor Commented:
We don't have a date in the dimension table, only the fact table.

The dimension table looks like this:
SYSTEM | Location | Branch | Flag  | Interval
CA              M5             27            X            1
CA              M5             27            Y            2
CA              M5             27            Z            3
CA              M5             27            A            4
CA              M5             27            B            5
...
CA              M5             27            ab            96
CA              SMD          25            aD            1
CA              SMD          25            aF            2
CA              SMD          25            De            3
...
CA              SMD          25            Xq            24
PA              PJ               1251        Z              1
...
NJ               CI                1008       A              2
...

And the Fact table has these related
SYSTEM | Location | Branch | Interval | Inception_Date
CA              M5             27               1              22APR2018
CA              M5             27               3              22APR2018
CA              M5             27               4              22APR2018
CA              M5             27               5              22APR2018

In the example above, we need interval 2, which doesn't exist for that date.  Therein lies our issue.

Many location without the correct number of intervals, but we need to fill those in.  It could be missing 20 or 1 or 5 or none at all.

Please let me know if that is sufficient
0
 
cyimxtckAuthor Commented:
Sorry, didn't post what we need output:

Existing output (missing row)
SYSTEM | Location | Branch | Interval | Inception_Date | Flag
CA              M5             27               1              22APR2018         X
CA              M5             27               3              22APR2018         Z
CA              M5             27               4              22APR2018         A
CA              M5             27               5              22APR2018         B

desired output:
SYSTEM | Location | Branch | Interval | Inception_Date | Flag
CA              M5             27               1              22APR2018         X
CA              M5             27               2              22APR2018         Y
CA              M5             27               3              22APR2018         Z
CA              M5             27               4              22APR2018         A
CA              M5             27               5              22APR2018         B
0
 
SujithData ArchitectCommented:
Do you need any measures from the fact table in the output? or only the listed columns?
0
 
cyimxtckAuthor Commented:
We do need a couple more columns from the fact table I believe; I omitted those for brevity.
0
 
cyimxtckAuthor Commented:
That is exactly what I am looking for.

Let me run this against our test environment and see what time it takes.

Thanks for the help
0
 
cyimxtckAuthor Commented:
AWESOME!  That worked super well, thanks a ton!
0
 
SujithData ArchitectCommented:
Glad to know!
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.