Getting missing rows of data without burying the servers and a decent completion time

We are trying to get "missing rows" from our fact table and, in the end, join them to our dimension table.

Trying the approach here: https://www.experts-exchange.com/questions/27968510/Oracle-11g-How-to-insert-missing-rows-in-to-a-sequence-based-on-a-date-and-a-group.html
proved way too resource intensive and for a single row takes over an hour to return.

The fact table has 12 Million records and the dimension table has 1200 or so.  The issue we are facing is that there are not enough dimensions in the dimension table to order by it alone and use a LEAD/LAG approach to fill in the missing data with an outer join.

The fact table has orders in it, by date and interval.  One date has to have 24 intervals (hourly) for the dimension table to lookup.  So 21 = X, 20 = Y, etc.

Without having the fact table populated from a vendor API with all 24 periods of time (intervals), we cannot get a full set of data which is causing issues downstream.

We always need every interval for every day, although, some intervals come in at 96 (fifteen minute periods of time).  This is where we need to get the data back with a full 24 or 96 records, join to the dimension table and get the accurate flag associated with the record.

Same issue as described above in the link, just a small twist on it.

Any suggestions would be greatly appreciated.

Thanks
LVL 1
cyimxtckCEOAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

SujithData ArchitectCommented:
@cyimxtck

Can you elaborate on this?
>> We always need every interval for every day, although, some intervals come in at 96 (fifteen minute periods of time).  This is where we need to get the data back with a full 24 or 96 records, join to the dimension table and get the accurate flag associated with the record.

Do you have dimension entries for 24/96 intervals and you tag the IDs to the fact table during the fact load?
What intervals do you expect your output to be? 24 or 96?
0
cyimxtckCEOAuthor Commented:
The dimension table has all entries to join to 100% of the time by location.  So we can get to the location and the branch per location, but the date becomes critical to order the data.  It is null in the outer join scenario because the interval is missing so there isn't a way to ORDER by.

Some locations have 24, some have 96 and some have both.  Each location has a branch associated with it.  (i.e. California has both but Pennsylvania doesn't)  So the output is purely dependent upon the location/branch combination.

Does that answer your question?

The branch/location combination never changes; if you have 24 for this combination, it is 24 forever
0
SujithData ArchitectCommented:
Can you post some sample records from Fact/Dimensions and the expected output?

Do you have a date dimension?
0
Big Business Goals? Which KPIs Will Help You

The most successful MSPs rely on metrics – known as key performance indicators (KPIs) – for making informed decisions that help their businesses thrive, rather than just survive. This eBook provides an overview of the most important KPIs used by top MSPs.

cyimxtckCEOAuthor Commented:
We don't have a date in the dimension table, only the fact table.

The dimension table looks like this:
SYSTEM | Location | Branch | Flag  | Interval
CA              M5             27            X            1
CA              M5             27            Y            2
CA              M5             27            Z            3
CA              M5             27            A            4
CA              M5             27            B            5
...
CA              M5             27            ab            96
CA              SMD          25            aD            1
CA              SMD          25            aF            2
CA              SMD          25            De            3
...
CA              SMD          25            Xq            24
PA              PJ               1251        Z              1
...
NJ               CI                1008       A              2
...

And the Fact table has these related
SYSTEM | Location | Branch | Interval | Inception_Date
CA              M5             27               1              22APR2018
CA              M5             27               3              22APR2018
CA              M5             27               4              22APR2018
CA              M5             27               5              22APR2018

In the example above, we need interval 2, which doesn't exist for that date.  Therein lies our issue.

Many location without the correct number of intervals, but we need to fill those in.  It could be missing 20 or 1 or 5 or none at all.

Please let me know if that is sufficient
0
cyimxtckCEOAuthor Commented:
Sorry, didn't post what we need output:

Existing output (missing row)
SYSTEM | Location | Branch | Interval | Inception_Date | Flag
CA              M5             27               1              22APR2018         X
CA              M5             27               3              22APR2018         Z
CA              M5             27               4              22APR2018         A
CA              M5             27               5              22APR2018         B

desired output:
SYSTEM | Location | Branch | Interval | Inception_Date | Flag
CA              M5             27               1              22APR2018         X
CA              M5             27               2              22APR2018         Y
CA              M5             27               3              22APR2018         Z
CA              M5             27               4              22APR2018         A
CA              M5             27               5              22APR2018         B
0
SujithData ArchitectCommented:
Do you need any measures from the fact table in the output? or only the listed columns?
0
cyimxtckCEOAuthor Commented:
We do need a couple more columns from the fact table I believe; I omitted those for brevity.
0
SujithData ArchitectCommented:
Are you looking for something like this?

SQL> select * from dim;

SYSTEM               LOCATION             BRANCH               FLAG                   INTERVAL
-------------------- -------------------- -------------------- -------------------- ----------
CA                   M5                   27                   X                             1
CA                   M5                   27                   Y                             2
CA                   M5                   27                   Z                             3
CA                   M5                   27                   A                             4
CA                   M5                   27                   B                             5

SQL>
SQL> select * from fact;

SYSTEM               LOCATION             BRANCH                 INTERVAL INCEPTION
-------------------- -------------------- -------------------- ---------- ---------
CA                   M5                   27                            1 22-APR-18
CA                   M5                   27                            3 22-APR-18
CA                   M5                   27                            4 22-APR-18
CA                   M5                   27                            5 22-APR-18

SQL>
SQL> with data as (
  2  select  distinct d.system, d.location, d.branch, d.interval, f.inception_date
  3  from    dim d inner join fact f on f.system = d.system and f.location = d.location and f.branch = d.branch
  4  )
  5  select
  6  d.system, d.location, d.branch, d.interval, d.inception_date
  7  from    data d left outer join fact f
  8  on f.system = d.system and f.location = d.location and f.branch = d.branch
  9  and f.interval = d.interval and f.inception_date = d.inception_date
 10  ;

SYSTEM               LOCATION             BRANCH                 INTERVAL INCEPTION
-------------------- -------------------- -------------------- ---------- ---------
CA                   M5                   27                            1 22-APR-18
CA                   M5                   27                            3 22-APR-18
CA                   M5                   27                            4 22-APR-18
CA                   M5                   27                            5 22-APR-18
CA                   M5                   27                            2 22-APR-18

SQL>

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
cyimxtckCEOAuthor Commented:
That is exactly what I am looking for.

Let me run this against our test environment and see what time it takes.

Thanks for the help
0
cyimxtckCEOAuthor Commented:
AWESOME!  That worked super well, thanks a ton!
0
SujithData ArchitectCommented:
Glad to know!
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
PL/SQL

From novice to tech pro — start learning today.