Solved

Oracle SQL - duplicates within a date range

Posted on 2014-09-12
10
909 Views
Last Modified: 2014-09-19
I'm thinking this is similar to finding duplicates,  but slightly more involved.
I'm using Oracle 11g

I have three tables - Customer, Customer_Order & Order_Detail_History


The Customer Table is:
Customer_ID
Customer_Name

The Customer_order table is:
Order_id
Customer_id
start_date
end_date

The Order_Detail_History is
Order_ID
<details>

Order_Detail_History contains millions of records.Customer_Order has hundreds and Customer has a hundred.
The Order_Detail_History table is filled upon finalization of an order.
The order details can be slightl different depending on who is inputting the information (long story).

I wish to QA the Order_Detail_History and make sure that there are no duplicate orders (different order_id, same detail)

I would like to do it based on start_date, end_date if possible.

I want to see the customer_name, the order_id, start & end dates of those orders in the Order_Detail_History.

Example - :

Select distinct a.order_id, c.Customer_name, b.date_start,b.date_end
from  Order_Detail_History a
inner join Customer_Order b on a.order_id = b.order_id
inner join Customer c on c.customer_ID = b.Customer_ID

would give:


Order ID      Customer_Name   Start_Date  End_Date
   123         JOE            1/1/12       6/12/12    <----
   213         JOE            1/1/12       7/15/12    <----
   217         JOE            3/1/12       6/30/12    <----
   333         JOE            2/14/13      2/25/13
   145         SAM            1/1/14       3/14/14         

These top three would have duplicates.
Is there  a way to generate only these with a SQL query     ?
0
Comment
Question by:GNOVAK
10 Comments
 
LVL 73

Expert Comment

by:sdstuber
ID: 40319318
SELECT *
  FROM (SELECT x.*, COUNT(*) OVER (PARTITION BY customer_name) cnt
          FROM (SELECT DISTINCT a.order_id,
                                c.customer_name,
                                b.date_start,
                                b.date_end
                  FROM order_detail_history a
                       INNER JOIN customer_order b ON a.order_id = b.order_id
                       INNER JOIN customer c ON c.customer_id = b.customer_id) x)
 WHERE cnt > 1
0
 

Author Comment

by:GNOVAK
ID: 40319380
I had to add the date_start to the partition and it brings up the ones with the matching start dates.
Is there anyway to consider the dates between the start and end (overlaps)?
For example, if a customer two records are:
date start         date end
1/1/12              6/30/12
2/1/12               4/28/12

?
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 40319404
what do you want to do in that case?

extend your initial sample data with this scenario and post the expected results
0
 

Author Comment

by:GNOVAK
ID: 40319513
I want to see the original top 3 claims. The record with the 217 order number is a perfect example.
The start and end dates are not the same as any of JOE's other records, yet there would be overlap since the start date is between the first start and end date.
It's almost like I need to get all the dates between a record's start & end and compare them to records for the same customer, if that makes sense.
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 40319691
So, are you trying to say the output should be this?

Order ID      Customer_Name   Start_Date  End_Date
   213         JOE            1/1/12       7/15/12    
   333         JOE            2/14/13      2/25/13
   145         SAM            1/1/14       3/14/14       

Open in new window


since those are the rows that are either distinct or super-ranges?

or, are you looking for the inverse?

Order ID      Customer_Name   Start_Date  End_Date
   123         JOE            1/1/12       6/12/12   
   217         JOE            3/1/12       6/30/12   

Open in new window

 
since these are rows that are contained with the range of other rows?

or, are you looking for something else?
if something else - please SHOW me, you can try to describe the output, but it's MUCH more clear (to me anyway) if you show me what you want.

I might need a description in addition to the display though;for example, you say you want 3 rows of JOE, but I don't understand why the 4th JOE is excluded.
0
PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

 
LVL 73

Accepted Solution

by:
sdstuber earned 500 total points
ID: 40319710
actually, I think I might have figured out what you want without the extra info

try this....


SELECT order_id,
       customer_name,
       start_date,
       end_date
  FROM (SELECT q.*,
               LAG(end_date) OVER(PARTITION BY customer_name ORDER BY start_date) prevend,
               LEAD(start_date) OVER(PARTITION BY customer_name ORDER BY end_date) nextstart
          FROM (Select distinct a.order_id, c.Customer_name, b.date_start,b.date_end
from  Order_Detail_History a
inner join Customer_Order b on a.order_id = b.order_id
inner join Customer c on c.customer_ID = b.Customer_ID) q)
 WHERE nextstart <= end_date OR prevend >= start_date
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 40319923
My idea for the query in my previous post is that you want to see all rows where any part of a row overlaps any part of any other row for the same customer.

Is that a correct rephrasing of your description?
0
 
LVL 35

Expert Comment

by:Mark Geerlings
ID: 40320185
Ststuber's suggestion may work for you, depending on the size of your SGA and/or your temporary tablespace and on how many other users/processes are active at the same time.

Normally, I agree that the most efficient way to solve data problems in Oracle is via well-written SQL statements.  But, this may be a case where a PL\SQL procedure is a better option.  I'm concerned by this sub-query: "Select distinct a.order_id, ..." and the fact that this is an "in-line view" which means all of these rows will be loaded into temporary segments, which may or may not all be contained in memory, then they will be evaluated without the benefit of any indexes.  This approach is certainly worth trying (especially if you have a test system with a nearly-full copy of production data) to work with.

But if this approach causes memory usage problems or other performance-related problems, you may need to consider a PL\SQL procedure based approach that uses a cursor loop to get the order information, then a second query to check for over-lapping orders.  If one or more overlapping orders are found, a third query could be be used to get the related information form the other tables for the particular order involved.  This approach would likely cause a lot less contention for memory and temporary tablespace I/O.
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 40320524
To reach the expected result does not require use of the history table at all. So I'm wondering what else it is you are really searching for
| ORDER_ID | CUSTOMER_NAME | START_DATE |  NEXTSTART |   END_DATE |    PREVEND |
|----------|---------------|------------|------------|------------|------------|
|      123 |           JOE | 2012-01-01 | 2012-03-01 | 2012-06-12 |     (null) |
|      213 |           JOE | 2012-01-01 | 2013-02-14 | 2012-07-15 | 2012-06-12 |
|      217 |           JOE | 2012-03-01 | 2012-01-01 | 2012-06-30 | 2012-07-15 |
		

Open in new window

Produced by:
SELECT
        q.order_id
      , c.customer_name
      , to_char(q.start_date,'YYYY-MM-DD') start_date
      , to_char(q.nextstart,'YYYY-MM-DD')  nextstart
      , to_char(q.end_date,'YYYY-MM-DD')   end_date
      , to_char(q.prevend,'YYYY-MM-DD')    prevend
FROM (
            SELECT
                  co.order_id
                , co.customer_id
                , co.start_date
                , LEAD(co.start_date) OVER (PARTITION BY co.customer_id 
                                            ORDER BY co.end_date)           AS nextstart
                , co.end_date
                , LAG(co.end_date)    OVER (PARTITION BY co.customer_id 
                                            ORDER BY co.start_date)         AS prevend
            FROM Customer_Order co
      ) q
INNER JOIN CUSTOMER c
                    ON q.customer_id = c.customer_id
WHERE (
        q.nextstart <= q.end_date
      OR 
        q.prevend >= q.start_date
      )
ORDER BY
        c.customer_name
      , q.start_date
      , q.end_date
;

Open in new window

perhaps this?
SELECT
        count(*) OVER (PARTITION BY oq.order_id, oq.customer_id) num_of
      , oq.order_id
      , c.customer_name
      , oh.hist
      , to_char(oq.start_date,'YYYY-MM-DD') start_date
      , to_char(oq.end_date,'YYYY-MM-DD')   end_date
      , to_char(oq.nextstart,'YYYY-MM-DD')  nextstart
      , to_char(oq.prevend,'YYYY-MM-DD')    prevend
FROM (
      SELECT
              q.order_id
            , q.customer_id
            , q.start_date
            , q.nextstart
            , q.end_date
            , q.prevend
      FROM (
                  SELECT
                        co.order_id
                      , co.customer_id
                      , co.start_date
                      , LEAD(co.start_date) OVER (PARTITION BY co.customer_id 
                                                  ORDER BY co.end_date)           AS nextstart
                      , co.end_date
                      , LAG(co.end_date)    OVER (PARTITION BY co.customer_id 
                                                  ORDER BY co.start_date)         AS prevend
                  FROM Customer_Order co
            ) q
      WHERE (
              q.nextstart <= q.end_date
            OR 
              q.prevend >= q.start_date
            )
      ) oq
INNER JOIN ORDER_DETAIL_HISTORY oh
                                ON oq.order_id = oh.order_id
INNER JOIN CUSTOMER c
                    ON oq.customer_id = c.customer_id
ORDER BY
        c.customer_name
      , oq.start_date
      , oq.end_date
      , oh.hist
;

Open in new window

see: http://sqlfiddle.com/#!4/ddceb/1
0
 

Author Closing Comment

by:GNOVAK
ID: 40332483
Excellent - thanks sdstuber - once again!
0

Featured Post

3 Use Cases for Connected Systems

Our Dev teams are like yours. They’re continually cranking out code for new features/bugs fixes, testing, deploying, testing some more, responding to production monitoring events and more. It’s complex. So, we thought you’d like to see what’s working for us.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

I remember the day when someone asked me to create a user for an application developement. The user should be able to create views and materialized views and, so, I used the following syntax: (CODE) This way, I guessed, I would ensure that use…
Composite queries are used to retrieve the results from joining multiple queries after applying any filters. UNION, INTERSECT, MINUS, and UNION ALL are some of the operators used to get certain desired results.​
This video shows, step by step, how to configure Oracle Heterogeneous Services via the Generic Gateway Agent in order to make a connection from an Oracle session and access a remote SQL Server database table.
This videos aims to give the viewer a basic demonstration of how a user can query current session information by using the SYS_CONTEXT function

863 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

19 Experts available now in Live!

Get 1:1 Help Now