Solved

Oracle SQL - duplicates within a date range

Posted on 2014-09-12
10
880 Views
Last Modified: 2014-09-19
I'm thinking this is similar to finding duplicates,  but slightly more involved.
I'm using Oracle 11g

I have three tables - Customer, Customer_Order & Order_Detail_History


The Customer Table is:
Customer_ID
Customer_Name

The Customer_order table is:
Order_id
Customer_id
start_date
end_date

The Order_Detail_History is
Order_ID
<details>

Order_Detail_History contains millions of records.Customer_Order has hundreds and Customer has a hundred.
The Order_Detail_History table is filled upon finalization of an order.
The order details can be slightl different depending on who is inputting the information (long story).

I wish to QA the Order_Detail_History and make sure that there are no duplicate orders (different order_id, same detail)

I would like to do it based on start_date, end_date if possible.

I want to see the customer_name, the order_id, start & end dates of those orders in the Order_Detail_History.

Example - :

Select distinct a.order_id, c.Customer_name, b.date_start,b.date_end
from  Order_Detail_History a
inner join Customer_Order b on a.order_id = b.order_id
inner join Customer c on c.customer_ID = b.Customer_ID

would give:


Order ID      Customer_Name   Start_Date  End_Date
   123         JOE            1/1/12       6/12/12    <----
   213         JOE            1/1/12       7/15/12    <----
   217         JOE            3/1/12       6/30/12    <----
   333         JOE            2/14/13      2/25/13
   145         SAM            1/1/14       3/14/14         

These top three would have duplicates.
Is there  a way to generate only these with a SQL query     ?
0
Comment
Question by:GNOVAK
10 Comments
 
LVL 73

Expert Comment

by:sdstuber
Comment Utility
SELECT *
  FROM (SELECT x.*, COUNT(*) OVER (PARTITION BY customer_name) cnt
          FROM (SELECT DISTINCT a.order_id,
                                c.customer_name,
                                b.date_start,
                                b.date_end
                  FROM order_detail_history a
                       INNER JOIN customer_order b ON a.order_id = b.order_id
                       INNER JOIN customer c ON c.customer_id = b.customer_id) x)
 WHERE cnt > 1
0
 

Author Comment

by:GNOVAK
Comment Utility
I had to add the date_start to the partition and it brings up the ones with the matching start dates.
Is there anyway to consider the dates between the start and end (overlaps)?
For example, if a customer two records are:
date start         date end
1/1/12              6/30/12
2/1/12               4/28/12

?
0
 
LVL 73

Expert Comment

by:sdstuber
Comment Utility
what do you want to do in that case?

extend your initial sample data with this scenario and post the expected results
0
 

Author Comment

by:GNOVAK
Comment Utility
I want to see the original top 3 claims. The record with the 217 order number is a perfect example.
The start and end dates are not the same as any of JOE's other records, yet there would be overlap since the start date is between the first start and end date.
It's almost like I need to get all the dates between a record's start & end and compare them to records for the same customer, if that makes sense.
0
 
LVL 73

Expert Comment

by:sdstuber
Comment Utility
So, are you trying to say the output should be this?

Order ID      Customer_Name   Start_Date  End_Date
   213         JOE            1/1/12       7/15/12    
   333         JOE            2/14/13      2/25/13
   145         SAM            1/1/14       3/14/14       

Open in new window


since those are the rows that are either distinct or super-ranges?

or, are you looking for the inverse?

Order ID      Customer_Name   Start_Date  End_Date
   123         JOE            1/1/12       6/12/12   
   217         JOE            3/1/12       6/30/12   

Open in new window

 
since these are rows that are contained with the range of other rows?

or, are you looking for something else?
if something else - please SHOW me, you can try to describe the output, but it's MUCH more clear (to me anyway) if you show me what you want.

I might need a description in addition to the display though;for example, you say you want 3 rows of JOE, but I don't understand why the 4th JOE is excluded.
0
PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

 
LVL 73

Accepted Solution

by:
sdstuber earned 500 total points
Comment Utility
actually, I think I might have figured out what you want without the extra info

try this....


SELECT order_id,
       customer_name,
       start_date,
       end_date
  FROM (SELECT q.*,
               LAG(end_date) OVER(PARTITION BY customer_name ORDER BY start_date) prevend,
               LEAD(start_date) OVER(PARTITION BY customer_name ORDER BY end_date) nextstart
          FROM (Select distinct a.order_id, c.Customer_name, b.date_start,b.date_end
from  Order_Detail_History a
inner join Customer_Order b on a.order_id = b.order_id
inner join Customer c on c.customer_ID = b.Customer_ID) q)
 WHERE nextstart <= end_date OR prevend >= start_date
0
 
LVL 73

Expert Comment

by:sdstuber
Comment Utility
My idea for the query in my previous post is that you want to see all rows where any part of a row overlaps any part of any other row for the same customer.

Is that a correct rephrasing of your description?
0
 
LVL 34

Expert Comment

by:Mark Geerlings
Comment Utility
Ststuber's suggestion may work for you, depending on the size of your SGA and/or your temporary tablespace and on how many other users/processes are active at the same time.

Normally, I agree that the most efficient way to solve data problems in Oracle is via well-written SQL statements.  But, this may be a case where a PL\SQL procedure is a better option.  I'm concerned by this sub-query: "Select distinct a.order_id, ..." and the fact that this is an "in-line view" which means all of these rows will be loaded into temporary segments, which may or may not all be contained in memory, then they will be evaluated without the benefit of any indexes.  This approach is certainly worth trying (especially if you have a test system with a nearly-full copy of production data) to work with.

But if this approach causes memory usage problems or other performance-related problems, you may need to consider a PL\SQL procedure based approach that uses a cursor loop to get the order information, then a second query to check for over-lapping orders.  If one or more overlapping orders are found, a third query could be be used to get the related information form the other tables for the particular order involved.  This approach would likely cause a lot less contention for memory and temporary tablespace I/O.
0
 
LVL 48

Expert Comment

by:PortletPaul
Comment Utility
To reach the expected result does not require use of the history table at all. So I'm wondering what else it is you are really searching for
| ORDER_ID | CUSTOMER_NAME | START_DATE |  NEXTSTART |   END_DATE |    PREVEND |
|----------|---------------|------------|------------|------------|------------|
|      123 |           JOE | 2012-01-01 | 2012-03-01 | 2012-06-12 |     (null) |
|      213 |           JOE | 2012-01-01 | 2013-02-14 | 2012-07-15 | 2012-06-12 |
|      217 |           JOE | 2012-03-01 | 2012-01-01 | 2012-06-30 | 2012-07-15 |
		

Open in new window

Produced by:
SELECT
        q.order_id
      , c.customer_name
      , to_char(q.start_date,'YYYY-MM-DD') start_date
      , to_char(q.nextstart,'YYYY-MM-DD')  nextstart
      , to_char(q.end_date,'YYYY-MM-DD')   end_date
      , to_char(q.prevend,'YYYY-MM-DD')    prevend
FROM (
            SELECT
                  co.order_id
                , co.customer_id
                , co.start_date
                , LEAD(co.start_date) OVER (PARTITION BY co.customer_id 
                                            ORDER BY co.end_date)           AS nextstart
                , co.end_date
                , LAG(co.end_date)    OVER (PARTITION BY co.customer_id 
                                            ORDER BY co.start_date)         AS prevend
            FROM Customer_Order co
      ) q
INNER JOIN CUSTOMER c
                    ON q.customer_id = c.customer_id
WHERE (
        q.nextstart <= q.end_date
      OR 
        q.prevend >= q.start_date
      )
ORDER BY
        c.customer_name
      , q.start_date
      , q.end_date
;

Open in new window

perhaps this?
SELECT
        count(*) OVER (PARTITION BY oq.order_id, oq.customer_id) num_of
      , oq.order_id
      , c.customer_name
      , oh.hist
      , to_char(oq.start_date,'YYYY-MM-DD') start_date
      , to_char(oq.end_date,'YYYY-MM-DD')   end_date
      , to_char(oq.nextstart,'YYYY-MM-DD')  nextstart
      , to_char(oq.prevend,'YYYY-MM-DD')    prevend
FROM (
      SELECT
              q.order_id
            , q.customer_id
            , q.start_date
            , q.nextstart
            , q.end_date
            , q.prevend
      FROM (
                  SELECT
                        co.order_id
                      , co.customer_id
                      , co.start_date
                      , LEAD(co.start_date) OVER (PARTITION BY co.customer_id 
                                                  ORDER BY co.end_date)           AS nextstart
                      , co.end_date
                      , LAG(co.end_date)    OVER (PARTITION BY co.customer_id 
                                                  ORDER BY co.start_date)         AS prevend
                  FROM Customer_Order co
            ) q
      WHERE (
              q.nextstart <= q.end_date
            OR 
              q.prevend >= q.start_date
            )
      ) oq
INNER JOIN ORDER_DETAIL_HISTORY oh
                                ON oq.order_id = oh.order_id
INNER JOIN CUSTOMER c
                    ON oq.customer_id = c.customer_id
ORDER BY
        c.customer_name
      , oq.start_date
      , oq.end_date
      , oh.hist
;

Open in new window

see: http://sqlfiddle.com/#!4/ddceb/1
0
 

Author Closing Comment

by:GNOVAK
Comment Utility
Excellent - thanks sdstuber - once again!
0

Featured Post

6 Surprising Benefits of Threat Intelligence

All sorts of threat intelligence is available on the web. Intelligence you can learn from, and use to anticipate and prepare for future attacks.

Join & Write a Comment

Suggested Solutions

Cursors in Oracle: A cursor is used to process individual rows returned by database system for a query. In oracle every SQL statement executed by the oracle server has a private area. This area contains information about the SQL statement and the…
How to Unravel a Tricky Query Introduction If you browse through the Oracle zones or any of the other database-related zones you'll come across some complicated solutions and sometimes you'll just have to wonder how anyone came up with them.  …
This video shows syntax for various backup options while discussing how the different basic backup types work.  It explains how to take full backups, incremental level 0 backups, incremental level 1 backups in both differential and cumulative mode a…
This videos aims to give the viewer a basic demonstration of how a user can query current session information by using the SYS_CONTEXT function

763 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now