Solved

Oracle SQL - duplicates within a date range

Posted on 2014-09-12
10
966 Views
Last Modified: 2014-09-19
I'm thinking this is similar to finding duplicates,  but slightly more involved.
I'm using Oracle 11g

I have three tables - Customer, Customer_Order & Order_Detail_History


The Customer Table is:
Customer_ID
Customer_Name

The Customer_order table is:
Order_id
Customer_id
start_date
end_date

The Order_Detail_History is
Order_ID
<details>

Order_Detail_History contains millions of records.Customer_Order has hundreds and Customer has a hundred.
The Order_Detail_History table is filled upon finalization of an order.
The order details can be slightl different depending on who is inputting the information (long story).

I wish to QA the Order_Detail_History and make sure that there are no duplicate orders (different order_id, same detail)

I would like to do it based on start_date, end_date if possible.

I want to see the customer_name, the order_id, start & end dates of those orders in the Order_Detail_History.

Example - :

Select distinct a.order_id, c.Customer_name, b.date_start,b.date_end
from  Order_Detail_History a
inner join Customer_Order b on a.order_id = b.order_id
inner join Customer c on c.customer_ID = b.Customer_ID

would give:


Order ID      Customer_Name   Start_Date  End_Date
   123         JOE            1/1/12       6/12/12    <----
   213         JOE            1/1/12       7/15/12    <----
   217         JOE            3/1/12       6/30/12    <----
   333         JOE            2/14/13      2/25/13
   145         SAM            1/1/14       3/14/14         

These top three would have duplicates.
Is there  a way to generate only these with a SQL query     ?
0
Comment
Question by:GNOVAK
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
10 Comments
 
LVL 74

Expert Comment

by:sdstuber
ID: 40319318
SELECT *
  FROM (SELECT x.*, COUNT(*) OVER (PARTITION BY customer_name) cnt
          FROM (SELECT DISTINCT a.order_id,
                                c.customer_name,
                                b.date_start,
                                b.date_end
                  FROM order_detail_history a
                       INNER JOIN customer_order b ON a.order_id = b.order_id
                       INNER JOIN customer c ON c.customer_id = b.customer_id) x)
 WHERE cnt > 1
0
 

Author Comment

by:GNOVAK
ID: 40319380
I had to add the date_start to the partition and it brings up the ones with the matching start dates.
Is there anyway to consider the dates between the start and end (overlaps)?
For example, if a customer two records are:
date start         date end
1/1/12              6/30/12
2/1/12               4/28/12

?
0
 
LVL 74

Expert Comment

by:sdstuber
ID: 40319404
what do you want to do in that case?

extend your initial sample data with this scenario and post the expected results
0
Creating Instructional Tutorials  

For Any Use & On Any Platform

Contextual Guidance at the moment of need helps your employees/users adopt software o& achieve even the most complex tasks instantly. Boost knowledge retention, software adoption & employee engagement with easy solution.

 

Author Comment

by:GNOVAK
ID: 40319513
I want to see the original top 3 claims. The record with the 217 order number is a perfect example.
The start and end dates are not the same as any of JOE's other records, yet there would be overlap since the start date is between the first start and end date.
It's almost like I need to get all the dates between a record's start & end and compare them to records for the same customer, if that makes sense.
0
 
LVL 74

Expert Comment

by:sdstuber
ID: 40319691
So, are you trying to say the output should be this?

Order ID      Customer_Name   Start_Date  End_Date
   213         JOE            1/1/12       7/15/12    
   333         JOE            2/14/13      2/25/13
   145         SAM            1/1/14       3/14/14       

Open in new window


since those are the rows that are either distinct or super-ranges?

or, are you looking for the inverse?

Order ID      Customer_Name   Start_Date  End_Date
   123         JOE            1/1/12       6/12/12   
   217         JOE            3/1/12       6/30/12   

Open in new window

 
since these are rows that are contained with the range of other rows?

or, are you looking for something else?
if something else - please SHOW me, you can try to describe the output, but it's MUCH more clear (to me anyway) if you show me what you want.

I might need a description in addition to the display though;for example, you say you want 3 rows of JOE, but I don't understand why the 4th JOE is excluded.
0
 
LVL 74

Accepted Solution

by:
sdstuber earned 500 total points
ID: 40319710
actually, I think I might have figured out what you want without the extra info

try this....


SELECT order_id,
       customer_name,
       start_date,
       end_date
  FROM (SELECT q.*,
               LAG(end_date) OVER(PARTITION BY customer_name ORDER BY start_date) prevend,
               LEAD(start_date) OVER(PARTITION BY customer_name ORDER BY end_date) nextstart
          FROM (Select distinct a.order_id, c.Customer_name, b.date_start,b.date_end
from  Order_Detail_History a
inner join Customer_Order b on a.order_id = b.order_id
inner join Customer c on c.customer_ID = b.Customer_ID) q)
 WHERE nextstart <= end_date OR prevend >= start_date
0
 
LVL 74

Expert Comment

by:sdstuber
ID: 40319923
My idea for the query in my previous post is that you want to see all rows where any part of a row overlaps any part of any other row for the same customer.

Is that a correct rephrasing of your description?
0
 
LVL 35

Expert Comment

by:Mark Geerlings
ID: 40320185
Ststuber's suggestion may work for you, depending on the size of your SGA and/or your temporary tablespace and on how many other users/processes are active at the same time.

Normally, I agree that the most efficient way to solve data problems in Oracle is via well-written SQL statements.  But, this may be a case where a PL\SQL procedure is a better option.  I'm concerned by this sub-query: "Select distinct a.order_id, ..." and the fact that this is an "in-line view" which means all of these rows will be loaded into temporary segments, which may or may not all be contained in memory, then they will be evaluated without the benefit of any indexes.  This approach is certainly worth trying (especially if you have a test system with a nearly-full copy of production data) to work with.

But if this approach causes memory usage problems or other performance-related problems, you may need to consider a PL\SQL procedure based approach that uses a cursor loop to get the order information, then a second query to check for over-lapping orders.  If one or more overlapping orders are found, a third query could be be used to get the related information form the other tables for the particular order involved.  This approach would likely cause a lot less contention for memory and temporary tablespace I/O.
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 40320524
To reach the expected result does not require use of the history table at all. So I'm wondering what else it is you are really searching for
| ORDER_ID | CUSTOMER_NAME | START_DATE |  NEXTSTART |   END_DATE |    PREVEND |
|----------|---------------|------------|------------|------------|------------|
|      123 |           JOE | 2012-01-01 | 2012-03-01 | 2012-06-12 |     (null) |
|      213 |           JOE | 2012-01-01 | 2013-02-14 | 2012-07-15 | 2012-06-12 |
|      217 |           JOE | 2012-03-01 | 2012-01-01 | 2012-06-30 | 2012-07-15 |
		

Open in new window

Produced by:
SELECT
        q.order_id
      , c.customer_name
      , to_char(q.start_date,'YYYY-MM-DD') start_date
      , to_char(q.nextstart,'YYYY-MM-DD')  nextstart
      , to_char(q.end_date,'YYYY-MM-DD')   end_date
      , to_char(q.prevend,'YYYY-MM-DD')    prevend
FROM (
            SELECT
                  co.order_id
                , co.customer_id
                , co.start_date
                , LEAD(co.start_date) OVER (PARTITION BY co.customer_id 
                                            ORDER BY co.end_date)           AS nextstart
                , co.end_date
                , LAG(co.end_date)    OVER (PARTITION BY co.customer_id 
                                            ORDER BY co.start_date)         AS prevend
            FROM Customer_Order co
      ) q
INNER JOIN CUSTOMER c
                    ON q.customer_id = c.customer_id
WHERE (
        q.nextstart <= q.end_date
      OR 
        q.prevend >= q.start_date
      )
ORDER BY
        c.customer_name
      , q.start_date
      , q.end_date
;

Open in new window

perhaps this?
SELECT
        count(*) OVER (PARTITION BY oq.order_id, oq.customer_id) num_of
      , oq.order_id
      , c.customer_name
      , oh.hist
      , to_char(oq.start_date,'YYYY-MM-DD') start_date
      , to_char(oq.end_date,'YYYY-MM-DD')   end_date
      , to_char(oq.nextstart,'YYYY-MM-DD')  nextstart
      , to_char(oq.prevend,'YYYY-MM-DD')    prevend
FROM (
      SELECT
              q.order_id
            , q.customer_id
            , q.start_date
            , q.nextstart
            , q.end_date
            , q.prevend
      FROM (
                  SELECT
                        co.order_id
                      , co.customer_id
                      , co.start_date
                      , LEAD(co.start_date) OVER (PARTITION BY co.customer_id 
                                                  ORDER BY co.end_date)           AS nextstart
                      , co.end_date
                      , LAG(co.end_date)    OVER (PARTITION BY co.customer_id 
                                                  ORDER BY co.start_date)         AS prevend
                  FROM Customer_Order co
            ) q
      WHERE (
              q.nextstart <= q.end_date
            OR 
              q.prevend >= q.start_date
            )
      ) oq
INNER JOIN ORDER_DETAIL_HISTORY oh
                                ON oq.order_id = oh.order_id
INNER JOIN CUSTOMER c
                    ON oq.customer_id = c.customer_id
ORDER BY
        c.customer_name
      , oq.start_date
      , oq.end_date
      , oh.hist
;

Open in new window

see: http://sqlfiddle.com/#!4/ddceb/1
0
 

Author Closing Comment

by:GNOVAK
ID: 40332483
Excellent - thanks sdstuber - once again!
0

Featured Post

Forrester Webinar: xMatters Delivers 261% ROI

Guest speaker Dean Davison, Forrester Principal Consultant, explains how a Fortune 500 communication company using xMatters found these results: Achieved a 261% ROI, Experienced $753,280 in net present value benefits over 3 years and Reduced MTTR by 91% for tier 1 incidents.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
oracle forms question 22 48
populate value based on what is selected in lov 2 38
Sql case statement to calculate totals 5 37
Oracle programming for starter 14 37
I remember the day when someone asked me to create a user for an application developement. The user should be able to create views and materialized views and, so, I used the following syntax: (CODE) This way, I guessed, I would ensure that use…
'Between' is such a common word we rarely think about it but in SQL it has a very specific definition we should be aware of. While most database vendors will have their own unique phrases to describe it (see references at end) the concept in common …
Video by: Steve
Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…
This video explains what a user managed backup is and shows how to take one, providing a couple of simple example scripts.

730 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question