How do I select only 1 record within a group of duplicate customer numbers?

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

sarabande

you may try

SELECT DISTINCT customer, factory, entry_date, category 
FROM   Cust 
WHERE factory > 0 AND NOT (category = 'Y')
;

that gives 44 hits for your selection.

of course you may exchange the condition 'factory > 0' by any other valid criteria.

note, in the csv file there are only a few duplicates regarding the customer id but there are at least two occurrences where there are multiple records which are identical for all fields including a category value of 'N'. you should try to avoid duplicate records since it violates entity integrity (and makes always troubles when doing queries or evaluations).

Sara

I don't see how this:

SELECT DISTINCT customer, factory, entry_date, category 
FROM   Cust 
WHERE factory > 0 AND NOT (category = 'Y')
;

satisifes this criteria:

customer number that never had a Y

For example, that query would return customer number 8042597, which has records with a category of Y. There are a total of 9 records returned by that query where the customer number has at least one record with a category of Y.

SQL> select * from mytab where customer = 8042597;
  CUSTOMER    FACTORY ENTRY_DATE          EXIT_DATE            EXIT_CODE C
---------- ---------- ------------------- ------------------- ---------- -
   8042597     156311 07/17/2017 00:00:00 07/17/2017 00:00:00         12 N
   8042597     156311 07/17/2017 00:00:00 07/17/2017 00:00:00         12 N
   8042597     156311 07/17/2017 00:00:00 07/17/2017 00:00:00         12 Y

sarabande

select * from mytab where customer = 8042597;

johnsone, your select statement neither contains the 'distinct' keyword nor does it exclude entries with category='Y'. did you mix-up some comments?

Sara

No. I was showing a case where your query did not return the results where:

customer number that never had a Y

That customer clearly has two Y records and are returned by your query. There are 8 others where that is true as well.

sarabande

but you used not the select statement i posted but one where you neither make a 'select distinct' nor where you exclude records with category = 'Y'.

customer	factory	entry_date	category
202006	152711	31.08.2017	N
1902389	151511	30.07.2017	N
3902043	152711	31.08.2017	N
8021861	152711	31.08.2017	N
8021897	156211	31.08.2017	N
8021912	152711	31.08.2017	N
8021963	152711	10.07.2017	N
8030101	152711	31.08.2017	N
8030272	152711	31.08.2017	N
8031643	151611	31.08.2017	N
8032629	156311	23.08.2017	N
8032714	152711	31.08.2017	N
8034752	152711	31.08.2017	N
8035076	152711	31.08.2017	N
8035170	152711	31.08.2017	N
8035881	152711	31.08.2017	N
8035960	152711	31.08.2017	N
8036364	156111	31.08.2017	N
8036697	152711	31.08.2017	N
8037755	156111	31.08.2017	N
8037838	156111	31.08.2017	N
8037925	156111	31.08.2017	N
8039679	901	31.08.2017	N
8039785	156211	31.08.2017	N
8040124	156111	31.08.2017	N
8040136	152711	31.08.2017	N
8040792	152711	31.08.2017	N
8041775	152711	31.08.2017	N
8041992	100	31.08.2017	N
8042248	152711	31.08.2017	N
8042473	152711	31.08.2017	N
8042597	156311	17.07.2017	N
8042671	152711	31.08.2017	N
8043133	101	31.08.2017	N
8044347	156211	31.08.2017	N
8044361	156311	31.08.2017	N
8044463	156211	31.08.2017	N
8046857	156211	31.08.2017	N
8047143	156211	31.08.2017	N
8047317	151611	31.08.2017	N
8048137	156211	31.08.2017	N
8048144	156311	31.08.2017	N
8048180	156211	31.08.2017	N
8049165	156111	18.07.2017	N

my query returned 44 records and none of them contains a duplicate customer (id) nor has a category 'Y'.

Sara

>>my query returned 44 records and none of them contains a duplicate customer (id) nor has a category 'Y'.

Yes but it doesn't meet the requirements. 8032629 had a 'Y' so it should not be included.

awking00

This query returns 34 rows and appears to meet all of the criteria -
select customer, factory, entry_date, exit_date, exit_code, category
from yourtbl where category = 'N'
minus
select customer, factory, entry_date, exit_date, exit_code, decode(category,'Y','N')
from yourtbl where category = 'Y'
;

ASKER

I ran all queries against my original file and I still get the results with records that should not be there. From my sample file I should have a result of 35 unique records.

awking00

Can you post the 35 records (just the customer number) that you should have?

Using the SQL in #a42280417 and your 99 sample rows, I get 36 rows returned. Which one doesn't belong?

[edit] I accidently deleted a row in your 100 records. I'll find and retest.

ASKER

I should have thought of that, sorry.
WD35.csv

Retested johnsone's SQL with all 100 sample rows and I get 35 rows returned.

>>I should have thought of that, sorry.

Those are the exact same customers I get with johnsone's SQL from #a42280417

The query in my original post returns the exact 35 records in your expected result set.

Can you please post the differences that you see?

While the method with the MINUS should work as well, it would pass the data twice.

ASKER

Thank you all for your input. I had an issue with johnsone's query as a result of something I did.

ASKER

I thank everyone that participated.

awking00

I also missed inserting a record. Now that I have 100 records in my test table, I also get 35 records. While I agree that the query requires two passes of the table, it has the added benefit of retrieving all of the attributes and not just the customer numbers.

select customer, factory, entry_date, exit_date, exit_code, category
from yourtbl where category = 'N'
minus
select customer, factory, entry_date, exit_date, exit_code, decode(category,'Y','N')
from yourtbl where category = 'Y';

  CUSTOMER    FACTORY ENTRY_DATE EXIT_DATE   EXIT_CODE C
---------- ---------- ---------- ---------- ---------- -
    202006     152711 8/31/2017  8/30/2017             N
   1902389     151511 7/30/2017  7/31/2017          25 N
   3902043     152711 8/31/2017  8/30/2017             N
   8021861     152711 8/31/2017  8/30/2017             N
   8021897     156211 8/31/2017  7/3/2017            3 N
   8021912     152711 8/31/2017  8/30/2017             N
   8021963     152711 7/10/2017  7/11/2017           2 N
   8030101     152711 8/31/2017  8/30/2017             N
   8030272     152711 8/31/2017  8/30/2017             N
   8031643     151611 8/31/2017  7/17/2017           3 N
   8032714     152711 8/31/2017  8/30/2017             N
   8034752     152711 8/31/2017  8/30/2017             N
   8035076     152711 8/31/2017  8/30/2017             N
   8035881     152711 8/31/2017  8/30/2017             N
   8035960     152711 8/31/2017  8/30/2017             N
   8036697     152711 8/31/2017  8/30/2017             N
   8037755     156111 8/31/2017  8/18/2017          11 N
   8037838     156111 8/31/2017  8/23/2017           2 N
   8037925     156111 8/31/2017  8/14/2017           3 N
   8039679        901 8/31/2017  8/24/2017          11 N
   8039785     156211 8/31/2017  8/16/2017           2 N
   8040124     156111 8/31/2017  8/8/2017            2 N
   8040136     152711 8/31/2017  8/30/2017             N
   8040792     152711 8/31/2017  8/30/2017             N
   8041775     152711 8/31/2017  8/30/2017             N
   8041992        100 8/31/2017  7/24/2017          11 N
   8042248     152711 8/31/2017  8/30/2017             N
   8042473     152711 8/31/2017  8/30/2017             N
   8042671     152711 8/31/2017  8/30/2017             N
   8043133        101 8/31/2017  7/27/2017           1 N
   8046857     156211 8/31/2017  8/10/2017           2 N
   8047143     156211 8/31/2017  8/31/2017           1 N
   8048144     156311 8/31/2017  8/18/2017           1 N
   8048180     156211 8/31/2017  8/3/2017            2 N
   8049165     156111 7/18/2017  7/18/2017           3 N

35 rows selected.

Pass data once, select all columns:

SELECT customer, 
       factory, 
       entry_date, 
       exit_date, 
       exit_code, 
       category 
FROM   (SELECT customer, 
               factory, 
               entry_date, 
               exit_date, 
               exit_code, 
               category, 
               SUM(CASE 
                     WHEN category = 'Y' THEN 1 
                     ELSE 0 
                   END) 
                 over ( 
                   PARTITION BY customer 
                   ORDER BY customer) y_cnt 
        FROM   mytab) 
WHERE  y_cnt = 0;

Just change it to a windowed aggregate.