Don Scarpetti
asked on
How do I select only 1 record within a group of duplicate customer numbers?
I am using an Oracle database, I couldn't select anything except Microsoft SQL as a topic.
I created a table with all records that matched a certain criteria. The table contains a customer number and a field with either Y or N along with other fields. This table can contain the same customer number several times. I am looking to select only a single customer number that never had a Y.
WD.csv
I created a table with all records that matched a certain criteria. The table contains a customer number and a field with either Y or N along with other fields. This table can contain the same customer number several times. I am looking to select only a single customer number that never had a Y.
WD.csv
Changed zones to 'Oracle Database' for you. Not sure what the deal is between that and 'Oracle'.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
you may try
that gives 44 hits for your selection.
of course you may exchange the condition 'factory > 0' by any other valid criteria.
note, in the csv file there are only a few duplicates regarding the customer id but there are at least two occurrences where there are multiple records which are identical for all fields including a category value of 'N'. you should try to avoid duplicate records since it violates entity integrity (and makes always troubles when doing queries or evaluations).
Sara
SELECT DISTINCT customer, factory, entry_date, category
FROM Cust
WHERE factory > 0 AND NOT (category = 'Y')
;
that gives 44 hits for your selection.
of course you may exchange the condition 'factory > 0' by any other valid criteria.
note, in the csv file there are only a few duplicates regarding the customer id but there are at least two occurrences where there are multiple records which are identical for all fields including a category value of 'N'. you should try to avoid duplicate records since it violates entity integrity (and makes always troubles when doing queries or evaluations).
Sara
I don't see how this:
SELECT DISTINCT customer, factory, entry_date, category
FROM Cust
WHERE factory > 0 AND NOT (category = 'Y')
;
satisifes this criteria:customer number that never had a YFor example, that query would return customer number 8042597, which has records with a category of Y. There are a total of 9 records returned by that query where the customer number has at least one record with a category of Y.
SQL> select * from mytab where customer = 8042597;
CUSTOMER FACTORY ENTRY_DATE EXIT_DATE EXIT_CODE C
---------- ---------- ------------------- ------------------- ---------- -
8042597 156311 07/17/2017 00:00:00 07/17/2017 00:00:00 12 N
8042597 156311 07/17/2017 00:00:00 07/17/2017 00:00:00 12 N
8042597 156311 07/17/2017 00:00:00 07/17/2017 00:00:00 12 Y
select * from mytab where customer = 8042597;
johnsone, your select statement neither contains the 'distinct' keyword nor does it exclude entries with category='Y'. did you mix-up some comments?
Sara
No. I was showing a case where your query did not return the results where:
That customer clearly has two Y records and are returned by your query. There are 8 others where that is true as well.
customer number that never had a Y
That customer clearly has two Y records and are returned by your query. There are 8 others where that is true as well.
but you used not the select statement i posted but one where you neither make a 'select distinct' nor where you exclude records with category = 'Y'.
my query returned 44 records and none of them contains a duplicate customer (id) nor has a category 'Y'.
Sara
customer factory entry_date category
202006 152711 31.08.2017 N
1902389 151511 30.07.2017 N
3902043 152711 31.08.2017 N
8021861 152711 31.08.2017 N
8021897 156211 31.08.2017 N
8021912 152711 31.08.2017 N
8021963 152711 10.07.2017 N
8030101 152711 31.08.2017 N
8030272 152711 31.08.2017 N
8031643 151611 31.08.2017 N
8032629 156311 23.08.2017 N
8032714 152711 31.08.2017 N
8034752 152711 31.08.2017 N
8035076 152711 31.08.2017 N
8035170 152711 31.08.2017 N
8035881 152711 31.08.2017 N
8035960 152711 31.08.2017 N
8036364 156111 31.08.2017 N
8036697 152711 31.08.2017 N
8037755 156111 31.08.2017 N
8037838 156111 31.08.2017 N
8037925 156111 31.08.2017 N
8039679 901 31.08.2017 N
8039785 156211 31.08.2017 N
8040124 156111 31.08.2017 N
8040136 152711 31.08.2017 N
8040792 152711 31.08.2017 N
8041775 152711 31.08.2017 N
8041992 100 31.08.2017 N
8042248 152711 31.08.2017 N
8042473 152711 31.08.2017 N
8042597 156311 17.07.2017 N
8042671 152711 31.08.2017 N
8043133 101 31.08.2017 N
8044347 156211 31.08.2017 N
8044361 156311 31.08.2017 N
8044463 156211 31.08.2017 N
8046857 156211 31.08.2017 N
8047143 156211 31.08.2017 N
8047317 151611 31.08.2017 N
8048137 156211 31.08.2017 N
8048144 156311 31.08.2017 N
8048180 156211 31.08.2017 N
8049165 156111 18.07.2017 N
my query returned 44 records and none of them contains a duplicate customer (id) nor has a category 'Y'.
Sara
>>my query returned 44 records and none of them contains a duplicate customer (id) nor has a category 'Y'.
Yes but it doesn't meet the requirements. 8032629 had a 'Y' so it should not be included.
Yes but it doesn't meet the requirements. 8032629 had a 'Y' so it should not be included.
This query returns 34 rows and appears to meet all of the criteria -
select customer, factory, entry_date, exit_date, exit_code, category
from yourtbl where category = 'N'
minus
select customer, factory, entry_date, exit_date, exit_code, decode(category,'Y','N')
from yourtbl where category = 'Y'
;
select customer, factory, entry_date, exit_date, exit_code, category
from yourtbl where category = 'N'
minus
select customer, factory, entry_date, exit_date, exit_code, decode(category,'Y','N')
from yourtbl where category = 'Y'
;
ASKER
I ran all queries against my original file and I still get the results with records that should not be there. From my sample file I should have a result of 35 unique records.
Can you post the 35 records (just the customer number) that you should have?
Using the SQL in #a42280417 and your 99 sample rows, I get 36 rows returned. Which one doesn't belong?
[edit] I accidently deleted a row in your 100 records. I'll find and retest.
[edit] I accidently deleted a row in your 100 records. I'll find and retest.
ASKER
I should have thought of that, sorry.
WD35.csv
WD35.csv
Retested johnsone's SQL with all 100 sample rows and I get 35 rows returned.
>>I should have thought of that, sorry.
Those are the exact same customers I get with johnsone's SQL from #a42280417
Those are the exact same customers I get with johnsone's SQL from #a42280417
The query in my original post returns the exact 35 records in your expected result set.
Can you please post the differences that you see?
While the method with the MINUS should work as well, it would pass the data twice.
Can you please post the differences that you see?
While the method with the MINUS should work as well, it would pass the data twice.
ASKER
Thank you all for your input. I had an issue with johnsone's query as a result of something I did.
ASKER
I thank everyone that participated.
I also missed inserting a record. Now that I have 100 records in my test table, I also get 35 records. While I agree that the query requires two passes of the table, it has the added benefit of retrieving all of the attributes and not just the customer numbers.
select customer, factory, entry_date, exit_date, exit_code, category
from yourtbl where category = 'N'
minus
select customer, factory, entry_date, exit_date, exit_code, decode(category,'Y','N')
from yourtbl where category = 'Y';
select customer, factory, entry_date, exit_date, exit_code, category
from yourtbl where category = 'N'
minus
select customer, factory, entry_date, exit_date, exit_code, decode(category,'Y','N')
from yourtbl where category = 'Y';
CUSTOMER FACTORY ENTRY_DATE EXIT_DATE EXIT_CODE C
---------- ---------- ---------- ---------- ---------- -
202006 152711 8/31/2017 8/30/2017 N
1902389 151511 7/30/2017 7/31/2017 25 N
3902043 152711 8/31/2017 8/30/2017 N
8021861 152711 8/31/2017 8/30/2017 N
8021897 156211 8/31/2017 7/3/2017 3 N
8021912 152711 8/31/2017 8/30/2017 N
8021963 152711 7/10/2017 7/11/2017 2 N
8030101 152711 8/31/2017 8/30/2017 N
8030272 152711 8/31/2017 8/30/2017 N
8031643 151611 8/31/2017 7/17/2017 3 N
8032714 152711 8/31/2017 8/30/2017 N
8034752 152711 8/31/2017 8/30/2017 N
8035076 152711 8/31/2017 8/30/2017 N
8035881 152711 8/31/2017 8/30/2017 N
8035960 152711 8/31/2017 8/30/2017 N
8036697 152711 8/31/2017 8/30/2017 N
8037755 156111 8/31/2017 8/18/2017 11 N
8037838 156111 8/31/2017 8/23/2017 2 N
8037925 156111 8/31/2017 8/14/2017 3 N
8039679 901 8/31/2017 8/24/2017 11 N
8039785 156211 8/31/2017 8/16/2017 2 N
8040124 156111 8/31/2017 8/8/2017 2 N
8040136 152711 8/31/2017 8/30/2017 N
8040792 152711 8/31/2017 8/30/2017 N
8041775 152711 8/31/2017 8/30/2017 N
8041992 100 8/31/2017 7/24/2017 11 N
8042248 152711 8/31/2017 8/30/2017 N
8042473 152711 8/31/2017 8/30/2017 N
8042671 152711 8/31/2017 8/30/2017 N
8043133 101 8/31/2017 7/27/2017 1 N
8046857 156211 8/31/2017 8/10/2017 2 N
8047143 156211 8/31/2017 8/31/2017 1 N
8048144 156311 8/31/2017 8/18/2017 1 N
8048180 156211 8/31/2017 8/3/2017 2 N
8049165 156111 7/18/2017 7/18/2017 3 N
35 rows selected.
Pass data once, select all columns:
SELECT customer,
factory,
entry_date,
exit_date,
exit_code,
category
FROM (SELECT customer,
factory,
entry_date,
exit_date,
exit_code,
category,
SUM(CASE
WHEN category = 'Y' THEN 1
ELSE 0
END)
over (
PARTITION BY customer
ORDER BY customer) y_cnt
FROM mytab)
WHERE y_cnt = 0;
Just change it to a windowed aggregate.
Just be aware that you will get back more than 35 rows if you get all the data. There is a customer with multiple N records that has no Y record, so there is one "repeat". It is a repeated customer id, but it appears twice because it is in the data twice. In the sample set of data, you can get rid of it with a distinct, but real data the other fields may differ. I'm pretty sure it is taken out of the minus because it is a pure duplicate.