What are the best practices for placement of SQL when checking for duplicates? WHERE clause JOINS, beginning or end?
Where do you put a checking for and removing duplicates query when you need to remove all extras. The goal is to check name, date of birth, address of person. Then to make sure that I am getting only the last person from their most recent visit or date time stamp occurrence. (We'll call it their most recent date of service) or in this case "s._serviced_date" I think it will look something like this Select Name, Address, Date of Birth From Personnel Where Max(Service_Date) ----Do I need any date formatting here or just the field? Group By Name, Address, Date of Birth Having Count(*) >1 Please feel free to check that. Especially If MAX() has to have special date format. May not need any?
THE BIG QUESTION HERE IS WHERE DO I PLACE IT AND WHY. WHAT ARE THE BEST PRACTICES OR WHAT IS GOOD POLICY TO DO IT ONE WAY OVER ANOTHER? the following is a mockup of the query. declare @from_dob date = dateadd(year,-45,'20210101')declare @to_dob date = dateadd(year,-40,'20211231')declare @research_date date = '20150101'declare @region varchar(100) select @region = dbo.z_bi_get_region(db_name())select @region as region, personnel_id, last_name, first_name, gender_code, birth_date,datediff(year,birth_date,cast(getdate() as date)) as age_at_run_date, mobile_phone_number, home_phone_number, primary_email_address, primary_address_line1, primary_address_line2, primary_address_city, primary_address_state, primary_address_zipcode, death_flag ,isnull(s1.disease_found,'N') as disease_found ,@from_dob as from_DOB, @to_dob as to_DOB, @research_date as research_datefrom c_personnel pers outer apply ( select top 1 'Y' as disease_found from c_research s inner join c_research_item si on s.research_key=si.research_key and si.active_flag='Y' inner join l_procedure_code pc on si.procedure_code=pc.procedure_code inner join l_disease_type dt on pc.disease_type_code=mt.disease_type_code and dt.disease_type_code in ('Cancer','Heart','Kidney',’Lung’,'Accident',’Neurological’) inner join l_location st on s._location_code=st.location_code where pers.personnel_key=s.personnel_key and s._serviced_date >= @research_date AND st.location_code NOT IN ('HP','CLNK') AND st.practice_code <> 'DocNABox' ) as s1where pers.birth_date >= @from_dob and pers.birth_date < dateadd(d,1,@to_dob) and pers.master_personnel_key is null AND pers.issuer_of_personnel_id = 'system' and pers.gender_code='M' and isnull(pers.death_flag,'N')='N' and isnull(s1.disease_found,'N')='N' order by datediff(year,birth_date,cast(getdate() as date))
THE BIG QUESTION HERE IS WHERE DO I PLACE IT AND WHY. WHAT ARE THE BEST PRACTICES OR WHAT IS GOOD POLICY TO DO IT ONE WAY OVER ANOTHER? the following is a mockup of the query.
One thing I was hoping to get is "where does the duplicate checker typically go". Does it go in the where clause? I don't know or see a clear place to put it? Does it go at the very end of the existing query that delivers the data. I kind of like Jim Horn's information as it gives you options. Especially the Soft Delete. That would work best in my situation. What I get confused about is if I have the records returned in a large select statement as above. Do I encapsulate that whole query and define it as "Data" then say Can someone give me an example of placement "where" does the Soft Delete code go" Do I need to create a CTE or insert a table? I can often find an example of the code it is "where does the code go" What has to happen prior. Do I have to give the query above a name and use the insert function then below the "framed query" then do I create a separate query where the soft delete goes...?
If you are referring to Select Distinct ...not select Unique. I think you are mistaken sir. Select Unique is only applicable to Oracle this is MS SQL Server.
RUA Volunteer2?
ASKER
I am looking at Jim Horns Soft Delete of duplicates gentlemen. I do not know what to do with the above code. The problem is I DO NOT KNOW WHERE to put the code now. Do I insert a new table somewhere and then use the code Jim provided. I do not know if I need to insert a new table or somehow incorporate the query I have above turning it into some kind of inserted table?
-- Set all is_active values to 'Y' (just for the demo) UPDATE ....update what? The table? Am I supposed to define a table here or an existing one that relates to the query above? SET is_active='Y' GO. .....So this defines the field/column as a Y-- Now set all duplicate rows is_active='N';with " Some identifier like a single character s2 as ( SELECT *, row_number() over(partition by last_name, first_name, scheduled_start_date, birth_date, address1 order by ISNULL(birth_date,''), MAX(scheduled_start_date)) as row_number FROM "some reference to a table" )UPDATE "short reference to table" SET is_active='N'WHERE row_number > 1
I have no idea what needs to be done in relation to the query I provided above in the beginning of the question vs what I need to do with this. I am confused about WHERE this would go and what needs to happen to the query above in order for it to accept the code Jim provided. The data I am have has duplicate rows of names and addresses of the same people who may have had a phone number or address change. The goal is to use the most recent visit as the tru current record to count only once. otherwise we have 400k records returned.
What I get confused about is if I have the records returned in a large select statement as above. Do I encapsulate that whole query and define it as "Data" then say Can someone give me an example of placement "where" does the Soft Delete code go" Do I need to create a CTE or insert a table? I can often find an example of the code it is "where does the code go" What has to happen prior. Do I have to give the query above a name and use the insert function then below the "framed query" then do I create a separate query where the soft delete goes...?