Skip import of duplicate records (and merging) according to latest note.

Hi Experts,
We have two tables,
Employeestbl
ID (PK)
Email
SkipImport (int)

NotesTbl
ID (PK)
EmployeeID
Date
Notes

We are in middle of importing these data into a new platform.
Since we realized there're duplicate records in Employeestbl (according to email), we want import only those who have the latest note (according to date field).
Therefore I would like to have the SkipImport field updated to 1 for all those that dont have the latest note.
However if there is no note whatsoever under that email, then all records should be imported.

In addition I would like to merge all notes into one employee records.
Meaning all notes under same email should be transferred to the Employee record having the latest note.

What is the easiest way to accomplish it?

Work could be done either in SQL or in Access.
Thanks in advance.
LVL 6
bfuchsAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

yo_beeDirector of Information TechnologyCommented:
Are these two tables joined and if so what is the join?
0
bfuchsAuthor Commented:
Hi,
Employeestbl.ID = Notestbl.EmployeeID

Here is something I was in middle doing in Access, linked to those tables.
Sheet is a table containing all duplicate emails.

SELECT DISTINCT dbo_NotesTbl.EmployeeID, dbo_NotesTbl.Date
FROM (dbo_NotesTbl INNER JOIN dbo_Employeestbl ON dbo_NotesTbl.EmployeeID = dbo_Employeestbl.ID) INNER JOIN Sheet ON dbo_Employeestbl.Email = Sheet.EMail;

Open in new window


Thanks,
Ben
0
yo_beeDirector of Information TechnologyCommented:
if the Notes field is null will there be more than 1 record?
0
Determine the Perfect Price for Your IT Services

Do you wonder if your IT business is truly profitable or if you should raise your prices? Learn how to calculate your overhead burden with our free interactive tool and use it to determine the right price for your IT services. Download your free eBook now!

PatHartmanCommented:
Create a table with a proper unique index to prevent future duplicates.  You may want an autonumber as the primary key but you can still create unique indexes on other fields.  Create a query that selects the data from the original table and sort it descending so that the record you want to add sorts first.  All subsequent duplicates will be discarded.  There is no need for joins or any complex logic.  Just get the selected data into the desired order and run a simple append query.
0
bfuchsAuthor Commented:
@y0_bee,
In notestbl there should be no records with notes field null.

@Pat,
Each table has a primary key named ID, however in our system it was okay for two records to have same email address, while in the new system its not.

I'm not getting how are you planning to accomplish the task with that suggested query?

Again I need first to set which of the duplicates will remain and which will be removed.
Then I need to take all notes from those being removed and a transfer them to the main (the account that will remain).

So for example if records ID 1,3 and 5 in employees table share same email, and each have 3 records in notes table, then I want to keep only one record in employees table, the one who has the latest note in notes table, and all 9 records in notes table should be assigned to that employees record.

The last task could be either an update query or an append query.

Thanks,
Ben
0
Mark WillsTopic AdvisorCommented:
I see multiple steps involved.... Once the data is in SQL Server, using T-SQL

-- Part 1
-- Part 1
-- Find Employees with Dupe emails
-- and update those ID's to SkipImport

;with CTE_Emp as
( select row_number() over (partition by E.Email order by N.[Date] desc) as RN, E.ID as E_ID, E.Email, E.SkipImport,N.*
  from EmployeesTbl E
  outer apply (select top 1 * from NotesTbl N where N.EmployeeID = E.id order by [date] desc) N
) update E set skipimport = case when RN = 1 then 0 else 1 end 
  from Employeestbl E
  inner join CTE_Emp C on C.E_ID = E.ID 
  
--  Select * from CTE_Emp

Open in new window



-- Part 2   WARNING - CANNOT UNDO WITHOUT BACKUP
-- Part 2   WARNING - CANNOT UNDO WITHOUT BACKUP
-- Consolidate Notes into a Single Note for an EmployeeID
-- Update the Notes table with the consolidated Notes.
-- Assumes Notes column is varchar(max)
-- Backup before committing the updates

;with CTE_Emp as
( select E.ID as E_ID, E.Email, E.SkipImport,N.*
  from Employeestbl E
  Inner Join NotesTbl N on N.EmployeeID = E.id 
) , CTE_Notes as
( select E.ID, E_ID, E.Email, E.SkipImport, Stuff(N.Notes,1,2,'') as Concat_Notes
  from CTE_Emp E
  Cross apply (Select ', '+Notes from NotesTbl where EmployeeID in (select E_ID from CTE_Emp C where C.email = E.Email) order by [Date] for xml path ('')) N(Notes)
) update N set Notes = C.Concat_Notes 
  from NotesTbl N
  inner join CTE_Notes C on C.ID = N.ID 
  
--  select * from CTE_Notes

Open in new window



-- Part 3   WARNING - CANNOT UNDO WITHOUT BACKUP
-- Part 3   WARNING - CANNOT UNDO WITHOUT BACKUP
-- Find Note ID's for SkipImport Employees
-- Find Note ID's for duplicate EmployeeID
-- Backup before committing the deletes

;With CTE_Delete as
( select 2 as RN, N.ID from Employeestbl E inner join NotesTbl N on E.ID = N.EmployeeID where E.SkipImport = 1 
  Union
  select row_Number() over (Partition by EmployeeID Order by Employeeid), ID from NotesTbl
) Delete from NotesTbl where ID in (select ID from CTE_Delete where RN > 1)  
 
--  select * from CTE_Delete where RN > 1

Open in new window

In each of the above CTE's you can comment out the Update or Delete and uncomment the select below and you can see the process before you commit. I would be inclined to let the SkipImport update happen in Part 1 - regardless - very easy to update/reset SkipImport
0
John TsioumprisSoftware & Systems EngineerCommented:
I think the best thing would be to have a list of the duplicates and based on that update all the notes who carry Duplicates to map to Unique Ones.
e,g. Employee has 3 IDs, 5,11,28
We keep only one of them...e.g. 5
And then on the Notes Tables we update it
EmployeeID | Note
5....................|abcdef
11..................|klmnop
28..................|xxyyzzz

it will become
5....................|abcdef
5....................|klmnop
5....................|xxyyzzz
and finally
5....................|abcdef - klmnop -xxyyzzz
1
Mark WillsTopic AdvisorCommented:
@John,

That would certainly be a lot easier to update the erroneous EmployeeID's with the correct EmployeeID's in the NotesTbl table - it also preserves the [Date] history and therefore no need to concatenate...

So, ignore the "and finally" part :)
0
John TsioumprisSoftware & Systems EngineerCommented:
If i could take a guess i would say that probably these Notes as not important regarding chronologically order but as plain info ...thats why i put the finally part.
0
yo_beeDirector of Information TechnologyCommented:
I have a couple of questions.
If you run

Select * From Employeestbl are you still getting duplicates or the duplicates only happening when you john the two tables.
Does it look like this when you run the select without joining the tables?
http://sqlfiddle.com/#!18/8fe9e/1/0

From your two tables is looks like the employeetbl is one to many on the NotesTable.
0
PatHartmanCommented:
@Pat,
Each table has a primary key named ID, however in our system it was okay for two records to have same email address, while in the new system its not.
I did mention that possibility.  That is why you need to add a unique index on email address.

When you have dependent data that needs to preserved, that is a whole different problem.  You still need to create the first table with the unique ID as I described.  Add a temp column to the original version of this table to hold the ID of the record that got loaded.  Create an update query that joins the new table to the old table on email.  If the PKs of the table are different, update the temp column with the PK from the new table.  Now you have a way of joining to the dependent table on the OLD pk-fk and you can update the fk with the value from the temp table.

When you copy the old data, be sure to also copy the identity column.  You don't want to have to figure out how to match dependent tables and fix up their foreign keys.
0
bfuchsAuthor Commented:
Hi Experts,

@Mark,
In each of the above CTE's you can comment out the Update or Delete and uncomment the select below and you can see the process before you commit.
for Part1
Tested with one record where both have a note, one from yesterday and one from today, and in select statement SkipImport is null for both.

FYI- I'm using SQL 2008.

Thanks,
Ben
0
yo_beeDirector of Information TechnologyCommented:
@BF

Can you confirm my question
0
bfuchsAuthor Commented:
@Yo_bee,

Sorry for delay as still working on implementing Marks suggestions..

Select * From Employeestbl are you still getting duplicates..
Yes, as mention in our old system 2 employees where allowed to have same email.

From your two tables is looks like the employeetbl is one to many on the NotesTable.
Correct.

Thanks,
Ben
0
yo_beeDirector of Information TechnologyCommented:
Let me know if this works.


http://sqlfiddle.com/#!18/61c8e/3/0

SELECT     empTbl.IDx, empTbl.email, empTbl.SkipImport,  NoteTbl.EmpId,max(NoteTbl.Date) as [Date], NoteTbl.Notes
FROM         empTbl INNER JOIN
                      NoteTbl ON empTbl.IDx = NoteTbl.EmpId inner join (select empid,Max([date]) maxdate from NoteTbl group by empid) as x on x.maxdate = NoteTbl.Date and x.EmpId = NoteTbl.EmpId
Group by empTbl.IDx, empTbl.email, empTbl.SkipImport,  NoteTbl.EmpId, NoteTbl.Notes

Open in new window

0
bfuchsAuthor Commented:
@Mark,
Disregard my previous comment.
Part1 seems to be working perfectly.
(Didnt realized, I had to run the update first in order to see in the select)
Will test the rest and let you know.

@Yoo_bee
Yours gives me an error.
Msg 208, Level 16, State 1, Line 1
Invalid object name 'empTbl'.

Thanks,
Ben
0
yo_beeDirector of Information TechnologyCommented:
This is my test your table and column names maybe different.  Goto my link in SQL Fiddle to see what the results will look like
0
bfuchsAuthor Commented:
@Mark,
-- Consolidate Notes into a Single Note for an EmployeeID
I'm not getting it, why are you consolidating notes?!
I basically want all notes to remain the same, just to be appended to the record in employees table with SkipImport = 0.
So when we import only the employees with SkipImport = 0, only one employee per email will be there.
While all notes will be under that employee.

Thanks,
Ben
0
bfuchsAuthor Commented:
@Yo_bee,

I'm not getting why running without a join?
However just tried your link, doesn't work either..
See attached.

Thanks,
Ben
Capture.PNG
0
bfuchsAuthor Commented:
@Mark,

BTW, in Part2 column name is Note and defined as text.
Getting the following msg
Msg 402, Level 16, State 1, Line 12
The data types varchar and text are incompatible in the add operator.
Msg 207, Level 16, State 1, Line 10
Invalid column name 'Note'.

Open in new window

However as mentioned, I dont think its necessary to consolidate.

Thanks,
Ben
0
Mark WillsTopic AdvisorCommented:
Why consolidate Notes ? Because I thought that's what you wanted : "In addition I would like to merge all notes into one employee records."
But now understand that you want is more along the lines of the discussion John T and I had above at #a42552664 and #a42552753

Which makes it a lot easier.... But please check that discussion above and confirm....
 
Cheers,
Mark Wills
0
yo_beeDirector of Information TechnologyCommented:
Would you be able to post a small subset of CSV for us to see the actual data
0
Mark WillsTopic AdvisorCommented:
@bfuchs

Yep, was wondering about TEXT and made mention of using varchar(max) because of that possibility - but no matter - we dont need to concatenate - just update the old EmployeeID for the new one

Please have a look at :

Part 1 (think that was confirmed as working)
-- Part 1
-- Find Employees with Dupe emails
-- and update those ID's to SkipImport

;with CTE_Emp as
( select row_number() over (partition by E.Email order by N.[Date] desc) as RN, E.ID as E_ID, E.Email, E.SkipImport,N.*
  from EmployeesTbl E
  outer apply (select top 1 * from NotesTbl N where N.EmployeeID = E.id order by [date] desc) N
) 
  update E set skipimport = case when RN = 1 then 0 else 1 end 
  from Employeestbl E
  inner join CTE_Emp C on C.E_ID = E.ID 
  
--  Select * from CTE_Emp

Open in new window

New Part 2   WARNING - CANNOT UNDO WITHOUT BACKUP
-- Part 2   WARNING - BACKUP
-- Consolidate Notes into a Single Note for an EmployeeID
-- Update the Notes table with the consolidated Notes.
-- Backup before committing the updates

;with CTE_Emp as  
( select E.ID as E_ID, E.Email, E.SkipImport,NEW_ID,N.*
  from EmployeesTbl E
  Inner Join NotesTbl N on N.EmployeeID = E.id 
  Cross apply (select ID from EmployeesTbl C where C.email = E.Email and C.SkipImport = 0) C(NEW_ID)
) 
  update N set EmployeeID = E.NEW_ID 
  from NotesTbl N
  inner join CTE_Emp E on E.ID = N.ID 
  where E_ID <> NEW_ID
  
-- select * from CTE_Emp

select * from NotesTbl

Open in new window

Part 3 is no longer needed....
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
bfuchsAuthor Commented:
Hi Experts,

@Mark,
I was not in the office today, will try to test that on Sunday & keep you posted.

@Yoo_bee
Will see next week if Mark's Part2 latest version doesn't work, guess will do that.


Have a nice weekend!

Thanks,
Ben
0
bfuchsAuthor Commented:
This worked.
Thank you!
0
bfuchsAuthor Commented:
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Query Syntax

From novice to tech pro — start learning today.