Solved

merge duplicate record details in same row..

Posted on 2013-10-22
4
257 Views
Last Modified: 2013-11-04
select count(*), EMP_ID FROM EMPHIST WHERE EMPHIST_DATE = '2012-01-01' GROUP BY EMP_ID HAVING COUNT(*)=2 order by 1
There are some entries of Employees listed in more than one office for a particular date.

EMPHIST can be joined OFFICE &  LOCATION to get some other details..

we need a list of the duplicate in one row if possible..

example: Let's say John Adams in listed under Office A and also Office B as of Jan 1st 2012.. we want to report such cases.. with the duplicate info all in one row..
1st column: Office A's code (in Office table)
2nd Column: Office A's Description (in Office table)
3rd Column: County of Office A (in Office table)
4th Column: Location Number of Office A (in LOCATION table)
5th Column: Location Name of Office A(LOCATION table)
6th Column: Employee ID
7th column: Employee FirstName
8th column: Employee LastName
9st column: Office B's code (in Office table)
10th Column: Office B's Description (in Office table)
11th Column: County of Office B (in Office table)
12th Column: Location Number of Office B(in LOCATION table)
13th Column: Location Name of Office B(LOCATION table)
14th Column: Employee ID
15th column: Employee FirstName
16th column: Employee LastName

EmpHist and Office link through Office_key (PK-FK)

EmpHist.OFFICE_KEY = Office.OFFICE_KEY

office links to Location..
Office.Location_KEY = Location.Location_KEY

how would you merge these 2 sets of 8 columns into one row (16 columns) in tsql?
0
Comment
Question by:25112
  • 2
  • 2
4 Comments
 
LVL 40

Expert Comment

by:Sharath
ID: 39592853
Check this. Not thested though as I do not have any sample data. Change the column/table names as per your schema.
;WITH EMPHIST_CTE 
     AS (SELECT * 
           FROM (SELECT *, 
                        COUNT(*) OVER ( partition BY EMP_ID, EMPHIST_DATE) cnt, 
                        ROW_NUMBER() OVER ( partition BY EMP_ID, EMPHIST_DATE ORDER BY OFFICE_KEY) rn 
                   FROM EMPHIST 
                  WHERE EMPHIST_DATE = '2012-01-01') t1 
          WHERE cnt = 2), 
     EMPHIST_CTE2 
     AS (SELECT EMP_ID, 
                EMP_FNAME, 
                EMP_LNAME, 
                MAX(CASE rn 
                      WHEN 1 THEN OFFICE_KEY 
                    END) OFFICE_KEY1, 
                MAX(CASE rn 
                      WHEN 2 THEN OFFICE_KEY 
                    END) OFFICE_KEY2 
           FROM EMPHIST_CTE 
          GROUP BY EMP_ID, 
                   EMP_FNAME, 
                   EMP_LNAME) 
SELECT o1.Code,o1.Description,o1.Country,l1.Location_Number,l1.Location_Name,
       e1.EMP_ID,e1.EMP_FNAME,e1.EMP_LNAME,
	   o2.Code,o2.Description,o2.Country,l2.Location_Number,l2.Location_Name 
  FROM EMPHIST_CTE2 e1 
       JOIN OFFICE o1 
         ON e1.OFFICE_KEY1 = o1.OFFICE_KEY 
       JOIN LOCATION l1 
         ON o1.Location_KEY = l1.Location_KEY 
       JOIN OFFICE o2 
         ON e1.OFFICE_KEY2 = o1.OFFICE_KEY 
       JOIN LOCATION l1 
         ON o1.Location_KEY = l1.Location_KEY 

Open in new window

0
 
LVL 5

Author Comment

by:25112
ID: 39593136
thanks very much, Sharath.. i tweaked it a little and it works good, now...

sometimes the firstname/lastname are not guarenteed to be the same in duplicates..it may have a small letter transposed or written differently.. (example- Matt vs Matthew etc)  so i worked around it as you can see below..

because of this i had to call EMPHIST twice at the end.. so i just am including EMPID and OFFICE_KEY in the first CTE instead of calling the whole table ..

i also had to add 2 WHERE conditions after the final joins (s1.EMPHIST_Date = '2012-01-01' and s2.EMPHIST_Date = '2012-01-01') as there may be records with other dates also.

i first ran the first CTE and it took 30 minutes.. maybe it cached it.. then when i ran the whole thing, it ran in 4 minutes.. (30k rows)..

0)with the above logic tweaks, is there any other code changes that can benefit the query?
1)with the below, can you recommend any indexes that can cut down the run time?
2)can you save this resultset to a static table through code (like 'SELECT * INTO tmp FROM SOMETABLE' syntax)
3)if I want to run this for 3 duplicate records, i just have to change "=2" to "=3", right? (and add a few more EMPHIST joins at the end...) it will have to be seperate reports/query for 2,3,4 duplicates and so forth, right?
-----------------------------------------------------------------------------
;WITH EMPHIST_CTE
     AS (SELECT *
           FROM (SELECT EMP_ID, OFFICE_KEY,
                        COUNT(*) OVER ( partition BY EMP_ID) cnt,
                        ROW_NUMBER() OVER ( partition BY EMP_ID ORDER BY EMP_ID) rn
                   FROM EMPHIST
                  WHERE EMPHIST_Date = '2012-01-01' ) t1
          WHERE cnt = 2),
     EMPHIST_CTE2
     AS (SELECT EMP_ID,
                MAX(CASE rn
                      WHEN 1 THEN OFFICE_KEY
                    END) OFFICE_KEY1,
                MAX(CASE rn
                      WHEN 2 THEN OFFICE_KEY
                    END) OFFICE_KEY2
           FROM EMPHIST_CTE
          GROUP BY EMP_ID)
SELECT o1.Code,o1.Description,o1.County,l1.Location_Number,l1.Location_Name,
       e1.EMP_ID,s1.EMP_FNAME,s1.EMP_LNAME,
         o2.Code,o2.Description,o2.County,l2.Location_Number,l2.Location_Name,
         s2.EMP_FNAME,s2.EMP_LNAME,
  FROM EMPHIST_CTE2 e1
            JOIN EMPHIST s1
            ON e1.EMP_ID = s1.EMP_ID and e1.OFFICE_KEY1 = s1.OFFICE_KEY
       JOIN OFFICE o1
         ON e1.OFFICE_KEY1 = o1.OFFICE_KEY
       JOIN LOCATION l1
         ON s1.LOCATION_KEY = l1.LOCATION_KEY
            JOIN EMPHIST s2
            ON e1.EMP_ID = s2.EMP_ID and e1.OFFICE_KEY2 = s2.OFFICE_KEY
       JOIN OFFICE o2
         ON e1.OFFICE_KEY2 = o2.OFFICE_KEY
       JOIN LOCATION l2
         ON s2.LOCATION_KEY = l2.LOCATION_KEY
         WHERE s1.EMPHIST_Date = '2012-01-01'
         AND s2.EMPHIST_Date = '2012-01-01'
0
 
LVL 40

Accepted Solution

by:
Sharath earned 500 total points
ID: 39593256
I do not suggest joining to EMPHIST again because there will be duplicate records on EMP_ID which results in unnecessary duplicates records.

<< sometimes the firstname/lastname are not guarenteed to be the same in duplicates..it may have a small letter transposed or written differently.. (example- Matt vs Matthew etc)  so i worked around it as you can see below.. >>

In this case, how do you want to display FIRST NAME and LAST NAME. Can I just select anyone combination and display? If that is fine, you can try like this.
;WITH EMPHIST_CTE 
     AS (SELECT * 
           FROM (SELECT *, 
                        COUNT(*) OVER ( partition BY EMP_ID, EMPHIST_DATE) cnt, 
                        ROW_NUMBER() OVER ( partition BY EMP_ID, EMPHIST_DATE ORDER BY OFFICE_KEY) rn 
                   FROM EMPHIST 
                  WHERE EMPHIST_DATE = '2012-01-01') t1 
          WHERE cnt = 2), 
     EMPHIST_CTE2 
     AS (SELECT EMP_ID, 
                MAX(CASE rn WHEN 1 THEN EMP_FNAME END) EMP_FNAME, 
				MAX(CASE rn WHEN 1 THEN EMP_LNAME END) EMP_LNAME, 
                MAX(CASE rn WHEN 1 THEN OFFICE_KEY END) OFFICE_KEY1, 
                MAX(CASE rn WHEN 2 THEN OFFICE_KEY END) OFFICE_KEY2 
           FROM EMPHIST_CTE 
          GROUP BY EMP_ID) 
SELECT o1.Code,o1.Description,o1.Country,l1.Location_Number,l1.Location_Name,
       e1.EMP_ID,e1.EMP_FNAME,e1.EMP_LNAME,
	   o2.Code,o2.Description,o2.Country,l2.Location_Number,l2.Location_Name 
  FROM EMPHIST_CTE2 e1 
       JOIN OFFICE o1 
         ON e1.OFFICE_KEY1 = o1.OFFICE_KEY 
       JOIN LOCATION l1 
         ON o1.Location_KEY = l1.Location_KEY 
       JOIN OFFICE o2 
         ON e1.OFFICE_KEY2 = o1.OFFICE_KEY 
       JOIN LOCATION l1 
         ON o1.Location_KEY = l1.Location_KEY 

Open in new window

<< 3)if I want to run this for 3 duplicate records, i just have to change "=2" to "=3", right? (and add a few more EMPHIST joins at the end...) it will have to be seperate reports/query for 2,3,4 duplicates and so forth, right? >>

Along with changing from "=2" to "=3", you need to add another clause to get OFFICE_KEY3 based on rn = 3 in CTE2. And add additional JOIN conditions to OFFICE and LOCATION.

<< i first ran the first CTE and it took 30 minutes.. maybe it cached it.. then when i ran the whole thing, it ran in 4 minutes.. (30k rows)..>>

I would suggest to run again with the modifed version and check the performance. Let me know if it is still taking long time than expected.
0
 
LVL 5

Author Comment

by:25112
ID: 39594047
Thanks- from 4 minutes it is cut to 1 minute- your idea was great..

1)
i liked how you avoided the extra joins by modifying the first CTE to partition by the date also.
this is very efficient than my original thinking.. . but i did not understand your comment
'I do not suggest joining to EMPHIST again because there will be duplicate records on EMP_ID which results in unnecessary duplicates records.'
it would have been inefficient, but can you please tell me how it would have created dups?

2)
Also combining the name suggestion was nice. It is reasonable enough to have Matt for Matthew or vice versa.
if someone's maiden name is changed for example, and if that is important, i have to go back to adding those extra joins from the previous steps, right?   right now, we'll leave it as it is- it is good.

2a)
one change i did which you will see is we need LOCATION_KEY for the joins. So i brought it into the first CTE.

2b)
i also modified it for upto 4 duplicates, with 2 more left joins.. so there will be lot of NULLs, which is OK... please comment if they are efficient, with 2 more LEFT joins, as many users could be running the same report at the same time.

attached file: Query1.sql


3)
I needed one more tweak...  (pl refer query2.sql)

several offices will be running the report at the same time.. but the office must see only their set of data.
so for example, Office A is running a report, and they want to see which other offices have that employee (say, EmployeeA) listed under them for that date. This OFFICE_KEY will be sent as a parameter.
Office B, Office C, and Office D are also sharing that employee for that date. When Office A runs it they should see the details of EmpID of EmployeeA and the details of Office B,Office C and Office D for that Employee in the report.
Now they may be sharing EmployeeB with Office E & Office F.. so that line also with those 2 records should be in the report.

so with the above example, the result will be like this (Office A running to see which of their Employees were listed under other offices for that date)

EmployeeA Office_B_Details Office_C_Details Office_D_Details
EmployeeB Office_E_Details Office_F_Details NULL

etc

When OfficeB runs the report, they will see a totally different set of result..

so we need some kind of filtering with the OFFICE_KEY that will be passed... (this will reduce (upto) 4 sets of data that we  have now down to 3) in the attached example, 11203 is the office_key that is requesting the data.

i tried one solution, it is attached as Query2.sql.. can you please review it and give me your comments.

i wanted to make sure the requesting officekey is always the first one (rn=1).. hence i created a temp table and deleted the records that dont have this office at all, and then i rearrange the records to make sure this Office is always on the top.. then I display only 3 sets of data (o2,o3,o4) as o1 will be the requesting office.. this likely is not too efficient .. when first executed, it took 2 minutes.. just after a few runs, it is executing as previously (1 minute)... maybe it is caching it.. also i just realized i need to put isnull for o3 and o4 values since they may be NULLs.. but o1 & o2 should always have a value

please suggest a better way to do this..

thanks again very much Sharath.
Query1.sql
Query2.sql
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

Suggested Solutions

Hi all, It is important and often overlooked to understand “Database properties”. Often we see questions about "log files" or "where is the database" and one of the easiest ways to get general information about your database is to use “Database p…
If you find yourself in this situation “I have used SELECT DISTINCT but I’m getting duplicates” then I'm sorry to say you are using the wrong SQL technique as it only does one thing which is: produces whole rows that are unique. If the results you a…
Sending a Secure fax is easy with eFax Corporate (http://www.enterprise.efax.com). First, Just open a new email message.  In the To field, type your recipient's fax number @efaxsend.com. You can even send a secure international fax — just include t…
You have products, that come in variants and want to set different prices for them? Watch this micro tutorial that describes how to configure prices for Magento super attributes. Assigning simple products to configurable: We assigned simple products…

757 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

19 Experts available now in Live!

Get 1:1 Help Now