Automatically Step through Data in SQL Server Table to be able to compare records.

Hello,

I am trying to develop a SQL Statement to systematically step throug all records in a table and compare them to each other.  I have been looking at cursors to do this, but not sure this is the best way.  All assistance is greatly appreciated.

-Matt
matt_wallsAsked:
Who is Participating?
 
Scott PletcherConnect With a Mentor Senior DBACommented:
--assuming extra bit column called "keepFlag" added to mark which rows are "good"

UPDATE dataTable
SET dataTable.keepFlag = 1
FROM dataTable
INNER JOIN (
    SELECT MAX(CASE WHEN [Employment Status] = 'A' THEN id ELSE 0 END) AS [MaxActiveId],
        MAX(CASE WHEN [Employment Status] = 'T' THEN id ELSE 0 END) AS [MaxInactiveId],
        MAX(CASE WHEN [Employment Status] = 'A' AND [FTE Status] = 1 THEN id ELSE 0 END) AS [MaxFTEStatusFullId],
        MAX(CASE WHEN [Employment Status] = 'A' AND [FTE Status] <> 1 THEN id ELSE 0 END) AS [MaxFTEStatusNotFullId]
    FROM dataTable
    GROUP BY uniquekey
) AS m ON CASE WHEN m.MaxActiveId > 0 THEN CASE WHEN m.MaxFTEStatusFullId > 0 THEN m.MaxFTEStatusFullId ELSE m.MaxFTEStatusNotFullId END ELSE m.MaxInactiveId END = dataTable.id

DELETE FROM dataTable
WHERE keepFlag <> 1
0
 
Scott PletcherSenior DBACommented:
You should avoid cursors if at all possible.  You'll get *vastly* better from a SELECT statement, which is set-based rather than row-based, if you can use it instead.

Please give more details on what you're trying to do and maybe we can help :-) .
0
 
rafranciscoCommented:
You don't need to use a cursor.  You can easily generate a SQL statement that will return to you records between 2 tables that are different.  Give us an example of what you are trying to accomplish.
0
Cloud Class® Course: Amazon Web Services - Basic

Are you thinking about creating an Amazon Web Services account for your business? Not sure where to start? In this course you’ll get an overview of the history of AWS and take a tour of their user interface.

 
matt_wallsAuthor Commented:
OK, I am loading person data from 5 different sources into one table.  From these 5 sources, there is the possibility that a person may exist in more than one source, so there could be multiple records per person in the table.  SSN is the UK on the person record, so it should be easy to detect duplicates.  Dealing with the duplicates should be the easy part, as I will do compares on certain fields to determine the record to keep.

My true dilema is trying to determine how to systematically step through the single table, and compare each record to every other record in the table.

0
 
ptjcbCommented:
As Scott said - "Avoid cursors". SQL works with sets of data, not with individual rows, in the fastest way. Do not think of working row-by-row (that is procedural programming), but re-think how you want to do it and work with it as a set.
0
 
Brian CroweDatabase AdministratorCommented:
Do the 5 tables have identical schema?  If you insert a UNION (without the ALL qualifier) you should avoid any duplicates.  It's hard to be more precise without some more information about your source and destination tables.
0
 
Scott PletcherSenior DBACommented:
It sounds as if you are going to load all 5 potential sources into one table, then resolve the data in that table.

You can do that with an UPDATE statement with CASE logic, followed by a DELETE to get rid of the now obsolete rows with duplicate SSNs.
0
 
matt_wallsAuthor Commented:
The initial plan is to import 5 text files into 1 table, and then developing the SQL to step through the single table and compare the records for duplicates.  However, I need to determine which record to keep based on other criteria, such as active/inactive status, full time/part time (if the person is active in 2 different sources (regions))

ptjcb, I'm not sure I understand fully what you are saying.  When you say set, I am thinking of a rowset/recordset, like in vbscript or something.
0
 
Brian CroweDatabase AdministratorCommented:
DELETE Table1
WHERE status = ???
     OR fulltime = ???
     OR ...
0
 
matt_wallsAuthor Commented:
Pardon my ignorance, but I think that I am maybe not explaining this correctly.

This is going to be set up as an automated process, ran on a weekly basis, to "combine" the 5 sources.
I need a job to automatically kick off that will do all of the compares, and leave me with a table containing only one record per person.  I think I can accompllish the compares ok, my problem is how to get it to start with record 1, compare to all of the records in the table to check for duplicates, do the necessary processing, then step to record 2, record 3.... etc. through the entire table, and do the compares.
0
 
Brian CroweDatabase AdministratorCommented:
You really need to be more specific if you want a specific answer.  Is your table schema a secret?  Based on what I can piece together you don't need a cursor but you can't seem to get out of the rut of thinking procedurally instead of in sets.  If you post your table schema and the logic you need to enforce you'll have at least 2 or 3 workable solutions in 20 minutes.  There are a number of us that get just giddy at the idea of writing code to solve other peoples problems :-)
0
 
Scott PletcherSenior DBACommented:
>> my problem is how to get it to start with record 1, compare to all of the records in the table to check for duplicates, do the necessary processing, then step to record 2, record 3.... etc. through the entire table, and do the compares. <<

Using sql, you don't have to do that row by row.  A single UPDATE statement (possibly with subquery(s)) can consolidate all the data into one row, then a single DELETE (possibly with subquery(s)) can get rid of all the duplicate rows.
0
 
Scott PletcherSenior DBACommented:
Btw, it would be helpful for processing if you have an identity column on this table.  If you don't have one now, please consider adding one.
0
 
matt_wallsAuthor Commented:
Schema is definitely not a secret.

For sake of brevity, I will list the pertinent fields.
uniquekey (SSN)
Last_Name
First_Name
Employment Status - A=Active - T=Inactive
FTE Status - 0.5 = Part time 1=full time (If person active in 2 files, part time in one, full time in another, the full time record takes precedence)

compare entire table to find duplicate records (SSN).  If duplicates exist, and only 1 is active, keep that record, if multiple actives, keep the greatest fte status.

Let me know if more info is needed.

Thanks,
0
 
Scott PletcherSenior DBACommented:
You will need some unique value on the row.  An IDENTITY column is probably the easiest to add.
0
 
Brian CroweConnect With a Mentor Database AdministratorCommented:
I definitely agree that you should add an IDENTITY column.  In that case I would go with something like...

DELETE myTABLE
WHERE ID NOT IN (SELECT TOP 1 ID FROM myTable T2
     WHERE T2.SSN = myTable.SSN
     ORDER BY T2.[Employment Status] ASC, T2.[FTE Status] DESC)
0
 
matt_wallsAuthor Commented:
thanks to all for feedback thusfar!

Hi Scott,

I can definitely add the Identity column - not a problem.

Can you elaborate on your Update command (with possible sub-queries) with thoughts of how best to tackle this.

Thanks,
-matt
0
 
Scott PletcherConnect With a Mentor Senior DBACommented:
Assuming there's an identity column named "id", something like the following will allow you to pull out only the "good" rows.  [If needed, could also be adjusted to deleted "bad"/duplicate rows.]


INSERT INTO newTable
SELECT d.uniquekey, ...d.~all other columns except id~...
FROM dataTable d
INNER JOIN (
    SELECT MAX(CASE WHEN [Employment Status] = 'A' THEN id ELSE 0 END) AS [MaxActiveId],
        MAX(CASE WHEN [Employment Status] = 'T' THEN id ELSE 0 END) AS [MaxInactiveId],
        MAX(CASE WHEN [Employment Status] = 'A' AND [FTE Status] = 1 THEN id ELSE 0 END) AS [MaxFTEStatusFullId],
        MAX(CASE WHEN [Employment Status] = 'A' AND [FTE Status] <> 1 THEN id ELSE 0 END) AS [MaxFTEStatusNotFullId]
    FROM dataTable
    GROUP BY uniquekey
) AS m ON CASE WHEN m.MaxActiveId > 0 THEN CASE WHEN m.MaxFTEStatusFullId > 0 THEN m.MaxFTEStatusFullId ELSE m.MaxFTEStatusNotFullId END ELSE m.MaxInactiveId END = d.id
0
 
matt_wallsAuthor Commented:
Scott this seemed to work perfectly!  I would have never thought to use CASE, let alone how to use it.

BriCrowe - I appreciate your diligence, and information provided.

You guys both helped get me going in the right direction.

0
 
Scott PletcherSenior DBACommented:
Great, glad it helped!  Yeah, SQL is pretty powerful once you get used to it ...
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.