?
Solved

Automatically Step through Data in SQL Server Table to be able to compare records.

Posted on 2005-04-13
20
Medium Priority
?
372 Views
Last Modified: 2012-06-27
Hello,

I am trying to develop a SQL Statement to systematically step throug all records in a table and compare them to each other.  I have been looking at cursors to do this, but not sure this is the best way.  All assistance is greatly appreciated.

-Matt
0
Comment
Question by:matt_walls
  • 8
  • 6
  • 4
  • +2
20 Comments
 
LVL 70

Expert Comment

by:Scott Pletcher
ID: 13775467
You should avoid cursors if at all possible.  You'll get *vastly* better from a SELECT statement, which is set-based rather than row-based, if you can use it instead.

Please give more details on what you're trying to do and maybe we can help :-) .
0
 
LVL 28

Expert Comment

by:rafrancisco
ID: 13775476
You don't need to use a cursor.  You can easily generate a SQL statement that will return to you records between 2 tables that are different.  Give us an example of what you are trying to accomplish.
0
 

Author Comment

by:matt_walls
ID: 13775520
OK, I am loading person data from 5 different sources into one table.  From these 5 sources, there is the possibility that a person may exist in more than one source, so there could be multiple records per person in the table.  SSN is the UK on the person record, so it should be easy to detect duplicates.  Dealing with the duplicates should be the easy part, as I will do compares on certain fields to determine the record to keep.

My true dilema is trying to determine how to systematically step through the single table, and compare each record to every other record in the table.

0
Granular recovery for Microsoft Exchange

With Veeam Explorer for Microsoft Exchange you can choose the Exchange Servers and restore points you’re interested in, and Veeam Explorer will present the contents of those mailbox stores for browsing, searching and exporting.

 
LVL 27

Expert Comment

by:ptjcb
ID: 13775597
As Scott said - "Avoid cursors". SQL works with sets of data, not with individual rows, in the fastest way. Do not think of working row-by-row (that is procedural programming), but re-think how you want to do it and work with it as a set.
0
 
LVL 34

Expert Comment

by:Brian Crowe
ID: 13775637
Do the 5 tables have identical schema?  If you insert a UNION (without the ALL qualifier) you should avoid any duplicates.  It's hard to be more precise without some more information about your source and destination tables.
0
 
LVL 70

Expert Comment

by:Scott Pletcher
ID: 13775667
It sounds as if you are going to load all 5 potential sources into one table, then resolve the data in that table.

You can do that with an UPDATE statement with CASE logic, followed by a DELETE to get rid of the now obsolete rows with duplicate SSNs.
0
 

Author Comment

by:matt_walls
ID: 13775702
The initial plan is to import 5 text files into 1 table, and then developing the SQL to step through the single table and compare the records for duplicates.  However, I need to determine which record to keep based on other criteria, such as active/inactive status, full time/part time (if the person is active in 2 different sources (regions))

ptjcb, I'm not sure I understand fully what you are saying.  When you say set, I am thinking of a rowset/recordset, like in vbscript or something.
0
 
LVL 34

Expert Comment

by:Brian Crowe
ID: 13775746
DELETE Table1
WHERE status = ???
     OR fulltime = ???
     OR ...
0
 

Author Comment

by:matt_walls
ID: 13775791
Pardon my ignorance, but I think that I am maybe not explaining this correctly.

This is going to be set up as an automated process, ran on a weekly basis, to "combine" the 5 sources.
I need a job to automatically kick off that will do all of the compares, and leave me with a table containing only one record per person.  I think I can accompllish the compares ok, my problem is how to get it to start with record 1, compare to all of the records in the table to check for duplicates, do the necessary processing, then step to record 2, record 3.... etc. through the entire table, and do the compares.
0
 
LVL 34

Expert Comment

by:Brian Crowe
ID: 13775882
You really need to be more specific if you want a specific answer.  Is your table schema a secret?  Based on what I can piece together you don't need a cursor but you can't seem to get out of the rut of thinking procedurally instead of in sets.  If you post your table schema and the logic you need to enforce you'll have at least 2 or 3 workable solutions in 20 minutes.  There are a number of us that get just giddy at the idea of writing code to solve other peoples problems :-)
0
 
LVL 70

Expert Comment

by:Scott Pletcher
ID: 13776020
>> my problem is how to get it to start with record 1, compare to all of the records in the table to check for duplicates, do the necessary processing, then step to record 2, record 3.... etc. through the entire table, and do the compares. <<

Using sql, you don't have to do that row by row.  A single UPDATE statement (possibly with subquery(s)) can consolidate all the data into one row, then a single DELETE (possibly with subquery(s)) can get rid of all the duplicate rows.
0
 
LVL 70

Expert Comment

by:Scott Pletcher
ID: 13776033
Btw, it would be helpful for processing if you have an identity column on this table.  If you don't have one now, please consider adding one.
0
 

Author Comment

by:matt_walls
ID: 13776062
Schema is definitely not a secret.

For sake of brevity, I will list the pertinent fields.
uniquekey (SSN)
Last_Name
First_Name
Employment Status - A=Active - T=Inactive
FTE Status - 0.5 = Part time 1=full time (If person active in 2 files, part time in one, full time in another, the full time record takes precedence)

compare entire table to find duplicate records (SSN).  If duplicates exist, and only 1 is active, keep that record, if multiple actives, keep the greatest fte status.

Let me know if more info is needed.

Thanks,
0
 
LVL 70

Expert Comment

by:Scott Pletcher
ID: 13776173
You will need some unique value on the row.  An IDENTITY column is probably the easiest to add.
0
 
LVL 34

Assisted Solution

by:Brian Crowe
Brian Crowe earned 400 total points
ID: 13776296
I definitely agree that you should add an IDENTITY column.  In that case I would go with something like...

DELETE myTABLE
WHERE ID NOT IN (SELECT TOP 1 ID FROM myTable T2
     WHERE T2.SSN = myTable.SSN
     ORDER BY T2.[Employment Status] ASC, T2.[FTE Status] DESC)
0
 

Author Comment

by:matt_walls
ID: 13776302
thanks to all for feedback thusfar!

Hi Scott,

I can definitely add the Identity column - not a problem.

Can you elaborate on your Update command (with possible sub-queries) with thoughts of how best to tackle this.

Thanks,
-matt
0
 
LVL 70

Assisted Solution

by:Scott Pletcher
Scott Pletcher earned 1600 total points
ID: 13776303
Assuming there's an identity column named "id", something like the following will allow you to pull out only the "good" rows.  [If needed, could also be adjusted to deleted "bad"/duplicate rows.]


INSERT INTO newTable
SELECT d.uniquekey, ...d.~all other columns except id~...
FROM dataTable d
INNER JOIN (
    SELECT MAX(CASE WHEN [Employment Status] = 'A' THEN id ELSE 0 END) AS [MaxActiveId],
        MAX(CASE WHEN [Employment Status] = 'T' THEN id ELSE 0 END) AS [MaxInactiveId],
        MAX(CASE WHEN [Employment Status] = 'A' AND [FTE Status] = 1 THEN id ELSE 0 END) AS [MaxFTEStatusFullId],
        MAX(CASE WHEN [Employment Status] = 'A' AND [FTE Status] <> 1 THEN id ELSE 0 END) AS [MaxFTEStatusNotFullId]
    FROM dataTable
    GROUP BY uniquekey
) AS m ON CASE WHEN m.MaxActiveId > 0 THEN CASE WHEN m.MaxFTEStatusFullId > 0 THEN m.MaxFTEStatusFullId ELSE m.MaxFTEStatusNotFullId END ELSE m.MaxInactiveId END = d.id
0
 
LVL 70

Accepted Solution

by:
Scott Pletcher earned 1600 total points
ID: 13776322
--assuming extra bit column called "keepFlag" added to mark which rows are "good"

UPDATE dataTable
SET dataTable.keepFlag = 1
FROM dataTable
INNER JOIN (
    SELECT MAX(CASE WHEN [Employment Status] = 'A' THEN id ELSE 0 END) AS [MaxActiveId],
        MAX(CASE WHEN [Employment Status] = 'T' THEN id ELSE 0 END) AS [MaxInactiveId],
        MAX(CASE WHEN [Employment Status] = 'A' AND [FTE Status] = 1 THEN id ELSE 0 END) AS [MaxFTEStatusFullId],
        MAX(CASE WHEN [Employment Status] = 'A' AND [FTE Status] <> 1 THEN id ELSE 0 END) AS [MaxFTEStatusNotFullId]
    FROM dataTable
    GROUP BY uniquekey
) AS m ON CASE WHEN m.MaxActiveId > 0 THEN CASE WHEN m.MaxFTEStatusFullId > 0 THEN m.MaxFTEStatusFullId ELSE m.MaxFTEStatusNotFullId END ELSE m.MaxInactiveId END = dataTable.id

DELETE FROM dataTable
WHERE keepFlag <> 1
0
 

Author Comment

by:matt_walls
ID: 13777010
Scott this seemed to work perfectly!  I would have never thought to use CASE, let alone how to use it.

BriCrowe - I appreciate your diligence, and information provided.

You guys both helped get me going in the right direction.

0
 
LVL 70

Expert Comment

by:Scott Pletcher
ID: 13777045
Great, glad it helped!  Yeah, SQL is pretty powerful once you get used to it ...
0

Featured Post

Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In part one, we reviewed the prerequisites required for installing SQL Server vNext. In this part we will explore how to install Microsoft's SQL Server on Ubuntu 16.04.
This shares a stored procedure to retrieve permissions for a given user on the current database or across all databases on a server.
Via a live example, show how to extract information from SQL Server on Database, Connection and Server properties
Via a live example, show how to setup several different housekeeping processes for a SQL Server.
Suggested Courses

850 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question