Solved

SQL finding duplicates

Posted on 2014-12-30
21
85 Views
Last Modified: 2014-12-31
Hi,
I have a complex situation:
consider this data structure:
Assembly
Component
QuantityPer

Each assembly is composed of a number of components a quantities

lets use item NNHD700DAN-2AC from data below
it contains
BLT-1300G5      2
NN-042DAN      8
NUT-375G      2
SLV-8000      2
WA-375G      8


i would like a way to find all assemblies that use same components

these assembles use same components
NNHD700DAN-2CB      
NNHD700DAN-2NA      
NNHD700DAN-2RB      

so if there are 12 variations  i would like to list each with all assembly names


i"m looking for a away to generate a [unique] code to identify the group of components then add the names and code to a group to a @temp table


one idea i had was to  concatenate components and quantity  into  a string
Component + CONVERT (VARCHAR(50),QuantityPer,128)
ex.  
BAG-MV4131
BLT-0900G51
BLT-0900G52
BLT-0900M51
BLT-0900M52
BU-042BLK4
BU-042BLK8
BU-042BLU4
BU-042RED4
BU-042RED8
INS-XUF1061
MG-042BLU4
NN-042BLK4
NN-042DAN8
NN-042PRB4
NN-042PRB8
NUT-375G1
NUT-375G2
NUT-375PTB1
NUT-375PTB2
SLV-45621
SLV-45622
SLV-4562BO1
SLV-4562BO2
WA-375G4
WA-375G8
WA-4512SM4
WA-4512SM8
WA-4512SMBO4
WA-4512SMBO8

then assign a unique number based on binary  numbers(which i"m not sure how to do yet)
1,2,4,8,16,32,64,128,256,512,1024,2048,4096,8192,16384,32768,65536,131072,262144,524288,1048576 ...
this way each assembly can have a sum of combined components

add assembly and it's sum to temp table
then all with same sum are a group

Any ideas and help would be much appreciated!!!

Thank You in advance

here is sample data
[sample Data]
Assembly      Component      QuantityPer
HD700BLK      BLT-1300G5      1
HD700BLK      BU-042BLK      4
HD700BLK      NUT-375G      1
HD700BLK      SLV-8000      1
HD700BLK      WA-375G      4
HD700BLK-2      BLT-1300G5      2
HD700BLK-2      BU-042BLK      8
HD700BLK-2      NUT-375G      2
HD700BLK-2      SLV-8000      2
HD700BLK-2      WA-375G      8
HD700BLK-2QG      BLT-1300G5      2
HD700BLK-2QG      BU-042BLK      8
HD700BLK-2QG      NUT-375G      2
HD700BLK-2QG      SLV-8000      2
HD700BLK-2QG      WA-375G      8
HD700BLKOE      BLT-1300G5      1
HD700BLKOE      BU-042BLK      4
HD700BLKOE      NUT-375G      1
HD700BLKOE      SLV-8000      1
HD700BLKOE      WA-375G      4
HD700RED      BLT-1300G5      1
HD700RED      BU-042RED      4
HD700RED      NUT-375G      1
HD700RED      SLV-8000      1
HD700RED      WA-375G      4
HD700RED-2      BLT-1300G5      2
HD700RED-2      BU-042RED      8
HD700RED-2      NUT-375G      2
HD700RED-2      SLV-8000      2
HD700RED-2      WA-375G      8
MGHD700BLU      BLT-1300G5      1
MGHD700BLU      MG-042BLU      4
MGHD700BLU      NUT-375G      1
MGHD700BLU      SLV-8000      1
MGHD700BLU      WA-375G      4
NNHD700BLK      BLT-1300G5      1
NNHD700BLK      NN-042BLK      4
NNHD700BLK      NUT-375G      1
NNHD700BLK      SLV-8000      1
NNHD700BLK      WA-375G      4
NNHD700DAN-2AC      BLT-1300G5      2
NNHD700DAN-2AC      NN-042DAN      8
NNHD700DAN-2AC      NUT-375G      2
NNHD700DAN-2AC      SLV-8000      2
NNHD700DAN-2AC      WA-375G      8
NNHD700DAN-2CB      BLT-1300G5      2
NNHD700DAN-2CB      NN-042DAN      8
NNHD700DAN-2CB      NUT-375G      2
NNHD700DAN-2CB      SLV-8000      2
NNHD700DAN-2CB      WA-375G      8
NNHD700DAN-2NA      BLT-1300G5      2
NNHD700DAN-2NA      NN-042DAN      8
NNHD700DAN-2NA      NUT-375G      2
NNHD700DAN-2NA      SLV-8000      2
NNHD700DAN-2NA      WA-375G      8
NNHD700DAN-2RB      BLT-1300G5      2
NNHD700DAN-2RB      NN-042DAN      8
NNHD700DAN-2RB      NUT-375G      2
NNHD700DAN-2RB      SLV-8000      2
NNHD700DAN-2RB      WA-375G      8
0
Comment
Question by:SLIMSHIM
  • 9
  • 9
  • 3
21 Comments
 
LVL 69

Expert Comment

by:ScottPletcher
Comment Utility
Would you put the sample data in the form of INSERT statements to a table rather than just a text splat of data.  It often takes longer to prep the data than write the query.

Are you looking for assemblies that share *all* components?  Where all of one assembly is used in another assembly, though the other assembly may have additional components?  Or where any component at all overlaps between assemblies?
0
 

Author Comment

by:SLIMSHIM
Comment Utility
Thanks for responding.
there are no sub assemblies.
so yes i"m looking for assemblies that share all components.
i will post insert statement shortly to generate data.

thanks again
0
 

Author Comment

by:SLIMSHIM
Comment Utility
Here is insert statement:
INSERT INTO BOM([Assembly],[Component],[QuantityPer])
SELECT 'HD700BLK', 'BLT-1300G5', '1' UNION
SELECT 'HD700BLK', 'BU-042BLK', '4' UNION
SELECT 'HD700BLK', 'NUT-375G', '1' UNION
SELECT 'HD700BLK', 'SLV-8000', '1' UNION
SELECT 'HD700BLK', 'WA-375G', '4' UNION
SELECT 'HD700BLK-2', 'BLT-1300G5', '2' UNION
SELECT 'HD700BLK-2', 'BU-042BLK', '8' UNION
SELECT 'HD700BLK-2', 'NUT-375G', '2' UNION
SELECT 'HD700BLK-2', 'SLV-8000', '2' UNION
SELECT 'HD700BLK-2', 'WA-375G', '8' UNION
SELECT 'HD700BLK-2QG', 'BLT-1300G5', '2' UNION
SELECT 'HD700BLK-2QG', 'BU-042BLK', '8' UNION
SELECT 'HD700BLK-2QG', 'NUT-375G', '2' UNION
SELECT 'HD700BLK-2QG', 'SLV-8000', '2' UNION
SELECT 'HD700BLK-2QG', 'WA-375G', '8' UNION
SELECT 'HD700BLKOE', 'BLT-1300G5', '1' UNION
SELECT 'HD700BLKOE', 'BU-042BLK', '4' UNION
SELECT 'HD700BLKOE', 'NUT-375G', '1' UNION
SELECT 'HD700BLKOE', 'SLV-8000', '1' UNION
SELECT 'HD700BLKOE', 'WA-375G', '4' UNION
SELECT 'HD700RED', 'BLT-1300G5', '1' UNION
SELECT 'HD700RED', 'BU-042RED', '4' UNION
SELECT 'HD700RED', 'NUT-375G', '1' UNION
SELECT 'HD700RED', 'SLV-8000', '1' UNION
SELECT 'HD700RED', 'WA-375G', '4' UNION
SELECT 'HD700RED-2', 'BLT-1300G5', '2' UNION
SELECT 'HD700RED-2', 'BU-042RED', '8' UNION
SELECT 'HD700RED-2', 'NUT-375G', '2' UNION
SELECT 'HD700RED-2', 'SLV-8000', '2' UNION
SELECT 'HD700RED-2', 'WA-375G', '8' UNION
SELECT 'MGHD700BLU', 'BLT-1300G5', '1' UNION
SELECT 'MGHD700BLU', 'MG-042BLU', '4' UNION
SELECT 'MGHD700BLU', 'NUT-375G', '1' UNION
SELECT 'MGHD700BLU', 'SLV-8000', '1' UNION
SELECT 'MGHD700BLU', 'WA-375G', '4' UNION
SELECT 'NNHD700BLK', 'BLT-1300G5', '1' UNION
SELECT 'NNHD700BLK', 'NN-042BLK', '4' UNION
SELECT 'NNHD700BLK', 'NUT-375G', '1' UNION
SELECT 'NNHD700BLK', 'SLV-8000', '1' UNION
SELECT 'NNHD700BLK', 'WA-375G', '4' UNION
SELECT 'NNHD700DAN-2AC', 'BLT-1300G5', '2' UNION
SELECT 'NNHD700DAN-2AC', 'NN-042DAN', '8' UNION
SELECT 'NNHD700DAN-2AC', 'NUT-375G', '2' UNION
SELECT 'NNHD700DAN-2AC', 'SLV-8000', '2' UNION
SELECT 'NNHD700DAN-2AC', 'WA-375G', '8' UNION
SELECT 'NNHD700DAN-2CB', 'BLT-1300G5', '2' UNION
SELECT 'NNHD700DAN-2CB', 'NN-042DAN', '8' UNION
SELECT 'NNHD700DAN-2CB', 'NUT-375G', '2' UNION
SELECT 'NNHD700DAN-2CB', 'SLV-8000', '2' UNION
SELECT 'NNHD700DAN-2CB', 'WA-375G', '8' UNION
SELECT 'NNHD700DAN-2NA', 'BLT-1300G5', '2' UNION
SELECT 'NNHD700DAN-2NA', 'NN-042DAN', '8' UNION
SELECT 'NNHD700DAN-2NA', 'NUT-375G', '2' UNION
SELECT 'NNHD700DAN-2NA', 'SLV-8000', '2' UNION
SELECT 'NNHD700DAN-2NA', 'WA-375G', '8' UNION
SELECT 'NNHD700DAN-2RB', 'BLT-1300G5', '2' UNION
SELECT 'NNHD700DAN-2RB', 'NN-042DAN', '8' UNION
SELECT 'NNHD700DAN-2RB', 'NUT-375G', '2' UNION
SELECT 'NNHD700DAN-2RB', 'SLV-8000', '2' UNION
SELECT 'NNHD700DAN-2RB', 'WA-375G', '8'

thanks for your help!
0
 

Author Comment

by:SLIMSHIM
Comment Utility
just to clarify component and quantity need to match.
0
 
LVL 69

Expert Comment

by:ScottPletcher
Comment Utility
If you a limited number of unique components, something like this should do:


IF OBJECT_ID('tempdb.dbo.#Component') IS NOT NULL
    DROP TABLE #Component
CREATE TABLE #Component (
    Component varchar(30) PRIMARY KEY,
    component_bit_value bigint
    )
INSERT INTO #Component ( Component, component_bit_value )
SELECT Component,
    POWER(CAST(2 AS float), ROW_NUMBER() OVER(ORDER BY Component) - 1) AS component_bit_value
FROM (
    SELECT DISTINCT b.Component
    FROM BOM b
) AS distinct_components
ORDER BY 1
--SELECT * FROM #Component ORDER BY component_bit_value

;WITH
cte_assembly AS (
    SELECT
        b.Assembly,
        SUM(DISTINCT c.component_bit_value) AS Assembly_Mask
    FROM BOM b
    INNER JOIN #Component c ON
        c.Component = b.Component
    GROUP BY Assembly
)
SELECT a1.Assembly AS Assembly1, a2.Assembly AS Assembly2, 'All component parts match' AS Message
FROM cte_assembly a1
INNER JOIN cte_assembly a2 ON
    a2.Assembly > a1.Assembly AND
    a2.Assembly_Mask = a1.Assembly_Mask
0
 
LVL 69

Expert Comment

by:ScottPletcher
Comment Utility
Hmm, just saw the thing about quantity.  The code above does not do that.  There may be an easier way to do the match incl. qty.  Let me think about it.
0
 
LVL 69

Expert Comment

by:ScottPletcher
Comment Utility
This is the quick-and-dirty fix if you have a limited number of component and quantity combinations.


IF OBJECT_ID('tempdb.dbo.#Component') IS NOT NULL
    DROP TABLE #Component
CREATE TABLE #Component (
    Component varchar(30) NOT NULL,
    QuantityPer int NOT NULL,
    component_bit_value bigint NOT NULL,
    PRIMARY KEY ( Component, QuantityPer )
    )
INSERT INTO #Component ( Component, QuantityPer, component_bit_value )
SELECT Component, QuantityPer,
    POWER(CAST(2 AS float), ROW_NUMBER() OVER(ORDER BY Component) - 1) AS component_bit_value
FROM (
    SELECT DISTINCT b.Component, b.QuantityPer
    FROM BOM b
) AS distinct_components
ORDER BY 1
--SELECT * FROM #Component ORDER BY component_bit_value

;WITH
cte_assembly AS (
    SELECT
        b.Assembly,
        SUM(DISTINCT c.component_bit_value) AS Assembly_Mask
    FROM BOM b
    INNER JOIN #Component c ON
        c.Component = b.Component
    GROUP BY Assembly
)
SELECT a1.Assembly AS Assembly1, a2.Assembly AS Assembly2, 'All component parts match' AS Message
FROM cte_assembly a1
INNER JOIN cte_assembly a2 ON
    a2.Assembly > a1.Assembly AND
    a2.Assembly_Mask = a1.Assembly_Mask
0
 

Author Comment

by:SLIMSHIM
Comment Utility
Number of unique components is an unknown.
currently largest assembly has up to 12 unique components.
combinations bases on color and quantity ca go as high as 40 across like assemblies
this is why i did the concatenate to get unique components of like assemblies

Thanks for helping!
0
 
LVL 69

Expert Comment

by:ScottPletcher
Comment Utility
Nah, that last query doesn't work.
0
 
LVL 69

Accepted Solution

by:
ScottPletcher earned 500 total points
Comment Utility
Here we go I think:


SELECT b1.Assembly, b2.Assembly
FROM BOM b1
INNER JOIN BOM b2 ON
    b2.assembly > b1.assembly AND
    b2.component = b1.component AND
    b2.quantityper = b1.quantityper
GROUP BY b1.Assembly, b2.Assembly
HAVING COUNT(*) = ( SELECT COUNT(b3.component) FROM BOM b3 WHERE b3.assembly = b1.assembly )
0
What Security Threats Are You Missing?

Enhance your security with threat intelligence from the web. Get trending threat insights on hackers, exploits, and suspicious IP addresses delivered to your inbox with our free Cyber Daily.

 
LVL 7

Expert Comment

by:Doug
Comment Utility
Slim, is it necessary that the unique code be based on an algorithm of the components and quantities? Would any unique identifier (like a number or letter) work? Just clarifying whether the key issue is uniqueness or if the key issue is an algorithm using components and quantity. If any identifier would work and the issue is to assign a common identifier to assemblies with the same components and quantities then one answer might be to use a subquery of components and loop through the assemblies and update a new field (unique index) in the assembly table.
0
 

Author Comment

by:SLIMSHIM
Comment Utility
scott,
again thanks for your help
i ran your code with this result set
Assembly      Assembly
HD700BLK-2      HD700BLK-2QG
HD700BLK      HD700BLKOE
NNHD700DAN-2AC      NNHD700DAN-2CB
NNHD700DAN-2AC      NNHD700DAN-2NA
NNHD700DAN-2CB      NNHD700DAN-2NA
NNHD700DAN-2AC      NNHD700DAN-2RB
NNHD700DAN-2CB      NNHD700DAN-2RB
NNHD700DAN-2NA      NNHD700DAN-2RB

whichever is the ultimate group name should appear on both sides
so for example NNHD700DAN-2RB should be on the left side as well
since it is also in that group.
yet NNHD700DAN-2AC appears twice on left but not on right.
0
 
LVL 69

Expert Comment

by:ScottPletcher
Comment Utility
Simplest way -- but perhaps not the most efficient way -- is to change the join to:

INNER JOIN BOM b2 ON
     b2.assembly <> b1.assembly AND
     b2.component = b1.component AND
     b2.quantityper = b1.quantityper
0
 

Author Comment

by:SLIMSHIM
Comment Utility
Scott ,
that won't work
here is  result:

Assembly      Assembly
HD700BLKOE      HD700BLK
HD700BLK-2QG      HD700BLK-2
HD700BLK-2      HD700BLK-2QG
HD700BLK      HD700BLKOE
NNHD700DAN-2CB      NNHD700DAN-2AC
NNHD700DAN-2NA      NNHD700DAN-2AC
NNHD700DAN-2RB      NNHD700DAN-2AC
NNHD700DAN-2AC      NNHD700DAN-2CB
NNHD700DAN-2NA      NNHD700DAN-2CB
NNHD700DAN-2RB      NNHD700DAN-2CB
NNHD700DAN-2AC      NNHD700DAN-2NA
NNHD700DAN-2CB      NNHD700DAN-2NA
NNHD700DAN-2RB      NNHD700DAN-2NA
NNHD700DAN-2AC      NNHD700DAN-2RB
NNHD700DAN-2CB      NNHD700DAN-2RB
NNHD700DAN-2NA      NNHD700DAN-2RB

thanks for all your help!!!
0
 
LVL 69

Expert Comment

by:ScottPletcher
Comment Utility
Ok, I'm not an expert on your data.  I'm not sure what "won't work" means.  But you should be able to adjust the query to get what you need.
0
 
LVL 7

Expert Comment

by:Doug
Comment Utility
Looks to me like Scott's answer gets you most of the way there. I'm guessing it doesn't work because you still need one more row which is the same on both sides. I think this part is what's excluding the missing row of data:  

b2.assembly <> b1.assembly


You also want a row where b2.assembly=b1.assembly, right?
0
 
LVL 69

Expert Comment

by:ScottPletcher
Comment Utility
That will give you rows where the assembly is equal to itself -- seems pretty self-evident, but if you really need that, make the change.
0
 
LVL 7

Expert Comment

by:Doug
Comment Utility
Slim,
Maybe you could UNION Scott's query with a duplicate of itself that returns only the missing row:

<First query syntax> UNION
SELECT DISTINCT b2.Assembly, b2.Assembly
FROM SELECT b1.Assembly, b2.Assembly
FROM BOM b1
INNER JOIN BOM b2 ON
    b2.assembly <> b1.assembly AND
    b2.component = b1.component AND
    b2.quantityper = b1.quantityper
GROUP BY b1.Assembly, b2.Assembly
HAVING COUNT(*) = ( SELECT COUNT(b3.component) FROM BOM b3 WHERE b3.assembly = b1.assembly )
0
 

Author Comment

by:SLIMSHIM
Comment Utility
Sorry  for taking so long to respond.

here is a final working  code:

declare  @table1 table
(
Master nvarchar(30),
instance nvarchar (30)
)


insert into @table1

SELECT b1.Assembly as Master, b2.Assembly as instance
FROM @table b1
INNER JOIN @table b2 ON
    b2.assembly >= b1.assembly AND
    b2.component = b1.component AND
    b2.quantityper = b1.quantityper
GROUP BY b1.Assembly, b2.Assembly

HAVING COUNT(*) = ( SELECT COUNT(b3.component) FROM @table b3 WHERE b3.assembly = b1.assembly )

-- this is to remove  master assembly entries  that are ambiguous
delete from @table1
where master in
(
select instance as master from @table1
group by instance
having count (*) >1
)

[resulting Data]

Master      instance
HD100CBLK      HD100CBLK
HD100CBLK      HD100FBLKTW
HD100CBLK-2      HD100CBLK-2
HD100CBLKQP      HD100CBLKQP
HD100CBLKQP      HD100FBLK
HD100CBLU      HD100CBLU
HD100CRED      HD100CRED
HD100CRED      HD100CREDOE
HD100CRED-2      HD100CRED-2
HD100FBLK-2      HD100FBLK-2
HD100FRED      HD100FRED
HD100FRED      HD100FREDDR
HD100FRED      HD100FREDOE
HD100FRED-2      HD100FRED-2
MG100CBLU      MG100CBLU
NN100CBLK      NN100CBLK
NN100CBLK      NN100FZYBLK
NN100CDAN-2      NN100CDAN-2
NN100CDAN-2      NN100CDAN-2AC
NN100CDAN-2      NN100CDAN-2CB
NN100CDAN-2      NN100CDAN-2NA
NN100CDAN-2      NN100CDAN-2RB
NN100FBLK      NN100FBLK
NN100FBOPRB      NN100FBOPRB
NN100FBOPRB-2      NN100FBOPRB-2

i"m sure it can be done without the  @temp table

you were a great help and inspiration.
i will close this out tomorrow  . i just want to  try it  again in the AM.

thank you for all your help!!!!!!!!!
0
 

Author Comment

by:SLIMSHIM
Comment Utility
Good Morning all,
sorry for the delay i just burned out last night
first of all let me show data result using same data set: (sorry about last nights post ! was running on fumes)
[resulting  data]
Master      instance
HD700BLK      HD700BLK
HD700BLK      HD700BLKOE
HD700BLK-2      HD700BLK-2
HD700BLK-2      HD700BLK-2QG
HD700RED      HD700RED
HD700RED-2      HD700RED-2
MGHD700BLU      MGHD700BLU
NNHD700BLK      NNHD700BLK
NNHD700DAN-2AC      NNHD700DAN-2AC
NNHD700DAN-2AC      NNHD700DAN-2CB
NNHD700DAN-2AC      NNHD700DAN-2NA
NNHD700DAN-2AC      NNHD700DAN-2RB


with 7 unique Master assemblies
HD700BLK
HD700BLK-2
HD700RED
HD700RED-2
MGHD700BLU
NNHD700BLK
NNHD700DAN-2AC

works great thank you for all your help and input.
is there a reason to avoid the @temptable ?
0
 

Author Closing Comment

by:SLIMSHIM
Comment Utility
I appreciate your patience and input
Thanks
0

Featured Post

Free Gift Card with Acronis Backup Purchase!

Backup any data in any location: local and remote systems, physical and virtual servers, private and public clouds, Macs and PCs, tablets and mobile devices, & more! For limited time only, buy any Acronis backup products and get a FREE Amazon/Best Buy gift card worth up to $200!

Join & Write a Comment

This article explains how to reset the password of the sa account on a Microsoft SQL Server.  The steps in this article work in SQL 2005, 2008, 2008 R2, 2012, 2014 and 2016.
Load balancing is the method of dividing the total amount of work performed by one computer between two or more computers. Its aim is to get more work done in the same amount of time, ensuring that all the users get served faster.
Using examples as well as descriptions, and references to Books Online, show the documentation available for date manipulation functions and by using a select few of these functions, show how date based data can be manipulated with these functions.
Viewers will learn how to use the SELECT statement in SQL and will be exposed to the many uses the SELECT statement has.

771 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

10 Experts available now in Live!

Get 1:1 Help Now