SQL finding duplicates

Hi,
I have a complex situation:
consider this data structure:
Assembly
Component
QuantityPer

Each assembly is composed of a number of components a quantities

lets use item NNHD700DAN-2AC from data below
it contains
BLT-1300G5      2
NN-042DAN      8
NUT-375G      2
SLV-8000      2
WA-375G      8


i would like a way to find all assemblies that use same components

these assembles use same components
NNHD700DAN-2CB      
NNHD700DAN-2NA      
NNHD700DAN-2RB      

so if there are 12 variations  i would like to list each with all assembly names


i"m looking for a away to generate a [unique] code to identify the group of components then add the names and code to a group to a @temp table


one idea i had was to  concatenate components and quantity  into  a string
Component + CONVERT (VARCHAR(50),QuantityPer,128)
ex.  
BAG-MV4131
BLT-0900G51
BLT-0900G52
BLT-0900M51
BLT-0900M52
BU-042BLK4
BU-042BLK8
BU-042BLU4
BU-042RED4
BU-042RED8
INS-XUF1061
MG-042BLU4
NN-042BLK4
NN-042DAN8
NN-042PRB4
NN-042PRB8
NUT-375G1
NUT-375G2
NUT-375PTB1
NUT-375PTB2
SLV-45621
SLV-45622
SLV-4562BO1
SLV-4562BO2
WA-375G4
WA-375G8
WA-4512SM4
WA-4512SM8
WA-4512SMBO4
WA-4512SMBO8

then assign a unique number based on binary  numbers(which i"m not sure how to do yet)
1,2,4,8,16,32,64,128,256,512,1024,2048,4096,8192,16384,32768,65536,131072,262144,524288,1048576 ...
this way each assembly can have a sum of combined components

add assembly and it's sum to temp table
then all with same sum are a group

Any ideas and help would be much appreciated!!!

Thank You in advance

here is sample data
[sample Data]
Assembly      Component      QuantityPer
HD700BLK      BLT-1300G5      1
HD700BLK      BU-042BLK      4
HD700BLK      NUT-375G      1
HD700BLK      SLV-8000      1
HD700BLK      WA-375G      4
HD700BLK-2      BLT-1300G5      2
HD700BLK-2      BU-042BLK      8
HD700BLK-2      NUT-375G      2
HD700BLK-2      SLV-8000      2
HD700BLK-2      WA-375G      8
HD700BLK-2QG      BLT-1300G5      2
HD700BLK-2QG      BU-042BLK      8
HD700BLK-2QG      NUT-375G      2
HD700BLK-2QG      SLV-8000      2
HD700BLK-2QG      WA-375G      8
HD700BLKOE      BLT-1300G5      1
HD700BLKOE      BU-042BLK      4
HD700BLKOE      NUT-375G      1
HD700BLKOE      SLV-8000      1
HD700BLKOE      WA-375G      4
HD700RED      BLT-1300G5      1
HD700RED      BU-042RED      4
HD700RED      NUT-375G      1
HD700RED      SLV-8000      1
HD700RED      WA-375G      4
HD700RED-2      BLT-1300G5      2
HD700RED-2      BU-042RED      8
HD700RED-2      NUT-375G      2
HD700RED-2      SLV-8000      2
HD700RED-2      WA-375G      8
MGHD700BLU      BLT-1300G5      1
MGHD700BLU      MG-042BLU      4
MGHD700BLU      NUT-375G      1
MGHD700BLU      SLV-8000      1
MGHD700BLU      WA-375G      4
NNHD700BLK      BLT-1300G5      1
NNHD700BLK      NN-042BLK      4
NNHD700BLK      NUT-375G      1
NNHD700BLK      SLV-8000      1
NNHD700BLK      WA-375G      4
NNHD700DAN-2AC      BLT-1300G5      2
NNHD700DAN-2AC      NN-042DAN      8
NNHD700DAN-2AC      NUT-375G      2
NNHD700DAN-2AC      SLV-8000      2
NNHD700DAN-2AC      WA-375G      8
NNHD700DAN-2CB      BLT-1300G5      2
NNHD700DAN-2CB      NN-042DAN      8
NNHD700DAN-2CB      NUT-375G      2
NNHD700DAN-2CB      SLV-8000      2
NNHD700DAN-2CB      WA-375G      8
NNHD700DAN-2NA      BLT-1300G5      2
NNHD700DAN-2NA      NN-042DAN      8
NNHD700DAN-2NA      NUT-375G      2
NNHD700DAN-2NA      SLV-8000      2
NNHD700DAN-2NA      WA-375G      8
NNHD700DAN-2RB      BLT-1300G5      2
NNHD700DAN-2RB      NN-042DAN      8
NNHD700DAN-2RB      NUT-375G      2
NNHD700DAN-2RB      SLV-8000      2
NNHD700DAN-2RB      WA-375G      8
SLIMSHIMAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Scott PletcherSenior DBACommented:
Would you put the sample data in the form of INSERT statements to a table rather than just a text splat of data.  It often takes longer to prep the data than write the query.

Are you looking for assemblies that share *all* components?  Where all of one assembly is used in another assembly, though the other assembly may have additional components?  Or where any component at all overlaps between assemblies?
0
SLIMSHIMAuthor Commented:
Thanks for responding.
there are no sub assemblies.
so yes i"m looking for assemblies that share all components.
i will post insert statement shortly to generate data.

thanks again
0
SLIMSHIMAuthor Commented:
Here is insert statement:
INSERT INTO BOM([Assembly],[Component],[QuantityPer])
SELECT 'HD700BLK', 'BLT-1300G5', '1' UNION
SELECT 'HD700BLK', 'BU-042BLK', '4' UNION
SELECT 'HD700BLK', 'NUT-375G', '1' UNION
SELECT 'HD700BLK', 'SLV-8000', '1' UNION
SELECT 'HD700BLK', 'WA-375G', '4' UNION
SELECT 'HD700BLK-2', 'BLT-1300G5', '2' UNION
SELECT 'HD700BLK-2', 'BU-042BLK', '8' UNION
SELECT 'HD700BLK-2', 'NUT-375G', '2' UNION
SELECT 'HD700BLK-2', 'SLV-8000', '2' UNION
SELECT 'HD700BLK-2', 'WA-375G', '8' UNION
SELECT 'HD700BLK-2QG', 'BLT-1300G5', '2' UNION
SELECT 'HD700BLK-2QG', 'BU-042BLK', '8' UNION
SELECT 'HD700BLK-2QG', 'NUT-375G', '2' UNION
SELECT 'HD700BLK-2QG', 'SLV-8000', '2' UNION
SELECT 'HD700BLK-2QG', 'WA-375G', '8' UNION
SELECT 'HD700BLKOE', 'BLT-1300G5', '1' UNION
SELECT 'HD700BLKOE', 'BU-042BLK', '4' UNION
SELECT 'HD700BLKOE', 'NUT-375G', '1' UNION
SELECT 'HD700BLKOE', 'SLV-8000', '1' UNION
SELECT 'HD700BLKOE', 'WA-375G', '4' UNION
SELECT 'HD700RED', 'BLT-1300G5', '1' UNION
SELECT 'HD700RED', 'BU-042RED', '4' UNION
SELECT 'HD700RED', 'NUT-375G', '1' UNION
SELECT 'HD700RED', 'SLV-8000', '1' UNION
SELECT 'HD700RED', 'WA-375G', '4' UNION
SELECT 'HD700RED-2', 'BLT-1300G5', '2' UNION
SELECT 'HD700RED-2', 'BU-042RED', '8' UNION
SELECT 'HD700RED-2', 'NUT-375G', '2' UNION
SELECT 'HD700RED-2', 'SLV-8000', '2' UNION
SELECT 'HD700RED-2', 'WA-375G', '8' UNION
SELECT 'MGHD700BLU', 'BLT-1300G5', '1' UNION
SELECT 'MGHD700BLU', 'MG-042BLU', '4' UNION
SELECT 'MGHD700BLU', 'NUT-375G', '1' UNION
SELECT 'MGHD700BLU', 'SLV-8000', '1' UNION
SELECT 'MGHD700BLU', 'WA-375G', '4' UNION
SELECT 'NNHD700BLK', 'BLT-1300G5', '1' UNION
SELECT 'NNHD700BLK', 'NN-042BLK', '4' UNION
SELECT 'NNHD700BLK', 'NUT-375G', '1' UNION
SELECT 'NNHD700BLK', 'SLV-8000', '1' UNION
SELECT 'NNHD700BLK', 'WA-375G', '4' UNION
SELECT 'NNHD700DAN-2AC', 'BLT-1300G5', '2' UNION
SELECT 'NNHD700DAN-2AC', 'NN-042DAN', '8' UNION
SELECT 'NNHD700DAN-2AC', 'NUT-375G', '2' UNION
SELECT 'NNHD700DAN-2AC', 'SLV-8000', '2' UNION
SELECT 'NNHD700DAN-2AC', 'WA-375G', '8' UNION
SELECT 'NNHD700DAN-2CB', 'BLT-1300G5', '2' UNION
SELECT 'NNHD700DAN-2CB', 'NN-042DAN', '8' UNION
SELECT 'NNHD700DAN-2CB', 'NUT-375G', '2' UNION
SELECT 'NNHD700DAN-2CB', 'SLV-8000', '2' UNION
SELECT 'NNHD700DAN-2CB', 'WA-375G', '8' UNION
SELECT 'NNHD700DAN-2NA', 'BLT-1300G5', '2' UNION
SELECT 'NNHD700DAN-2NA', 'NN-042DAN', '8' UNION
SELECT 'NNHD700DAN-2NA', 'NUT-375G', '2' UNION
SELECT 'NNHD700DAN-2NA', 'SLV-8000', '2' UNION
SELECT 'NNHD700DAN-2NA', 'WA-375G', '8' UNION
SELECT 'NNHD700DAN-2RB', 'BLT-1300G5', '2' UNION
SELECT 'NNHD700DAN-2RB', 'NN-042DAN', '8' UNION
SELECT 'NNHD700DAN-2RB', 'NUT-375G', '2' UNION
SELECT 'NNHD700DAN-2RB', 'SLV-8000', '2' UNION
SELECT 'NNHD700DAN-2RB', 'WA-375G', '8'

thanks for your help!
0
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

SLIMSHIMAuthor Commented:
just to clarify component and quantity need to match.
0
Scott PletcherSenior DBACommented:
If you a limited number of unique components, something like this should do:


IF OBJECT_ID('tempdb.dbo.#Component') IS NOT NULL
    DROP TABLE #Component
CREATE TABLE #Component (
    Component varchar(30) PRIMARY KEY,
    component_bit_value bigint
    )
INSERT INTO #Component ( Component, component_bit_value )
SELECT Component,
    POWER(CAST(2 AS float), ROW_NUMBER() OVER(ORDER BY Component) - 1) AS component_bit_value
FROM (
    SELECT DISTINCT b.Component
    FROM BOM b
) AS distinct_components
ORDER BY 1
--SELECT * FROM #Component ORDER BY component_bit_value

;WITH
cte_assembly AS (
    SELECT
        b.Assembly,
        SUM(DISTINCT c.component_bit_value) AS Assembly_Mask
    FROM BOM b
    INNER JOIN #Component c ON
        c.Component = b.Component
    GROUP BY Assembly
)
SELECT a1.Assembly AS Assembly1, a2.Assembly AS Assembly2, 'All component parts match' AS Message
FROM cte_assembly a1
INNER JOIN cte_assembly a2 ON
    a2.Assembly > a1.Assembly AND
    a2.Assembly_Mask = a1.Assembly_Mask
0
Scott PletcherSenior DBACommented:
Hmm, just saw the thing about quantity.  The code above does not do that.  There may be an easier way to do the match incl. qty.  Let me think about it.
0
Scott PletcherSenior DBACommented:
This is the quick-and-dirty fix if you have a limited number of component and quantity combinations.


IF OBJECT_ID('tempdb.dbo.#Component') IS NOT NULL
    DROP TABLE #Component
CREATE TABLE #Component (
    Component varchar(30) NOT NULL,
    QuantityPer int NOT NULL,
    component_bit_value bigint NOT NULL,
    PRIMARY KEY ( Component, QuantityPer )
    )
INSERT INTO #Component ( Component, QuantityPer, component_bit_value )
SELECT Component, QuantityPer,
    POWER(CAST(2 AS float), ROW_NUMBER() OVER(ORDER BY Component) - 1) AS component_bit_value
FROM (
    SELECT DISTINCT b.Component, b.QuantityPer
    FROM BOM b
) AS distinct_components
ORDER BY 1
--SELECT * FROM #Component ORDER BY component_bit_value

;WITH
cte_assembly AS (
    SELECT
        b.Assembly,
        SUM(DISTINCT c.component_bit_value) AS Assembly_Mask
    FROM BOM b
    INNER JOIN #Component c ON
        c.Component = b.Component
    GROUP BY Assembly
)
SELECT a1.Assembly AS Assembly1, a2.Assembly AS Assembly2, 'All component parts match' AS Message
FROM cte_assembly a1
INNER JOIN cte_assembly a2 ON
    a2.Assembly > a1.Assembly AND
    a2.Assembly_Mask = a1.Assembly_Mask
0
SLIMSHIMAuthor Commented:
Number of unique components is an unknown.
currently largest assembly has up to 12 unique components.
combinations bases on color and quantity ca go as high as 40 across like assemblies
this is why i did the concatenate to get unique components of like assemblies

Thanks for helping!
0
Scott PletcherSenior DBACommented:
Nah, that last query doesn't work.
0
Scott PletcherSenior DBACommented:
Here we go I think:


SELECT b1.Assembly, b2.Assembly
FROM BOM b1
INNER JOIN BOM b2 ON
    b2.assembly > b1.assembly AND
    b2.component = b1.component AND
    b2.quantityper = b1.quantityper
GROUP BY b1.Assembly, b2.Assembly
HAVING COUNT(*) = ( SELECT COUNT(b3.component) FROM BOM b3 WHERE b3.assembly = b1.assembly )
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
DougCommented:
Slim, is it necessary that the unique code be based on an algorithm of the components and quantities? Would any unique identifier (like a number or letter) work? Just clarifying whether the key issue is uniqueness or if the key issue is an algorithm using components and quantity. If any identifier would work and the issue is to assign a common identifier to assemblies with the same components and quantities then one answer might be to use a subquery of components and loop through the assemblies and update a new field (unique index) in the assembly table.
0
SLIMSHIMAuthor Commented:
scott,
again thanks for your help
i ran your code with this result set
Assembly      Assembly
HD700BLK-2      HD700BLK-2QG
HD700BLK      HD700BLKOE
NNHD700DAN-2AC      NNHD700DAN-2CB
NNHD700DAN-2AC      NNHD700DAN-2NA
NNHD700DAN-2CB      NNHD700DAN-2NA
NNHD700DAN-2AC      NNHD700DAN-2RB
NNHD700DAN-2CB      NNHD700DAN-2RB
NNHD700DAN-2NA      NNHD700DAN-2RB

whichever is the ultimate group name should appear on both sides
so for example NNHD700DAN-2RB should be on the left side as well
since it is also in that group.
yet NNHD700DAN-2AC appears twice on left but not on right.
0
Scott PletcherSenior DBACommented:
Simplest way -- but perhaps not the most efficient way -- is to change the join to:

INNER JOIN BOM b2 ON
     b2.assembly <> b1.assembly AND
     b2.component = b1.component AND
     b2.quantityper = b1.quantityper
0
SLIMSHIMAuthor Commented:
Scott ,
that won't work
here is  result:

Assembly      Assembly
HD700BLKOE      HD700BLK
HD700BLK-2QG      HD700BLK-2
HD700BLK-2      HD700BLK-2QG
HD700BLK      HD700BLKOE
NNHD700DAN-2CB      NNHD700DAN-2AC
NNHD700DAN-2NA      NNHD700DAN-2AC
NNHD700DAN-2RB      NNHD700DAN-2AC
NNHD700DAN-2AC      NNHD700DAN-2CB
NNHD700DAN-2NA      NNHD700DAN-2CB
NNHD700DAN-2RB      NNHD700DAN-2CB
NNHD700DAN-2AC      NNHD700DAN-2NA
NNHD700DAN-2CB      NNHD700DAN-2NA
NNHD700DAN-2RB      NNHD700DAN-2NA
NNHD700DAN-2AC      NNHD700DAN-2RB
NNHD700DAN-2CB      NNHD700DAN-2RB
NNHD700DAN-2NA      NNHD700DAN-2RB

thanks for all your help!!!
0
Scott PletcherSenior DBACommented:
Ok, I'm not an expert on your data.  I'm not sure what "won't work" means.  But you should be able to adjust the query to get what you need.
0
DougCommented:
Looks to me like Scott's answer gets you most of the way there. I'm guessing it doesn't work because you still need one more row which is the same on both sides. I think this part is what's excluding the missing row of data:  

b2.assembly <> b1.assembly


You also want a row where b2.assembly=b1.assembly, right?
0
Scott PletcherSenior DBACommented:
That will give you rows where the assembly is equal to itself -- seems pretty self-evident, but if you really need that, make the change.
0
DougCommented:
Slim,
Maybe you could UNION Scott's query with a duplicate of itself that returns only the missing row:

<First query syntax> UNION
SELECT DISTINCT b2.Assembly, b2.Assembly
FROM SELECT b1.Assembly, b2.Assembly
FROM BOM b1
INNER JOIN BOM b2 ON
    b2.assembly <> b1.assembly AND
    b2.component = b1.component AND
    b2.quantityper = b1.quantityper
GROUP BY b1.Assembly, b2.Assembly
HAVING COUNT(*) = ( SELECT COUNT(b3.component) FROM BOM b3 WHERE b3.assembly = b1.assembly )
0
SLIMSHIMAuthor Commented:
Sorry  for taking so long to respond.

here is a final working  code:

declare  @table1 table
(
Master nvarchar(30),
instance nvarchar (30)
)


insert into @table1

SELECT b1.Assembly as Master, b2.Assembly as instance
FROM @table b1
INNER JOIN @table b2 ON
    b2.assembly >= b1.assembly AND
    b2.component = b1.component AND
    b2.quantityper = b1.quantityper
GROUP BY b1.Assembly, b2.Assembly

HAVING COUNT(*) = ( SELECT COUNT(b3.component) FROM @table b3 WHERE b3.assembly = b1.assembly )

-- this is to remove  master assembly entries  that are ambiguous
delete from @table1
where master in
(
select instance as master from @table1
group by instance
having count (*) >1
)

[resulting Data]

Master      instance
HD100CBLK      HD100CBLK
HD100CBLK      HD100FBLKTW
HD100CBLK-2      HD100CBLK-2
HD100CBLKQP      HD100CBLKQP
HD100CBLKQP      HD100FBLK
HD100CBLU      HD100CBLU
HD100CRED      HD100CRED
HD100CRED      HD100CREDOE
HD100CRED-2      HD100CRED-2
HD100FBLK-2      HD100FBLK-2
HD100FRED      HD100FRED
HD100FRED      HD100FREDDR
HD100FRED      HD100FREDOE
HD100FRED-2      HD100FRED-2
MG100CBLU      MG100CBLU
NN100CBLK      NN100CBLK
NN100CBLK      NN100FZYBLK
NN100CDAN-2      NN100CDAN-2
NN100CDAN-2      NN100CDAN-2AC
NN100CDAN-2      NN100CDAN-2CB
NN100CDAN-2      NN100CDAN-2NA
NN100CDAN-2      NN100CDAN-2RB
NN100FBLK      NN100FBLK
NN100FBOPRB      NN100FBOPRB
NN100FBOPRB-2      NN100FBOPRB-2

i"m sure it can be done without the  @temp table

you were a great help and inspiration.
i will close this out tomorrow  . i just want to  try it  again in the AM.

thank you for all your help!!!!!!!!!
0
SLIMSHIMAuthor Commented:
Good Morning all,
sorry for the delay i just burned out last night
first of all let me show data result using same data set: (sorry about last nights post ! was running on fumes)
[resulting  data]
Master      instance
HD700BLK      HD700BLK
HD700BLK      HD700BLKOE
HD700BLK-2      HD700BLK-2
HD700BLK-2      HD700BLK-2QG
HD700RED      HD700RED
HD700RED-2      HD700RED-2
MGHD700BLU      MGHD700BLU
NNHD700BLK      NNHD700BLK
NNHD700DAN-2AC      NNHD700DAN-2AC
NNHD700DAN-2AC      NNHD700DAN-2CB
NNHD700DAN-2AC      NNHD700DAN-2NA
NNHD700DAN-2AC      NNHD700DAN-2RB


with 7 unique Master assemblies
HD700BLK
HD700BLK-2
HD700RED
HD700RED-2
MGHD700BLU
NNHD700BLK
NNHD700DAN-2AC

works great thank you for all your help and input.
is there a reason to avoid the @temptable ?
0
SLIMSHIMAuthor Commented:
I appreciate your patience and input
Thanks
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.