• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 337
  • Last Modified:

Duplicate records - limited table linking available

I have a query / table linking problem. What I have works for about 95% of the time and I am trying to fine tune this.

I have 2 tables I am trying to link and print a report.

Table 1 (tblPosition):
HeaderKey
PosKey
Position
Width

Table 2 (tblItems):
HeaderKey
ItemKey
Item
Width
QTY

I am linking the HeaderKey and Width.  I do not have any other way of linking the 2 tables

I am trying to get the Table 2 Item’s position from Table 1.  This works correctly except when I have 2 different items with the same width.

sample code embeded below of working and non-working data

I am not sure what else to do – perhaps a stored procedure?

In the non working example the desired output is:
1      1      7      bbb
1      2      7      bbb
1      3      2      ddd
1      4      7      ccc
1      5      5      aaa      

I want ‘bbb’ to display twice because the qty is 2 and ‘ccc’ only once because the quantity is 1.  

I don’t care which order they appear in as long as it is consistent every time I run the report / query.

Thanks
-- Working Example
--------------------------------------
-- Create Sample Data using a Table Varable
DECLARE @tblPosition TABLE
(HeaderKey int, PosKey int, Position int, width int)
-- Load Sample Data
INSERT INTO @tblPosition VALUES (1, 1, 1, 5)
INSERT INTO @tblPosition VALUES (1, 2, 2, 7)
INSERT INTO @tblPosition VALUES (1, 3, 3, 2)
INSERT INTO @tblPosition VALUES (1, 4, 4, 7)

-- Create Sample Data using a Table Varable
DECLARE @tblItems TABLE
(HeaderKey int, ItemKey int, Item varchar(10), width int, QTY int)
-- Load Sample Data
INSERT INTO @tblItems VALUES (1, 1, 'aaa', 5, 1)
INSERT INTO @tblItems VALUES (1, 2, 'bbb', 7, 2)
INSERT INTO @tblItems VALUES (1, 3, 'ccc', 2, 1)

-- Link table and return result
Select a.HeaderKey, a.Position, a.width, b.Item 
From @tblPosition a inner join @tblItems b
	on a.Headerkey = b.Headerkey and a.width = b.width
Order by a.Position


-- Non Working Example
--------------------------------------
-- Create Sample Data using a Table Varable
DECLARE @tblPosition TABLE
(HeaderKey int, PosKey int, Position int, width int)
-- Load Sample Data
INSERT INTO @tblPosition VALUES (1, 1, 1, 7)
INSERT INTO @tblPosition VALUES (1, 2, 2, 7)
INSERT INTO @tblPosition VALUES (1, 3, 3, 2)
INSERT INTO @tblPosition VALUES (1, 4, 4, 7)
INSERT INTO @tblPosition VALUES (1, 5, 5, 5)

-- Create Sample Data using a Table Varable
DECLARE @tblItems TABLE
(HeaderKey int, ItemKey int, Item varchar(10), width int, QTY int)
-- Load Sample Data
INSERT INTO @tblItems VALUES (1, 1, 'aaa', 5, 1)
INSERT INTO @tblItems VALUES (1, 2, 'bbb', 7, 2)
INSERT INTO @tblItems VALUES (1, 3, 'ccc', 7, 1)
INSERT INTO @tblItems VALUES (1, 3, 'ddd', 2, 1)

-- Link table and return result
Select a.HeaderKey, a.Position, a.width, b.Item 
From @tblPosition a inner join @tblItems b
	on a.Headerkey = b.Headerkey and a.width = b.width
Order by a.Position

Open in new window

0
wsadfilm
Asked:
wsadfilm
  • 4
  • 3
1 Solution
 
lwadwellCommented:
I am confused ...

You mentioned that the quantity is a factor ... so with
  INSERT INTO @tblItems VALUES (1, 2, 'bbb', 7, 2)
having a quantity of 2, you only want to 2 of these
  INSERT INTO @tblPosition VALUES (1, 1, 1, 7)
  INSERT INTO @tblPosition VALUES (1, 2, 2, 7)
  INSERT INTO @tblPosition VALUES (1, 4, 4, 7)

What determines which two?  Your example has
  HeaderKey      Position      width      Item
  1                      1              7              bbb  <-- kept
  1                      2              7              bbb  <-- kept
  1                      4              7              bbb  <-- discarded

And with this one, with a quantity of 1
  INSERT INTO @tblItems VALUES (1, 3, 'ccc', 7, 1)
and possible options from:
  INSERT INTO @tblPosition VALUES (1, 1, 1, 7)
  INSERT INTO @tblPosition VALUES (1, 2, 2, 7)
  INSERT INTO @tblPosition VALUES (1, 4, 4, 7)
to become:
  HeaderKey      Position      width      Item
  1                      1              7              ccc  <-- discard
  1                      2              7              ccc  <-- discard
  1                      4              7              ccc  <-- keep

How is the records to be kept/discarded determined?  Is it a priority thing ... where once bbb had the two rows ... ccc could only get the remainder ... if so, why did bbb get to pick first?
0
 
wsadfilmAuthor Commented:
Sorry for the confusion.

For this it doesn’t matter which records are kept or discarded as long as it is consistent in re-running the query.  With my example I just chose an order for demonstration purposes.  It could be any combination – it could have been ccc first then bbb or bbb ccc bbb – just as long as I have 2 of the one and 1 of the other – based on the qty data and that it is repeatable.
0
 
lwadwellCommented:
Have a look at this ... it might seem a bit complex but I will try to explain
note: I added an extra width 7 to create a third case to extend my testing.
note: I also changed to use temp tables instead
IF OBJECT_ID('tempdb..#tmp_tblPosition') IS NOT NULL
	DROP TABLE #tmp_tblPosition
IF OBJECT_ID('tempdb..#tmp_tblItems') IS NOT NULL
	DROP TABLE #tmp_tblItems

create table #tmp_tblPosition (HeaderKey int, PosKey int, Position int, width int);
-- Load Sample Data
INSERT INTO #tmp_tblPosition VALUES (1, 1, 1, 7)
INSERT INTO #tmp_tblPosition VALUES (1, 2, 2, 7)
INSERT INTO #tmp_tblPosition VALUES (1, 3, 3, 2)
INSERT INTO #tmp_tblPosition VALUES (1, 4, 4, 7)
INSERT INTO #tmp_tblPosition VALUES (1, 5, 5, 5)
INSERT INTO #tmp_tblPosition VALUES (1, 6, 4, 7)

-- Create Sample Data using a Table Varable
--DECLARE @tblItems TABLE
create table #tmp_tblItems (HeaderKey int, ItemKey int, Item varchar(10), width int, QTY int);
-- Load Sample Data
INSERT INTO #tmp_tblItems VALUES (1, 1, 'aaa', 5, 1)
INSERT INTO #tmp_tblItems VALUES (1, 2, 'bbb', 7, 2)
INSERT INTO #tmp_tblItems VALUES (1, 3, 'ccc', 7, 1)
INSERT INTO #tmp_tblItems VALUES (1, 4, 'ddd', 2, 1)
INSERT INTO #tmp_tblItems VALUES (1, 5, 'eee', 7, 1)

;with Item_w_rn as (
    Select HeaderKey, ItemKey, Item, width, QTY
         , row_number()over(partition by HeaderKey, width order by ItemKey) rn 
    From #tmp_tblItems
), recur_Item as (
    Select HeaderKey, ItemKey, Item, width, QTY, 1 as start_pos, qty as end_pos, rn
    From Item_w_rn
    Where rn = 1
    Union All
    Select ti.HeaderKey, ti.ItemKey, ti.Item, ti.width, ti.QTY
         , ri.end_pos+1 as start_pos, ri.end_pos+ti.qty as end_pos, ti.rn
    From recur_Item ri
    inner join Item_w_rn ti on ri.HeaderKey = ti.HeaderKey and ri.width = ti.width and ri.rn+1 = ti.rn
), Position_w_rn as (
    Select HeaderKey, PosKey, Position, width
         , row_number()over(partition by HeaderKey, width order by PosKey) rn 
    From #tmp_tblPosition
)
select b.HeaderKey, b.Position, b.width, a.Item
from recur_Item a
inner join Position_w_rn b on a.Headerkey = b.Headerkey and a.width = b.width and b.rn between a.start_pos and a.end_pos
Order by b.Position

Open in new window

There are three CTE's (Common Table Expressions) being used in my solution.
1. Item_w_rn - add a row_number(), a unique sequence number, to the tblItems table for each HeaderKey/Width combination
2. recur_Item - is a recurring cte that takes the output from the first cte and
    a) select the rows where the row number is 1
    b) UNION ALL a join of those in order to the record with the next highest row number
   The idea here is to create a range value for each item using the QTY ... e.g.
HeaderKey   width      Item   Start_pos   End_pos
  1                    7           bbb         1              2
  1                    7           ccc          3              3
  1                    7           eee         4              4
3. Position_w_rn - as a row_number() to the tblPosition table for each HeaderKey/Width combination

The query then joins the two cte's together where HeaderKey/Width match and the row number in the Position_w_rn cte is between the start_pos and end_pos from the recur_Item cte.

I hope this makes sense.
0
Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
Habib PourfardSoftware DeveloperCommented:
it is a hard question. I finished with following query which works well except for position, it may help you.
SELECT  a.HeaderKey
       ,a.Position
       ,a.width
       ,b.Item
       FROM 
(SELECT *, ROW_NUMBER() OVER (PARTITION BY width ORDER BY PosKey) AS Pos FROM @tblPosition T2) a
INNER JOIN (SELECT *, (SELECT SUM(Qty)  FROM @tblItems Tmp WHERE Tmp.ItemKey <= @tblItems.ItemKey AND Tmp.width = @tblItems.width GROUP BY width) AS Pos  FROM @tblItems) b
     ON a.Headerkey = b.Headerkey AND a.width = b.width
     WHERE a.Pos >= b.Pos
ORDER BY a.Position

Open in new window

0
 
wsadfilmAuthor Commented:
Thank you for the ideas - I am in the process of setting it up and testing with live data.
0
 
wsadfilmAuthor Commented:
lwadwell - your solution is intertersting and returns accurate results against live test data.

I have stepped through and understand most of the code and I beilieve I understand your logic.  What I don't fully understand is the recur_Item and start_pos and end_pos.  I see how you are using them in the Join between statment but I don't quite understand how you generate the numbers.

If it isn't too much trouble could you help clarify - I like to learn and understand the code I use.
0
 
lwadwellCommented:
No problem at all ... and I am glad you like to understand the code you use, me too.
I am going to provide a rather long description in a commented example below.  Here is some extra reading http://www.simple-talk.com/sql/t-sql-programming/sql-server-cte-basics/
/* --------------------------------------------------------------------
   STEP 1: Create the table and data for this example.
-------------------------------------------------------------------- */
-- Drop the table is left over from some other run
IF OBJECT_ID('tempdb..#tmp_tblExample') IS NOT NULL
	DROP TABLE #tmp_tblExample
-- Define the table
create table #tmp_tblExample (KeyId int, RN int, QTY int);
-- Load Sample Data
INSERT INTO #tmp_tblExample VALUES (1, 1, 4)
INSERT INTO #tmp_tblExample VALUES (1, 2, 2)
INSERT INTO #tmp_tblExample VALUES (1, 3, 1)
INSERT INTO #tmp_tblExample VALUES (1, 4, 3)
INSERT INTO #tmp_tblExample VALUES (1, 5, 5)
INSERT INTO #tmp_tblExample VALUES (1, 6, 2)
-- View the Sample Data
SELECT * FROM #tmp_tblExample


/* --------------------------------------------------------------------
   STEP 2: Create the start and end pos using standard SQL
   ~~~~~~~
           First, this is what you could to do to create the start and
   end position values using normal joins.  It is important top 
   understand this as it forms the basis of how the recursive CTE will
   work.
-------------------------------------------------------------------- */
-- create ROW 1
SELECT row1.KeyId, row1.RN, row1.QTY, 1 as start_pos, row1.QTY as end_pos
FROM #tmp_tblExample row1
WHERE row1.RN = 1
UNION ALL                -- For row2, use the end pos from row1, add 1, as its start pos
-- create ROW 2.         --           the end pos from row1 + QTY and the new end pos
SELECT row2.KeyId, row2.RN, row2.QTY, row1.end_pos+1 as start_pos, row1.end_pos+row2.QTY as end_pos
FROM  -- ROW 1 SQL as an inline view
     (SELECT row1.KeyId, row1.RN, row1.QTY, 1 as start_pos, row1.QTY as end_pos
      FROM #tmp_tblExample row1
      WHERE row1.RN = 1) row1
     -- join to the row with the next highest row number
     JOIN #tmp_tblExample row2 ON row1.KeyID = row2.KeyID  
                                  and row1.RN + 1 = row2.RN -- i.e. row2.RN = 2
WHERE row1.RN = 1
UNION ALL                -- For row3, use the end pos from row2, add 1, as its start pos
-- create ROW 3.         --           the end pos from row2 + QTY and the new end pos
SELECT row3.KeyId, row3.RN, row3.QTY, row2.end_pos+1 as start_pos, row2.end_pos+row3.QTY as end_pos
FROM  -- ROW 2 SQL as an inline view
     (SELECT row2.KeyId, row2.RN, row2.QTY, row1.end_pos+1 as start_pos, row1.end_pos+row2.QTY as end_pos
      FROM  -- ROW 1 SQL as an inline view
           (SELECT row1.KeyId, row1.RN, row1.QTY, 1 as start_pos, row1.QTY as end_pos
            FROM #tmp_tblExample row1
            WHERE row1.RN = 1) row1
           -- join to the row with the next highest row number
           JOIN #tmp_tblExample row2 ON row1.KeyID = row2.KeyID  
                                        and row1.RN + 1 = row2.RN -- i.e. row2.RN = 2
      WHERE row1.RN = 1) row2
     -- join to the row with the next highest row number
     JOIN #tmp_tblExample row3 ON row2.KeyID = row3.KeyID  
                                  and row2.RN + 1 = row3.RN -- i.e. row3.RN = 3
        /*-------------------------------------------------+
        |  Hopefully you see from this, just the first 3   |
        | rows example that the full SQL would be tiresome |
        | to build and maintain.  Thankfully a recursive   |
        | CTE handles all of that for you.                 |
        +-------------------------------------------------*/


/* --------------------------------------------------------------------
   STEP 3: Create the start and end pos using Recursive CTE
   ~~~~~~~
           A recursive CTE is exactly that - it's recursive.  By that
   it joins to itself ... in a LOOP.  This is a trick specific to SQL
   Server as far as I know.  So it effectively loop through so that:
     loop 1 - only creates the row 1 (top half of the union all)
     loop 2 - takes row 1 to create row 2
     loop 3 - takes row 2 to create row 3 
     ... etc and on until not more rows can be created (or the 
     recursion limit is reached).
-------------------------------------------------------------------- */
;WITH recursive_CTE AS (
-- create ROW 1
SELECT row1.KeyId, row1.RN, row1.QTY, 1 as start_pos, row1.QTY as end_pos
FROM #tmp_tblExample row1
WHERE row1.RN = 1
UNION ALL                -- For rowY, use the end pos from rowX, add 1, as its start pos
-- create ROW Y.         --           the end pos from rowX + QTY and the new end pos
SELECT rowY.KeyId, rowY.RN, rowY.QTY, rowX.end_pos+1 as start_pos, rowX.end_pos+rowY.QTY as end_pos
FROM recursive_CTE rowX  -- row X is the output from the previous loop through
     JOIN #tmp_tblExample rowY ON rowX.KeyID = rowY.KeyID  
                                  AND rowX.RN + 1 = rowY.RN 
)
SELECT *
FROM recursive_CTE 

/* --------------------------------------------------------------------
   EXTRA: Another method inspired by @pourfard
   ~~~~~~
          @pourfard wasn't far from another method to get the start
   and end position values.  Here is my implementation of his/her idea.
-------------------------------------------------------------------- */
SELECT KeyId, RN, QTY
     , ISNULL((SELECT sum(x.QTY) 
               FROM #tmp_tblExample x 
               WHERE x.KeyID = y.KeyID AND x.RN < y.RN),0)+1     as start_pos
     , ISNULL((SELECT sum(x.QTY) 
               FROM #tmp_tblExample x 
               WHERE x.KeyID = y.KeyID AND x.RN < y.RN),0)+y.QTY as end_pos
FROM #tmp_tblExample y

Open in new window

If this still isn't clear enough (and I could imagine why it wouldn't be) ... please ask more questions.

Using the method inspired by @pourfard ... the SQL would be:
;with Item_w_rn as (
    Select HeaderKey, ItemKey, Item, width, QTY
         , row_number()over(partition by HeaderKey, width order by ItemKey) rn 
    From #tmp_tblItems
), Position_w_rn as (
    Select HeaderKey, PosKey, Position, width
         , row_number()over(partition by HeaderKey, width order by PosKey) rn 
    From #tmp_tblPosition
)
select b.HeaderKey, b.Position, b.width, a.Item
from (SELECT HeaderKey, ItemKey, Item, width, QTY
           , ISNULL((SELECT sum(y.QTY) 
                     FROM Item_w_rn y 
                     WHERE x.HeaderKey = y.HeaderKey and x.width = y.width AND x.RN > y.RN),0)+1     as start_pos
           , ISNULL((SELECT sum(y.QTY) 
                     FROM Item_w_rn y 
                     WHERE x.HeaderKey = y.HeaderKey and x.width = y.width AND x.RN > y.RN),0)+x.QTY as end_pos
      FROM Item_w_rn x) a
inner join Position_w_rn b on a.Headerkey = b.Headerkey and a.width = b.width and b.rn between a.start_pos and a.end_pos
Order by b.Position

Open in new window

0
 
wsadfilmAuthor Commented:
Thank you for your help, solution, and explination.
0

Featured Post

Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

  • 4
  • 3
Tackle projects and never again get stuck behind a technical roadblock.
Join Now