Link to home
Start Free TrialLog in
Avatar of Stephen Forero
Stephen ForeroFlag for United States of America

asked on

sql inner join duplicating values

I have 2 tables that I am trying to join and my results are being duplicated.

I am trying to return the following question.

Select *
From PositionsTbl
Inner join AccountsTbl on accountstbl.acctnum = positionstbl.acctnum
where acctactive = true

The problem I am getting, say for example with acctnum 11111, is the results should be 2 lines... quantity total = 2.
But the way I wrote the query is results are 4 lines, quantity 4.

I know it is because I have the same account in twice but I don't know how to achieve the results I am trying to.
I am also not married to the structure of the table.

Any ideas???

AccountsTbl
AcctNum         AcctActive
11111               True
11111               True
22222               False
33333               True

PositionsTbl
AcctNum          Quantity
11111               1
11111               1
22222               1
33333               1
Avatar of Mayank Gairola
Mayank Gairola
Flag of India image

What is the PK(primary key) of table PositionsTbl or the other table? You should always join on identity column/columns  i.e. column/columns that can uniquely identify each row in the table.
The issue here is that you have 2 '11111 ' in both the tables and each '11111 ' of a table matches the 2 '11111' of the other table so finally you get 2*2= 4 records.
Avatar of Deepak Chauhan
@but I don't know how to achieve the results

select distinct * from (
Select *
 From PositionsTbl
 Inner join AccountsTbl on accountstbl.acctnum = positionstbl.acctnum
 where acctactive = true ) TBL
The account table should have a UNIQUE constraint on the AcctNum column. Otherwise you'll need a DISTINCT operator:

USE tempdb;
GO

CREATE TABLE #Accounts
    (
      AcctNum INT ,
      AcctActive BIT ,
      CONSTRAINT UQ UNIQUE ( AcctNum )
    );

INSERT  INTO #Accounts
VALUES  ( 11111, 1 ),
        ( 22222, 0 ),
        ( 33333, 1 );
 
DECLARE @Positions TABLE
    (
      AcctNum INT ,
      Quantity INT
    );

INSERT  INTO @Positions
        ( AcctNum, Quantity )
VALUES  ( 11111, 1 ),
        ( 11111, 1 ),
        ( 22222, 1 ),
        ( 33333, 1 );


SELECT  *
FROM    @Positions P
        INNER JOIN #Accounts A ON A.AcctNum = P.AcctNum
WHERE   A.AcctActive = 1;

DROP TABLE #Accounts;

Open in new window


or

DECLARE @AccountsTbl TABLE
    (
      AcctNum INT ,
      AcctActive BIT
    );

INSERT  INTO @AccountsTbl
VALUES  ( 11111, 1 ),
        ( 11111, 1 ),
        ( 22222, 0 ),
        ( 33333, 1 );
 
DECLARE @PositionsTbl TABLE
    (
      AcctNum INT ,
      Quantity INT
    );

INSERT  INTO @PositionsTbl
        ( AcctNum, Quantity )
VALUES  ( 11111, 1 ),
        ( 11111, 1 ),
        ( 22222, 1 ),
        ( 33333, 1 );

WITH    Accounts
          AS ( SELECT DISTINCT
                        A.AcctNum
               FROM     @AccountsTbl A
               WHERE    A.AcctActive = 1
             )
    SELECT  *
    FROM    @PositionsTbl P
            INNER JOIN Accounts A ON A.AcctNum = P.AcctNum;

Open in new window

Avatar of Stephen Forero

ASKER

there is no primary key on the positionstbl
the accountstbl has a guid unique identifier that I should have listed.

I see your point but I'm not sure how to change the tables to fit the proper way.
The big picture problem is the is a ClientsTbl also.
Client1 can have acctNum 11111
But Client2 can also have that same acctNum 11111.

Am I being to vague in my descriptions?
@Deepak - Above query will return only a single record in the given dataset.
The issue here is joining on the wrong columns.
Am I being to vague in my descriptions?

Yup. Please post a concise and complete example. Include table DDL and sample data INSERT statements as runnable T-SQL script..
btw, maybe you're looking for an aggregate:

DECLARE @AccountsTbl TABLE
    (
      AcctNum INT ,
      AcctActive BIT
    );

INSERT  INTO @AccountsTbl
VALUES  ( 11111, 1 ),
        ( 11111, 0 ),
        ( 22222, 0 ),
        ( 33333, 1 );
 
DECLARE @PositionsTbl TABLE
    (
      AcctNum INT ,
      Quantity INT
    );

INSERT  INTO @PositionsTbl
        ( AcctNum, Quantity )
VALUES  ( 11111, 1 ),
        ( 11111, 1 ),
        ( 22222, 1 ),
        ( 33333, 1 );

WITH    Accounts
          AS ( SELECT DISTINCT
                        A.AcctNum
               FROM     @AccountsTbl A
               WHERE    A.AcctActive = 1
             )
    SELECT  P.AcctNum ,
            SUM(P.Quantity) AS SumQuantity
    FROM    @PositionsTbl P
            INNER JOIN Accounts A ON A.AcctNum = P.AcctNum
    GROUP BY P.AcctNum;

Open in new window


btw, the tbl suffix is quite uncommon and ugly.
let me try and give the bigger picture.  There will be 3 main areas at first.
The clients... the accounts... and the positions.
Somewhere I need to say, John has access to acct1 and acct2, Paul has access to only Acct2.

Finally, I want a query that pulls the positions, but not duplicated entries if more than one person has access to it

ClientTable
John
Paul
Peter

AccountsTable
Acct1
Acct2
Acct3

PositionsTable
Acct1   position=ibm            qty 1
Acct1   position =dell           qty 1
Acct2   position=facebook  qty 2
I can change the structure now at this point in development to whatever it should be
Ok then.

1. Add id columns in each table like client_id, account_id and position_id
2. Add 2 more tables. One to store customer- account mapping and the other to store account- position mapping.

Then you can join all the table using their primary key and the mapping tables and you won't get any duplicates.
Like

select *
client c
join client_account ca
on c.client_id=ca.client_id
join account a
on a.account_id=ca.account_id
join account_position ap
on ap.account_id=a.account_id
join position p
on p.position_id=ap.position_id
where acctactive = true
okay, I created an excel sheet that sort of maps out what you were saying.  Take a look....
I was able to do everything except the account-position mappings.  The reason why is because the positions table can be 20,000 rows, and it changes every 20 minutes.  So I don't know how I could fill in a table that maps these.
tables.xlsx
Not sure if I understood your question correctly. But irrespective of the number of rowsin the position table, each record will have a unique id(position_id) which will be tied to only a single account_id. We need to store this mapping in account_position table.
Each new insert/update to the position table should also trigger a subsequent change in accounts_position table.
Mayank... I think I am starting to understand.
so right now I am doing a sqlbulk upload from a csv into PositionsTbl.
So you are saying for every row that gets updated, I also update a Accounts-Positions table with the ID's mapped together?

how would I trigger that type of update?
thanks for your help everyone... let me research this and come back before I close out the thread
appreciate the help!!!!
Nope, triggers are the wrong approach.

When I understand your sheet correctly, then it is a mere modelling question. What kind of relationship does exist between Client-Account and Accounts-Position?

When it is a 1:m relationship, than your tables should be

Clients: ClientID (PK), Name
Accounts: AccountID (PK), ClientID (FK), AccountNumber
Positions: PositionID (PK), AccountID (FK), PositionText, PositionQuantityNumber

This is the normal real world scenario. The other possibility is a m:n relation between Client-Account:

Clients: ClientID (PK), Name
AccountClientBridge: AccountID (PK, FK), ClientID (PK, FK)
Accounts: AccountID (PK), AccountNumber
Positions: PositionID (PK), AccountID (FK), PositionText, PositionQuantityNumber

A position does normally belong to one account. Should this here not be the case, and one position can belong to more accounts, then it is also a m:n relationship and needs also a bridge table:

Clients: ClientID (PK), Name
AccountClientBridge: AccountID (PK), ClientID (PK)
Accounts: AccountID (PK), AccountNumber
AccountPositionBridge: AccountID (PK, FK), PositionID (PK, FK)
Positions: PositionID (PK), PositionText, PositionQuantityNumber
ste5an, what I am trying to say very poorly is in my particular case is the following:

different clients, can be sharing the same account number.
so client1 AND client2 should have access to account1.
The positions being loaded will ONLY ever belong to 1 specific account.(IBM qty=1 will only belong to Acct1)....... but then again, that one account is shared by client1 AND client2.

which example does that fit in into your descriptions?
I think based on what you are telling me I would want your last example...
Clients: ClientID (PK), Name
 AccountClientBridge: AccountID (PK), ClientID (PK)
 Accounts: AccountID (PK), AccountNumber
 AccountPositionBridge: AccountID (PK, FK), PositionID (PK, FK)
 Positions: PositionID (PK), PositionText, PositionQuantityNumber

The issue is the AccountPositionBridge... new positions are being sent in every 20 minutes (over 10,000 rows of positions).  So I would need to somehow update the bridge with every 20 minutes.
so a trigger is not the right approach?
ASKER CERTIFIED SOLUTION
Avatar of ste5an
ste5an
Flag of Germany image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
so in your last example...
Clients: ClientID (PK), Name
 AccountClientBridge: AccountID (PK, FK), ClientID (PK, FK)
 Accounts: AccountID (PK), AccountNumber
 Positions: PositionID (PK), AccountID (FK), PositionText, PositionQuantityNumber

in my clientstable... can I have different clients point to the same accountID?
like
clientID1    acct1
clientID2    acct1
When a client can have only one account, but one account can be shared between clients:

Clients: ClientID (PK), AccountID (FK), ClientName
Accounts: AccountID (PK), AccountNumber
Positions: PositionID (PK), AccountID (FK), PositionText, PositionQuantityNumber
thanks so much for this.  seems to be what I needed!