Joining 2 tables in SQL

Hello Experts,

Can someone please help me join tables with different number of fields.  I am trying to select records from one table and do a left join with another table that i have created from 2 other tables.  My left table is the customer table.  I want all the records in the customer table and all the records in the Here is an example query that I have:

select ID
,CUSTID
,BPCUST
,BPV_USER_ID
,PRDUSER
,ENROLLMENT_DATE
,LOGINDATE
,LSTUSEDATE
,LSTSRTDATE
from CUSTOMER x

LEFT JOIN

select ID, CUSTID,'Online' as Online
from (

select ID,CUSTID,TRANSDT
from (
select ID,CUSTID
,max(b.TRANSDT) as TRANSDT
from tbl1
group by di_id,CUSTID)

union all

select DI_ID ,CUSTID
,max(TRANSDT) as TRANSDT
from tbl2
group by ID, CUSTID) t
on x.id = t.id and x.custid = t.custid

This is just a concept, but it did not work.  Can someone please come my aid?

Thanks.
LVL 1
fb1990Asked:
Who is Participating?
 
Harish VargheseConnect With a Mentor Project LeaderCommented:
Yes, PortletPaul is right. In fact, you can skip the wrapper SELECT on the UNION sqls:
select 
	substring(DI_ID,3,4) as diid
	,x.MEM_NUMBER
	,x.IS_BP_USER
	,x.ENROLLMENT_DATE

from CUSTOMER x
--where substring(x.DI_ID,3,4) in ('114','806','111','520','340','391','316','364','536','533','728','102','135')

LEFT OUTER JOIN 
    (select DI_ID as diid,MEM_NUM as MEM_NUMBER
		,max(TRANS_DT) as trans_dt
		from TBL1
		where DI_ID in ('114','806','111','520','340','391','316','364','536','533','728','102','135')
		group by di_id,mem_num

    union all

	select DI_ID as diid,MEM_NUM as MEM_NUMBER
		,max(TRANS_DT) as trans_dt
		from TBL2
		where DI_ID in ('114','806','111','520','340','391','316','364','536','533','728','102','135')
		group by di_id,mem_num
	) T
ON substring(x.DI_ID,3,4)  = t.diid 
       AND x.MEM_NUMBER = T.MEM_NUMBER
--where substring(x.DI_ID,3,4) in ('114','806','111','520','340','391','316','364','536','533','728','102','135')

Open in new window

And the WHERE clause for CUSTOMER table should be at the end. Again, using a substring in your only WHERE condition is definitely a performance concern with 16M records. If that cannot be avoided, try using a computed column on this table using substring and create an index on it.
0
 
Harish VargheseProject LeaderCommented:
Please provide the structure of the tables and state the requirement clearly.
0
 
fb1990Author Commented:
My apologies for the ambiguity.  It is like i have 2 tables. The query works individually.  For example when i run this query:
select ID
,CUSTID
,BPCUST
,BPV_USER_ID
,PRDUSER
,ENROLLMENT_DATE
,LOGINDATE
,LSTUSEDATE
,LSTSRTDATE
from CUSTOMER x

I get my results.  But i want combine the results to the data that i am getting from this other query

select ID,CUSTID,TRANSDT
from (
select ID,CUSTID
,max(b.TRANSDT) as TRANSDT
from tbl1
group by di_id,CUSTID)

union all

select DI_ID ,CUSTID
,max(TRANSDT) as TRANSDT
from tbl2
group by ID, CUSTID

The 2 resultsets have ID and CUSTID in common.  Does that bring some perspective to what I am trying to do?

Thanks for the swift response.
0
The new generation of project management tools

With monday.com’s project management tool, you can see what everyone on your team is working in a single glance. Its intuitive dashboards are customizable, so you can create systems that work for you.

 
Russell FoxDatabase DeveloperCommented:
You're close, just over-complicating things a bit. This should work - feel free to reformat, this is just how I tend to indent things for clarity:
SELECT ID
	,CUSTID
	,BPCUST
	,BPV_USER_ID
	,PRDUSER
	,ENROLLMENT_DATE
	,LOGINDATE
	,LSTUSEDATE
	,LSTSRTDATE
FROM CUSTOMER x

LEFT JOIN
	(
	SELECT ID
		, CUSTID
		, MAX(b.TRANSDT) AS TRANSDT
	FROM tbl1 
	GROUP BY di_id,CUSTID
	-- -------------------------------
	UNION ALL 
	-- -------------------------------
	SELECT DI_ID
		, CUSTID
		, MAX(TRANSDT) AS TRANSDT
	FROM tbl2
	GROUP BY ID, CUSTID
	) t
ON x.id = y.id and x.custid = y.custid

Open in new window

0
 
Harish VargheseProject LeaderCommented:
select ID
	,CUSTID
	,BPCUST
	,BPV_USER_ID
	,PRDUSER
	,ENROLLMENT_DATE
	,LOGINDATE
	,LSTUSEDATE
	,LSTSRTDATE
from CUSTOMER x
LEFT OUTER JOIN 
	(select ID,CUSTID,TRANSDT
	from ( select ID,CUSTID
			 ,max(b.TRANSDT) as TRANSDT
			 from tbl1 
			 group by id,CUSTID)

	union all

	select DI_ID ,CUSTID
	,max(TRANSDT) as TRANSDT
	from tbl2
	group by DI_ID, CUSTID) T
ON x.ID = T.ID 
AND x.CUSTID = T.CUSTID

Open in new window

0
 
PortletPaulCommented:
Please note your existing UNION ALL query could lead to "duplicates" in the final result. Is that expected/wanted?

Also note that the UNION ALL query you provided has some syntax errors e.g. "max(b.TRANSDT) " but no alias b, and there is mention of DI_ID and ID but they are inconsistent.

SELECT
      t.ID
    , t.CUSTID
    , c.BPCUST
    , c.BPV_USER_ID
    , c.PRDUSER
    , c.ENROLLMENT_DATE
    , c.LOGINDATE
    , c.LSTUSEDATE
    , c.LSTSRTDATE
FROM (
      SELECT
            di_id        AS ID
          , CUSTID
          , MAX(TRANSDT) AS TRANSDT
      FROM tbl1
      GROUP BY
            di_id
          , CUSTID

      UNION ALL

            SELECT
                  di_id        AS ID
                , CUSTID
                , MAX(TRANSDT) AS TRANSDT
            FROM tbl2
            GROUP BY
                  di_id
                , CUSTID
      ) t
      INNER JOIN CUSTOMER c
                  ON t.id = c.id
                        AND t.custid = c.custid

Open in new window

0
 
FutureDBA-Commented:
I don't have data to test the query, and it all comes down to writing preferences, but for things like this, I like using with clause.

Try something like this. This will also get rid of any duplicate rows by using distinct clause.

WITH X AS 
	(select 
		ID,
		CUSTID,
		BPCUST,
		BPV_USER_ID,
		PRDUSER,
		ENROLLMENT_DATE,
		LOGINDATE,
		LSTUSEDATE,
		LSTSRTDATE,
	from CUSTOMER),
	 Q AS
		(select 
			ID,
			CUSTID,
			max(b.TRANSDT) as TRANSDT
		from tbl1 
		group by di_id,CUSTID

		union all

		select 
			DI_ID AS ID,
			CUSTID,
			max(TRANSDT) as TRANSDT
		from tbl2
		group by ID, CUSTID)

 SELECT DISTINCT * FROM X LEFT OUTER JOIN Q on X.ID = Q.ID AND Q.CUSTID = X.CUSTID

Open in new window

0
 
PortletPaulCommented:
My reference to duplicates wasn't meant to alarm, or to criticize, it is simply a fact that the existing union all query could produce duplicates. But we have no way of knowing if it will. For example "table1" and "table2" might be mutually exclusive sets of customers and hence no duplication can ever occur. Or it may be that the possible repetition is actually needed.

----
Why would you prefer a CTE when there is absolutely no advantage from it* & using distinct across the final result would be the least efficient way to remove duplication too (just use union instead of union all).

* my rule of thumb is that a CTE is worth it if:
a. there is a need for recursion
b. there is re-use potential (avoid doing the same thing over and over)

I utterly disagree with those who do it for cosmetic reasons, it just does not make the query simpler to read
(because the CTE is still part of the query, all you achieve is having to "look upward" to comprehend the whole query)
0
 
fb1990Author Commented:
You guys are so great.  I am testing PortletPaul and Harish Varghese solutions right now with my live data.  I will back soon.

Thanks for the help thus far...
0
 
fb1990Author Commented:
I am testing the solution provided by FutureDBA as well.  I want examine my results and get back...All 3 solution are running as we speak

Thanks
0
 
fb1990Author Commented:
whew..The codes have running for about 5 hour still not results.  I usually get data out of the environment within a maximum of 1 hour..What could be going here?
0
 
Harish VargheseProject LeaderCommented:
How many rows are there in CUSTOMER table? Since you are using a LEFT OUTER JOIN here, you are going to fetch all records from CUSTOMER table. And do you have proper indexes in tbl1 and tbl2? You need to have a clustered index on ID (or DI_ID?) and CUSTID columns in tbl1 and tbl2 OR a non-clustered index on same columns with an INCLUDE (TRANSDT).
0
 
fb1990Author Commented:
Hello Harish, You are definitely right.  The customer table has about 16M records while tbl1 and tabl2 will yield approximately 6 million record each with the max TRANSDT.  I might have to work on a sample of the data
0
 
Harish VargheseProject LeaderCommented:
Fetching 16M records should not be a routine activity. You can also split the query into multiple statements and see how much time each takes and then fine tune from that point:
Create Table TempTblAll (
	ID int,
	CUSTID int,
	TRANSDT datetime,
	Primary Key (ID, CUSTID) WITH FILLFACTOR = 90
)

Insert into TempTblAll
Select ID,CUSTID
	 ,max(b.TRANSDT) as TRANSDT
from tbl1 
group by di_id,CUSTID

Insert into TempTblAll
select DI_ID ,CUSTID
	,max(TRANSDT) as TRANSDT
from tbl2 T
Where Not Exists (Select 1 from TempTblAll A WHERE A.ID = T.DI_ID And A.CUSTID = T.CUSTID)
group by DI_ID, CUSTID

SELECT ID
	,CUSTID
	,BPCUST
	,BPV_USER_ID
	,PRDUSER
	,ENROLLMENT_DATE
	,LOGINDATE
	,LSTUSEDATE
	,LSTSRTDATE
FROM CUSTOMER x
LEFT JOIN TempTblAll T
	ON x.ID = T.ID 
	AND x.CUSTID = T.CUSTID

Open in new window

0
 
fb1990Author Commented:
Thanks for this suggestion Harish.  One question for you.  I used a where clause on the customer table based on your original solution
-------
from customer x
where x.ID in ('114','806','111','520','340','391','316','364','536','533','728','102','135')
-----
, i am getting SQL Command Not Properly ended error.  Do you you any suggestion for me.

Also,  I will not pulling this as a routine.  I am try to find a pattern and then decide on what i really wanted.  For example i do not need all the field.  The best option for would have been pull a small subset of the data  and decide from that point. You have some great suggestions.  Thank you!
0
 
Harish VargheseProject LeaderCommented:
I do not see any issue with the WHERE clause as such. Can you please post the complete query that you are running? You can also give just one value first and see if the error still occurs - WHERE x.ID in  ('114')
0
 
PortletPaulCommented:
ID would be integer wouldn't it?
why are you using strings?
you could be causing "implicit conversion" which will slow your query
0
 
fb1990Author Commented:
Here is my code based on Harish Solution and my ID is varchar2 not integer

select 
substr(DI_ID,3,4) as diid
,x.MEM_NUMBER
,x.IS_BP_USER
,x.ENROLLMENT_DATE

from CUSTOMER x
where substr(x.DI_ID,3,4) in ('114','806','111','520','340','391','316','364','536','533','728','102','135')

LEFT OUTER JOIN 
    (select diid,MEM_NUMBER,trans_dt
from (select DI_ID as diid,MEM_NUM as MEM_NUMBER
,max(TRANS_DT) as trans_dt
from TBL1
where DI_ID in ('114','806','111','520','340','391','316','364','536','533','728','102','135')

group by di_id,mem_num)

    union all

select DI_ID as diid,MEM_NUM as MEM_NUMBER
,max(TRANS_DT) as trans_dt
from TBL2
where DI_ID in ('114','806','111','520','340','391','316','364','536','533','728','102','135')
group by di_id,mem_num) T
ON substr(x.DI_ID,3,4)  = t.diid 
AND x.MEM_NUMBER = T.MEM_NUMBER

Open in new window

0
 
fb1990Author Commented:
Thank you very much to all that contributed.  I have been on assignment and away from the office.  Please excuse my inaction.
0
All Courses

From novice to tech pro — start learning today.