SQL/Excel Join Multiple Tables

Hello,
     I was recently tasked with merging multiple "databases" that contain mailing information. They are actually excel sheets with slightly different layouts, but I imported them into SQL because I thought it was a clever solution to my problem. Turns out I was wrong.

Here is a basic overview of the data each excel file contains. Each excel file contains mailing information such as First Name, Last Name, Address, etc. The important part of each file is the mailing "Code," this code represents what group the person belongs to.

There are many people who appear in multiple files, and so they have multiple codes. Because of this, I need each person to appear with all relevant codes in the same row. So what I did initially is to import each file into its own SQL table, and then I did a number of full joins on each table for First Name AND Last Name. Like this:
SELECT *
FROM dbo.LIST1$
FULL JOIN dbo.LIST2
ON LIST1$.FName = LIST2.FName AND LIST1$.LName =  LIST2.LName
FULL JOIN dbo.LIST3
ON LIST2.FName = LIST3.FName AND LIST2.LName = LIST3.LName

Open in new window

And then this continues on for about 5 more tables.
This gives me the structure I want, but some people who show codes from LIST1 and LIST2 in the same row, but then they appear again in another row with LIST3 as the code, even though the LName and FName matches that in LIST1 and LIST2. There is also a massive section of NULL rows near the end.

Perhaps doing it through SQL is the wrong way, and I'm open to any suggestions, I just need to get these joined and organized. Any help is appreciated! Thank you!
indigo6Asked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

PortletPaulfreelancerCommented:
try this perhaps:
select
  FName
, LName
, SUM(case when ListID = 1 then 1 end) as List1
, SUM(case when ListID = 2 then 1 end) as List2
, SUM(case when ListID = 3 then 1 end) as List3
, SUM(case when ListID = 4 then 1 end) as List4
, SUM(case when ListID = 5 then 1 end) as List5
FROM (
            select FName , LName, 1 as ListID from LIST1$
            union all
            select FName , LName, 2 as ListID from LIST2$
            union all
            select FName , LName, 3 as ListID from LIST3$
            union all
            select FName , LName, 4 as ListID from LIST4$
            union all
            select FName , LName, 5 as ListID from LIST5$
     ) as U
group by
  FName
, LName

Open in new window

0
PortletPaulfreelancerCommented:
mmm, not sure how "code" fits in, the above assumes each table equates to a code, but if there are multiple codes in each table then this may be better:
select
  FName
, LName
, code
, SUM(case when ListID = 1 then 1 end) as List1
, SUM(case when ListID = 2 then 1 end) as List2
, SUM(case when ListID = 3 then 1 end) as List3
, SUM(case when ListID = 4 then 1 end) as List4
, SUM(case when ListID = 5 then 1 end) as List5
FROM (
            select FName , LName, code, 1 as ListID from LIST1$
            union all
            select FName , LName, code, 2 as ListID from LIST2$
            union all
            select FName , LName, code, 3 as ListID from LIST3$
            union all
            select FName , LName, code, 4 as ListID from LIST4$
            union all
            select FName , LName, code, 5 as ListID from LIST5$
     ) as U
group by
  FName
, LName
, code

Open in new window

before running any of these not I don't know how many records there are or if there are any indexes - so I can't guess how long these will take to run.

If you have questions please ask, and/or provide more info about the data in the tables for us to look at.

e.g. where is the email address?
0
indigo6Author Commented:
Thanks for the prompt response!

I will strip some personal info from the sheets and post them.
0
Newly released Acronis True Image 2019

In announcing the release of the 15th Anniversary Edition of Acronis True Image 2019, the company revealed that its artificial intelligence-based anti-ransomware technology – stopped more than 200,000 ransomware attacks on 150,000 customers last year.

PortletPaulfreelancerCommented:
btw: another possibility would be to merge some data into comma separated values

some representational data (scrubbed for privacy) from each table
- just a few records of each
would help

an 'expected result layout' would also be very beneficial - so we know what to aim at
0
PortletPaulfreelancerCommented:
:) excellent - thanks
0
indigo6Author Commented:
Ok, here are some sparse examples. The final product will have every column, even if it's null. I hope this clears up the code confusion. For example, Steve is in all three lists, so all the data from all his records in the sheets will appear. April only appears in one, so only one. And Samuel appears twice, so his info from two of the sheets will appear in the same row. Any fields which do not contain data for a particular person can remain null. There are about 5 more lists as well.
AltList.xlsx
Alum.xlsx
OPH.xlsx
0
PortletPaulfreelancerCommented:
mmm, see: http://sqlfiddle.com/#!3/4d724/3 for results of the following
select
  U.FName
, U.LName
, t1.*
, t2.*
, t3.*
--, t4.*
--, t5.*
FROM (
            select FName , LName
            from table1
  
          union
            select FName , LName
            from table2
  
          union
            select FName , LName
            from table3

            
     ) as U
left join table1 as t1 on U.FName = t1.FName and U.LName = t1.LName
left join table2 as t2 on U.FName = t2.FName and U.LName = t2.LName
left join table3 as t3 on U.FName = t3.FName and U.LName = t3.LName
--left join table4 as t4 on U.FName = t4.FName and U.LName = t4.LName
--left join table5 as t5 on U.FName = t5.FName and U.LName = t5.LName

Open in new window

some points:
Can you be certain that "John Smith" is the same person in each of those files?
one list had no email address
wouldn't matching on email address be useful?

using those spreadsheets I had to remove spaces and a '/' in the headings, so I'm not too certain if the field names I ended-up with are the same as you are using.

I'd suggest using the actual table definitions (in the end it will make life easier for you)

for reference: http://sqlfiddle.com/#!3/4d724/1
CREATE TABLE Table1
	([Code] varchar(3), [Attendee] varchar(16), [INST_ORG] varchar(3), [MAddress] varchar(23)
    , [MAddII] int, [MCity] varchar(6), [MState] varchar(2), [Mzip] int, [MPostalCode] int
    , [Country] varchar(3), [LNAME] varchar(7), [FNAME] varchar(5), [MI] varchar(1)
    , [DEGREE] varchar(5), [OCCUP] int, [SPEC] int, [SALU] varchar(3), [Region] varchar(3)
    , [E_Mail] varchar(24), [PHONENo] varchar(12), [Ext] int, [FaxNo] int, [MISC] int
    , [NoMailings] varchar(5), [EmailMarketing] varchar(5))
;
	
INSERT INTO Table1
	([Code], [Attendee], [INST_ORG], [MAddress], [MAddII], [MCity], [MState], [Mzip], [MPostalCode], [Country], [LNAME], [FNAME], [MI], [DEGREE], [OCCUP], [SPEC], [SALU], [Region], [E_Mail], [PHONENo], [Ext], [FaxNo], [MISC], [NoMailings], [EmailMarketing])
VALUES
	('OPH', 'Steve M. Purnell', 'UTX', '1594 Raintree Boulevard', 1, 'Blaine', 'MN', 55434, 1, 'USA', 'Purnell', 'Steve', 'M', 'Ph. D', 1, 1, 'Dr.', 'Mid', 'SteveMPurnell@dayrep.com', '763-755-5555', 1, 1, 1, 'FALSE', 'FALSE')
;

CREATE TABLE Table2
	([code] varchar(4), [Fname] varchar(6), [Lname] varchar(7), [Address] varchar(23)
    , [ADDII] int, [City] varchar(11), [State] varchar(2), [Zip] int, [Salutation] varchar(3)
    , [Email] varchar(24), [VCF] int, [Resident_Fellow] varchar(8), [Year] int)
;
	
INSERT INTO Table2
	([code], [Fname], [Lname], [Address], [ADDII], [City], [State], [Zip], [Salutation], [Email], [VCF], [Resident_Fellow], [Year])
VALUES
	('ALUM', 'Samuel', 'Lyons', '2630 Trouser Leg Road', NULL, 'Springfield', 'MA', 01105, 'Dr.', 'SamuelJLyons@dayrep.com', NULL, 'Fellow', 2010),
	('ALUM', 'Steve', 'Purnell', '1594 Raintree Boulevard', NULL, 'Blaine', 'MN', 55434, 'Dr.', 'SteveMPurnell@dayrep.com', NULL, 'Resident', 2011)
;

CREATE TABLE Table3
	([CODE] varchar(3), [LName] varchar(7), [FName] varchar(6), [MD] varchar(2)
    , [Address] varchar(23), [City] varchar(11), [State] varchar(2), [Zip] int)
;
	
INSERT INTO Table3
	([CODE], [LName], [FName], [MD], [Address], [City], [State], [Zip])
VALUES
	('ALT', 'Lyons', 'Samuel', 'MD', '2630 Trouser Leg Road', 'Springfield', 'MA', 01105),
	('ALT', 'Purnell', 'Steve', 'MD', '1594 Raintree Boulevard', 'Blaine', 'MN', 55434),
	('ALT', 'Defazio', 'April', 'MD', '524 Valley View Drive', 'Brookline', 'MA', 02146)
;


select
  FName
, LName
, max(email) as email
, max(case when ListID = 1 then code end) as List1
, max(case when ListID = 2 then code end) as List2
, max(case when ListID = 3 then code end) as List3
, max(case when ListID = 4 then code end) as List4
, max(case when ListID = 5 then code end) as List5
FROM (
            select FName , LName, code, 1 as ListID , E_Mail as email
            from table1
  
          union all
            select FName , LName, code, 2 as ListID , Email as email
            from table2
  
          union all
            select FName , LName, code, 3 as ListID , null as email
            from table3
            --union all
            --select FName , LName, code, 4 as ListID from LIST4$
            --union all
            --select FName , LName, code, 5 as ListID from LIST5$
     ) as U
group by
  FName
, LName

Open in new window

0
indigo6Author Commented:
Thank you so much for this! I think this has me heading in the right direction. I asked the same question about duplicate names but they seemed ok with the fact that there may be two people with the same names. Looking at the list, the names are quite unique, so it probably won't be a large problem. I will try this on my test db and report back.
0
indigo6Author Commented:
Ok, we are really close. I can't get the union to work properly because each table has different fields and I need every field. The problem with my initial query, I think, is that the full joins only join the tables in pairs. How can I use a full join (or some other query) that will join across all 9 tables? Thanks!
0
PortletPaulfreelancerCommented:
You can only union that part of each table which is common (such as the names);
then 'left join' (see ID: 39451453)
have you tried this approach yet?
0
indigo6Author Commented:
I tried:

select
  FName
, LName
, max(email) as email
, max(case when ListID = 1 then ALT end) as List1
, max(case when ListID = 2 then ALUM end) as List2
, max(case when ListID = 3 then OPH end) as List3
FROM (
            select FName , LName, ALT, 1 as ListID
            from ALT
  
          union all
            select FName , LName, ALUM, 2 as ListID , Email as email
            from Alum
  
          union all
            select FName , LName, PUB, 3 as ListID
            from OPH
     ) as U
group by
  FName
, LName

Open in new window


ALT, ALUM, and PUB are the columns in the source tables that have the "code"

The message I received was:
Msg 205, Level 16, State 1, Line 1
All queries combined using a UNION, INTERSECT or EXCEPT operator must have an equal number of expressions in their target lists.

Thank you!
0
indigo6Author Commented:
I think it may have to do with my database design as well. Since these were just imported from excel, there are no keys, etc.
0
PortletPaulfreelancerCommented:
each query tied together by UNION or UNION ALL, must have the same number of columns. Plus each column must be the same, or 'compatible', data type.

Line 13 has one more column than lines 9 & 17

Also, after the unioning is performed, the third column will have just one column heading (not 3), so in the case expressions lines 5,6,7 you cannot choose ALT, ALUM, PUB.

The simplest way to handle the different number of columns is to move the sequence around so that email is in the first part of the unions. Then NULL may be used in the next parts.

If you wish to retain the current sequence line 9 should read:
select FName , LName, ALT, 1 as ListID, cast(NULL as varchar) as Email -- assuming Email is varchar, might be nvarchar

Open in new window

Otherwise, try this version:
select
  FName
, LName
, max(email) as email
, max(case when ListID = 1 then ListValue end) as List1
, max(case when ListID = 2 then ListValue end) as List2
, max(case when ListID = 3 then ListValue end) as List3
FROM (
            select FName , LName, ALUM as ListValue, 2 as ListID , Email as email
            from Alum --<< moved up
  
          union all
            select FName , LName, ALT as ListValue, 1 as ListID , NULL as email
            from ALT
  
          union all
            select FName , LName, PUB as ListValue, 3 as ListID, NULL as email
            from OPH
     ) as U
group by
  FName
, LName

Open in new window

note that ALT, ALUM, PUB must be the same data type (as they go into the same column now called ListValue). If they are not the same data type then use CAST() so that they are forced to be the same.
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
indigo6Author Commented:
You can tell I'm a newbie! I'm learning more about this as we go, and I'm committed to learning more about SQL. Also, this works perfectly! Thank you so much!
0
PortletPaulfreelancerCommented:
>>I think it may have to do with
nope, see my comment above, it's a problem in the query itself.

>> ... database design ... there are no keys
You are dealing with a "clean up"/"rationalization" and frequently this means the database design will be less than perfect :) so don't feel too bad about that. After all it is a "transitional step" rather than something permanent (well I think it is).

You may need to add indexes for performance, but given that your data may be troublesome be careful about trying to create unique indexes. Giving each table an identity field as primary key may be useful so you can uniquely address rows should you need to. But until you start to get quality data indexing might be a bit hit and miss.

Non-clustered indexes on FName , LName should assist for performance. Basically anything you will use to group by or join by would be primary candidates for indexing.
0
PortletPaulfreelancerCommented:
You are welcome! Thanks for the grading.

Ask away, there are lots of folk here with vast experience (and we get a kick out of helping).

Cheers,
Paul
0
indigo6Author Commented:
Hi Paul, I know this is solved, but I tried to expand the syntax to:
select
 FName
, LName
, Address
, City as City
, [State ] as State
, Zip as Zip
, INST_ORG as PP_INST_ORG
, Country as Country
, email as email
, max(case when ListID = 1 then ListValue end) as List1
, max(case when ListID = 2 then ListValue end) as List2
, max(case when ListID = 3 then ListValue end) as List3

FROM (
        select FName , LName, ALTA as ListValue, 1 as ListID , NULL as email, Address, City, [State ], Zip, NULL as INST_ORG, NULL as Country 
            from ALTA
  
        union all
		select FName , LName, ALUM as ListValue, 2 as ListID , Email as email, Address, City, State, Zip, NULL as INST_ORG, NULL as Country
            from Alum
  
        union all
            select FName , LName, OPHTH_PP as ListValue, 3 as ListID, [E-Mail] as email, INST_ORG, Address, MCity as City, MState as State, Mzip as Zip, Country
            from OPH
     ) as U
group by
  FName
, LName
, Address
, City
, [State ]
, Zip
, INST_ORG
, Country
, email

Open in new window


And I get the message
Msg 8114, Level 16, State 5, Line 1
Error converting data type varchar to float.

I'm not trying to cast anything, and I certainly don't want any strings to be floating point numbers. Not sure where this error is, since the only thing on line 1 is "select"
0
PortletPaulfreelancerCommented:
It is absolutely necessary when unioning that all column be the same (or compatible) data types, so whilst you have not explicitly asked for any conversion, unioning may be causing implict type conversion - when trying to stack the results together.

so the error source is in the unions, but it is felt immediately (line 1)

The likelihood is that the problem stems from any use of NULL in those unions
don't leave it to chance, explicitly CAST(NULL as ..... ) where ..... is the appropriate choice for each column, e.g.

select FName , LName, ALTA as ListValue, 1 as ListID , cast(NULL as varchar) as email, Address, City, [State ], Zip, cast(NULL as ....) as INST_ORG, cast(NULL as ....) as Country

If this still fails what I would then do is try it, bit by bit: e.g.

       select FName , LName, ALTA as ListValue, 1 as ListID , NULL as email, Address, City, [State ], Zip, NULL as INST_ORG, NULL as Country 
            from ALTA
  
        union all
		select FName , LName, ALUM as ListValue, 2 as ListID , Email as email, Address, City, State, Zip, NULL as INST_ORG, NULL as Country
            from Alum

Open in new window

then if thats ok, this bit:
		select FName , LName, ALUM as ListValue, 2 as ListID , Email as email, Address, City, State, Zip, NULL as INST_ORG, NULL as Country
            from Alum
  
        union all
            select FName , LName, OPHTH_PP as ListValue, 3 as ListID, [E-Mail] as email, INST_ORG, Address, MCity as City, MState as State, Mzip as Zip, Country
            from OPH

Open in new window

from this you should be able to identify which table the problem comes from
once you know that it's a matter of testing the columns one by one until the problem columns are identified (I do this by using -- ) like this:
		select FName , LName, ALUM as ListValue, 2 as ListID --, Email as email, Address, City, State, Zip, NULL as INST_ORG, NULL as Country
            from Alum
  
        union all
            select FName , LName, OPHTH_PP as ListValue, 3 as ListID --, [E-Mail] as email, INST_ORG, Address, MCity as City, MState as State, Mzip as Zip, Country
            from OPH

Open in new window


bottom line:
it happens when "stacking data" together, every column MUST be the same data types.
So be explicit, use CAST(NULL as ....).
Then if still failing you need to find which table(s) and column(s) are involved
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Query Syntax

From novice to tech pro — start learning today.