Solved

SQL Query returning multiple rows

Posted on 2015-02-23
20
20 Views
Last Modified: 2016-06-19
I have a query where I am getting contact data for attendees of required courses and if they have or have not attended the course pluse the course name of the ther courses that is required that they need to take. The data for the above is comming from multiple databases and their tables.  Not sure if my JOINS are correct I have tried using DISTICT, GROUP BY and HAVING statements and still duplicating some of the returned row not all rows have duplicate data.

SQL Query:
    SELECT 
		GETDATE() AS CurrentDate,
		DATEADD(d, + 30, GETDATE()) AS MonthlyCurrentDate,
		r.ntid AS 'Student ID', 
		A.A_NBR AS '5 Digit Assoc A #',  
        CASE 
			WHEN r.ntid LIKE '%s%' OR r.ntid LIKE '%swa%' OR r.ntid LIKE '%sid%' OR r.ntid LIKE '%shi%' OR r.ntid LIKE '%sor%' OR r.ntid LIKE '%sak%'  OR r.ntid LIKE '%sco%'
			THEN 'LSP' 
			WHEN r.ntid LIKE '%0A%' OR r.ntid LIKE '%0a%' OR r.ntid LIKE '%a0%' OR r.ntid LIKE '%A0%' 
			THEN 'Agent' 
		END AS 'Student Type Dept', 
		CASE 
			WHEN WC.[Last Name] IS NOT NULL THEN WC.[Last Name]
			ELSE PARSENAME(REPLACE(r.name, ' ', '.'), 2) 
		END AS 'A Last Name', 
		CASE 
			WHEN WC.[First Name] IS NOT NULL THEN WC.[First Name]
			ELSE PARSENAME(REPLACE(r.name, ' ', '.'), 1) 
		END AS 'A First Name', 
		CASE 
			WHEN s.LAST_NM IS NOT NULL THEN s.LAST_NM 
			ELSE PARSENAME(REPLACE(r.name, ' ', '.'), 2) 
		END AS 'LSP Last Name', 
		CASE 
			WHEN s.FIRST_NM IS NOT NULL THEN s.FIRST_NM 
			ELSE PARSENAME(REPLACE(r.name, ' ', '.'), 1) 
		END AS 'LSP First Name', 
		CASE
			--WHEN WC.[EMAIL ADDRESS] IS NOT NULL THEN WC.[EMAIL ADDRESS]
			WHEN AG.Email IS NOT NULL THEN AG.Email + '@company.COM'
			ELSE WC.[A ID]+ '@company.COM'
		END AS 'Email',
		c.EndDateTimeView,
		r.dateTimeRegistered,
		c.name AS 'Course Name' ,
		CASE 
			WHEN r.attended = 'true' AND c.Name LIKE '%Associates%' THEN 'Yes'
			WHEN r.attended = 'true' AND c.Name LIKE '%Day 3%' OR c.Name LIKE '%Retain Bundle Grow%' THEN 'Yes'
			WHEN r.attended = 'true' AND c.Name LIKE '%Property University%' THEN 'Yes'	    
			ELSE 'No'
		END AS 'Attended',
		CASE
			WHEN c.Name LIKE '%Associates%' THEN 'AF Financial University - Associates'
			WHEN c.Name LIKE '%Retain Bundle Grow%' OR c.Name LIKE '%Day 3%' THEN 'Trusted Advisor University'
			WHEN c.Name LIKE '%Property University%' THEN 'Property University'
			ELSE 'No workshops attended'
		END AS 'Workshops',
		 CASE
			WHEN c.Name NOT LIKE '%Associates%' THEN 'Need to attend : AF University Associates'
			ELSE 'Attended: AF University Associates'
		END AS 'Need Workshops AFU',
		CASE 
			WHEN  c.Name NOT LIKE '%Day 3%' OR c.Name NOT LIKE '%Retain Bundle Grow%'THEN 'Need to attend: Trusted Advisor University'
			ELSE 'Attended: Trusted Advisor University'
		END AS 'Need SS Workshop',
		CASE
			WHEN c.Name NOT LIKE '%Property University%' THEN 'Neet to attend: Advanced Auto and Property University'
			ELSE 'Attended: Advanced Auto and Property University'
		END AS 'Need PU Workshop',
		CASE
			WHEN s.StartDate IS NOT NULL THEN s.StartDate
			ELSE '2001-01-01 00:00:00.0000000'
		END AS 'LSP Start Date'
FROM    dbo._registration AS r 
		INNER JOIN wcr_staff.dbo.Affiliations AS a ON a.REP_NT_ID = r.ntid 
		INNER JOIN dbo._course AS C ON C.CourseID = r.courseNumber 
		INNER JOIN wcr_ag.dbo.tblA AS AG ON AG.AGN_A_NBR = a.A_NBR
		LEFT OUTER JOIN wcr_staff.dbo.Staff AS s ON s.SALES_PARTY_ID = A.SALES_PARTY_ID
		LEFT OUTER JOIN WC_Policy_Universe.dbo.['List2014$'] WC ON AG.AGN_NBR = WC.[A ID]
WHERE	(NOT (r.ntid = ' ')) 
		AND(AG.AGN_STATUS_CD = 3)
		AND AG.Type = 'Primary'
		AND (a.IsActive = 1)
		AND (c.EndDateTimeView >= '2012-01-01') 
		OR c.EndDateTimeView >= DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE())+31,0)
	    AND c.EndDateTimeView < DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE())+32,0)        
        OR c.EndDateTimeView BETWEEN CONVERT(DATETIME, FLOOR(CONVERT(FLOAT,GETDATE())))
		AND CONVERT(DATETIME,FLOOR(CONVERT(FLOAT,GETDATE()))+1)
ORDER BY A.AG_NBR ASC

Open in new window



Results of query where duplicate are:
2015-02-23 09:02:19.643	2015-03-25 09:02:19.643	scofvx0e	001937	LSP	HOERY	ROBERT	SPELLMAN	AMANDA	A001937@Company.COM	2015-02-23 17:00:00.000	2015-02-12 13:01:36.000	Sales University: Day 1 Auto	No	No workshops attended	Need to attend : AF University Associates	Need to attend: Trusted Advisor University	Neet to attend: Advanced Auto and Property University	2015-01-07 00:00:00.0000000
2015-02-23 09:02:19.643	2015-03-25 09:02:19.643	scofvx0e	001937	LSP	HOERY	ROBERT	SPELLMAN	AMANDA	A001937@Company.COM	2015-02-23 17:00:00.000	2015-02-12 13:01:36.000	Sales University: Day 1 Auto	No	No workshops attended	Need to attend : AF University Associates	Need to attend: Trusted Advisor University	Neet to attend: Advanced Auto and Property University	2015-01-07 00:00:00.0000000
2015-02-23 09:02:19.643	2015-03-25 09:02:19.643	scofvx0e	001937	LSP	HOERY	ROBERT	SPELLMAN	AMANDA	A001937@Company.COM	2015-02-23 17:00:00.000	2015-02-12 13:01:36.000	Sales University: Day 1 Auto	No	No workshops attended	Need to attend : AF University Associates	Need to attend: Trusted Advisor University	Neet to attend: Advanced Auto and Property University	2015-01-07 00:00:00.0000000

Open in new window


As you can see that there are three rows returning the exact same data. I know that somtimes when there is simuliar data in different tables that can cause duplicates but I need data from those other database tables.

Please help ASAP
Thanks in advance
0
Comment
Question by:newjeep19
  • 10
  • 7
  • 2
20 Comments
 
LVL 40

Expert Comment

by:Kyle Abrahams
ID: 40626115
Editing based on the fact that you already tried distinct:

what if you first insert into a temp table and then select distinct from that:
 SELECT  distinct
		GETDATE() AS CurrentDate,
		DATEADD(d, + 30, GETDATE()) AS MonthlyCurrentDate,
		r.ntid AS 'Student ID', 
		A.A_NBR AS '5 Digit Assoc A #',  
        CASE 
			WHEN r.ntid LIKE '%s%' OR r.ntid LIKE '%swa%' OR r.ntid LIKE '%sid%' OR r.ntid LIKE '%shi%' OR r.ntid LIKE '%sor%' OR r.ntid LIKE '%sak%'  OR r.ntid LIKE '%sco%'
			THEN 'LSP' 
			WHEN r.ntid LIKE '%0A%' OR r.ntid LIKE '%0a%' OR r.ntid LIKE '%a0%' OR r.ntid LIKE '%A0%' 
			THEN 'Agent' 
		END AS 'Student Type Dept', 
		CASE 
			WHEN WC.[Last Name] IS NOT NULL THEN WC.[Last Name]
			ELSE PARSENAME(REPLACE(r.name, ' ', '.'), 2) 
		END AS 'A Last Name', 
		CASE 
			WHEN WC.[First Name] IS NOT NULL THEN WC.[First Name]
			ELSE PARSENAME(REPLACE(r.name, ' ', '.'), 1) 
		END AS 'A First Name', 
		CASE 
			WHEN s.LAST_NM IS NOT NULL THEN s.LAST_NM 
			ELSE PARSENAME(REPLACE(r.name, ' ', '.'), 2) 
		END AS 'LSP Last Name', 
		CASE 
			WHEN s.FIRST_NM IS NOT NULL THEN s.FIRST_NM 
			ELSE PARSENAME(REPLACE(r.name, ' ', '.'), 1) 
		END AS 'LSP First Name', 
		CASE
			--WHEN WC.[EMAIL ADDRESS] IS NOT NULL THEN WC.[EMAIL ADDRESS]
			WHEN AG.Email IS NOT NULL THEN AG.Email + '@company.COM'
			ELSE WC.[A ID]+ '@company.COM'
		END AS 'Email',
		c.EndDateTimeView,
		r.dateTimeRegistered,
		c.name AS 'Course Name' ,
		CASE 
			WHEN r.attended = 'true' AND c.Name LIKE '%Associates%' THEN 'Yes'
			WHEN r.attended = 'true' AND c.Name LIKE '%Day 3%' OR c.Name LIKE '%Retain Bundle Grow%' THEN 'Yes'
			WHEN r.attended = 'true' AND c.Name LIKE '%Property University%' THEN 'Yes'	    
			ELSE 'No'
		END AS 'Attended',
		CASE
			WHEN c.Name LIKE '%Associates%' THEN 'AF Financial University - Associates'
			WHEN c.Name LIKE '%Retain Bundle Grow%' OR c.Name LIKE '%Day 3%' THEN 'Trusted Advisor University'
			WHEN c.Name LIKE '%Property University%' THEN 'Property University'
			ELSE 'No workshops attended'
		END AS 'Workshops',
		 CASE
			WHEN c.Name NOT LIKE '%Associates%' THEN 'Need to attend : AF University Associates'
			ELSE 'Attended: AF University Associates'
		END AS 'Need Workshops AFU',
		CASE 
			WHEN  c.Name NOT LIKE '%Day 3%' OR c.Name NOT LIKE '%Retain Bundle Grow%'THEN 'Need to attend: Trusted Advisor University'
			ELSE 'Attended: Trusted Advisor University'
		END AS 'Need SS Workshop',
		CASE
			WHEN c.Name NOT LIKE '%Property University%' THEN 'Neet to attend: Advanced Auto and Property University'
			ELSE 'Attended: Advanced Auto and Property University'
		END AS 'Need PU Workshop',
		CASE
			WHEN s.StartDate IS NOT NULL THEN s.StartDate
			ELSE '2001-01-01 00:00:00.0000000'
		END AS 'LSP Start Date'
into #tmp
FROM    dbo._registration AS r 
		INNER JOIN wcr_staff.dbo.Affiliations AS a ON a.REP_NT_ID = r.ntid 
		INNER JOIN dbo._course AS C ON C.CourseID = r.courseNumber 
		INNER JOIN wcr_ag.dbo.tblA AS AG ON AG.AGN_A_NBR = a.A_NBR
		LEFT OUTER JOIN wcr_staff.dbo.Staff AS s ON s.SALES_PARTY_ID = A.SALES_PARTY_ID
		LEFT OUTER JOIN WC_Policy_Universe.dbo.['List2014$'] WC ON AG.AGN_NBR = WC.[A ID]
WHERE	(NOT (r.ntid = ' ')) 
		AND(AG.AGN_STATUS_CD = 3)
		AND AG.Type = 'Primary'
		AND (a.IsActive = 1)
		AND (c.EndDateTimeView >= '2012-01-01') 
		OR c.EndDateTimeView >= DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE())+31,0)
	    AND c.EndDateTimeView < DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE())+32,0)        
        OR c.EndDateTimeView BETWEEN CONVERT(DATETIME, FLOOR(CONVERT(FLOAT,GETDATE())))
		AND CONVERT(DATETIME,FLOOR(CONVERT(FLOAT,GETDATE()))+1)
ORDER BY A.AG_NBR ASC

select distinct * from #tmp

Open in new window

0
 

Author Comment

by:newjeep19
ID: 40626226
Thank you for your quick respose. inserting into a temp table then adding the distict dose help but there still are some results returning w/duplicate data. It looks like that maybe the case statements could be the issue. however, I need to show if they have attended the course or not and list the name of the attended course and or the name of the course they need to attended.
Please see below:
2015-02-23 12:57:06.497	2015-03-25 12:57:06.497	sco07y0b	026640	LSP	BRAINARD	MICHAEL	HESSE	DONNA	boss@company.COM	No	Attended: AF University Associates	Need to attend: Trusted Advisor University	Neet to attend: Advanced Auto and Property University	2014-06-06 00:00:00.0000000
2015-02-23 12:57:06.497	2015-03-25 12:57:06.497	sco07y0b	026640	LSP	BRAINARD	MICHAEL	HESSE	DONNA	boss@company.COM	No	Need to attend : AF University Associates	Need to attend: Trusted Advisor University	Neet to attend: Advanced Auto and Property University	2014-06-06 00:00:00.0000000
2015-02-23 12:57:06.497	2015-03-25 12:57:06.497	sco07y0b	026640	LSP	BRAINARD	MICHAEL	HESSE	DONNA	boss@company.COM	Yes	Need to attend : AF University Associates	Need to attend: Trusted Advisor University	Attended: Advanced Auto and Property University	2014-06-06 00:00:00.0000000

Open in new window

0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 40627452
Placing into a temp table does not alter the way "select distinct" works.

Using "distinct" on top of this query is just like a Hail Mary pass with 3 seconds on the clock - a desperate attempt; but unlike the pass in SQL it almost never works well.

Please see: Select Distinct is returning duplicates

A row is "distinct" if ANY TINY PART OF IT is in some way different to every other row and the 3 rows immediately above ARE DIFFERENTevery weeny tiny part is considered
Reduce the columns being evaluated and/or revise the joins so that these no longer multiply the number of rows.
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 40627481
btw: I am suspicious that your where clause may not be acting exactly as you expect. I think you need to place the time ranges using OR into covering parentheses.

I only use parentheses when required by logic to do so, and using that approach I would have your where clause as:
WHERE r.ntid <> ' '
      AND AG.AGN_STATUS_CD = 3
      AND AG.Type = 'Primary'
      AND a.IsActive = 1
      AND (
            c.EndDateTimeView >= '2012-01-01'
            OR 
                (     c.EndDateTimeView >= DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()) + 31, 0)
                  AND c.EndDateTimeView <  DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()) + 32, 0)
                )
            OR 
                (    c.EndDateTimeView >= CONVERT(datetime, FLOOR(CONVERT(float, GETDATE())))
                 AND c.EndDateTimeView <  CONVERT(datetime, FLOOR(CONVERT(float, GETDATE())) + 1)
                )
          )

Open in new window

I also never use between for date ranges, and I notice that in one case you don't either. so I have applied that to the other.

Personally I try to avoid NOT(x=y) if I can as I just find it confusing
&do note that  neither approach would not locate r.ntid if it is NULL (I don't know if that is required though)

{+ edit}
having just re-looked at this, lines 7 to 14 appear to be redundant because of c.EndDateTimeView >= '2012-01-01'
0
 

Author Comment

by:newjeep19
ID: 40633722
Thank you for your response. How do I"revise the joins so that these no longer multiply the number of rows. "? I am doing a inner join....do I need to change to an outer join IE Left or right? Sorry just trying to learn....not that strong on the SQL side......thanks
0
 
LVL 40

Expert Comment

by:Kyle Abrahams
ID: 40633781
Joins are really about what you're matching on.

a left or right outer join could potentially bring in more rows.

You really need to look at why the values are different and decide what to do with them.
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 40634142
>."How do I "revise the joins so that these no longer multiply the number of rows. "?

ANY join type can expand the number of rows, this is deliberate and not a bug. It is dependant on the data.
ANY one of, or any combination of, the 5 tables you join could be expanding the number of row.

I did supply an article reference for that: Select Distinct is returning duplicates. That article discusses using GROUP BY and aggregate functions such as MAX() . It also links to another article about the issues you face DISTINCT and GROUP BY... and why does it not work for my query?

One approach is to identify which of the tables is expanding the number of rows

do this by using
select count(*) from table1
where ...

then
select count(*) from table1 inner join table2 on ....
where ...

then
select count(*) from table1 inner join table2 on .... inner join table3 on ....
where ...

etc.

as soon as you see the count of rows increase, you have a table that expands the row numbers.

note, you may have to repeat this once you solve a table, because the next table may do the same thing.

Carefully consider what information you need from that table.

then instead of just joining the whole table,

         inner join abc on t1.id = abc.fk

join to a "derived table" instead e.g.

        inner join (select abc.fk , count(*) as x from abc group by  abc.fk) as D1 on t1.id = D1.fk

by the way, this is not always "one to one" you may need to reduce 2 or more joined tables into a single "derived table".  

all examples above are deliberately simple
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 40634157
oh, an alternate approach is to reduce the number of columns, e.g. start like this
SELECT DISTINCT
		r.ntid AS 'Student ID'


FROM    dbo._registration AS r 
		INNER JOIN wcr_staff.dbo.Affiliations AS a ON a.REP_NT_ID = r.ntid 
		INNER JOIN dbo._course AS C ON C.CourseID = r.courseNumber 
		INNER JOIN wcr_ag.dbo.tblA AS AG ON AG.AGN_A_NBR = a.A_NBR
		LEFT OUTER JOIN wcr_staff.dbo.Staff AS s ON s.SALES_PARTY_ID = A.SALES_PARTY_ID
		LEFT OUTER JOIN WC_Policy_Universe.dbo.['List2014$'] WC ON AG.AGN_NBR = WC.[A ID]
WHERE r.ntid <> ' '
      AND AG.AGN_STATUS_CD = 3
      AND AG.Type = 'Primary'
      AND a.IsActive = 1
      AND (
            c.EndDateTimeView >= '2012-01-01'
            OR 
                (     c.EndDateTimeView >= DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()) + 31, 0)
                  AND c.EndDateTimeView <  DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()) + 32, 0)
                )
            OR 
                (    c.EndDateTimeView >= CONVERT(datetime, FLOOR(CONVERT(float, GETDATE())))
                 AND c.EndDateTimeView <  CONVERT(datetime, FLOOR(CONVERT(float, GETDATE())) + 1)
                )
          )

Open in new window


and keep add one (or just a few) column(s) at a time until you see "duplicates" appear.
Joins to the table(s) that/those columns come from need attention.

You have succeeded when you can remove "DISTINCT" and still get the results you need.
0
 

Author Comment

by:newjeep19
ID: 40634421
OK When I start to add the addtional tables it start to increase the count......I get that. However, I need data from those other tables. I understand that using DISTINCT and Group By dose not really work. However, I need to return data that is not duplicated. I have searched all over the web trying to fugure out how to get data from differnt tables and database using diffent joins, sub select, and unions. I also tried using your example of creating a derived table and the other suggetion of creating a temp table and I am getting errors (i.e. using the derived table)  or still getting duplicate data (using the other methods).  Again, I know what is causing the duplication.......I just don't know is how to stop the data to stop duplicating.

EX. When I run query w/just the first table (registration table) I get 1911 rows retruned.....when I add the WC_Policy_Universe.dbo.['List2014$'] database/table it increases to 1943 (the row count goes up by 33 rows).  

So, how do I stop the extra 33 rows from returning in my query?
0
How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

 
LVL 48

Expert Comment

by:PortletPaul
ID: 40634534
>>"So, how do I stop the extra 33 rows from returning in my query?"

Words just don't work for this. Both the cause and solution relate to "the data" and because I cannot see your data I cannot tell you exactly what to do. I have pointed to some techniques.

when you join to that table, it is to access wc.[First Name]  wc.[Last Name]  and wc.[Email Address]

      Line 13:                   WHEN WC.[Last Name] IS NOT NULL THEN WC.[Last Name]
      Line 13:                   WHEN WC.[Last Name] IS NOT NULL THEN WC.[Last Name]
      Line 17:                   WHEN WC.[First Name] IS NOT NULL THEN WC.[First Name]
      Line 17:                   WHEN WC.[First Name] IS NOT NULL THEN WC.[First Name]
      Line 29:                   --WHEN WC.[EMAIL ADDRESS] IS NOT NULL THEN WC.[EMAIL ADDRESS]
      Line 29:                   --WHEN WC.[EMAIL ADDRESS] IS NOT NULL THEN WC.[EMAIL ADDRESS]
      Line 31:                   ELSE WC.[A ID]+ '@company.COM'
      Line 69:             LEFT OUTER JOIN WC_Policy_Universe.dbo.['List2014$'] WC ON AG.AGN_NBR = WC.[A ID]

so, a derived table for this MIGHT be:

LEFT OUTER JOIN (
            select distinct
             [A ID], [First Name],  [Last Name] , wc.[Email Address]
            from WC_Policy_Universe.dbo.['List2014$']
      ) WC  ON AG.AGN_NBR = WC.[A ID]

OR, perhaps you need "the most recent row" determined by a [last modified] field that exists in that table, so you need a more complex derived table such as:

LEFT OUTER JOIN (
                        select [A ID], [First Name],  [Last Name] , wc.[Email Address]
                        from (
                                           select  [A ID], [First Name],  [Last Name] , wc.[Email Address]
                                                    , row_number() over(partition by  [A ID] ORDER BY  [last modified] DESC as rn
                                  from WC_Policy_Universe.dbo.['List2014$']
                                          ) as INQ
      ) WC  ON AG.AGN_NBR = WC.[A ID] AND WC.RN = 1
0
 

Author Comment

by:newjeep19
ID: 40636233
OK thanks for response.
Below is my query using the derived table:
 SELECT 
		CASE
			WHEN a.IsActive = 1 OR AG.AGN_STATUS_CD = 3 THEN r.ntid
		END AS 'Student ID',  
        CASE 
			WHEN  a.IsActive = 1 AND r.ntid LIKE '%s%' OR r.ntid LIKE '%swa%' OR r.ntid LIKE '%sid%' OR r.ntid LIKE '%shi%' OR r.ntid LIKE '%sor%' OR r.ntid LIKE '%sak%'  OR r.ntid LIKE '%sco%'
			THEN 'L' 
		END AS 'Student Type Dept', 
		CASE 
			WHEN AG.AGN_STATUS_CD = 3 AND WC.[Last Name] IS NOT NULL THEN WC.[Last Name]
		END AS 'A Last Name', 
		CASE 
			WHEN AG.AGN_STATUS_CD = 3  AND WC.[First Name] IS NOT NULL THEN WC.[First Name] 
		END AS 'A First Name', 
		CASE 
			WHEN a.IsActive = 1 AND s.LAST_NM IS NOT NULL THEN s.LAST_NM 
		END AS 'L Last Name', 
		CASE 
			WHEN a.IsActive = 1 AND s.FIRST_NM IS NOT NULL THEN s.FIRST_NM  
		END AS 'L First Name', 
		CASE
			WHEN AG.Email IS NOT NULL THEN AG.Email + '@company.COM'
		END AS 'Email',
		c.EndDateTimeView,
		r.dateTimeRegistered,
		 CASE 
      WHEN a.IsActive = 1 AND r.ntid LIKE '%s%' OR r.ntid LIKE '%swa%' OR r.ntid LIKE '%sid%' OR r.ntid LIKE '%shi%' 
	  OR r.ntid LIKE '%sor%' OR r.ntid LIKE '%sak%' OR r.ntid LIKE '%sco%' 
	  THEN
		CASE
			WHEN r.attended = 'true' AND c.Name LIKE '%Associates%'  
			     OR c.Name LIKE '%Day 3%'
				 OR c.Name LIKE '%Property University%' 
			THEN 'Attended ' + c.Name 
			ELSE 'Need s to attended ' + c.Name
		END 
	END AS 'Attended',
	CASE
		WHEN a.IsActive = 1 AND s.StartDate IS NOT NULL THEN s.StartDate
		ELSE '2001-01-01 00:00:00.0000000'
	END AS 'Start Date'
FROM    (
			      select distinct ntid, attended, dateTimeRegistered, courseNumber from dbo._registration) AS r 
		LEFT OUTER JOIN (
				  select distinct REP_NT_ID, IsActive, A_NBR, SALES_PARTY_ID from wcr_staff.dbo.Affiliations) AS a ON a.REP_NT_ID = r.ntid 
		LEFT OUTER JOIN (
				 select distinct EndDateTimeView, Name, CourseID, StartDateTimeView from dbo._course) AS C ON r.courseNumber = C.CourseID 
		LEFT OUTER JOIN (
				select distinct Email, AGN_A_NBR, AGN_STATUS_CD from wcr_agents.dbo.tblAgents) AS AG ON AG.AGN_AGENT_NBR = a.A_NBR
		LEFT OUTER JOIN (
		      select distinct LAST_NM , FIRST_NM, SALES_PARTY_ID, StartDate from wcr_staff.dbo.Staff) AS s ON A.SALES_PARTY_ID = s.SALES_PARTY_ID 
		LEFT OUTER JOIN ( 
            select  distinct [A ID], [First Name], [Last Name] from WC_Policy_Universe.dbo.['List2014$']
      ) WC ON AG.AGN_AGENT_NBR = WC.[A ID]

WHERE (C.Name LIKE '%Bachelors%' OR C.Name LIKE '%Associates%' 
	   OR C.Name LIKE '%Property University%' OR C.Name LIKE '%Day 3%')
	   AND r.ntid <> ' '
	   AND a.IsActive = 1
       AND (
             c.StartDateTimeView >= DATEADD(yyyy, -2, GETDATE()) 
           )
ORDER BY a.A_NBR ASC

Open in new window

I am still getting duplicate data seebelow:
scofvx0e	L	H	ROBERT	S	AMANDA	A001937@allstate.COM	2015-04-16 17:00:00.000	2015-02-12 13:20:01.000	Attended Advanced University	2015-01-07 00:00:00.0000000
scofvx0e	L	H	ROBERT	S	AMANDA	A001937@allstate.COM	2015-05-19 17:00:00.000	2015-02-12 13:26:32.000	Need s to attended  Financial University - Associates	2015-01-07 00:00:00.0000000
scofvx0e	L	H	ROBERT	S	AMANDA	A001937@allstate.COM	2015-02-25 17:00:00.000	2015-02-12 13:01:55.000	Attended Sales University: Day 3 Sales	2015-01-07 00:00:00.0000000

Open in new window

Still not sure how to stop duplicate data.....
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 40636505
Oh dear. That query has taken some advice and gone ballistic with it :)

You almost certainly don't need so many derived tables, but as I cannot use your data I cannot tell you which are needed or not needed.

FROM    (
			      select distinct ntid, attended, dateTimeRegistered, courseNumber from dbo._registration) AS r 
		LEFT OUTER JOIN (
				  select distinct REP_NT_ID, IsActive, A_NBR, SALES_PARTY_ID from wcr_staff.dbo.Affiliations) AS a ON a.REP_NT_ID = r.ntid 
		LEFT OUTER JOIN (
				 select distinct EndDateTimeView, Name, CourseID, StartDateTimeView from dbo._course) AS C ON r.courseNumber = C.CourseID 
		LEFT OUTER JOIN (
				select distinct Email, AGN_A_NBR, AGN_STATUS_CD from wcr_agents.dbo.tblAgents) AS AG ON AG.AGN_AGENT_NBR = a.A_NBR
		LEFT OUTER JOIN (
		      select distinct LAST_NM , FIRST_NM, SALES_PARTY_ID, StartDate from wcr_staff.dbo.Staff) AS s ON A.SALES_PARTY_ID = s.SALES_PARTY_ID 
		LEFT OUTER JOIN ( 
            select  distinct [A ID], [First Name], [Last Name] from WC_Policy_Universe.dbo.['List2014$']
      ) WC ON AG.AGN_AGENT_NBR = WC.[A ID]

Open in new window


Please, going forward, do not apply this to every query you produce as they will probably perform very slowly.
--------------------------
--------------------------
--------------------------

Before going any further it is imperative you understand what the term "distinct" actually means in SQL.

"SELECT DISTINCT" creates unique ROWS
& absolutely every column of every row is considered when determining if a row is unique.

it only takes the smallest difference in the data for a row to be unique. For example the time of day
if one row has a time of     09:12:35 +09876
and another has a time of 09:12:35 +09886

they are different
they will be on different rows!
 (and by the way you might not even see those milliseconds displayed)

-----------------------------------------
Those 3 rows you display above ARE DISTINCT!!!!!!!!!!!!!!!!!!!!!

No amount of derived tables is going to change the definition of distinct


This is almost where we started. I displayed some differences in some rows and explained that they are distinct.

You have to decide what to do with each of these differences.
e.g.
for the datetime values you could     CAST( R.dateTimeRegistered as date)    then the time portion will be ignored

for the text however the options are much harder, this case expression is producing DIFFERENT VALUES
so you will get DIFFERENT ROWS

      WHEN a.IsActive = 1 AND r.ntid LIKE '%s%' OR r.ntid LIKE '%swa%' OR r.ntid LIKE '%sid%' OR r.ntid LIKE '%shi%'
        OR r.ntid LIKE '%sor%' OR r.ntid LIKE '%sak%' OR r.ntid LIKE '%sco%'
        THEN
            CASE
                  WHEN r.attended = 'true' AND c.Name LIKE '%Associates%'  
                       OR c.Name LIKE '%Day 3%'
                         OR c.Name LIKE '%Property University%'
                  THEN 'Attended ' + c.Name
                  ELSE 'Need s to attended ' + c.Name
            END
      END AS 'Attended',

I don't think you have achieved much (or any) value from those derived tables, every one of them using distinct.
Each time you use distinct it slows the query down, so using it should only be done when needed.

You have differences in the rows. Select distinct will not magically solve those differences.
Look through the rows
find the differences
consider the column that holds those different values
make a decision about what to do for each column (e.g. cast datetime to date, accept the difference)
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 40636534
Also, just so this isn't the cause of misunderstandings, SQL will repeat information. It does this deliberately, it is not a bug or some annoyance designed to frustrate you.

So, please don't expect a query to "suppress" repeated information

SQL does repeat information, this is normal (and necessary)

scofvx0e	L	H	ROBERT	Attended Advanced University
scofvx0e	L	H	ROBERT	Need s to attended  Financial University - Associates
scofvx0e	L	H	ROBERT	Attended Sales University: Day 3 Sales

SQL does  NOT  do the following

scofvx0e	L	H	ROBERT	Attended Advanced University
                                        Need s to attended  Financial University - Associates
                                        Attended Sales University: Day 3 Sales

Open in new window

0
 

Author Comment

by:newjeep19
ID: 40639488
OK thank you for your explanation. So, basically in SQL I cannot get the results to return as below:

scofvx0e      L      H      ROBERT      Attended Advanced University, Need s to attended  Financial University – Associates,  Attended Sales University: Day 3 Sales

The above is what I am actually looking to achieve in my query. I will look to see if that is possible using code in the application instead of SQL. I appreciate your time and effort in explaining how SQL works.
0
 
LVL 48

Accepted Solution

by:
PortletPaul earned 500 total points
ID: 40640401
I didn't say precisely that

You had not referenced a desire to have that text column as a comma separated string, which is different again to what I was trying to explain above. There is a technique for achieving a comma separated string (e.g. this PAQ)

However it isn't just one column that is producing the multiple rows. If you look at the yellow highlight (of image in ID: 40627452) above it is 2 columns (at least).

So, the rows you are getting ARE DISTINCT, and you need to decide what to do for all of the columns that are causing the rows to be different

Putting a column into  comma separated string is one of those decision options, or concatenating 2 columns into one comma separated string is another of those options (however will a multi-part comma separated string really be useful ?)

If you have the ability to re-process the source rows in the "presentation layer" then that is quite likely to be the best solution.

----
just as a small point:
now you know how "distinct" works, in future think of this type of problem as "unwanted repetition" :)
I find this helps as "select distinct" is often NOT the answer :)
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 40640423
By the way, all those "derived tables" each one using "select distinct": will make the query slow and it didn't solve the problem you  are facing.

IF any of them DO help reduce the row repetition it can remain, BUT I suspect they do nothing useful and they should be removed in favour of just joining to the table.


-----
(it also wasn't what I intended at all)

What I had intended was using derived tables "when it is appropriate" to solving the dilemma you are facing (and to do this selectively) not to every individual table.
0
 

Author Comment

by:newjeep19
ID: 40699264
Thanks
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 40699361
Is this resolved now?
0
 

Author Comment

by:newjeep19
ID: 40699387
yes thank you for your help. I gave you an A w/500 points very educational
0

Featured Post

Free Gift Card with Acronis Backup Purchase!

Backup any data in any location: local and remote systems, physical and virtual servers, private and public clouds, Macs and PCs, tablets and mobile devices, & more! For limited time only, buy any Acronis backup products and get a FREE Amazon/Best Buy gift card worth up to $200!

Join & Write a Comment

Suggested Solutions

Everyone has problem when going to load data into Data warehouse (EDW). They all need to confirm that data quality is good but they don't no how to proceed. Microsoft has provided new task within SSIS 2008 called "Data Profiler Task". It solve th…
This article explains how to reset the password of the sa account on a Microsoft SQL Server.  The steps in this article work in SQL 2005, 2008, 2008 R2, 2012, 2014 and 2016.
Via a live example, show how to extract information from SQL Server on Database, Connection and Server properties
Via a live example, show how to backup a database, simulate a failure backup the tail of the database transaction log and perform the restore.

744 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now