Link to home
Start Free TrialLog in
Avatar of sej69
sej69

asked on

Duplicate records on a SQL query using left joins

Ok, I know this is pretty much by design; I'm just trying to find a way around it for this specific instance...

In short, (and I guess expected) I get multiple records returned when I call a query with an inner join.  But in this case I do not want to know what's on the 2nd table; I just want to know it exists.

MSSQL 2008:

tblMain
pkMainID, int
szName, string(20)
...

tblDates
pkDateID
fkMainID
dtOpen  (datetime)
dtClose (datetime)

In my case I want to be able to search the dates for open/close records without pulling up the dates.  in my C# app, I handle this when the individual record is opened.  It would be way too much information to pull in during the initial query.

I need to be able to search the dates for open records and closed records.  Eg.  if dtOpen > 2010-05-11' and dtClose = '1900-01-01'  would mean that the record was opened but not closed yet.  I also need to search the dtOpen and dtClose for specific dates.

What I have for my query looks something like this:
SELECT DISTINCT szName, pkID, DTS.dtOpen, DTS.dtClose
FROM
tblMain (LEFT JOIN tblDates as DTS on tblMain.pkMainID = DTS.fkMainID)
WHERE
DTS.dtOpen = '2011-05-11' AND DTS.dtOpen = '1900-01-01' .... AND other qualifications

If the tblDate has more than one record, the SQL will give me multiple records for each which I don't want.  Does anyone know how I can use that table for checking but not reporting?

TIA!
Avatar of Christopher Gordon
Christopher Gordon
Flag of United States of America image

You can use a subquery.  

Select *
from basetable
where pkMainId in (select fkMainId from detailTable where dtOpen between @startDate and @endDAte)
SELECT m.szName, m.pkID, MAX(d.dtOpen), MAX(d.dtClose) 
FROM tblMain as m
	LEFT JOIN tblDates as d
		ON m.pkMainID = d.fkMainID
GROUP BY m.szName, m.pkID
HAVING MAX(d.dtOpen) = '2011-05-11' AND MAX(d.dtOpen) = '1900-01-01' 

Open in new window

Avatar of lcohan
I suggest you check out the IF EXISTS, INTERSECT, EXCEPT in SQL and build your queries with CASE if needed.
WEell, that poses another question.  Suppose you have a row in the tblDates for pkMainID = 1 with dtOpen = '2011-05-11' and a dtdtClose of '2011-05-11' and then another row in that table for pkMainID = 1 with dtOpen = '2011-05-11' and a dtdtClose of '1900-01-01'.  Which one do you want to get back?

SQL_1 should provide a single row for a distinct dtOpen and dtClose combination for pkMainID;
SQL_2 should provide a single row for a pkMainID and a dtOpen but there is no guarantee on which dtClose will be used;
SQL_3 should provide a single row for a pkMainID and a dtClose but there is no guarantee on which dtOpendtClose will be used;
SELECT DISTINCT szName
  ,M.pkID
  ,DTS.dtOpen
  ,DTS.dtClose
FROM tblMain M
LEFT JOIN 
(
 SELECT  pkMainID
        ,dtOpen 
        ,dtClose
        ,row_number() OVER(PARTITION BY pkMainID, dtOpen, dtClose 
                           ORDER BY pkMainID, dtOpen, dtClose) AS Selector
 FROM   tblDates
) DTS
ON  M.pkMainID = DTS.fkMainID
WHERE DTS.Selector = 1
--    AND DTS.dtOpen = '2011-05-11'
--    AND DTS.dtOpen = '1900-01-01'
;

-- SQL_2
SELECT DISTINCT szName
  ,M.pkID
  ,DTS.dtOpen
  ,DTS.dtClose
FROM tblMain M
LEFT JOIN 
(
 SELECT  Top 1
         pkMainID
        ,dtOpen 
        ,dtClose
        ,row_number() OVER(PARTITION BY pkMainID, dtOpen 
                           ORDER BY pkMainID, dtOpen) AS Selector
 FROM   tblDates
) DTS
ON  M.pkMainID = DTS.fkMainID
WHERE DTS.Selector = 1
--    AND DTS.dtOpen = '2011-05-11'
--    AND DTS.dtOpen = '1900-01-01'
;

SELECT DISTINCT szName
  ,M.pkID
  ,DTS.dtOpen
  ,DTS.dtClose
FROM tblMain M
LEFT JOIN 
(
 SELECT  pkMainID
        ,dtOpen 
        ,dtClose
        ,row_number() OVER(PARTITION BY pkMainID, dtClose 
                           ORDER BY pkMainID, dtClose) AS Selector
 FROM   tblDates
) DTS
ON  M.pkMainID = DTS.fkMainID
WHERE DTS.Selector = 1
--    AND DTS.dtOpen = '2011-05-11'
--    AND DTS.dtOpen = '1900-01-01'

Open in new window

Avatar of sej69
sej69

ASKER

With this app, the data will in inconsistent if the open and close are on the same dates as well as having the same open or same close date for the record.

These are events that start one day and end another, then some time later (And always later) the even will be reopened.

Also for the records, all I need is the ID of the record that contains the dates requested.  I don't care if the date is pulled in the initial query because I run another query if the user wants full detail on the record and clicks on it.  At that point the query goes out and pulls all dates for the listed pkMainId...

All I need is to search on the dates.

I read in a couple of other places about using HAVING but I'm "having" a hard time getting the syntax right for MSSQL 2008.  I keep getting an incorrect syntax near the keyword 'having'.
Avatar of sej69

ASKER

This is the line that is causing me issues:

ORDER BY Event DESC  HAVING  MAX(CD.dtOpened) = '2011-03-23'
It should be other way.
HAVING  MAX(CD.dtOpened) = '2011-03-23' ORDER BY Event DESC

Open in new window

Avatar of sej69

ASKER

Ok, I finally figured out what I was doing with the HAVING clause... I needed to group by all the selected columns in the query before the having...

However, the query duplicated the dates in the all the rows returned...  Every record has the MAX(CD.Opened) date of '2011-03-23' that I used in the query...
Avatar of sej69

ASKER

8080, I tried your number 1 but the query seems to have put the sql server management studio into limbo...(not responding) When I looked at the task manager the query was eating up memory really fast...
Can you provide some sample data with expected result?
>>When I looked at the task manager the query was eating up memory really fast...<<
That would probably mean that your table is not appropriately indexed.
Avatar of sej69

ASKER

This is my current query
SELECT DISTINCT 
nMatter Matter, szClaimant Claimant, szComments Comments, TATT.szInitials Attorney, CLT.szClientName Client, 
CLT.pkClientID ClientID, AOL.szName LawTypeName, fkAttorney, AOL.pkLawID Area, dtLoss DateOfLoss, 
szInsured Insured, szDestroyComment Destroy, MAX(CD.dtOpened) 
FROM 
tblCaseDates CD,(tblMatter 
LEFT JOIN tblAttorney as TATT on tblMatter.fkAttorney = TATT.pkAttorneyID) 
LEFT JOIN tblClient as CLT on tblMatter.fkClient = CLT.pkClientID 
LEFT JOIN tblAreaOfLaw as AOL on tblMatter.fkAreaOfLaw = AOL.pkLawID  
WHERE  pkLawID = '2'  
GROUP BY 
nMatter, szClaimant, szComments, TATT.szInitials, CLT.szClientName, CLT.pkClientID, 
AOL.szName, fkAttorney, AOL.pkLawID, dtLoss, szInsured, szDestroyComment 
HAVING  MAX(CD.dtOpened) = '2011-03-23' 
ORDER BY Matter DESC 

Open in new window


Matter	Claimant	Comments	Attorney	Client	ClientID	LawTypeName	fkAttorney	Area	DateOfLoss	Insured	Destroy	dtOpened
77778	George	Sue vs. George	AAA	Max	1234	Thisone	1818	2	2007-02-13	Sue	123	2011-03-23
55555	Frontmall	Nation Insurance      	BBB	Hanson	121	Thisone	1802	2	1900-01-01	Sue Smith	02-19-20	2011-03-23

Open in new window


I put the dtOpened up to see what was being returned.  The dtOpened is different for both records but yet it shows the same when the query is run.

I don't care about seeing dtOpened or dtClosed in the query results; I just want to search to find an event that had that date to include it in my results list.
Avatar of sej69

ASKER

I've got indexes on all fields in the main table, and on anything that is searched in the other tables (which is about everything...)
Uh . . . that query bears NO resemblence to the information in the original post.

Also, you appear to have a cartesian join between tblCaseDates and the rest of the tables involved in the query.  That would explain how this query would consume a lot fo memory.

Is this the query you really want debugged?  I

f so, since it does not resemble your original post:
What are you trying to accomplish with this query?
how does the tblCaseDates relate to at least oneof the other tables? (I am anticipating that it would be best to have it relate to the tblMatter table. ;-)
What is this query doing or not doing that is the problem?
>>you appear to have a cartesian join between tblCaseDates and the rest of the tables involved in the query<<
Yep.  Not to mention
The pointless use of DISTINCT when there is already a GROUP BY clause.
The LEFT JOIN tblAreaOfLaw AS AOL which is an implicit INNER JOIN as the following is in the WHERE clause: pkLawID = '2'
Avatar of sej69

ASKER

Yes, I was trying to keep the information as confidential as possible.  Right now, I just need to get it to work so I posted the whole thing.  I'm a very good programmer that is just 'ok' with SQL queries (but learning).  I like to add things to the search engine but I probably will not select this now when I close this case.  I have legal requirements that I need to follow on this side non-programming related... =/  

The issue I am having is the same as stated above.

tblCaseDates
 pkDateID
 fkMatterID

Where tblCaseDates.fkMatterID  = tblMatter.nMatter

Again, all I need to do is be able to query the dtOpened and dtClosed for specific dates; I don't need to display that data as I make an implicit query for that data later on if the user wants to look at the record.

In otherwords, if record has a date (dtOpened = 'xx/yy/zzz') in it's table and with the FK = to the matter ID then I want the record to show.  

The DISTINCT is "legacy" when I was experimenting with the effects.  The group by was just put in last night.

What do you mean by "implicit INNER JOIN"?  I have a feeling that shouldn't be the case.....

Thanks guys.
ASKER CERTIFIED SOLUTION
Avatar of Anthony Perkins
Anthony Perkins
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of sej69

ASKER

Ah, I see.  Yes, that is probably what I meant; I didn't know that syntax - thanks.

Any ideas on how I can get the data out in the format I want?
Please post your current query and the schema for all the tables involved.
Avatar of sej69

ASKER

THanks