Avatar of mike1142
mike1142
 asked on

Last Row Within A Row

Such good luck I will try another.

I have a bunch of People and each person can have 0 to many visits and each visit can have 0 to many tasks.
Table 1 looks like this
Col1    	Col 2	
personID	Name	etc.

Table 2 looks like
Col 1   	Col 2   	Col 3
personID	VisitID 	VisitDate	etc

Table 3
Col 1  	Col 2    	Col 3   	Col 4   	Col 5	Col 6
VisitID	TaskSeqNum	TaskDate	TaskTime	Value1	Value2

Open in new window

I want the last TaskSeqNum for the last visitID for a set of personID's where value 1 and value 2 are not null (they can still have value 3 and value 4 but I don't care) and the TaskDate of the TaskSeqNum cannot be more than a year from a given date range. Also value 1 and 2 are integers and Value 1 must be less than X AND value 2 must be less then Y. If they don't have a task I don't want them. They will almost assuredly have a visit but if they don't I don't want them.

You cannot make the assumption that any of the keys will be sequential but they should be (personID VisitID TaskSeqNum are varchar)

I need a T-SQL query that will do this. I have seen a potential solution using ROW_NUMBER and RANKING but I don't understand it. I don't want to muddy the waters so I will not post the potential.

Thanks
Microsoft SQL Server 2008SQL

Avatar of undefined
Last Comment
mike1142

8/22/2022 - Mon
Olaf Doschke

Chop a problem into small parts and it'S easier to solve.

First the last visit of a all persons can be determined by

Select personid, max(visitdate) as lastvisitdate from table2 group by personid

Knowing that date with person the visit id of that visit is

Select lv.personid, table2.visitid from table2 inner join
(Select personid, max(visitdate) as lastvisitdate from table2 group by personid) lv
on lv.personid = table2.personid and lv.lastvisitdate = table2.visitdate

lv here is short for last visit.

Next the last taskseqnum of a visitid is a partial problem similar to the last visit date, only this time you have split date and time. A bad design, if you ask me, To be able to determine max dateteim we need to combine Taskdate and Tasktime to a datetime value, and whether and how to combine them depends on their type, which isn't given. If you tell us, wht those fields are, we can surely write an expression for that. The following query assumes a single field taskdatetime, which you would need to replace with the expression:

Select visitid, max(taskdatetime) as lasttaskdatetime from table3 group by visitid

We get the taskseqnum of that last task datetime the same way we got the visitid of the last visit:

Select lt.visitid, table3.taskeseqnum from table3 inner join
(Select visitid, max(taskdatetime) as lasttaskdatetime from table3 group by visitid) lt
on lt.visitid = table3.visitid and lt.lasttaskdatetime = table3.lasttaskdatetime

lt as in last task.

You also had conditions for this, we add them in the inner query:

Select lt.visitid, table3.taskeseqnum from table3 inner join
(Select visitid, max(taskdatetime) as lasttaskdatetime from table3 where not value1 is null and not value2 is null and value1<x and value2<y group by visitid) lt
on lt.visitid = table3.visitid and lt.lasttaskdatetime = table3.lasttaskdatetime

Now we need to bring these two queries together as inner join on their visitids:

Select lvop.personid, ltov.taskseqnum from
(Select lv.personid, table2.visitid from table2 inner join
(Select personid, max(visitdate) as lastvisitdate from table2 group by personid) lv
on lv.personid = table2.personid and lv.lastvisitdate = table2.visitdate) lvop
inner join
(Select lt.visitid, table3.taskeseqnum from table3 inner join
(Select visitid, max(taskdatetime) as lasttaskdatetime from table3 where not value1 is null and not value2 is null and value1<x and value2<y group by visitid) lt
on lt.visitid = table3.visitid and lt.lasttaskdatetime = table3.lasttaskdatetime) ltov
on lvop.visitid = ltov.visitid

Open in new window

(lvop: last visit of person, ltov: last task of visit)

Without any data it's hard to judge, if this will give you what you want. Also I can't tell an expression to make a datetime from taskdate and tasktime, not knowing their field types. If any visit is on a single date only, you could take just the tasktime, perhaps.

Bye, Olaf.
SOLUTION
Aaron Tomosky

Log in or sign up to see answer
Become an EE member today7-DAY FREE TRIAL
Members can start a 7-Day Free trial then enjoy unlimited access to the platform
Sign up - Free for 7 days
or
Learn why we charge membership fees
We get it - no one likes a content blocker. Take one extra minute and find out why we block content.
Not exactly the question you had in mind?
Sign up for an EE membership and get your own personalized solution. With an EE membership, you can ask unlimited troubleshooting, research, or opinion questions.
ask a question
SOLUTION
Log in to continue reading
Log In
Sign up - Free for 7 days
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
mike1142

ASKER
OK, so I tried the fiddle thing and it would not give me a link. Then when I went back it would not show my "fiddle history". There was an error in the inserts but I could not find it.

Here is the schema (I saved it to a test file thankfully)

CREATE TABLE [dbo].[PersonAccounts](
	[DatabaseID] [varchar](3) NOT NULL,
	[PersonID] [varchar](30) NULL,
	[AccountID] [varchar](44) NOT NULL,
	[AccountNumber] [varchar](12) NULL,
	[Name] [varchar](30) NULL,
 CONSTRAINT [pk_201] PRIMARY KEY CLUSTERED 
(
	[DatabaseID] ASC,
	[AccountID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

CREATE TABLE [dbo].[PersonVisits](
	[DatabaseID] [varchar](3) NOT NULL,
	[PersonID] [varchar](25) NOT NULL,
	[VisitID] [varchar](45) NOT NULL,
	[VisitDate] [varchar](12) NULL,
	[VisitSeqID] [varchar](18) NULL,
 CONSTRAINT [mtpk_cs551919] PRIMARY KEY CLUSTERED 
(
	[DatabaseID] ASC,
	[PersonID] ASC,
	[VisitID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

CREATE TABLE [dbo].[PersonTaskSets](
	[DatabaseID] [varchar](3) NOT NULL,
	[VisSeqID] [varchar](12) NOT NULL,
	[TaskSetSeqID] [varchar](8) NOT NULL,
	[Value1] [decimal](20, 7) NULL,
	[Value2] [int] NULL,
	[Value3] [int] NULL,
	[Value4] [decimal](20, 7) NULL,
	CONSTRAINT [mtpk_cs551963] PRIMARY KEY CLUSTERED 
(
	[DatabaseID] ASC,
	[VisSeqID] ASC,
	[TaskSetSeqID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

Open in new window


And here is some sample data and a select that should link every thing together.

INSERT INTO [dbo].[PersonAccounts]
(DatabaseID,PersonID,AccountID,AccountNumber,Name)
VALUES
('XXX','B123456','AA12346','D12321','Sally,Silly'),
('XXX','B123788','AA46546','D44455','Mike,Manic')

GO

INSERT INTO [dbo].[PersonVisits]
([DatabaseID],[PersonID],[VisitID],[VisitDate],[VisitSeqID])
VALUES
('XXX','B123456','SOME|SILLY|THING|FROM|LEGACY|SYSTEM','20130521','123465789'),
('XXX','B123456','SOME|SILLY|THING|FROM|LEGACY|SYSTEM','20131215','777444844')

GO

INSERT INTO [dbo].[PersonTaskSets]
([DatabaseID],[VisSeqID],[TaskSetSeqID],[Value1],[Value2],[Value3],[Value4])
VALUES
('XXX','123465789',1,NULL,300,45,NULL,NULL),
('XXX','123465789',2,120.555,NULL,NULL,NULL),
('XXX','777444844',1,NULL,55,12,NULL,NULL)

GO
	
SELECT * FROM
[dbo].[PersonAccounts] AS PA
INNER JOIN
[dbo].[PersonVisits] AS PV
ON PA.DatabaseID = PV.DatabaseID
AND PA.PersonID = PV.PersonID
INNER JOIN
[dbo].[PersonTaskSets] AS PTS
ON PA.DatabaseID = PTS.DatabaseID
AND PV.VisSeqID = PTS.VisSeqID

GO

Open in new window


I need Sally's last task which has a value2 and 3 (nulls in those columns i do not care about) within a year of a date range (suppose I should use the start date?) and BOTH value2 and value3 must be < 2 separate integers.

Hope this helps.
ASKER CERTIFIED SOLUTION
Log in to continue reading
Log In
Sign up - Free for 7 days
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
Olaf Doschke

I don't see the TaskDate and TaskTime columns you mentioned in your question. If we shouldn't assume the IDs to be sequential, how should we then determine the last task of a last visit?

From the last data you posted the TaskSetSeqID could be used instead of the 'virtual' taskdatetime I used in my query.

Bye, Olaf.
Your help has saved me hundreds of hours of internet surfing.
fblack61
SOLUTION
Log in to continue reading
Log In
Sign up - Free for 7 days
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
Scott Pletcher

To me, taking it step by step makes the query harder to read (and likely processes less efficiently as well).

I didn't know what values you wanted for the conditions you specified earlier, so I just created the appropriate variables but didn't actually include WHERE conditions in the code.  ROW_NUMBER() really is extremely valuable here, though, so I used it as well.


DECLARE @given_date_range_begin datetime
DECLARE @given_date_range_end datetime
DECLARE @value1_cap int
DECLARE @value2_cap int

SELECT pa.AccountID, pa.AccountNumber,
    subquery.personID, subquery.VisSeqID, subquery.TaskSetSeqID, subquery.Value1, subquery.Value2
FROM (
    SELECT pv.personID, pts.VisSeqID, pts.TaskSetSeqID, pts.Value1, pts.Value2,
        ROW_NUMBER() OVER(PARTITION BY pv.personID ORDER BY pv.VisitDate DESC, pv.VisitSeqID DESC, pts.TaskSetSeqID DESC) AS row_num
    FROM dbo.PersonTaskSets pts
    INNER JOIN dbo.PersonVisits pv ON
        pv.VisitSeqID = pts.VisSeqID
    --WHERE
        /*
        p2.VisitDate >= DATEADD(YEAR, -1, @given_date_range_begin) AND
        p2.VisitDate <= DATEADD(YEAR, +1, @given_date_range_end) AND
        */
        /*
        pts.Value1 < @value1_cap AND
        pts.Value2 < @value2_cap
        */
) AS subquery
INNER JOIN dbo.PersonAccounts pa ON
    pa.PersonID = subquery.personID
WHERE
    subquery.row_num = 1
Mark Wills

Well, don't be scared of CTE. CTE stands for Common Table Expression. All that is really doing is giving a name to a (sub)query which can be reused by referring to its name. True, it has a great advantage in that it can be recursive, but it doesn't have to.

And the optimizer works really well with well formed CTE's

Which brings me to another couple of questions / observations...
1) Looking at your compound unique keys, I believe that some of those ID's might not be unique (e.g. mtpk_cs551919)
2) Looking at your *real* tables, we would like to verify what "last" means
3) Whilst happy with my earlier query, it does need some attention given the added information you have posted and think I need to include things like databaseID (and others per point 1) to make sure the correct rows are selected and indexes can be used effectively.

So, if you would mind casting your eye over the above points, then I can respond with a more optimised query.
mike1142

ASKER
CTEs are good I like them I think I understand most of this but I have another similar questions and I think ROW/RANK is part of the answer but I think I should ask it in another question.

I was chastised about the nested query solution because of "performance issues"

Thanks all. How do I get you guys to follow me to another questions if you are able?
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
Aaron Tomosky

Post it in your closing remarks on this question.
mike1142

ASKER
Hope for taking so long. I hope I got everyone fairly.
Mark Wills

Well... Nearly... Pauls answer was a simple rehash of my code. And the subquery is basically the same as the CTE without using the CTE.

So, was wondering what the problem was ?
I started with Experts Exchange in 2004 and it's been a mainstay of my professional computing life since. It helped me launch a career as a programmer / Oracle data analyst
William Peck
PortletPaul

I agree, my contribution - which clearly identifies Mark's work as input - merely offered an alternative without CTE, and some updates to the fields that emerged.
Mark Wills

Well, that is entirely the prerogative and privilege of the Asker as to re-opening or not.

I know my query works fine (for the original conditions)
I know there was some concerns back at the office with being "chastised about the nested query"
I know that the original requirement was subsequently superseded, and I was awaiting further clarification.
I know that queries posted after the example are probably wrong because of the keys and uniquely identifying rows  (as evidenced by including databaseID in the Askers own example, and should be part of the partition and absent in subsequent posts)

What I don't know is what I could have done differently to have my work at least considered or in contention considering the row_number() approach is pivotal in other solutions.

I still believe there is some fine tuning and optimisation needed and was a little surprised with how it was closed and believed there is more work needed to do the problem justice. But, as I said, that is the prerogative of the Asker and possibly convenient due to possible lock out..
mike1142

ASKER
Wow, I am really sorry. There were definite major contributors and I thought I covered that well but to be honest I just came back to this question to look at PortletPaul's solution and implement it in my final production code. It was the easiest for me to understand and to duplicate. There was no further information needed as I was successful in implementing this in a test environment.

Please tell me what I need to do or where you thought, given this additional information, I went wrong.
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
Mark Wills

No problems. Now I know...
mike1142

ASKER
Mark, Paul and Admin

The point distribution is indeed incorrect. I think I need to do it again.
mike1142

ASKER
OK, I hope this is OK now. I used the CTE solution originally suggested by Mark but Paul's was marked best because of clarity and will be easiest to reference in the future.

When doing the points the site does an odd thing every time you mark a comment.  It  jumps to the bottom of the page to "Grade The Solution" and I lose my place. I will try and be more careful. Thank you gentlemen.
All of life is about relationships, and EE has made a viirtual community a real community. It lifts everyone's boat
William Peck