We help IT Professionals succeed at work.

Last Row Within A Row

357 Views
Last Modified: 2014-06-03
Such good luck I will try another.

I have a bunch of People and each person can have 0 to many visits and each visit can have 0 to many tasks.
Table 1 looks like this
Col1    	Col 2	
personID	Name	etc.

Table 2 looks like
Col 1   	Col 2   	Col 3
personID	VisitID 	VisitDate	etc

Table 3
Col 1  	Col 2    	Col 3   	Col 4   	Col 5	Col 6
VisitID	TaskSeqNum	TaskDate	TaskTime	Value1	Value2

Open in new window

I want the last TaskSeqNum for the last visitID for a set of personID's where value 1 and value 2 are not null (they can still have value 3 and value 4 but I don't care) and the TaskDate of the TaskSeqNum cannot be more than a year from a given date range. Also value 1 and 2 are integers and Value 1 must be less than X AND value 2 must be less then Y. If they don't have a task I don't want them. They will almost assuredly have a visit but if they don't I don't want them.

You cannot make the assumption that any of the keys will be sequential but they should be (personID VisitID TaskSeqNum are varchar)

I need a T-SQL query that will do this. I have seen a potential solution using ROW_NUMBER and RANKING but I don't understand it. I don't want to muddy the waters so I will not post the potential.

Thanks
Comment
Watch Question

Olaf DoschkeSoftware Developer
CERTIFIED EXPERT

Commented:
Chop a problem into small parts and it'S easier to solve.

First the last visit of a all persons can be determined by

Select personid, max(visitdate) as lastvisitdate from table2 group by personid

Knowing that date with person the visit id of that visit is

Select lv.personid, table2.visitid from table2 inner join
(Select personid, max(visitdate) as lastvisitdate from table2 group by personid) lv
on lv.personid = table2.personid and lv.lastvisitdate = table2.visitdate

lv here is short for last visit.

Next the last taskseqnum of a visitid is a partial problem similar to the last visit date, only this time you have split date and time. A bad design, if you ask me, To be able to determine max dateteim we need to combine Taskdate and Tasktime to a datetime value, and whether and how to combine them depends on their type, which isn't given. If you tell us, wht those fields are, we can surely write an expression for that. The following query assumes a single field taskdatetime, which you would need to replace with the expression:

Select visitid, max(taskdatetime) as lasttaskdatetime from table3 group by visitid

We get the taskseqnum of that last task datetime the same way we got the visitid of the last visit:

Select lt.visitid, table3.taskeseqnum from table3 inner join
(Select visitid, max(taskdatetime) as lasttaskdatetime from table3 group by visitid) lt
on lt.visitid = table3.visitid and lt.lasttaskdatetime = table3.lasttaskdatetime

lt as in last task.

You also had conditions for this, we add them in the inner query:

Select lt.visitid, table3.taskeseqnum from table3 inner join
(Select visitid, max(taskdatetime) as lasttaskdatetime from table3 where not value1 is null and not value2 is null and value1<x and value2<y group by visitid) lt
on lt.visitid = table3.visitid and lt.lasttaskdatetime = table3.lasttaskdatetime

Now we need to bring these two queries together as inner join on their visitids:

Select lvop.personid, ltov.taskseqnum from
(Select lv.personid, table2.visitid from table2 inner join
(Select personid, max(visitdate) as lastvisitdate from table2 group by personid) lv
on lv.personid = table2.personid and lv.lastvisitdate = table2.visitdate) lvop
inner join
(Select lt.visitid, table3.taskeseqnum from table3 inner join
(Select visitid, max(taskdatetime) as lasttaskdatetime from table3 where not value1 is null and not value2 is null and value1<x and value2<y group by visitid) lt
on lt.visitid = table3.visitid and lt.lasttaskdatetime = table3.lasttaskdatetime) ltov
on lvop.visitid = ltov.visitid

Open in new window

(lvop: last visit of person, ltov: last task of visit)

Without any data it's hard to judge, if this will give you what you want. Also I can't tell an expression to make a datetime from taskdate and tasktime, not knowing their field types. If any visit is on a single date only, you could take just the tasktime, perhaps.

Bye, Olaf.
Aaron TomoskyDirector, SD-WAN Solutions
CERTIFIED EXPERT
Commented:
This one is on us!
(Get your first solution completely free - no credit card required)
UNLOCK SOLUTION
Mark WillsTopic Advisor, Page Editor
CERTIFIED EXPERT
Distinguished Expert 2018
Commented:
This one is on us!
(Get your first solution completely free - no credit card required)
UNLOCK SOLUTION

Author

Commented:
OK, so I tried the fiddle thing and it would not give me a link. Then when I went back it would not show my "fiddle history". There was an error in the inserts but I could not find it.

Here is the schema (I saved it to a test file thankfully)

CREATE TABLE [dbo].[PersonAccounts](
	[DatabaseID] [varchar](3) NOT NULL,
	[PersonID] [varchar](30) NULL,
	[AccountID] [varchar](44) NOT NULL,
	[AccountNumber] [varchar](12) NULL,
	[Name] [varchar](30) NULL,
 CONSTRAINT [pk_201] PRIMARY KEY CLUSTERED 
(
	[DatabaseID] ASC,
	[AccountID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

CREATE TABLE [dbo].[PersonVisits](
	[DatabaseID] [varchar](3) NOT NULL,
	[PersonID] [varchar](25) NOT NULL,
	[VisitID] [varchar](45) NOT NULL,
	[VisitDate] [varchar](12) NULL,
	[VisitSeqID] [varchar](18) NULL,
 CONSTRAINT [mtpk_cs551919] PRIMARY KEY CLUSTERED 
(
	[DatabaseID] ASC,
	[PersonID] ASC,
	[VisitID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

CREATE TABLE [dbo].[PersonTaskSets](
	[DatabaseID] [varchar](3) NOT NULL,
	[VisSeqID] [varchar](12) NOT NULL,
	[TaskSetSeqID] [varchar](8) NOT NULL,
	[Value1] [decimal](20, 7) NULL,
	[Value2] [int] NULL,
	[Value3] [int] NULL,
	[Value4] [decimal](20, 7) NULL,
	CONSTRAINT [mtpk_cs551963] PRIMARY KEY CLUSTERED 
(
	[DatabaseID] ASC,
	[VisSeqID] ASC,
	[TaskSetSeqID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

Open in new window


And here is some sample data and a select that should link every thing together.

INSERT INTO [dbo].[PersonAccounts]
(DatabaseID,PersonID,AccountID,AccountNumber,Name)
VALUES
('XXX','B123456','AA12346','D12321','Sally,Silly'),
('XXX','B123788','AA46546','D44455','Mike,Manic')

GO

INSERT INTO [dbo].[PersonVisits]
([DatabaseID],[PersonID],[VisitID],[VisitDate],[VisitSeqID])
VALUES
('XXX','B123456','SOME|SILLY|THING|FROM|LEGACY|SYSTEM','20130521','123465789'),
('XXX','B123456','SOME|SILLY|THING|FROM|LEGACY|SYSTEM','20131215','777444844')

GO

INSERT INTO [dbo].[PersonTaskSets]
([DatabaseID],[VisSeqID],[TaskSetSeqID],[Value1],[Value2],[Value3],[Value4])
VALUES
('XXX','123465789',1,NULL,300,45,NULL,NULL),
('XXX','123465789',2,120.555,NULL,NULL,NULL),
('XXX','777444844',1,NULL,55,12,NULL,NULL)

GO
	
SELECT * FROM
[dbo].[PersonAccounts] AS PA
INNER JOIN
[dbo].[PersonVisits] AS PV
ON PA.DatabaseID = PV.DatabaseID
AND PA.PersonID = PV.PersonID
INNER JOIN
[dbo].[PersonTaskSets] AS PTS
ON PA.DatabaseID = PTS.DatabaseID
AND PV.VisSeqID = PTS.VisSeqID

GO

Open in new window


I need Sally's last task which has a value2 and 3 (nulls in those columns i do not care about) within a year of a date range (suppose I should use the start date?) and BOTH value2 and value3 must be < 2 separate integers.

Hope this helps.
EE Topic Advisor
CERTIFIED EXPERT
Most Valuable Expert 2014
Awarded 2013
Commented:
This one is on us!
(Get your first solution completely free - no credit card required)
UNLOCK SOLUTION
Olaf DoschkeSoftware Developer
CERTIFIED EXPERT

Commented:
I don't see the TaskDate and TaskTime columns you mentioned in your question. If we shouldn't assume the IDs to be sequential, how should we then determine the last task of a last visit?

From the last data you posted the TaskSetSeqID could be used instead of the 'virtual' taskdatetime I used in my query.

Bye, Olaf.
Olaf DoschkeSoftware Developer
CERTIFIED EXPERT
Commented:
This one is on us!
(Get your first solution completely free - no credit card required)
UNLOCK SOLUTION
Scott PletcherSenior DBA
CERTIFIED EXPERT
Most Valuable Expert 2018
Distinguished Expert 2019

Commented:
To me, taking it step by step makes the query harder to read (and likely processes less efficiently as well).

I didn't know what values you wanted for the conditions you specified earlier, so I just created the appropriate variables but didn't actually include WHERE conditions in the code.  ROW_NUMBER() really is extremely valuable here, though, so I used it as well.


DECLARE @given_date_range_begin datetime
DECLARE @given_date_range_end datetime
DECLARE @value1_cap int
DECLARE @value2_cap int

SELECT pa.AccountID, pa.AccountNumber,
    subquery.personID, subquery.VisSeqID, subquery.TaskSetSeqID, subquery.Value1, subquery.Value2
FROM (
    SELECT pv.personID, pts.VisSeqID, pts.TaskSetSeqID, pts.Value1, pts.Value2,
        ROW_NUMBER() OVER(PARTITION BY pv.personID ORDER BY pv.VisitDate DESC, pv.VisitSeqID DESC, pts.TaskSetSeqID DESC) AS row_num
    FROM dbo.PersonTaskSets pts
    INNER JOIN dbo.PersonVisits pv ON
        pv.VisitSeqID = pts.VisSeqID
    --WHERE
        /*
        p2.VisitDate >= DATEADD(YEAR, -1, @given_date_range_begin) AND
        p2.VisitDate <= DATEADD(YEAR, +1, @given_date_range_end) AND
        */
        /*
        pts.Value1 < @value1_cap AND
        pts.Value2 < @value2_cap
        */
) AS subquery
INNER JOIN dbo.PersonAccounts pa ON
    pa.PersonID = subquery.personID
WHERE
    subquery.row_num = 1
Mark WillsTopic Advisor, Page Editor
CERTIFIED EXPERT
Distinguished Expert 2018

Commented:
Well, don't be scared of CTE. CTE stands for Common Table Expression. All that is really doing is giving a name to a (sub)query which can be reused by referring to its name. True, it has a great advantage in that it can be recursive, but it doesn't have to.

And the optimizer works really well with well formed CTE's

Which brings me to another couple of questions / observations...
1) Looking at your compound unique keys, I believe that some of those ID's might not be unique (e.g. mtpk_cs551919)
2) Looking at your *real* tables, we would like to verify what "last" means
3) Whilst happy with my earlier query, it does need some attention given the added information you have posted and think I need to include things like databaseID (and others per point 1) to make sure the correct rows are selected and indexes can be used effectively.

So, if you would mind casting your eye over the above points, then I can respond with a more optimised query.

Author

Commented:
CTEs are good I like them I think I understand most of this but I have another similar questions and I think ROW/RANK is part of the answer but I think I should ask it in another question.

I was chastised about the nested query solution because of "performance issues"

Thanks all. How do I get you guys to follow me to another questions if you are able?
Aaron TomoskyDirector, SD-WAN Solutions
CERTIFIED EXPERT

Commented:
Post it in your closing remarks on this question.

Author

Commented:
Hope for taking so long. I hope I got everyone fairly.
Mark WillsTopic Advisor, Page Editor
CERTIFIED EXPERT
Distinguished Expert 2018

Commented:
Well... Nearly... Pauls answer was a simple rehash of my code. And the subquery is basically the same as the CTE without using the CTE.

So, was wondering what the problem was ?
PortletPaulEE Topic Advisor
CERTIFIED EXPERT
Most Valuable Expert 2014
Awarded 2013

Commented:
I agree, my contribution - which clearly identifies Mark's work as input - merely offered an alternative without CTE, and some updates to the fields that emerged.
Mark WillsTopic Advisor, Page Editor
CERTIFIED EXPERT
Distinguished Expert 2018

Commented:
Well, that is entirely the prerogative and privilege of the Asker as to re-opening or not.

I know my query works fine (for the original conditions)
I know there was some concerns back at the office with being "chastised about the nested query"
I know that the original requirement was subsequently superseded, and I was awaiting further clarification.
I know that queries posted after the example are probably wrong because of the keys and uniquely identifying rows  (as evidenced by including databaseID in the Askers own example, and should be part of the partition and absent in subsequent posts)

What I don't know is what I could have done differently to have my work at least considered or in contention considering the row_number() approach is pivotal in other solutions.

I still believe there is some fine tuning and optimisation needed and was a little surprised with how it was closed and believed there is more work needed to do the problem justice. But, as I said, that is the prerogative of the Asker and possibly convenient due to possible lock out..

Author

Commented:
Wow, I am really sorry. There were definite major contributors and I thought I covered that well but to be honest I just came back to this question to look at PortletPaul's solution and implement it in my final production code. It was the easiest for me to understand and to duplicate. There was no further information needed as I was successful in implementing this in a test environment.

Please tell me what I need to do or where you thought, given this additional information, I went wrong.
Mark WillsTopic Advisor, Page Editor
CERTIFIED EXPERT
Distinguished Expert 2018

Commented:
No problems. Now I know...

Author

Commented:
Mark, Paul and Admin

The point distribution is indeed incorrect. I think I need to do it again.

Author

Commented:
OK, I hope this is OK now. I used the CTE solution originally suggested by Mark but Paul's was marked best because of clarity and will be easiest to reference in the future.

When doing the points the site does an odd thing every time you mark a comment.  It  jumps to the bottom of the page to "Grade The Solution" and I lose my place. I will try and be more careful. Thank you gentlemen.
Unlock the solution to this question.
Join our community and discover your potential

Experts Exchange is the only place where you can interact directly with leading experts in the technology field. Become a member today and access the collective knowledge of thousands of technology experts.

*This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

OR

Please enter a first name

Please enter a last name

8+ characters (letters, numbers, and a symbol)

By clicking, you agree to the Terms of Use and Privacy Policy.