Pure SQL Question (expert :-)

Hello,
We have a table of events. Each event has a date, is associated with an object, and has a type. So the table is like :
eventsTable (eventUniqueKey, objectID, EventDate, eventType)
We want a select that lists all the objectID that have an event of eventType 'X' without an eventType 'Y' after.
To make things clear :
eventUniqueKey / objectID / EventDate / eventType
1 / 1 / 2016-01-01 / X
2 / 1 / 2016-01-02 / Y
In this case, object 1 has X and an Y after, so it must not be pulled out.
1 / 1 / 2016-01-01 / X
2 / 2 / 2016-01-02 / Y
In this case, objet 1 has and X event but no Y event (Y event is to object 2), so objectID 1 must be pulled out, and not objectID 2.
1 / 1 / 2016-01-02 / X
2 / 1 / 2016-01-01 / Y
In this case, object 1 has X but the Y is before, so objectID 1 must be pulled out.
We ask for a 'pure' SQL that does that, we did it with temp table and scalar function, etc. but it is pretty slow.
Thanks for help
javilmer2Asked:
Who is Participating?
 
zephyr_hex (Megan)Connect With a Mentor DeveloperCommented:
I assume we're only seeing a small subset of your data, and that other scenarios may exist... which could throw a wrench into answers you receive.

In any event, this is one way to get the rows you've identified in your 3 scenarios:

Given the following data set in #tmp_table:
data
Rows 1 and 6 are a valid pair.  Rows 2 and 4 are invalid X events.

SELECT * FROM (
	SELECT ROW_NUMBER() OVER(PARTITION BY o_id ORDER BY e_date ASC) AS rn_asc, 
	ROW_NUMBER() OVER(PARTITION BY o_id ORDER BY e_date DESC) AS rn_desc,
	*
	FROM #tmp_table
	) u
WHERE (u.rn_asc = u.rn_desc AND e_type = 'X') OR (rn_asc = 2 AND e_type = 'X')

Open in new window


The basic premise is to assign row numbers based on the original id and date ASC and then also DESC, and then look for the invalid scenarios:  where the row number for ASC and DESC is the same on an X event, or where the X event occurs before the Y event.

Result:
result
0
 
Megan BrooksConnect With a Mentor SQL Server ConsultantCommented:
I think you are looking for something like this:
SELECT *
FROM eventsTable et1
WHERE EventType = 'X'
AND NOT EXISTS (
    SELECT * FROM eventsTable et2
    WHERE et1.objectID = et2.objectID
        AND et1.EventDate < et2.EventDate
        AND et2.EventType = 'Y'
)

Open in new window


This isn't tested (supplying a bit of formatted test data would help with that).
0
 
zephyr_hex (Megan)DeveloperCommented:
@Megan (hey!  that's my real name, too ;) )

You can use my test data set:

SELECT 1 AS u_id, 1 AS o_id, CAST('2016-01-01' AS DATE) AS e_date, 'X' AS e_type
INTO #tmp_table

INSERT INTO #tmp_table VALUES (2,1,CAST('2016-01-02' AS DATE),'Y'), (1,3,CAST('2016-01-01' AS DATE), 'X'),(2,4,CAST('2016-01-02' AS DATE), 'Y'),(1,5,CAST('2016-01-02' AS DATE), 'X'),(2,5,CAST('2016-01-01' AS DATE), 'Y')

Open in new window


With your query modified to match my lazily named columns:
SELECT *
FROM #tmp_table et1
WHERE e_type = 'X'
AND NOT EXISTS (
    SELECT * FROM #tmp_table et2
    WHERE et1.o_id = et2.o_id
        AND et1.e_date < et2.e_date
        AND et2.e_type = 'Y'
)

Open in new window


Also returns the rows I would expect to be identified as invalid:
result
0
What Kind of Coding Program is Right for You?

There are many ways to learn to code these days. From coding bootcamps like Flatiron School to online courses to totally free beginner resources. The best way to learn to code depends on many factors, but the most important one is you. See what course is best for you.

 
PortletPaulfreelancerCommented:
You have not specified what you want as the result of the query, but if you just want a summary you probably need to group by object_id, for example:
select x.objectID, max(x.EventDate) as max_EventDate
from eventsTable x
left join eventsTable y
  on x.objectID = y.objectID
 and x.EventDate < y.EventDate
 and y.eventType = 'Y' 
where x.eventType = 'X'
group by x.objectID
having count(y.objectID) = 0

Open in new window

try this slightly expanded test:
declare @eventsTable table
    (eventUniqueKey int, objectID int, EventDate datetime, eventType varchar(1))
;
    
INSERT INTO @eventsTable 
    ("eventUniqueKey", "objectID", "EventDate", "eventType")
VALUES
    (1, 11, '2016-01-01 00:00:00', 'X'),
    (2, 11, '2016-01-02 00:00:00', 'Y'),
    (3, 12, '2016-01-01 00:00:00', 'X'),
    (4, 12, '2016-01-02 00:00:00', 'Y'),
    (5, 13, '2016-01-02 00:00:00', 'X'),
    (6, 13, '2016-01-01 00:00:00', 'Y')
;

select x.objectID, max(x.EventDate) as max_EventDate
from @eventsTable x
left join @eventsTable y
  on x.objectID = y.objectID
 and x.EventDate < y.EventDate
 and y.eventType = 'Y' 
where x.eventType = 'X'
group by x.objectID
having count(y.objectID) = 0

Open in new window

test result
objectID max_EventDate 
-------- ------------- 
13       2016-01-02 00:00:00

Open in new window

0
 
PortletPaulfreelancerCommented:
apologies, think the previous sample set was wrong and so was the query/outcome.I think this corrects my errors.
declare @eventsTable table
    (eventUniqueKey int, objectID int, EventDate datetime, eventType varchar(1))
;
    
INSERT INTO @eventsTable 
    ("eventUniqueKey", "objectID", "EventDate", "eventType")
VALUES
    (1, 11, '2016-01-01 00:00:00', 'X'),
    (2, 11, '2016-01-02 00:00:00', 'Y'),
    (3, 12, '2016-01-01 00:00:00', 'X'),
    (4, 22, '2016-01-02 00:00:00', 'Y'),
    (5, 13, '2016-01-02 00:00:00', 'X'),
    (6, 13, '2016-01-01 00:00:00', 'Y')
;

select x.objectID, max(x.EventDate) as max_EventDate
from @eventsTable x
left join @eventsTable y
  on x.objectID = y.objectID
 and x.EventDate > y.EventDate
 and y.eventType = 'Y' 
where x.eventType = 'X'
group by x.objectID
having count(y.objectID) > 0

Open in new window

result
objectID max_EventDate 
-------- ------------- 
13       2016-01-02 00:00:00

Open in new window

i.e. ONLY the third test case returns a row
1
 
Pawan KumarDatabase ExpertCommented:
Most scalable solution , Enjoy :)

CREATE TABLE eventsTable
(
	eventUniqueKey int, objectID int, EventDate datetime, eventType varchar(1)
)
GO
    
INSERT INTO eventsTable 
    ("eventUniqueKey", "objectID", "EventDate", "eventType")
VALUES
    (1, 11, '2016-01-01 00:00:00', 'X'),
    (2, 11, '2016-01-02 00:00:00', 'Y'),

    (3, 12, '2016-01-01 00:00:00', 'X'),

    (4, 22, '2016-01-02 00:00:00', 'Y'),
    
	(5, 13, '2016-01-02 00:00:00', 'X'),
    (6, 13, '2016-01-01 00:00:00', 'Y')
;

SELECT tbl.objectid, tbl.eventdate FROM eventsTable tbl 
OUTER APPLY (SELECT MAX(tbl1.eventdate) eventdate FROM eventsTable tbl1 WHERE tbl1.eventtype='Y' AND tbl.objectid = tbl1.objectid) tbl2
WHERE tbl.eventtype='X' AND tbl.eventdate > tbl2.eventdate

--

Open in new window


--
0
 
PortletPaulfreelancerCommented:
Remarkable how a proposed solution can be "best" or "most" when we are all guessing what the full requirements are.

We also have no idea what the scale is or what indexing has been applied.

Personally I would ignore "sales speak" when considering the options provided.
1
 
zephyr_hex (Megan)DeveloperCommented:
I concur with PortletPaul.  I see nothing "scaleable" about Pawan's solution.  I'm not even sure why someone would claim their query is scaleable in this situation.
0
 
Pawan KumarDatabase ExpertCommented:
@Author - Could you please verify the answers and mark one as accepted solution if you dont need more help on this?
0
 
zephyr_hex (Megan)DeveloperCommented:
I object.  There is nothing in Pawan's suggested solution that makes it better than the other proposed solutions.  In fact, I don't even get the right results from Pawan's query.

Going back to the OP's request:  
We want a select that lists all the objectID that have an event of eventType 'X' without an eventType 'Y'

Using Pawan's code with the test data provided by the OP:

SELECT 1 AS u_id, 1 AS o_id, CAST('2016-01-01' AS DATE) AS e_date, 'X' AS e_type
INTO #tmp_table

INSERT INTO #tmp_table VALUES (2,1,CAST('2016-01-02' AS DATE),'Y'), (1,3,CAST('2016-01-01' AS DATE), 'X'),(2,4,CAST('2016-01-02' AS DATE), 'Y'),(1,5,CAST('2016-01-02' AS DATE), 'X'),(2,5,CAST('2016-01-01' AS DATE), 'Y')


SELECT tbl.o_id, tbl.e_date FROM #tmp_table tbl 
OUTER APPLY (SELECT MAX(tbl1.e_date) e_date FROM #tmp_table tbl1 WHERE tbl1.e_type='Y' AND tbl.o_id = tbl1.o_id) tbl2
WHERE tbl.e_type='X' AND tbl.e_date > tbl2.e_date

Open in new window


Yields:
result
But the expected result is these TWO records because they both are event type X with no corresponding Y event.
expected result
PortletPaul's solution is also incorrect.  I get the same result PortletPaul shows in his answer, but it's not the right answer.  It's missing the other type X record with no corresponding Y.

As such, I counter the close request with the following proposal:

#41798923 - 250 points
#41798925 - 250 points

These two answers provide the solution the OP requested.
0
 
Pawan KumarDatabase ExpertCommented:
zephyr_hex - The author has not responded. You think our answer is incorrect. It may also possible that your answer is incorrect.  

Only Author can decide on that, what he needs. We waited for 14 days.

In thin case lets close this question with awarding 0 points to all.
0
 
zephyr_hex (Megan)DeveloperCommented:
Except, the OP DID give us what the correct / expected result should be in the question. So we know what result is correct and what result is not correct for the data set the OP gave us.
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.