• Status: Solved
• Priority: Medium
• Security: Public
• Views: 270

# simple query question

Say I have a table as follows:

create table test (x date);

test
____
2008-10-20 15:15:15
2008-10-20 15:15:16
2008-10-20 15:15:17
2008-10-20 15:15:18
2008-10-20 15:15:35
2008-10-20 15:15:36
2008-10-20 15:15:37
2008-10-20 15:15:38
2008-10-20 15:16:15
2008-10-20 15:16:16
2008-10-20 15:16:17
2008-10-20 15:16:18
2008-10-20 15:16:19

So there are 3 distinct periods... what I want is a query that will return the following:

sequence, start, end
1, 2008-10-20 15:15:15, 2008-10-20 15:15:18
2, 2008-10-20 15:15:35, 2008-10-20 15:15:38
3, 2008-10-20 15:16:15, 2008-10-20 15:16:19

The break between continuous records should be a number of seconds that I can choose.
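What's being asked for here is the classic "gaps and islands" grouping: sort the timestamps and start a new period whenever the gap to the previous row exceeds a chosen threshold. A minimal Python sketch of that logic (my own illustration, not from the thread), using the sample data above:

```python
from datetime import datetime, timedelta

def group_periods(timestamps, threshold_seconds):
    """Split timestamps into periods wherever the gap to the previous
    value exceeds threshold_seconds."""
    gap = timedelta(seconds=threshold_seconds)
    periods = []
    for t in sorted(timestamps):
        if periods and t - periods[-1][1] <= gap:
            periods[-1][1] = t            # extend the current period
        else:
            periods.append([t, t])        # gap exceeded: start a new period
    return [(i + 1, start, end) for i, (start, end) in enumerate(periods)]

rows = [datetime(2008, 10, 20, 15, 15, s) for s in (15, 16, 17, 18, 35, 36, 37, 38)]
rows += [datetime(2008, 10, 20, 15, 16, s) for s in (15, 16, 17, 18, 19)]

for seq, start, end in group_periods(rows, 1):
    print(seq, start, end)
```

With a threshold of 1 second this yields the three periods shown in the expected output; the threshold is the tunable "number of seconds" the question asks for.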

asiminator
1 Solution

Commented:
The following query divides values into groups of 30 seconds.

```sql
select min(x), max(x)
  from test
 group by trunc(x, 'mi')
          + case
              when to_number(to_char(x, 'ss')) <= 30 then 30 / 60 / 60 / 24
              else 60 / 60 / 60 / 24
            end
 order by 1;
```
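For comparison, here is a Python sketch (my own rendering, not from the thread) of the same bucketing idea. Note that this groups rows into fixed half-minute windows rather than by gaps between rows, so it happens to reproduce the three periods on this sample, but two rows only 2 seconds apart can land in different groups when they straddle a window boundary.

```python
from collections import defaultdict
from datetime import datetime

def half_minute_groups(timestamps):
    """Group rows by the same key the SQL builds:
    the truncated minute plus a 30s-or-60s offset."""
    buckets = defaultdict(list)
    for t in timestamps:
        key = (t.replace(second=0, microsecond=0), 30 if t.second <= 30 else 60)
        buckets[key].append(t)
    return sorted((min(v), max(v)) for v in buckets.values())

rows = [datetime(2008, 10, 20, 15, 15, s) for s in (15, 16, 17, 18, 35, 36, 37, 38)]
rows += [datetime(2008, 10, 20, 15, 16, s) for s in (15, 16, 17, 18, 19)]
groups = half_minute_groups(rows)
```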

Commented:
Not quite sure about the part: "the break between continuous records should be a number of seconds that I can choose."

can you give some more details?

Try this for the 3 distinct periods:
```sql
select s.num, t.x, s.x
  from (select row_number() over (order by x) num, x
          from (select x
                  from test
                 where not exists
                       (select 1
                          from (select x - 1/60/60/24 as y from test)
                         where x = y)
                 order by x)) s,
       (select row_number() over (order by x) num, x
          from (select x
                  from test
                 where not exists
                       (select 1
                          from (select x + 1/60/60/24 as y from test)
                         where x = y)
                 order by x)) t
 where s.num = t.num;
```
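The idea in the query above: a row with no neighbor exactly one second later is the end of a period, and a row with no neighbor exactly one second earlier is a start; ranking each list and joining on rank pairs them up. A rough Python equivalent (my own sketch, not from the thread) makes the logic, and its limitation, visible: like the SQL, it only recognizes runs whose rows are spaced at exactly the threshold.

```python
from datetime import datetime, timedelta

def periods_by_boundaries(timestamps, threshold_seconds=1):
    """Starts have no row exactly `threshold` before them; ends have no
    row exactly `threshold` after.  Zipping the two sorted lists pairs
    each start with its end."""
    gap = timedelta(seconds=threshold_seconds)
    present = set(timestamps)
    xs = sorted(present)
    starts = [t for t in xs if t - gap not in present]
    ends = [t for t in xs if t + gap not in present]
    return [(n, s, e) for n, (s, e) in enumerate(zip(starts, ends), start=1)]

rows = [datetime(2008, 10, 20, 15, 15, s) for s in (15, 16, 17, 18, 35, 36, 37, 38)]
rows += [datetime(2008, 10, 20, 15, 16, s) for s in (15, 16, 17, 18, 19)]
periods = periods_by_boundaries(rows)
```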

Commented:
See attached.
distinct-periods.txt

Commented:
Try these.... They should be very efficient (albeit new syntax for a lot of people), since they only need to query the data one time.
```sql
SELECT DISTINCT DENSE_RANK() OVER (ORDER BY minx) seq,
                minx,
                MAX(x) OVER (PARTITION BY minx) maxx
  FROM (SELECT *
          FROM (SELECT x,
                       TO_DATE(NULL) minx,
                       TO_DATE(NULL) maxx,
                       ROW_NUMBER() OVER (ORDER BY x) rn
                  FROM test)
        MODEL
          DIMENSION BY (rn)
          MEASURES (x, minx)
          RULES UPDATE AUTOMATIC ORDER
            (minx[ANY] ORDER BY x =
               CASE
                 WHEN x[CV(rn)] <= x[CV(rn) - 1] + 1 / 86400
                   THEN minx[CV(rn) - 1]
                 ELSE x[CV(rn)]
               END))
```

.....

```sql
SELECT ROWNUM seq, minx, maxx
  FROM (  SELECT minx, MAX(x) maxx
            FROM (SELECT *
                    FROM (SELECT x,
                                 TO_DATE(NULL) minx,
                                 TO_DATE(NULL) maxx,
                                 ROW_NUMBER() OVER (ORDER BY x) rn
                            FROM test)
                  MODEL
                    DIMENSION BY (rn)
                    MEASURES (x, minx)
                    RULES UPDATE AUTOMATIC ORDER
                      (minx[ANY] ORDER BY x =
                         CASE
                           WHEN x[CV(rn)] <= x[CV(rn) - 1] + 1 / 86400
                             THEN minx[CV(rn) - 1]
                           ELSE x[CV(rn)]
                         END))
        GROUP BY minx)
```
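The MODEL rule above is, at its core, a running recurrence over rows ordered by x: each row inherits the group start (minx) from the previous row when it falls within the threshold, and otherwise becomes a new group start itself. The same recurrence in plain Python (my own rendering of the rule, not from the thread), with the 1-second threshold the SQL uses:

```python
from datetime import datetime, timedelta

def assign_group_starts(timestamps, threshold_seconds=1):
    """The MODEL rule as a recurrence:
    minx[rn] = minx[rn-1] if x[rn] <= x[rn-1] + threshold else x[rn]."""
    gap = timedelta(seconds=threshold_seconds)
    xs = sorted(timestamps)
    minx = []
    for i, x in enumerate(xs):
        if i > 0 and x <= xs[i - 1] + gap:
            minx.append(minx[-1])   # within threshold: inherit the group start
        else:
            minx.append(x)          # gap exceeded: this row starts a new group
    return list(zip(xs, minx))

rows = [datetime(2008, 10, 20, 15, 15, s) for s in (15, 16, 17, 18, 35, 36, 37, 38)]
rows += [datetime(2008, 10, 20, 15, 16, s) for s in (15, 16, 17, 18, 19)]
group_starts = sorted(set(m for _, m in assign_group_starts(rows)))
```

Grouping the rows by their assigned minx and taking MIN/MAX per group then produces the (start, end) pairs, which is exactly what the outer SELECT does.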

Commented:
If you want to adjust the threshold for neighboring data, change the 1/86400 to N/86400, where N is whatever threshold, in seconds, you want.

Commented:
You can remove the TO_DATE(NULL) maxx; that's a carryover from an earlier test.

Author Commented:
I'm checking the solutions and I'll come back soon.

Commented:
A simpler one:
```sql
select ceil(num / 2), min(x), max(x)
  from (select row_number() over (order by x) num, t.x
          from (select x,
                       lead(x) over (order by x) next,
                       lag(x) over (order by x) previous
                  from test) t
         where (next - x > 1/60/60/24 or next is null)
            or (x - previous > 1/60/60/24 or previous is null)) TEMP
 group by ceil(num / 2);
```

Commented:
jamesgu,  very nice!

More efficient than anything proposed thus far.

Commented:
By efficient, I don't just mean smaller code, but actually doing less work to get the same output.

jamesgu's version performs fewer IOs (same as mine) and fewer sorts (fewer than anybody else's).

Author Commented:
Thanks to all who answered, but this was by far the best solution. Thank you all for your efforts.

Commented:
thanks sdstuber

And thanks asiminator, it was an interesting question (but not as simple as the title suggests ;)

Commented:
oops!!!

Found a bug.  jamesgu's version doesn't handle isolated rows correctly.

for instance

Given the above data in table test, add these two rows:

```sql
insert into test (x) values (sysdate);
insert into test (x) values (sysdate + 1);
```

I still like the cleverness in jamesgu's version, but I'm afraid I'll have to retract my recommendation and suggest my own again (post 22842370)

Fast is good, but correct is better. :)
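The failure mode is easy to see once the pairing trick is spelled out: the "simpler" query keeps only boundary rows (rows that start or end a run) and pairs them off two at a time via ceil(num/2). An isolated row is both a start and an end, so it contributes only one boundary row, which shifts every pairing after it. A Python rendering of that pairing logic (my own sketch, not from the thread) demonstrates the breakage on a small dataset with an isolated row:

```python
from datetime import datetime, timedelta

def paired_periods(timestamps, threshold_seconds=1):
    """Mimic the ceil(num/2) trick: collect rows that start or end a run,
    then pair consecutive boundary rows two at a time."""
    gap = timedelta(seconds=threshold_seconds)
    xs = sorted(timestamps)
    boundaries = []
    for i, x in enumerate(xs):
        prev_gap = i == 0 or x - xs[i - 1] > gap
        next_gap = i == len(xs) - 1 or xs[i + 1] - x > gap
        if prev_gap or next_gap:
            boundaries.append(x)      # an isolated row lands here only ONCE
    return [(boundaries[i],
             boundaries[i + 1] if i + 1 < len(boundaries) else boundaries[i])
            for i in range(0, len(boundaries), 2)]

rows = [datetime(2008, 10, 20, 15, 15, s) for s in (15, 16, 17, 18)]
rows.append(datetime(2008, 10, 20, 15, 15, 30))   # isolated row
rows += [datetime(2008, 10, 20, 15, 15, s) for s in (40, 41, 42)]
broken = paired_periods(rows)
```

The correct answer would be three periods (15:15:15-15:15:18, 15:15:30 alone, 15:15:40-15:15:42), but the pairing reports a second "period" from 15:15:30 to 15:15:40 that spans the gap.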

Commented:
Also, if it's possible that x might have non-unique values, then I suggest using this...

```sql
SELECT DISTINCT DENSE_RANK() OVER (ORDER BY minx) seq,
                minx,
                MAX(x) OVER (PARTITION BY minx) maxx
  FROM (SELECT *
          FROM (SELECT x,
                       TO_DATE(NULL) minx,
                       ROW_NUMBER() OVER (ORDER BY x) rn
                  FROM test)
        MODEL
          DIMENSION BY (rn)
          MEASURES (x, minx)
          RULES UPDATE AUTOMATIC ORDER
            (minx[ANY] ORDER BY rn =
               CASE
                 WHEN x[CV(rn)] <= x[CV(rn) - 1] + 1 / 86400
                   THEN minx[CV(rn) - 1]
                 ELSE x[CV(rn)]
               END))
 ORDER BY 1;
```

Commented:
asiminator

I also suggest clicking the "Request Attention" link above to have the admins reopen the question, so we can all hammer out the best solution.

Commented:
sdstuber, good catch.

More code has to be added to handle this; this is the partial code:
```sql
select row_number() over (order by x) num,
       t.x,
       case
         when (next - x > 1/60/60/24 and x - previous > 1/60/60/24)
           or (next is null and x - previous > 1/60/60/24)
           or (next - x > 1/60/60/24 and previous is null)
         then 1
         else 0
       end isolated
  from (select x,
               lead(x) over (order by x) next,
               lag(x) over (order by x) previous
          from test
         group by x) t
 where (next - x > 1/60/60/24 or next is null)
    or (x - previous > 1/60/60/24 or previous is null)
```

Commented:
I have submitted a request for attention for this.  I don't know if the question will be reopened though, since the asker really should be the one to do so, but with a demonstrably incorrect answer accepted, hopefully this is a valid exception to the norm.

Commented:
Then mine also still works :-)

Commented:
Yes, sorry, I should have stated yours was correct too.

I was recommending mine over yours simply because it was more efficient (less IO), so it will scale better on realistic datasets.

Author Commented:
Hey...

For my case this query is giving the expected results. There is a unique key on the date, and I never get one value at a time... they always come in groups.

I'm hesitant to reopen the question when my need is filled.

Asim

Commented:
Then do not.

Having explained the reasons why the holes in jamesgu's query can't happen, I will re-reverse my suggestion and again support jamesgu's post you already accepted.

I will add one caveat: "they always come in groups"... how can you enforce that?

And even if they do come in groups, do they always come with 1-second separation?

If not,  what happens if you get

2008-10-21 15:16:19
2008-10-21 15:16:22
2008-10-21 15:16:25

You might consider those a group, but if you pick a threshold of 1 second (as in the original question), they will fail with jamesgu's query.

Or are there other constraints we aren't aware of that prevent such a thing?


Author Commented:
You are right... there are a lot of other constraints. I simplified my question so as not to confuse the matter; I hate reading long-winded, complex questions on here. This is part of a huge table in a huge database which gets data from a custom measurement system. Plus, the data is being scrubbed before it is loaded.

