• Status: Solved
• Priority: Medium
• Security: Public
• Views: 270

# simple query question

Say I have a table as follows:

create table test (x date);

test
____
2008-10-20 15:15:15
2008-10-20 15:15:16
2008-10-20 15:15:17
2008-10-20 15:15:18
2008-10-20 15:15:35
2008-10-20 15:15:36
2008-10-20 15:15:37
2008-10-20 15:15:38
2008-10-20 15:16:15
2008-10-20 15:16:16
2008-10-20 15:16:17
2008-10-20 15:16:18
2008-10-20 15:16:19

So there are 3 distinct periods... what I want is a query that will return the following:

sequence, start, end
1, 2008-10-20 15:15:15, 2008-10-20 15:15:18
2, 2008-10-20 15:15:35, 2008-10-20 15:15:38
3, 2008-10-20 15:16:15, 2008-10-20 15:16:19

The break between continuous records should be a number of seconds that I can choose.
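What's being asked for here is the classic "gaps and islands" grouping: sort the timestamps and start a new period whenever the gap to the previous row exceeds a chosen threshold. A minimal Python sketch of that logic (my own illustration, not from the thread), using the sample data above:

```python
from datetime import datetime, timedelta

def group_periods(timestamps, threshold_seconds):
    """Split timestamps into periods wherever the gap to the previous
    value exceeds threshold_seconds."""
    gap = timedelta(seconds=threshold_seconds)
    periods = []
    for t in sorted(timestamps):
        if periods and t - periods[-1][1] <= gap:
            periods[-1][1] = t            # extend the current period
        else:
            periods.append([t, t])        # gap exceeded: start a new period
    return [(i + 1, start, end) for i, (start, end) in enumerate(periods)]

rows = [datetime(2008, 10, 20, 15, 15, s) for s in (15, 16, 17, 18, 35, 36, 37, 38)]
rows += [datetime(2008, 10, 20, 15, 16, s) for s in (15, 16, 17, 18, 19)]

for seq, start, end in group_periods(rows, 1):
    print(seq, start, end)
```

With a threshold of 1 second this yields the three periods shown in the expected output; the threshold is the tunable "number of seconds" the question asks for.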

asiminator
1 Solution

Commented:
The following query divides values into groups of 30 seconds.

```sql
select min(x), max(x)
  from test
 group by trunc(x, 'mi')
          + case
              when to_number(to_char(x, 'ss')) <= 30 then 30 / 60 / 60 / 24
              else 60 / 60 / 60 / 24
            end
 order by 1;
```
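For comparison, here is a Python sketch (my own rendering, not from the thread) of the same bucketing idea. Note that this groups rows into fixed half-minute windows rather than by gaps between rows, so it happens to reproduce the three periods on this sample, but two rows only 2 seconds apart can land in different groups when they straddle a window boundary.

```python
from collections import defaultdict
from datetime import datetime

def half_minute_groups(timestamps):
    """Group rows by the same key the SQL builds:
    the truncated minute plus a 30s-or-60s offset."""
    buckets = defaultdict(list)
    for t in timestamps:
        key = (t.replace(second=0, microsecond=0), 30 if t.second <= 30 else 60)
        buckets[key].append(t)
    return sorted((min(v), max(v)) for v in buckets.values())

rows = [datetime(2008, 10, 20, 15, 15, s) for s in (15, 16, 17, 18, 35, 36, 37, 38)]
rows += [datetime(2008, 10, 20, 15, 16, s) for s in (15, 16, 17, 18, 19)]
groups = half_minute_groups(rows)
```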

Commented:
Not quite sure about the part: "the break between continuous records should be a number of seconds that I can choose."

can you give some more details?

Try this for the 3 distinct periods:
```sql
select s.num, t.x, s.x
  from (select row_number() over (order by x) num, x
          from (select x
                  from test
                 where not exists
                       (select 1
                          from (select x - 1/60/60/24 as y from test)
                         where x = y)
                 order by x)) s,
       (select row_number() over (order by x) num, x
          from (select x
                  from test
                 where not exists
                       (select 1
                          from (select x + 1/60/60/24 as y from test)
                         where x = y)
                 order by x)) t
 where s.num = t.num;
```
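The idea in the query above: a row with no neighbor exactly one second later is the end of a period, and a row with no neighbor exactly one second earlier is a start; ranking each list and joining on rank pairs them up. A rough Python equivalent (my own sketch, not from the thread) makes the logic, and its limitation, visible: like the SQL, it only recognizes runs whose rows are spaced at exactly the threshold.

```python
from datetime import datetime, timedelta

def periods_by_boundaries(timestamps, threshold_seconds=1):
    """Starts have no row exactly `threshold` before them; ends have no
    row exactly `threshold` after.  Zipping the two sorted lists pairs
    each start with its end."""
    gap = timedelta(seconds=threshold_seconds)
    present = set(timestamps)
    xs = sorted(present)
    starts = [t for t in xs if t - gap not in present]
    ends = [t for t in xs if t + gap not in present]
    return [(n, s, e) for n, (s, e) in enumerate(zip(starts, ends), start=1)]

rows = [datetime(2008, 10, 20, 15, 15, s) for s in (15, 16, 17, 18, 35, 36, 37, 38)]
rows += [datetime(2008, 10, 20, 15, 16, s) for s in (15, 16, 17, 18, 19)]
periods = periods_by_boundaries(rows)
```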

Commented:
See attached.
distinct-periods.txt

Commented:
Try these.... They should be very efficient (albeit new syntax for a lot of people), since they only need to query the data one time.
```sql
SELECT DISTINCT DENSE_RANK() OVER (ORDER BY minx) seq,
                minx,
                MAX(x) OVER (PARTITION BY minx) maxx
  FROM (SELECT *
          FROM (SELECT x,
                       TO_DATE(NULL) minx,
                       TO_DATE(NULL) maxx,
                       ROW_NUMBER() OVER (ORDER BY x) rn
                  FROM test)
        MODEL
          DIMENSION BY (rn)
          MEASURES (x, minx)
          RULES UPDATE AUTOMATIC ORDER
            (minx[ANY] ORDER BY x =
               CASE
                 WHEN x[CV(rn)] <= x[CV(rn) - 1] + 1 / 86400
                   THEN minx[CV(rn) - 1]
                 ELSE x[CV(rn)]
               END))
```

.....

```sql
SELECT ROWNUM seq, minx, maxx
  FROM (  SELECT minx, MAX(x) maxx
            FROM (SELECT *
                    FROM (SELECT x,
                                 TO_DATE(NULL) minx,
                                 TO_DATE(NULL) maxx,
                                 ROW_NUMBER() OVER (ORDER BY x) rn
                            FROM test)
                  MODEL
                    DIMENSION BY (rn)
                    MEASURES (x, minx)
                    RULES UPDATE AUTOMATIC ORDER
                      (minx[ANY] ORDER BY x =
                         CASE
                           WHEN x[CV(rn)] <= x[CV(rn) - 1] + 1 / 86400
                             THEN minx[CV(rn) - 1]
                           ELSE x[CV(rn)]
                         END))
        GROUP BY minx)
```
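The MODEL rule above is, at its core, a running recurrence over rows ordered by x: each row inherits the group start (minx) from the previous row when it falls within the threshold, and otherwise becomes a new group start itself. The same recurrence in plain Python (my own rendering of the rule, not from the thread), with the 1-second threshold the SQL uses:

```python
from datetime import datetime, timedelta

def assign_group_starts(timestamps, threshold_seconds=1):
    """The MODEL rule as a recurrence:
    minx[rn] = minx[rn-1] if x[rn] <= x[rn-1] + threshold else x[rn]."""
    gap = timedelta(seconds=threshold_seconds)
    xs = sorted(timestamps)
    minx = []
    for i, x in enumerate(xs):
        if i > 0 and x <= xs[i - 1] + gap:
            minx.append(minx[-1])   # within threshold: inherit the group start
        else:
            minx.append(x)          # gap exceeded: this row starts a new group
    return list(zip(xs, minx))

rows = [datetime(2008, 10, 20, 15, 15, s) for s in (15, 16, 17, 18, 35, 36, 37, 38)]
rows += [datetime(2008, 10, 20, 15, 16, s) for s in (15, 16, 17, 18, 19)]
group_starts = sorted(set(m for _, m in assign_group_starts(rows)))
```

Grouping the rows by their assigned minx and taking MIN/MAX per group then produces the (start, end) pairs, which is exactly what the outer SELECT does.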

Commented:
If you want to adjust the threshold for neighboring data, change the 1/86400 to N/86400, where N is whatever threshold, in seconds, you want.

Commented:
You can remove the TO_DATE(NULL) maxx; that's a carryover from an earlier test.

Author Commented:
I'm checking the solutions and I'll come back soon.

Commented:
A simpler one:
```sql
select ceil(num / 2), min(x), max(x)
  from (select row_number() over (order by x) num, t.x
          from (select x,
                       lead(x) over (order by x) next,
                       lag(x) over (order by x) previous
                  from test) t
         where (next - x > 1/60/60/24 or next is null)
            or (x - previous > 1/60/60/24 or previous is null)) TEMP
 group by ceil(num / 2);
```

Commented:
jamesgu,  very nice!

More efficient than anything proposed thus far.

Commented:
By efficient, I don't just mean smaller code, but actually doing less work to get the same output.

jamesgu's version performs fewer IOs (same as mine) and fewer sorts (fewer than anybody else's).

Author Commented:
Thanks to all who answered, but this was by far the best solution. Thank you all for your efforts.

Commented:
thanks sdstuber

And thanks asiminator, it was an interesting question (but not as simple as the title suggests ;)

Commented:
oops!!!

Found a bug.  jamesgu's version doesn't handle isolated rows correctly.

for instance

Given the above data in table test, add these two rows:

```sql
insert into test (x) values (sysdate);
insert into test (x) values (sysdate + 1);
```

I still like the cleverness in jamesgu's version, but I'm afraid I'll have to retract my recommendation and suggest my own again (post 22842370)

Fast is good, but correct is better. :)
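The failure mode is easy to see once the pairing trick is spelled out: the "simpler" query keeps only boundary rows (rows that start or end a run) and pairs them off two at a time via ceil(num/2). An isolated row is both a start and an end, so it contributes only one boundary row, which shifts every pairing after it. A Python rendering of that pairing logic (my own sketch, not from the thread) demonstrates the breakage on a small dataset with an isolated row:

```python
from datetime import datetime, timedelta

def paired_periods(timestamps, threshold_seconds=1):
    """Mimic the ceil(num/2) trick: collect rows that start or end a run,
    then pair consecutive boundary rows two at a time."""
    gap = timedelta(seconds=threshold_seconds)
    xs = sorted(timestamps)
    boundaries = []
    for i, x in enumerate(xs):
        prev_gap = i == 0 or x - xs[i - 1] > gap
        next_gap = i == len(xs) - 1 or xs[i + 1] - x > gap
        if prev_gap or next_gap:
            boundaries.append(x)      # an isolated row lands here only ONCE
    return [(boundaries[i],
             boundaries[i + 1] if i + 1 < len(boundaries) else boundaries[i])
            for i in range(0, len(boundaries), 2)]

rows = [datetime(2008, 10, 20, 15, 15, s) for s in (15, 16, 17, 18)]
rows.append(datetime(2008, 10, 20, 15, 15, 30))   # isolated row
rows += [datetime(2008, 10, 20, 15, 15, s) for s in (40, 41, 42)]
broken = paired_periods(rows)
```

The correct answer would be three periods (15:15:15-15:15:18, 15:15:30 alone, 15:15:40-15:15:42), but the pairing reports a second "period" from 15:15:30 to 15:15:40 that spans the gap.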

Commented:
Also, if it's possible that x might have non-unique values, then I suggest using this...

```sql
SELECT DISTINCT DENSE_RANK() OVER (ORDER BY minx) seq,
                minx,
                MAX(x) OVER (PARTITION BY minx) maxx
  FROM (SELECT *
          FROM (SELECT x,
                       TO_DATE(NULL) minx,
                       ROW_NUMBER() OVER (ORDER BY x) rn
                  FROM test)
        MODEL
          DIMENSION BY (rn)
          MEASURES (x, minx)
          RULES UPDATE AUTOMATIC ORDER
            (minx[ANY] ORDER BY rn =
               CASE
                 WHEN x[CV(rn)] <= x[CV(rn) - 1] + 1 / 86400
                   THEN minx[CV(rn) - 1]
                 ELSE x[CV(rn)]
               END))
 ORDER BY 1;
```

Commented:
asiminator

I also suggest clicking the "Request Attention" link above to have the admins reopen the question, so we can all hammer out the best solution.

Commented:
sdstuber, good catch.

More code has to be added to handle this; this is the partial code:
```sql
select row_number() over (order by x) num,
       t.x,
       case
         when (next - x > 1/60/60/24 and x - previous > 1/60/60/24)
           or (next is null and x - previous > 1/60/60/24)
           or (next - x > 1/60/60/24 and previous is null)
         then 1
         else 0
       end isolated
  from (select x,
               lead(x) over (order by x) next,
               lag(x) over (order by x) previous
          from test
         group by x) t
 where (next - x > 1/60/60/24 or next is null)
    or (x - previous > 1/60/60/24 or previous is null)
```

Commented:
I have submitted a request for attention for this.  I don't know if the question will be reopened though, since the asker really should be the one to do so, but with a demonstrably incorrect answer accepted, hopefully this is a valid exception to the norm.

Commented:
Then mine also still works :-)

Commented:
Yes, sorry, I should have stated yours was correct too.

I was recommending mine over yours simply because it was more efficient (less IO), so it will scale better on realistic datasets.

Author Commented:
Hey...

For my case this query is giving the expected results. There is a unique key on the date, and I never get one value at a time... they always come in groups.

I'm hesitant to reopen the question when my need is filled.

Asim

Commented:
Then do not.

Having explained the reasons why the holes in jamesgu's query can't happen, I will re-reverse my suggestion and again support jamesgu's post you already accepted.

I will add one caveat: "they always come in groups"... how can you enforce that?

And even if they do come in groups, do they always come with 1-second separation?

If not,  what happens if you get

2008-10-21 15:16:19
2008-10-21 15:16:22
2008-10-21 15:16:25

You might consider those a group, but if you pick a threshold of 1 second (as in the original question), they will fail with jamesgu's query.

Or are there other constraints we aren't aware of that prevent such a thing?


Author Commented:
You are right... there are a lot of other constraints. I simplified my question so as not to confuse the matter; I hate reading long-winded, complex questions on here. This is part of a huge table in a huge database which gets data from a custom measurement system. Plus, the data is being scrubbed before it is loaded.

