group by period

fiore45
fiore45 used Ask the Experts™
on
Hi EE,

I need to write a query that group by according to 3 days period in a month. As an example;

There are phone numbers and date_called fields in a table. I want to query duplicate phone numbers which in 3 days interval.  

phone_number     called_date

xxx yyyyyyy        2011-08-01
xxx yyyyyzz        2011-08-01
xxx yyyyzzz        2011-08-02
xxx yyyyyyy        2011-08-03
xxx yyyyyyy        2011-08-05
xxx yyyyyyy        2011-08-12

My query should appear following,

Number               Count
xxx yyyyyyy        2

Thank you.
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Top Expert 2010

Commented:
fiore45,

Please take another shot at explaining your request, because your example is unclear.

:)

Patrick

Author

Commented:
:) okey. Let me shot another.

Select Count(phone_number) , phone_number from phone_list where date_called between @date1 and @date2 group by phone_number having Count(phone_number) > 1

It gives us that duplicates phone_numbers in a given period(date).

OK.

I want to modify this query to count by 3 days.

Suppose that given period is July.

It will show duplicates phone_numbers for

1,2,3 of July
4,5,6 of July
7,6,8 of July
........
29,30,31 of July

seperatly.

Thanks.
 
Top Expert 2010

Commented:
You wrote:

7,6,8 of July

Huh? :)

Anyway, you want three-day blocks, so what do you do about February, and the months with 31 days: the numbers of days in those months are not evenly divisible by three.

Also, it is still not at all clear how you get the result of

Number               Count
xxx yyyyyyy        2

based on your sample input.

Author

Commented:

:) sorry.

It was 7,8,9 of july :)

Anyway,

phone_number     called_date

xxx yyyyyyy        2011-08-01  => first 3 days    count = 1
xxx yyyyyzz        2011-08-01  => first 3 days    count = 1
xxx yyyyzzz        2011-08-02  => first 3 days    count = 1
xxx yyyyyyy        2011-08-03  => first 3 days    count = 2
xxx yyyyyyy        2011-08-05  => 2nd 3 days   count = 1
xxx yyyyyyy        2011-08-12  => 4th 3 days     count = 1

We can accept all months with 30 days, not too much important!
Top Expert 2010

Commented:
Not trying to be dense, but the sample output from your last comment is not at all like the sample output from the original question.

If the inputs are:

phone_number     called_date
------------------------------------------------------
xxx yyyyyyy        2011-08-01
xxx yyyyyzz        2011-08-01
xxx yyyyzzz        2011-08-02
xxx yyyyyyy        2011-08-03
xxx yyyyyyy        2011-08-05
xxx yyyyyyy        2011-08-12

Open in new window


then what, exactly, should the output be?

Author

Commented:
Ok.

phone_number     called_date
------------------------------------------------------
xxx yyyyyyy        2011-08-01
xxx yyyyyzz        2011-08-01
xxx yyyyzzz        2011-08-02
xxx yyyyyyy        2011-08-03
xxx yyyyyyy        2011-08-05
xxx yyyyyyy        2011-08-12

According to these data. Which number is in three-day block and how many times repeated?

Output should be,

Number               Count
xxx yyyyyyy        2


Thanks,
Top Expert 2010

Commented:
xxx yyyyyyy appears in three separate three-day blocks.  How, then, would the count equal 2?

And what xxx yyyyyzz and xxx yyyyzzz ?
Top Expert 2010

Commented:
That last should read:

And what about xxx yyyyyzz and xxx yyyyzzz ?

Author

Commented:
xxx yyyyyzz
xxx yyyyzzz

are those same number ? No.

And.

xxx yyyyyyy        2011-08-01
xxx yyyyyyy        2011-08-03

As I said, These are in first three block(1,2,3 of August). And count is equal to 2.

Thanks,
Qlemo"Batchelor", Developer and EE Topic Advisor
Top Expert 2015

Commented:
I would expect the output to show all numbers, and their respective count in the 3 days period. Could you be after the group with the most occurances in a 3 day period? Something like   max(count(*)) ? As said, normal grouping would show all numbers in each period:

xxx yyyyyyy        2      (2011-08-01, 2011-08-03)
xxx yyyyyzz        1      (2011-08-01)
xxx yyyyzzz        1      (2011-08-02)
xxx yyyyyyy        1      (2011-08-05)
xxx yyyyyyy        1      (2011-08-12)

So, what qualifies the first result to be the only one? Because it has the max. count of all? Or because it is greater than 1? Or ...?
Expert of the Quarter 2010
Expert of the Year 2010

Commented:
-- ignore this part
with tbl(phone_number,called_date) as
(
select 'xxx yyyyyyy', convert(datetime,'2011-08-01') union all
select 'xxx yyyyyzz', '2011-08-01' union all
select 'xxx yyyyzzz', '2011-08-02' union all
select 'xxx yyyyyyy', '2011-08-03' union all
select 'xxx yyyyyyy', '2011-08-05' union all
select 'xxx yyyyyyy', '2011-08-12')

-- your query looks something like this
select phone_number, count(*)
from tbl
group by convert(char(5), called_date, 2) -- YY.MM
	,day(called_date)/3 -- every 3 days
	,phone_number

Open in new window

Expert of the Quarter 2010
Expert of the Year 2010
Commented:
I did say "like" :)
select phone_number, count(*)
from tbl
group by convert(char(5), called_date, 2) -- YY.MM
	,(day(called_date)-1)/3 -- every 3 days, the "-1" is important
	,phone_number

Open in new window

Mark WillsTopic Advisor, Page Editor
Distinguished Expert 2018

Commented:
Interesting problem...

Are you really wanting it in groups of three days or calls made within three days ? They are quite different.

E.g.  Using the data :


xxx yyyyyyy  	2011-08-01 00:00:00.000
xxx yyyyyzz  	2011-08-01 00:00:00.000
xxx yyyyzzz  	2011-08-02 00:00:00.000
xxx yyyyzzz  	2011-08-04 00:00:00.000
xxx yyyyyyy  	2011-08-03 00:00:00.000
xxx yyyyyyy  	2011-08-05 00:00:00.000
xxx yyyyyyy  	2011-08-07 00:00:00.000
xxx yyyyyyy  	2011-08-12 00:00:00.000

Open in new window


We can see that two calls for xxx yyyyzzz were made within three days.

We can see that two calls for xxx yyyyyyy were made if we group 1,2,3 then 4,5,6 then 7,8,9 then 10,11,12. But in reality, there were two calls from the 5th within three days.

And of course there is a datetime potential (being the called_date I would imagine there is a time component). So is it more like 72 hours - or - we only care about whole days ?

Maybe "rolling" 3 days is required ? that way the xxx yyyyzzz calls would show up and the 5th+7th for xxx yyyyyyy would also show up.

But then, what if we have one phone number with calls on 1st,2nd,3rd,4th Aug ? Does that mean the 1st has 3 calls (1st,2nd,3rd) and the 2nd also has 3 calls (2nd, 3rd, 4th) and the 3rd has 2 calls (3rd, 4th), so, maybe the rolling period isnt so good even though it is a better indicator of call frequency.

And that is the basic problem (me thinks) are you really trying to measure three day lumps, or, call frequency ? And if really 3 day lumps, what criteria is used for 3 days ? surely not 1,2,3 then 4,5,6 what about the example of xxx yyyyzzz where there is definitely two calls made within 3days of each other.

Then what are we measuring ? the number of calls, or the number of times there are calls within three days ?

Interesting problem... Interesting answer...

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial