randomly assign grouped data to a group

I have the following table named PeopleLocation

PersonID           State                 zipCode
1000                   IL                        60067
2000                   IL                        60067
3000                   TX                      77045
4000                   IL                        60074
5000                   TX                       77045
6000                   TX                      77045
7000                   IL                        60074
8000                   TX                       77045
9000                   IL                         60067




What I want to do is write a query that will group the total number of people by state and zipcode then randomly assign a group value of 1, 2 or 3

(if the number is not divisible by 3 it is ok if group size is not equal or if there are no entries in a group

Using the sample above, I would like to have something like

PersonID           State                 zipCode     Group
1000                   IL                        60067        1
9000                   IL                        60067        2
2000                   IL                        60067        3
4000                   IL                        60074        1
7000                   IL                        60074        2
6000                   TX                      77045         1
5000                   TX                       77045        2
3000                   TX                      77045         3
8000                   TX                       77045        1

Hopefully this makes sense
johnnyg123Asked:
Who is Participating?

Improve company productivity with a Business Account.Sign Up

x
 
Habib PourfardConnect With a Mentor Software DeveloperCommented:
you can write it with use of row_number() to assign a group number to each group of state and zip code and to make it random you can order it by NEWID():

IF OBJECT_ID('tempdb.dbo.#PeopleLocation') IS NOT NULL
	DROP TABLE #PeopleLocation
GO

CREATE TABLE #PeopleLocation
(
	PersonID INT PRIMARY KEY,
	[State] VARCHAR(2),
	zipCode INT
)

INSERT INTO #PeopleLocation VALUES (1000,'IL', 60067 )
INSERT INTO #PeopleLocation VALUES (2000,'IL', 60067 )
INSERT INTO #PeopleLocation VALUES (3000,'TX', 77045 )
INSERT INTO #PeopleLocation VALUES (4000,'IL', 60074 )
INSERT INTO #PeopleLocation VALUES (5000,'TX', 77045 )
INSERT INTO #PeopleLocation VALUES (6000,'TX', 77045 )
INSERT INTO #PeopleLocation VALUES (7000,'IL', 60074 )
INSERT INTO #PeopleLocation VALUES (8000,'TX', 77045 )
INSERT INTO #PeopleLocation VALUES (9000,'IL',  60067)


SELECT *, ((ROW_NUMBER() OVER (PARTITION BY State, zipCode ORDER BY NEWID())) - 1) % 3 + 1 AS [Group] FROM #PeopleLocation
ORDER BY [State], zipCode

Open in new window

0
 
Aneesh RetnakaranDatabase AdministratorCommented:
;WITH c AS (
 SELECT PersonID, state, ZipCode, [Group]= ROW_NUMBER() OVER(PARTITION BY state ORDER BY zipcode )
 FROM yourTable
 )
 SELECT * FROM c
0
 
PaulCommented:
try these?
select
      *
      , (ABS(CHECKSUM(NewId())) % 3) + 1 as ri
      , NTILE(3) over(partition by [State], [zipCode] order by NewId()) as nt
from PeopleLocation

Open in new window


results(1)
| PERSONID | STATE | ZIPCODE | RI | NT |
|----------|-------|---------|----|----|
|     1000 |    IL |   60067 |  3 |  2 |
|     2000 |    IL |   60067 |  1 |  3 |
|     9000 |    IL |   60067 |  1 |  1 |
|     4000 |    IL |   60074 |  1 |  1 |
|     7000 |    IL |   60074 |  1 |  2 |
|     3000 |    TX |   77045 |  2 |  1 |
|     5000 |    TX |   77045 |  2 |  2 |
|     6000 |    TX |   77045 |  3 |  3 |
|     8000 |    TX |   77045 |  1 |  1 |

Open in new window


results(2)
| PERSONID | STATE | ZIPCODE | RI | NT |
|----------|-------|---------|----|----|
|     1000 |    IL |   60067 |  1 |  1 |
|     2000 |    IL |   60067 |  3 |  3 |
|     9000 |    IL |   60067 |  1 |  2 |
|     4000 |    IL |   60074 |  2 |  1 |
|     7000 |    IL |   60074 |  2 |  2 |
|     3000 |    TX |   77045 |  1 |  1 |
|     5000 |    TX |   77045 |  2 |  1 |
|     6000 |    TX |   77045 |  1 |  3 |
|     8000 |    TX |   77045 |  1 |  2 |
		

Open in new window


see:
http://sqlfiddle.com/#!3/f1b796/6
http://sqlfiddle.com/#!3/f1b796/7

details:
**MS SQL Server 2008 Schema Setup**:

    CREATE TABLE PeopleLocation
    	([PersonID] int, [State] varchar(2), [zipCode] int)
    ;
    	
    INSERT INTO PeopleLocation
    	([PersonID], [State], [zipCode])
    VALUES
    	(1000, 'IL', 60067),
    	(2000, 'IL', 60067),
    	(3000, 'TX', 77045),
    	(4000, 'IL', 60074),
    	(5000, 'TX', 77045),
    	(6000, 'TX', 77045),
    	(7000, 'IL', 60074),
    	(8000, 'TX', 77045),
    	(9000, 'IL', 60067)
    ;

**Query 1**:

    
    select
          *
          , (ABS(CHECKSUM(NewId())) % 3) + 1 as ri
          , NTILE(3) over(partition by [State], [zipCode] order by NewId()) as nt
    from PeopleLocation
    order by [State], [zipCode], [PersonID]
    

**[Results][2]**:
    
    | PERSONID | STATE | ZIPCODE | RI | NT |
    |----------|-------|---------|----|----|
    |     1000 |    IL |   60067 |  1 |  1 |
    |     2000 |    IL |   60067 |  3 |  3 |
    |     9000 |    IL |   60067 |  1 |  2 |
    |     4000 |    IL |   60074 |  2 |  1 |
    |     7000 |    IL |   60074 |  2 |  2 |
    |     3000 |    TX |   77045 |  1 |  1 |
    |     5000 |    TX |   77045 |  2 |  1 |
    |     6000 |    TX |   77045 |  1 |  3 |
    |     8000 |    TX |   77045 |  1 |  2 |



  [1]: http://sqlfiddle.com/#!3/f1b796/7

Open in new window

0
 
ZberteocCommented:
Try this:
select 
	PersonID,
	[State],
	zipCode,
	(ROW_NUMBER() OVER (PARTITION BY [State], zipCode ORDER BY PersonID)-1)%3+1 as [group]
from
	#PeopleLocation
order by
	[State],
	zipCode,
	[group],
	PersonID

Open in new window

0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.