We help IT Professionals succeed at work.

MAX Count in Subquery

Hi! I'm looking at a crime dataset and I want to create a query that finds the block of each district (there are 22 total) with the highest number of gun-related crimes. I think I have to create a subquery but not really sure how to do this. I tried creating an alias with mycount but that also didn't work.

My most successful attempt is:
gun='%GUN%'
try:
    cursor.execute("SELECT block, count(block) AS MYCOUNT FROM crimes WHERE DESCRIPTION::text LIKE %s GROUP BY block ORDER BY count(block) DESC",[gun])
    rows4=cursor.fetchall()
except:
    db_connection.rollback()
Annotation-2020-02-16-095100.pngand it creates the attached file. I want to select only the 22 blocks that have the most gun-related crimes in their corresponding district.
Comment
Watch Question

CERTIFIED EXPERT
Most Valuable Expert 2012
Distinguished Expert 2019

Commented:

Please post what database and programming language you are using.


Can you also post sample data in text for and expected results?  That way we can set up a test case and provide tested SQL.

CERTIFIED EXPERT

Commented:
alter your query to include TOP 22

before:      cursor.execute("SELECT block, count(block)
after:         cursor.execute("SELECT TOP 22 block, count(block)

Author

Commented:
I'm using POSTGRESQL in jupyter notebook and the dataset can be found at https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2.

This is what I have now, but it seems super complicated:
  cursor.execute("SELECT \
                        crime_counts.block, \
                        crime_counts.district \
                   FROM \
                        (SELECT \
                            block, \
                            district, \
                            COUNT(*) AS crime_count \
                        FROM crimes \
                        WHERE description like '%GUN%' \
                        GROUP BY \
                            block, \
                            district) AS crime_counts \
                   LEFT JOIN \
                       (SELECT \
                           district, \
                           MAX(crime_count) AS c_c \
                        FROM (SELECT \
                                   block, \
                                   district, \
                                   COUNT(*) AS crime_count \
                                FROM \
                                    crimes \
                               WHERE description like '%GUN%' \
                               GROUP BY block, district) x \
                               GROUP BY district \
                               ORDER BY district) AS max_crimes  \
                    ON crime_counts.district = max_crimes.district \
                    AND crime_counts.crime_count = max_crimes.c_c \
                    ORDER BY crime_counts.district")

Author

Commented:
There are 22 districts and I need the blocks that have the highest number of gun-related crimes for that specific district. There are multiple blocks that satisfy this max requirement for each district. It would be ideal if the query returned district, blocks that have max_count, max_count for districts 1:22.
CERTIFIED EXPERT

Commented:
Below query uses DENSE_RANK function.  The goal is for the result-set to have a minimum of 22 rows, so each district is represented. However, if a district has several blocks having the same highest gun-related crime then show all those blocks too:
SELECT	a.block
,		a.MYCOUNT
FROM
(	SELECT	b.*
	,	dr = DENSE_RANK() OVER ( PARTITION BY b.block, b.MYCOUNT ORDER BY b.MYCOUNT DESC )
	FROM
	(	SELECT	block
		,	count(block) AS MYCOUNT
		FROM	crimes
		WHERE	DESCRIPTION::text LIKE '%GUN%'
		GROUP
		BY	block
	) b
) a
WHERE	a.dr = 1

Open in new window

CERTIFIED EXPERT
Most Valuable Expert 2012
Distinguished Expert 2019

Commented:

I've added the Postgresql Topic Area.


Something along the dense_rank John posted should be what you are looking for.  You just need to add district.


I was going to look at the data from the link you provided but canceled the download around 500M.


If you want tested SQL, please provide a subset of the data and the expectedresults from the data you provide.

CERTIFIED EXPERT
Commented:
A few observations:
1. the above query lists more rows than expected (it includes rows with lower counts than the top)
2. there are more than 22 districts in the data
3. the district number should be visible on output
SELECT district, block, crimeCNT 
  FROM (
         SELECT district, block, COUNT(*) AS crimeCNT, 
                DENSE_RANK() OVER (PARTITION BY district ORDER BY COUNT(*) DESC) AS TOPn
           FROM crimes WHERE DESCRIPTION::text LIKE '%GUN%' 
	      GROUP BY district, block) counts 
 WHERE TOPn = 1

Open in new window

Author

Commented:
Thank you for all of your help!