Avatar of Cameron Moore
Cameron Moore
 asked on

MAX Count in Subquery

Hi! I'm looking at a crime dataset and I want to create a query that finds the block of each district (there are 22 total) with the highest number of gun-related crimes. I think I have to create a subquery but not really sure how to do this. I tried creating an alias with mycount but that also didn't work.

My most successful attempt is:
gun='%GUN%'
try:
    cursor.execute("SELECT block, count(block) AS MYCOUNT FROM crimes WHERE DESCRIPTION::text LIKE %s GROUP BY block ORDER BY count(block) DESC",[gun])
    rows4=cursor.fetchall()
except:
    db_connection.rollback()
Annotation-2020-02-16-095100.pngand it creates the attached file. I want to select only the 22 blocks that have the most gun-related crimes in their corresponding district.
PostgreSQLSQL

Avatar of undefined
Last Comment
Cameron Moore

8/22/2022 - Mon
slightwv (䄆 Netminder)

Please post what database and programming language you are using.


Can you also post sample data in text for and expected results?  That way we can set up a test case and provide tested SQL.

John_Vidmar

alter your query to include TOP 22

before:      cursor.execute("SELECT block, count(block)
after:         cursor.execute("SELECT TOP 22 block, count(block)
Cameron Moore

ASKER
I'm using POSTGRESQL in jupyter notebook and the dataset can be found at https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2.

This is what I have now, but it seems super complicated:
  cursor.execute("SELECT \
                        crime_counts.block, \
                        crime_counts.district \
                   FROM \
                        (SELECT \
                            block, \
                            district, \
                            COUNT(*) AS crime_count \
                        FROM crimes \
                        WHERE description like '%GUN%' \
                        GROUP BY \
                            block, \
                            district) AS crime_counts \
                   LEFT JOIN \
                       (SELECT \
                           district, \
                           MAX(crime_count) AS c_c \
                        FROM (SELECT \
                                   block, \
                                   district, \
                                   COUNT(*) AS crime_count \
                                FROM \
                                    crimes \
                               WHERE description like '%GUN%' \
                               GROUP BY block, district) x \
                               GROUP BY district \
                               ORDER BY district) AS max_crimes  \
                    ON crime_counts.district = max_crimes.district \
                    AND crime_counts.crime_count = max_crimes.c_c \
                    ORDER BY crime_counts.district")
Experts Exchange is like having an extremely knowledgeable team sitting and waiting for your call. Couldn't do my job half as well as I do without it!
James Murphy
Cameron Moore

ASKER
There are 22 districts and I need the blocks that have the highest number of gun-related crimes for that specific district. There are multiple blocks that satisfy this max requirement for each district. It would be ideal if the query returned district, blocks that have max_count, max_count for districts 1:22.
John_Vidmar

Below query uses DENSE_RANK function.  The goal is for the result-set to have a minimum of 22 rows, so each district is represented. However, if a district has several blocks having the same highest gun-related crime then show all those blocks too:
SELECT	a.block
,		a.MYCOUNT
FROM
(	SELECT	b.*
	,	dr = DENSE_RANK() OVER ( PARTITION BY b.block, b.MYCOUNT ORDER BY b.MYCOUNT DESC )
	FROM
	(	SELECT	block
		,	count(block) AS MYCOUNT
		FROM	crimes
		WHERE	DESCRIPTION::text LIKE '%GUN%'
		GROUP
		BY	block
	) b
) a
WHERE	a.dr = 1

Open in new window

slightwv (䄆 Netminder)

I've added the Postgresql Topic Area.


Something along the dense_rank John posted should be what you are looking for.  You just need to add district.


I was going to look at the data from the link you provided but canceled the download around 500M.


If you want tested SQL, please provide a subset of the data and the expectedresults from the data you provide.

⚡ FREE TRIAL OFFER
Try out a week of full access for free.
Find out why thousands trust the EE community with their toughest problems.
ASKER CERTIFIED SOLUTION
Pavel Celba

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
See Pricing Options
Start Free Trial
GET A PERSONALIZED SOLUTION
Ask your own question & get feedback from real experts
Find out why thousands trust the EE community with their toughest problems.
Cameron Moore

ASKER
Thank you for all of your help!