asked on

MAX Count in Subquery

Hi! I'm looking at a crime dataset and I want to create a query that finds the block of each district (there are 22 total) with the highest number of gun-related crimes. I think I have to create a subquery but not really sure how to do this. I tried creating an alias with mycount but that also didn't work.

My most successful attempt is:
gun='%GUN%'
try:
cursor.execute("SELECT block, count(block) AS MYCOUNT FROM crimes WHERE DESCRIPTION::text LIKE %s GROUP BY block ORDER BY count(block) DESC",[gun])
rows4=cursor.fetchall()
except:
db_connection.rollback()

and it creates the attached file. I want to select only the 22 blocks that have the most gun-related crimes in their corresponding district.

slightwv (䄆 Netminder)

Please post what database and programming language you are using.

Can you also post sample data in text for and expected results? That way we can set up a test case and provide tested SQL.

John_Vidmar

alter your query to include TOP 22

before: cursor.execute("SELECT block, count(block)
after: cursor.execute("SELECT TOP 22 block, count(block)

Cameron Moore

ASKER

I'm using POSTGRESQL in jupyter notebook and the dataset can be found at https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2.

This is what I have now, but it seems super complicated:
cursor.execute("SELECT \
crime_counts.block, \
crime_counts.district \
FROM \
(SELECT \
block, \
district, \
COUNT(*) AS crime_count \
FROM crimes \
WHERE description like '%GUN%' \
GROUP BY \
block, \
district) AS crime_counts \
LEFT JOIN \
(SELECT \
district, \
MAX(crime_count) AS c_c \
FROM (SELECT \
block, \
district, \
COUNT(*) AS crime_count \
FROM \
crimes \
WHERE description like '%GUN%' \
GROUP BY block, district) x \
GROUP BY district \
ORDER BY district) AS max_crimes \
ON crime_counts.district = max_crimes.district \
AND crime_counts.crime_count = max_crimes.c_c \
ORDER BY crime_counts.district")

Cameron Moore

ASKER

There are 22 districts and I need the blocks that have the highest number of gun-related crimes for that specific district. There are multiple blocks that satisfy this max requirement for each district. It would be ideal if the query returned district, blocks that have max_count, max_count for districts 1:22.

John_Vidmar

Below query uses DENSE_RANK function. The goal is for the result-set to have a minimum of 22 rows, so each district is represented. However, if a district has several blocks having the same highest gun-related crime then show all those blocks too:

SELECT	a.block
,		a.MYCOUNT
FROM
(	SELECT	b.*
	,	dr = DENSE_RANK() OVER ( PARTITION BY b.block, b.MYCOUNT ORDER BY b.MYCOUNT DESC )
	FROM
	(	SELECT	block
		,	count(block) AS MYCOUNT
		FROM	crimes
		WHERE	DESCRIPTION::text LIKE '%GUN%'
		GROUP
		BY	block
	) b
) a
WHERE	a.dr = 1

Open in new window

slightwv (䄆 Netminder)

I've added the Postgresql Topic Area.

Something along the dense_rank John posted should be what you are looking for. You just need to add district.

I was going to look at the data from the link you provided but canceled the download around 500M.

If you want tested SQL, please provide a subset of the data and the expectedresults from the data you provide.

ASKER CERTIFIED SOLUTION

Pavel Celba

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Cameron Moore

ASKER

Thank you for all of your help!