
How to limit distinct or grouped results

Posted on 2011-02-11
Last Modified: 2012-05-11
Using SQLite, I'm trying to limit a result set to an arbitrary number of rows. That is, if I have something like:

Theatre   Show               Time
1A            ExpertZilla    10:00 am
1A            ExpertZilla    11:00 am
1A            ExpertZilla    12:00 pm
1A            SQLHorrors     01:00 pm
1A            ExpertZilla    02:00 pm
1A            ExpertZilla    03:00 pm
1A            ExchangeThra   04:00 pm
1A            ExchangeThra   05:00 pm
1A            ExchangeThra   06:00 pm
1A            ExchangeThra   07:00 pm
1A            ExchangeThra   08:00 pm
1A            ExchangeThra   09:00 pm
1A            SQLHorrors     10:00 pm
1A            SQLHorrors     12:00 pm

What I'm looking for is: for any given theatre, for any given show, the top N times.
(I'll add 'start time' to my query later, for simplicity just assume the first N).

So, if N=3  I want to see:
1A            ExpertZilla    10:00 am
1A            ExpertZilla    11:00 am
1A            ExpertZilla    12:00 pm
1A            SQLHorrors     10:00 pm
1A            SQLHorrors     12:00 pm
1A            SQLHorrors     01:00 pm
1A            ExchangeThra   04:00 pm
1A            ExchangeThra   05:00 pm
1A            ExchangeThra   06:00 pm

I was thinking of a limit on a sub-select, but can't quite work out how to get it to do what I need.

TIA.

EdB
Question by:edbored
LVL 41

Expert Comment

by:Sharath
ID: 34876969
If you are working in SQL Server 2005 or later, you can use ROW_NUMBER.
select Theatre,Show,Time
  from (
select Theatre,Show,Time,
       row_number() over (partition by Theatre,Show order by Time) rn
  from your_table) t1
 where rn<=3
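
Although the question is about SQLite, the same ROW_NUMBER query also runs there on SQLite 3.25.0 or later, which is when window functions were added. The following is a minimal sketch using Python's built-in sqlite3 module. The table name `showings`, the sample rows, and the 24-hour text times are assumptions made for illustration, not part of the original schema.

```python
import sqlite3

# Hypothetical sample data; times are 24-hour text so that text order
# matches chronological order.
ROWS = [
    ("1A", "ExpertZilla", "10:00"), ("1A", "ExpertZilla", "11:00"),
    ("1A", "ExpertZilla", "12:00"), ("1A", "ExpertZilla", "14:00"),
    ("1A", "SQLHorrors", "13:00"), ("1A", "SQLHorrors", "22:00"),
]


def top_n_window(n):
    """Return the first n times per (Theatre, Show) using row_number()."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE showings (Theatre TEXT, Show TEXT, Time TEXT)")
    conn.executemany("INSERT INTO showings VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(
        """
        SELECT Theatre, Show, Time
          FROM (SELECT Theatre, Show, Time,
                       row_number() OVER (PARTITION BY Theatre, Show
                                          ORDER BY Time) AS rn
                  FROM showings) AS t1
         WHERE rn <= ?
         ORDER BY Theatre, Show, Time
        """,
        (n,),
    ).fetchall()
    conn.close()
    return rows
```

Calling `top_n_window(3)` keeps three of the four ExpertZilla rows and both SQLHorrors rows, since a group with fewer than N rows is returned whole.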

 
LVL 41

Accepted Solution

by:
Sharath earned 1000 total points
ID: 34876974
If you are working in SQL Server 2000 or MySQL, you need to implement ROW_NUMBER like this.
select Theatre,Show,Time
  from (
select Theatre,Show,Time,
       (select count(*) from your_table as t1 
         where t1.Theatre = t2.Theatre and t1.Show = t2.Show and t1.Time <= t2.Time) as rn
  from your_table as t2) as t3
 where rn <= 3
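
For anyone who wants to try the counting approach directly against SQLite, here is a small sketch using Python's sqlite3 module. The table name `showings` and the sample rows are hypothetical, and the times are assumed to be stored as 24-hour text so that `<=` compares them chronologically.

```python
import sqlite3

# Hypothetical sample data with 24-hour text times.
ROWS = [
    ("1A", "ExpertZilla", "10:00"), ("1A", "ExpertZilla", "11:00"),
    ("1A", "ExpertZilla", "12:00"), ("1A", "ExpertZilla", "14:00"),
    ("1A", "SQLHorrors", "13:00"), ("1A", "SQLHorrors", "22:00"),
]


def top_n_by_count(n):
    """Emulate ROW_NUMBER with a correlated count(*) subquery."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE showings (Theatre TEXT, Show TEXT, Time TEXT)")
    conn.executemany("INSERT INTO showings VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(
        """
        SELECT Theatre, Show, Time
          FROM (SELECT Theatre, Show, Time,
                       -- rank = number of rows in the same group that are
                       -- at or before this row's time
                       (SELECT count(*) FROM showings AS t1
                         WHERE t1.Theatre = t2.Theatre
                           AND t1.Show = t2.Show
                           AND t1.Time <= t2.Time) AS rn
                  FROM showings AS t2) AS t3
         WHERE rn <= ?
         ORDER BY Theatre, Show, Time
        """,
        (n,),
    ).fetchall()
    conn.close()
    return rows
```

One caveat with this form: rows in the same group that share an identical time receive the same count, so duplicates can cause a group to return more or fewer than N rows.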

 
LVL 1

Author Comment

by:edbored
ID: 34877183
I oversimplified...  I think the second form may work, but the performance is dismal.

There's actually a number of tables involved.  Here's the actual code (train schedule not theatres, but was easier to describe as theatres earlier!)

I created a view around a single stop_id (station).

Performance isn't great - see anything I could do different?

BTW - thanks for very quick response!

create view if not exists rlist  as 
SELECT          stoptimes.stop_id, 
                routes.route_id,
                Routes.route_long_name,
                Routes.route_short_name,
                Trips.trip_headsign,
                Stoptimes.departure_time
           FROM Trips
                JOIN StopTimes
                  ON Trips.trip_id = StopTimes.trip_id
                JOIN Routes
                  ON Routes.route_id = Trips.route_id
          WHERE StopTimes.stop_id = '15930'
               -- AND
               -- StopTimes.departure_time >= time( 'now', 'localtime', '-550 minutes' )
               -- AND
               -- StopTimes.departure_time <= time( 'now', 'localtime', '-460 minutes' )
          ORDER BY StopTimes.departure_time ;

--select * from rlist;

select stop_id, route_id, route_long_name, route_short_name, trip_headsign, departure_time
  from (
select stop_id, route_id, route_long_name, route_short_name, trip_headsign, departure_time,
       (select count(*) from rlist as t1 
         where t1.route_id=t2.route_id 
         and   t1.route_long_name=t2.route_long_name 
         and   t1.route_short_name=t2.route_short_name
         and   t1.trip_headsign=t2.trip_headsign
         and   t1.departure_time<=t2.departure_time) as rn
  from rlist as t2) as t3
 where rn <= 3
 order by departure_time
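
One thing that might be worth trying for the performance problem: because rlist is a view, the correlated subquery may end up re-running the three-table join for every outer row. A possible workaround is to copy the joined rows into a temporary table once and index the compared columns. The sketch below uses Python's sqlite3 module; the names `rlist_tmp` and `rlist_tmp_idx` are hypothetical, and grouping on only `route_id` and `trip_headsign` assumes the route names are determined by `route_id`. Whether this actually helps depends on the data, so it is worth checking with EXPLAIN QUERY PLAN.

```python
import sqlite3


def next_departures(conn, stop_id, n):
    """Return up to n earliest departures per (route_id, trip_headsign)."""
    conn.execute("DROP TABLE IF EXISTS temp.rlist_tmp")
    conn.execute(
        "CREATE TEMP TABLE rlist_tmp "
        "(route_id TEXT, trip_headsign TEXT, departure_time TEXT)"
    )
    # Run the join once, instead of once per outer row through the view.
    conn.execute(
        """
        INSERT INTO rlist_tmp
        SELECT Routes.route_id, Trips.trip_headsign, StopTimes.departure_time
          FROM Trips
          JOIN StopTimes ON Trips.trip_id = StopTimes.trip_id
          JOIN Routes ON Routes.route_id = Trips.route_id
         WHERE StopTimes.stop_id = ?
        """,
        (stop_id,),
    )
    # Hypothetical index on the columns the correlated subquery compares.
    conn.execute(
        "CREATE INDEX rlist_tmp_idx "
        "ON rlist_tmp (route_id, trip_headsign, departure_time)"
    )
    return conn.execute(
        """
        SELECT route_id, trip_headsign, departure_time
          FROM rlist_tmp AS t2
         WHERE (SELECT count(*) FROM rlist_tmp AS t1
                 WHERE t1.route_id = t2.route_id
                   AND t1.trip_headsign = t2.trip_headsign
                   AND t1.departure_time <= t2.departure_time) <= ?
         ORDER BY departure_time
        """,
        (n,),
    ).fetchall()
```

The function expects an open connection to a database that already contains the Trips, StopTimes and Routes tables used in the view above.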

LVL 8

Expert Comment

by:raulggonzalez
ID: 34877951
Hi,

In SQLite you have the LIMIT clause available, the same as in MySQL.

Have you tried it? You wouldn't need the workaround with the count, and performance should improve.

http://www.sqlite.org/lang_select.html

cheers
 
LVL 1

Author Comment

by:edbored
ID: 34879211
Well, working with the same 'create view' as previous sample, I tried this:

select stop_id, 
       route_id, 
       route_long_name, 
       route_short_name, 
       trip_headsign, 
       departure_time
from (  select stop_id, 
               route_id, 
               route_long_name, 
               route_short_name, 
               trip_headsign, 
               departure_time
        from rlist as t1 
        where t1.route_id=t2.route_id 
        and   t1.route_long_name=t2.route_long_name 
        and   t1.route_short_name=t2.route_short_name
        and   t1.trip_headsign=t2.trip_headsign
        and   t1.departure_time<=t2.departure_time
        limit 3 
      ) as t2
 order by departure_time



I can't seem to figure out how to properly alias the main select in order to have T2.xxx recognized.
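
A note on why the alias is not recognized: in the query above, t2 is the name given to the inner select itself, so the inner select cannot refer to t2 from inside its own WHERE clause. A correlated subquery in the outer WHERE clause can refer to the outer alias, and SQLite accepts ORDER BY and LIMIT inside such a subquery. The sketch below, using Python's sqlite3 module, shows that shape on the simpler theatre example. The table name `showings`, the sample rows and the 24-hour text times are assumptions for illustration.

```python
import sqlite3

# Hypothetical sample data with 24-hour text times.
ROWS = [
    ("1A", "ExpertZilla", "10:00"), ("1A", "ExpertZilla", "11:00"),
    ("1A", "ExpertZilla", "12:00"), ("1A", "ExpertZilla", "14:00"),
    ("1A", "SQLHorrors", "13:00"), ("1A", "SQLHorrors", "22:00"),
]


def top_n_with_limit(n):
    """Keep a row if its time is among the first n times of its group."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE showings (Theatre TEXT, Show TEXT, Time TEXT)")
    conn.executemany("INSERT INTO showings VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(
        """
        SELECT t2.Theatre, t2.Show, t2.Time
          FROM showings AS t2
         WHERE t2.Time IN (SELECT t1.Time
                             FROM showings AS t1
                            WHERE t1.Theatre = t2.Theatre
                              AND t1.Show = t2.Show
                            ORDER BY t1.Time
                            LIMIT ?)
         ORDER BY t2.Theatre, t2.Show, t2.Time
        """,
        (n,),
    ).fetchall()
    conn.close()
    return rows
```

This still evaluates the subquery per outer row, so it is not necessarily faster than the count(*) form; it is shown only to illustrate where the outer alias can legally be referenced.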


Thanks again...

EdB
 
LVL 8

Expert Comment

by:raulggonzalez
ID: 34879913
I don't get why you want to do it in 2 steps when I guess you can combine ORDER BY and LIMIT in the same query and get the same result???

Have you tried this?


select stop_id,
               route_id,
               route_long_name,
               route_short_name,
               trip_headsign,
               departure_time
        from rlist as t1
        where t1.route_id=t2.route_id
        and   t1.route_long_name=t2.route_long_name
        and   t1.route_short_name=t2.route_short_name
        and   t1.trip_headsign=t2.trip_headsign
        and   t1.departure_time<=t2.departure_time
order by departure_time limit 3
         


http://www.mysqlperformanceblog.com/2006/09/01/order-by-limit-performance-optimization/

Cheers
 
LVL 1

Author Comment

by:edbored
ID: 34879958
That was one of the first things I tried (without the VIEW).  

It only returns 3 records.

I'm trying to return a max of n (in this case 3) records for each stop_id.

In the simpler example of the theatre, I want to see a max of 3 records per show for a particular theatre.

That is, for each theatre - the next three showings of each film.

For train stations, the next three trains to arrive at a station (regardless of final destination).

Thx.

EdB
 
LVL 41

Expert Comment

by:Sharath
ID: 34880257
>> In the simpler example of the theatre, I want to see a max of 3 records per show for a particular theatre.

There is no other way unless you generate a row number like I mentioned.
 
LVL 3

Assisted Solution

by:paulwquinn
paulwquinn earned 1000 total points
ID: 34891694
The first issue is the format you are using to store your showtimes: SQLite doesn't have a specific storage class for dates and/or times. They're normally stored as TEXT, REAL or INTEGER values. I assume you are using TEXT. SQLite date and time functions don't use/recognize 12-hour time strings with AM/PM, so if by "TOP" times you mean the first/next three times that will occur, it's a problem. "02:00 pm" will be listed before "10:00 am" if you order on the time column in a query. Similarly '12:00 pm' will be listed after every other time, including '02:00 pm', '03:00 pm', etc. You'll either have to extend SQLite with your own custom function or order the results elsewhere in your application.
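
As a rough illustration of the custom-function route, the sketch below registers a small Python function with SQLite through the sqlite3 module's `create_function` and orders by its result. The function name `to_24h` and the table `showings` are hypothetical, and the input is assumed to look exactly like "02:00 pm" (hours, minutes, a space, then am or pm).

```python
import sqlite3


def to_24h(text):
    """Convert text such as '02:00 pm' to 24-hour text such as '14:00'."""
    clock, meridiem = text.strip().split()
    hour, minute = (int(part) for part in clock.split(":"))
    # 12 am becomes hour 0 and 12 pm stays 12; other pm hours gain 12.
    hour = hour % 12 + (12 if meridiem.lower() == "pm" else 0)
    return f"{hour:02d}:{minute:02d}"


def sorted_times(times):
    """Sort 12-hour time strings chronologically inside SQLite."""
    conn = sqlite3.connect(":memory:")
    # Expose the Python function to SQL as a one-argument function.
    conn.create_function("to_24h", 1, to_24h)
    conn.execute("CREATE TABLE showings (Time TEXT)")
    conn.executemany("INSERT INTO showings VALUES (?)", [(t,) for t in times])
    rows = conn.execute(
        "SELECT Time FROM showings ORDER BY to_24h(Time)"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

Storing the converted 24-hour value in the table instead of converting at query time would avoid calling the function on every row and would let an index on the time column be used.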

Assuming you can ignore the above problem (a BIG and probably erroneous assumption... :^) ), we turn to the problem at hand. Unfortunately, LIMIT clauses can only appear at the end of an entire compound select statement. This means they can't be used with a GROUP BY clause to limit the number of rows returned for each theatre/show pair. They can't even be used within the simple SELECT clauses of a compound SELECT constructed using the UNION operator.

One possible alternative to achieve the result you're looking for is to use SQL to create SQL and build a series of simple SELECTs that you can run and then concatenate the result-sets together yourself elsewhere, e.g. in your application.
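
The same idea can be sketched entirely in application code: list the distinct theatre/show pairs, run one simple SELECT with LIMIT per pair, and concatenate the results. This is an illustration in Python's sqlite3 module, not the attached script, whose contents are not reproduced in this thread. The table name `showings`, the helper `make_sample_db`, and the 24-hour text times are assumptions.

```python
import sqlite3


def make_sample_db():
    """Build a hypothetical in-memory table for demonstration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE showings (Theatre TEXT, Show TEXT, Time TEXT)")
    conn.executemany(
        "INSERT INTO showings VALUES (?, ?, ?)",
        [
            ("1A", "ExpertZilla", "10:00"), ("1A", "ExpertZilla", "11:00"),
            ("1A", "ExpertZilla", "12:00"), ("1A", "ExpertZilla", "14:00"),
            ("1A", "SQLHorrors", "13:00"), ("1A", "SQLHorrors", "22:00"),
        ],
    )
    return conn


def top_n_per_pair(conn, n):
    """Run one LIMIT query per (Theatre, Show) pair and join the results."""
    pairs = conn.execute(
        "SELECT DISTINCT Theatre, Show FROM showings ORDER BY Theatre, Show"
    ).fetchall()
    results = []
    for theatre, show in pairs:
        results.extend(
            conn.execute(
                "SELECT Theatre, Show, Time FROM showings "
                "WHERE Theatre = ? AND Show = ? ORDER BY Time LIMIT ?",
                (theatre, show, n),
            ).fetchall()
        )
    return results
```

This issues one query per pair, so it suits cases with a modest number of theatre/show combinations.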

For example, I've attached a SQLite SQL script (get3listings.sql) that you can run to get the type of output that you desire in a file called '3listings.txt'. You'll have to customize the table and column names appropriately for your schema. You can then post-process (e.g. read) the file into your application. Alternatively, you can use something like the SQLite C/C++ Interface to do the same thing completely inside your application code, i.e. no files required. If you're handling everything inside your own code, you could, of course, simply read in the entire table in the desired order (ORDER BY theatre,show), then skip the records you don't want in your code.

 get3listings.sql
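
The last option, reading the whole table in order and skipping unwanted records in code, might look roughly like the following in Python. The table name `showings`, the helper `make_sample_db`, and the 24-hour text times are assumptions for illustration; Time is added to the ORDER BY so that the rows kept are the earliest ones in each group.

```python
import sqlite3
from itertools import groupby, islice


def make_sample_db():
    """Build a hypothetical in-memory table for demonstration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE showings (Theatre TEXT, Show TEXT, Time TEXT)")
    conn.executemany(
        "INSERT INTO showings VALUES (?, ?, ?)",
        [
            ("1A", "SQLHorrors", "22:00"), ("1A", "ExpertZilla", "14:00"),
            ("1A", "ExpertZilla", "10:00"), ("1A", "ExpertZilla", "12:00"),
            ("1A", "SQLHorrors", "13:00"), ("1A", "ExpertZilla", "11:00"),
        ],
    )
    return conn


def first_n_per_group(conn, n):
    """Read every row in group order and keep only the first n per group."""
    cursor = conn.execute(
        "SELECT Theatre, Show, Time FROM showings "
        "ORDER BY Theatre, Show, Time"
    )
    kept = []
    # groupby relies on the rows already being sorted by the grouping key.
    for _key, group in groupby(cursor, key=lambda row: (row[0], row[1])):
        kept.extend(islice(group, n))
    return kept
```

This makes a single pass over the table, at the cost of transferring rows that are then discarded.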
 
LVL 1

Author Closing Comment

by:edbored
ID: 34973006
Split points - first (Sharath) worked, but dismal performance (not the poster's fault though).

Second (paulwquinn) would probably work quite nicely, but not practical in this particular implementation.

Thanks to both.