Solved

Grabbing latest 4 records of same number with SQL

Posted on 2013-01-22
20
322 Views
Last Modified: 2013-02-16
Hi I have a table of million records and the sample data would be like bellow:

TIMESTAMP               |CUSTOMER NAME
21/01/2013 22:22:23| Johan
21/01/2013 22:22:21| Johan
21/01/2013 22:22:25| Johan
21/01/2013 22:22:26| Johan
21/01/2013 22:22:27| Johan
21/01/2013 22:22:28| Johan
21/01/2013 22:22:29| Johan
21/01/2013 22:22:23| Mike
21/01/2013 22:22:21| Mike
21/01/2013 22:22:25| Mike
21/01/2013 22:22:26| Mike
21/01/2013 22:22:27| Mike
21/01/2013 22:22:28| Mike
21/01/2013 22:22:29| Mike

so what I need is a query in postgres sql where it should grab the latest 3 records of JOHN/MIKE from table, by latest what I mean is that it should select only 3 records of them that recorded recently

thank
0
Comment
Question by:hi4ppl
  • 10
  • 10
20 Comments
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 38807412
Checking the PostGreSQL documentation, as it has ROW_NUMBER() window function, you can use the syntax specified in this article to do this:
http://www.experts-exchange.com/Database/Miscellaneous/A_3203-DISTINCT-vs-GROUP-BY-and-why-does-it-not-work-for-my-query.html

select *
  from ( select CUSTOMER_NAME,  TIMESTAMP          
                 , ROW_NUMBER() OVER ( PARTITION BY CUSTOMER_NAME ORDER BY  TIMESTAMP          DESC ) rn
            FROM yourtable
 ) sq
where sq.rn <= 3 

Open in new window


a index on customer_name + timestamp should make sure you get the best performance.
0
 
LVL 1

Author Comment

by:hi4ppl
ID: 38809330
Hi,

I have one table so there is no join involved here all data is in one flat table, anyway this is possible by query or procedure language.

thanks for help
0
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 38809350
my query example should do this, there is no join involved.
the article is indeed about "joins", but that's only the usual case when you have "duplicates".
0
 
LVL 1

Author Comment

by:hi4ppl
ID: 38809382
Well when I run this I get zero result but yeah I have added data as well but it's not giving any result.

select *
  from ( select address,name,  TIMESTAMP          
                 , ROW_NUMBER() OVER ( PARTITION BY address.name ORDER BY  TIMESTAMP DESC ) rn
            FROM tablename
 ) sq
where sq.rn <= 3 and to_char(timestamp,'DD-MM-YYYY')='01-12-2012'

Open in new window


thanks for help
0
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 38809569
this query will only return results if users have such timestamp values as top 3.
not sure if that will be really the case?!
0
 
LVL 1

Author Comment

by:hi4ppl
ID: 38810102
Well partially yes but I would like to add to this that what I want is that go grab the latest three record of that username he might have done 1000 transaction but I want to grab the latest three from that day and im not sure if it does that... because there is no way that I don't have latest 3 record out million record which this query returns zero .

thanks
0
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 38810178
then you have to move that condition to the subquery
select *
  from ( select address,name,  TIMESTAMP          
                 , ROW_NUMBER() OVER ( PARTITION BY address.name ORDER BY  TIMESTAMP DESC ) rn
            FROM tablename
       WHERE to_char(timestamp,'DD-MM-YYYY')='01-12-2012'
 ) sq
where sq.rn <= 3   

Open in new window

0
 
LVL 1

Author Comment

by:hi4ppl
ID: 38810618
Well it might be a dummy question but this gives me out of like following: I didn't understand what it means? sorry im not that advance

address      | Name|TIME      | rn
somwhere |          | sometime | 2
somwhere |          | sometime | 1
somwhere |          | sometime | 3

so what does it means the main thing is which is name it does not shows and plus taking example above I given in my question I need something like bellow output:


Somaddress|21/01/2013 22:22:23| Johan
Somaddress|21/01/2013 22:22:21| Johan
Somaddress|21/01/2013 22:22:25| Johan
Somaddress|21/01/2013 22:22:23| Mike
Somaddress|21/01/2013 22:22:21| Mike
Somaddress|21/01/2013 22:22:25| Mike

above output is my goal
0
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 38810869
you can do that "finetuning" like this.
select address, timestamp, name
  from ( select address,name,  TIMESTAMP          
                 , ROW_NUMBER() OVER ( PARTITION BY address.name ORDER BY  TIMESTAMP DESC ) rn
            FROM tablename
       WHERE to_char(timestamp,'DD-MM-YYYY')='01-12-2012'
 ) sq
where sq.rn <= 3   
order by sq.name, rn 

Open in new window

0
 
LVL 1

Author Comment

by:hi4ppl
ID: 38810983
Hi thanks I get this error it says the column sq.name does not exist what exactly that column is? thanks
0
What is SQL Server and how does it work?

The purpose of this paper is to provide you background on SQL Server. It’s your self-study guide for learning fundamentals. It includes both the history of SQL and its technical basics. Concepts and definitions will form the solid foundation of your future DBA expertise.

 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 38811006
I presume you don't use the sql as I wrote it ...

the "sq." refers to the subquery alias ...
0
 
LVL 1

Author Comment

by:hi4ppl
ID: 38811885
Hi no I run it as it's no changes the only things I do is changing the fields and database name to it's actual
0
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 38813218
let's see... you put FROM tablename, but PARTITION BY address.name ...
so, there MUST be something you change in the query ... or you need to change  ...

select *
  from ( select address, name,  TIMESTAMP          
                 , ROW_NUMBER() OVER ( PARTITION BY address.name ORDER BY  TIMESTAMP DESC ) rn
            FROM address
    WHERE to_char(timestamp,'DD-MM-YYYY')='01-12-2012'
 ) sq
where rn <= 3
order by name, rn  

Open in new window


or, you changed the "." between address.name there into a comma (which is wrong in regards to the required results)


select *
  from ( select address, name,  TIMESTAMP          
                 , ROW_NUMBER() OVER ( PARTITION BY name ORDER BY  TIMESTAMP DESC ) rn
            FROM yourtablename
    WHERE to_char(timestamp,'DD-MM-YYYY')='01-12-2012'
 ) sq
where rn <= 3
order by name, rn  

Open in new window

0
 
LVL 1

Author Comment

by:hi4ppl
ID: 38848900
still it does not give me the result that I was asking I m running the same query but changing the files, I cannot post the exact fields if there is way I can send you PM? I'm not sure if there is any?
0
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 38848945
Check my profile
0
 
LVL 1

Author Comment

by:hi4ppl
ID: 38850742
Hi I have manage to make the changes on this but when I run I get the following I don't even know what RN means here I thought it shows number of records that this account have but when I check on the same date it has only one which shows the output.

amount	accountnumber	timestamp	rn
200	          4872036	1/1/2013 10:38	1
200	          4903993	1/1/2013 10:31	2
200	          2442270	1/1/2013 10:30	3
3000	7728031	        1/1/2013 23:59	1
3000	2161910	        1/1/2013 23:59	2
5000	7145383	        1/1/2013 23:58	3
9900	2145034	       1/1/2013 12:04	1

Open in new window


with the bellow changes on the query

select *
  from ( select amount, accountnumber,  TIMESTAMP          
                 , ROW_NUMBER() OVER ( PARTITION BY amount ORDER BY  TIMESTAMP DESC ) rn
            FROM accounts
    WHERE to_char(timestamp,'DD-MM-YYYY')='01-01-2013'
 ) sq
where rn <= 3
and amount>0 
order by amount, rn  

Open in new window


I'm not sure if I didn't make myself clear as there is misunderstand here but the result that i'm looking for from this would be bellow

amount	accountnumber	timestamp	
200	          4872036	1/1/2013 10:38	
500	          4872036	1/2/2013 10:38	
600	          4872036	1/3/2013 10:38	
200	          4903993	1/1/2013 10:31	
700	          4903993	1/3/2013 10:31	
600	          4903993	1/4/2013 10:31	

Open in new window


this should continue grabbing  3 records for each account the latest one  or the old one no matter how many records they have I need the three latest one

thanks for the help.
0
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 38850835
so you want to PARTITION BY account, and not by amount ...
also, you want to move the AMOUNT> 0 condition to the subquery, I think

RN is the alias for the ROW_NUMBER() expression.

select *
  from ( select amount, accountnumber,  TIMESTAMP          
                 , ROW_NUMBER() OVER ( PARTITION BY accountnumber ORDER BY  TIMESTAMP DESC ) rn
            FROM accounts
    WHERE to_char(timestamp,'DD-MM-YYYY')='01-01-2013'
    and amount>0 
 ) sq
where rn <= 3
order by accountnumber , rn  

Open in new window

0
 
LVL 1

Author Comment

by:hi4ppl
ID: 38851206
Okay thanks I got the result so the result that is coming in the rn what does that means like I have numbers their from 1-3 ?

thanks for help
0
 
LVL 142

Accepted Solution

by:
Guy Hengel [angelIII / a3] earned 300 total points
ID: 38851227
when you remove this line:
where rn <= 3
you should see what it means...

in short, the ROW_NUMBER() function establies a "row number" per "partition" (in this case, the account num), based on the ORDER BY criteria.

the subquery is written to make it possible to filter on that expression, so take only the "first 3" .

if you don't want the rn field to appear, replace the SELECT * by the SELECT amount, accountnumber,  TIMESTAMP  in the outer query

select amount, accountnumber,  TIMESTAMP 
  from ( select amount, accountnumber,  TIMESTAMP          
                 , ROW_NUMBER() OVER ( PARTITION BY accountnumber ORDER BY  TIMESTAMP DESC ) rn
            FROM accounts
    WHERE to_char(timestamp,'DD-MM-YYYY')='01-01-2013'
    and amount>0 
 ) sq
where rn <= 3
order by accountnumber , rn  

Open in new window

0
 
LVL 1

Author Closing Comment

by:hi4ppl
ID: 38896412
thanks
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

Best database to use for Maps is PostgreSQL. This is an open source database. Comes as a package with most Linux OS. For more info visit the following site: http://www.postgresql.org/ (http://www.postgresql.org/) This requires some add-o…
Many developers have database experience, but are new to PostgreSQL. It has some truly inspiring capabilities. I have several years' experience with Microsoft's SQL Server. When I began working with MySQL, I wanted a quick-reference to MySQL (htt…
Steps to create a PostgreSQL RDS instance in the Amazon cloud. We will cover some of the default settings and show how to connect to the instance once it is up and running.
Internet Business Fax to Email Made Easy - With eFax Corporate (http://www.enterprise.efax.com), you'll receive a dedicated online fax number, which is used the same way as a typical analog fax number. You'll receive secure faxes in your email, fr…

707 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

17 Experts available now in Live!

Get 1:1 Help Now