asked on

Perform a join on a filtered table

This question is related to another open question I have in the SQL section.

Can someone let me know if the following is possible? Keep in mind that my base table (Table1) contains millions of records, so I'm looking for the most efficient way to achieve this.

To simplify things, let's say Table1 contains 3 fields. They are CustomerNumber, TransactionDate, and Comments. Numerous occurences of the same CustomerNumber can appear in the table. The data in Table1 might look like this:

CustomerNumber TransactionDate Comments
111 1/1/1999 This is an older comment for 111
111 12/31/2007 This is the newest comment for 111
222 6/30/2004 This is an older comment for 222
222 10/1/2006 This is a newer comment for 222
222 1/1/2008 This is the newest comment for 222

What I need:

I need the most up-to-date record for each CustomerNumber so that I'll wind up with unique occurences of each CustomerNumber. The reason I need this is because I will be joining on CustomerNumber from another table. People have already shown me how to get the most up-to-date record (by using MAX(TransactionDate), etc from Table1. Originally what I was going to do was get the most up-to-date record for each CustomerNumber in Table1 and dump them into a new table. Do I need to do this though? Or can I somehow run a query on Table1 to get the the unique CustomerNumbers (by selecting the most up to date record for each only) I'm talking about and then join my other table (which also contains a CustomerNumber field) on CustomerNumber of the 'results table/view' from the query I ran on Table1?

Mark Geerlings

"Originally what I was going to do was get the most up-to-date record for each CustomerNumber in Table1 and dump them into a new table."

Why? That looks like a *VERY* expensive (in terms of CPU cycles, disk I/O, disk space, etc.) way to get the job done.

Do you really need to copy the data to a new table, or do you just need to join the records needed in a query (in a report or screen for example)?

It is possible (and quite easy, and usually efficient) to use "MAX" in a sub-query to get just the latest record, then do a join on the resulting records to data from another table that has just one record for each CustomerNumber.

Jenkins

ASKER

This all has to do with creating a report only. Let me explain the situation a bit more and then maybe you can show me how to achieve it. Like I said, I need to get the most up-to-date record for each CustomerNumber from Table1 so that I'll wind up with a table (or temporary table or whatever) of unique CustomerNumbers. I then need to join (on CustomerNumber only) those results with a different table. That different table contains multiple occurences of CustomerNumber, which is fine. What I need is a 'list' of unique CustomerNumbers that I get from Table1 that I can use to join on the CustomerNumber of another table to produce a report. All I'm trying to do is come up with a table or a temporary table or whatever containing unique CustomerNumbers that I can use to join on the CustomerNumber field of a different table. It would seem to me though that every time I want to run a report that 'filtering' Table1 as you described and then joining on the filtered results would take a long time. Either way, please show me how I would do what you described.

Y SD

to get the unique customer id, from table 1, u can achieve in two ways....1. on your existing query where you use the max, i.e. "select customerid, max(transactiondate) from table1 group by customerid"
by adding the group by at the end, you will only have one occurance of the cusid. Or the other way is you run a query with the keyword "distinct" which will give you unique values of the existing field.
i.e. Select distinct(customerid) from table1
you can try along these lines to achieve exactly what you want...Regarding the temp table....If you are dealling with a very large amount of data, ie. millions of records then its is advisable you use one, to avoid running exception errors due to memory buffer size.

hope it helps

ASKER CERTIFIED SOLUTION

Y SD

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Jenkins

ASKER

OK. Let's say I run the following SQL on Table1 to get the results I need (originally I was going to INSERT these results into a new table but I'll leave the INSERT out for now). Something along the lines of the following will give me unique occurences of CustomerNumber (the most up to date record of each).

Select CustomerNumber,
MAX(TransactionDate),
Comments
From Table1
Group By CustomerNumber

OK. Now how would I join the CustomerNumber field from the results of the above query with the CustomerNumber field of my 'OtherTable'?
I need this all to run on one query; meaning a user might want to do a lookup on CustomerNumber '111'. So if her enters '111' for the query criteria, I want Table1 to be 'filtered' and then a join performed on the CustomerNumber of those filtered results with the CustomerNumber of my 'other table'.

SOLUTION

jwittenm

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

SOLUTION

Mark Geerlings

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Mark Geerlings

Also, be careful with the "select distinct..." syntax in Oracle! The results may surprise you (unpleasantly!) and the performance (or lack thereof!) may seriously disappoint you. This depends on many things though, so you may or may not be surprised or disappointed.