Load a table with unique values

I'm simplifying my actual situation for illustrative purposes, so please read through everything I've written before you answer.

Let's say I have an existing table (Table1) containing 3 columns: Customer Number,  Transaction Date, and Transaction Number.  The data in the Table1 might look like this:

Customer Number                    Transaction Date                Transaction Number
        111                                          1/1/2000                                     1
        111                                       12/15/2002                                    2
        111                                         7/1/2005                                      3
        222                                        10/1/1999                                     1
        222                                          9/1/2007                                     2
        333                                         1/1/2007                                      1
        333                                         6/30/2007                                    2
As you can see in the above table, the same Customer Number can appear multiple times.  Also note that the Transaction Number relates to individual Customer Numbers. In other words, if a given customer purchases something for the first time, he is assigned a Transaction Number of 1. If that same customer purchases something  again, he is assigned a Transaction Number of 2. If a DIFFERENT customer comes in for the first time, he is assigned a Transaction Number of 1. If he comes in again, he is assigned a Transaction Number of 2, etc, etc.

What I need to do:

1.  I need to create a new table (I'll call it Table2) and load it with unique Customer Numbers (each Customer Number can only occur once) based on the most current record for each Customer Number ( using the Transaction Date field or the Transaction Number field - please show me both ways) in Table1.  I need to point out here that using the Customer Number field alone to select unique records is not good enough. I need to select the most current record ( identifiable by the values in the Transaction Date or Transaction Number columns)   So for a given Customer Number, I need to somehow be able to identify the most current date record or the highest numbered Transaction Number for that Customer (which would also indicate the most current record).

2. Nightly I need to update the table I created (Table2) with the most up-to-date records from Table1. This could include either entirely new Customer Numbers (which would have a Transaction Number of 1 for that day) or existing Customer Numbers with new transactions ( for example, the prior Transaction Number might have been 3 and now it's 4).

So in summary, I need to load an empty table with unique Customer Numbers based on the most current Transaction Date or Transaction Number (using either one will work but please show me both ways).

AND I need to update the new table (Table2) nightly with updated records from Table1.

Again I need to stress..........using the Customer Number field alone to select unique Customer Numbers is not good enough. I need to select the most current records from Table1 for each Customer Number. I've left out numerous fields from my actual table, so you can't see entirely what's going on. I only included fields needed to achieve what I'm trying to achieve.
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Nick UpsonPrincipal Operations EngineerCommented:
The basic sql should be (syntax may not be 100%):

insert into table2 as
select customernumber, max(transactiondate) group by customer number
select customernumber, max(transactionnumber) group by customer number

the update is complex, I'd just use the above each night if you can

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial

the above solution will work only if you need those 2 columns (customer number and transaction date)
i assume your table have more columns, so i would go with this

insert into table2
select t1.*
from  table1 t1
  inner join
        (select customernumber, max(transactiondate) as tdate
          from table) as t2
  on t1.customernumber = t2.customernumber
 and t1.transactiondate = t2.tdate

this will do the intiall load

in order to do the nightly procedure i would suggest one of the following
1) you can use a trigger on table1 that will keep table2 updated, this will probably be the most simple solution, and it will speard the load during the day
2) every night you can delete all the records from table2 and built it again using the above sql statement (which by the way, if you are dealing with large amounts of data, would be more usefull to use some kind of load / import rather then the insert statement)

you can not update existing record using a simple sql since db2 don't support a join in an update statement
you will have to write a program for that
dbfromnewjerseyAuthor Commented:
momi  sabag,

I don't think that solution will work though because there is the potential of more than 1 transaction occuring on the same date for a given Customer Number. If you only check the  Transaction Date field, you're not necessarily getting the most up-to-date record.  Not only does the 'max' Transaction Date for a given Customer Number need to be found but also the 'max' Transaction Number on a particualr date.

For example:

Let's say that on 1/1/2007, two transactions were performed for Customer Number 555.  The records then might look something like this:

Customer Number                    Transaction Date                Transaction Number
        555                                          1/1/2007                                     1
        555                                          1/1/2007                                     2

I need to be able to get the most up-to-date record, which in the case of Customer Number 555 would be:
Customer Number                    Transaction Date                Transaction Number
        555                                          1/1/2007                                     2

Determine the Perfect Price for Your IT Services

Do you wonder if your IT business is truly profitable or if you should raise your prices? Learn how to calculate your overhead burden with our free interactive tool and use it to determine the right price for your IT services. Download your free eBook now!


Creating this view is easy:

==> select customernumber, max(transactionnumber) group by customer number

Once that view exists, you can always use it to select the appropriate rows from the base table. (And since the column values from that view always point to the appropriate latest row per customer, it's not clear why there's any need to pull those rows into a separate table. But if that's what you need to do, it's your choice. Keep in mind that Table2 will need to be cleared every night before putting the new rows in if I understand correctly.)

dbfromnewjerseyAuthor Commented:

The reason I need to pull those rows into a seperate table is because I need to use the Customer Number field to join another table. I need a table containing unique occurences of customer numbers. Since there are other fields involved, which I won't get into, I need the most up-to-date record for each customer number from the original table.  I will be joining on customer number only from the new table I'm creating with another table.  I figured in order to do this, I'd need to load a new table. Unless what you're saying is I can "filter" the records in the original table and then do a join on the filtered records. I don't know.
If you are storing the date as a date, then just because you only see mm/dd/yyyy doesn't mean that's all there is.  Set your nls_date_format to show similar to "dd-mon-yyyy hh24:mi:ss" to see the full date-time.  momi's solution should work fine for you.  If you are storing the date as a varchar2 well, it won't work and shame on you :)
dbfromnewjerseyAuthor Commented:
What do you mean it won't work? I have a Julian Date field also that I can use instead of calender date. All I need to check is the highest value in the Julian Date for each CustomerNumber as well as the highest TransactionNumber. What are you talking about?
I'm saying that if the date is stored as a date, it has information down to the subsecond level.  If you store the date as a varchar2 (some people do this, i.e., varchar2(10)='02/15/1960') then you wont have that information.  I didn't mean to put you on the defensive.  I'm just trying to help (as is everyone else who is responding to you).

I don't know what DB you're using, but the SQL should be similar enough. I did the following on a System i using DB2 at V5R4:

1. I created a custtrans table with these rows --

000001   01/01/08        1
000001   01/01/08        2
000001   01/02/08        3
000001   01/04/08        4 < max cust 000001
000002   01/01/08        1
000002   01/05/08        2 < max cust 000002
000003   01/06/08        1
000003   01/08/08        2
000003   01/10/08        3
000003   01/12/08        4
000003   01/12/08        5 < max cust 000003

2. I created this view over the custtrans table --

             SELECT CUSTNBR , max( TRANNBR ) as Maxtran
                FROM custtrans
                GROUP BY CUSTNBR

3. Now, when I run this query --

 ==>  SELECT * FROM custtrans a, maxtrans b
                 and A.TRANNBR = B.MAXTRAN

4. ...I get this result --

000001   01/04/08        4   000001         4
000002   01/05/08        2   000002         2
000003   01/12/08        5   000003         5

The b.custnbr and b.maxtran columns aren't really necessary in the result, but you can ignore them or drop them in a few ways.

Since the maxtrans view should update naturally as rows are added/deleted/updated in the base custtrans table, the join of the base table to the view should always return the largest transaction number for each customer and the columns from that row. No need to populate another table.

However, that last query certainly could be used to populate another table on a daily basis.

Another way you might approach this problem is with a correlated subquery.  Here's the logic --

Given this dataset:

Customer Number                    Transaction Date                Transaction Number
        111                                          1/1/2000                                     1
        111                                       12/15/2002                                    2
        111                                         7/1/2005                                      3
        222                                        10/1/1999                                     1
        222                                          9/1/2007                                     2
        333                                         1/1/2007                                      1
        333                                         6/30/2007                                    2

You want the following as your data:

Customer Number                    Transaction Date                Transaction Number
        111                                         7/1/2005                                      3
        222                                          9/1/2007                                     2
        333                                         6/30/2007                                    2

So long as that is the case, try this SQL:

Select mt.customer_number, mt.transaction_date, mt.transaction_number
  from <table> mt
 where (mt.customer_number, mt.transaction_date) in (Select st.customer_number, Max(st.transaction_date)
                                                       from <table> st
                                                      where mt.customer_number = st.customer_number
                                                      group by st.customer_number)

This worked for me in my test environment
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Query Syntax

From novice to tech pro — start learning today.