• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 467
  • Last Modified:

select oldest orderid for each of 1,000 customers?

Starting with a list containing 1-2,000 customer id's, I need to generate a query that gives me the orderid for each customer, of the earliest order associated with that customer.  So I have a two-part challenge:

The only way I can think of to retrieve the same list of customers is with a code-generated where clause listing each of the id's.  Doesn't smell right, so I thought I'd post it here for feedback.  S'ok?  Not ok? (SQL 2000)

Second, how to retrieve only the oldest orderid for each customer?  Where I also need to ignore orders that cost less than $10.

The fields are:
customers.customer_code (pk, varchar)
order_entry.customer_code (fk)
order_entry.job_number (pk, int)
order_entry.entry_date (datetime)
order_entry.totalprice (decimal)

Help much appreciated-
0
ottenm
Asked:
ottenm
  • 5
  • 3
  • 2
2 Solutions
 
Guy Hengel [angelIII / a3]Billing EngineerCommented:

select c.*, o.*
from customers c
left join order_entry o
  on o.customer_code = c.customer_code
 and o.job_number = ( select top 1 i.job_number from order_entry i where i.customer_code = c.customer_code and i.totalprice >= 10 order by i.entry_date desc )
0
 
Scott PletcherSenior DBACommented:
If the job_numbers are always in order -- that is, a higher job number is always a later order -- you can do this:

SELECT c.customer_code, MAX(oe.job_number) AS [Latest Order]
FROM customers c
INNER JOIN order_entry oe ON c.customer_code = oe.customer_code
WHERE oe.totalprice >= 10
GROUP BY c.customer_code
ORDER BY c.customer_code
0
 
ottenmAuthor Commented:
Yes, both solutions will work because the job_numbers are increasing.  My mistake for leaving that out of the description.

Any concerns using the same query (or any query for that matter) with 1-2,000 tests in the where clause?  Like:

SELECT c.customer_code, MAX(oe.job_number) AS [Latest Order]
FROM customers c
INNER JOIN order_entry oe ON c.customer_code = oe.customer_code
WHERE oe.totalprice >= 10 and (c.customer_code = 'Cust1' OR c.customer_code = 'Cust2'
     OR c.customer_code = 'Cust3' OR c.customer_code = 'Cust4' etc etc etc)
GROUP BY c.customer_code
ORDER BY c.customer_code
0
Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
Guy Hengel [angelIII / a3]Billing EngineerCommented:
I do not "like" the suggestion of ScottPletcher (it is not "wrong"! )
because it uses the "implicit" meaning of the higher job_number value for "older" entry_dates, and also only returns the job_number, while my suggestion returns the entire order_entry row ...
0
 
ottenmAuthor Commented:
Yeah, it's the old "but 98% of the time, the numbers *do* increase!" and then when it really matters, you hit the other 2% and have sad eyes in the boss' office.  Similarly for the job_number, too.  I really just need the job_number right now, but there are some more reports coming that might build off this one.

What about my 2,000-entry WHERE clause?  No complaints there?
0
 
Scott PletcherSenior DBACommented:
>> Any concerns using the same query (or any query for that matter) with 1-2,000 tests in the where clause? <<

Yes; that will not be good for performance.  Instead, create a table of the values with a clustered, unique index and INNER JOIN that table to the query.  SQL is optimized for joins.


Speaking of optimization, I wrote my query to satisfy the q as asked.  The original q asked for the orderid, not the entire row.  So, I was aiming for the most efficient way to do that.  Naturally my query could be modified to return the entire row, but that would reduce the efficiency, requiring joins and/or subqueries, as are already present in angel's query.  That query requires a join *and* a *correlated* subquery *with a sort*.

As to the job number, what else would a higher job number mean?  Particularly if, as is likely, an identity column is used to assign the number.  
0
 
Scott PletcherSenior DBACommented:
My other concern with datetime is that it may not be unique enough, esp. if edi or other electronic orders are used.  If multiple orders have the same datetime, which one is selected by the TOP 1 is random; you could get different results for each run.
0
 
Scott PletcherSenior DBACommented:
Is job_number an identity value?  If so, I fail to see how a row inserted later could ever have a lower number, but if you feel unsure, that's fine.

Sorry, I think like a DBA, and as such, I don't do hundred, thousands or possibly millions of extra I/Os -- with the inevitable big hit on performance -- unless I genuinely *need* to.
0
 
ottenmAuthor Commented:
These are some extremely helpful comments from both of you, and thanks for not only providing superb solutions but also taking the time to clarify the trade offs.

As for job numbers, for various reasons we sometimes (2% of the time) have jobs where we need to reset the entry_date to a later date, giving us a 'younger' job with a lower job number.

Again, tremendous help and much appreciated.
0
 
Scott PletcherSenior DBACommented:
>> where we need to reset the entry_date to a later date <<

Wow; OK, I've never seen that done before in a business setting, where the "entry_date" was changed to a later date.  That would explain the 2%.  [You really should rename the column, tho, since it's technically no longer an entry_date.]
0

Featured Post

Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

  • 5
  • 3
  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now