Link to home
Start Free TrialLog in
Avatar of gary_j
gary_jFlag for United States of America

asked on

need unique rows

i have a table:
      order               char (8)
      seq                  tinyint
      hst_dt              long
      hst_tm             long
      promise_dt       long
      status              char (1)
      other columns
      id column         identity

This query: SELECT order,seq,hst_dt,hst_tm,promise_dt,status FROM table1 WHERE item = 'item_no'
                 ORDER BY hst_dt DESC, hst_tm DESC,order,seq

returns:
      order  seq  hst_dt          hst_tm  promise_dt  status
1     3        1     20040927     84308   20041011     N
2     2        1     20040918     90909  20040925     P
3     2        1     20040916     100000 20040925     P
4     2        1     20040912     111111 20040925     P
5     2        1     20040831     63148  20040925      P
6     2        1     20040821     85702   20040925     P
7     1        1     20040108     123456 20040228     P

How can I change my query such that only rows 1, 2, and 7 are returned.  I need only the most recent row per order + seq combination.

Thank you!
Avatar of mcmonap
mcmonap
Flag of United Kingdom of Great Britain and Northern Ireland image

Hi gary_j,

Perhaps something like this?:

SELECT order,seq,MAX(hst_dt),MAX(hst_tm),promise_dt,status
FROM table1
WHERE item = 'item_no'
GROUP BY order,seq,promise_dt,status
ORDER BY order,seq
Try this:

SELECT t1.order, t1.seq, t1.hst_dt, hst_tm, promise_dt, status
FROM table1 t1 INNER JOIN (
  SELECT order, seq, Max(hst_dt) as hst_dt
  FROM table1
  WHERE item = 'item_no'
  GROUP BY order, seq
) sub ON t1.order = sub.order AND t1.seq = sub.seq AND t1.hst_dt = sub.hst_dt
ORDER BY t1.hst_dt DESC, hst_tm DESC, order, seq
Avatar of ATAHAC
ATAHAC

Try this:

SELECT order,seq,hst_dt,hst_tm,promise_dt,status FROM table1 t1
WHERE item = 'item_no'
and not exists (select 1 from table1 t2 where t1.order = t2.order and t1.seq = t2.seq
and item = 'item_no' and t1.hst_dt <= t2.hst_dt and t1.hst_tm < t2.hst_tm)
ORDER BY hst_dt DESC, hst_tm DESC,order,seq
>SELECT order,seq,MAX(hst_dt),MAX(hst_tm),promise_dt,status

In SQL Server, this may return values for hst_dt and hst_tm from different rows.
It didn't in my tests, can you show me an example when it does?
Sure. TableA(pk, ColA, ColB, ColC)

pk, ColA, ColB, ColC
1, A, 10, 20
2, A, 20, 10

SELECT ColA, Max(ColB), Max(ColC) FROM TableA GROUP BY ColA

Results:
A, 20, 20
Avatar of gary_j

ASKER

mcmonap and ATAHAC answers did not return unique rows, although in both cases I saw no more than 2 rows per order+seq combination

I couldn't get jdlambert1 answer to run at all:
Server: Msg 209, Level 16, State 1, Line 1
Ambiguous column name 'order'.
Server: Msg 209, Level 16, State 1, Line 1
Ambiguous column name 'seq'.
Hi jdlambert1,

Thanks, I wasn't testing very well!
Ooops. Left the table aliases off order & seq in the ORDER BY:

SELECT t1.order, t1.seq, t1.hst_dt, hst_tm, promise_dt, status
FROM table1 t1 INNER JOIN (
  SELECT order, seq, Max(hst_dt) as hst_dt
  FROM table1
  WHERE item = 'item_no'
  GROUP BY order, seq
) sub ON t1.order = sub.order AND t1.seq = sub.seq AND t1.hst_dt = sub.hst_dt
ORDER BY t1.hst_dt DESC, hst_tm DESC, t1.order, t1.seq
Avatar of gary_j

ASKER

Thanks, jdlambert1, it works now, but there is one problem

If there are two rows with the same history date, it returns both of them.

Do I just add MAX(hst_tm) to the subquery?

Thank you
This means the combination of order, seq, and hst_dt don't uniquely identify a record. Just adding Max(hst_tm) to the subquery is extremely likely to provide misleading results.

There are several ways to handle this, most of them pretty ugly. Since you want the most recent record per order & seq, here's the best way, if ID is the table's primary key (and since it's defined with the Identity property):

SELECT t1.order, t1.seq, t1.hst_dt, hst_tm, promise_dt, status
FROM table1 t1 INNER JOIN (
  SELECT order, seq, Max(id) as id
  FROM table1
  WHERE item = 'item_no'
  GROUP BY order, seq
) sub ON t1.id = sub.id
ORDER BY t1.hst_dt DESC, hst_tm DESC, t1.order, t1.seq

This relies on the identity property, which ensures that the highest id is also the most recent, so it can substitute for the date column. It also relies on id being a primary key, which uniquely identifies each record.
If you have sql2k, or higher, Than I would use functions.....they simplify a lot your job...
Something like this:

SELECT
  a1.order,a1.seq,
  getFirst_hst_dt(order,seq), getFirst_hst_tm(order,seq), getFirst_promise_dt(order,seq),getFirst_status(order,seq)
FROM
  (select distinct order, seqfrom table1) a1
WHERE
  item = 'item_no'
ORDER BY hst_dt DESC, hst_tm DESC,order,seq

Now a sample function:

CREATE FUNCTION getFirst_hst_dt(@order int, @seq int)
RETURNS int
AS select top 1 hst_dt from table1 where order = @order and seq = @seq

---the same the other functions

Of course if you don't need the top 0, you can change the rule, like the top 1 but ordered in a certain order or so on....
EG:

CREATE FUNCTION getFirst_hst_dt(@order int, @seq int)
RETURNS int
AS select top 1 hst_dt from table1 where order = @order and seq = @seq order by order,seq

this seems to me the clearer one...because you can change easily teh rule by changing the functions content...but you don't have to affect the content of the other storep procedures that are using those functions....

good luck,
  xenon
xenon, gary needs the most recent record for each order & seq, so every value needs to come from the same row. While a single function could be created that could be applied here, I think it's likely to hurt performance compared to a sub-select, and you'd have to be very careful to ensure all values for each order & seq come from the same row.
As you may see in the function if you use the same conditions, the top 1 will return the values from the same row...so this is not an issue...

Regarding the speed...test and you will see that the top 1 will return very good speed,  much more better than using joins, especialy if you have index on the condition you use (in this case order and seq)....

I almost bet that the version with function and top 1 is much faster than the complex joins....

regards,
   xenon

PS: the problem is that the code is much larger, because you have more functions to implement...but these functions have the quality to be very eficient, because their content is a top 1  on an index...
CORRECTION:

my function was not ok..just the ideea...rewritten is:

CREATE FUNCTION getFirst_hst_dt(@order int, @seq int)
RETURNS int
AS
BEGIN
  declare @res int
  select top 1 @res = hst_dt from table1 where order = @order and seq = @seq order by order,seq
  return(@res)
END

....the rest is the same
Avatar of gary_j

ASKER

Because I'm a VB programmer and don't know much about SQL functions, and probably won't be able to institute them from VB code, I was trying to solve the remaining issue by concatenating date and time.

This statement works:
select  Max(convert(varchar,hst_dt) + right('000000' + convert(varchar,hst_tm),6)) from table1

but when I put it in the query:

SELECT t1.order, t1.seq, t1.hst_dt, hst_tm, promise_dt, status
FROM table1 t1 INNER JOIN (
  SELECT order, seq,  Max(convert(varchar,hst_dt) + right('000000' + convert(varchar,hst_tm),6))  as hst_dt
  FROM table1
  WHERE item = 'item_no'
  GROUP BY order, seq
) sub ON t1.order = sub.order AND t1.seq = sub.seq AND t1.hst_dt = sub.hst_dt
ORDER BY t1.hst_dt DESC, hst_tm DESC, t1.order, t1.seq

I get an error:  The conversion of the varchar value '20020911165750' overflowed an int column. Maximum integer value exceeded

Does anybody know why?  (Yes, both fields, date and time, are stored as integers)

Possibly due to not specifying a length for varchar. From BOL: "When n is not specified in a data definition or variable declaration statement, the default length is 1. When n is not specified with the CAST function, the default length is 30." This doesn't specify the CONVERT function, and I would have expected it to use the same default as CAST, which should be long enough, but in case it's different, try substituting this:

 SELECT order, seq,  Max(convert(varchar(100),hst_dt) + right('000000' + convert(varchar(100),hst_tm),6))  as hst_dt
Avatar of gary_j

ASKER

I specified 8 (8 digit date) and 6 (6 digit time) as a result of your suggestion, so the concatenated string is 20020911165750 as it was before, and got the same error.

Then I tried 100 for each and got the same string and the same error

I don't understand why it's acceptable as a "stand-alone" query but not as part of the sub-query.

ASKER CERTIFIED SOLUTION
Avatar of jdlambert1
jdlambert1
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of gary_j

ASKER

duh!  i should have caught that last little bit myself!

Thanks for all your help ...
I once spent all day troubleshooting code to ultimately find that a period was out of place. That still hurts.

That was before the days of EE, where it's easy to get a bunch of other eyes to give it a fresh look.