how to compare two consecutive rows in the same table in sql sever

i have a HUGE table which I need to query.
There are two rows (which are separated by a Unique identity), common 'category' column, a  'date' and a 'transaction code' column

now i need to extract 'category' when between two consecutive rows then  'date' difference is less that 1 day and the first row has 'transaction code' value as 'A' and the following row has value 'R' then that qualifies as a candidate

how do i do it?
manivineetAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

bchoorCommented:
few things:
1. is the identity an integer? if so, can you rely on a greater value to being the 2nd row
2. if you have four rows, would you look at 1,2 and 3,4, or would you look at 1,2, and 2,3 and 3,4?
3. can you provide maybe 10 records so we can see the dataset

BC
0
manivineetAuthor Commented:
@bchoor:
1. No the identity is not an integer
2. 1,2  2,3 3,4
3. No, because of security reasons.
0
Auric1983Commented:

You can use the Row_Number() function in SQL 2005 and 2008

for example

select Row_Number(),Name,Address from Contacts

0
The Ultimate Tool Kit for Technolgy Solution Provi

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy for valuable how-to assets including sample agreements, checklists, flowcharts, and more!

cyberkiwiCommented:
between two consecutive rows.
Consecutive = ordered by date?
0
BodestoneCommented:
You could do:
--number the fields in date order
;WITH cte AS
{
   SELECT *, ROW_NUMBER() OVER(PARTITION BY DateField ORDER BY [Order] DESC) As [rowNum]
   FROM myTable
}
--JOIN on consecutive rows with matchign criteria
select cte1.uniqueID AS [firstID],
       cte1.category AS [firstCategory],
       cte2.uniqueID AS [secondID]
       cte2.category AS [secondCategory]
from cte1
JOIN cte2
  ON cte2.rowNum = cte1.rowNum + 1
  AND DATEDIFF(dd,cte1.dateField,cte2.dateField) < 1
  AND cte1.[transaction code]  = 'A'
  AND cte2.[transaction code]  = 'r'

Open in new window

0
cyberkiwiCommented:
select a.*, b.*
from tbl a
outer apply (
  select top 1 * from tbl c
  where c.category = a.category and c.date > a.date
  order by c.date asc) b
where a.transactioncode = 'A' and b.transactioncode = 'R'
  and dateadd(d, 1, A.date) > b.date
0
Raja Jegan RSQL Server DBA & Architect, EE Solution GuideCommented:
Hope this is what you need:
;with cte as (
SELECT unique_id, category, [date], [transaction], ROW_NUMBER() over ( partition by category order by date) rnum
From ur_table) 
select t1.unique_id, t1.category, t1.[date], t1.[transaction]
from cte t1 left join cte t2 on t1.category = t2.category and t1.rnum = t2.rnum - 1
where datediff(dd, t1.[date], t2.[date]) = 1 
and t1.[transaction] = 'A'
and t2.[transaction] = 'R'

Open in new window

0
cyberkiwiCommented:
@bodestone

  AND DATEDIFF(dd,cte1.dateField,cte2.dateField) < 1

datediff(dd, '20100101 23:59', '20100102 00:01') => 1
0
Raja Jegan RSQL Server DBA & Architect, EE Solution GuideCommented:
Small mistake:
;with cte as (
SELECT unique_id, category, [date], [transaction], ROW_NUMBER() over ( partition by category order by date) rnum
From ur_table) 
select t1.unique_id, t1.category, t1.[date], t1.[transaction]
from cte t1 left join cte t2 on t1.category = t2.category and t1.rnum = t2.rnum - 1
where datediff(dd, t1.[date], t2.[date]) <= 1 
and t1.[transaction] = 'A'
and t2.[transaction] = 'R'

Open in new window

0
Mark WillsTopic AdvisorCommented:
Well if they are consecutive rows, then doing any other "order by" will take them out of sequence... My understanding and you have reiterated is you want to compare row 1 with row 2 then row 2 with row 3 etc...

So, if using the row_number() function then it should be simply ordered by unique_id to keep the sequence. Assuming of course the "unique identity" is an identity column and not just a guid (unless added  using new sequential id).

What isnt clear is exactly what you want to extract.... You say category, but assume you want more than that....

There are two ways, but based on table size, we need to consider the impacts of that table size.

So, looking at CTE and Outer apply, and then a fairly straight subquery... The last one on a small data set seems to be most efficient. Of course it is based on the above assumptions, and also assumes there are indexes to suit (and we should probably talk about what indexes are available).





-- example 1 with Outer Apply
select a.*, b.*
from #tbl a
outer apply (
  select top 1 * from #tbl c
  where c.category = a.category 
  and c.date > a.date
  and c.unique_id > a.unique_id
  order by c.unique_id) b
where a.transactioncode = 'A' and b.transactioncode = 'R'
  and datediff(d, A.date, B.date) = 1

-- example 2 with CTE query 1

;with cte as (
SELECT unique_id, category, date, transactioncode, ROW_NUMBER() over (order by unique_id) rn
From #tbl) 
select t1.unique_id, t1.category, t2.unique_id
from cte t1 
left join cte t2 on t1.category = t2.category and t1.rn = t2.rn - 1
where datediff(dd, t1.[date], t2.[date]) = 1 
and t1.transactioncode = 'A'
and t2.transactioncode = 'R'

-- example 3 with simple subquery

select * from
(
select a.unique_id, a.category, (select min(unique_id) from #tbl c
             where c.category = a.category 
             and datediff(d, a.date, c.date) = 1
             and c.transactioncode = 'R'
             and c.unique_id > a.unique_id)  as next_unique_id
  
from #tbl a
where a.transactioncode = 'A'
) b 
where next_unique_id is not NULL

-- example 4 the above but as a CTE query

;with cte2 as (
select a.unique_id, a.category, (select min(unique_id) from #tbl c
             where c.category = a.category 
             and c.transactioncode = 'R'
             and datediff(d, a.date, c.date) = 1
             and c.unique_id > a.unique_id)  as next_unique_id
  
from #tbl a
where a.transactioncode = 'A'
) 
select * from cte2 where next_unique_id is not NULL

Open in new window

0
Mark WillsTopic AdvisorCommented:
ooops that example 1 above should have been :


select a.*, b.*
from #tbl a
outer apply (
select top 1 transactioncode,date from #tbl c
where c.category = a.category
and c.unique_id > a.unique_id
order by c.unique_id) b
where a.transactioncode = 'A'
and b.transactioncode = 'R'
and datediff(d, A.date, b.date) = 1
0
Mark WillsTopic AdvisorCommented:
And maybe that CTE query is best as :


;with cte2 as (
select a.unique_id, a.category, a.date, (select min(unique_id) from #tbl c where c.unique_id > a.unique_id) as next_unique_id
from #tbl a
where a.transactioncode = 'A'
)
select *
from cte2 a
inner join #tbl b on a.next_unique_id = b.unique_id and a.category = b.category
where datediff(d, a.date, b.date) = 1
and b.transactioncode = 'R'

Again, a lot of it depends in various indexes....
0
Mark WillsTopic AdvisorCommented:
OK, so got a bit bored and decided to add several hundred thousand rows to a table.

Not too sure of distribution of values over the range of categories, transactioncodes and dates, so made it pretty even throughout the range of data.

Then tried half a dozen different approaches, and analysed the data / table structures.

Assuming the unique identity was a clustered primary key (yeah big assumption), suggest creating the folowwing index :

CREATE NONCLUSTERED INDEX [idx_my_table_trans_cat_id] ON [dbo].[my_table]
(
      [transactioncode] ASC,
      [category] ASC,
      [unique_id] ASC
)
INCLUDE ( [date])
WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF)

and then the query :

;with cte2 as
(
  select a.unique_id, a.category, a.date,(select min(unique_id) from my_table c where c.unique_id > a.unique_id)  as next_unique_id
  from my_table a
  where a.transactioncode = 'A'
)
select *
from cte2 a
inner join my_table b on a.next_unique_id = b.unique_id and a.category = b.category and b.transactioncode = 'R'
where datediff(d, a.date, b.date) = 1


seems to work really well using the covering index without going back to raw data...

But then there are a few assumptions (as per prior posting as well), and really does need to be cleared up, though, pretty happy with the results given those assumptions thus far.
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
manivineetAuthor Commented:
thanks
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server 2008

From novice to tech pro — start learning today.