SQL Server find next transaction by date

I have a transaction table as explained in the attached PDF. I need to find all receipts and then the corresponding shipments and report them on the same line.

Please see PDF for full details.

Find-Next-Trans.pdf
mossmisAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

ralmadaCommented:
To start with, how can we tell these are two different transactions?

SERIAL_NUMBER	TRANS_TYPE	ORG	TRANS_TO	TRANS_FROM	TRANSACTION_DATE
A1234	RECEIPT	IT	MAIN_WHSE	CUST123	12/17/2012
A1234	SHIPMENT	IT	CUST888	MAIN_WHSE	12/22/2012
A1234	RECEIPT	IT	MAIN_WHSE	CUST888	01/04/2013

Open in new window


You should have an additional column identifying them, something like

Tran_ID	SERIAL_NUMBER	TRANS_TYPE	ORG	TRANS_TO	TRANS_FROM	TRANSACTION_DATE
1	A1234	RECEIPT	IT	MAIN_WHSE	CUST123	12/17/2012
1	A1234	SHIPMENT	IT	CUST888	MAIN_WHSE	12/22/2012
2	A1234	RECEIPT	IT	MAIN_WHSE	CUST888	01/04/2013

Open in new window


That will make any query much easier.

Then we can start talking about indexes.
0
Jared_SCommented:
This query pulls the result set you're after.

select isnull(r.serial_number, s.serial_number) as [Serial Number],
r.trans_from as [Received From], r.transaction_date as [Received Date],
s.trans_to as [Shipped To], s.transaction_date as [Ship Date]
from
(select serial_number,
trans_from ,
transaction_date
from sn_transaction where trans_type='receipt') r
full outer  join
(select serial_number,
trans_to,
transaction_date
from sn_transaction where trans_type='shipment') s
on r.serial_number = s.serial_number
and convert(datetime,r.transaction_date) <= convert(datetime,s.transaction_date)

I would put a clustered index on the transaction date and serial number.
0
ralmadaCommented:
>>This query pulls the result set you're after. <<

Hmmm, he has over 3 Million records, that query will be quite a dog.

Not to mention that he has a table design issue by not having an identifier that could couple the transactions.
0
Cloud Class® Course: CompTIA Healthcare IT Tech

This course will help prep you to earn the CompTIA Healthcare IT Technician certification showing that you have the knowledge and skills needed to succeed in installing, managing, and troubleshooting IT systems in medical and clinical settings.

mossmisAuthor Commented:
Jared, I will give your query a try shortly.

Ralmada, I do not have a unique identifier. I'm relying on the uniqueness of the serial number and date. The date is a date/time (to the millisecond). I am open to sugesstions here to improve on this also. Is there a way to have sql server automatically populate a new field in the system that is a unique identifier without me programming it in my applications?
0
mossmisAuthor Commented:
FYI,

I have two servers available to my disposal.

TEST: 2 Intel Xeon Quad Core CPUS @ 3ghz and 24gb ram

PROD: 4 Intel Xeon Six Cores @ 2.66ghz each and 64gb ram (32 usable (need to upgrade to enterprise OS))

I have resources, but I would like NOT to write code/apps carelessly just because I got the power to run queries.
0
ralmadaCommented:
>>Is there a way to have sql server automatically populate a new field in the system that is a unique identifier without me programming it in my applications?<<

Certainly it can be done. but it should be a one time thing. Going forward that identifier should be assigned from the application itself. Let me know if this is something your willing to consider.
0
ralmadaCommented:
for instance you could try


select
row_number() over (partition by Serial_Number, trans_to order by transaction_date) transid,
*
from sn_transaction

And see if this groups them correctly, then we can try convert that to an update.
0
Jared_SCommented:
Aren't you worried that altering his table may cause issues with the application?

If the report is ran every 6 months, then why not limit the date range queried? Using the index suggested, I would expect the sql optimizer would perform an index seek instead of a scan, making the query quite a bit "cheaper".  

I still don't think it's going to melt faces, but I think it's a viable solution.
How would you write the query if an identity was added?
0
ralmadaCommented:
>>altering his table may cause issues with the application?<<

Not really.
0
mossmisAuthor Commented:
ralmada

What should I be seeing happen with your transid query?
0
mossmisAuthor Commented:
Jared

I'm still tinkering with your query, currently it isn't lining up anything in one line together. I'm trying to see if I messed up renaming column names.

I also currently have about 20 million lines so I'm trying to limit the query by a small date range to test.
0
Jared_SCommented:
I just wrote it using the test data you posted, so the date range definitely will need limited with that many rows.

The query may perform better with Trans_Type included in the clustered index.
0
Scott PletcherSenior DBACommented:
The table itself could easily assign a unique, integer IDENTITY value to every column with NO change in the application.

But whether that's not done or not, what's best for this specific query (but not necessarily for the table as a whole) is a clustered index on ( TRANSACTION_DATE, SERIAL_NUMBER), as Jared_S noted earlier.

In general the table as a whole would process better with this type of data with that clustered index, but I'd review the existing SQL stats before making that change just to be sure.

If you decide to include the trans_type in the index, use a char(1) code ('R'|'S') rather than the full description.


Btw, NEVER compare dates this way:

convert(datetime,r.transaction_date) <= convert(datetime,s.transaction_date)

There is always a faster way to compare dates than CONVERT.
0
Jared_SCommented:
Scott, am I way off in thinking that without knowing how this application is coded and/or how version updates are done, that making alterations to the table structure is somewhat of a loose-cannon approach?
I'm not suggesting a change would be needed in the application to create the identity column on the database, just that the new table structure COULD impact the application.
0
Scott PletcherSenior DBACommented:
I'm assuming that the application is written to include actual column names for all SQL statements, not just "*".  In that case, adding an identity should be transparent to the app, although there's no 100% guarantee they're not doing something sloppy somewhere where it causes an issue.

But as I noted above, the top priority should be to review the current indexes and determine if a clustered index of ( TRANSACTION_DATE, SERIAL_NUMBER ) is best, because that would do far more for this query than any identity column ever could.
0
mossmisAuthor Commented:
The indexing issue will get a little more complex as the table I'm using  actually has double the fields than I'm revealing. They just aren't a crucial part of what I'm first trying to accomplish. The other fields are descriptive to the serial number like make/model/type. Also the fields trans_to and trans_from actually refer to IDs that I need to lookup in other tables for the actual name and addresses.

Thanks for mentioning the indexing problem I will have. I did find a report that looks up data access against a database and make recommendations for indexing that is helpful, but I might ask you later once I have the whole picture ready.

http://indexanalysis.codeplex.com/
0
mossmisAuthor Commented:
Jared,

Please review the excel file I attached. I illustrated what your query is doing.
Query-Issue.xlsx
0
Jared_SCommented:
The query doesn't work anymore because
multiple records in this data set validate for
"shipped serial number = received serial number" and "received date <= shipped date".

Going by the data in the samples, a windowing function like row_number or rank could be used, but would slow things down. It would also make the assumption that the shipment closest to the receipt date with a shared serial number is the appropriate match.

You mentioned there are other fields in the table. Is there anything else available that would help identify which shipment and receipt belong together?
0
Jared_SCommented:
If you don't have additional fields that can be of any service, and the assumption that the shortest time span between a shipment and receipt of a serial_number indicates that they belong together, then here is a query that you can test for speed.

select isnull(r.serial_number, s.serial_number) as [Serial Number],
r.trans_from as [Received From], r.transaction_date as [Received Date],
s.trans_to as [Shipped To], s.transaction_date as [Ship Date]
from
(select 
row_number() over (partition by serial_number order by transaction_date) as trial, 
serial_number,
trans_from ,
transaction_date 
from sn_transaction 
where trans_type='receipt'
) r
full outer  join
(select 
row_number() over (partition by serial_number order by transaction_date) as trial,
serial_number,
trans_to,
transaction_date 
from sn_transaction
where trans_type='shipment'
) s
on r.serial_number = s.serial_number 
and r.trial = s.trial 

Open in new window

0
mossmisAuthor Commented:
Here is a full list of fields

Serial_Number (Serial number of Product)
Description1 (Legacy Field from older system - New Transactions don't use this)
Description2 (Legacy Field from older system - New Transactions don't use this)
Transaction_Date (SYSDATE time stamp at time of transaction)
Trans_Type (Trans type ie. Receipt, Shipment, INTERNAL - Shipments and receipts are only used in report)
ORG (We have 3 companies using same system so I separate data by this field for each company)
ITEM_STATE (Informative field describing current production state of product)
PC (computer doing transaction)
SYS_USER (system user doing transaction)
NOTES (User added notes if necessary on transaction)
TRANS_TO (HOLDs ID of customer/vendor or Internal Warehouse name)
TRANS_FROM (HOLDs ID of customer/vendor or Internal Warehouse name)
MANUFACTURER (manufacturer of product)
MODEL (model of Product)

Essentially, no other fields will help this report. And yes, the closest date in order should be matched up.
0
Jared_SCommented:
Check out the query above that uses row_number...

Everybody has been talking about adding an identity column to the table to take some heavy lifting off of the sql engine.

An identity column just increments each row added to the table by 1 (unless you tell it to use another number).

Before you try using that, you'll want to make sure that it isn't going to affect the way your application runs (try it in a test environment if you have one).
You'll also want to think about the work flow - if shipment and receipt records for the same serial number weren't entered sequentially it could potentially give you bad data.
0
mossmisAuthor Commented:
Jared,

I have about 12 separate apps that read and write to the database of about 30 tables. This transaction table is by far the most important. I have gone through my programming and I am 99.9% positive that duplicate shipments or receipts can't be made. I  have a primary data table that holds current data for all serial numbers ever sold and that table has a primary key of serial number.

Adding a column to the transaction table shouldn't be an issue and the apps should function properly. If I have to program all my apps again to insert to a identity column, that could be a problem.

Your modified query worked well and only took a few minutes. I have one other addition to this: The report is run by a date parameters. So , I'm try to test the date range between 01-jan-13 and 08-jan-13. Any line with a shipment or receipt must show. Where can I add this in the code?

I tried at the end and it returned more than I wanted:

and ((r.transaction_date between '01-jan-13' and '08-jan-13') or (s.transaction_date between '01-jan-13' and '08-jan-13'))
0
Jared_SCommented:
Move it up into the derived tables like this (i've got my dates formatted differently on my server):

declare @startdate as datetime
declare @enddate as datetime

set @startdate = '01/01/2013'
set @enddate = '01/08/2013'

select isnull(r.serial_number, s.serial_number) as [Serial Number],
r.trans_from as [Received From], r.transaction_date as [Received Date],
s.trans_to as [Shipped To], s.transaction_date as [Ship Date]
from
(select
row_number() over (partition by serial_number order by transaction_date) as trial,
serial_number,
trans_from ,
transaction_date
from sn_transaction
where trans_type='receipt'
and transaction_date between @startdate and @enddate
) r
full outer  join
(select
row_number() over (partition by serial_number order by transaction_date) as trial,
serial_number,
trans_to,
transaction_date
from sn_transaction
where trans_type='shipment'
and transaction_date between @startdate and @enddate
) s
on r.serial_number = s.serial_number
and r.trial = s.trial
0
mossmisAuthor Commented:
I did try it there but this is what happens. Say I had the following data

SN             TRANS_TYPE       TRANS_TO         TRANS_FROM     TRANSACTION_DATE
A1234      RECEIPT              MAIN_WHSE         VENDOR2             12/21/2012
A1234      SHIPMENT       VENDOR6         MAIN_WHSE             1/5/2013

What I want to get is this

SN              Received From         RECEIVED          Shipped To        Ship Date
A1234        VENDOR2                  12/21/12           VENDOR6          1/5/13


What the query does is this
SN              Received From         RECEIVED          Shipped To        Ship Date
A1234        (null)                                                   VENDOR6          1/5/13


Because the parameter date doesn't fall within the date I provided, it nulls it out. Whether Date Shipped OR Received falls within date patameter, my goal is to return the full line.
0
Jared_SCommented:
Can you run this and give me an example of the extra records you're getting?


declare @startdate as datetime
declare @enddate as datetime

set @startdate = '01/01/2013'
set @enddate = '01/08/2013'

select isnull(r.serial_number, s.serial_number) as [Serial Number],
s.trial, r.trial,
r.trans_from as [Received From], r.transaction_date as [Received Date],
s.trans_to as [Shipped To], s.transaction_date as [Ship Date]
from
(select
row_number() over (partition by serial_number order by transaction_date) as trial,
serial_number,
trans_from ,
transaction_date
from sn_transaction
where trans_type='receipt'
) r
full outer  join
(select
row_number() over (partition by serial_number order by transaction_date) as trial,
serial_number,
trans_to,
transaction_date
from sn_transaction
where trans_type='shipment'
) s
on r.serial_number = s.serial_number
and r.trial = s.trial
where
r.transaction_date between @startdate and @enddate
or
s.transaction_date between @startdate and @enddate
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
mossmisAuthor Commented:
For some reason mine was returning all rows in the table associated with my ORG, 1.5 million when I'm only supposed to have 16,000 rows.

I didn't have the outer join you put in. what you provided above works rather well and I think  you nailed it.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
SSRS

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.