Solved

SQL Server find next transaction by date

Posted on 2013-01-07
26
414 Views
Last Modified: 2013-01-09
I have a transaction table as explained in the attached PDF. I need to find all receipts and then the corresponding shipments and report them on the same line.

Please see PDF for full details.

Find-Next-Trans.pdf
0
Comment
Question by:mossmis
  • 10
  • 9
  • 5
  • +1
26 Comments
 
LVL 41

Expert Comment

by:ralmada
ID: 38752097
To start with, how can we tell these are two different transactions?

SERIAL_NUMBER	TRANS_TYPE	ORG	TRANS_TO	TRANS_FROM	TRANSACTION_DATE
A1234	RECEIPT	IT	MAIN_WHSE	CUST123	12/17/2012
A1234	SHIPMENT	IT	CUST888	MAIN_WHSE	12/22/2012
A1234	RECEIPT	IT	MAIN_WHSE	CUST888	01/04/2013

Open in new window


You should have an additional column identifying them, something like

Tran_ID	SERIAL_NUMBER	TRANS_TYPE	ORG	TRANS_TO	TRANS_FROM	TRANSACTION_DATE
1	A1234	RECEIPT	IT	MAIN_WHSE	CUST123	12/17/2012
1	A1234	SHIPMENT	IT	CUST888	MAIN_WHSE	12/22/2012
2	A1234	RECEIPT	IT	MAIN_WHSE	CUST888	01/04/2013

Open in new window


That will make any query much easier.

Then we can start talking about indexes.
0
 
LVL 12

Expert Comment

by:Jared_S
ID: 38752180
This query pulls the result set you're after.

select isnull(r.serial_number, s.serial_number) as [Serial Number],
r.trans_from as [Received From], r.transaction_date as [Received Date],
s.trans_to as [Shipped To], s.transaction_date as [Ship Date]
from
(select serial_number,
trans_from ,
transaction_date
from sn_transaction where trans_type='receipt') r
full outer  join
(select serial_number,
trans_to,
transaction_date
from sn_transaction where trans_type='shipment') s
on r.serial_number = s.serial_number
and convert(datetime,r.transaction_date) <= convert(datetime,s.transaction_date)

I would put a clustered index on the transaction date and serial number.
0
 
LVL 41

Expert Comment

by:ralmada
ID: 38752396
>>This query pulls the result set you're after. <<

Hmmm, he has over 3 Million records, that query will be quite a dog.

Not to mention that he has a table design issue by not having an identifier that could couple the transactions.
0
 

Author Comment

by:mossmis
ID: 38752506
Jared, I will give your query a try shortly.

Ralmada, I do not have a unique identifier. I'm relying on the uniqueness of the serial number and date. The date is a date/time (to the millisecond). I am open to sugesstions here to improve on this also. Is there a way to have sql server automatically populate a new field in the system that is a unique identifier without me programming it in my applications?
0
 

Author Comment

by:mossmis
ID: 38752521
FYI,

I have two servers available to my disposal.

TEST: 2 Intel Xeon Quad Core CPUS @ 3ghz and 24gb ram

PROD: 4 Intel Xeon Six Cores @ 2.66ghz each and 64gb ram (32 usable (need to upgrade to enterprise OS))

I have resources, but I would like NOT to write code/apps carelessly just because I got the power to run queries.
0
 
LVL 41

Expert Comment

by:ralmada
ID: 38752544
>>Is there a way to have sql server automatically populate a new field in the system that is a unique identifier without me programming it in my applications?<<

Certainly it can be done. but it should be a one time thing. Going forward that identifier should be assigned from the application itself. Let me know if this is something your willing to consider.
0
 
LVL 41

Expert Comment

by:ralmada
ID: 38752570
for instance you could try


select
row_number() over (partition by Serial_Number, trans_to order by transaction_date) transid,
*
from sn_transaction

And see if this groups them correctly, then we can try convert that to an update.
0
 
LVL 12

Expert Comment

by:Jared_S
ID: 38752660
Aren't you worried that altering his table may cause issues with the application?

If the report is ran every 6 months, then why not limit the date range queried? Using the index suggested, I would expect the sql optimizer would perform an index seek instead of a scan, making the query quite a bit "cheaper".  

I still don't think it's going to melt faces, but I think it's a viable solution.
How would you write the query if an identity was added?
0
 
LVL 41

Expert Comment

by:ralmada
ID: 38752673
>>altering his table may cause issues with the application?<<

Not really.
0
 

Author Comment

by:mossmis
ID: 38752733
ralmada

What should I be seeing happen with your transid query?
0
 

Author Comment

by:mossmis
ID: 38752750
Jared

I'm still tinkering with your query, currently it isn't lining up anything in one line together. I'm trying to see if I messed up renaming column names.

I also currently have about 20 million lines so I'm trying to limit the query by a small date range to test.
0
 
LVL 12

Expert Comment

by:Jared_S
ID: 38752831
I just wrote it using the test data you posted, so the date range definitely will need limited with that many rows.

The query may perform better with Trans_Type included in the clustered index.
0
 
LVL 69

Expert Comment

by:ScottPletcher
ID: 38753081
The table itself could easily assign a unique, integer IDENTITY value to every column with NO change in the application.

But whether that's not done or not, what's best for this specific query (but not necessarily for the table as a whole) is a clustered index on ( TRANSACTION_DATE, SERIAL_NUMBER), as Jared_S noted earlier.

In general the table as a whole would process better with this type of data with that clustered index, but I'd review the existing SQL stats before making that change just to be sure.

If you decide to include the trans_type in the index, use a char(1) code ('R'|'S') rather than the full description.


Btw, NEVER compare dates this way:

convert(datetime,r.transaction_date) <= convert(datetime,s.transaction_date)

There is always a faster way to compare dates than CONVERT.
0
Control application downtime with dependency maps

Visualize the interdependencies between application components better with Applications Manager's automated application discovery and dependency mapping feature. Resolve performance issues faster by quickly isolating problematic components.

 
LVL 12

Expert Comment

by:Jared_S
ID: 38753342
Scott, am I way off in thinking that without knowing how this application is coded and/or how version updates are done, that making alterations to the table structure is somewhat of a loose-cannon approach?
I'm not suggesting a change would be needed in the application to create the identity column on the database, just that the new table structure COULD impact the application.
0
 
LVL 69

Expert Comment

by:ScottPletcher
ID: 38755187
I'm assuming that the application is written to include actual column names for all SQL statements, not just "*".  In that case, adding an identity should be transparent to the app, although there's no 100% guarantee they're not doing something sloppy somewhere where it causes an issue.

But as I noted above, the top priority should be to review the current indexes and determine if a clustered index of ( TRANSACTION_DATE, SERIAL_NUMBER ) is best, because that would do far more for this query than any identity column ever could.
0
 

Author Comment

by:mossmis
ID: 38755460
The indexing issue will get a little more complex as the table I'm using  actually has double the fields than I'm revealing. They just aren't a crucial part of what I'm first trying to accomplish. The other fields are descriptive to the serial number like make/model/type. Also the fields trans_to and trans_from actually refer to IDs that I need to lookup in other tables for the actual name and addresses.

Thanks for mentioning the indexing problem I will have. I did find a report that looks up data access against a database and make recommendations for indexing that is helpful, but I might ask you later once I have the whole picture ready.

http://indexanalysis.codeplex.com/
0
 

Author Comment

by:mossmis
ID: 38755470
Jared,

Please review the excel file I attached. I illustrated what your query is doing.
Query-Issue.xlsx
0
 
LVL 12

Expert Comment

by:Jared_S
ID: 38755832
The query doesn't work anymore because
multiple records in this data set validate for
"shipped serial number = received serial number" and "received date <= shipped date".

Going by the data in the samples, a windowing function like row_number or rank could be used, but would slow things down. It would also make the assumption that the shipment closest to the receipt date with a shared serial number is the appropriate match.

You mentioned there are other fields in the table. Is there anything else available that would help identify which shipment and receipt belong together?
0
 
LVL 12

Expert Comment

by:Jared_S
ID: 38755894
If you don't have additional fields that can be of any service, and the assumption that the shortest time span between a shipment and receipt of a serial_number indicates that they belong together, then here is a query that you can test for speed.

select isnull(r.serial_number, s.serial_number) as [Serial Number],
r.trans_from as [Received From], r.transaction_date as [Received Date],
s.trans_to as [Shipped To], s.transaction_date as [Ship Date]
from
(select 
row_number() over (partition by serial_number order by transaction_date) as trial, 
serial_number,
trans_from ,
transaction_date 
from sn_transaction 
where trans_type='receipt'
) r
full outer  join
(select 
row_number() over (partition by serial_number order by transaction_date) as trial,
serial_number,
trans_to,
transaction_date 
from sn_transaction
where trans_type='shipment'
) s
on r.serial_number = s.serial_number 
and r.trial = s.trial 

Open in new window

0
 

Author Comment

by:mossmis
ID: 38755916
Here is a full list of fields

Serial_Number (Serial number of Product)
Description1 (Legacy Field from older system - New Transactions don't use this)
Description2 (Legacy Field from older system - New Transactions don't use this)
Transaction_Date (SYSDATE time stamp at time of transaction)
Trans_Type (Trans type ie. Receipt, Shipment, INTERNAL - Shipments and receipts are only used in report)
ORG (We have 3 companies using same system so I separate data by this field for each company)
ITEM_STATE (Informative field describing current production state of product)
PC (computer doing transaction)
SYS_USER (system user doing transaction)
NOTES (User added notes if necessary on transaction)
TRANS_TO (HOLDs ID of customer/vendor or Internal Warehouse name)
TRANS_FROM (HOLDs ID of customer/vendor or Internal Warehouse name)
MANUFACTURER (manufacturer of product)
MODEL (model of Product)

Essentially, no other fields will help this report. And yes, the closest date in order should be matched up.
0
 
LVL 12

Expert Comment

by:Jared_S
ID: 38755987
Check out the query above that uses row_number...

Everybody has been talking about adding an identity column to the table to take some heavy lifting off of the sql engine.

An identity column just increments each row added to the table by 1 (unless you tell it to use another number).

Before you try using that, you'll want to make sure that it isn't going to affect the way your application runs (try it in a test environment if you have one).
You'll also want to think about the work flow - if shipment and receipt records for the same serial number weren't entered sequentially it could potentially give you bad data.
0
 

Author Comment

by:mossmis
ID: 38756319
Jared,

I have about 12 separate apps that read and write to the database of about 30 tables. This transaction table is by far the most important. I have gone through my programming and I am 99.9% positive that duplicate shipments or receipts can't be made. I  have a primary data table that holds current data for all serial numbers ever sold and that table has a primary key of serial number.

Adding a column to the transaction table shouldn't be an issue and the apps should function properly. If I have to program all my apps again to insert to a identity column, that could be a problem.

Your modified query worked well and only took a few minutes. I have one other addition to this: The report is run by a date parameters. So , I'm try to test the date range between 01-jan-13 and 08-jan-13. Any line with a shipment or receipt must show. Where can I add this in the code?

I tried at the end and it returned more than I wanted:

and ((r.transaction_date between '01-jan-13' and '08-jan-13') or (s.transaction_date between '01-jan-13' and '08-jan-13'))
0
 
LVL 12

Expert Comment

by:Jared_S
ID: 38756555
Move it up into the derived tables like this (i've got my dates formatted differently on my server):

declare @startdate as datetime
declare @enddate as datetime

set @startdate = '01/01/2013'
set @enddate = '01/08/2013'

select isnull(r.serial_number, s.serial_number) as [Serial Number],
r.trans_from as [Received From], r.transaction_date as [Received Date],
s.trans_to as [Shipped To], s.transaction_date as [Ship Date]
from
(select
row_number() over (partition by serial_number order by transaction_date) as trial,
serial_number,
trans_from ,
transaction_date
from sn_transaction
where trans_type='receipt'
and transaction_date between @startdate and @enddate
) r
full outer  join
(select
row_number() over (partition by serial_number order by transaction_date) as trial,
serial_number,
trans_to,
transaction_date
from sn_transaction
where trans_type='shipment'
and transaction_date between @startdate and @enddate
) s
on r.serial_number = s.serial_number
and r.trial = s.trial
0
 

Author Comment

by:mossmis
ID: 38758705
I did try it there but this is what happens. Say I had the following data

SN             TRANS_TYPE       TRANS_TO         TRANS_FROM     TRANSACTION_DATE
A1234      RECEIPT              MAIN_WHSE         VENDOR2             12/21/2012
A1234      SHIPMENT       VENDOR6         MAIN_WHSE             1/5/2013

What I want to get is this

SN              Received From         RECEIVED          Shipped To        Ship Date
A1234        VENDOR2                  12/21/12           VENDOR6          1/5/13


What the query does is this
SN              Received From         RECEIVED          Shipped To        Ship Date
A1234        (null)                                                   VENDOR6          1/5/13


Because the parameter date doesn't fall within the date I provided, it nulls it out. Whether Date Shipped OR Received falls within date patameter, my goal is to return the full line.
0
 
LVL 12

Accepted Solution

by:
Jared_S earned 500 total points
ID: 38758850
Can you run this and give me an example of the extra records you're getting?


declare @startdate as datetime
declare @enddate as datetime

set @startdate = '01/01/2013'
set @enddate = '01/08/2013'

select isnull(r.serial_number, s.serial_number) as [Serial Number],
s.trial, r.trial,
r.trans_from as [Received From], r.transaction_date as [Received Date],
s.trans_to as [Shipped To], s.transaction_date as [Ship Date]
from
(select
row_number() over (partition by serial_number order by transaction_date) as trial,
serial_number,
trans_from ,
transaction_date
from sn_transaction
where trans_type='receipt'
) r
full outer  join
(select
row_number() over (partition by serial_number order by transaction_date) as trial,
serial_number,
trans_to,
transaction_date
from sn_transaction
where trans_type='shipment'
) s
on r.serial_number = s.serial_number
and r.trial = s.trial
where
r.transaction_date between @startdate and @enddate
or
s.transaction_date between @startdate and @enddate
0
 

Author Comment

by:mossmis
ID: 38759066
For some reason mine was returning all rows in the table associated with my ORG, 1.5 million when I'm only supposed to have 16,000 rows.

I didn't have the outer join you put in. what you provided above works rather well and I think  you nailed it.
0

Featured Post

How to improve team productivity

Quip adds documents, spreadsheets, and tasklists to your Slack experience
- Elevate ideas to Quip docs
- Share Quip docs in Slack
- Get notified of changes to your docs
- Available on iOS/Android/Desktop/Web
- Online/Offline

Join & Write a Comment

Suggested Solutions

Hi, I have heard from my friends that it’s not possible to create Label Printing report using SSRS. I am amazed after hearing this words not possible in SSRS. I googled lot and found that it is possible to some of people know about the Report Bui…
Ever needed a SQL 2008 Database replicated/mirrored/log shipped on another server but you can't take the downtime inflicted by initial snapshot or disconnect while T-logs are restored or mirror applied? You can use SQL Server Initialize from Backup…
Here's a very brief overview of the methods PRTG Network Monitor (https://www.paessler.com/prtg) offers for monitoring bandwidth, to help you decide which methods you´d like to investigate in more detail.  The methods are covered in more detail in o…
This video gives you a great overview about bandwidth monitoring with SNMP and WMI with our network monitoring solution PRTG Network Monitor (https://www.paessler.com/prtg). If you're looking for how to monitor bandwidth using netflow or packet s…

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

26 Experts available now in Live!

Get 1:1 Help Now