Solved

tSQL reporting problem (grouping vs detailed)

Posted on 2008-06-10
26
353 Views
Last Modified: 2008-06-25
I have 2 reports that I run on a daily basis.  One is a detailed report of 'self trades'.  The other is a summation of the same 'self trades'.  so if I have 100 self trades for this trader on this endpoint and symbol in the detailed, I would have one record in the summation with a total selftrade count of 100.  The detailed rpt runs, loading current day to working tables, then the summation runs off the same working tables.  It has come to my attention that the detailed and the summation are not equating.  
Here is a specific example:

from the summation:
endpoint/trader/symbol/totalvol/totalTrades/SelfTradeVol/TotalSelfTrade/SelfTradeVolRatio/SelfTradeRatio
CBOT   abc        ZTU8      5738     1782        99      50       0.017253398     0.028058361
CBOT   def        ZTU8       6216     2166       58       38       0.009330759     0.01754386

yet in the detailed report, there are 19 trades for trader abc with a volume of 63, and 15 trades for the trader def, with a volume of 32.  The two together, a total volume of 95.  see the selftrade count and volume for each id --  
the summation:
         count      volume
abc     50           99
def     38           58

the detailed
         count      volume
abc      19          63
def       15          32

- my summation says i have 50 trades by abc on that symbol and endpoint.   my detailed has only 19.
- my summation says i have 38 trades by def on that symbol and endpoint.    my detailed has only 15.

this is my detailed logic:
CREATE proc detailedProc
AS
SET NOCOUNT ON

DELETE workingDB.dbo.epSelfTrade
INSERT INTO workingDB.dbo.epSelfTrade SELECT DISTINCT EndPoint FROM SERVER1.database1.dbo.EPGroups WHERE EndPoint NOT IN (
'....a list of endpoints we ignore for this report....')

delete workingDB.dbo.SelfTrades
/* only non-spread trades */
insert workingDB.dbo.SelfTrades
select trader,acct,prevorderno,endpoint,symbol,side,quantity,price,tradetime,clearingfee,lflag,client, orderno,origorderno,basketid,basketnumber,quantity,executedquantity,leavesquantity,ordertype, limitprice,avgfillprice,orderstatus,stopprice,timeinforce,cntrpartyID,msgID,hostname,ordermarking,
inserttime,source
from database.dbo.ordertable a with (nolock)
WHERE tradetime >= LEFT(GETDATE(),11)
AND EXISTS (SELECT 1 FROM workingDB.dbo.epSelfTrade WHERE endpoint = a.endpoint)
AND symbol NOT LIKE '%-%'
AND NOT EXISTS (select 1 orderno from database.dbo.ordertable b with (nolock) where a.origorderno = b.origorderno
AND tradetime >= LEFT(GETDATE(),11)
AND EXISTS (SELECT 1 FROM workingDB.dbo.epSelfTrade WHERE endpoint = b.endpoint)
AND symbol LIKE '%-%'   )
/* only spread trades */
insert workingDB.dbo.SelfTrades
SELECT trader,acct,prevorderno,endpoint,symbol,side,quantity,price,tradetime,clearingfee,lflag,client, orderno,origorderno,basketid,basketnumber,quantity,executedquantity,leavesquantity,ordertype, limitprice,avgfillprice,orderstatus,stopprice,timeinforce,cntrpartyID,msgID,hostname,ordermarking,
inserttime,source
FROM database.dbo.ordertable t1 WITH (NOLOCK)
WHERE tradetime >= LEFT(GETDATE(),11)
AND EXISTS (SELECT 1 FROM workingDB.dbo.epSelfTrade WHERE endpoint = t1.endpoint)
AND symbol LIKE '%-%'

SELECT DISTINCT b.endpoint,b.trader AS Btrader, b.acct AS BAcctID,b.Client AS BClientID, b.OrigOrderNo AS BOrigOrderNo,s.trader AS Strader,s.acct AS SAcctID,s.Client AS SClientID, s.OrigOrderNo AS SOrigOrderNo,b.Symbol, b.Quantity, b.Price,
CONVERT(VARCHAR(10),b.tradetime,101)+' '+SUBSTRING(CONVERT(VARCHAR(20),b.tradetime,100),13,7) AS tradetime
FROM workingDB.dbo.SelfTrades b WITH (NOLOCK)
INNER JOIN workingDB.dbo.SelfTrades s WITH (NOLOCK)
  ON s.endpoint = b.endpoint
  AND s.Quantity = b.Quantity
  AND s.Price = b.Price
  AND s.tradetime = b.tradetime
  AND s.side = 'S'
  AND b.side = 'B'
ORDER BY EndPoint,Btrader,b.Symbol

SET NOCOUNT OFF
GO

--summation procedure
CREATE proc dbo.summationProc
AS

SET NOCOUNT ON

SELECT q1.EndPoint,q1.Trader,q1.Symbol,q1.TotalVol,q1.TotalTrades,q2.SelfTradeVol,q2.TotalSelfTrades,
CAST(q2.SelfTradeVol AS DECIMAL(15,3))/CAST(q1.TotalVol AS DECIMAL(15,3)) AS SelfTradeVolRatio,
CAST(q2.TotalSelfTrades AS DECIMAL(15,3))/CAST(q1.TotalTrades AS DECIMAL(15,3)) AS TotalSelfTradeRatio

FROM (
  SELECT a.EndPoint,a.Trader,a.Symbol,SUM(quantity) AS TotalVol,COUNT(*) AS TotalTrades
  FROM database.dbo.ordertable a WITH (NOLOCK)
  WHERE tradetime >= LEFT(GETDATE(),11)
  AND EXISTS (SELECT 1 FROM workingDB.dbo.epSelfTrade WHERE endpoint = a.EndPoint)
  GROUP BY a.EndPoint,a.Trader,a.Symbol )  AS q1
  INNER JOIN
  (SELECT b.EndPoint,b.Trader,b.Symbol,SUM(b.quantity)AS SelfTradeVol,COUNT(*) AS TotalSelfTrades
  FROM workingDB.dbo.SelfTrades b WITH (NOLOCK)
  INNER JOIN workingDB.dbo.SelfTrades s WITH (NOLOCK)
        ON s.EndPoint = b.EndPoint
      AND s.Quantity = b.Quantity
        AND s.Price = b.Price
        AND s.tradetime = b.tradetime
        AND s.Side = 'S'
      AND b.Side = 'B'
   GROUP BY b.EndPoint,b.Trader,b.Symbol ) AS q2
   ON q1.Trader = q2.Trader
   AND q1.symbol = q2.symbol
   ORDER BY q1.EndPoint,q1.Trader,q1.Symbol

SET NOCOUNT OFF
GO



I know it's kind of lengthy.  I apologize for that.  Does anybody see what may be causing this variance?
0
Comment
Question by:dbaSQL
  • 15
  • 11
26 Comments
 
LVL 17

Author Comment

by:dbaSQL
ID: 21751091
Any input at all is very much appreciated.
0
 
LVL 8

Expert Comment

by:srnar
ID: 21751758
Yeah it is a bit lengthy. Can you publish some create tables and insert statements to run your procedures? How are you comparing two results?
0
 
LVL 17

Author Comment

by:dbaSQL
ID: 21752188
the reports are sent out via email, as .csv's.  the comparison is simply walking thru one, then looking at the other to ensure it all adds up.  i've got both reports in front of me now, the example i gave is legit.  it doesn't add up.
0
 
LVL 8

Expert Comment

by:srnar
ID: 21753753
Filter on SelfTrades table is the same in both cases - so it is not the problem.

The detailed SQL is using distinct clause. Why? Can you run the query without DISTINCT?
0
 
LVL 17

Author Comment

by:dbaSQL
ID: 21754198
yes, let me try this
0
 
LVL 17

Author Comment

by:dbaSQL
ID: 21754439
well, without the distinct, i may be getting the data, BUT now i'm getting a lot of dupes in the detailed report, so it's hard to tell.  

for example, this one is in there 12 times:
EndPoint/BTrader/BAcctID/BClientID/BOrigOrderNo/STrader/SAcctID/SClientID/SOrigOrderNo/Symbol/Quantity/Price/tradeTime

CBOT abc  8135  zt-01 05Mtgprap  abc 8135  zt-01  05Mtgprat  ZTU8  1  105.64  06/05/2008 10:24AM

this one is in there twice:
CBOT  jkl  8135  zn-02  05Mtgv2js  jkl      8135  zn-03      05Mth8ykt  ZNU8  1  113.17  06/05/2008 11:01AM

that's not going to work

any thoughts what may be doing this?  no changes other than commenting out DISTINCT

0
 
LVL 8

Expert Comment

by:srnar
ID: 21754653
You hit the problem. Your summary query does "group by" and publishes aggregation of details. I assume you need another aggregation but on the more detailed level then your summary.

You need to rewrite the detailed query to have sum or avg on quantity, price and group by on endPoint, Btrader etc.
0
 
LVL 17

Author Comment

by:dbaSQL
ID: 21758624
I'm not sure I follow what you are suggesting, srnar.
0
 
LVL 8

Expert Comment

by:srnar
ID: 21758693
Sorry for confusing. I'm not able to help you more with the problem without looking into data. I try to find some more time and generate some fake data.

Otherwise we are only spending our time.
0
 
LVL 17

Author Comment

by:dbaSQL
ID: 21758703
not confusing, really, i'm just not certain what you are suggesting.  i'd be happy to show you some data, i'm just not certain how to do so.  i don't think a cut/paste of a ton of records (detailed/summary) will go well here.
0
 
LVL 8

Expert Comment

by:srnar
ID: 21759160
3 things will be welcome:

1. create script of SelfTrades table -  can be scripted from Management studio
2. create script of ordertable -  can be also scripted from Management studio
3. select top 100 * from ordertable - can be exported into Excel via clipboard from Management studio
0
 
LVL 17

Author Comment

by:dbaSQL
ID: 21759201
yes, srnar, i know how to script the tables.  and, i fully intended to do so, along with a capture of both the detailed data and summary data, on a specific trader/endpoint/symbol.  thus, we've got one example to work with in both detailed and summary

but.  i'd like to avoid posting it up here, if possible.  

i can genericize the table def, and attach those, but what of the data?
0
 
LVL 8

Expert Comment

by:srnar
ID: 21759410
Sorry to underestimate you, dbaSQL. It is 100 records - you can change the sensitive data (names, quantities) to stay anonymous - can be done in 10 minutes. You can send it to my mail srnar at volny.cz
0
What Should I Do With This Threat Intelligence?

Are you wondering if you actually need threat intelligence? The answer is yes. We explain the basics for creating useful threat intelligence.

 
LVL 17

Author Comment

by:dbaSQL
ID: 21759771
oh no, you didn't underestimate.  i just wanted you to know i was pretty comfy w/the table def and data and alll, i just didn't want to post up here.  doing this now.  
0
 
LVL 8

Accepted Solution

by:
srnar earned 500 total points
ID: 21786979
If you want to compare Quantity and Trade count from detail to summary you have to replace DISTINCT with aggregations - see 2 new columns newQuantity, newTrades and the long GROUP BY member set (GROUP BY b.endpoint,b.trader, b.acct,b.Client, b.OrigOrderNo,s.trader,s.acct,s.Client, s.OrigOrderNo,b.Symbol, b.Price, b.tradetime).

I hope it will be running - your samples were still far from perfect - so I had to heavily change a lot of code and data. Described change will give you when comparing same numbers - but I do not know whether they are correct - the ordertable has no natural key. Also joining on quantity or price columns is not good practice.

Also usage of identity and another column like a primary key is a bad design pattern - see this constraint:

      CONSTRAINT [clidx_OrderTable_sequencenumber_timeofexec] UNIQUE  CLUSTERED
      (
            [SequenceNumber],
            [TradeTime])

Yet I hope our effort was not useless. Enjoy.
SELECT b.endpoint,b.trader AS Btrader, b.acct AS BAcctID,b.Client AS BClientID, b.OrigOrderNo AS BOrigOrderNo,s.trader AS Strader,s.acct AS SAcctID,s.Client AS SClientID, s.OrigOrderNo AS SOrigOrderNo,b.Symbol, b.Price,

CONVERT(VARCHAR(10),b.tradetime,101)+' '+SUBSTRING(CONVERT(VARCHAR(20),b.tradetime,100),13,7) AS tradetime,

--new

SUM(b.Quantity) as newQuantity,

COUNT(*) AS newTrades

FROM workingDB.dbo.SelfTrades b WITH (NOLOCK)

INNER JOIN workingDB.dbo.SelfTrades s WITH (NOLOCK)

  ON s.endpoint = b.endpoint

  AND s.Quantity = b.Quantity

  AND s.Price = b.Price

  AND s.tradetime = b.tradetime

  AND s.side = 'S'

  AND b.side = 'B'

--new  

GROUP BY b.endpoint,b.trader, b.acct,b.Client, b.OrigOrderNo,s.trader,s.acct,s.Client, s.OrigOrderNo,b.Symbol, b.Price,

b.tradetime

--existing

ORDER BY EndPoint,Btrader,b.Symbol

Open in new window

0
 
LVL 17

Author Comment

by:dbaSQL
ID: 21788448
>>Also joining on quantity or price columns is not good practice.

This particular report is of 'selftrades', which are not the norm, or standard, trading practice.  It must be joined on quantity and price in order to identify the trades we are watching for.
0
 
LVL 17

Author Comment

by:dbaSQL
ID: 21804035
still having problems w/this.  are you avail?
0
 
LVL 8

Expert Comment

by:srnar
ID: 21809924
Not returning good data?
0
 
LVL 17

Author Comment

by:dbaSQL
ID: 21811645
well, i'm not sure.  they are questioning this:
SUM(b.Quantity) as newQuantity,
COUNT(*) AS newTrades

and as i go in to check the data, i've got to question it, too. given that it is in the detailed report, not the summation, is this actually just a count of the selftrades?  
0
 
LVL 8

Expert Comment

by:srnar
ID: 21811705
Yes -  newTrades = count of trades that have this same values:

b.endpoint,b.trader, b.acct,b.Client, b.OrigOrderNo,s.trader,s.acct,s.Client, s.OrigOrderNo,b.Symbol, b.Price, b.tradetime

which is more detailed than set (b.EndPoint,b.Trader,b.Symbol) used in your sum proc
0
 
LVL 17

Author Comment

by:dbaSQL
ID: 21811754
>>which is more detailed than set (b.EndPoint,b.Trader,b.Symbol) used in your sum proc
can you elaborate for me, please, srnar?
0
 
LVL 8

Expert Comment

by:srnar
ID: 21811903
see 5th row of the following statement

....SELECT a.EndPoint,a.Trader,a.Symbol,SUM(quantity) AS TotalVol,COUNT(*) AS TotalTrades
  FROM database.dbo.ordertable a WITH (NOLOCK)
  WHERE tradetime >= LEFT(GETDATE(),11)
  AND EXISTS (SELECT 1 FROM workingDB.dbo.epSelfTrade WHERE endpoint = a.EndPoint)
  GROUP BY a.EndPoint,a.Trader,a.Symbol )  AS q1
  INNER JOIN
  (SELECT b.EndPoint,b.Trader,b.Symbol,SUM(b.quantity)AS SelfTradeVol,COUNT(*) AS TotalSelfTrades
  FROM workingDB.dbo.SelfTrades b WITH (NOLOCK)
  INNER JOIN workingDB.dbo.SelfTrades s WITH (NOLOCK)
        ON s.EndPoint = b.EndPoint
      AND s.Quantity = b.Quantity
        AND s.Price = b.Price
        AND s.tradetime = b.tradetime
        AND s.Side = 'S'
      AND b.Side = 'B'
   GROUP BY b.EndPoint,b.Trader,b.Symbol ) AS q2
   ON q1.Trader = q2.Trader
   AND q1.symbol = q2.symbol
   ORDER BY q1.EndPoint,q1.Trader,q1.Symbol
0
 
LVL 17

Author Comment

by:dbaSQL
ID: 21812275
yes, i can see the differences between the two, i'm just looking for a better understanding of the change
0
 
LVL 17

Author Comment

by:dbaSQL
ID: 21865050
I still don't have a thorough understanding of the change.  Please advise if you are able.
0
 
LVL 8

Expert Comment

by:srnar
ID: 21865316
dbaSQL, simply to avoid duplicity you have to group rows on your detail report - values than can be aggregated (used SUM, COUNT function) -

SUM(b.Quantity) as newQuantity,
COUNT(*) AS newTrades

for more you can visit:
http://msdn.microsoft.com/en-us/library/ms173454.aspx
0
 
LVL 17

Author Comment

by:dbaSQL
ID: 21865418
Thank you.
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

Load balancing is the method of dividing the total amount of work performed by one computer between two or more computers. Its aim is to get more work done in the same amount of time, ensuring that all the users get served faster.
Ever needed a SQL 2008 Database replicated/mirrored/log shipped on another server but you can't take the downtime inflicted by initial snapshot or disconnect while T-logs are restored or mirror applied? You can use SQL Server Initialize from Backup…
Familiarize people with the process of utilizing SQL Server functions from within Microsoft Access. Microsoft Access is a very powerful client/server development tool. One of the SQL Server objects that you can interact with from within Microsoft Ac…
Viewers will learn how to use the UPDATE and DELETE statements to change or remove existing data from their tables. Make a table: Update a specific column given a specific row using the UPDATE statement: Remove a set of values using the DELETE s…

757 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

21 Experts available now in Live!

Get 1:1 Help Now