So I am fairly new to DB design but have an application that I am working on which I would like to collect opinions about the DB design for.
I will attempt to simplify the description to the relevant parts but if you want more info, just ask.
The application stores information about poker hands. The current architecture includes tables for:
- Player: a small table (5 columns) with name, casinoID, etc
- Game: a small table (23 columns) with general game info (e.g. date, type of game, etc)
- GameTable: a small table (7 columns) with info about the table (e.g. limit being played, etc)
For each game, there will be 1-10 players who each sit in a seat:
- Seat: a small table (7) with non-calculated info about their seat (e.g. position, how much $ they started with, etc)
Now my application does a lot of work to perform calculations about the play of the hand. This data is currently stored in a separate table called:
SeatCalc: a VERY large table (~175 columns. Most are int or decimal).
All of these tables contain an ID (int) which is a primary key, IDENTITY and a clustered index on this field.
A typical user query is to find all of the seats/seatcalcs for a given player and then group the data using SUMs. Her is an actual query:
SELECT dbo.GameTable.MinValue AS Min,dbo.GameTable.MaxValue AS Max,dbo.GameTable.GameType AS GameType,Count_Big(*) AS KnownHands,SUM(dbo.SeatCalc.Summary_WonHand) AS WonHand,SUM(dbo.SeatCalc.Pre_ChoseToEnter) AS Pre_ChoseToEnter,SUM(dbo.SeatCalc.Summary_WonHand_and_SawFlop) AS WonHandAndSawFlop,SUM(dbo.SeatCalc.Summary_Net) AS Net,SUM(dbo.SeatCalc.Summary_NetAsBB) AS NetAsBB,SUM(dbo.SeatCalc.Flop_SawFlop) AS SawFlop,SUM(dbo.SeatCalc.River_WentToShowdown) AS WentToShowdown,SUM(dbo.SeatCalc.Summary_WonHand_and_WentToShowdown) AS WonHandAndWentToShowdown,SUM(dbo.SeatCalc.Pre_Raised) AS Pre_Raised,SUM(dbo.SeatCalc.Pos_SB) AS WasSB,SUM(dbo.SeatCalc.Pre_Count_Raises) AS Pre_Count_Raises,SUM(dbo.SeatCalc.Pre_Count_Bets) AS Pre_Count_Bets,SUM(dbo.SeatCalc.Pre_Count_Calls) AS Pre_Count_Calls,SUM(dbo.SeatCalc.Flop_Count_Raises) AS Flop_Count_Raises,SUM(dbo.SeatCalc.Flop_Count_Bets) AS Flop_Count_Bets,SUM(dbo.SeatCalc.Flop_Count_Calls) AS Flop_Count_Calls,SUM(dbo.SeatCalc.Turn_Count_Raises) AS Turn_Count_Raises,SUM(dbo.SeatCalc.Turn_Count_Bets) AS Turn_Count_Bets,SUM(dbo.SeatCalc.Turn_Count_Calls) AS Turn_Count_Calls,SUM(dbo.SeatCalc.River_Count_Raises) AS River_Count_Raises,SUM(dbo.SeatCalc.River_Count_Bets) AS River_Count_Bets,SUM(dbo.SeatCalc.River_Count_Calls) AS River_Count_Calls
FROM dbo.SeatCalc INNER JOIN dbo.Seat ON dbo.SeatCalc.SeatID = dbo.Seat.SeatID INNER JOIN dbo.Player ON dbo.Seat.PlayerID = dbo.Player.PlayerID INNER JOIN dbo.Game ON dbo.Seat.GameID = dbo.Game.GameID INNER JOIN dbo.GameTable ON dbo.Game.GameTableID = dbo.GameTable.GameTableID
WHERE (dbo.Player.PlayerID = 7) AND (dbo.Game.TournamentGame= 0) AND (dbo.Game.PokerType= 1)
GROUP BY dbo.GameTable.MaxValue,dbo.GameTable.MinValue,dbo.GameTable.GameType
Now, everything basically works for my application but I started doing some stress testing and am not loving the results. It may be that this is just the way things are or I may need to redesign the table, indexes or something. I had ~250k hands which meant around ~1.2M seats/seatcalcs and a DB which was ~1.25GB. Queries were taking 45 seconds or so to complete.
1) Now what is a reasonable expectation for a query which needs to find ~200k rows and then sum a bunch of fields within those rows into groups? This felt too long but it may be that I have unreasonable expectations.
2) There is a 1:1 mapping between Seats and SeatCalcs. This was done because they are very different logically and I will occasionally wipe the entire SeatCalc row and recalculate it (whereas the Seat row is basically never touched once it is added). One idea I read was to merge these tables into a single larger table ...which would sacrifice my nice logical division but I would do so if I could expect much better performance. Comments?
3) On the other hand, I have read to divide a large table (e.g. vertical partitioning) to get more cache hits. There are some logical places that I could divide the SeatCalc table into separate tables. This would however require even more complex JOIN statements...
4) I started playing with indexed views but I am not really sure what I am trying to accomplish. One problem (I believe) is that the Seat/SeatCalc rows are not logically arranged by how they are typically queried (e.g. by playerID). When I am running the above query for the player with ~220k seats of data, his seatIDs are 1,7,13,25,etc...because when data is inserted, it is inserted on a hand by hand basis and all of the other player's data is mixed in. I have no idea if this matters to the underlying architecture but if I wasn't doing this using MSSQL, I would try to be keeping the underlying data organized by playerID. But I have no idea how to specify this...
So...This is pretty much an open-ended start to the question. I expect to start a discussion and reward points based on the person/people who help me understand the tactics to try to optimize...