freag34
asked on
General Database Strategy - Efficient storage of fitness data
Hello,
Very new to SQL, and would very much appreciate a couple of pointers on proper database strategy for my application.
I'm designing a MySQL database to store details on athletes and their workout data:
- Athlete info such as name, email, etc...
- Workout data
* Top level summary - cumulative time, average heart rate
* Detailed workout data includes sample time and instantaneous heart rate data (e.g. every 5 seconds or 10 seconds). Some workouts may include up to 4-5 different data types at this sampling rate, whereas other workouts will only have 1-2.
The database must be scalable and robust enough to store thousands of athletes, each of whom may have hundreds of workouts, each with hundreds of datapoints. One obviously required table stores athlete particulars. What I'm less sure about is how best and most efficiently to store the workout data.
For workout data, one table could hold a top-level summary of each workout: athleteID, cumulative time, average heart rate, and other summary data.
My biggest question is how/where I should store the large amount of detailed workout data (e.g. hundreds of datapoints of sample time and heart rate).
If I create a single table to store ALL workout data points for all athletes and all workouts, it would seem to be an unmanageable table with millions of rows. It would require a dedicated column for workoutID, which seems a bit wasteful. Even more wasteful might be the fact that many workouts will only include 1-2 data types (fields), but others will have 4-5 - so lots of empty cells.
If I create a table for each workout, I could have thousands of workout tables. This seems complicated to create and recall so many tables, each with a different name (perhaps each named with a unique workoutID). I suppose in this case I would also need a place to store a growing list of workoutIDs for each athlete - and so my athlete table would become inefficient if I kept that a single table, since certain athletes would have many more workoutID columns than others.
Thanks so much for any general strategy tips.
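For reference, the usual answer to this question is the first option: one row per sample in a single table, keyed by workout. Below is a minimal sketch in MySQL of what that looks like; the table and column names are illustrative, not from this thread, and the exact types depend on your data.

```sql
-- One row per athlete
CREATE TABLE athlete (
    athlete_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    last_name  VARCHAR(50)  NOT NULL,
    first_name VARCHAR(50)  NOT NULL,
    email      VARCHAR(100) NOT NULL
);

-- One row per workout: summary data lives here, not with the samples
CREATE TABLE workout (
    workout_id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    athlete_id   INT UNSIGNED NOT NULL,
    total_time_s INT UNSIGNED NOT NULL,
    avg_hr       TINYINT UNSIGNED NULL,
    FOREIGN KEY (athlete_id) REFERENCES athlete (athlete_id),
    INDEX idx_workout_athlete (athlete_id)
);

-- One row per sample. The workout_id column is not wasted space: it is
-- the index key that turns "all samples for workout X" into a fast
-- range scan instead of a full-table scan.
CREATE TABLE workout_sample (
    workout_id    INT UNSIGNED NOT NULL,
    sample_time_s INT UNSIGNED NOT NULL,
    heart_rate    TINYINT UNSIGNED NULL,  -- NULL when this workout lacks the field
    speed_mps     FLOAT NULL,
    PRIMARY KEY (workout_id, sample_time_s),
    FOREIGN KEY (workout_id) REFERENCES workout (workout_id)
);
```

Millions of rows in the sample table is normal and manageable as long as queries go through the index; a table per workout, by contrast, is a well-known anti-pattern in relational design.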
ASKER
Thanks, some very good tips. A few follow-up questions so I understand the efficiency better:
1) Say I use one large table for tens of millions of workout datapoints. One field would be sessionID - all datapoints from one session have the same sessionID. How fast (very rough ballpark) will it be to extract one set of a couple of thousand workout datapoints using a sessionID query from a table of tens of millions of rows? Tens of milliseconds, hundreds of milliseconds, more?
2) Can my table grow to hundreds of millions of rows without concern? If that becomes a concern, I could fairly easily reduce the table size by a factor of 26 by having 26 tables, one for each first letter of the user's last name. Is that a reasonable idea or a newbie idea :)
With the previous advice in mind along with further research, I have come up with a proposed database. Can I please get some comments on the efficiency and reasonableness of the below:
USER
UserID PRIMARY_KEY
LastName
FirstName
SESSION
SessionID PRIMARY_KEY
UserID
SessionDate
Duration
NumLaps
LAP
LapID PRIMARY_KEY
SessionID
LapNumber
Duration
Distance
AveHR
AveSpeed
DATAPOINT
DataPointID PRIMARY_KEY
LapID
SessionID
SampleTime
HeartRate
Speed
etc…
Two Questions re: table above:
1) Under LAP, I propose having AveHR and AveSpeed to speed up obtaining this data. These numbers could be calculated from DATAPOINT queries, but I would like the user to have a near-instantaneous experience, and storing summary data such as averages in LAP should help. Is this a reasonable idea?
2) Under DATAPOINT I propose having both LapID and SessionID. Only LapID would be strictly necessary, since SessionID can be obtained via the LAP table; however, having both would make queries on DATAPOINT much easier, I believe - for example, obtaining all datapoints from one session would take a single query instead of many. Is this reasonable?
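As a rough sketch of what makes these queries fast: with a secondary index on SessionID, pulling a few thousand rows out of tens of millions is typically an index walk rather than a table scan, which on InnoDB usually lands in the millisecond range. The statements below use the column names from the proposal above; the index names and the example SessionID value are illustrative.

```sql
-- Secondary indexes on the proposed DATAPOINT table
CREATE INDEX idx_datapoint_session ON DATAPOINT (SessionID);
CREATE INDEX idx_datapoint_lap     ON DATAPOINT (LapID);

-- All datapoints for one session in a single query: this is why
-- carrying SessionID on DATAPOINT (a mild denormalization) is a
-- common and reasonable choice.
SELECT SampleTime, HeartRate, Speed
FROM DATAPOINT
WHERE SessionID = 12345
ORDER BY SampleTime;
```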
ASKER CERTIFIED SOLUTION
ASKER
thanks very much, great tips
If you are worried about space, look at efficient data sizing. For example, heart rate could be stored as an unsigned TINYINT, since your max value should not be more than 255 (I hope!). If you declare all columns as NOT NULL there is less overhead for empty values.
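The sizing advice above can be sketched as a right-sized sample table. Column names and unit choices here are illustrative; the byte widths come from MySQL's integer type ranges (TINYINT UNSIGNED holds 0-255, SMALLINT UNSIGNED 0-65,535, MEDIUMINT UNSIGNED 0-16,777,215).

```sql
-- About 10 bytes of column data per row, before index and row overhead
CREATE TABLE datapoint_sized (
    session_id    INT UNSIGNED       NOT NULL,  -- 4 bytes
    sample_time_s MEDIUMINT UNSIGNED NOT NULL,  -- 3 bytes, up to ~194 days of seconds
    heart_rate    TINYINT UNSIGNED   NOT NULL,  -- 1 byte, 0-255 bpm
    speed_cmps    SMALLINT UNSIGNED  NOT NULL,  -- 2 bytes, cm/s avoids FLOAT entirely
    PRIMARY KEY (session_id, sample_time_s)
);
```

Storing speed as an integer in cm/s rather than a FLOAT in m/s is one way to keep rows small and comparisons exact; convert to m/s in the application layer.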