Want to protect your cyber security and still get fast solutions? Ask a secure question today.Go Premium

  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 450
  • Last Modified:

General Database Strategy - Efficient storage of fitness data


Very new to SQL, and would very much appreciate a couple of pointers on proper database strategy for my application.

I'm desiging a mySQL database to store details on athletes and workout data
- Athlete info such as name, email, etc...
- Workout data
   * Top level summary - cumulative time, average heart rate
   * Detailed workout data includes sample time and instantaneous heart rate data (e.g. every 5 seconds or 10 seconds). Some workouts may include up to 4-5 different data types at this sampling rate, whereas other workouts will only have 1-2.

The database must be scalable, and robust enough to store thousands of athletes, each who may have hundreds of workouts, and each workout may have hundreds of datapoints. One obvious required table stores athlete particulars. What I'm less sure about is how to best and most efficiently store workout data.

For workout data, one table could include a top level summary of workouts: athleteID, cumulate time, average heart rate, etc... and other summary data.

My biggest question is how/where should I store the large amount of detailed workout data (e.g. hundreds of datapoints of sample time and heart rate).

If I create a single table to store ALL workout data points for all athletes and all workouts, it would seem to be an unmanageable table with millions of rows. It would require a dedicated column for workoutID, which seems a bit wasteful. Even more wasteful might be the fact that many workouts will only include 1-2 data types (fields), but others will have 4-5 - so lots of empty cells.

If I create a table for each workout, I could have thousands of workout tables. This seems complicated to create and recall so many tables, each with a different name (perhaps each named with a unique workoutID). I suppose in this case I would also need a place to store a growing list of workoutIDs for each athlete - and so my athlete table would become inefficient if I kept that a single table, since certain athletes would have many more workoutID columns than others.

Thanks so much for any general strategy tips.
  • 2
  • 2
1 Solution
Kim RyanIT ConsultantCommented:
I would suggest that a single table to store ALL workout data points is the best way to go. This will give you more flexibility if you want to combine queries across more tha one athelete at a time. Gerneal dbase theory is that like concepts (workouts) should be stored in the same table. By providing a unique id you can always partion the results down to each work out, so they can still function as if they where seperate or virtual tables. It will be easier to manage  and back up the single table though.

If you are worried about space, look at efficient data sizing. For example, heart rate could be stored as an unsigned int, as your max value should not be more than 256 (I hope!) If you declare all columns as NOT NULL there is less overhead for empty values.

freag34Author Commented:
Thanks, some very good tips. A few follow-up questions so I understand the efficiency better:

 1) Say I use one large table for tens of millions of workout datapoints. One field would be sessionID - all datapoints from one session have the same sessionID. How fast (very rough ballpark) will it be to extract one set of a couple of thousand workout datapoints using a sessionID query from a table of tens of millions of rows? Tens of milliseconds, hundreds of milliseconds,more?

2) Can my table go up to hundreds of millions of rows without concern? If that becomes a concern, I could reasonably easily reduce the table size by a factor of 26 by having 26 tables, one for each letter of the alphabet of user's last name? Is that a reasonable idea or newbie idea :)

With the previous advice in mind along with further research, I have come up with a proposed database. Can I please get some comments on the efficiency and reasonableness of the below:

      UserID      PRIMARY_KEY,
      SessionID      PRIMARY_KEY
      LapID      PRIMARY_KEY
      DataPointID      PRIMARY_KEY

Two Questions re: table above:

1) Under LAP, I propose having AveHR and AveSpeed to speed up obtaining this data. These numbers could be calculated from DATAPOINT queries, but I would like the user to have a near-instantaneous experience, and having summary data such as averages stored in LAP would help? Is this a reasonable idea?

2) Under DATAPOINT I propose having both LapID and SessionID. Only LapID would be strictly necessary because SessionID can be obtained from the LAP table, however having both would make for much easier queries on DATAPOINT I believe - for example to quickly obtain all datapoints from one session would only take one query instead of many. Is this reasonable?
Kim RyanIT ConsultantCommented:
1) I would guess at least 100s of milli seconds, possibly even a few seconds, as you are talking about a large amount of data. Of course it all depends how good your indexing, caching and hardware is, as well as how contiguous the data is.

2) My SQL can handle I think up to 4 billion rows. Its roguhgly the same amount of disk space, wether tables are split or not. Remember you can always archive off old data to a text back up. One apporach is to have a rolling window of say the last 100 days of data for speedy access. Then all the historical data preceeding this can be shunted to another database for the more complex queries that you can spend time wating for.

1) Yes, it is a good idea to store frequently accessed data. Even though it leads to redundant data, query speed is sometimes more important. This is referred to as denormilization.

2) Again a good idea. By doing this you allow for more relations between tables, the key to an effecient design.
freag34Author Commented:
thanks very much, great tips

Featured Post

Get your problem seen by more experts

Be seen. Boost your question’s priority for more expert views and faster solutions

  • 2
  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now