General Database Strategy - Efficient storage of fitness data

Posted on 2006-06-04
Last Modified: 2013-11-15

Very new to SQL, and would very much appreciate a couple of pointers on proper database strategy for my application.

I'm desiging a mySQL database to store details on athletes and workout data
- Athlete info such as name, email, etc...
- Workout data
   * Top level summary - cumulative time, average heart rate
   * Detailed workout data includes sample time and instantaneous heart rate data (e.g. every 5 seconds or 10 seconds). Some workouts may include up to 4-5 different data types at this sampling rate, whereas other workouts will only have 1-2.

The database must be scalable, and robust enough to store thousands of athletes, each who may have hundreds of workouts, and each workout may have hundreds of datapoints. One obvious required table stores athlete particulars. What I'm less sure about is how to best and most efficiently store workout data.

For workout data, one table could include a top level summary of workouts: athleteID, cumulate time, average heart rate, etc... and other summary data.

My biggest question is how/where should I store the large amount of detailed workout data (e.g. hundreds of datapoints of sample time and heart rate).

If I create a single table to store ALL workout data points for all athletes and all workouts, it would seem to be an unmanageable table with millions of rows. It would require a dedicated column for workoutID, which seems a bit wasteful. Even more wasteful might be the fact that many workouts will only include 1-2 data types (fields), but others will have 4-5 - so lots of empty cells.

If I create a table for each workout, I could have thousands of workout tables. This seems complicated to create and recall so many tables, each with a different name (perhaps each named with a unique workoutID). I suppose in this case I would also need a place to store a growing list of workoutIDs for each athlete - and so my athlete table would become inefficient if I kept that a single table, since certain athletes would have many more workoutID columns than others.

Thanks so much for any general strategy tips.
Question by:freag34
    LVL 19

    Expert Comment

    I would suggest that a single table to store ALL workout data points is the best way to go. This will give you more flexibility if you want to combine queries across more tha one athelete at a time. Gerneal dbase theory is that like concepts (workouts) should be stored in the same table. By providing a unique id you can always partion the results down to each work out, so they can still function as if they where seperate or virtual tables. It will be easier to manage  and back up the single table though.

    If you are worried about space, look at efficient data sizing. For example, heart rate could be stored as an unsigned int, as your max value should not be more than 256 (I hope!) If you declare all columns as NOT NULL there is less overhead for empty values.


    Author Comment

    Thanks, some very good tips. A few follow-up questions so I understand the efficiency better:

     1) Say I use one large table for tens of millions of workout datapoints. One field would be sessionID - all datapoints from one session have the same sessionID. How fast (very rough ballpark) will it be to extract one set of a couple of thousand workout datapoints using a sessionID query from a table of tens of millions of rows? Tens of milliseconds, hundreds of milliseconds,more?

    2) Can my table go up to hundreds of millions of rows without concern? If that becomes a concern, I could reasonably easily reduce the table size by a factor of 26 by having 26 tables, one for each letter of the alphabet of user's last name? Is that a reasonable idea or newbie idea :)

    With the previous advice in mind along with further research, I have come up with a proposed database. Can I please get some comments on the efficiency and reasonableness of the below:

          UserID      PRIMARY_KEY,
          SessionID      PRIMARY_KEY
          LapID      PRIMARY_KEY
          DataPointID      PRIMARY_KEY

    Two Questions re: table above:

    1) Under LAP, I propose having AveHR and AveSpeed to speed up obtaining this data. These numbers could be calculated from DATAPOINT queries, but I would like the user to have a near-instantaneous experience, and having summary data such as averages stored in LAP would help? Is this a reasonable idea?

    2) Under DATAPOINT I propose having both LapID and SessionID. Only LapID would be strictly necessary because SessionID can be obtained from the LAP table, however having both would make for much easier queries on DATAPOINT I believe - for example to quickly obtain all datapoints from one session would only take one query instead of many. Is this reasonable?
    LVL 19

    Accepted Solution

    1) I would guess at least 100s of milli seconds, possibly even a few seconds, as you are talking about a large amount of data. Of course it all depends how good your indexing, caching and hardware is, as well as how contiguous the data is.

    2) My SQL can handle I think up to 4 billion rows. Its roguhgly the same amount of disk space, wether tables are split or not. Remember you can always archive off old data to a text back up. One apporach is to have a rolling window of say the last 100 days of data for speedy access. Then all the historical data preceeding this can be shunted to another database for the more complex queries that you can spend time wating for.

    1) Yes, it is a good idea to store frequently accessed data. Even though it leads to redundant data, query speed is sometimes more important. This is referred to as denormilization.

    2) Again a good idea. By doing this you allow for more relations between tables, the key to an effecient design.

    Author Comment

    thanks very much, great tips

    Featured Post

    Threat Intelligence Starter Resources

    Integrating threat intelligence can be challenging, and not all companies are ready. These resources can help you build awareness and prepare for defense.

    Join & Write a Comment

    Storage devices are generally used to save the data or sometime transfer the data from one computer system to another system. However, sometimes user accidentally erased their important data from the Storage devices. Users have to know how data reco…
    A Bare Metal Image backup allows for the restore of an entire system to a similar or dissimilar hardware. They are highly useful for migrations and disaster recovery. Bare Metal Image backups support Full and Incremental backups. Differential backup…
    This tutorial will walk an individual through the steps necessary to enable the VMware\Hyper-V licensed feature of Backup Exec 2012. In addition, how to add a VMware server and configure a backup job. The first step is to acquire the necessary licen…
    This tutorial will walk an individual through configuring a drive on a Windows Server 2008 to perform shadow copies in order to quickly recover deleted files and folders. Click on Start and then select Computer to view the available drives on the se…

    728 members asked questions and received personalized solutions in the past 7 days.

    Join the community of 500,000 technology professionals and ask your questions.

    Join & Ask a Question

    Need Help in Real-Time?

    Connect with top rated Experts

    17 Experts available now in Live!

    Get 1:1 Help Now