mobile app back end database with large number of entires

Posted on 2013-01-12
Last Modified: 2014-11-12
I am looking to develop a mobile phone application that will need to send data to a back-end database every 30 seconds. The data is relatively simple and consist of a userid, location information and a few simple numbers. If I base my estimates on having 100,000 users and that the app is used 1 hour per day then that is potentially 12 million rows per day and we need to keep the data for at least 2 years meaning we have potentially 8.7 billion rows of data. The database logically is very simple with a user table and a larger table with the captured data in it. There may be other tables with relationships to the user table but it will always be quite simple We then need to perform normal SQL queries and also time series analysis on it using standard and bespoke tools we will develop. I am very familiar with relational databases such as SQl Server , Oracle and MySQl but was wondering if there are better solutions for this since I assume other applications have already built similar back-ends for data storage

Ideally since we are just starting up we want a free or low initial cost solution.

Thanks for any help

Question by:prices1964
  • 2
LVL 25

Accepted Solution

Tomas Helgi Johannsson earned 450 total points
ID: 38771825

For a such magnitude of data I strongly recommend that you use partitioning, both for maintenance and also for the queries as the queries will take advantage of the partition pruning which gives you optimal performance of such large data.

Also consider using InnoDB engine instead of MyISAM as you will get row-level locking in InnoDB vs locking on the entire table in MyISAM when executing queries.

     Tomas Helgi
LVL 29

Expert Comment

ID: 38772359
You should consider another frequency for updates; although you need data to be captured twice a minute on the mobile, you probably need a lower frequency for the transmission to the database, and so the burden to the database would be lower, since you would have a single sql overhead for several records, instead of 1 for each record.

As an example: for an application I am working with, GPS data is captured by the vehicle-embarked box twice a minute, but sent to the monitoring system only once every 3 minutes: the overhead of data transmission / bandwidth consumption / sql update happens 6 times less than if we were capturing everys 30 seconds.
By doing so you may occasionally lose some data (eg if the mobile is out of network at transmission time): you need to assess the best interval for your needs.

Of course, this is possible only if the mobile can keep the data for the chosen interval (or even more).
- If you are writing a native app, this should not be a problem.
- If you are writing a web app in HTML5 flavor, you can use the local mobile database except AFAIK for windows phone, since its IE browser does not handle this.

Author Comment

ID: 38772430

thanks for that suggestion, it makes a lot of sense

in terms of then querying the data for statistical analysis would a normal database such as SQL server / oracle be the best solution ?
LVL 29

Expert Comment

ID: 38772473
For statistical analysis with big volumes of data, sql  in some cases is not the best solution: its normal forms avoid logical and physical redundancy (consequently keeping the data volume to its minimum), but "OLAP" analytics processes when dealing with huge numbers of records are most effective with hypercubes of redundant data (so that eg, a city name is available directly in the record rather than thru a relation with an additional table).

Note in addition that if you sql-query huge volumes of data directly on your production database, the one that collects your permanent data, you might eat all of the CPU for your statistics, thus preventing the database to be updated (even if it is partitioned)

A good strategy might be to collect / record data in sql (partitioned if needed), then to backup it to some other place (at a peaceful time, eg, at 1am). The backup data can then be placed on another machine (remember, statistics and analytics are cpu intensive) where it will be used in sql if the performance is OK, otherwise translated into a denormalized hypercube.

If you partition your sql data, consider amongst the different solution the partition on date (eg, create a new database every 24 hours so that new data work only on the data of today, not on the last x days). In that case conversion to denormalized data will simply add to the previous hypercubes.

You have probably done the maths to compute the rate at which your data will be created, and what will be the average additional daily volume. When this becomes large, remember that the time needed to add records to N existing records is, at best, proportional to Log (N)  and very often worse than that...

Featured Post

Efficient way to get backups off site to Azure

This user guide provides instructions on how to deploy and configure both a StoneFly Scale Out NAS Enterprise Cloud Drive virtual machine and Veeam Cloud Connect in the Microsoft Azure Cloud.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

APEX (Application Express) is used to develop a web application from Oracle. SQL Workshop is one of the tools that comes with Oracle APEX to query or modify the database objects or to make any changes to the structure.
Password hashing is better than message digests or encryption, and you should be using it instead of message digests or encryption.  Find out why and how in this article, which supplements the original article on PHP Client Registration, Login, Logo…
Video by: Steve
Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…

863 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

23 Experts available now in Live!

Get 1:1 Help Now