• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 802
  • Last Modified:

Billion Records Database

I have to make a recommendation about hardware.
What would you need to run a database with a billion records? One table, billion records. Hardware? Software? OS?

MS SQL? Windows? Would that work? And the hardware?

2 Solutions
Aneesh RetnakaranDatabase AdministratorCommented:
I would suggest you may go thru the following minimum requirements

I would suggest
 Atlease 2GB RAM, CPU: P4 2.8GHz, HDD: 100gb  SCSI RAID 5, Windows 2003, SQL server 2000 or 2005
CPU: Dual Xeon 3.06GHz, RAM: 2GB, HDD: 100gb  SCSI RAID 5
I do it with Informix on SCO Unix. A bit outdated, but effective.

Chris B
It's impossible to answer this question without knowing something about the specifics of your application. How large is your record size? How is the data to be queried? How will updates and inserts occour? What response times do your users expect? Where does the data come from? Is it entered directly into the database or fed from other sources?

You mention you have one large table. It's worth considering whether you could more effectively split that into a relational structure. For example, if your table contains sales data, and every row contains a product code, product description and price, it would probably be worth seperating these into a seperate Products table and just store the produce code (or a seperate numeric key) on the main table. On the other hand, if it's a warehousing application where you just want raw querying speed, you might be better off leaving it in a flat structure as lookups take time.

When you get into the specifics, there are a lot of variables you need to consider. If your record size is about 100 bytes, that will fill up 100GB before indexes, and indexes can take a considerable amount of extra space. But if your record size grows towards 1000 bytes, then you're entering the terabyte range, which is certainly achievable on standard RAID, but you might want to start looking at specialised SAN solutions.

How quickly users want to get data back is also an important consideration. It's certainly possible to build systems that will find any record in under a second, but if that's not required, why go to the expense? I've encountered systems where there was a requirement to archove data for years, but it was only occasionally needed to retrieve when something needed to be verified. In this case it more sense to use compressed files to allow the maximum amount of data to be archived in the minimum space. We used one file for each day's data, with the date encoded in the filename. We were able to do this because queries always were for a specified date. It took several minutes for the application to unpack the relevant file and find the required records, which were then transferred to a conventional relational database to allow the user to query in more detail. The advantage was the we could archive massive amounts of data without a huge database.

On the other side of the equation, if your users want to look at aggregations and trends and aren't so interested in individual records, you need to look at OLAP solutions. These allow users to ask questions like "show me total sales for each region and product in the last quarter." Have a look at the MS OLAP Server (part of SQL Server) or Hyperion Essbase for this sort of application. http://www.hyperion.com/essbase/

One database worth considering for storing massive amounts amounts of data is Teradata. Generally it's a solution where hardware and software are sold together running massively parallel systems, and is popular for enterprise data warehousing applications. http://www.teradata.com/

Tell us a little more about your application and we might be able to give more specific advice.

Featured Post

Prepare for your VMware VCP6-DCV exam.

Josh Coen and Jason Langer have prepared the latest edition of VCP study guide. Both authors have been working in the IT field for more than a decade, and both hold VMware certifications. This 163-page guide covers all 10 of the exam blueprint sections.

Tackle projects and never again get stuck behind a technical roadblock.
Join Now