Billion Records Database

Posted on 2006-05-05
Last Modified: 2008-03-03
I have to make a recommendation about hardware.
What would you need to run a database with a billion records? One table, billion records. Hardware? Software? OS?

MS SQL? Windows? Would that work? And the hardware?

Question by:waterzap
    LVL 75

    Assisted Solution

    by:Aneesh Retnakaran
    I would suggest you may go thru the following minimum requirements

    I would suggest
     Atlease 2GB RAM, CPU: P4 2.8GHz, HDD: 100gb  SCSI RAID 5, Windows 2003, SQL server 2000 or 2005
    CPU: Dual Xeon 3.06GHz, RAM: 2GB, HDD: 100gb  SCSI RAID 5
    LVL 28

    Expert Comment

    I do it with Informix on SCO Unix. A bit outdated, but effective.

    Chris B
    LVL 5

    Accepted Solution

    It's impossible to answer this question without knowing something about the specifics of your application. How large is your record size? How is the data to be queried? How will updates and inserts occour? What response times do your users expect? Where does the data come from? Is it entered directly into the database or fed from other sources?

    You mention you have one large table. It's worth considering whether you could more effectively split that into a relational structure. For example, if your table contains sales data, and every row contains a product code, product description and price, it would probably be worth seperating these into a seperate Products table and just store the produce code (or a seperate numeric key) on the main table. On the other hand, if it's a warehousing application where you just want raw querying speed, you might be better off leaving it in a flat structure as lookups take time.

    When you get into the specifics, there are a lot of variables you need to consider. If your record size is about 100 bytes, that will fill up 100GB before indexes, and indexes can take a considerable amount of extra space. But if your record size grows towards 1000 bytes, then you're entering the terabyte range, which is certainly achievable on standard RAID, but you might want to start looking at specialised SAN solutions.

    How quickly users want to get data back is also an important consideration. It's certainly possible to build systems that will find any record in under a second, but if that's not required, why go to the expense? I've encountered systems where there was a requirement to archove data for years, but it was only occasionally needed to retrieve when something needed to be verified. In this case it more sense to use compressed files to allow the maximum amount of data to be archived in the minimum space. We used one file for each day's data, with the date encoded in the filename. We were able to do this because queries always were for a specified date. It took several minutes for the application to unpack the relevant file and find the required records, which were then transferred to a conventional relational database to allow the user to query in more detail. The advantage was the we could archive massive amounts of data without a huge database.

    On the other side of the equation, if your users want to look at aggregations and trends and aren't so interested in individual records, you need to look at OLAP solutions. These allow users to ask questions like "show me total sales for each region and product in the last quarter." Have a look at the MS OLAP Server (part of SQL Server) or Hyperion Essbase for this sort of application.

    One database worth considering for storing massive amounts amounts of data is Teradata. Generally it's a solution where hardware and software are sold together running massively parallel systems, and is popular for enterprise data warehousing applications.

    Tell us a little more about your application and we might be able to give more specific advice.

    Write Comment

    Please enter a first name

    Please enter a last name

    We will never share this with anyone.

    Featured Post

    Highfive + Dolby Voice = No More Audio Complaints!

    Poor audio quality is one of the top reasons people don’t use video conferencing. Get the crispest, clearest audio powered by Dolby Voice in every meeting. Highfive and Dolby Voice deliver the best video conferencing and audio experience for every meeting and every room.

    Database tuning – How to start and what to tune. This question is frequently asked by many people, both online and offline. There is no hard and fast rule-of-thumb for performance tuning, however, before beginning the tuning process one should a…
    Entering time in Microsoft Access can be difficult. An input mask often bothers users more than helping them and won't catch all typing errors. This article shows how to create a textbox for 24-hour time input with full validation politely catching …
    Video by: Steve
    Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…
    Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…

    760 members asked questions and received personalized solutions in the past 7 days.

    Join the community of 500,000 technology professionals and ask your questions.

    Join & Ask a Question

    Need Help in Real-Time?

    Connect with top rated Experts

    13 Experts available now in Live!

    Get 1:1 Help Now