

How do Facebook / Twitter store data?

Posted on 2016-02-04
Medium Priority
Last Modified: 2016-03-03
What are some of the technologies that Facebook and Twitter use to store and search their massive data graphs?
I'm looking for something that can search massive amounts of data that will be growing constantly.
Question by:Squadless

Accepted Solution

Russell Fox earned 2000 total points
ID: 41450097
That's...a big question. The most common solution is a "NoSQL" database: graph, document, wide column, object, etc. Take a look at NoSQL-Database.org for a list of the NoSQL databases out there, each with pluses and minuses. I'm currently learning about MongoDB, a JSON document store, for a project at work. The basic idea is that instead of having to buy better and better (i.e. more expensive) servers to run your relational database, you have a cluster of cheap(er) servers, all connected and talking to each other, with multiple copies of the same data spread across the cluster. If one machine catches on fire, the rest just keep working and no data is lost. If your startup suddenly has a 500% increase in customers, you just add more machines and they automatically divvy up the data and processing needs. It's kind of a rabbit hole right now because dozens of vendors all want to corner the market.
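
To make the "multiple copies spread over the cluster" idea a bit more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the node names, the `replica_nodes` helper, and the choice of rendezvous hashing are all hypothetical, and this is not how MongoDB or any particular product actually places data.

```python
import hashlib


def replica_nodes(key, nodes, copies=3):
    """Pick `copies` distinct nodes for `key` (hypothetical helper).

    Uses rendezvous (highest-random-weight) hashing: every node gets a
    score for the key, and the top-scoring nodes hold the replicas.
    """
    def score(node):
        return hashlib.sha256(f"{node}:{key}".encode("utf-8")).hexdigest()

    ranked = sorted(nodes, key=score, reverse=True)
    return ranked[:copies]


# Hypothetical four-machine cluster.
nodes = ["node-a", "node-b", "node-c", "node-d"]

# Three machines hold a copy of this record.
placement = replica_nodes("txn-1001", nodes)

# Simulate the first replica's machine "catching on fire".
survivors = [n for n in nodes if n != placement[0]]
after_failure = replica_nodes("txn-1001", survivors)

# The two surviving copies stay where they were; only the lost copy
# has to be re-created on another machine.
print(placement, after_failure)
```

The same property works in the other direction: adding a machine only moves the keys for which the new machine scores highest, which is roughly what "divvy up the data" means in practice.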

Author Comment

ID: 41451212
The customer % increase will not be significant, but I will be holding transactional data for years. There will be a need to search the entire tree, at any point in time, for a specific transaction or a detail of it. There are about 50k transactions per day currently, but that will be going up exponentially.

Assisted Solution

by:Russell Fox
ID: 41451970
Another consideration then is data compression: I know Mongo's latest version has very good compression to keep the documents small on disk, which not only saves space but also reduces I/O because the drive has less data to read and write. The trade-off is that the CPU has to work harder to compress/decompress the data. I'm sure most of the other solutions have similar capabilities. You'll also want to be very conscientious about how you structure your data: whenever you're thinking about how to store it, first consider how you're going to be retrieving it. That will help direct your indexing strategy.
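
As a rough illustration of both points (compression and "index for how you retrieve"), here is a small in-memory Python sketch. Everything in it is an assumption for the example: the `TransactionStore` class, the `txn_id` / `customer` field names, and the use of `zlib` stand in for whatever a real database does internally.

```python
import json
import zlib


class TransactionStore:
    """Toy store: compressed documents plus indexes built for known lookups."""

    def __init__(self):
        self._blobs = []        # compressed documents, in insertion order
        self._by_id = {}        # index: transaction id -> position
        self._by_customer = {}  # index: customer -> list of positions

    def insert(self, doc):
        raw = json.dumps(doc, sort_keys=True).encode("utf-8")
        packed = zlib.compress(raw)  # CPU cost paid here, on write
        pos = len(self._blobs)
        self._blobs.append(packed)
        self._by_id[doc["txn_id"]] = pos
        self._by_customer.setdefault(doc["customer"], []).append(pos)
        return len(raw), len(packed)

    def _load(self, pos):
        # CPU cost paid again here, on read.
        return json.loads(zlib.decompress(self._blobs[pos]))

    def find_by_id(self, txn_id):
        pos = self._by_id.get(txn_id)
        return None if pos is None else self._load(pos)

    def find_by_customer(self, customer):
        return [self._load(p) for p in self._by_customer.get(customer, [])]


store = TransactionStore()
raw_size, packed_size = store.insert({
    "txn_id": "txn-1001",
    "customer": "acme",
    "amount": 42.5,
    "note": "recurring payment " * 20,  # repetitive text compresses well
})
store.insert({"txn_id": "txn-1002", "customer": "acme", "amount": 10.0, "note": ""})

print(raw_size, packed_size)
print(store.find_by_id("txn-1001")["amount"])
```

The indexes exist only because the expected queries are "find one transaction by ID" and "find all transactions for a customer". A query on any other field would have to decompress and scan every document, which is why it pays to decide on the retrieval patterns first.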


Question has a verified solution.
