Storing massive data for user configuration

Hello Experts--

When building a service with users in the hundreds of thousands, what is the most efficient way to store the data? The data I'm talking about would be things like a favorite color or background image, the font they like to use, a company logo, rows sorted a certain way, groups and memberships. All this data adds up, and to my knowledge it would slow down the program. Would it be better to store it in a database or an XML file? Or am I missing a new technology that has come out? Any ideas would help me dearly so I can stop stressing about this. Thanks, guys.

Dave Baldwin, Fixer of Problems, Commented:
It starts with having the fastest hardware available and a team of engineers to maintain it and the software.  Google and Facebook use just about every known technology but they have the resources to create optimized versions of their web servers and databases and even programming languages.  A technology called Hadoop for...
for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware.
Hadoop and related technologies run on groups of servers, not just individual computers.
Ray Paseur Commented:
If you already have hundreds of thousands of users and you're already storing this data, then you already know the answer, but I'll try to add my 2 cents to the overall questions of data storage, performance and application design.

Text and numeric data is very concise and dense.  An image that contains the same information is about 10x larger, and a video with the same information is about 100x larger.  These orders of magnitude are only rules of thumb, but they are close enough for understanding the broad issues.  Concise information belongs in the database.  Larger information belongs in the file system; put a URL pointer into the database so that your database can be the canonical reference point.  The database either contains the information you need or it points to the information you need.
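
A minimal sketch of the "small values in the row, pointer to the file" idea, using Python's built-in sqlite3 module. The table name, column names, and the logo path are assumptions made up for illustration; a production service would more likely use a server database such as MySQL or PostgreSQL.

```python
import sqlite3

# In-memory database for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE user_settings (
        user_id        INTEGER PRIMARY KEY,
        favorite_color TEXT,
        font_name      TEXT,
        logo_url       TEXT   -- pointer only; the image bytes live in the file system
    )
    """
)

def save_settings(user_id, favorite_color, font_name, logo_url):
    """Insert or overwrite the settings row for one user."""
    conn.execute(
        "INSERT OR REPLACE INTO user_settings VALUES (?, ?, ?, ?)",
        (user_id, favorite_color, font_name, logo_url),
    )
    conn.commit()

def load_settings(user_id):
    """Return the settings as a dict, or None if the user has no row."""
    row = conn.execute(
        "SELECT favorite_color, font_name, logo_url "
        "FROM user_settings WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    if row is None:
        return None
    return {"favorite_color": row[0], "font_name": row[1], "logo_url": row[2]}

# Hypothetical user and file path.
save_settings(42, "teal", "Verdana", "/files/logos/42.png")
settings = load_settings(42)
```

The row stays small no matter how large the logo is, so a lookup by user_id touches only a few bytes of text.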

If you've never analyzed database performance, find a local community college and take a class about database administration.  There are a number of principles to follow -- good indexes, succinct queries, etc.  It's a semester of learning and we can't teach you here, but I can give you a sense of its importance: Whenever a web site has performance issues, the cause is always in the I/O subsystem and in modern web applications that almost always means the database.  So make sure you have a good understanding of what each and every query is doing.  At scale, you will probably want to denormalize the tables and throw hardware at the problem areas.  For a better understanding of the issues, make a Google search for the exact phrase, "Should I Normalize My Database?" and read the interesting commentary on both sides of the issue.  You want to read all of the links in the first page of search results.
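
One way to see what a query is doing is to ask the database for its query plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module; the preferences table and the index name are assumptions for illustration, and other databases have their own equivalent (for example EXPLAIN in MySQL and PostgreSQL) with different output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding one preference per row.
conn.execute("CREATE TABLE preferences (user_id INTEGER, name TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO preferences VALUES (?, ?, ?)",
    [(i, "favorite_color", "blue") for i in range(1000)],
)

def query_plan(sql, params=()):
    """Return SQLite's plan for a query as one string (last column is the detail text)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

lookup = "SELECT value FROM preferences WHERE user_id = ?"

# Without an index, SQLite has to scan the whole table.
plan_before = query_plan(lookup, (500,))

# With an index on user_id, the same query becomes an index search.
conn.execute("CREATE INDEX idx_preferences_user ON preferences (user_id)")
plan_after = query_plan(lookup, (500,))

print(plan_before)
print(plan_after)
```

The exact wording of the plan text varies between SQLite versions, but the first plan reports a scan and the second names the index.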

Databases and XML files are different technologies for different purposes.  Databases are relational; XML is hierarchical.  Databases are for storage and retrieval of data; XML is a markup language for data transport (and has been mostly supplanted by JSON, which is more efficient).  We don't store data in XML files.  You shouldn't either.

HTTP requests to a server generally create a workload on the server.  This workload can be timed, quantified, measured and optimized.  One of the most valuable optimization tools is cache.  Consider a complex query that requires searching several database tables (or even several databases).  Perhaps this request is very slow - two seconds or more in the database engines.  You would not want to rerun this complex search if you could avoid it.  Instead you would want to cache the results of the search and serve subsequent requests from the cache.  You may want to learn about Memcached and Redis (Google 'em).  Used correctly, they can cut the server workload in half.  The principles of cache are explained in this article.
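
The caching pattern described here can be sketched in a few lines. This is only an in-process illustration with an expiry time; the TTLCache class, the slow_group_report function, and the 60-second lifetime are all assumptions, and in a real deployment Memcached or Redis would take the place of the dictionary so that several web servers can share one cache.

```python
import time

class TTLCache:
    """Tiny in-process cache: each entry expires after ttl_seconds."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self._entries = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        """Return the cached value for key, calling compute() only on a miss."""
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # cache hit: skip the expensive work
        value = compute()            # cache miss: do the expensive work once
        self._entries[key] = (now + self.ttl_seconds, value)
        return value

calls = {"count": 0}

def slow_group_report(group_id):
    """Stand-in for a complex multi-table query (hypothetical)."""
    calls["count"] += 1
    return {"group_id": group_id, "members": 3}

cache = TTLCache(ttl_seconds=60)
first = cache.get_or_compute(("group_report", 7), lambda: slow_group_report(7))
second = cache.get_or_compute(("group_report", 7), lambda: slow_group_report(7))
```

The second request is served from the cache, so the expensive function runs only once. The trade-off is staleness: a changed report is not visible until the entry expires or is removed.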

The design principles of modern object oriented programming are your friend.  Learn about the SOLID concepts and use the Model-View-Controller design to separate the functionality of your system into modular components.  With greater modularity comes greater flexibility, including the flexibility to optimize performance.

You may want to think about structuring your application in the form of a core of knowledge and a view layer for presentation.  Perhaps the Model in your Model-View-Controller design should be organized as an API, receiving requests and returning the data, but unconcerned about the details of output formatting.  API design has numerous advantages; one of the biggest is that you can have a programming staff working on the View and another programming staff working on the Model -- at the same time.  All they have to do is agree on the interface (contract) between these components.
APIs have different ways of responding to requests.  Today's best practices are RESTful APIs that respond in JSON.  Some examples are available through this article.
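
To make the "contract" idea concrete, here is a small sketch of a Model-side function that answers a request with an HTTP-style status code and a JSON body, and knows nothing about presentation. The function name, the in-memory PREFERENCES data, and the response shape are assumptions for illustration; a real service would sit behind a web framework and read from the database.

```python
import json

# Hypothetical stand-in for the data store.
PREFERENCES = {42: {"favorite_color": "teal", "font": "Verdana"}}

def get_preferences(user_id):
    """Model layer for something like GET /users/<user_id>/preferences.

    Returns (status_code, json_body). How the data is displayed is left
    entirely to whoever consumes the JSON.
    """
    prefs = PREFERENCES.get(user_id)
    if prefs is None:
        return 404, json.dumps({"error": "user not found"})
    return 200, json.dumps({"user_id": user_id, "preferences": prefs})

status, body = get_preferences(42)
print(status, body)
```

A View team can build against this response shape while the Model team changes the storage underneath, as long as the JSON structure stays the same.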

Hope that helps, ~Ray
F P Commented:
Relational database models exist for this purpose and would be a better solution than XML. You need to separate out (extract) the pieces of the user data that you want to relate and make tables out of them. If you have a user table, it wouldn't need to include anything but a reference to a security table, which defines the security for that user. Even with several hundred thousand users, you just need to make sure that you have the proper indexes and that your queries are optimized to work within your model. Functions, Stored Procedures, Events, and other parts of the database are extremely helpful, and a DBA would be able to manage this for you if you're unfamiliar.  Also make sure you use the proper storage engine in the DB, and keep the maintenance up through partitioning and truncating. You also might want to offload archival data into WORM media or tables for reference only.
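
A minimal sketch of splitting related pieces into their own tables, using the groups and memberships from the question as the example. All table, column, and index names are assumptions for illustration, and SQLite stands in for whichever database engine you actually choose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE user_groups (
        group_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL
    );
    -- One row per (user, group) pair; the composite primary key
    -- also serves as the index for lookups by user_id.
    CREATE TABLE memberships (
        user_id  INTEGER NOT NULL REFERENCES users(user_id),
        group_id INTEGER NOT NULL REFERENCES user_groups(group_id),
        PRIMARY KEY (user_id, group_id)
    );
    -- Separate index for the reverse lookup (all members of a group).
    CREATE INDEX idx_memberships_group ON memberships (group_id);
    """
)

conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.executemany(
    "INSERT INTO user_groups VALUES (?, ?)", [(10, "admins"), (20, "editors")]
)
conn.executemany(
    "INSERT INTO memberships VALUES (?, ?)", [(1, 10), (1, 20), (2, 20)]
)

def groups_for_user(user_id):
    """Return the names of the groups a user belongs to, sorted by name."""
    rows = conn.execute(
        "SELECT g.name FROM memberships m "
        "JOIN user_groups g ON g.group_id = m.group_id "
        "WHERE m.user_id = ? ORDER BY g.name",
        (user_id,),
    ).fetchall()
    return [row[0] for row in rows]

alice_groups = groups_for_user(1)
```

Each fact is stored once, and the indexes keep both directions of the membership lookup fast as the user count grows.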
