File hosting server setup

janime used Ask the Experts™
We are planning to start building our file sharing portal, and I am curious to see what experts would suggest in terms of the server setup, mainly disk space.

- For now we decided to go with Linux/Unix OS, MySQL, PHP.

What we approx. expect to have within 2 years:
A. 5,000 members - each member account will have approx. 2 GB of dedicated space
B. Stored files will each be up to 50 MB (zipped files will be split into smaller units). We will not store or share big files such as movies; it will mostly be pictures, graphics, Flash, and small videos.
C. Our pages will mostly contain pictures. Users will be browsing through many pictures (mostly thumbnails). As for traffic, we are not sure yet; let's say we will have 20,000 visits a day.

- What would you suggest for the server setup (processor, RAM, drives)? For now we plan to go with RAID 1.
- How is it technically done, I mean storing the files across 3 or 4 servers? How should the storage (disk space) be set up?
- We will need something that supports fast uploading and downloading of files. What connection should we go with, 100 Mbps or 1 Gbps? What bandwidth should we expect? What data center should we use (mainly for North American users)?

Please think of it from a growth perspective: starting out with one server and trying to reach our first 1,000 members.

Thank you.

Plan to purchase good servers, such as HP Gen 8 servers, which support all the RAID levels.

Configure your network for maximum bandwidth at both the user and the server end; 1 Gbps would be ideal.
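
As a rough sanity check on the 100 Mbps versus 1 Gbps question, here is a minimal Python sketch of the best-case time to move one 50 MB file (the maximum file size given in the question). It ignores protocol overhead and assumes a single transfer gets the whole link, so real-world numbers will be worse. The function name is only illustrative.

```python
def transfer_seconds(file_megabytes: float, link_megabits_per_s: float) -> float:
    """Best-case time to move one file over a link, ignoring protocol overhead."""
    file_megabits = file_megabytes * 8  # 1 byte = 8 bits
    return file_megabits / link_megabits_per_s


# Compare the two port speeds from the question for a 50 MB file.
for label, mbps in (("100 Mbps", 100), ("1 Gbps", 1000)):
    print(f"{label}: {transfer_seconds(50, mbps):.1f} s per 50 MB file")
```

With many users uploading and downloading at once, the link is shared between them, which is the main argument for the faster port.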

RAID 1 is mirroring, which needs a second disk of the same capacity. All the data is mirrored, so user data has redundancy.

RAID 5 would be ideal in your situation: data is striped across the disks and parity is stored alongside it for redundancy. A minimum of 3 hard disks is required for RAID 5.
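
To put numbers on the difference, here is a small Python sketch of usable capacity under RAID 1 and RAID 5, applied to the target from the question (5,000 members at 2 GB each, about 10 TB if every quota were filled). The 4 TB disk size is an assumption chosen for illustration, and the function names are hypothetical.

```python
import math


def usable_tb(level: int, disks: int, disk_tb: float) -> float:
    """Usable capacity of an array of identical disks."""
    if level == 1:
        if disks < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        return disk_tb  # every disk holds the same copy of the data
    if level == 5:
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_tb  # one disk's worth of space goes to parity
    raise ValueError("only RAID 1 and RAID 5 are modelled here")


def raid5_disks_needed(target_tb: float, disk_tb: float) -> int:
    """Smallest RAID 5 array whose usable capacity reaches target_tb."""
    return max(3, math.ceil(target_tb / disk_tb) + 1)


target_tb = 5000 * 2 / 1000  # 5,000 members x 2 GB, in decimal TB
disk_tb = 4                  # assumed disk size, not from the thread
print("RAID 1, 2 disks:", usable_tb(1, 2, disk_tb), "TB usable")
print("RAID 5, 3 disks:", usable_tb(5, 3, disk_tb), "TB usable")
print("RAID 5 disks for", target_tb, "TB:", raid5_disks_needed(target_tb, disk_tb))
```

In practice not every member fills the quota, so the 10 TB figure is an upper bound for planning rather than a day-one requirement.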


Thank you guys, both comments are useful.

Still waiting for more input and suggestions.

What I'd like to hear is exactly what the best setup to start with would be, so that we won't have any problems ADDING SPACE (drives) as we go.
I just don't feel it's a good idea to spend lots of money on a new top-notch server while we are still developing and gaining new members (unless that really is a necessity).
So again, I am looking for a good STARTING point.

Based on your posts, for now we have agreed on using a 100 Mbit/1 Gbit port and working with a provider that can gradually increase the bandwidth according to our needs (we are starting off with 10 TB).

You can buy servers (for quite cheap) that have a lot of drive bays.

For example, Pogo has a server that can store up to 72 TB by itself for only about $5K.

So you could buy 2 of these (for redundancy) and only populate the first couple of drives.  As your needs grow, you just add drives and rebuild the RAID on the fly (you can do all of this without taking the server offline - performance will just drop a bit during the rebuild).

As long as you plan to stay smallish this should work fine.

What we found was that scaling a solution like this to support millions of monthly users was very, very hard. The issue is that all users (potentially) require access to all files, so you need some sort of network storage solution. We tried NFS and a few other approaches. All of them struggle badly to provide good performance over the network as you reach high loads. For redundancy you need to write each file to both servers (or to all n servers), so each write generates a lot of load. The network traffic tends to rise faster than linearly with the number of users, so all is great until suddenly you're over the limit and you can't solve it without a full redesign, all while traffic is rising daily.

This is why there are companies like NetApp that sell NAS or SAN solutions that start at $100K and go up from there. It's a hard problem.

So before you get too deep into this, also do the math to use a third party for this storage.  We eventually went with Amazon's S3 which substantially reduced our storage costs and gave us effectively infinite scalability for free.  There are lots of other cloud storage providers to consider, depending on your needs, but S3 is a solid place to start.  Our users are unaware of this choice - to them they upload a file to our servers (which we then forward to Amazon) and then we use a CDN provider for downloading the files, with Amazon as the origin server.  The end user never sees "Amazon" in the equation.
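
A minimal Python sketch of that flow, for illustration only: the application accepts the upload, forwards it to the object store, and hands back a CDN URL that hides the storage provider. An in-memory class stands in for the object store here; a real deployment would swap in the provider's own client. Every name below (`InMemoryStore`, `object_key`, `handle_upload`, `cdn.example.com`) is hypothetical and not taken from the setup described above.

```python
import hashlib
from urllib.parse import quote

CDN_HOST = "cdn.example.com"  # hypothetical CDN hostname in front of the store


class InMemoryStore:
    """Stand-in for a remote object store (assumption: simple put/get by key)."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def object_key(user_id: int, filename: str, data: bytes) -> str:
    """Build a per-user key; the content hash keeps same-named files apart."""
    digest = hashlib.sha256(data).hexdigest()[:16]
    safe_name = filename.replace("/", "_")  # real code needs stricter sanitising
    return f"users/{user_id}/{digest}/{safe_name}"


def handle_upload(store, user_id: int, filename: str, data: bytes) -> str:
    """Forward an uploaded file to the store and return its public CDN URL."""
    key = object_key(user_id, filename, data)
    store.put(key, data)
    return f"https://{CDN_HOST}/{quote(key)}"


store = InMemoryStore()
print(handle_upload(store, 42, "cat photo.jpg", b"fake image bytes"))
```

Because the web servers only pass data through, they hold no file state of their own, so more of them can be added without any shared network file system.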

It all scales well and was much cheaper than the build-your-own that we started with.  It also means I can sleep at night because scaling this file I/O layer was proving our biggest headache as we went from a few thousand users to millions.

Hope that helps,



Thank you Doug! Finally a very valuable input!
This is something that I was expecting to hear.

I'll wait for one or two more replies, but Doug, you have definitely earned some of the points.

