Need LOTS-O-STORAGE space.  Can you help design it?

Helixx used Ask the Experts™
We are putting together a business plan for a new venture and there is one part that I need to get some clarity on: online storage.  This project might take up 500-800 TB of space.  I am a LAN network guy and none of my networks has anything over 2 TB, so my question is, what is the best way to create an environment that can handle that amount of data? Most of the files will be 500-600 MB.  I have 5 Gbps bandwidth that we can easily increase and plenty of physical storage space and power.

Direct questions:
1) I am most familiar with Windows but the 2TB volume size limit is a problem. Or is it?
2) I am not afraid of LINUX but what limitations would I run into with that?
3) What would the hardware setup on a large storage system like this look like?  I.E. what components would I need to put together?
4) I have a pretty strong feeling that in the end, we will have to hire a company like IBM to design/build something like this, but I want some info so I am not just accepting the recommendation blindly.

Lee W, MVPTechnology and Business Process Advisor
Most Valuable Expert 2013
I would suggest you do step #4 then before accepting the recommendation, you run it by others.

Frankly, for that kind of storage, you're probably looking at several NAS devices as being your best bet.  They will be more easily expanded and companies like Network Appliance will provide monitoring and maintenance.  Of course, they aren't cheap, but price isn't (or at least shouldn't be) THAT much of a concern for a venture of this size.  Presumably, you need reliability - companies like Network Appliance can give that to you.
Dude, you are going to need a serious SAN array.
Top Expert 2007

Call EMC for some help on this.

I hope this helps !
There are way too many questions in designing something like this to be realistically answered in a forum like this. I used to work for EMC and agree that they have the expertise, as do vendor companies like IBM, HP, Hitachi, etc. as well as consulting houses like Deloitte, Pomeroy, MTI, and many others.

A few general questions to consider:
1) How many servers will need access (just one, or many)?
2) What type of I/O throughput will be expected? Read-heavy, Write-heavy, or balanced?
3) Do you really need to see up to 800 TB in one volume? I've never seen anything like that, although maybe it's been done. Usually SAN Array volumes are maybe 500GB at most, then larger volumes are built on the OS by a logical volume manager.
4) How much money can be spent on this - are they willing to spend enough to do it *right*?

With a small number of servers, you may not need fibre channel switches (go direct-attached), but you need to do more than just allocate gobs of storage on the array. You need to profile your I/O patterns, do the RAID layouts based on acceptable risk, performance and cost, and discuss issues like HBA (host bus adapter) redundancy, failover, and load balancing. Lots of planning and decisions will be involved.
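To put question 3 above in numbers, here is a rough sketch of how many array volumes a logical volume manager would have to stitch together. It assumes the 500GB-per-volume figure mentioned above and decimal units; the function name `luns_needed` is just made up for illustration.

```python
TB = 10**12  # decimal terabyte, in bytes
GB = 10**9   # decimal gigabyte, in bytes


def luns_needed(total_bytes: int, lun_bytes: int) -> int:
    """Number of fixed-size array volumes needed to cover total_bytes (rounded up)."""
    return (total_bytes + lun_bytes - 1) // lun_bytes


# Assumption: 500 GB per array volume, as in the post above.
for total_tb in (500, 800):
    count = luns_needed(total_tb * TB, 500 * GB)
    print(f"{total_tb} TB at 500 GB per volume = {count} volumes")
```

That comes out to 1000 to 1600 volumes to manage, which is part of why the layout needs real planning rather than just allocating space.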
We have a few 20TB setups (a few DS arrays) on IBM AIX, and they work great on RAID 10. Very reliable, but the cost is significant. Growing that by a few hundred TB is not a big deal.
Sun is slightly less expensive and just as reliable.
You can check other options as well. IBM and Sun provide hardware and software support, which is more than enough for any trouble (if any) you might have once the setup is up and running. However, if you want somebody to help you set it up as well, IBM and Sun are not the type (customized services are either too costly or not available); EMC would be a better choice then.
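One thing to keep in mind with RAID 10 at this scale: every block is mirrored, so the raw disk you buy is a multiple of the usable space. A minimal sketch, assuming plain two-way mirroring and ignoring hot spares and filesystem overhead (the names here are illustrative only):

```python
MIRROR_COPIES = 2  # assumption: standard two-way mirroring in RAID 10


def raw_tb_for_raid10(usable_tb: int, copies: int = MIRROR_COPIES) -> int:
    """Raw capacity needed so that usable_tb remains after mirroring."""
    return usable_tb * copies


for usable in (500, 800):
    print(f"{usable} TB usable needs about {raw_tb_for_raid10(usable)} TB raw")
```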
For really huge filesystems like that you don't really want to put it on Windows.

If this is not just for an intellectual exercise but for an enterprise-class filesystem you probably should take the advice of the Experts that have suggested working with a major player in the storage field, to architect a solution for you.  

If money is tight, and you have time and energy, you could possibly "roll your own" - but you should put it on Linux, using one of the industrial-strength filesystems like JFS, or if proprietary works for you, AIX, which has JFS as its native filesystem IIRC.  It will regardless need large array cages.
Expanding on the filesystem thing, and why I suggested JFS - if you google "filesystems" you will come up with several pages that show a chart of all the filesystems developed so far (up to the date of the chart, of course) and what the limitations are.

JFS is the only one, to my knowledge, that has a theoretical *volume* limit in the multi-petabyte range, and if you want to grow the volume, will accept additional extents.

I think Sun's newly-opened-up filesystem might be a good alternative but I don't think it scales quite to that degree, unless I'm mistaken.
SGI (Silicon Graphics) has XFS:  (I thought I'd mention this one because I just accepted a job with them ;)

XFS File System
    Journaled 64-bit file system for IRIX®, with guaranteed file system consistency
Product Span:
    Available as a layered product on all systems which run IRIX 5.3 or later, except IP4 and IP6 platforms
Max. File Size:
   **** Designed to scale to 9 million TB with current hardware supporting scalability to 8000 TB on IRIX. Linux-64, 2 TB Max File Size. Solaris and Windows NT undergoing scalability testing.
Max. File System Size:
   **** Designed to scale to 18 million TB with current hardware supporting scalability to 8000 TB on IRIX. Linux-64, 500 file systems of 2 TB each. Solaris and Windows NT undergoing scalability testing.
Top Expert 2014
Are you building a newspaper archive? That's just about the only thing I can think of that needs this amount of storage. If so, then there is indexing software out there that will allow you to spread it onto multiple volumes so the individual volume size limit doesn't come into effect. BTW, NTFS isn't limited to 2TB; it's just that each (virtual) disk in a stripe is limited to 2TB under the 32-bit OS. The 64-bit OS handles larger individual disks.
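For what it's worth, a 2TB ceiling is what you get from 32-bit sector addresses with 512-byte sectors (this is how MBR-style partition tables count sectors). The sketch below is just that arithmetic; it assumes 512-byte sectors and is not a statement about any particular Windows version.

```python
SECTOR_BYTES = 512  # assumption: classic 512-byte sectors


def max_addressable_bytes(address_bits: int, sector_bytes: int = SECTOR_BYTES) -> int:
    """Largest disk size reachable with sector addresses of the given width."""
    return (2 ** address_bits) * sector_bytes


limit_32 = max_addressable_bytes(32)
print(limit_32)            # 2199023255552 bytes
print(limit_32 / 2 ** 40)  # 2.0 (TiB)
```

With wider sector addresses the per-disk limit is far beyond anything in this thread, so the 2TB figure is an addressing limit, not an NTFS one.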

You're unlikely to use a single SAN array system for this anyway: HP EVA maxes out at 120TB, and the XP (HDS) goes to 300TB, although you can chain SANs behind it for more. Maybe a NetApp filer is the way to go; although the internal storage capacity isn't that large, you can hang a few SANs off it as back-end storage.
The EMC Symmetrix scales to over a petabyte in a single system... It may be the way to go for hardware instead of aggregating SAN systems.
ShineOn - yes, the EMC Symmetrix DMX-3 can do this in one frame (multiple cabinets). This would make scaling up easier and possibly more cost-effective than having to buy multiple storage 'heads' (backplanes, controllers and cache memory)... although more backend controllers and cache have to be added at certain thresholds as more disk is added. Still, the operating firmware/software would only have to be purchased once, which could be a significant saving if one had to buy, say, 10 of another type of array. It would be interesting to see what the numbers looked like in that kind of comparison.
The biggest part of your need is missing, which is performance.

How fast is the needed access?  If you will be using databases such as Oracle, the system will need fast access disks.

There may be solutions other than a SAN if you are doing something like archive storage, where a few seconds could be taken to get to the data.  Things like tape or optical storage robots would cost a lot less than the amount of disk space you are thinking about.

Also, don't confuse network bandwidth with storage space.  Your bandwidth needs will be determined by the success of your venture, while your storage space is determined by your product.  For example if you are selling music or video, you will still need the storage space even if you don't sell any...
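To make the bandwidth-versus-storage distinction concrete, here is a back-of-the-envelope sketch using the figures from the question (5 Gbps link, 800 TB of data, 500-600 MB files). It assumes decimal units and a fully saturated link with no protocol overhead, so real transfers would be slower; the helper names are made up for illustration.

```python
TB = 10**12   # bytes
MB = 10**6    # bytes
GBPS = 10**9  # bits per second


def transfer_seconds(size_bytes: int, link_bits_per_second: int) -> float:
    """Ideal time to move size_bytes over the link, ignoring all overhead."""
    return size_bytes * 8 / link_bits_per_second


def file_count(total_bytes: int, file_bytes: int) -> int:
    """How many files of file_bytes fit in total_bytes."""
    return total_bytes // file_bytes


link = 5 * GBPS
one_file = transfer_seconds(600 * MB, link)
full_store = transfer_seconds(800 * TB, link)

print(f"One 600 MB file: {one_file:.2f} s")
print(f"Entire 800 TB: {full_store / 86400:.1f} days")
print(f"500 MB files in 800 TB: {file_count(800 * TB, 500 * MB):,}")
```

So one file moves in about a second, but pushing the whole store through that link once would take roughly two weeks. The two numbers really are independent.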

A business class SAN for medium performance will be around 250,000 dollars for 20TB.  And although there will be some economy of scale, you will still be looking at millions of dollars worth of hardware.
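Scaling that quote up in a straight line gives a ceiling to plan against. This sketch assumes the 250,000 dollars per 20TB figure above and no volume discount at all, so the real number should come in lower thanks to economy of scale; `linear_cost_usd` is an illustrative name only.

```python
QUOTED_PRICE_USD = 250_000  # figure from the post above
QUOTED_CAPACITY_TB = 20     # capacity that price buys


def linear_cost_usd(capacity_tb: int) -> int:
    """Straight-line extrapolation of the quote, with no economy of scale."""
    return capacity_tb * QUOTED_PRICE_USD // QUOTED_CAPACITY_TB


for capacity in (500, 800):
    print(f"{capacity} TB: about ${linear_cost_usd(capacity):,}")
```

That works out to 6.25 to 10 million dollars before any discount, and before servers, switches, backup, and the server room itself.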

Just as important as the storage hardware is the environment.  You will need a quality server room with cooling and proper power, and, to ensure business continuity, backup power.

I certainly wouldn't use Windows for this, but then I wouldn't use Windows for any server environment, although it makes a fine desktop...
