CentOS - large data LVM questions

I have to put together a storage system for 20TB - 50TB data with a CentOS head server.

Can someone explain the best way to provision and configure the storage?

The data will be almost exclusively read-only, with growth of 3-5TB/yr.

I will have a SAS-connected SAN with either 1TB or 2TB disks.  The SAN has a max of 16 disks in a RAID5 (or RAID6) array, so I assume I'll need more than 1 array at the storage level.

This means several LUNs presented to the OS.

Assuming I use the 1TB disks, and create 3 different RAID 5 arrays, I'll have 45TB of usable space.

Do I create smaller LUNs to present to the OS, and use LVM to combine them into a single large storage pool?

Does the entire Volume Group max out at 16TB, or is each LV limited to 16TB?
Seth Simmons (Sr. Systems Administrator) Commented:
...I'll have 45TB of usable space

How do you figure?
If that SAN supports up to 16 disks, you will only have some 14TB usable with 1TB drives, about 28TB with 2TB drives, and a bit less of both with RAID 6.

Does the entire Volume Group max out at 16TB?

Depends on what version and what file system.
RHEL/CentOS 6 has a 16TB file system limit for ext3/4 (due to the version of the e2fsprogs package).
You can format XFS beyond 16TB, which should be very good for mostly static data.
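
For context on where a 16TB figure typically comes from: as far as I know it matches a filesystem that addresses blocks with 32-bit block numbers at the usual 4 KiB block size. The block size and pointer width below are assumptions for illustration, not something stated in this thread, so treat it as a sanity check of the arithmetic only:

```python
# Assumptions (not from the thread): 4 KiB blocks and 32-bit block numbers,
# the classic ext3/ext4 on-disk layout.
BLOCK_SIZE_BYTES = 4096
BLOCK_NUMBER_BITS = 32
TIB = 2 ** 40


def max_fs_size_tib(block_size_bytes, block_number_bits):
    """Largest addressable filesystem size in TiB for the given layout."""
    return (2 ** block_number_bits) * block_size_bytes / TIB


limit_tib = max_fs_size_tib(BLOCK_SIZE_BYTES, BLOCK_NUMBER_BITS)
print(limit_tib)  # 16.0
```

The point is that the limit belongs to the file system on top of the LV, not to the Volume Group or the LV itself.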

the more you break up your raid groups, the less usable space you will have since you will accumulate more parity drives
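
To put rough numbers on the parity overhead, here is a small sketch. It counts raw decimal TB only and ignores hot spares, formatting overhead and TB vs TiB differences, so real usable space will come out somewhat lower. The 48-disk total is taken from the 3 x 16 layout discussed in this thread; the 8-disk and RAID 6 variants are just for comparison.

```python
def usable_tb(total_disks, disks_per_array, disk_tb, parity_per_array):
    """Raw usable capacity in TB when total_disks are split into equal arrays.

    parity_per_array is 1 for RAID 5 and 2 for RAID 6.
    """
    arrays = total_disks // disks_per_array
    return arrays * (disks_per_array - parity_per_array) * disk_tb


layouts = {
    "3 x 16-disk RAID 5": usable_tb(48, 16, 1, 1),
    "6 x 8-disk RAID 5": usable_tb(48, 8, 1, 1),
    "3 x 16-disk RAID 6": usable_tb(48, 16, 1, 2),
}

for name, tb in layouts.items():
    print(f"{name}: {tb} TB raw usable")
```

With 1TB disks this gives 45, 42 and 42 TB respectively: every extra array (or extra parity disk per array) costs one more disk's worth of space.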
snowdog_2112 (Author) Commented:
Thanks for the response - let me clarify my questions:

The san supports 16 disks in a *single* raid5 array, not 16 disks total.  I will need expansion shelves to accommodate  48 (or more with hot spares), but can configure 3 arrays of 16 disks.

I'm more concerned with how to address the disk space from the OS.

Assuming I have 45TB of usable disk space - how do I best configure that in the OS?

Will I need several LVs smaller than 16TB, breaking up the data?
It is best not to allocate everything at the same time. You can add a LUN of, say, your data +50%, then extend it AS/IF needed.
Why don't you get something like FreeNAS, with a de-duplicating filesystem and NFS support, so you get the most out of your storage?
snowdog_2112 (Author) Commented:
Same question applies....it doesn't matter *when* I break the 16TB barrier.  I *will* break it.

The initial data seed will be >16TB (didn't mention in OP, sorry), so I need more than 16TB right away.

The question is - *HOW* do I extend it past 16TB, not AS/IF I will need to.

Your question is not entirely clear.

As long as you are on a 64-bit CPU and recent OS/software versions, you are not going to hit LVM limits.

If you had 3x (16x 1TB disks in RAID5), you could create 3x 15TB PVs, create a single VG, and a single LV. Then you need a file system, and XFS would be OK.
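
A minimal sketch of that sequence, written as a dry run that only prints the commands instead of executing them. The device paths and the `vg_data` / `lv_data` names are hypothetical placeholders, not taken from this thread; substitute whatever your multipath setup actually presents.

```python
def lvm_plan(pvs, vg="vg_data", lv="lv_data"):
    """Build the command list for: PVs -> one VG -> one LV -> XFS.

    Nothing is executed; the caller decides what to do with the plan.
    """
    cmds = [["pvcreate", pv] for pv in pvs]
    cmds.append(["vgcreate", vg, *pvs])
    # Use every free extent in the VG for a single logical volume.
    cmds.append(["lvcreate", "-n", lv, "-l", "100%FREE", vg])
    cmds.append(["mkfs.xfs", f"/dev/{vg}/{lv}"])
    return cmds


# Hypothetical multipath device names for the three 15TB LUNs.
devices = ["/dev/mapper/mpatha", "/dev/mapper/mpathb", "/dev/mapper/mpathc"]
plan = lvm_plan(devices)

for cmd in plan:
    print(" ".join(cmd))
```

Review the printed commands before running any of them by hand as root; `pvcreate` and `mkfs.xfs` are destructive on the devices they touch.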

I would personally prefer 8 disks per RAID5 LUN, and multiple dual-port HBAs for both multipathing and load balancing. For this to work best, you need to check how your SAN storage supports LUN failover.

And none of this will work that great if you have zillions of tiny files.  You need files of a decent size.

CentOS 7 defaults to XFS, which solves all your concerns.
snowdog_2112 (Author) Commented:
Thanks for the replies - you may have answered my question without knowing.

In short - I need *more than 16TB* usable space in a single "directory" - which may contain a single DB larger than 16TB or a "zillion" smaller files.

pitoren: I can create 3 PVs of 15TB (i.e. the storage presents the LUNs to the OS), then a single volume group from the PVs, and a single Logical Volume of 45TB, but I need to format it with XFS.
(The storage is a dual-controller SAN with multipathing to the OS, and the SAN supports up to 16 disks per RAID-5 LUN.)

gheist: you're saying if my OS is CentOS 7,  it will default to XFS for such a LV.

Please confirm - I'll break out points
CentOS7 defaults to XFS for any install. You can still choose ext2 if you think filesystem log is a virtue, or ext4 to stay in stone age.

I think gheist got his ext2 and ext4 mixed up there.

In RHEL7 the 16TB limitation has been removed for ext4, so you can use xfs or ext4 for RHEL7.

On RHEL6 your only option is xfs.

I'd strongly suggest going with xfs - Red Hat supports it up to 500 TB or something. In RHEL7 it's their default filesystem.

Be really careful if you have a single directory with zillions of small files - every time a file is created or deleted the directory inode has to be updated. There's a limit to how fast things can go in that scenario.

good luck

He wants to exceed 50TB, so RHEL7 EXT4 will not suffice.
gheist: That's not written in the post, which starts with

"I have to put together a storage system for 20TB - 50TB data with a CentOS head server."

and in another post the OP suggests he'll use 3x15TB LUNs.

But as I said, I'd suggest using xfs, with the caveats written. You can obviously grow both xfs and ext4, up to their respective limits, if more storage becomes available or is needed.
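
A rough sketch of what growing looks like once a new LUN shows up, again as a dry run that only builds the command list. The device path, VG/LV names and mount point are hypothetical placeholders; the one real difference to remember is that XFS is grown via its mount point while ext4 is resized via the device.

```python
def grow_plan(new_pv, vg, lv, mountpoint, fstype):
    """Build the commands to add a PV to a VG and grow the LV and filesystem.

    Nothing is executed here; this only returns the plan.
    """
    lv_path = f"/dev/{vg}/{lv}"
    cmds = [
        ["pvcreate", new_pv],
        ["vgextend", vg, new_pv],
        # Hand all newly added free extents to the existing LV.
        ["lvextend", "-l", "+100%FREE", lv_path],
    ]
    if fstype == "xfs":
        cmds.append(["xfs_growfs", mountpoint])  # operates on the mounted fs
    elif fstype == "ext4":
        cmds.append(["resize2fs", lv_path])      # operates on the device
    else:
        raise ValueError(f"unsupported filesystem type: {fstype}")
    return cmds


# Hypothetical names for illustration only.
steps = grow_plan("/dev/mapper/mpathd", "vg_data", "lv_data", "/data", "xfs")
for cmd in steps:
    print(" ".join(cmd))
```

Note that XFS can be grown but not shrunk, which is another reason to extend in steps rather than allocate everything up front.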

And think about how you are going to back the data up.  (If you think "I don't need backups" I don't want to be in your shoes if something goes wrong).

Anyway, they need to partition the data. Think about data locality, map-reduce and all that stuff.
snowdog_2112 (Author) Commented:
Thanks for the replies - sorry for my delays.  This has been a back-burner/need-it-now/back-burner case.
