Solved

Can you explain this statement I found regarding RAID0 and ext2/3 block group sizes?

Posted on 2009-04-03
329 Views
Last Modified: 2013-12-15
I've found an article all over the web that describes how to keep an ext2 file system (laid over a two-disk RAID0) from having all of the block group starts land on one disk. However, I can't find any definitive information on how the math works here, or whether it is even still relevant for ext3.

Here is one of the places I've seen it: http://tldp.org/HOWTO/Software-RAID-HOWTO-5.html

Here is the section in question:
-----------------------------------------------------------------------------------------
RAID-0 with ext2

There is more disk activity at the beginning of ext2fs block groups. On a single disk, that does not matter, but it can hurt RAID0, if all block groups happen to begin on the same disk. Example:

With 4k stripe size and 4k block size, each block occupies one stripe. With two disks, the stripe-#disk-product is 2*4k=8k. The default block group size is 32768 blocks, so all block groups start on disk 0, which can easily become a hot spot, thus reducing overall performance. Unfortunately, the block group size can only be set in steps of 8 blocks (32k when using 4k blocks), so you can not avoid the problem by adjusting the block group size with the -g option of mkfs(8).

If you add a disk, the stripe-#disk-product is 12k, so the first block group starts on disk 0, the second block group starts on disk 2 and the third on disk 1. The load caused by disk activity at the block group beginnings spreads over all disks.

In case you can not add a disk, try a stripe size of 32k. The stripe-#disk-product is 64k. Since you can change the block group size in steps of 8 blocks (32k), using a block group size of 32760 solves the problem.

Additionally, the block group boundaries should fall on stripe boundaries. That is no problem in the examples above, but it could easily happen with larger stripe sizes.
----------------------------------------------------------------------------------------------
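Here is my attempt to model the arithmetic in Python (just a sketch of my understanding; the simple round-robin stripe layout is my assumption):

-----------------------------------------------------------------------------------------
def start_disks(stripe_k, block_k, disks, blocks_per_group, groups=4):
    group_k = blocks_per_group * block_k      # size of one block group in KiB
    for g in range(groups):
        offset_k = g * group_k                # offset of this group's first block
        stripe_index = offset_k // stripe_k   # which stripe that offset falls in
        disk = stripe_index % disks           # stripes rotate round-robin over the disks
        print("group %d starts on disk %d" % (g, disk))

start_disks(4, 4, 2, 32768)   # 2 disks, 4k stripe/4k block: every group on disk 0
start_disks(4, 4, 3, 32768)   # 3 disks: group starts rotate 0, 2, 1, 0, ...
start_disks(32, 4, 2, 32760)  # 32k stripe, -g 32760: starts alternate 0, 1, 0, 1
-----------------------------------------------------------------------------------------

The numbers do match what the article claims, and in the last case the 8-block step (32k) is exactly one stripe, so the shrunken group is still a whole number of stripes (4095) and the boundaries stay aligned.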

Still, neither the numbers nor the logic make it clear to me what makes some of the above schemes okay and others not; I've tried various modifications to a methodology that basically divides some form of the block group size by the stripe size or the "stripe-#disk-product". I also don't get: if we're shrinking the block group so that the next one starts on the second disk... how on earth am I supposed to keep block group boundaries aligned with stripe boundaries in general?

TIA for any insight.
Question by:nekatreven
3 Comments
 
LVL 7

Expert Comment

by:computerfixins
ID: 24064536
Er, well, I think the more important question is what size files your system will be using.

You need to determine the average file size in use; when I say "in use" I mean files that are being read from or written to the HDD at any given time.

Typically bigger is better for striping when it comes to read performance. I use around 512k for my RAID5 gamer computer... I/O is around 90MB/s unbuffered, and writes are almost 75% slower (mostly due to my cheapy RAID controller).

The rule of thumb is

average_file_size / (2 × number_of_drives_in_raid)

4024k average file size / (2 × 4 drives) = 503k stripe

Making a stripe too small is always much worse than making it too big... so err on the side of bigger.
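If it helps, here's that rule as a quick Python sketch (the function name is just made up for illustration):

-----------------------------------------------------------------------------------------
def suggested_stripe_k(avg_file_k, drives):
    # rule of thumb: stripe size = average file size / (2 * number of drives)
    return avg_file_k / (2.0 * drives)

print(suggested_stripe_k(4024, 4))   # -> 503.0, matching the example above
-----------------------------------------------------------------------------------------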



 

Author Comment

by:nekatreven
ID: 24064756
Well, I think using RAID5 with an odd number of disks inherently solves the problem for many setups. That covers the cases where the first block group starts on disk 1 and ends (many stripes later) on disk 2... thereby making the next group start on disk 3.

This is actually going to be a RAID50. We have a hardware RAID card exporting two RAID5s that are each several TB. The best stripe size I've found for sequential writes on them is 64k (which gives ~120Mbyte/s, depending on the stripe chosen for the RAID0 over them). Reads are always over 200Mbyte/s, so I'm mainly looking to optimize writes.

AFAIK ext2 and ext3 both hit the first part of a block group frequently when determining where a file is located within that group. With the RAID0, since a block group spans an even number of stripes across the two members, it seems like block groups will always start on the first RAID5 and end on the second, which would in fact cause uneven access between the RAID0 members. I'm sure this is only causing a marginal loss in performance (if any at all)... but at this point it is as much about me wanting to know the answer as it is about optimizing performance.

In the example from the article, I can see how reducing the block group size by 8 blocks causes the next group to start just before the end of the second RAID5 (many stripes later than the group started)... but eventually you will have block groups that start and stop halfway through a stripe (and obviously that means you'll also have block groups spanning stripe boundaries). With that in mind... I do NOT understand how the article can then turn around and say not to let stripe and block group boundaries fall out of alignment.
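Here's what I mean, with my 64k stripe (a Python sketch; the 32760 figure is the article's adjusted group size):

-----------------------------------------------------------------------------------------
stripe_k, block_k, blocks_per_group = 64, 4, 32760
group_k = blocks_per_group * block_k          # 131040k per block group
for g in range(4):
    offset_k = g * group_k                    # where group g begins
    print("group %d: offset %dk, on a stripe boundary: %s"
          % (g, offset_k, offset_k % stripe_k == 0))
# -> groups 1 and 3 begin 32k into a stripe, i.e. misaligned
-----------------------------------------------------------------------------------------

The 8-block (32k) step is only half of a 64k stripe, so every other group boundary lands mid-stripe.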
 

Accepted Solution

by:
nekatreven earned 0 total points
ID: 24100570
I have found the solution. Regardless of how this RAID/ext wisdom was logically validated and applied in the past, I have discovered that it is outdated. It now bothers me that so many RAID articles mention it, because from what I can find it no longer applies at all.

From the mke2fs(8) man page (found at http://linux.die.net/man/8/mke2fs):
-----------------------------------------------------------------------------------------------------
-g blocks-per-group: Specify the number of blocks in a block group. There is generally no reason for the user to ever set this parameter, as the default is optimal for the filesystem. (For administrators who are creating filesystems on RAID arrays, it is preferable to use the stride RAID parameter as part of the -R option rather than manipulating the number of blocks per group.) This option is generally used by developers who are developing test cases.
-----------------------------------------------------------------------------------------------------

So according to this, the stride option takes the original issue into account and resolves it internally, which removes the need for an administrator to consider or modify how the block groups are laid out. I was already using the stride option in its intended capacity... so I did not have to make any changes.
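For the record: assuming a 64k chunk on the array the filesystem sits on and 4k blocks, the stride works out to 64k / 4k = 16 filesystem blocks, so the invocation looks something like this (the device name is just an example; this uses the -R form the man page above describes):

-----------------------------------------------------------------------------------------
mke2fs -j -b 4096 -R stride=16 /dev/md0
-----------------------------------------------------------------------------------------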

I STILL do not understand how the original logic worked without knocking the stripes and block groups out of alignment...but I am sure glad I don't have to care!
