Can you explain this statement I found regarding RAID0 and ext2/3 block group sizes?

Posted on 2009-04-03
Last Modified: 2013-12-15
I've found this article all over the web that describes how to keep an ext2 file system (laid over a two disk RAID0) from having all of the starts of the block groups on one disk. However, I can't find any definitive information as to how the math works here or whether it is even still relevant for ext3.

Here is one of the places I've seen it:

Here is the section in question:
RAID-0 with ext2

There is more disk activity at the beginning of ext2fs block groups. On a single disk, that does not matter, but it can hurt RAID0, if all block groups happen to begin on the same disk. Example:

With 4k stripe size and 4k block size, each block occupies one stripe. With two disks, the stripe-#disk-product is 2*4k=8k. The default block group size is 32768 blocks, so all block groups start on disk 0, which can easily become a hot spot, thus reducing overall performance. Unfortunately, the block group size can only be set in steps of 8 blocks (32k when using 4k blocks), so you can not avoid the problem by adjusting the block group size with the -g option of mkfs(8).

If you add a disk, the stripe-#disk-product is 12, so the first block group starts on disk 0, the second block group starts on disk 2 and the third on disk 1. The load caused by disk activity at the block group beginnings spreads over all disks.

In case you can not add a disk, try a stripe size of 32k. The stripe-#disk-product is 64k. Since you can change the block group size in steps of 8 blocks (32k), using a block group size of 32760 solves the problem.

Additionally, the block group boundaries should fall on stripe boundaries. That is no problem in the examples above, but it could easily happen with larger stripe sizes.
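The arithmetic in the quoted article can be sketched numerically (a rough check with the values the article uses, assuming 4k filesystem blocks throughout; the code itself is not from the article):

```python
# Which disk does each ext2 block group start on, for a given RAID0 layout?
# Group g starts at block g * blocks_per_group; convert that to a byte
# offset, find which stripe it falls in, and take that modulo #disks.
def group_start_disk(g, blocks_per_group, block_size, stripe_size, ndisks):
    byte_offset = g * blocks_per_group * block_size
    stripe_index = byte_offset // stripe_size
    return stripe_index % ndisks

# First example: 4k stripe, 4k block, 2 disks, default 32768-block groups.
# Every group start lands on disk 0 (the hot spot the article describes).
print([group_start_disk(g, 32768, 4096, 4096, 2) for g in range(4)])   # [0, 0, 0, 0]

# Add a third disk: group starts rotate over disks 0, 2, 1, ...
print([group_start_disk(g, 32768, 4096, 4096, 3) for g in range(4)])   # [0, 2, 1, 0]

# Two disks, 32k stripe, 32760-block groups: starts alternate between disks.
print([group_start_disk(g, 32760, 4096, 32768, 2) for g in range(4)])  # [0, 1, 0, 1]

# Alignment check for that last case: 32760 blocks of 4k is exactly 4095
# whole 32k stripes, so group boundaries still fall on stripe boundaries.
print((32760 * 4096) % 32768)  # 0
```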

I've tried various modifications to a methodology that basically divides some form of the block group size by the stripe size or the "stripe-#disk product". Both numerically and logically, though, I can't seem to grasp what makes some of the above schemes okay and some not. I also don't get: if we're shrinking the block group to get the next one to start on the second disk, how on earth am I supposed to keep block group boundaries aligned with stripe boundaries?

TIA for any insight.
Question by:nekatreven

Expert Comment

ID: 24064536
Er, well, I think what size files your system will be using is more important.

You need to determine the average file size in use; by "used" I mean files that are being read from or written to the disk at any given time.

Typically bigger is better for striping on read performance; I use around 512k for my RAID5 gamer computer. I/O is around 90MB/s non-buffered, and writes are almost 75% slower (mostly due to my cheap RAID controller).

The rule of thumb is:

file_size_average / (2 × number_of_drives_in_raid)

4024k file size / (2 × 4 drives) = 503k

Making a stripe too small is always much worse than making it too big, so err on the side of bigger.
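The rule of thumb above is just arithmetic, and can be checked as such (using the numbers in the example):

```python
# Rule-of-thumb stripe size: average file size / (2 * number of drives).
def stripe_size_kb(avg_file_kb, ndrives):
    return avg_file_kb / (2 * ndrives)

print(stripe_size_kb(4024, 4))  # 503.0
```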


Author Comment

ID: 24064756
Well I think using raid5 with an odd number of disks inherently solves the problem for many setups. This is in cases where the first block group starts on disk 1 and ends (many stripes later) on disk 2...thereby making the next group start on disk 3.

This is actually going to be a raid50. We have a hardware raid card exporting 2 raid5s that are each several TB. The best stripe size I've found for sequential writes on them is 64k (which gives ~120Mbyte/s, depending on the chosen stripe for the raid0 over them.) The read is always over 200Mbyte/s so I'm mainly looking to optimize writes.

AFAIK ext2 and ext3 both access the first part of each block group frequently when determining where a file is located within that group. With the raid0, the stripes will be evenly distributed across each block group, but it seems like block groups will always start on the first raid5 and end on the second, which would in fact cause uneven access between the raid0 members. I'm sure this is only causing a marginal loss in performance (if any at all)...but at this point it is as much about me wanting to know the answer as it is about optimizing performance.
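The layout I'm worried about can be sketched with my own numbers (assuming 4k ext3 blocks, a 64k raid0 stripe over the two raid5 legs, and default 32768-block groups; the code is just an illustration):

```python
# With 4k blocks, a 64k RAID0 stripe over 2 legs, and the default
# 32768-block groups: which leg does each block group start on?
block_size, stripe, legs, bpg = 4096, 65536, 2, 32768
starts = [((g * bpg * block_size) // stripe) % legs for g in range(6)]
print(starts)  # [0, 0, 0, 0, 0, 0] -- every group starts on the first leg
```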

In the example from the article, I can see how reducing the block group size by 8 blocks would cause the next group to start just before the end of the second raid5 (many stripes later than the group started)...but then you'll have block groups that start and stop halfway through a stripe (and obviously that means you'll also have block groups spanning stripe boundaries). With that in mind...I do NOT understand how the article can then turn around and say not to let stripe and block group boundaries fall out of alignment.

Accepted Solution

nekatreven earned 0 total points
ID: 24100570
I have found the solution. Regardless of how this RAID/ext wisdom was logically validated and applied in the past, I have discovered that it is outdated. It now bothers me that so many RAID articles still mention it, because from what I can find it no longer applies at all.

From the mke2fs(8) man page:

-g blocks-per-group: Specify the number of blocks in a block group. There is generally no reason for the user to ever set this parameter, as the default is optimal for the filesystem. (For administrators who are creating filesystems on RAID arrays, it is preferable to use the stride RAID parameter as part of the -R option rather than manipulating the number of blocks per group.) This option is generally used by developers who are developing test cases.

So according to this, the stride option takes the original issue into account and resolves it internally, which removes the need for an administrator to consider or modify how the block groups are laid out. I was already using the stride option correctly, so I did not have to make any changes.
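For reference, the stride value is just the RAID chunk size divided by the filesystem block size. A small sketch with assumed numbers (a 64k chunk and two raid0 members, not necessarily my exact setup; newer mke2fs versions take these via -E stride/stripe_width, while the man page excerpt above refers to the older -R option):

```python
# Compute mke2fs stride and stripe-width for a RAID0 of two members
# with a 64k chunk size and 4k filesystem blocks (assumed values).
chunk_kb, block_kb, data_members = 64, 4, 2
stride = chunk_kb // block_kb          # fs blocks per RAID chunk
stripe_width = stride * data_members   # fs blocks per full stripe
print(stride, stripe_width)  # 16 32
# Roughly the resulting invocation (device name is a placeholder):
print(f"mke2fs -b 4096 -E stride={stride},stripe_width={stripe_width} /dev/md0")
```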

I STILL do not understand how the original logic worked without knocking the stripes and block groups out of alignment...but I am sure glad I don't have to care!
