Storage Array Stripe Size, ZFS Recordsize, Oracle Block Size

G-N
Good Afternoon Everyone,
I am trying to determine the best way forward for provisioning a Sun T5240 server running guest LDoms, a Sun StorageTek 6140 array, and Oracle 10.2.0.4.
Background: We will commission new LDom server environments that will host Oracle 10g database instances – the current db_block_size for the separate individual databases is either 8K or 16K
We will use the StorageTek 6140 array for all space requirements
Database sizes range from 200MB to 800MB
The read/write ratio varies; I would assume 50/50 at this time

My Questions Are:
1.      What is the best RAID level to use on the 6140 (RAID 1 / RAID 5)?
2.      Should we instead let ZFS manage the RAID – present the whole array as a single LUN to the Control Domain, allocate that LUN to the Guest, and then use ZFS to create and manage the RAID level?
3.      When we create a logical drive on the 6140, what is the best stripe size, given the ZFS recordsize recommendations below?

4.      My main concern here is how to provision the stripe size on the underlying disks on the array before presenting them to the Control Domain

Match the Oracle Solaris ZFS record size to the Oracle database block size - The general rule is to set recordsize = db_block_size for the file system that contains the Oracle data files.
When the db_block_size is less than the page size of the server (8 KB on SPARC systems, 4 KB on x64 systems), set the record size to the page size. On SPARC systems, typical record sizes for large databases are as follows:

File System      Record Size
Table data       8 KB
Redo logs        128 KB (default)
Index files      8 KB
Undo data        128 KB (default) or sometimes 8 KB
Temp data        128 KB (default)
Archive data     128 KB (default), compression on
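
To make this concrete, the following is a minimal sketch of how those recordsize recommendations could be applied – the pool and file system names (orapool, oradata, etc.) are my own placeholders:

# One file system per data class; set recordsize at creation time,
# since recordsize only applies to files written after it is set.
zfs create -o recordsize=8k orapool/oradata     # table and index data
zfs create -o recordsize=8k orapool/oraundo     # undo (or leave at 128K default)
zfs create orapool/oraredo                      # redo logs, 128K default
zfs create orapool/oratemp                      # temp, 128K default
zfs create -o compression=on orapool/oraarch    # archive logs, compressed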
 
Many thanks and I look forward to your replies

David, President
Top Expert 2010
Commented:
You really need to go ZFS and get a JBOD array. This will provide significantly better performance, hot snapshots, RAIDZ2 (like RAID 6, but better), and significant flexibility when it comes to provisioning and reprovisioning.

Also, if you are running OpenSolaris, you have built-in dedup along with compression. ZFS will be much more efficient, as it caches I/Os better and distributes I/O requests more evenly.
If you want an answer specific to the STK, then throw in specifics on the disk drives. Performance characteristics vary considerably depending on drive type.
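
For illustration, a minimal sketch of such a pool on a JBOD – the cXtYdZ device names are hypothetical:

# Double-parity RAIDZ2 pool from six whole disks, plus two hot spares.
zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0
zpool add tank spare c1t6d0 c1t7d0

# Compression works on any recent ZFS; dedup needs a late
# OpenSolaris build (zpool version 21 or newer).
zfs set compression=on tank
zfs set dedup=on tank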
Craig Sharp, Lead Engineer - Unix Server Team
Commented:
What I do is present whole LUNs to the server and build ZFS on top. I let the array (EMC in our case) handle all the RAID configuration. The ZFS file systems are just standard. Each LUN on the back end is comprised of either RAID 5 or RAID 10. We use RAID 5 for the Oracle data areas and RAID 10 for the redo logs.

As far as the ZFS recordsize goes, I use 128K for logs and hot backups and 8K for Oracle data areas. This is per the Oracle ZFS best practices.
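
A minimal sketch of that layout, assuming the array presents two LUNs as c2t0d0 (RAID 5) and c2t1d0 (RAID 10) – the device and pool names are placeholders:

# One pool per array LUN; the array handles the RAID underneath.
zpool create datapool c2t0d0        # RAID 5 LUN: Oracle data areas
zpool create redopool c2t1d0        # RAID 10 LUN: redo logs

zfs create -o recordsize=8k datapool/oradata    # matches db_block_size
zfs create -o recordsize=128k redopool/logs     # redo logs / hot backups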

I would have to research the stripe size question further, as EMC is quite ambiguous about how they calculate this value.

Hope this helps.

Craig
David, President
Top Expert 2010

Commented:
Well, if EMC is doing the RAID work, then you're really going to have to take it up with them. Since their storage is essentially a black box, you don't really know enough to do proper tuning. The best you can do is benchmark. At least with ZFS, you can easily try different permutations.
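
As a rough example of trying permutations – the file system name is a placeholder, and dd is only a crude stand-in for a real Oracle workload:

# Time a 1 GB sequential write at several recordsizes. recordsize
# only affects newly written files, so remove the test file each run.
for rs in 8k 16k 32k 64k 128k; do
    zfs set recordsize=$rs datapool/oradata
    rm -f /datapool/oradata/testfile
    time dd if=/dev/zero of=/datapool/oradata/testfile bs=1024k count=1024
done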

Craig Sharp, Lead Engineer - Unix Server Team

Commented:
He is running the Oracle/Sun array, not EMC. Just a side note: the new EMC Celerra arrays are fully open. It's nice to be able to see right down to the disk level of the RAID groups, etc. I digress....

Not knowing the Sun array: is it capable of calculating the stripe size that is optimal for the disk size, RAID level, and block size of the data being written? If so, then let the array make the stripe calculation. If not, then since this is a Sun array, I would defer to Sun to advise you on proper sizing.
David, President
Top Expert 2010

Commented:
What is the model of the Sun array? They OEM quite a few products based on several different controller engines.
G-N

Author

Commented:
Many thanks to those who have responded. As I am new to this website, I am trying to add comments to the questions asked - I hope I am not in fact closing this open discussion.

We have already purchased the StorageTek 6140 Array and will need to utilize it.
The array comes fully populated with 16 x 300GB 15K 3.5" disks; 2 disks will be used as spares and the rest for RAID, possibly 5 and 10.
There are 2 controllers managing the disks, each with 1 GB of cache.

I feel that by getting right down to the disk layer and the stripe size unit you can better understand how all the elements of a running system fit together, and I think the array disk layer is very important in the decision-making process

We have the ability to set up and configure the stripe size of any RAID configuration on the 6140 - you can go as far as creating your own profile to accommodate your required settings and sizes, number of disks, and so on

Does anyone know when Oracle flushes to disk and, if it does, what the size "trigger" is - is this a tunable Oracle parameter? This will help in deciding on the underlying stripe size
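
In the meantime, here is how I plan to inspect the write-related parameters from sqlplus – whether these are the right parameters to be looking at is exactly what I am asking:

# Run as a DBA user; the parameter list below is my guess at
# what is relevant, not an exhaustive set.
sqlplus -s / as sysdba <<'EOF'
show parameter log_buffer
show parameter db_writer_processes
show parameter fast_start_mttr_target
show parameter filesystemio_options
EOF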

Thanks
G-N

Author

Commented:
I would like to emphasize Craig’s comments:

“Each LUN on the back end is comprised of either RAID 5 or RAID 10. We use RAID 5 for the Oracle data areas and RAID 10 for the redo logs.
As far as the ZFS recordsize goes, I use 128K for logs and hot backups and 8K for Oracle data areas. This is per the Oracle ZFS best practices.”

We will look to implement a configuration similar to the one Craig describes above – the differences being:
1.      Possibly use RAID 10 for all data and storage requirements
2.      A separate redo pool, possibly consisting of a single mirrored disk
3.      A separate archive pool, possibly consisting of a single mirrored disk, with the ZFS compression option set (see the sketch below)
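
A rough sketch of that pool layout – the device names are hypothetical, and whether a single mirrored pair gives enough spindles for redo is exactly the sort of thing to benchmark:

# Main pool: striped mirrors (RAID 10 equivalent) across eight disks.
zpool create datapool mirror c3t0d0 c3t1d0 mirror c3t2d0 c3t3d0 \
    mirror c3t4d0 c3t5d0 mirror c3t6d0 c3t7d0

# Dedicated single-mirror pools for redo and archive, compression on archive.
zpool create redopool mirror c3t8d0 c3t9d0
zpool create archpool mirror c3t10d0 c3t11d0
zfs set compression=on archpool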

The table quoted in my original question identifies the best values for Oracle block sizes running on Solaris – BUT I am still struggling to find any useful information on the underlying disk LUN stripe unit size

Can someone clarify the difference between stripe size and segment size?

As discussed yesterday, we can create a profile with the desired stripe sizes and segment sizes


Look forward to some more responses – thanks to those who have responded
<trollmode>
Why waste time optimizing an 800MB database instead of buying 1GB of RAM so all the data stays in memory (some SQL run at database startup could full-scan all the tables to warm the cache)?
</trollmode>

1/ Best RAID level
RAID 1/10 for any area expecting some volume of random writes.
RAID 5 for backup areas only (otherwise the RAID 5 write penalty has too great an impact on your storage performance).

2/ ZFS?
Why not? Go for it if your operations team knows it well enough!

3/4/ Stripe size
Using RAID 1/10 means you don't really need to care about the underlying stripe size.
Of course, aligning your partitions to a multiple of the stripe size is a cheap win, but I would rather use a LARGE stripe size (512KB, for example) on a RAID 1/10 storage subsystem, because it slightly decreases the IOPS capacity but really increases the sequential throughput capability.

Regarding RAID 5/6/50/60: you should not use it for anything other than backup or write-once-read-many usage, but if you do, you should match your stripe size to your client I/O size so that your large battery-backed HBA cache can optimize the writes into full stripes.
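
To illustrate the arithmetic (example figures only, not a recommendation): on segment-based arrays like the 6140, the segment size is what lands on one disk before the controller moves to the next, and the full stripe is the segment size times the number of data disks.

# Full-stripe width = segment size x number of data disks.
# Example: RAID 5 over 6 data disks + 1 parity, 128 KB segments.
SEGMENT_KB=128
DATA_DISKS=6
echo "full stripe = $((SEGMENT_KB * DATA_DISKS)) KB"    # prints 768 KB

# A client write of exactly one full stripe lets the controller
# compute parity without first reading the old data and parity.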
G-N

Author

Commented:
The question has not had many responses.

We have decided on a design.

Thanks to those who did respond
Hi, would you be kind enough to share the design with us?
