Storage 102: more concepts of the IT enterprise storage

By Carlos Ijalba, Senior DevOps Engineer
This article is an update and follow-up to my previous article: Storage 101: common concepts in the IT enterprise storage.

This time, I expand on more frequently used storage concepts.
The purpose of this article series is to describe acronyms and abbreviations commonly used in the storage marketplace, so we can better understand these technologies.

Some of the descriptions are just basic explanations of the terms (similar to a dictionary entry), but others are a bit more detailed (like FPGAs as Storage Controllers, or SSD is Flash but Flash is not always SSD).
 
Table of Contents
SDDC - Software Defined Data Center
SDS - Software Defined Storage
JBOD
Storage Pools
RAID groups
VDisk
MDisk
Logical Volume Manager (LVM)
SAN Installation Considerations
SAN Requirements
Thin Provisioning
Thick Provisioning
Storage Oversubscription
Storage Overcommitment
Storage Over Provisioning 
SSD is Flash, but Flash is not always SSD...
FPGAs as Storage Controllers
FC vs iSCSI

SDDC

SDDC or Software Defined Data Center is the term for a fully virtualized Data Center: Hardware (Servers), Networking (Switches, Firewalls, Routers), and Storage (JBOD, SAN and NAS).

The SDDC is where the future of IT is heading, thanks to its better use of resources, its granularity, and its unification of different platforms and vendors.

To better understand it, we can say that it will bring to storage what VMware, Hyper-V and KVM have brought to the server and OS space.

SDS

SDS or Software Defined Storage is the term used to designate software-based storage virtualizers.

SDS creates a SAN by virtualizing the storage found in servers or storage cabinets (SAN and NAS), and manages it all as a single pool of storage, easing administration and maximizing storage resources.

Examples of SDS products are VMware's vSAN, EMC ViPR & ScaleIO, IBM Spectrum Scale and Microsoft's Windows Storage Server.
  • vSAN has its own definition in this document, but it permits creating a virtual SAN with local storage from the ESXi hosts.
  • ViPR unifies storage from server JBODs and from storage cabinets by EMC, HDS, IBM, DELL, Oracle, Solidfire and NetApp.
  • Spectrum Scale uses GPFS to unify storage from NFS, SMB, HDFS, S3, and JBOD, creating a single storage entity out of all the organization's storage, both local and cloud based.
  • Windows Storage Server lets us create a NAS from JBODs and other NAS.

JBOD

JBOD is an acronym for Just a Bunch Of Disks. It usually refers to disks that are local to a server and haven't been configured with RAID protection, but today we can also find dedicated JBOD enclosures, used to build SDS and NAS appliances (see the SDS definition above).

Storage Pools

At the core of the storage subsystems there are disks (SAS, NL-SAS, SSD), but each disk by itself is just a hardware piece, like a bolt. When disks are grouped together, they become part of a disk set, which is known by many names depending on the vendor: Storage Pools (IBM, EMC) or Disk Aggregates (NetApp). It is usually a logical disk group, which can then be divided into RAID sets.

Generally speaking, in the storage world, a storage pool is simply an aggregation of storage devices: disks, tapes, or files. For example, a storage cabinet can use SSDs, SAS disks and NL-SAS disks to form a storage pool, or you might separate them into three different storage pools for performance reasons. Under Tivoli Storage Manager or Spectrum Protect, you might have one storage pool formed by disk, another by tape and another by WORM discs, and migrate data between them to accommodate different retention periods or audits (LOPD, SOX, internal security audits, etc).
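
To make the idea concrete, here is a toy Python sketch (all device names and sizes are invented for illustration) that groups individual drives into separate pools by media type, as in the SSD/SAS/NL-SAS example above:

```python
# Toy illustration: grouping individual drives into pools by media type (names are made up).
from collections import defaultdict

drives = [
    {"id": "d01", "type": "SSD",    "size_gb": 800},
    {"id": "d02", "type": "SSD",    "size_gb": 800},
    {"id": "d03", "type": "SAS",    "size_gb": 900},
    {"id": "d04", "type": "SAS",    "size_gb": 900},
    {"id": "d05", "type": "NL-SAS", "size_gb": 4000},
    {"id": "d06", "type": "NL-SAS", "size_gb": 4000},
]

pools = defaultdict(list)
for drive in drives:
    pools[drive["type"]].append(drive)       # one pool per media type, as in the example above

for media, members in pools.items():
    total = sum(d["size_gb"] for d in members)
    print(f"{media:7s} pool: {len(members)} drives, {total} GB raw")
```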

RAID groups

RAID, or redundant array of inexpensive disks, is a technology that delivers storage virtualization over a group of disks. The idea is to group several disks together to present one or more virtual disks, which can be bigger than the original disks, and to offer benefits like recovery from disk failure, read speed, write speed, expansion, reduction, backup and replication, or cloning.

Today there are nine standard RAID algorithms or RAID sets: RAID 0 to RAID 6, plus the combinations of RAID 0 and 1 known as RAID 10 and RAID 0+1. There are also RAID sets modified by manufacturers and vendors for special purposes, or to offer advanced or extended functionality, like NetApp's RAID-DP, IBM's VSR, Linux MD RAID10, or LSI's RAID5E and RAID6E. To better understand the added benefits of these non-standard RAID sets, refer to each vendor's documentation.
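
To make those trade-offs tangible, here is a minimal Python sketch (an illustrative helper, not any vendor's tool) that estimates usable capacity and fault tolerance for the standard RAID levels just mentioned:

```python
# Illustrative RAID maths only (not a vendor tool): rough usable capacity and
# number of disk failures tolerated for the standard RAID levels.

def raid_usable(level: str, disks: int, disk_gb: float):
    """Return (usable capacity in GB, disk failures tolerated)."""
    if level == "RAID0":                    # pure striping: fast, no redundancy
        return disks * disk_gb, 0
    if level == "RAID1":                    # n-way mirror: capacity of a single disk
        return disk_gb, disks - 1
    if level == "RAID5":                    # striping with single distributed parity
        return (disks - 1) * disk_gb, 1
    if level == "RAID6":                    # striping with double distributed parity
        return (disks - 2) * disk_gb, 2
    if level == "RAID10":                   # mirrored pairs, then striped
        return disks * disk_gb / 2, 1       # survives at least 1; more if failures hit different pairs
    raise ValueError(f"unsupported RAID level: {level}")

for level in ("RAID0", "RAID1", "RAID5", "RAID6", "RAID10"):
    usable, tolerated = raid_usable(level, disks=8, disk_gb=900)
    print(f"{level:6} 8 x 900 GB -> {usable:6.0f} GB usable, tolerates {tolerated} failure(s)")
```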

VDisk 

Virtual Disk: Names given by various appliances to RAID Volumes or LUNs. Used by Oracle, Sun, StorageTek, IBM, and LSI.

MDisk

Managed Disk: Name given by various appliances to RAID Volumes or LUNs. Used by IBM Storage.

Logical Volume Manager (LVM)

UNIX/Linux storage layer that provides storage virtualization. It permits managing filesystems over multiple disks, and increasing or decreasing a filesystem (on-the-fly or cold, depending on the LVM version and/or implementation). LVMs can also provide protection features such as mirroring, encryption and snapshots. The feature set present in an LVM depends on the OS release and version.
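
As a rough illustration of a typical Linux LVM2 workflow (create a physical volume, a volume group and a logical volume, then grow it on-the-fly), the hypothetical Python outline below drives the standard LVM commands via subprocess. The device and volume names are made up, and the exact behaviour depends on your distribution and LVM version, so treat this as a sketch rather than a ready-to-run script:

```python
# Hypothetical outline of a common Linux LVM2 workflow driven from Python.
# Device/volume names are examples only; run as root and adapt to your system.
import subprocess

def run(*cmd: str) -> None:
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

run("pvcreate", "/dev/sdb")                               # mark the disk as an LVM physical volume
run("vgcreate", "vg_data", "/dev/sdb")                    # create a volume group on top of it
run("lvcreate", "-L", "50G", "-n", "lv_app", "vg_data")   # carve out a 50 GB logical volume
run("mkfs.ext4", "/dev/vg_data/lv_app")                   # put a filesystem on the LV
# Later, grow the LV and the filesystem on-the-fly (ext4 supports online growth):
run("lvextend", "-L", "+10G", "/dev/vg_data/lv_app")
run("resize2fs", "/dev/vg_data/lv_app")
```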

SAN Installation Considerations

Installing a SAN requires attention to detail and careful planning; it's not something that can be done "on the spur of the moment". A project plan to take into account hardware, software, storage, and in some cases even applications must be prepared in advance, to guarantee its success.

The most important part is to study and meet the vendors' requirements and component compatibility, to save yourself from unwanted surprises.

SAN Requirements

To integrate all components of a SAN seamlessly, the vendors' hardware and software compatibility requirements must be met, which include the following:
  • HBAs (firmware version, driver version, and latest patches)
  • Switch/es (firmware, FCPs, port licenses, dual PSUs or UPS protection)
  • Fiber cables (the correct type to be used: single-mode or multimode, and the matching transceivers: LX, SX)
  • Storage (firmware, management software, and latest patches)

And the most important part: draw up a map (Visio, Dia, Draw, etc) in advance with the connections, so it's easier to implement, cable and label at installation time, and so it helps produce the final documentation of the infrastructure.
    
As with any project, remember the six Ps rule: Proper Planning and Preparation Prevents Pretty-Poor Performance.

Thin Provisioning

Thin Provisioning is a technology that offers virtualized storage to a host on demand from a storage pool. That is, a storage cabinet might offer a LUN of 100 GB of disk space to a server, but will only allocate blocks as the server uses them; if the server only uses 40 GB, the server "thinks" it has 60 GB of free space, and that unallocated space is what the storage cabinet can use for other LUNs.

Thin provisioning is faster when allocating a new volume to a server, since it creates an empty volume; once the server starts using the volume, blocks get allocated and formatted in real time, which causes a small performance hit.

Thin provisioning is a great way to save space, since most OS disks are rarely used at 100%. However, if you do have space problems and most of your local OS disks are nearly full, avoid using thin provisioning: if one of the volumes goes AWOL (an application problem creating huge logs, a DB which cannot archive transaction logs, etc.) and fills up, it might affect other thin volumes on the storage pool. If the pool becomes full, the whole storage pool will become read-only (for safety and integrity reasons), perhaps stopping all the VMs using the storage pool.

So, it IS a great technology, but one with a serious impact on availability if not monitored closely.
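
To illustrate the mechanics described above, here is a toy Python model (purely illustrative, not how any real array is implemented) of a thin pool: LUNs advertise their full size, blocks are only consumed as they are written, and the pool can fill up even though every LUN still "sees" free space:

```python
# Toy model of thin provisioning (illustrative only; not a real array implementation).

class ThinPool:
    def __init__(self, physical_gb: int):
        self.physical_gb = physical_gb
        self.used_gb = 0
        self.luns: dict[str, dict] = {}

    def create_lun(self, name: str, advertised_gb: int) -> None:
        # Creation is instant: nothing is allocated yet.
        self.luns[name] = {"advertised": advertised_gb, "written": 0}

    def write(self, name: str, gb: int) -> None:
        lun = self.luns[name]
        if lun["written"] + gb > lun["advertised"]:
            raise ValueError("LUN is full from the host's point of view")
        if self.used_gb + gb > self.physical_gb:
            # Real arrays typically switch the pool to read-only here to protect data integrity.
            raise RuntimeError("POOL EXHAUSTED: all thin LUNs in this pool are affected")
        lun["written"] += gb
        self.used_gb += gb

pool = ThinPool(physical_gb=100)
pool.create_lun("vm1", advertised_gb=100)   # 100 GB promised...
pool.create_lun("vm2", advertised_gb=100)   # ...and another 100 GB promised: oversubscribed 2:1
pool.write("vm1", 40)
pool.write("vm2", 70)                       # raises RuntimeError: only 100 GB really exists
```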

Thick Provisioning

Thick Provisioning is the name given to a not-thin-provisioned volume (basically what we used to call a LUN before Thin Provisioning technology). A thick provisioned LUN is fully assigned and formatted at creation time, which takes longer, but once the server starts using it, it's faster than a thin volume, since it is already prepared.

Storage Over Subscription

Storage Over Subscription is the condition reached after giving more virtual storage than there is physically available. It is closely related to thin provisioning.

Thin provisioning is capable of presenting volumes bigger than they really are, and storage over subscription is the condition reached when we have actually promised more storage than is available.
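
As a quick numeric illustration (the figures are invented for the example), the oversubscription ratio is simply the total advertised capacity divided by the physical capacity behind it:

```python
# Example oversubscription calculation with made-up numbers.
advertised_lun_sizes_gb = [500, 500, 250, 250]   # what the hosts have been promised
physical_capacity_gb = 1000                      # what the pool actually contains

oversubscription_ratio = sum(advertised_lun_sizes_gb) / physical_capacity_gb
print(f"Oversubscription ratio: {oversubscription_ratio:.1f}:1")   # 1.5:1 in this example
```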

NOTE: VMware also calls Storage Over Subscription "Storage Over Provisioning" (explained in another point of this article), but for SSD manufacturers they are quite different storage concepts.

Storage Over Commitment

For some providers (like VMware) Storage Over Commitment is the same concept as Storage Over Subscription; for others over commitment is the condition reached when storage has been over subscribed and it has actually been used (hence committed). In any case, both concepts can be considered synonyms.

Storage Over Provisioning

Storage Over Provisioning is a technique that sets aside extra free space on an SSD device, so the controller can later use it to increase SSD performance and lifespan: the reserved portion of the SSD serves as a buffer for operations and as spare capacity to replace damaged cells within the SSD.

Basically, enabling Over Provisioning on an SSD will cut the total SSD capacity by a percentage (usually around 10%), dedicating it to accelerating operations and improving reliability, so it is a recommended best practice on SSD devices (usually on single SSD devices, as All-Flash storage cabinets might have other technologies that achieve the same result).
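
The arithmetic is simple; here is a minimal example (the percentage and drive size are illustrative, so check your drive's documentation for the real figures):

```python
# Illustrative over-provisioning arithmetic for a single SSD (numbers are examples only).
raw_capacity_gb = 512
over_provisioning_pct = 10   # a commonly quoted figure; varies per drive and vendor

reserved_gb = raw_capacity_gb * over_provisioning_pct / 100
usable_gb = raw_capacity_gb - reserved_gb
print(f"{raw_capacity_gb} GB SSD with {over_provisioning_pct}% OP -> "
      f"{usable_gb:.0f} GB usable, {reserved_gb:.0f} GB reserved for the controller")
```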

SSD is Flash, but Flash is not always SSD...

SSD disks are built from MLC or SLC flash memory chips (solid-state chips) packed in a hard drive's case, so traditional HD cabinets can be upgraded by replacing HDs with SSDs. The performance improvement when replacing HDs with SSDs is BIG. However, SSDs inherit the HD connection bus (SATA, NL-SAS, etc), which puts a bottleneck on the newer, faster memory chips.

To overcome this bottleneck, other ways to deliver solid-state storage at faster speeds have appeared.

PCIe IOPS accelerator cards pack solid-state chips onto a PCIe card, taking advantage of the faster PCIe bus to deliver internal storage for accelerating IOPS-intensive applications like databases and business intelligence cubes.

All-flash storage cabinets use solid-state chips placed on modules (similar to RAM DIMMs), which might also pack additional hardware controllers to deliver RAID protection at chip level rather than SSD level. These modules are then plugged into a purpose-built bus with higher throughput and lower latency than existing all-purpose I/O buses.

Now the performance improvement when replacing HDs with solid-state systems on a purpose-built bus is HUGE.

FPGAs as Storage Controllers

One performance improvement introduced by hardware vendors is replacing CPUs and storage controllers with FPGAs (field-programmable gate arrays). This kind of processor is designed for a small number of operations instead of the hundreds of instructions supported by a CPU, but those operations are optimized to perform massively parallel processing in a single clock cycle, outperforming CPUs by orders of magnitude.

FPGAs can therefore run at lower clock speeds and consume a lot less power than CPUs, yet perform the tasks they were programmed for much more efficiently than a general-purpose CPU doing the same work. Companies that employ FPGAs for storage acceleration include Xilinx, IBM, and Altera (part of the Intel Group).

FC vs iSCSI

The fastest connection currently available for storage is 16Gb FC (there's also 40Gb fiber, but this is only used for Ethernet switch interconnects). It is still quite expensive, so 8Gb FC is more widely used as of today, but new implementations will usually go for 16Gb. 32Gb FC (32GFC) should come out during 2016. Fiber is used for speed and distance, as there's nothing faster when travelling over a cable.

VSAN Note: In VMware ESXi's case, VSAN can use PCIe SSD cards connected directly to the host through an internal bus, so this should be faster than FC communications, and without HBA overhead (so this can actually be faster than using an externally attached SAN).

10Gb Ethernet is comparable to 8Gb FC in speed (this is because Ethernet carries iSCSI over TCP, with packet fragmentation, retransmissions and error checking, whereas FC uses the leaner FCP protocol), but iSCSI is cheaper, you can always use multiple connections or LACP, and it's a lot easier to install and administer than FCP.
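
To put rough numbers behind that comparison, the sketch below converts nominal line rates into approximate payload bandwidth using each generation's line encoding (8b/10b for 8Gb FC, 64b/66b for 16Gb FC and 10GbE). These are ballpark theoretical maxima, before FCP or iSCSI protocol overhead, and are only meant to show why 10GbE and 8Gb FC end up in the same league:

```python
# Rough, theoretical payload bandwidth per link (before FCP/iSCSI protocol overhead).
# Line rates (Gbaud) and encodings are the nominal published figures; results are ballpark only.
links = {
    "8Gb FC":  (8.5,     8 / 10),   # 8GFC uses 8b/10b encoding
    "16Gb FC": (14.025, 64 / 66),   # 16GFC moved to 64b/66b encoding
    "10GbE":   (10.3125, 64 / 66),  # 10 Gigabit Ethernet also uses 64b/66b
}

for name, (gbaud, efficiency) in links.items():
    payload_gbps = gbaud * efficiency
    print(f"{name:8s} ~{payload_gbps:5.1f} Gb/s payload "
          f"(~{payload_gbps * 1000 / 8:4.0f} MB/s per direction)")
```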

Recommendations:
 
  • Usually go for iSCSI, unless you already have an FC infrastructure and the knowledge, and want to carry on using it rather than learning something new yet again.
  • If you need the best performance available, then go for 16Gb FC or faster, or consider VMware VSAN with a lot of SSDs.


Thank you for reading my article, feel free to leave me some feedback regarding the content or to recommend future work. If you liked this article please click the big "Good Article?" button at the bottom.

I look forward to hearing from you. -- Carlos Ijalba ( LinkedIn )