Data caching, performance and security

I guess all of us know that caching data usually increases performance, but I am not sure all of us are aware of the risks that caching introduces and how to minimize them. That is why I decided to write this short article.

Let us analyze the situation
We are using an HDD (Hard Disk Drive) device connected to our OS through an iSCSI server.
The server, which has a RAID controller, presents this device as an LU (Logical Unit) over iSCSI.
The LU has been formatted with NTFS.

So a simple communication scheme looks like this:

client OS (device formatted with NTFS)
<->
iSCSI initiator
<->
iSCSI target
<->
RAID controller.

Now we can look closer at the RAID controller configuration. Most RAID controllers provide caching; please keep in mind that we are analyzing a configuration where volatile memory (RAM) is used for the cache. This functionality goes by many names, such as Write Back (WB) cache, Unit Write Cache or just Cache. Unfortunately, a lot of RAID controllers offer this function even if they don't have a BBU (Battery Backup Unit).

Why is a BBU necessary when the cache is used on the RAID controller? Let's see: The OS writes the data to the device connected through iSCSI and waits for confirmation that the operation has finished successfully. The iSCSI initiator sends the data for the LU to the iSCSI target, which then passes the data to the RAID controller. This is the critical point: the iSCSI target gets confirmation from the RAID controller that the data has been written successfully and sends this confirmation back to the iSCSI initiator, which passes it on to the OS. However, at this point the data is not yet on the disk drives connected to the RAID controller; it is still in the cache. So if the power supply fails at this moment, we lose the data. To minimize this risk, it is necessary to use a UPS for the whole server machine, and it is best to also use a RAID controller with a BBU. This maximizes data protection without sacrificing performance.
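The early acknowledgment described above can be illustrated with a toy model. This is only a sketch: the class WriteBackController and its methods are hypothetical names invented for this illustration, not the interface of any real RAID controller, and a real BBU preserves the cache contents until power returns rather than flushing at the moment of failure.

```python
class WriteBackController:
    """Toy model of a RAID controller with a volatile write-back cache.

    All names here are hypothetical and exist only for illustration.
    """

    def __init__(self, has_bbu):
        self.has_bbu = has_bbu
        self.cache = {}  # block number -> data, held in volatile RAM
        self.disk = {}   # block number -> data, persistent storage

    def write(self, block, data):
        # The write is acknowledged as soon as it lands in the cache,
        # before anything has reached the disk drives.
        self.cache[block] = data
        return "OK"

    def flush(self):
        # Move everything from the cache to the disk drives.
        self.disk.update(self.cache)
        self.cache.clear()

    def power_failure(self):
        if self.has_bbu:
            # Simplification: the battery keeps the cache alive, so its
            # contents still reach the disks once power is restored.
            self.flush()
        else:
            # Without a battery the RAM contents are simply gone.
            self.cache.clear()


unprotected = WriteBackController(has_bbu=False)
ack = unprotected.write(7, b"important")  # the OS is told "OK" here
unprotected.power_failure()
print(ack, 7 in unprotected.disk)

protected = WriteBackController(has_bbu=True)
protected.write(7, b"important")
protected.power_failure()
print(7 in protected.disk)
```

In the first case the OS has already received its confirmation, yet block 7 never reaches the disk. In the second case the same write survives the power failure.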

Another scenario where a cache could be used, and could put data at risk, is the configuration of the LU in the iSCSI target. As with the RAID configuration, we can enable WB in the LU configuration when adding the LU to the iSCSI target, or in other words turn off Write-Through (WT, the opposite of WB). The write sequence and the wait for confirmation are similar but shorter: The OS writes the data to the device connected through iSCSI and waits for confirmation that the operation has finished successfully. The iSCSI initiator sends the data for the LU to the iSCSI target, which immediately sends back confirmation to the iSCSI initiator that the write operation has finished successfully, while the data is still in the target's cache. In this case only a UPS can minimize the risk of lost data.

Of course, I will not describe combinations of redundant power supplies and UPS units here, because that is not the goal of this article.

Let's look closer at the OS device
It is formatted with NTFS and connected to the system through the iSCSI initiator. A few times our customers have reported the following problem: they wrote data to the device, then created a snapshot on the server and backed up this snapshot to tape. A few months later they couldn't find the changed data on their tapes! This is because NTFS, like other file systems, uses a cache that is flushed to disk only every few seconds. So we have at least two solutions here. The first is to wait a few seconds before starting the backup of the LU on the server side. The second is to use software that flushes the NTFS cache to the device on demand; such software you can find here. If we are using a Linux/UNIX OS and another filesystem in a similar iSCSI environment, we can use the system utility sync to get the same result.
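As a rough sketch of the "flush on demand" idea from the application side, the Python standard library exposes os.fsync() for a single file and, on Unix-like systems only, os.sync() for all filesystems. The helper names write_durably and flush_all_filesystems are hypothetical, chosen for this example. Note the assumption here: these calls only ask the client OS to push its own cache to the device. They do not bypass a write-back cache on the iSCSI target or on the RAID controller.

```python
import os
import tempfile


def write_durably(path, data):
    """Write data and ask the OS to push it out of its cache (hypothetical helper)."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # empty Python's own buffer into the OS cache
        os.fsync(f.fileno())  # ask the OS to write this file's cached data to the device


def flush_all_filesystems():
    """Flush all OS-level filesystem caches where the platform supports it."""
    # os.sync() is available on Unix-like systems only; on other
    # platforms this helper does nothing.
    if hasattr(os, "sync"):
        os.sync()


# Usage: flush the data before triggering the snapshot on the server side.
target = os.path.join(tempfile.mkdtemp(), "changed-data.bin")
write_durably(target, b"data that must be in the snapshot")
flush_all_filesystems()
```

Calling something like this right before the snapshot is created replaces the "wait a few seconds" approach with an explicit flush.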

Conclusion of this article
Always analyze the potential risks of data loss and minimize them as much as possible by using an alternative power source, and always make sure that the important data to be backed up is consistent. Good luck!

Comments (1)

You have covered some of the issues, but one area you haven't covered very well is the why. You just talk about data loss, but data and filesystem corruption and referential integrity are also issues.

You also haven't covered all the places where data is cached in your scenario and how the data can be protected in those places.

Write from Application -> OS Cache -> Network Cache -> NIC/Infrastructure/NIC -> Network cache -> OS Cache -> RAID Controller cache -> Disk Cache
