Basic storage buffer and queue question

mokkan
I have a basic question regarding storage. When data arrives at the control unit, it should keep it in a queue and send it to the necessary LUN to write the data.

Inside the LUN we need to worry about the storage pool, RAID, and disks.

My question is: where are the places I need to worry about cache and buffer queues? I know this question is fairly general, but I'm trying to get an overview.

Andrew Hancock (VMware vExpert / EE Fellow), VMware and Virtualization Consultant
Fellow 2018
Expert of the Year 2017
Commented:
Read and write cache on the storage processor; queue/buffer on the initiator.

Author

Commented:
Thank you very much. If I want to turn it off for a specific LUN, can I? Or do we need to do it per storage pool?

By "queue/buffer on the initiator", which initiator do you mean? Sorry, do you mean at the host level?
Andrew Hancock (VMware vExpert / EE Fellow), VMware and Virtualization Consultant
Fellow 2018
Expert of the Year 2017
Commented:
Yes, the host-level HBA initiator.

That very much depends on the configuration of your SAN, but usually not: cache is normally set for all LUNs, it's a global setting.

Author

Commented:
I'm working on a Storwize 7000 storage system, and they told me that I can't turn cache on or off for a specific LUN. Is that right?
Andrew Hancock (VMware vExpert / EE Fellow), VMware and Virtualization Consultant
Fellow 2018
Expert of the Year 2017

Commented:
Yes, that's quite normal for a SAN.

Most cache settings are global for ALL LUNs.
David, President
Top Expert 2010
Commented:
Cache, buffers, queues, prefetch, and more exist in multiple places. The better you tune them, the fewer I/Os have to go to disk, the less frequently those I/Os happen, the smaller they are, and the higher your cache utilization.

By far the slowest and least efficient component is the HDD, which can take thousands or even millions of times longer to get data than reading data already cached within the host computer.

The HDD has a queue used to reorder commands. This is generally not programmable other than on/off. However, cache utilization algorithms, prefetch sizes, and read/write cache utilization are, depending on make/model. You get the most bang for the buck here.
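To make the reordering point concrete, here is a toy model in bash. It assumes a single head starting at LBA 0 and counts only head travel in LBAs; real NCQ firmware also weighs rotational position, so treat it as an illustration only. The request list and the `seek_distance` function are hypothetical.

```bash
#!/bin/bash
# Toy model of why a drive reorders queued requests: total head travel (in LBAs)
# for the same requests served in arrival order vs. one ascending sweep.
# Assumptions: head starts at LBA 0; seek cost = distance only (no rotation).

seek_distance() {   # usage: seek_distance START_LBA LBA...
  local pos=$1 total=0 lba d
  shift
  for lba in "$@"; do
    d=$(( lba - pos ))
    if (( d < 0 )); then d=$(( -d )); fi
    total=$(( total + d ))
    pos=$lba
  done
  echo "$total"
}

requests=(900 100 800 200 700)          # hypothetical queued LBAs, arrival order
mapfile -t sorted < <(printf '%s\n' "${requests[@]}" | sort -n)

echo "arrival order:  $(seek_distance 0 "${requests[@]}")"   # 3500
echo "elevator order: $(seek_distance 0 "${sorted[@]}")"     # 900
```

Serving the same five requests in one sweep cuts head travel from 3500 to 900 LBAs in this model, which is the kind of gain the drive's queue is there to capture.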

Next step up is the RAID controller (assuming hardware RAID). I/O stripe size and cache on/off are about all there is, other than the RAID level itself and some deep settings best not messed with. Cache is often a simple on/off setting for a specific LUN.

The HBA/initiator/drivers will have a programmable queue depth, but the cache at that level is really only there to prevent interrupts. It isn't something most people are allowed to mess with.
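On a Linux host, several of these queues and cache flags are visible in sysfs. A minimal sketch, assuming Linux with SCSI/SATA disks named `sd*`: `device/queue_depth` is the per-device (per-LUN) SCSI queue depth the HBA driver uses, `queue/nr_requests` and `queue/scheduler` belong to the kernel block layer, and `cache_type` is the write-cache mode the disk or LUN reports to the host. HBA-wide limits are usually driver module parameters whose names vary by vendor, so they are not shown. The function name `show_block_queues` is hypothetical.

```bash
#!/bin/bash
# Print per-device queue and cache settings that Linux exposes in sysfs.
# The sysfs root is a parameter (default /sys) so it can be pointed at a copy.
show_block_queues() {
  local root=${1:-/sys} dev name qd nr sched cache f
  for dev in "$root"/block/sd*; do
    [ -d "$dev" ] || continue            # no sd* devices: glob did not match
    name=${dev##*/}
    # SCSI queue depth for this device/LUN (set by the HBA driver)
    qd=$(cat "$dev/device/queue_depth" 2>/dev/null || echo n/a)
    # Block-layer request queue size and I/O scheduler
    nr=$(cat "$dev/queue/nr_requests" 2>/dev/null || echo n/a)
    sched=$(cat "$dev/queue/scheduler" 2>/dev/null || echo n/a)
    # Write-cache mode as reported by the disk/LUN, e.g. "write back"
    cache=n/a
    for f in "$dev"/device/scsi_disk/*/cache_type; do
      [ -r "$f" ] && cache=$(cat "$f")
    done
    printf '%s queue_depth=%s nr_requests=%s scheduler=%s cache_type=%s\n' \
      "$name" "$qd" "$nr" "$sched" "$cache"
  done
}
```

Run `show_block_queues` with no argument to read the live `/sys`; the optional argument only exists so the function can be aimed at a copy of the tree.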
Top Expert 2014
Commented:
Why would you want to turn the storage cache off? It's backed up to a local SSD should the power fail.

Author

Commented:
Hi Andyalder,

The reason I wanted to turn off the cache is that I get a better write rate: with the cache off I got around 680 MB/s write, but with it on I got 490 MB/s. I wanted to see which option provides better write performance. What is the disadvantage of turning off the cache?
Top Expert 2014

Commented:
The disadvantage of disabling cache is that it slows writes down, which seems to be the opposite of what you are seeing. How are you measuring it and where are you turning it off?

Author

Commented:
I turned it off on a specific LUN, wrote data to it, and throughput increased by close to 200 MB/s.
I'm using an IBM Storwize 7000.
David, President
Top Expert 2010
Commented:
Getting an increase by turning off cache is quite rare. Personally, I think these numbers are highly suspect, especially if you are running some benchmark rather than real-world application I/O.

So, specifically, how are you measuring throughput? Beyond that, in the real world, are your applications doing sequential writes? Is this system serving as a data logger?

Author

Commented:
I just used the dd command to write data, which is a sequential write. One more question: if I'm writing data in 32KB blocks to a raw device and my RAID stripe size is 256KB, where does the conversion happen? Or does it wait until it has 8 blocks of data (8 x 32KB = 256KB) and write them all together? Sorry for asking so many questions.
David, President
Top Expert 2010
Commented:
Exactly what dd command are you using?

(But it is moot, as no matter how you use it, there is no guarantee that you will get 100% 32KB-sized CDBs by the time it gets to the disk.) In fact, with buffered writes nothing even gets sent to the device until the kernel flushes its dirty pages, and when that happens varies depending on some settings.
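On Linux, when buffered writes actually reach the device is governed by the kernel's dirty-page writeback settings under `/proc/sys/vm`. A minimal sketch, assuming a Linux host (the function name `show_writeback` is hypothetical), that prints those knobs and how much data is currently dirty or being written back:

```bash
#!/bin/bash
# Show the Linux settings that decide when page-cache (buffered) writes are
# flushed to the device, plus how much data is currently waiting.
# The proc root is a parameter (default /proc) so it can be pointed at a copy.
show_writeback() {
  local root=${1:-/proc} knob
  # dirty_background_ratio: % of memory dirty before background flushing starts
  # dirty_ratio: % of memory dirty before writers are forced to flush themselves
  # dirty_expire_centisecs: age at which dirty data becomes due for writeback
  # dirty_writeback_centisecs: how often the flusher threads wake up
  for knob in dirty_background_ratio dirty_ratio \
              dirty_expire_centisecs dirty_writeback_centisecs; do
    printf '%s=%s\n' "$knob" "$(cat "$root/sys/vm/$knob" 2>/dev/null || echo n/a)"
  done
  # Dirty: written by applications but not yet sent to the device
  # Writeback: currently being written out
  grep -E '^(Dirty|Writeback):' "$root/meminfo" 2>/dev/null
}
```

If `Dirty` is large right after a dd finishes, the time dd reported did not include getting that data to disk. GNU dd's `conv=fsync` (flush before exiting) or `oflag=direct` (bypass the page cache) avoids that.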

Also, real-world, this is a crappy benchmark because it doesn't represent a real-world workload, unless all your computer does is write zeros in 32KB chunks to a raw device.

Author

Commented:
I'm using the dd command like this, writing in a loop:

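#!/bin/bash -x
# Repeat 20 times: write 30,000,000 x 8KB blocks of zeros (~246 GB) to a file
# on /opt, time the write and the delete, then pause 40 seconds.
for (( c=1; c<=20; c++ ))
do
  time dd if=/dev/zero of=/opt/testing123 bs=8k count=30000000
  time rm /opt/testing123
  sleep 40
done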
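For scale, the arithmetic of that loop: `bs=8k` in dd means 8 x 1024 bytes, so each iteration writes about 246 GB. A small sketch of turning an elapsed time into MB/s, assuming decimal MB (10^6 bytes); which MB the figures above use is not stated. The 361 s and 501 s inputs are back-calculated to match the 680 and 490 MB/s figures quoted earlier, not measurements, and the function names are hypothetical.

```bash
#!/bin/bash
# Rough arithmetic for the dd loop: bytes per run and the implied MB/s.
bytes_written() {   # usage: bytes_written BS_BYTES COUNT
  echo $(( $1 * $2 ))
}
mb_per_s() {        # usage: mb_per_s BYTES SECONDS  (integer, decimal MB/s)
  echo $(( $1 / $2 / 1000000 ))
}

total=$(bytes_written 8192 30000000)        # bs=8k is 8 * 1024 bytes
echo "bytes per dd run: $total"             # 245760000000 (~246 GB)
echo "if it took 361 s: $(mb_per_s "$total" 361) MB/s"   # 680
echo "if it took 501 s: $(mb_per_s "$total" 501) MB/s"   # 490
```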
David, President
Top Expert 2010
Commented:
Other flaws:
 * You're writing to the filesystem, not the HDD, so filesystem parameters, blocking, and journaling (none of it necessarily in 8KB sizes) also come into play. You should be using /dev/sdX if your goal is to test the write performance of the hardware.
 * You are not writing anything that can be cached by the controller anyway (except for a small amount of housekeeping); that is why turning the cache off has some benefit. In the real world this just doesn't happen unless your app is a real-time data logger writing to raw, physically contiguous blocks.
 * Filesystem housekeeping also forces mixed I/O sizes, typically 4KB random reads and writes.
 * 8KB I/Os WILL be aggregated by the kernel, the filesystem, and even the device drivers.
 * Is /opt a separate partition, or part of the / filesystem?
 * You didn't 'nice' the dd, so the test competes with all other system I/O.
 * Single threaded?? Your test doesn't even let multiple cores share the load. At least run multiple instances in parallel and add a "wait" command to let them finish. [Add & at the end of each dd, followed by a wait command to let the n instances finish; use different output files, or on a raw device set seek= on each instance to ensure they write to different parts of the disk.]
 * dd doesn't sync the output. You still have I/O in a queue that has NOT been written to the HDD when 'time' reports. Instead of waiting 40 seconds, add a "sync" to flush all I/O to disk before taking the time, if you want to keep using this script.
 * dd isn't even doing contiguous I/O to the disk.
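A sketch of the test restructured along the lines of that list, under stated assumptions. It makes these changes:

* writes go straight to the target;
* n dd instances run in parallel with `&` and are collected with `wait`;
* each instance uses `seek=` (in units of `bs`) to write its own region;
* `conv=fsync` makes each dd flush its data before exiting;
* a final `sync` runs before the clock stops.

The function name and arguments are hypothetical. /dev/urandom replaces /dev/zero because zeros are trivially compressible or de-duplicable, but on some systems it may be slower than the storage; a pre-generated random file is an alternative. WARNING: pointing it at a raw device such as /dev/sdX destroys the data on that device.

```bash
#!/bin/bash
# usage: parallel_write_test TARGET N BS_BYTES COUNT
# N writers, each writing COUNT blocks of BS_BYTES to its own region of TARGET.
parallel_write_test() {
  local target=$1 n=$2 bs=$3 count=$4
  local start end i pid rc=0
  local pids=()
  start=$(date +%s.%N)
  for (( i = 0; i < n; i++ )); do
    # seek= is in bs units, so writer i starts at block i*count.
    # iflag=fullblock: keep reading until each block is full.
    # conv=fsync: flush this writer's data before dd exits.
    # conv=notrunc: don't truncate a regular-file target under the other writers.
    dd if=/dev/urandom of="$target" bs="$bs" count="$count" \
       seek=$(( i * count )) iflag=fullblock conv=fsync,notrunc status=none &
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || rc=1                 # collect every writer, note failures
  done
  (( rc == 0 )) || return 1
  sync                                  # flush anything still queued
  end=$(date +%s.%N)
  awk -v s="$start" -v e="$end" -v b=$(( n * count * bs )) \
    'BEGIN { printf "%d bytes in %.2f s\n", b, e - s }'
}
```

For example, `parallel_write_test /dev/sdX 4 32768 100000` would run four writers of 100,000 x 32KB each (about 13 GB in total).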

In other words, garbage in => garbage out. Your flawed benchmark produced flawed results that make you think no cache is a good thing.
Top Expert 2014
Commented:
I agree, it's a meaningless test, especially using /dev/zero, since the storage will de-duplicate it to nothing. Use Iometer (http://www.iometer.org/doc/downloads.html) and throw some random stuff at it, not just large blocks.
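Iometer is mostly driven from a GUI, while vdbench (the tool the author also uses in this thread) reads a plain-text parameter file, so a random-write run can be scripted. This is a sketch assuming vdbench's usual sd/wd/rd parameter syntax; check the parameter names against the vdbench documentation for your version. The LUN path, 32k transfer size, thread count, and run length are placeholders, and the function name is hypothetical. All data on the LUN will be overwritten.

```bash
#!/bin/bash
# Write a vdbench parameter file for a 100% random, 100% write, 32KB workload.
# usage: make_vdbench_parm LUN_PATH OUTPUT_FILE
make_vdbench_parm() {
  local lun=$1 out=$2
  # sd = storage definition: the LUN, opened with O_DIRECT, 8 outstanding I/Os
  # wd = workload definition: 32KB transfers, rdpct=0 (all writes),
  #      seekpct=100 (every I/O goes to a random offset)
  # rd = run definition: as fast as possible, 300 s, report every 5 s
  cat > "$out" <<EOF
sd=sd1,lun=$lun,openflags=o_direct,threads=8
wd=wd1,sd=sd1,xfersize=32k,rdpct=0,seekpct=100
rd=rd1,wd=wd1,iorate=max,elapsed=300,interval=5
EOF
}
```

Then, e.g., `make_vdbench_parm /dev/sdX random_write.parm` followed by `./vdbench -f random_write.parm` from the vdbench directory.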

Author

Commented:
Thank you very much. I'm doing it now using vdbench and will post an update.

Author

Commented:
Thanks a lot, guys, I've almost got what's happening. Let me check that I understand: if I'm writing 32KB to a raw device and in the storage my RAID stripe size is 256KB, how is it going to be written in the storage? Does it wait until it has 8 blocks (8 x 32KB), assemble them together, and then write? Or does it all depend on the storage? What is the general behavior in storage?
Top Expert 2014
Commented:
It'll generally put it in the write cache until it can write a full-width stripe, unless it runs out of cache space or cache is disabled. Since the cache is backed up, it can immediately tell the OS the write operation is complete even though the data hasn't been written to the physical platters yet.
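The arithmetic behind that answer, as a toy model: 256KB / 32KB = 8 consecutive host writes fill one stripe. On parity RAID, a full-stripe write lets the controller compute parity from the new data alone. A partial-stripe write typically means reading old data or parity first (read-modify-write), which is why the cache tries to gather writes. Assumption: "stripe size" here means the full data width of one stripe. Some vendors use the term for the per-disk strip, in which case the full stripe is that value times the number of data disks. The function names are hypothetical.

```bash
#!/bin/bash
# Toy model: sequential 32KB host writes against a 256KB full stripe.

writes_per_stripe() {   # usage: writes_per_stripe IO_KB STRIPE_KB
  echo $(( $2 / $1 ))
}

is_full_stripe() {      # usage: is_full_stripe START_KB LEN_KB STRIPE_KB
  # "yes" only if the write starts on a stripe boundary and covers whole stripes
  if (( $2 > 0 && $1 % $3 == 0 && $2 % $3 == 0 )); then echo yes; else echo no; fi
}

io=32 stripe=256
echo "writes per full stripe: $(writes_per_stripe "$io" "$stripe")"   # 8
for (( off = 0; off < stripe; off += io )); do
  # Each 32KB write on its own is a partial-stripe write...
  printf 'write at %3dKB: full stripe? %s\n' "$off" "$(is_full_stripe "$off" "$io" "$stripe")"
done
# ...but once the cache has gathered all eight, the combined run is one full stripe.
echo "gathered run 0-256KB: full stripe? $(is_full_stripe 0 "$stripe" "$stripe")"
```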

Author

Commented:
Thanks a lot all of you. Your answers are very good.
Top Expert 2014

Commented:
Did you get the same anomalous result turning the cache off when you threw random test data at it?

Author

Commented:
Throughput increased by at least 150 MB/s on the SSD drives when I turned off the cache.
Top Expert 2014

Commented:
This is the first time you mention SSDs? Did you expect us to guess you had SSD drives in it rather than spinning disks covered in rust? It's a completely different ballgame when you have NAND flash.
