NSS very slow when copying/writing large files

We had an issue with this on a NetWare 6.5 IBM server with a ServeRAID 7t (Adaptec 2410SA) SATA RAID disk system and a 1 Gbit NIC.
Copying small files was so-so, while storing large files - like ISO images - on the server took "forever".
All the usual suspects were checked - change of NIC, buffers, client cache settings, network. We also studied this question:


but nothing helped. Read performance was mediocre at about 40% of bandwidth; write performance was a disaster, using at most 7% of bandwidth, with delays in the transfer stream of up to 1 second. In fact, it was faster to download an ISO image from an external FTP server to the workstation than to save it to the file server.

Then we located the tuning guide:


and these paragraphs:

NSS /CacheBalance=xx and NSS /MinBufferCacheSize=n
Every 30 seconds (by default) and/or when the server loads or unloads an NLM, NSS rebalances the NSS file system cache to the percentage specified by the /CacheBalance parameter. By default this is set to 60%; the valid range is 1-99, expressed as a percentage. (You can change the default time interval with the /CacheBalanceTimer command. The range for this is 1 - 3600 seconds.)
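For example, both parameters can be entered at the server console like this (the values are illustrative only; pick what fits your server and monitor the result):

```
NSS /CacheBalance=85
NSS /CacheBalanceTimer=30
```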

The /MinBufferCacheSize represents the minimum amount of file system cache NSS will use. It will never balance the cache below the /MinBufferCacheSize. By default it is set to 512 * 4KB buffers, or 2MB. The valid range is 256 - 1,048,576.

On a server with only NSS volumes, you can usually adjust the /CacheBalance parameter upwards of 85% without any problem; some customers have reported that they can adjust it in excess of 95%. Monitor the Least Recently Used (LRU) Sitting Time on the NetWare server to ensure that you are not running the server too low on memory.
NSS /ClosedFileCacheSize=n
Obtaining beast (file) information from disk is an expensive process in terms of performance. If NSS can get the information from memory, file handles can be returned more quickly. The /ClosedFileCacheSize parameter keeps these beast objects in cache so NSS doesn't have to go to disk and unpack the beast information again when it wants the same file.

In NSS 3.0, the default size is 50,000; the valid range is 16-1,000,000. Novell recommends setting this parameter to 100,000 or more if you have applications on your server that consistently cycle through the same set of files and you suspect that this cache is being flushed when combined with normal server operations. On average, each Closed File Cache entry consumes 0.4 - 1KB of RAM.
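The memory cost of these settings is easy to estimate from the figures above. A minimal sketch in plain arithmetic (the 4 KB buffer size and the 0.4-1 KB per-entry averages are the values quoted in the guide, not measured ones):

```python
# Rough memory estimates for the NSS cache parameters discussed above.

BUFFER_SIZE_KB = 4  # NSS file system cache uses 4 KB buffers

def min_buffer_cache_mb(buffers: int) -> float:
    """Memory held by /MinBufferCacheSize=<buffers>, in MB."""
    return buffers * BUFFER_SIZE_KB / 1024

def closed_file_cache_mb(entries: int, kb_per_entry: float) -> float:
    """Memory held by /ClosedFileCacheSize=<entries>, in MB."""
    return entries * kb_per_entry / 1024

# Default /MinBufferCacheSize of 512 buffers -> 2 MB, matching the guide.
print(min_buffer_cache_mb(512))            # 2.0

# /ClosedFileCacheSize=100000 costs roughly 39-98 MB of RAM.
print(closed_file_cache_mb(100_000, 0.4))  # about 39 MB
print(closed_file_cache_mb(100_000, 1.0))  # about 98 MB
```

So the recommended 100,000-entry closed file cache is a meaningful but modest slice of RAM on a typical file server.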

This did the trick:
NSS /CacheBalance=99 /ClosedFileCacheSize=100000
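To make the change survive a reboot, the same line can be added to a startup file instead of being typed at the console each time. As a sketch only - the exact file (AUTOEXEC.NCF vs. nssstart.cfg) depends on your NetWare version and on whether the parameter must take effect before pools mount, so verify against the documentation for your release:

```
# In SYS:\SYSTEM\AUTOEXEC.NCF (placement is an assumption - verify for your setup)
NSS /CacheBalance=99 /ClosedFileCacheSize=100000
```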

Now read/write to/from a WinXP workstation with a 100 Mbit NIC runs at about 65% and 90% of bandwidth respectively.

I realize I have answered my own question; however, this was really hard to track down, and in the end the solution was so simple. I don't recall the initial settings, nor why they were set that way - no idea.
Any useful comments will be rewarded with a small gift of points.
Gustav Brock, CIO, asked:
dotENG commented:
Does this look familiar?


For my part, I don't recommend SATA for NW servers for the reasons I wrote above, but if there were only one vendor I could choose for a server, it would be IBM.
Wal, I guess I could comment, but how useful it might be, I don't know.

Tuning NSS is one of the post-install things we lucky admins get to do.  The reason the NSS cache balance defaults to what it does is that it assumes you're using at least one traditional volume.  Why this would still be the default setting for NetWare 6.5, I don't know.

Ya'd think it'd be on the list of install questions or something, with a more optimal cache balance setting from the get-go if you say "no, I will never install a traditional volume on this server," along with better settings for things like /ClosedFileCacheSize based on the amount of RAM detected during install.  There are some other settings you could tweak for your individual server config that might help further.  What they are depends on your situation: how the server is used, how many users, how many files, general file size, server memory, etc.

My gut says the default settings are what they are so that a server being upgraded with traditional volumes and the bare minimum of required memory will still run.  That doesn't mean the install process couldn't be improved upon, but as we all know, development energies are going elsewhere, so it's not likely to be improved (for the standalone NetWare kernel, anyway; hopefully it will continue to improve for the NetWare-on-XEN/SLES version).

I hope you'll get some more words of wisdom from the other Experts...  I might learn sumpin' too.
Gustav Brock, CIO (Author) commented:
The server in question is extremely typical: a SYS and a DATA volume on three 80 GB SATA disks in a RAID, and the NW Small Business Server installation for five users running file, print, and GroupWise. Also a SCSI DAT backup drive and Backup Exec.

We are somewhat surprised that the defaults can result in such poor performance.
The tuning guide is not bad, but it should contain sets of settings for typical setups like this.


When was the server installed?

What is the size of volumes and how much free space?

The reason for the questions is that SATA drives, because of their slow seek times, tend to lose performance when fragmented or when more than half full.
Gustav Brock, CIO (Author) commented:
It's three 80 GB disks, leaving 160 GB of space, of which 32 GB is in use - thus about 128 GB of free space.
The server and the array have run for about 1½ years, and all parts, including the controller, are IBM branded.

But the scenario has changed quite a bit.
After a reboot, we noticed at boot time that the 2410SA controller reported the array as "degraded", meaning that a disk was out of service. While we were getting hold of a replacement disk, about six hours later the volumes were suddenly dismounted ... and at reboot the status of the array had changed to "failed".
In the manager, only the first drive was displayed as alive; the two others were greyed out.
At that point, as you probably know, there is nothing else to do than pray and press Ctrl+R to rebuild the array. However, that worked: the server would boot, and the pool and volumes could be activated and mounted.
The server has now run for three days with no further issues.

Our assumption is that one drive had been out of service for some time and that this could explain the slow write performance. Our tweak of the NSS cache settings may then have compensated for this.
We have adjusted the NSS settings back to the default values, and these now work very well: up to 90% of the bandwidth of a 100 Mbit connection is used when saving a large file from a WinXP workstation.

So the question is: why would one, and later two, disks drop out of service without showing any malfunction afterwards? The server runs without a UPS, as the power supply here is much more stable than in many other parts of the world.

Have you seen something like this?

Gustav Brock, CIO (Author) commented:
Yes, that's pretty much it, though no drive was actually "defunct".

The problem with not recommending SATA is that most entry-level servers these days come with SATA, but - of course - this experience has made us reconsider our recommendations for small NW servers. SCSI has never let us down.

ShineOn commented:
SATA (and PATA) manufacturers now put out two classes of drives: one consumer-grade and one "RAID" quality, with a much higher MTBF and longer warranties than the consumer-grade line.

They are close to rivaling SCSI reliability, with much larger per-drive capacity for less money.  If you're tight on funds, it's not that bad an option any more.
Gustav BrockCIOAuthor Commented:
Good point.
Thanks for the input both of you.
