NSS very slow when copy/write large files

Posted on 2006-06-28
Last Modified: 2008-01-09
We had an issue with this on a NetWare 6.5 IBM server with ServeRaid 7t (Adaptec 2410SA) SATA RAID disk system and 1 Gbit NIC.
Copying small files was so-so, while storing large files - like ISO images - on the server took "forever".
All the usual suspects were checked - change of NIC, buffers, client cache settings, network. We also studied this question:

but nothing helped. Read performance was average at about 40% of bandwidth; write performance was a disaster at 7% at most, with delays in the transfer stream of up to 1 second. In fact, it was faster to download an ISO image from an external FTP server to the workstation than to save it to the file server.

Then we located the tuning guide:

and these paragraphs:

NSS /CacheBalance=xx and NSS /MinBufferCacheSize=n
Every 30 seconds (by default) and/or when the server loads or unloads an NLM, NSS rebalances the NSS file system cache to the percentage specified by the /CacheBalance parameter. By default this is set to 60%; the valid range is 1-99 expressed as percentage. (You can change the default time interval with the /CacheBalanceTimer command. The range for this is 1 - 3600 seconds.)

The /MinBufferCacheSize represents the minimum amount of file system cache NSS will use. It will never balance the cache below the /MinBufferCacheSize. By default it is set to 512 * 4KB buffers, or 2MB. The valid range is 256 - 1,048,576.

On a server with only NSS volumes, you can usually adjust the /CacheBalance parameter upwards of 85% without any problem; some customers have reported that they can adjust it in excess of 95%. Monitor the Least Recently Used (LRU) Sitting Time on the NetWare server to ensure that you are not running the server too low on memory.
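As a sketch only - the 85% balance and 30-second timer below are illustrative values picked from the ranges quoted above, not recommendations - these parameters are entered at the NetWare server console:

```
nss /CacheBalance=85
nss /CacheBalanceTimer=30
nss /MinBufferCacheSize=512
```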
NSS /ClosedFileCacheSize=n
Obtaining beast (file) information from disk is an expensive process in terms of performance. If NSS can get the information from memory, it will enable file handles to be given quicker. The /ClosedFileCacheSize parameter keeps these beast objects in cache so NSS doesn't have to go to disk and unpack beast information again when it wants the same file.

In NSS 3.0, the default size is 50,000; the valid range is 16-1,000,000. Novell recommends setting this parameter to 100,000 or more if you have applications on your server that consistently cycle through the same set of files and you suspect that this cache is being flushed when combined with normal server operations. On average, each Closed File Cache entry consumes 0.4 - 1KB of RAM.
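The RAM cost of raising this parameter follows directly from the figures above; a quick sketch in Python (the 0.4 - 1 KB per-entry range is quoted from the guide, the helper name is ours):

```python
# Estimate RAM consumed by the NSS Closed File Cache, using the
# 0.4 - 1 KB-per-entry figure quoted in the tuning guide.
def closed_file_cache_mb(entries, kb_per_entry):
    """Return the approximate cache footprint in megabytes."""
    return entries * kb_per_entry * 1024 / (1024 * 1024)

low = closed_file_cache_mb(100_000, 0.4)   # best case, 0.4 KB per entry
high = closed_file_cache_mb(100_000, 1.0)  # worst case, 1 KB per entry
print(f"/ClosedFileCacheSize=100000 costs roughly {low:.0f}-{high:.0f} MB of RAM")
```

So a setting of 100,000 entries costs roughly 40-100 MB of server RAM, which is worth checking against the LRU Sitting Time before committing.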

This did the trick:
NSS /CacheBalance=99 /ClosedFileCacheSize=100000
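To make such switches survive a reboot on NetWare 6.5, they can - as far as we recall - be placed, one per line and without the leading "nss", in SYS:ETC\NSSSTART.CFG, which NSS reads when it loads:

```
/CacheBalance=99
/ClosedFileCacheSize=100000
```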

Now read/write to/from a WinXP workstation with a 100 Mbit NIC runs at about 65% and 90% bandwidth respectively.

I realize I have answered my own question; however, this was really hard to track down, and in the end the solution was so simple. I don't recall the initial settings, let alone why they were set that way - no idea.
Any useful comments will be rewarded with a small gift of points.
Question by:Gustav Brock
LVL 35

Expert Comment

ID: 17005856
Wal, I guess I could comment but how useful it might be, I don't know.

Tuning NSS is one of the post-install things we lucky admins get to do.  The reason NSS cache balance defaults to what it does is, it assumes that you're using at least one traditional volume.  Why, for NetWare 6.5, this would still be the default setting, I don't know.

Ya'd think it'd be on the list of questions or something, with a more optimal cache balance setting from the get-go if you say "no, I will never install a traditional volume on this server," along with better settings for things like /closedfilecachesize, based on the amount of RAM detected during install.  There are some other settings you could tweak for your individual server config that might help further.  What they are, depends on your situation.  How the server is used, how many users, how many files, general file size, server memory, etc.

My gut says the default settings are what they are so if you're upgrading a server with traditional volumes and the bare minimum required for memory, it should still run.  That doesn't mean the install process couldn't be improved upon, but as we all know, development energies are going elsewhere so it's not likely to be improved (for the standalone NetWare kernel anyway.  Hopefully it will continue to improve for the NetWare-on-XEN/SLES version.)

I hope you'll get some more words of wisdom from the other Experts...  I might learn sumpin' too.
LVL 50

Author Comment

by:Gustav Brock
ID: 17007554
The server in question is extremely typical. A SYS and a DATA volume on three 80 GB SATA disks in a RAID and the NW Small Business server installation for five users running file, print, GroupWise. Also a SCSI DAT backup and Backup Exec.

We are somewhat surprised that the defaults can result in such bad performance.
The tuning guide is not bad, but it should contain sets of suggested settings for typical setups like this.


Expert Comment

ID: 17027631
When was the server installed?

What is the size of volumes and how much free space?

The reason for these questions is that SATA drives, because of their slow seek times, tend to lose performance when a volume is fragmented or more than half full.
LVL 50

Author Comment

by:Gustav Brock
ID: 17029728
It's three 80 GB disks leaving 160 GB space of which 32 GB is in use, thus about 128 GB free space.
The server and the array have run for about 1½ years, and all parts, including the controller, are IBM branded.

But the scenario has changed quite a bit.
After a reboot we noticed at boot time that the 2410SA controller reported the array as "degraded", meaning that a disk was out of service. While we were getting hold of a replacement disk, about six hours later the volumes were suddenly dismounted ... and at reboot the status of the array had changed to "failed".
Using the manager, only the first drive was displayed as alive, the two others were greyed out.
At this state, you probably know that there is nothing to do but pray and press Ctrl+R to rebuild the array. However, that worked: the server would boot, and the pool and volumes could be activated and mounted.
The server has now run for three days with no further issues.

Our assumption is that one drive has been out of service for some time and that the slow write performance could be explained by this. Then our tweak of the NSS cache settings may have compensated for this.
We adjusted the NSS settings back to the default values, and these now work very well, as up to 90% of the bandwidth of a 100 Mbit connection is used when saving a large file from a WinXP workstation.

So the question is: why did one and later two disks drop out of service without showing any malfunction afterwards? The server runs without a UPS, as the power supply here is much more stable than in many other parts of the world.

Have you seen something like this?


Accepted Solution

dotENG earned 250 total points
ID: 17029791
Does this look familiar?

Personally, I don't recommend SATA for NW servers for the reasons I wrote above, but if there was only one vendor I could choose for a server, it would be IBM.
LVL 50

Author Comment

by:Gustav Brock
ID: 17029941
Yes, that's pretty much it, though no drive actually was "defunct".

The problem with not recommending SATA is that most entry-level servers these days come with SATA, but - of course - this experience has made us reconsider the recommendations for small NW servers. SCSI has never let us down.

LVL 35

Assisted Solution

ShineOn earned 250 total points
ID: 17034118
SATA (and PATA) manufacturers are now putting out two classes of drives: one consumer-grade and one "RAID" grade, with a much higher MTBF and longer warranties than the consumer-grade.

They are close to rivaling SCSI reliability, with much larger per-drive capacity for less money. If you're tight on funds, it's not that bad an option any more.
LVL 50

Author Comment

by:Gustav Brock
ID: 17035317
Good point.
Thanks for the input both of you.


Question has a verified solution.