4.11 hangs deleting dying replica w/TTS tweaked

We've adjusted the TTS settings as Novell recommends (TID 2939221) and still hang when using DSREPAIR -A "Destroy selected replica on this server" on a dying replica of [root] that we can't get rid of. It runs fine for maybe 45 minutes, and then hangs (without abending) with a transaction log file that's 312MB.  But there's still another 200-300MB free on SYS!

Any suggestions besides placing a Novell support call?  We need to clear a -637 error from stuck obits that were traced to the half-dead "dying replica."
jstegall Commented:
Thanks, good to hear it got cleared up. I assume this is a situation where we must keep trying until successful.
I would say you are running out of disk space, since the TTS log is larger than the free space on the drive. The system may create the new file with changes before deleting the old file, in which case you don't have space to create the temporary file. I would think it would shut down transaction tracking in that case. Try clearing another 100MB and see what happens. You could restore the deleted files from backups; you do have good backups, don't you?
Check that the server object is not inadvertently defined in another container; otherwise, get Novell to dial in and DSDump the sucker.

See the following TIDs about the -637 error: 2915160, 2942837.

paulnic (Author) Commented:

Thanks but we've been there.


Please read this TID from Novell; maybe it can help you:

Removing dying subordinate reference replica (Last modified: 9JUL1998) TID2939186

This document (2939186) is provided subject to the disclaimer at the end of this document.


Dying subordinate reference replica with an inconsistent replica ring. The server in question is the only server that thinks it is in the replica ring together with the master and the other replicas, while the other servers agree that it is not in the replica ring.


All involved servers must be communicating.

As a precaution, create a database dump file on the "dying" server from DSRepair, Advanced options.

From NDSManager add a read-write replica on the server. This will replace the dying subref with a read-write, which can then be deleted from NDSManager after it has advanced to On state.


A part of the process of deleting a readable replica is to check the subordinate replica rings on the server. In case there are subordinate replicas of partitions below the one directly being deleted, they now become excessive and are deleted using normal replica deletion procedures (set replica state to dying etc).

Sometimes, particularly when deleting replicas of large partitions, a large number of transactions are generated on the server. This can cause TTS to be disabled and can potentially lead to an abend if it is not addressed properly. This kind of error can leave a subordinate replica stuck in the dying state.

Please refer to TID 2939221 (Preventing TTS disabled during partition Op's) for further information on how to avoid this situation.


Perform partition continuity on the replica ring in NDSManager to verify that at least the majority of the servers have the same replica ring information. The server with the dying replica will most likely not show up here, but this is caused by the fact that the servers being read do not include the problem server. It is not NDSManager reporting incorrect information, but since it does not contact all servers in the tree to see if "anybody else should happen to think it belongs to the replica ring" for the partition in question, it normally does not hit the problem server. Tree-wide replica ring verification can be performed with the Partchk utility from Novell Developer support available at http://developer.novell.com/engsup/sample/areas/delphis.htm.
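The continuity check described above boils down to comparing each server's view of the replica ring and looking for a server that lists itself while the majority does not. Below is a minimal sketch of that comparison, assuming the ring views have already been collected by hand (for example from DSREPAIR or NDSManager output). The function name, the server names, and the data layout are all hypothetical; this does not call any NDS API.

```python
from collections import Counter


def find_stray_ring_members(ring_views):
    """Compare each server's view of a partition's replica ring.

    ring_views maps a server name to the set of servers that this server
    believes are in the replica ring (hypothetical, hand-collected data).
    Returns (majority_ring, strays), where strays are servers that list
    themselves in the ring although the majority view does not include them.
    """
    if not ring_views:
        raise ValueError("no ring views supplied")

    # Count how many servers report each distinct ring.
    counts = Counter(frozenset(view) for view in ring_views.values())
    majority_ring, votes = counts.most_common(1)[0]

    # Require a strict majority, as the continuity check expects.
    if votes * 2 <= len(ring_views):
        raise ValueError("no majority agreement on the replica ring")

    strays = sorted(
        server
        for server, view in ring_views.items()
        if server in view and server not in majority_ring
    )
    return majority_ring, strays


# Hypothetical example: FS4 still believes it holds a replica.
views = {
    "FS1": {"FS1", "FS2", "FS3"},
    "FS2": {"FS1", "FS2", "FS3"},
    "FS3": {"FS1", "FS2", "FS3"},
    "FS4": {"FS1", "FS2", "FS3", "FS4"},
}
majority, strays = find_stray_ring_members(views)
print(sorted(majority), strays)
```

As the TID points out, the problem server only shows up if its own view is included in the comparison, which is why the sketch takes a view from every server in the tree rather than only from the ring members.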

Search -621 621 -672 672 kill killing subref sr rw destroy delete.

Please let me know if it works.

paulnic (Author) Commented:

Good idea, thanks.  Only problem when I tried it in response to your message was that attempting to create any new replica of root gives us the -637 "previous move in progress" error.  In fact, it was when we tried to put a replica on a new server that the stuck obituary problem surfaced in the first place.


It's been taking a while to free up extra space on SYS, but we're almost ready to try your plan. The fact that the large TTS file was named backou00.tts instead of backoUT.tts supports the idea of a temp file, though I've no clue how the two filenames relate. Thanks, whatever happens.


paulnic (Author) Commented:
Well, it looked like the server had hung again--I lost rconsole and mapped drive access--but when I visited the site, DSREPAIR was showing its completion log,  the dying replica was gone, and the server had healed itself!  And NDS looked a lot better.

This time DSREPAIR processed different objects than had scrolled past during the last of our two attempts. And the TTS log had only reached about 152MB in size. My conclusion is that with each attempt it went further in the process, and that even after the server had become seemingly dead to the world, it was still processing the destruction of the dying replica. With enough disk space it might have run to completion had I not interrupted the previous attempts. (This does not increase my faith in 4.11's mechanism to balance resources among executing threads, and it would be nice if the bug were documented.)

Now the only mystery is how we were able to run a lengthy purge from FILER of what NWADMIN showed as 230MB of purgeable files, without having the free space reported by Explorer or TOOLBOX increase significantly.

Anyway, almost all of you had helpful & thoughtful responses so if you would like the points, be the first to post a note as an answer and you'll get them.


paulnic (Author) Commented:
We finally were able to contact Novell support yesterday.  Rep didn't provide a clear answer about whether TTS needs free space twice the size of its log file.  His conclusion was that our hang came from having less than 10% of disk blocks free on SYS as the operation progressed.  He said that's a common cause of 4.x server problems...so disk space is apparently a concern even if TTS doesn't need twice the size of its logfile.
That is good to know. Thanks.
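For anyone who wants to check both disk-space conditions raised in this thread before starting a long partition operation, here is a minimal sketch of the arithmetic. It assumes you read the volume figures off the server yourself; the function name is hypothetical, the 10% threshold is the figure quoted by the Novell rep, and the "log larger than free space" test reflects the unconfirmed temp-file theory above. The total SYS size in the example is a made-up number, since the thread never states it.

```python
def sys_volume_headroom(total_mb, free_mb, tts_log_mb, min_free_fraction=0.10):
    """Evaluate two disk-space rules of thumb for the SYS volume.

    total_mb, free_mb  : volume size and free space, entered by hand
    tts_log_mb         : current or expected TTS backout log size
    min_free_fraction  : 10% free-space threshold quoted by Novell support
    """
    if total_mb <= 0:
        raise ValueError("total_mb must be positive")

    free_fraction = free_mb / total_mb
    return {
        "free_fraction": free_fraction,
        # Novell rep's explanation: less than 10% free on SYS.
        "below_min_free": free_fraction < min_free_fraction,
        # Unconfirmed theory: TTS may need room for a second copy of its log.
        "log_exceeds_free": tts_log_mb > free_mb,
    }


# Figures from the thread (312MB log, roughly 250MB free);
# total_mb=4000 is an assumed volume size for illustration only.
report = sys_volume_headroom(total_mb=4000, free_mb=250, tts_log_mb=312)
print(report)
```

With these assumed figures both warnings trip, which is consistent with either explanation offered in the thread; the sketch cannot tell you which one actually caused the hang.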
Question has a verified solution.
