
Solved

4.11 hangs deleting dying replica w/TTS tweaked

Posted on 1999-07-19
Last Modified: 2012-08-14
We've adjusted the TTS settings as Novell recommends (TID 2939221) and still hang when using DSREPAIR -A "Destroy selected replica on this server" on a dying replica of [root] that we can't get rid of. It runs fine for maybe 45 minutes, and then hangs (without abending) with a transaction log file that's 312MB.  But there's still another 200-300MB free on SYS!

Any suggestions besides placing a Novell support call?  We need to clear a -637 error from stuck obits that were traced to the half-dead "dying replica."
Question by:paulnic
10 Comments
 
LVL 5

Expert Comment

by:jstegall
ID: 1597553
I would say you are running out of disk space, since the TTS log is larger than the free space on the drive. The system may create the new file with the changes before deleting the old one, in which case you don't have room for the temporary file. I would think it would shut down transaction tracking in that case. Try clearing another 100MB and see what happens. You could restore the deleted files from backups; you do have good backups, don't you?
 
LVL 3

Expert Comment

by:brosenb0
ID: 1597554
Check the server object is not inadvertently defined in another container, otherwise get Novell to dial in and DSDump the sucker.
 
LVL 3

Expert Comment

by:danilop
ID: 1597555
See the following TIDs about the -637 error: 2915160, 2942837.


 
LVL 3

Author Comment

by:paulnic
ID: 1597556
Danilop,

Thanks but we've been there.

Paul
 

Expert Comment

by:mwb_m
ID: 1597557
paulnic;

   Please read this TID from Novell; maybe it can help you:

Removing dying subordinate reference replica (Last modified: 9JUL1998) TID2939186

This document (2939186) is provided subject to the disclaimer at the end of this document.

Symptom:

Dying subordinate reference replica with inconsistent replica ring. The server in question is the only one that believes it is in the replica ring together with the master and the other replicas, while the other servers agree that it is not in the replica ring.

Solution:

All involved servers must be communicating.

Create a database dump file on the "dying" server in DSRepair, Advanced options. Just as a precaution.

From NDSManager add a read-write replica on the server. This will replace the dying subref with a read-write, which can then be deleted from NDSManager after it has advanced to On state.

Causes:

A part of the process of deleting a readable replica is to check the subordinate replica rings on the server. In case there are subordinate replicas of partitions below the one directly being deleted, they now become excessive and are deleted using normal replica deletion procedures (set replica state to dying etc).

Sometimes, particularly when deleting replicas of large partitions, a large number of transactions are generated on the server. This can cause TTS to be disabled and can potentially lead to an abend if it is not addressed properly. This kind of error can leave a subordinate replica in the dying state.

Please refer to TID 2939221 (Preventing TTS disabled during partition Op's) for further information on how to avoid this situation.

Troubleshooting:

Perform partition continuity on the replica ring in NDSManager to verify that at least the majority of the servers have the same replica ring information. The server with the dying replica will most likely not show up here, but this is caused by the fact that the servers being read do not include the problem server. It is not NDSManager reporting incorrect information, but since it does not contact all servers in the tree to see if "anybody else should happen to think it belongs to the replica ring" for the partition in question, it normally does not hit the problem server. Tree-wide replica ring verification can be performed with the Partchk utility from Novell Developer support available at http://developer.novell.com/engsup/sample/areas/delphis.htm.

Hardware Configuration
Search -621 621 -672 672 kill killing subref sr rw destroy delete.

Please let me know if it works.


 
LVL 3

Author Comment

by:paulnic
ID: 1597558
mwb_m:

Good idea, thanks.  Only problem when I tried it in response to your message was that attempting to create any new replica of root gives us the -637 "previous move in progress" error.  In fact, it was when we tried to put a replica on a new server that the stuck obituary problem surfaced in the first place.

jstegall:

It's been taking a while to free up extra space on SYS, but we're almost ready to try your plan.  The fact that the large TTS file was named backou00.tts instead of backoUT.tts supports the idea of a temp file, though I've no clue how the two filenames relate....   Thanks whatever happens.

Paul


 
LVL 3

Author Comment

by:paulnic
ID: 1597559
Well, it looked like the server had hung again--I lost rconsole and mapped drive access--but when I visited the site, DSREPAIR was showing its completion log,  the dying replica was gone, and the server had healed itself!  And NDS looked a lot better.

This time DSREPAIR processed different objects than had scrolled past during the last of our two attempts.  And the TTS log had only reached about 152MB in size.  My conclusion is that each attempt it went further in the process, and that even after the server had become seemingly dead to the world, it was still processing the destruction of the dying replica.  With enough disk space it might have run to completion had I not interrupted the previous attempts.   (This does not increase my faith in 4.11's mechanism to balance resources among executing threads, and it would be nice if the bug was documented.)

Now the only mystery is how we were able to run a lengthy purge from FILER of what NWADMIN showed as 230MB of purgeable files, without having the free space reported by Explorer or TOOLBOX increase significantly.

Anyway, almost all of you had helpful & thoughtful responses so if you would like the points, be the first to post a note as an answer and you'll get them.

Paul


 
LVL 5

Accepted Solution

by:
jstegall earned 1000 total points
ID: 1597560
Thanks, good to hear it got cleared up.  I assume we have a situation where we must keep trying until successful.
 
LVL 3

Author Comment

by:paulnic
ID: 1597561
We were finally able to contact Novell support yesterday.  The rep didn't provide a clear answer about whether TTS needs free space twice the size of its log file.  His conclusion was that our hang came from having less than 10% of disk blocks free on SYS as the operation progressed.  He said that's a common cause of 4.x server problems...so disk space is apparently a concern even if TTS doesn't need twice the size of its log file.
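
For anyone wanting to sanity-check their own numbers before a long DSREPAIR run, here is a rough sketch of the two conditions discussed in this thread: the 10% free-block figure the rep quoted, and the unconfirmed idea that TTS may need room for a second copy of its log. The function name, its parameters, and the 4000MB volume size in the usage note are made up for illustration; they are not anything Novell documents, and the second check in particular is only the hypothesis raised above.

```python
def sys_volume_headroom(total_mb, free_mb, tts_log_mb, min_free_fraction=0.10):
    """Rough headroom check for a SYS volume during a large partition operation.

    min_free_fraction=0.10 reflects the "less than 10% of disk blocks free"
    figure quoted by the support rep. The "room for a second log copy" test
    is an unconfirmed hypothesis from this thread, not documented behavior.
    """
    if total_mb <= 0:
        raise ValueError("total_mb must be positive")

    free_fraction = free_mb / total_mb
    return {
        # Fraction of the volume that is currently free.
        "free_fraction": free_fraction,
        # True when free space is under the rep's 10% threshold.
        "below_free_threshold": free_fraction < min_free_fraction,
        # True when free space could hold another file as large as the TTS log.
        "room_for_second_log_copy": free_mb >= tts_log_mb,
    }
```

With a hypothetical 4000MB SYS volume, 250MB free and the 312MB log from the original question, `sys_volume_headroom(4000, 250, 312)` flags both conditions: free space is about 6% of the volume, and there is no room for a second log-sized file.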
 
LVL 5

Expert Comment

by:jstegall
ID: 1597562
That is good to know, thanks.

Question has a verified solution.
