ArcServe Delay

Hi,

I'm backing up two NW6.5SP5 Cluster nodes using ArcServe r.11.1 and here's the problem:

Jun-02 18:32:17    168     Connect to node Server1 @ 192.168.3.100 (TCP/IP)
Jun-02 18:32:20    168     No estimate on backups
Jun-02 18:32:21    168.001 Connected with TSA Server1.NetWare File System
Jun-02 18:32:21    168.001 Process all files via SMS
Jun-02 18:32:21    168.001 LONG DOS MAC UNIX  files will be processed using
                           SMS Engine
Jun-02 18:32:23    168.001 Connected with TSA Server1.NetWare File System
Jun-02 18:32:24    168.001 Connected with TSA Server1.NetWare File System
Jun-03 08:35:03    168     The following 2 nodes will be backed up using data
                           multiplexing:
Jun-03 08:35:03    168     NWTreeName
Jun-03 08:35:03    168     Server2
Jun-03 08:35:03    168     Connect to node NWTreeName @ 192.168.3.100 (TCP/IP)
Jun-03 08:35:05    168     Connect to node Server2 @ 192.168.3.101 (TCP/IP)
Jun-03 08:35:22    168     No estimate on backups
Jun-03 08:35:22    168     No estimate on backups

[etc]

There is a considerable delay (about 14 hours) between 18:32:24 on Jun-02 and 08:35:03 on Jun-03.
The backup server is also NW6.5SP5, and all devices are connected to the same switch.  There are no other communication problems that I'm aware of.
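
To put a number on the gap, the two timestamps from the job log can be subtracted directly. This is only a small sketch: the log lines carry no year, so the year below is an assumption used purely to make the dates parseable, and the month abbreviation is parsed assuming an English locale.

```python
from datetime import datetime

# Assumption: the activity log prints no year, so an arbitrary one is
# supplied here only so the timestamps can be parsed.
ASSUMED_YEAR = 2006


def parse_stamp(stamp):
    """Parse a log timestamp such as 'Jun-02 18:32:24'."""
    return datetime.strptime(f"{ASSUMED_YEAR} {stamp}", "%Y %b-%d %H:%M:%S")


last_tsa_connect = parse_stamp("Jun-02 18:32:24")
multiplex_start = parse_stamp("Jun-03 08:35:03")

gap = multiplex_start - last_tsa_connect
gap_hours = gap.total_seconds() / 3600

print("gap:", gap)
print(f"gap in hours: {gap_hours:.2f}")
```

The result is a little over 14 hours between the last TSA connection and the start of the multiplexed part of the job.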

Any ideas or additional questions?

Best regards,

Michael

Asked by: MMDCisco

dotENG commented:
Theoretically, 1 Gb/s = 60 Gb/min = 7.5 GB/min, which is about 100 times faster than 74 MB/min.
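
The conversion above can be checked with a few lines of arithmetic. This is a rough sketch that assumes decimal units (1 GB = 1000 MB) and ignores protocol overhead, so it gives a ceiling, not an expected backup rate.

```python
LINK_GBIT_PER_S = 1.0        # nominal gigabit link
OBSERVED_MB_PER_MIN = 74.0   # rate reported for the backup job

# 1 Gb/s -> 60 Gb/min -> divide by 8 bits per byte -> GB/min
link_gbyte_per_min = LINK_GBIT_PER_S * 60 / 8
link_mb_per_min = link_gbyte_per_min * 1000  # assumption: 1 GB = 1000 MB

ratio = link_mb_per_min / OBSERVED_MB_PER_MIN

print(f"link ceiling: {link_gbyte_per_min} GB/min")
print(f"ceiling / observed: about {ratio:.0f}x")
```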

* Run tsatest
    - http://developer.novell.com/documentation/samplecode/smscomp_sample/tsatest/tsatest.html
    - TSATEST.NLM can be found in the SMSSRVR.ZIP file located in the \PRODUCTS\SMS\ directory on the current NetWare 6.5 and NetWare 6.0 support pack CDs and images.
    - http://support.novell.com/cgi-bin/search/searchtid.cgi?10092890.htm
* Try locking the duplex mode and link speed to 1000/Full duplex on both the servers and the switch; auto-negotiation can sometimes cause serious performance degradation.
* Time a simple 3GB file copy to measure basic LAN performance (use a compressed file or large mpg/avi).
* Enable debug mode/logged mode on ArcServe and NWAGENT.
* Check server error log and console log.
* If you can, change all server physical LAN cables.
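
For the timed file copy suggested in the list above, a small script run from a workstation with both volumes mapped could report the rate in the same MB/min unit that the backup job uses. This is only a sketch: the function name is made up for illustration, the source and destination paths are whatever you pass on the command line, and decimal megabytes are assumed.

```python
import shutil
import sys
import time
from pathlib import Path


def copy_throughput_mb_per_min(src, dst):
    """Copy src to dst and return the observed rate in MB/min (1 MB = 10**6 bytes)."""
    size_mb = Path(src).stat().st_size / 1_000_000
    start = time.perf_counter()
    shutil.copyfile(src, dst)
    # Guard against a zero interval on very small files.
    elapsed_s = max(time.perf_counter() - start, 1e-9)
    return size_mb / (elapsed_s / 60)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python copytest.py <source file> <destination file>
    rate = copy_throughput_mb_per_min(sys.argv[1], sys.argv[2])
    print(f"{rate:.1f} MB/min")
```

Use a large, already-compressed file as recommended, and compare the figure against the 74 MB/min seen in the backup job. Local caching can inflate the number for small files.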

MMDCisco (author) commented:
BTW: the backup job does normally finish.

dotENG commented:
Can you post the loaded versions of your tsa*.nlm?

Run
m tsa*
from your Server1 console.

MMDCisco (author) commented:
TSAFS.NLM                                                            
  Loaded from [SYS:\SYSTEM\]                                          
  (Address Space = OS)                                                
  SMS - File System Agent for NetWare 6.X                            
  Version 6.51 June 8, 2005                                          
  Copyright (C) 2002-03, 2005 Novell, Inc.  All Rights Reserved.      
TSANDS.NLM                                                            
  Loaded from [SYS:\SYSTEM\]                                          
  (Address Space = OS)                                                
  TSA for Novell eDirectory 7.x, 8.x                                  
  Version 10551.44.05 February 3, 2004                                
  Copyright 1999-2001 Novell, Inc.  All rights reserved.              
Server1:                                                                

MMDCisco (author) commented:
SMSStart.ncf:

Load SMSUT.nlm
Load SMDR.NLM
Load TSAFS.NLM /EnableGw=yes

dotENG commented:
Are you using NWAGENT.NLM?

If you are backing up the shared pool, try loading TSAFS /cluster /EnableGW=true

Check SLP: run "display slp services" on all servers to see whether you get the same results.
Do all servers show service:smdr.novell?
Do you have an SLPDA?
If you do, is it configured on all servers in sys:etc/slp.cfg?

MMDCisco (author) commented:
I'll add the /cluster switch to SMSStart.ncf

I am using NWAGENT.NLM

When I run "display SLP Services" on the two cluster nodes, the results are the same, with 28 Total URL's for "(All)/(default)/(Not specified)".  When I run it on the backup server, I get, ahem, 78 Total URL's for "(All)/(default)/(Not specified)".  The backup server sees additional services, including GWIA.

The thing most obvious about the results from the backup server is an entry not seen in Server 1 or 2:   afp://192.168.3.105/?NAME=SERVER3
Server1 and Server2 do not display an AFP result in the query.

I've looked at the SLP.CFG files: Server1 has an entry for Server2 and Server2 has an entry for Server1, but neither has a reference to Server3, and Server3 does not have any configuration entries in SLP.CFG.  Summarized:
SERVER1:
DA IPV4, 192.168.3.101
SERVER2:
DA IPV4, 192.168.3.100
SERVER3:

This is what I get for not studying as much as humanly possible.  I don't understand the implications of the above entries, nor do I understand the implications of the AFP entry in Server3's SLP Services query.

Any ideas?  I can't run the SMSStart.ncf test until tonight as we are back in production today.
Thanks in advance.

dotENG commented:
As I see it, there are a few options for configuring SLP:

1. Huge or very complicated network (10+ subnets, 5000+ clients): two or more SLPDAs
2. Normal network
 a. One or two SLPDAs
or
 b. One clustered SLPDA

So your configuration is either 1 or 2b,
meaning you should create a cluster resource for the SLPDA, load it on the cluster, and have the same slp.cfg on all servers (you can copy the file).

After the change you should run on all servers:
SET SLP RESET=ON

Wait a few seconds and run "display slp services" again on all servers.
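
As a quick way to see whether the slp.cfg files really are the same on all servers, the DA lines can be compared offline. The sketch below uses the three file contents as summarized earlier in this thread. The parsing is an assumption based only on the "DA IPV4, <address>" lines quoted there, and the helper name is made up for illustration.

```python
def da_addresses(slp_cfg_text):
    """Return the set of DA addresses found in the text of an slp.cfg file.

    Assumption: DA entries look like 'DA IPV4, 192.168.3.101', as quoted
    in this thread. Any other line is ignored.
    """
    addresses = set()
    for line in slp_cfg_text.splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) == 2 and fields[0].upper().split()[:1] == ["DA"]:
            addresses.add(fields[1])
    return addresses


# File contents as summarized by the asker (SERVER3's file has no entries).
configs = {
    "SERVER1": "DA IPV4, 192.168.3.101",
    "SERVER2": "DA IPV4, 192.168.3.100",
    "SERVER3": "",
}

da_by_server = {name: da_addresses(text) for name, text in configs.items()}
consistent = len({frozenset(addrs) for addrs in da_by_server.values()}) == 1

for name, addrs in da_by_server.items():
    print(name, sorted(addrs) or "(no DA configured)")
print("identical DA lists on all servers:", consistent)
```

With the entries posted above, the three servers each end up with a different DA list, which is the inconsistency the advice to share one slp.cfg is meant to remove.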

MMDCisco (author) commented:
I think I found the problem and it's the throughput of ArcServe itself.

To back up 66.88 Gigs at 74.4 MB/min it would take about 15 hours.  According to the NWAgent on one of the cluster nodes, that's exactly what is happening.
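
The estimate can be reproduced as follows. Whether the "Gigs" and "MB" figures are decimal or binary is not stated, so this sketch computes both as an assumption-labelled range.

```python
DATA_GB = 66.88          # amount of data reported for the job
RATE_MB_PER_MIN = 74.4   # throughput reported for the job

# Assumption: unit convention unknown, so try both 1000 and 1024 MB per GB.
hours_decimal = DATA_GB * 1000 / RATE_MB_PER_MIN / 60
hours_binary = DATA_GB * 1024 / RATE_MB_PER_MIN / 60

print(f"decimal units: {hours_decimal:.1f} h")
print(f"binary units:  {hours_binary:.1f} h")
```

Either way the answer lands at roughly 15 hours, which is in the same range as the overnight gap in the job log (18:32 on Jun-02 to 08:35 on Jun-03).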

So my delay appears to be just a very slow backup, which can have many causes. It is not the SLP communication, though: the node is found and the process starts as soon as the backup job is requested.

dotENG commented:
74 MB/min is very, very slow when using NWAGENT, unless you are backing up over a WAN?

MMDCisco (author) commented:
I'm not backing up over a WAN; as I mentioned, the machines are all on the same gigabit switch.  That's the irritating part.  So, I tried backing up to disk instead of tape and I get the same results.

ShineOn commented:
In your nwagent's ASCONFIG.INI, do you have, in the [NetWare Backup/Restore] section, the entry "performanceNonNakoma=TRUE" ?

In the [NWAgentLoader] section, do you have on the CSNLM2 line, "ARP cmp_max_req=32" ?

Do you have the TCP optional "Minshall's algorithm" turned on?  That helps to fix the ARCserve problem with delayed ACK and Nagle without turning them off...  "SET TCP MINSHALL ALGORITHM = ON"  - if that works for you, then add it to your AUTOEXEC.NCF files after the network loads are done, either that section of loads/binds or the initsys.ncf, depending on whether or not you use inetcfg...

MMDCisco (author) commented:
I will agree with the decision.  Please accept my apologies for the lack of a response.  Things have been crazy and I've just returned to the States after getting married abroad.

Your help was greatly appreciated.

ShineOn commented:
Congratulations.

I married a broad too ;) - just kiddin'

MMDCisco (author) commented:
To think, I actually removed the (Insert joke here) comment following the word "abroad".

jadedata (MS Access Systems Creator) commented:
Bah-Dump Bump!

Question has a verified solution.