Solved

Sybase Rep Server - Warm Standy Problem

Posted on 2008-10-21
9
1,437 Views
Last Modified: 2012-05-05
We use RS in Warm Standby configuration one of the databases had a problem today and there was a large number of rows generated that we not needed. As a result I truncated the table on the primary side. the problem is that the data still seems to be flowing through the repserver to the database on the warm standby side. The queue for this Db has been increasing in size all day and is now alomst 10 GB.

So I wanted to find out if there any way to delete the data ?

I attached a file with the output of the admin who, sqm & admin who,sqt commands.
repserver-102108.txt
0
Comment
Question by:IDCSSupport
  • 4
  • 3
  • 2
9 Comments
 
LVL 19

Assisted Solution

by:grant300
grant300 earned 200 total points
Comment Utility
The immediate answer is that you can shutdown the RS and it's agents and stop the flow of data.

As far as cleaning out the "unneeded rows" goes, you really need to get with Joe Woodhouse.  Joe is the definitive EE source for Rep Server.

You will probably want to have more information in hand than you have given us to this point.  For instance, the nature of the problem with the database and why you have built up such a huge backlog of "unneeded rows".

FWIW, my guess is that, if you are really determined and really understand the schema of the RS, it will be possible to remove the queued up rows before they get to the application stage but Joe is the guy who can really tell you what to do.

Regards,
Bill
0
 
LVL 24

Accepted Solution

by:
Joe Woodhouse earned 300 total points
Comment Utility
(Bill, that's quite a reputation to live up to!)

You *can* delete entries from queues, but I don't think that's a good way to go. You either have to do it entry by entry - will take forever in this case - or delete *everything*, and risk losing transactions you do in fact care about.

You can also increment the "generation ID" in the warm standby side, which will have much the same effect, it will ignore everything currently in the queues, and you'll have to increase the gen_id on the primary side as well so that all xacts from that point on won't be lost.

Either way you have a complicated business of having to confirm you're only ignoring xacts you don't want while guaranteeing all others are kept. Add to that the risk of getting things out of sync between primary and standby and this is a huge headache.

Your best bet at this point is to treat this as a lost standby. Resync the stadby - dump the primary database and load into the standby, following the method in the Replication Server Troubleshooting Guide. This is a standard scenario, having to restore the standby from the primary, and there are some big gotchas (disabling the replication truncation point in the standby after it's been loaded; incrementing the gen_id, restoring maint users if you've aliased them to dbo, etc). Go straight to the manuals for this one!

Restoring the standby might seem like a nuisance, but (if you read and follow all the instructions in the manual) it actually is pretty easy... and *much* easier (ten times easier!) than attempting surgery on the RS queues.

If you've never done a restore into a standby before it's very good practice and in principle can even be automated - I've seen a site that could do it with a single script. Very handy to have, since many different RS problem resolutions require this.
0
 

Author Comment

by:IDCSSupport
Comment Utility
Joe & Bill,

Thanks for your suggestions I was going to go ahead with them and I have run into another problem.
When I try to execute and commands in the repserver via isql I get error messages like this:

1> sysadmin purge_all_open, 117, 1
2> go
Msg 14024, Level 12, State 0:
Server 'RSMDR001':
Executor does not allow the entered command in the current mode.
1> admin who
2> go
Msg 14024, Level 12, State 0:
Server 'RSMDR001':
Executor does not allow the entered command in the current mode.
1> admin who_is_down
2> go
Msg 14024, Level 12, State 0:
Server 'RSMDR001':
Executor does not allow the entered command in the current mode.
1> add partition rep_dev08.dat 'D:\repData' with size 2048
2> go

In addition the RSM server dies peridically and then I loose the use of Sybase Central until I restart.

So know I'm stuggling to get anything done do you guys have any suggestions ?
0
 
LVL 19

Assisted Solution

by:grant300
grant300 earned 200 total points
Comment Utility
As I said before, how about you tell us what started this whole mess.  It may well be this is another symptom of the overall problem.

Regards,
Bill
0
What Should I Do With This Threat Intelligence?

Are you wondering if you actually need threat intelligence? The answer is yes. We explain the basics for creating useful threat intelligence.

 

Author Comment

by:IDCSSupport
Comment Utility
There were about 10 million rows generated as a result of an update and then a trigger firing.
0
 
LVL 24

Assisted Solution

by:Joe Woodhouse
Joe Woodhouse earned 300 total points
Comment Utility
"In the current mode" make me think you've restarted RepServer in "recovery mode" (usually necessary for some recovery commands. Either that or it's in single user mode. Shut it down and restart it normally.

If you read up on purge_all_open it will tell you you may have to end up recovering your standby database. The problem is you cannot guarantee that the only transactions in the queues are the ones belonging to this data explosion. If even one that you want to keep has entered the system, you'll be restoring/resyncing anyway... so you may as well just restore/resync now.

Many replication problems end up with a dump from primary and load into replicate. If I can't fix something within half an hour I just do that. It works every time, it's not very complicated and it usually is a *lot* faster than fooling around trying to repair things.

That said, 10 million xacts shouldn't have killed your system for days, I suspect you have wider-ranging problems here. Are you using (table-level) repdefs for your Warm Standby? Do they have "replicate_minimal_columns" set? Have you tuned Rep Server - overall memory, SQT cache, enabling smp?
0
 

Author Comment

by:IDCSSupport
Comment Utility
Joe,

I'm new at this company and as you can see of limited experience with Rep Server in the Warm Standby configuration. We are doing database level replication for all objects in this and three other databases.

It's my understanding that the Replication system was setup using rs_init so the parameters that you mention have not been modified. I looked in the config file for the Rep Server and none of them are mentioned.

I don't have a problem going ahead and Restoring the standby db but I can't find the directions you mention, could you include a link ? Also I am concerned that since the queues appear to be full none of the other db's are getting replicate either ?

Back to the error message I'm getting "Executor does not allow the entered command in the current mode" indicates that rep server was resatrted in Recovery mode that wasn't something I did, at least intentionally, in any case I rebooted rep and I'm still seeing the same thing, any ideas for how I can get out of this mess ?
0
 

Author Closing Comment

by:IDCSSupport
Comment Utility
Thanks
0
 
LVL 24

Expert Comment

by:Joe Woodhouse
Comment Utility
Most RepServer parameters don't live in the config file (it's not like ASE), or even in RepServer itself - they live in the associated RSSD (Replication Server System Database), which is either a database in an ASE, or an embedded RSSD (ERSSD). Assuming it's an RSSD in an ASE, you can see and set RS parameters with:

use [RSSD database name]
go
rs_configure
go

Some other parameters are not RS-wide and apply only to specific connections/DSIs.

Re. database-level replication - this is a very simple and convenient way to do things, and unfortunately also the slowest. And guess what, the worst performance hit is to updates!

This is a long and complicated topic, but very briefly without a *table*-level replication definition (repdef), suppose you have a table with 26 columns and you only update one of them in the primary database:

update [table]
set C=0
where A=19999   -- let's say A is the PK

Without a repdef, RS translates this in the replicate database as:

update [table
set A=19999
,     B=[old value of B]
,     C=0
[...]
,     Z=[old value of Z]
where A = 19999
,          B = [old value of B]
,          C = 0
[...]
,          Z = [old value of Z]

Pretty horrible, huh! It has to do this because without a repdef it doesn't know how to uniquely identify a row, so it has to put in all those WHERE clauses. It also isn't clear on which columns it needs to update (send all? send only the ones that changed? either could be valid in some circumstances) so it updates them all.

With a table-level repdef you can least get rid of the extra WHERE clauses... but you still set every column. You have to also turn out "replicate_minimal_columns" in the repdef to only set the columns that changed.

Note without a repdef (or minimal_columns), you're updating every column including indexed columns and the PK! This means whatever style of update it was in the primary database, it's now a deferred update in the replicate, so the same update statement takes much longer to run than it did in the primary. I'd say this was your first problem. Repdefs are really crucial for performance...

Re. that error message - what are you using to connect to RS? This can also happen if you connect using an old style of interface that tries to connect in ANSI CHAINED mode... which issues a "BEGIN TRAN" upon connecting... which RS doesn't understand. You can apparently ignore these messages. You won't get them if you connect with any modern Sybase client that uses ct-lib... like the isql that shipped with RS?

I have some bad news for you - I don't think it's possible to effectively maintain a RS system with neither prior experience nor the standard training course. You can pick up ASE if you've used any other enterprise RDBMS before, but nothing really prepares you for RS. Reading it all from the manuals is a tough ask. You need the training course.

Lastly there is a fantastic tuning whitepaper available from Sybase at http://www.sybase.com/detail?id=1015811 ... but it's going to be a lot to swallow without the training course first. :(
0

Featured Post

Find Ransomware Secrets With All-Source Analysis

Ransomware has become a major concern for organizations; its prevalence has grown due to past successes achieved by threat actors. While each ransomware variant is different, we’ve seen some common tactics and trends used among the authors of the malware.

Join & Write a Comment

Marketing can be an uncomfortable undertaking, especially if your material is technology based. Luckily, we’ve compiled some simple and (relatively) painless tips to put an end to your trepidation and start your path to success.
Describes a method of obtaining an object variable to an already running instance of Microsoft Access so that it can be controlled via automation.
This demo shows you how to set up the containerized NetScaler CPX with NetScaler Management and Analytics System in a non-routable Mesos/Marathon environment for use with Micro-Services applications.
This tutorial demonstrates a quick way of adding group price to multiple Magento products.

772 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

13 Experts available now in Live!

Get 1:1 Help Now