210007: LU allocate xlate failed / Failed to rep un_xlate for

Getting a %ASA-3-210007: LU allocate xlate failed from a 5540 firewall pair. I looked up 210007 in the Syslog Messages PDF and it says to check the memory. Memory is about a third utilized.

I do a
     debug fover fail
on both the primary and secondary, and on the secondary I get:
Failed to rep un_xlate for outside 10.x.x.x1/56616 - inside x.x.x.249/2055 flg: 2001000 2000002
Failed to rep un_xlate for outside 10.x.x.x2/63588 - inside x.x.x.249/2055 flg: 2001000 2000002

There are two different 10-addresses, each with a different port (as shown) and repeating back and forth. The "inside" addresses are always identical. These debug messages go on and on and on and there's no sense that they'll stop.
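With debug output that scrolls endlessly, it can help to save it to a file and count the distinct flows rather than read it line by line. The following is only a rough sketch: the regular expression is modeled on the two debug lines quoted above, and the function name `summarize` is made up for illustration. Other ASA releases may format the message differently.

```python
import re
from collections import Counter

# Pattern modeled on the two debug lines quoted above (an assumption;
# other ASA releases may print this message differently).
LINE_RE = re.compile(
    r"Failed to rep un_xlate for (\S+) (\S+)/(\d+) - (\S+) (\S+)/(\d+)"
)

def summarize(debug_text):
    """Count how often each (source, destination) pair appears."""
    counts = Counter()
    for line in debug_text.splitlines():
        m = LINE_RE.search(line)
        if m:
            src_if, src_ip, src_port, dst_if, dst_ip, dst_port = m.groups()
            src = f"{src_if} {src_ip}/{src_port}"
            dst = f"{dst_if} {dst_ip}/{dst_port}"
            counts[(src, dst)] += 1
    return counts

# The two lines from the post, with the first one repeated.
sample = """\
Failed to rep un_xlate for outside 10.x.x.x1/56616 - inside x.x.x.249/2055 flg: 2001000 2000002
Failed to rep un_xlate for outside 10.x.x.x2/63588 - inside x.x.x.249/2055 flg: 2001000 2000002
Failed to rep un_xlate for outside 10.x.x.x1/56616 - inside x.x.x.249/2055 flg: 2001000 2000002
"""

for (src, dst), n in summarize(sample).most_common():
    print(f"{n:4d}  {src} -> {dst}")
```

If only a couple of pairs show up, as described here, that points at a small number of sessions that keep failing to replicate rather than a general replication problem.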

I did
     sh xlate | inc x.x.x.249
and got a NAT address that didn't match either one of the debug un_xlate errors.

Both devices are on 8.4(5). I'm not showing any other errors. I'd try to compare the xlate tables, but they are over 60 pages long.
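Comparing 60 pages of xlates by eye is not realistic, but if the "show xlate" output from each unit is saved to a text file, a small script can do the comparison. This is a sketch under assumptions: it assumes each translation contains "from <interface>:<address> to <interface>:<address>", it ignores idle timers and flags, and the sample addresses and the names `xlate_entries` and `diff_xlates` are made up for illustration.

```python
import re

# Assumes each translation in the saved "show xlate" output contains
# "from <if>:<addr> to <if>:<addr>"; adjust the pattern if yours differs.
ENTRY_RE = re.compile(r"from (\S+) to (\S+)")

def xlate_entries(show_xlate_text):
    """Return the set of (local, global) pairs found in the text."""
    return {m.groups() for m in ENTRY_RE.finditer(show_xlate_text)}

def diff_xlates(active_text, standby_text):
    """Return (entries only on active, entries only on standby)."""
    active = xlate_entries(active_text)
    standby = xlate_entries(standby_text)
    return sorted(active - standby), sorted(standby - active)

# Made-up sample data, for illustration only.
active = """\
NAT from inside:192.0.2.10 to outside:198.51.100.10
    flags s idle 0:00:10 timeout 0:00:00
UDP PAT from inside:192.0.2.249/2055 to outside:198.51.100.1/40001 flags ri idle 0:00:05 timeout 0:00:30
"""
standby = """\
NAT from inside:192.0.2.10 to outside:198.51.100.10
    flags s idle 0:02:41 timeout 0:00:00
"""

only_active, only_standby = diff_xlates(active, standby)
print("only on active:", only_active)
print("only on standby:", only_standby)
```

Because idle timers differ between the two units, the script compares only the address pairs, so an entry that exists on both sides is not reported as a difference.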

Everything I see online says it's a bug, but was likely fixed with 8.4(2.3) (something like that). I don't necessarily want to just believe it and update if it isn't the case.

Any ideas?
ArchiTech89 (IT Security Engineer) Asked:
pgolding00 Commented:
re memory - try "sh block" and see if any block size has a current count or low water mark close to exhausted (i.e. 0).
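If the "sh block" output is long, the check can be scripted against a saved copy. A rough sketch, assuming the output has four numeric columns in the order SIZE, MAX, LOW, CNT (verify against your own output); the sample numbers, the function name `exhausted_blocks`, and the threshold of 5 are made up for illustration.

```python
def exhausted_blocks(show_blocks_text, threshold=5):
    """Return rows whose low water mark or current count is at or below threshold.

    Assumes four numeric columns per row: SIZE, MAX, LOW, CNT.
    Header lines and anything else non-numeric are skipped.
    """
    rows = []
    for line in show_blocks_text.splitlines():
        fields = line.split()
        if len(fields) == 4 and all(f.isdigit() for f in fields):
            size, maximum, low, cnt = map(int, fields)
            if low <= threshold or cnt <= threshold:
                rows.append((size, maximum, low, cnt))
    return rows

# Made-up sample data, for illustration only.
sample_blocks = """\
  SIZE    MAX    LOW    CNT
     0    400    398    400
    80   1000    971   1000
   256   2100      0   2080
  1550   6000   5200   5990
"""

for size, maximum, low, cnt in exhausted_blocks(sample_blocks):
    print(f"size {size}: max {maximum}, low {low}, current {cnt}")
```

A low water mark of 0 with a healthy current count means the pool was exhausted at some point in the past, which is worth knowing even if memory looks fine now.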

try "show xlate" with some options to reduce the output to just the sessions you are looking for. the global and local options are most likely the helpful ones. if they are still not enough, try "sh xlate | inc <ip or port numbers>" to reduce the output further.

this sounds like it might be a benign session replication failure. maybe try clearing the xlate or conn related to the troublesome xlate, on the standby (assuming it does not exist on the active) or on the active if it does appear there.

are the two 10 addresses or the inside .249 address from the messages involved with NAT or object NAT config? (8.4, so no static?) if so, do you have NAT for the same inside .249 address to two different interfaces, maybe outside and dmz? if so, run "fail exec st sh int ip br" (short for "failover exec standby show interface ip brief") - are all 3 interfaces involved with the NAT "up/up" on the standby? failover session replication fails if the interface is not up on the standby.

it's not recommended to run debug on the standby device.

i presume you know it's possible to execute commands on the standby from the active - "failover exec standby <whatever cli you like>". note that there's no cli help or tab completion available with this.
ArchiTech89 (IT Security Engineer) Author Commented:
Boy, thanks for all the good ideas!

Here's what we ended up finding out.
1. Memory was OK.
2. Configs were syncing, but xlates were not. This is part of what you were driving at: there were twice NATs configured. (Bug CSCue32221)
3. As you described, debug on the standby device was not a good idea; it actually caused the secondary to crash, because I had left the SSH session with the debug running and the image didn't like that. (Bug CSCuf68858)

We upgraded the OS on the secondary to 8.4(7), failed over, then upgraded the primary (now standby) to the same version and reloaded it. The errors went away. We did all of that from the active device, just like you noted, by using failover exec standby + command.

ArchiTech89 (IT Security Engineer) Author Commented:
This solution came from the Cisco TAC. Because of a related critical-priority issue, we escalated the problem to them to get a fast solution.

Great exercise: updating the image live with no downtime.
1-Use ASDM to copy the later version image to both firewalls in the pair.
2-From the CLI on the active unit:
   * in configuration mode, run boot system disk0:/newimage-k8.bin (substitute your image file name)
   * run write memory to save the change so the standby gets it
   * run failover reload-standby to restart the standby on the new image
   * wait for the sync messages (you will get a version mismatch warning)
   * run no failover active to fail over the pair
3-From the CLI on the new active unit:
   * run failover reload-standby again to restart the unit that was previously active and is now the standby
   * wait for the sync messages to complete
4-You can now run no failover active again if you want to get the primary back to active status

You've just upgraded the OS image in-place.