My program sometimes freezes; the log file shows error 10054 and ADS Error 7020

A week ago I installed our application at a new customer's premises. It has been a nightmare. Every so often the app hangs, and the only solution is to kill the process from Task Manager. Yesterday I examined the Advantage error log and found a whole stack of 10054 socket errors followed by ADS 7020 (User logged out) errors. At one stage, when the workstation had locked up, I examined the active queries in Advantage Data Architect. It showed active queries running for the locked terminal, and their percent-complete status was in the range of -495000.00%. The problem seems to happen when a transaction is committed to the database. I put some diagnostic messages into the code as follows:

 DisplayMessage('Please Wait...., Saving Transaction');
      StandardEurekaNotify(ExceptObject, ExceptAddr);
      TransactionSaved := False;

 TransactionSaved := True;

When the code executes it displays the message, updates the database, hides the message and then sometimes hangs.

We have other customers with a similar problem, but it only happens on their systems every couple of weeks.

Can anybody help?


Joe Woodhouse (Principal Consultant) commented:
Has Sybase Tech Support had any suggestions?

I'm not familiar with the new Advantage suite but if you're having socket errors and hangs, it's time to bring in the network people at this client (or your own network people). Do they have a healthy LAN? (What collision rate and retries are they seeing?) What is the  TCP_KEEPALIVE setting on the server? Is the Windows Server healthy? (Anything showing up in its Event Viewer?)

(These are necessarily vague hints at this point; hopefully we can zero in on the issue.)
robert_n_harris (Author) commented:
Hi Joe,
I raised an issue with Sybase Tech Support many weeks ago about a similar problem. It has not been resolved.

We brought in a network company a couple of days ago. They changed the ADS router and re-terminated the network cables. We also reconfigured Norton Security Suite on the server to ignore our database files. Other than that, they did not seem to know what else to do.

I went to examine the 'System Event Log' yesterday and got the message 'Event Log Corrupt'.

Could you tell me where to get the collision rate and retries, and also the TCP_KEEPALIVE setting?
Joe Woodhouse (Principal Consultant) commented:
I would've hoped your network people would have looked into more than just the physical wiring... oh well.

Open a command prompt, run "netstat -a". If you see a non-trivial percentage of the TCP connections (could be a lot of output) in a "TIME_WAIT" state then we suspect TCP_KEEPALIVE is hanging onto dead connections for too long.
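
As a rough aid for the check described above, the sketch below tallies TCP connection states from saved `netstat -a` output and reports the share in TIME_WAIT. It assumes the Windows column layout (Proto, Local Address, Foreign Address, State). The sample text, addresses, and function names are made up for illustration, and what counts as a "non-trivial percentage" is left to the reader.

```python
from collections import Counter

# Hypothetical sample in the Windows "netstat -a" layout; not real output.
SAMPLE_NETSTAT = """
Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:135            SERVER:0               LISTENING
  TCP    192.168.1.10:5000      WS01:1045              ESTABLISHED
  TCP    192.168.1.10:5000      WS02:1046              TIME_WAIT
  UDP    0.0.0.0:445            *:*
"""


def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP rows by the value in their State column."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows have four columns; UDP rows carry no State column.
        if len(fields) >= 4 and fields[0].upper().startswith("TCP"):
            states[fields[3]] += 1
    return states


def time_wait_share(states: Counter) -> float:
    """Fraction of counted TCP rows that are in TIME_WAIT."""
    total = sum(states.values())
    return states["TIME_WAIT"] / total if total else 0.0


if __name__ == "__main__":
    counts = count_tcp_states(SAMPLE_NETSTAT)
    print(dict(counts))
    print(f"TIME_WAIT share: {time_wait_share(counts):.0%}")
```

To use it on a real machine, redirect `netstat -a` to a file and pass the file contents to `count_tcp_states`.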

I'm not sure how to check for collisions or network retries under Windows, nor how to check the current setting for TCP_KEEPALIVE (I'm more of a UNIX person myself). For what it's worth, your network people totally should know how to do this, and should at least have looked into collisions and retries.

If the Windows Server event log is corrupt then I think we have to suspect more things are wrong with this box. At a minimum I would be looking to restart it, and perhaps make sure your database server is being given a fixed IP rather than DHCP (and that the DHCP server reserves that IP so no-one else can use it).

Sorry I can't help you more with this. I think we have to suspect both Windows and the network setup at this point. I'm not trying to claim it couldn't be a Sybase problem, but "Event Log Corrupt" is not something a healthy Windows machine will ever say.
robert_n_harris (Author) commented:
Hi Joe, I have examined the IP connections using 'netstat -a'. There were only 2 connections in a TIME_WAIT state. I searched the registry for TCPKEEPALIVE, but it does not exist. Maybe I have phrased the question wrongly. I need to find out more about 10054 errors. Specifically:

1. What is a 10054 error? I can see from searching the internet that it means 'Connection Reset by Peer', but what does that mean?

2. How can I reproduce it here on my development machine?

3. How can I correct it?

Sorry for the delay in getting back to you, but I have been doing a lot of thinking.
Joe Woodhouse (Principal Consultant) commented:
We're beyond where I can help you with this, I'm afraid.

"Connection reset by peer" means a connection was broken and your Advantage server thinks it was from the other end. It may or may not be correct about that.
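
To make "reset by peer" concrete, here is a minimal sketch that provokes a reset on a single machine over loopback. It does not involve Advantage at all, so it only shows what the error looks like at the socket level, not why the customer's network produces it. The function name is hypothetical. The server side sets SO_LINGER to zero so that closing the socket sends a TCP RST rather than a normal close, and the client's next read then fails with a connection reset (Winsock reports this as error 10054, WSAECONNRESET).

```python
import socket
import struct
import sys
import threading
from typing import Optional


def provoke_connection_reset() -> Optional[OSError]:
    """Return the error a client sees when its peer aborts the connection."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def abortive_server() -> None:
        conn, _ = listener.accept()
        # linger on, timeout 0: close() discards the connection and sends RST.
        # Windows packs struct linger as two shorts, most other systems as two ints.
        fmt = "hh" if sys.platform == "win32" else "ii"
        conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack(fmt, 1, 0))
        conn.close()

    server = threading.Thread(target=abortive_server)
    server.start()
    client = socket.create_connection(("127.0.0.1", port))
    server.join()
    try:
        client.recv(1024)  # blocks until the reset (or data) arrives
        return None
    except ConnectionResetError as exc:
        return exc
    finally:
        client.close()
        listener.close()


if __name__ == "__main__":
    print(repr(provoke_connection_reset()))
```

In the field the reset usually comes from something other than a deliberate abort (a crashed process, a firewall, or a device dropping state), which is why the server "may or may not be correct" about who broke the connection.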

My (old) sources tell me TCP_KEEPALIVE is handled in Windows in the registry setting


and that the (decimal) value is in ms. A good number is probably around 15 minutes, which would be 900,000 (decimal).
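
The millisecond arithmetic is easy to get wrong by a factor of 60 or 1000, so here is a small sketch of the conversion. The helper name is made up for illustration, and the two-hour figure is the Windows default keepalive interval.

```python
def minutes_to_ms(minutes: float) -> int:
    """Convert minutes to the millisecond value the setting expects."""
    return int(minutes * 60 * 1000)


SUGGESTED_KEEPALIVE_MS = minutes_to_ms(15)   # 900000
DEFAULT_KEEPALIVE_MS = minutes_to_ms(120)    # 7200000, i.e. two hours

if __name__ == "__main__":
    print(SUGGESTED_KEEPALIVE_MS, DEFAULT_KEEPALIVE_MS)
```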

The only way I can think to reproduce this on another machine would be to give the DEV machine the PROD machine's IP address (after changing PROD), and patching it into the same switch.

You need some network people to look at things like network addresses, settings and traffic. Checking physical cables etc was not a waste of time but is a bit strange as the first thing to test, let alone the only thing to test.

Joe Woodhouse (Principal Consultant) commented:
Was it just a KEEPALIVE issue?
robert_n_harris (Author) commented:
Hi Joe,
No. I really don't know. There was no keepalive setting in the registry. My original problem was that my program was freezing when committing a transaction. I noticed from Advantage's error log file that a socket error 10054 followed by a 'user disconnect' was logged whilst this was happening. Unfortunately, things are never what they seem. Last Wednesday and Thursday the program froze, but these error messages were not logged. This makes me think that something else is going on. If I could find a way of duplicating the 10054 error, then I might be getting somewhere.

On a separate note, they use Windows 2003 Server. Last week the system event log became corrupt on 2 separate occasions. This may also be related to the freezes they are having. On Thursday we changed all of the NICs to Half Duplex from the original setting of 'Auto'. We had no reported errors on Friday, although I worry that my customer is just getting used to rebooting rather than reporting the error.

Any further comments would be most useful.
Joe Woodhouse (Principal Consultant) commented:
Autonegotiation can indeed cause some network mischief if they're not all playing together nicely.

If there was no KEEPALIVE setting then it means you're getting default behaviour of 2 hours... but yeah, from your description that probably isn't the root cause, just making things worse when the problem occurs.

I was asking because I wasn't confident I'd earned the points. Now I know I haven't. Will keep thinking about this for you.