Networked application with running problems

Hello Experts,

I'm a Delphi 7 programmer and have developed a multi-threaded network application that queues and generates database reports in printed and PDF format. The problem is that the application works perfectly on all segments of the network when the network isn't busy. However, when the network gets busy, it still works fine on the segment in which I developed it, but in other areas of the network it either fails completely to produce the reports or intermittently misses reports or data on reports.

The situation has become serious now that the network team consider my application to be the problem rather than the network it's running on and I've been asked to consider redevelopment (about 9 months work with all of the reports to rewrite). However, I find this difficult to understand when it can run fine on my segment of the network.

Can anybody shed some light on this problem and confirm my suspicion that it is more likely a network issue than the app, which after all runs fine all the time on one segment of the network (busy or not)? Any suggestions as to what may be causing these inconsistencies would be greatly appreciated.

The details are as follows:

The network:
We are running on a NetWare system on IPX on a single Dell PowerEdge dual-processor server (purchased new last year) with a mix of mainly 100BASE-TX and a few 10BASE-T workstations over a fibre backbone. The network team has recently replaced the hubs with Allied Telesyn switches on the 100 Mbps segments of the network. At busy times there are probably in excess of 30 workstations running database apps from the server.

The database system:
We are running Extended Systems (now Sybase owned) Advantage Database Server (ADS) ver 7.1. There are a few small distinct databases and one very large DBF/CDX-based business-critical database. There is also a fairly busy medium-sized Paradox database running from the same server.

The application details:
My Delphi application queues and generates R&R Xbase reports. The application updates the ADS database before the R&R reports read from the tables. The application runs a synchronised thread which waits for each report to complete from the external report writer before submitting the next (uses an R&R control table for those who are familiar). ADS is client/server based, but R&R uses its own non-client/server Xbase driver to read the tables which therefore creates extra database traffic between the server and workstation. The R&R application that is called from my app runs locally on each workstation. The application and all the tables it creates are run from the server. At present there are around 5 workstations that may run the application concurrently.

If you require further details in order to answer this question, please don't hesitate to ask.

Thanks in advance,

How is the network segmented? You said it works on your network but not on other networks. How are they related or connected to your network and to the ADS server network?

You are running IPX/SPX. Is this the protocol your app uses to communicate with the ADS server?
Not to hurt your ego, but it probably is the application: not defects as such, but the logic in spawning processes and either waiting for the result or proceeding without it. It could be something as simple as the assumption that a network share \\remoteseg\server\data\* is available, with insufficient planning for sending messages carrying the process error code back to the message queue. To get into the details you'd have to explain the calls for the different processes, and that would be too much to post here. The write-before-read is a worry: it could fail utterly on a network where bridging from local to remote needs a different password or something equally simple. Each call will have to be preceded by a test call. Analyze the return code in debug mode, and I think you will find that something to do with inter-network privileges has not been taken into account.
What you need is a bandwidth analyzer to tell you what the problem is. Your application could be doing just fine but need a little bandwidth management for its utilization on a busy network.

There is a device called Packeteer which could help your network administrator sort out the problem. It's what we used on our network, a middle-sized ISP in Honduras, and it worked great for us.

Officially it's a little expensive, but if your staff can do without the updates and support, you could find them here, used of course and at a lower price...

Stick to the quality of your work and manage the environment in which it runs.

limsman (Author) commented:
Thank you all very much for your valued comments. Here are my replies to each:

naveedb: We have just the one network server connecting one medium-business sized network that has around 4 or 5 copper segments running over a fibre backbone. All but one are running at 100 Mbps; the other is running at 10 Mbps (due to be upgraded to 100 very shortly). So my network is the same as everybody else's on the system. It's just that my segment seems to be running OK for the app. My app does indeed communicate over the IPX/SPX protocol with the ADS server. The ADS server, by the way, is "database server software" running on the same server that everyone is connected to. Hope this helps?

scrathcyboy: Ego not hurt! I just want an honest opinion and if it's my software, then I need to tackle it. The code has been extensively tested and stepped through line by line in each thread. The synchronised process waiting appears to be working perfectly. That is, by stepping through, it waits for an infinite amount of time for the report writer to signal back that it has finished a report before the next process is spawned (the process being a Windows API call to R&R to run the next report). Everybody that uses the application has full Netware rights on the network as well as rights on the 2000/XP workstations where local ini and registry reads and writes are needed. As explained to naveedb, inter-network privileges are not an issue as we are all running on one segmented network. The write-before-read should therefore be no trouble since the users have full rights to update the tables and read from the same. Hope this clarifies?

CKWT: Thanks very much for your comment. I have recently completed CCNA parts 1 to 4 and now recall that a protocol analyzer was mentioned in troubleshooting networks. I shall check out your suggestions and recommend this to our network team.

I look forward to any further comments from my explanations that may help resolve the problem and shall close and award points shortly.
If you can go with a packet analyzer as suggested by CDWT, and someone is available who can read packets, that would most likely allow you to resolve the issue.

Few other things you can try:

At busy times, when users on other segments are unable to work with your app, are they able to access the Dell server? For example, if it is being used for file sharing, are they able to access their files?

Can you do a stress test on your application with different combinations? 1) Connect more than 30 workstations on your segment and see if it works. 2) Do not run the application from your segment; run it from other segments, have over 30 workstations accessing it, and then try to access it from your segment.

What is your fibre backbone? Is it a switch connected to other hubs? If so, could you swap the ports on the switch between your segment and a segment where the app is not working?
"it waits for an infinite amount of time"
OK, in that case you need to add a "re-request" for the packets lost when the network is busy, so you need the following test:
T = accumulated total time (seconds)
R = accumulated retries
>> send the call to write the file (wait time-out = 30 seconds)
on error: add the elapsed time to T and 1 to R, then check whether T > 360 or R > 12
loop back to >> if neither T nor R has reached its maximum; otherwise give up and report the failure.

The whole principle of Ethernet, and of the protocols that run over it, is to resend in order to overcome collisions and packet loss. You have to do the same in the app. This will be where the problem is: yes, it is OK to wait, but a busy network means repeated resends until the request gets through. You will have to experiment with R and T to reach a balance between flooding the network with too many retries at too short an interval and losing program functionality with too long a T. Also, very important: when you send a retry, you will also have to send a cancel for the previous request if writing the file more than once would create a problem. If it is no problem to write the same data twice, then let it go, as the cancel request is a lot of complicated programming.
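
As an illustration only, the retry test outlined above might look like the following Python sketch. The application in question is written in Delphi, so this shows the logic rather than drop-in code. The name `call_with_retries` and its parameters are hypothetical, the limits of 12 retries and 360 seconds are the figures from the outline, and the 30-second time-out on each individual call is assumed to be enforced inside `operation` itself.

```python
import time


def call_with_retries(operation, max_retries=12, max_total_seconds=360.0,
                      pause_seconds=0.0, clock=time.monotonic, sleep=time.sleep):
    """Call operation() until it succeeds or a retry/time budget runs out.

    retries plays the role of R and the elapsed time the role of T.
    Returns (result, retries, elapsed_seconds) on success and raises
    RuntimeError once either budget is exhausted.
    """
    start = clock()
    retries = 0
    while True:
        try:
            result = operation()  # assumed to apply its own per-call time-out
            return result, retries, clock() - start
        except OSError as err:  # hypothetical: treat I/O errors as "lost" requests
            elapsed = clock() - start
            if retries >= max_retries or elapsed >= max_total_seconds:
                raise RuntimeError(
                    f"gave up after {retries} retries and {elapsed:.0f} s"
                ) from err
            retries += 1
            sleep(pause_seconds)  # a pause between attempts avoids flooding the network
```

A caller would pass the file-write routine as `operation` and tune `max_retries`, `max_total_seconds` and `pause_seconds` by experiment, which is the balancing of R and T described above. This sketch does not send a cancel request, so it only suits writes that are safe to repeat.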
limsman (Author) commented:
Thanks again for your further help, with quite different answers from naveedb and scrathcyboy. As this has been my first time on EE, I've been overwhelmed by your help. I am closing down the question now so that some things can be tried. It may take some time! Hope you don't mind, but I feel it only fair to split the points between all three contributors for this question.

Here are my comments to the last two posts:

naveedb: I am going to ask the network team to test using a packet analyzer as first suggested by CDWT. In answer to your questions:
When users have problems with my app, generally speaking, they can still access their files from the Dell server, although it has been known to be slow. The stress test sounds like a very good idea and I shall be suggesting it to the team. I don't know about the practicality of it all, but I believe that there are devices that can emulate loads on networks by creating controlled traffic. By the way, today was part of our Easter holidays so only half the staff were there. One of the problem areas produced its reports just fine today without the normal heavier load. Thanks for your contributions built upon CDWT's original recommendation. As CDWT was the first to suggest this, I am going to accept CDWT's answer and split the award with 200 points each.

scrathcyboy: Thanks for your suggestion. I have recently coded something similar but it's pending testing. Basically, when R&R completes a report, rather than sending it directly to the printer, it creates a print file, which is checked for existence. If there is no file, the call is repeated up to 5 times, after which it exits the loop and informs the user of the failure. A check is also made after the database update before calling R&R so that the report is guaranteed to "see" the updated data. As you've mentioned, it's a matter of getting the balance between flooding the network and losing app functionality. It could do both! Writing the same data twice is not an issue. As you suggest, this all boils down to packet loss on the network when it's busy, and therefore the first objective must be to sort out any bandwidth issues. I will, however, take up your advice about putting in a wait time-out for R&R in case it ever freezes. Thanks for your contributions. They are much appreciated. I hope the remaining 100 points are acceptable to you?
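
For illustration, the print-file check and the wait time-out described above could be combined roughly as in this Python sketch. It is not the Delphi code from the application. The names `run_report`, `command` and `output_path` are hypothetical, the R&R command line is not shown in this thread and so is left to the caller, and deleting a stale print file before each attempt is an added assumption.

```python
import os
import subprocess


def run_report(command, output_path, timeout_seconds=30.0, max_attempts=5):
    """Run an external report command and confirm it produced output_path.

    Returns the number of the attempt that succeeded, or 0 if all failed.
    """
    for attempt in range(1, max_attempts + 1):
        if os.path.exists(output_path):
            os.remove(output_path)  # assumption: a stale file must not count as success
        try:
            # subprocess.run kills the child and raises TimeoutExpired if it
            # takes longer than timeout_seconds, so a frozen report writer
            # cannot block the queue forever.
            subprocess.run(command, timeout=timeout_seconds, check=True)
        except (subprocess.TimeoutExpired, subprocess.CalledProcessError, OSError):
            continue  # hung, failed, or could not be started: try again
        if os.path.isfile(output_path) and os.path.getsize(output_path) > 0:
            return attempt  # the print file exists and is not empty
    return 0
```

A return value of 0 is the point at which the user would be informed of the failure. In a Windows application the same idea is usually expressed by waiting on the process handle with a finite time-out instead of an infinite one.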
Points are your choice, but I still think you will find the solution is along my lines.  Good luck.