bbarr5179

asked on

Database running on client machines runs slow when more than one user is using it at the same time

I have just installed a server in a small medical practice. There are 4 workstations: 3 running XP Pro and one running Vista Business. The server is running SBS 2008. Each workstation has a database program on it and the server holds the data. Each workstation has a mapped drive to the data on the server, and the database program points to this.

When you go into the program (it uses Access 2007) you log in and you get the consultant menu. When you select the appropriate consultant the tables link and you then get that consultant's menu. To see the patient details you click on the patient screen button, and you get the patient details up where you can enter data etc.

Normally when you select the consultant and the database loads, it takes about 10 seconds, maybe less. However, if another user is in the patient details screen, either entering data or just idle, this loading process can take up to a minute or two, and if the other user logs out of the patient details screen during this it suddenly speeds up to normal.

Each workstation has a 10/100 Broadcom network card and the server has a Broadcom gigabit network card. These are Cat5 cabled to a PowerConnect 2816 Web-Managed Switch, 16 GbE ports. When I access the shared folders it is blazingly quick. If I drag folders from the server to a workstation it is like lightning, and it only takes seconds to log on to the workstations. I just don't get why it slows down with the database application. It didn't used to when it was peer to peer, with the data only on the machine that was acting as a server. When I look at the network stats while this is going on, the bandwidth use is minuscule. Can anyone help?
HainKurt

It looks like you may have to re-design your application...

for example, using SQL Server for the data and an Access application on each client...
or at least the Access data file on the server and the Access applications (queries, forms, reports, macros) on the clients...

Access is not a good solution for this purpose...
bbarr5179

ASKER

I think the latter is how it works, but I'm not sure as I didn't design the software. It's bought from a third party and was in use before I was involved in the network. When you install it on a workstation, the Access 2007 runtime has to be installed on the workstation separately if the version of Office you are using doesn't include it. Then the data is put on the server and shared, you map a network drive to the share, and you point the program to the mapped drive to find the appropriate data. This is how the support staff told me to set it up. It seems like Sage in this respect. I had a similar problem with Sage on a similar network, but it didn't slow down nearly as much, and I sped it up by stopping services on the server that weren't being used, like Exchange. I have done this here but to no avail.
Try disconnecting the Vista machine (from the network, or shut it down for a bit) and test the application between the XP machines.
Maybe the issue is with Vista, who knows...
I have shut down the Vista machine and it is instantly quicker.  The machine originally came with XP pro apparently so I am going to reinstall XP and see what that does
No go, still slow. Help?
I have worked in IT in the medical field as a consultant for over a decade. In my experience, programs that are designed to hold PIM or EMR data for small to medium sized businesses are trouble. An MS Access database really isn't wonderful at handling multiple, rapid concurrent connections. Your best bet is to replace the database back end with something that is designed for enterprise-level use. I would also suggest you speak with the vendor to see if the data can be safely migrated. They probably have a SQL version for a little more than what was already paid.

Justin
There are quite a few reasons why it is so much slower. Switching to SQL does not fix everything by any means.

MS has some tips, but with a third-party app you are only able to do a few things with Access databases. Compacting and repairing the Access database (after backing it up) is a good place to start.

It could be that you are constrained by the physical disk. Use Performance Monitor on the machine acting as a server to see if the disk is a bottleneck. If switching the machine back to XP doesn't fix it, then look at the performance counters. If the average physical disk queue is consistently above 1 or 2 during the slow periods, then the physical disk is a likely place to improve performance. If the physical disk transfers per second average more than 75-125, and/or if the disk time per transfer is > 0.020 seconds (20 milliseconds), then the disk gets another vote as a performance limiter. A command-line sketch for capturing these counters follows below.
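If it helps, here is a minimal sketch of logging those disk counters with the built-in typeperf tool; the counter names are the standard PhysicalDisk counters, and the sample interval, count and output path are just examples:

C:\> typeperf "\PhysicalDisk(_Total)\Avg. Disk Queue Length" "\PhysicalDisk(_Total)\Disk Transfers/sec" "\PhysicalDisk(_Total)\Avg. Disk sec/Transfer" -si 5 -sc 120 -f CSV -o C:\PerfLogs\disk-check.csv

Run it during a slow period and compare the averages against the thresholds above.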

If these are true, you should consider getting one of the G2 Intel 80GB or 160GB SSD 34nm M series SATA drives for storing the shared database files. It will eliminate most of the disk latency for the application. (There are still potential protocol issues like latency, op-locks, and inefficient application design.) On the more drastic side, running on all Vista/Win7/Server 2008 platforms can help, since the newer SMB2 protocol is about 2x as fast in many cases.

http://office.microsoft.com/en-us/access/hp051874531033.aspx

Access is NOT an enterprise application platform, but tuning and understanding how it is being used can go a long way towards making it work well in a small office environment with 2-10 shared users. It really just depends. With a little bit of homework you can most likely make this work just fine for the users.
There are some other basic system tuning things, like defragmenting the disks on the machine hosting the Access database on a regular basis, and putting the database on a separate drive letter with a larger cluster size. (The correct size can be determined by using Performance Monitor to learn what the average I/O size is for the application; a quick way to check the current cluster size is sketched below.) 10 seconds to load a screen is pretty slow for most applications. Normally you want it to be under 5 seconds per screen, as human factors studies show that we tend to get distracted at about 7-8 seconds.
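For the cluster-size point above, a hedged sketch: the built-in fsutil tool reports the current "Bytes Per Cluster" for a volume, and the /A switch on format sets a larger allocation unit when a data volume is (re)created. The drive letter is just an example, and format is destructive, so this is an illustration of the syntax only, not something to run on a live data drive:

C:\> fsutil fsinfo ntfsinfo D:
      (look for the "Bytes Per Cluster" line in the output)

C:\> format D: /FS:NTFS /A:64K /Q
      (example only - this erases the volume; move the data off first)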

You might try attaching a couple of the workstations at 1G data rates and see how that helps. 1G links have lower latency and can deliver about 2X the number of transactions of a 100M link. You are not saturating the link, but you want each Access disk I/O to be as rapid as possible. If you have to get new NICs I would pick Intel 1G NICs first and Broadcom second. Try to avoid really cheap ones, as they tend not to perform as well.

What kind of server setup do you have?  If you have a RAID controller in the system, it is VERY important that you have a battery backed up write cache with Access database applications.

Both for security/stability and for speed.  Without this, every database write waits on the physical disk to complete.  With it, the database writes are written to the cache at 10X to 100X the speed of the disks.

If you do not have a RAID system for the server I would strongly suggest either you get one or you get the SSD as mentioned before.  You can do both, and RAID SSDs, but it gets a bit expensive and there is a point of diminishing returns.

A couple of hours of testing and monitoring and you can get enough facts to solve this I am pretty sure.  I have been working on systems like this since before PCs had hard drives and while lots of things have changed, really it is not so different, we are just using much bigger numbers now.  KB->MB->GB->TB  :-)

Throughput is rarely the bottleneck today.  Bandwidth is very cheap.  Latency is the key to performance on most database application systems as far as the users are concerned.
There are some proof of concept tests you should run just in case.  Copy a very large file from the server to the WS and time it.  Then copy the same file back to the server. You should get 10-12MB/sec on a 100Mb link in either direction.  Time it on a WS with a 1G link as well to compare.  

Now copy the same file, using a WS, from the server to another place on the same database server volume; that will let you know how the server handles large bi-directional I/O on that disk.

This will set the upper limit on performance.  How large is the database BTW?

You can post any of this information here so we can look to see if anything seems amiss. While all of these tests are running, you can ping the server from some of the WS to make sure that latency does not climb for any of them, which could indicate some sort of switch/cable/NIC issue.

If the performance with one user is great, but 2+ is a problem, then you are likely having issues with op-locks and the caching/disk tuning could be the best way to assist...
One final note... Sometimes a drive letter mapped to the directory where the database files are can improve the response time of the connections, since UNC links are occasionally much slower.
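To illustrate, a mapped drive for the data can be created with the built-in net use command; the drive letter, server name and share name below are placeholders for whatever the actual share is:

C:\> net use P: \\SERVER\PPMData /persistent:yes

Then point the application at P: instead of the \\SERVER\PPMData UNC path.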
I have put the Vista machine back to XP Pro, however the application crashed on that machine last night, and even though I have reinstalled it, it is running really slowly. I have spoken to the company that created the software and they say the app normally takes 8 to 10 secs to load and that there are no tweaks I can do to speed it up. I have changed it to using a UNC path instead of the mapped drives and this seems to have made a difference, as has installing standalone antivirus on the server and all machines instead of using the control centre on the server to monitor all machines. I have also changed the share permissions on the data folder: instead of authenticated users having full control I have changed it to users and have added the user names as well. Now three of the machines seem to cope better. There is still some delay if one machine is doing a mail merge to create a letter, but it's better than it was. I will try your suggestions though and post any results.
Also, with the remaining machine that the app crashed on, I tried reinstalling the app to no avail, so I rang the company. They told me to reinstall the machine.
Monitoring the server during the slow times is the best way to catch the bottleneck. Vista <-> 2008 should be faster than XP <-> 2008. My tests show 2:1 better sustained throughput, but I have not done the more important latency tests yet myself. One issue with antivirus is that you want to avoid scanning the database files. Exclude them from the AV scans on both client and server.
OK, I am transferring a file across the network. The file is about 2.4GB. While the process has been going on I have been running Remote Desktop from another workstation into the server and running Performance Monitor. The network speed is a consistent 25Mbps and the disk speed is, at the highest so far, 2-5 MB/sec. I have also pinged the server from each WS and the response from the server is instant.
I have copied the same file back to the server and the time was the same, give or take a few seconds. I have also copied the same file to another location on the server and it took 1min 55secs. Over the network it took 7mins.
I have just done the same using the WS with the gigabit card in it, and it transferred the same file from the server in 5.26 mins.
Those are awfully slow numbers when using the network. Something is wrong. You are getting far less performance than you should, as a minimum, on the gigabit WS. The gigabit WS copy should have taken ~2 minutes or less.

There are several options that could be involved.  

Check for errors on the ports for both the clients and the server on the switch.
Possible faults here include the cabling.
Make sure the links are negotiating to 1Gb for the WS and server.



Check the TCP/IP offload on the server.  It should look something like this, but post the results if you can.

C:\>netsh int tcp show global
Querying active state...

TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State          : enabled
Chimney Offload State               : disabled
Receive Window Auto-Tuning Level    : normal
Add-On Congestion Control Provider  : ctcp
ECN Capability                      : disabled
RFC 1323 Timestamps                 : disabled


Fixing this should not be a huge problem, but there are many things that could cause these symptoms. The process is one of eliminating them one at a time. First we need the machines to be performing well on the network. Then you can tackle the application and disk I/O issues, if any...

To test the upper limit of the server in delivering data you can do these simple tests:

From the server command prompt, time this command:  (X: is the volume where the database file resides)

C:\> copy /b X:\HUGEFILE.DAT nul:

From the GigE workstation do the same, with the source being the same file you just tested; you are copying it from the server to the WS without writing it to the WS disk. This eliminates the WS disk as an issue. Assuming the machines are 2.5GHz+ and otherwise idle, the numbers should be fairly close up to about 50MB/sec. If the server number is greater than 50MB/sec, the WS will not really be able to keep up if it is XP.

C:\> copy /b X:\SERVERPATH\HUGEFILE.DAT nul:

A 100Mbps attached modern client should copy about 10-12MB/sec with this command.  GigE will be about 2X - 4X greater...
Rob Williams
Have you looked at "opportunistic locking" as a possible cause of the delays? It is something I am not very familiar with, but I was recently privy to a somewhat similar issue with a low-end database. Opportunistic locking is usually enabled by default, and for data protection probably should be, but some database vendors believe it needs to be disabled. In the case of the problem I was involved with, it had been disabled during the original configuration of the database app. Enabling it solved the poor performance instantly. It might be worth 'playing' with by enabling/disabling it; the registry values involved are sketched below the KB link.
http://support.microsoft.com/kb/296264
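For reference, that KB describes the registry values involved. Here is a hedged sketch of checking and setting them from the command line, assuming the value names from the article (EnableOplocks on the server service, OplocksDisabled on the XP client redirector). Back up the registry first and reboot after any change:

On the server:
C:\> reg query "HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters" /v EnableOplocks
C:\> reg add "HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters" /v EnableOplocks /t REG_DWORD /d 1

On an XP client:
C:\> reg query "HKLM\SYSTEM\CurrentControlSet\Services\MRXSmb\Parameters" /v OplocksDisabled
C:\> reg add "HKLM\SYSTEM\CurrentControlSet\Services\MRXSmb\Parameters" /v OplocksDisabled /t REG_DWORD /d 0

If reg query says the value cannot be found, it simply has not been created and the default applies.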
The point of these various checks and tests is to work through the 7 layers of the OSI model and verify the function and performance of each. We can't really check all 7 since they aren't all being used, but basically we are starting at the bottom (Physical) and working our way up to the application.

These are not all the real OSI names, just what we are testing here...

Layer 1 is the physical (the wire)
           2 is the MAC layer (the switch to NIC link so checking the stats on the switch is useful)
           3 is the IP layer (pinging and basic TCP/IP connectivity says this is likely fine)
           4 is the TCP layer (the netsh command and the file copies are helping with this)
           5 is the SMB layer (the file copies and Vista versus XP, op-locks, etc... live here)
           skipping 6
           7 is the Application layer which is mostly un-tuneable according to your vendor.

There are also the database and disk I/O issues as culprits, but if the LAN is performing this poorly, we should start there.
If one PC connects, performance is fine; if a second PC connects to the same database, there are performance issues. As a Microsoft Networking MVP I would be the first to suggest looking at the network infrastructure, especially the cabling. However, where performance is fine unless a second user connects to the database, it sounds like an issue with the database or application itself. As mentioned by HainKurt, Access is not an ideal product for this, but 2 users shouldn't be a huge issue, and it did work fine before. It sounds like a configuration issue of some sort.
I agree, it is just that there are probably 30 or 40 actual things to check.  When the network performance is this poor, (GigE native is running ~7.5MB/sec instead of 20-50+...  Barely more than 1/2 of what a 100Mbps link provides), it is hard to see much else working well.  

The way op-locks work, the first client is fast because it can cache the database and index files locally in the workstation file cache since no one else is laying claim to them.  That makes performance excellent and the network is not a primary factor for many apps since the file data ends up in the local WS cache much of the time.  All the locks are local, I/O is not network latency dependent, etc...  

Once the second user joins in, the world changes dramatically, the local WS cache must be flushed and now the network server file lock/unlocks and network I/O transactions are the bottleneck.  If the network (or server) is unhealthy in some way, it can really show up now as SMB/TCP retransmissions are painful and costly in terms of the number of database operations per second that can occur.

If there is a huge difference in 1 vs 2 users, then either there is a huge amount of data being processed from the local cache, or a very large number of disk transactions.  Any existing network/protocol/server weakness or problem, makes this performance gap ever larger.  In that case, while op-locks are very important, they are more of a symptom than a cause.

You obviously can't make most networked shared apps run as fast as they do locally with Access, but by taking a look at each layer in process, the problem should be eliminated and things should run well enough.  If not, you will have eliminated a huge percentage of the possibilities and will be much closer to the solution.  A bonus is that usually you will improve other pieces along the way and once the original problem is fixed, overall response will improve beyond where it was in the beginning.

I was in the office yesterday when I ran these tests. The disk response time when I transferred the file was about 5-8 ms. All the NICs are running at 100Mb, the gigabit one is running at 1000Mb, and the server is also running at 1000Mb. Yesterday I had it so that 3 machines were running fast; also, while in the database app I ran other apps like Office and Internet Explorer as well as accessing the shares on the server, and it all kept running fine. There was the odd slowdown, but if you took one machine out of the patient screen the speed went back to normal. Unfortunately I'm not going to be able to do more testing until next weekend, as the office is back to normal and they won't be able to spare the time. I ran all the machines with more than one app running at the same time with lots of windows open, and went from machine to machine accessing the database program, using it how they would and leaving it open when I went to the next machine, and it seemed to run fine apart from the couple of slowdowns I mentioned earlier. It could be the cabling, as there is much more cable than is needed and it is coiled up in places. But the database speed has improved. From what you are saying, though, there are issues. I don't think I will be able to deal with them until next weekend when the office is free.
As I said the guy I spoke to at the company that made the software said that he was running it all on a local machine and it took 8-10 secs to load. I have had 3 machines loading in between 2-5 secs even with multiple apps running and with files on the server open. When it slows it takes 40-50secs but like I said if you come out of the patient details screen on one machine it instantly speeds up.
Out of curiosity what is the name of the problematic application?

If you have coils of cable and any doubt about the cabling integrity, you should likely have it certified. As CoreyMac stated, you really need to make sure your basic network infrastructure is sound. Network cabling is not like telephone cabling, where if there is a connection it works. Stretched cables, improperly terminated cables, and proximity to EMI, especially if coiled, can all cause "crosstalk" and thus lost packets, which result in retransmissions that can take a network "to its knees". Certification has to be done by a proper installer and requires the use of a $6K - $10K piece of test gear. Any reputable installer will have one. Numerous high-end database companies will not install their product on your network unless you can provide a recent certification report.

Also you mention the speed of the NIC's. Have you locked any of these at a particular speed? Never do so unless you also have managed switches and lock those as well. If you cannot lock the switches, leave everything at auto-negotiate.

OK, I will set them to auto-negotiate; I have them set at full duplex, maximum speed. To be honest I don't think the database company care about the state of the cabling. Also, the old network ran fine on the same cabling with the new switch, as I put the new switch and server in but didn't switch over to the domain until I could talk to one of the database guys to set it all up on the server.
The name of the application is Private Practice Manager or PPM
>>" I will set them to auto negotiate "
Good. If one is set to auto-negotiate and the other to a fixed speed and duplex, sometimes the port can actually lock up when negotiation fails.

>>"To be honest I don't think the database company care about the state of the cabling "
Probably not, but you should  :-)
I was a cabling installer and troubleshooter in the NT days prior to doing IT work. It is amazing how many networks work, but at 25% efficiency and nobody realizes it.

I am not familiar with PPM. Just thought it might be one of the Medical Apps I have dealt with on occasion and would shed more light on the situation.
Given the performance you see now, there is likely a cable issue or a speed/duplex mismatch between at least one of the machines and the switch. The switch should be showing errors for any machines with problems. Errors and such you can investigate without taking down the network or users if you just look at the switch port stats. You can also use "netstat -e" to look at counters on the WS/server, as well as the Broadcom utilities; a quick before/after check is sketched below.

You should install the current Broadcom drivers and utilities on every machine, so that you can be sure the NIC settings are correct. Early Broadcom drivers and such were known to have serious configuration issues. They should not have to be the absolute latest, but no more than 6 months old I would say.
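As a rough sketch of the "netstat -e" check: run it on a workstation, copy a large file to or from the server, then run it again and compare the Errors and Discards columns, which should stay at or very near zero on a healthy link (the server name, share and file name are placeholders):

C:\> netstat -e
C:\> copy /b \\SERVER\Share\BIGFILE.DAT nul:
C:\> netstat -e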
Will check the drivers for each of the machines. I know the server ones are up to date as it's brand new, same for one of the other machines, but the other 3 probably need updating. I'm gonna sound really lame now, but how can I see if there are port errors on the switch? I know it's a managed switch but I'm only used to unmanaged switches. Do I need specialist equipment to manage it?
No you don't.  There are a couple of options.  A CLI for terminal access or the Web interface.

Here is the manual page for setting things up to be managed.  You can choose whichever method you are more comfortable with.

http://support.dell.com/support/edocs/network/pc28xx/en/index.htm

These switches are pretty easy to manage from the web interface once you get them set up. Read all the instructions first, though, if this is your first time. Some of the configuration changes will reset the switch and briefly drop all the network connections. Doing that during a work day is obviously going to be disruptive. Once you get into the web page, though, you can monitor statistics, check for errors, etc. without disrupting things.
I have set all the NICs to auto-negotiate. Everyone is in the office this morning and the app is running slow. I have spoken again to the manufacturer of PPM and they said that, as it was working before with the switch and cabling, it's probably something on the server. I checked the TCP/IP offload on the server and it is exactly the same as you posted earlier on here. I haven't managed to get into the switch yet as it's too disruptive; it will shut down the network and they are really busy. Will do it later when the office is clear.
I have just found out something interesting. One of the staff has commented that her machine has always run at the slower speed, even in the old configuration. Also, when I went to defrag this machine it just froze up and had to be restarted; I even tried a defrag in safe mode and it did the same. And it seems that when the other machines are running OK, if I go into PPM on this machine it slows the others. Could this one machine be causing this?
Yes, it could. File/record locking in Access is dependent on the slowest participant. If one machine locks a table and is very slow to unlock it, there could be effects on all other machines attempting to use that database table.

Unless this slow machine is the one tested, it would not explain the poor file copy performance you experienced during the tests. There could be multiple issues here to fully resolve.
Ahh this sounds like it could be it and it just so happens that this is the machine that I used to copy the file. Of course like you said there must still be an issue as this isn't the machine with the gigabit nic.
I have logged on to the switch; there are no errors, collisions, CRC or alignment errors, or fragments.
However, on the port the server goes into, after I cleared the stats and ran PPM again and got it to slow down, there were 194 dropped packets.
I have shut down all machines except the WS with the gigabit card in it and the server, and transferred the 2.4GB file to the WS again. At first it transfers at 5000mbps, then for some reason it drops and settles at between 30-49 mbps.
With gigabit it is sometimes helpful to enable flow control. That should eliminate the dropped packets. The question would be why it is dropping packets; it should not in a two-station file copy... Are the drivers up to date on the WS?

The burst speed is obviously bogus, and the 30-49 MBps is about right (IF you mean MegaBytes per second and not MegaBits). If this is MegaBits then there is still a problem.

With GigE there is auto MDI-X crossover for most devices, so you can eliminate the switch by plugging the WS directly into the server and assigning two static IPs.
There are several reasons why the switch would drop packets.  Flow control helps to illustrate who might be having issues.  

If the switch is unable to push out packets to the destination while new ones are coming in, then it would run out of buffers and drop them on the floor.  The data rates you are talking about are not high enough for that to be a problem.  At least not usually.  Dell and other low cost switches tend to have pretty small port buffers, but for just two stations that should not be a problem.

Why would a switch with only two active devices drop packets?  Either the switch is sick and not working or one of the end stations is not playing nice for some reason.

You could try a small 5 or 8-port unmanaged switch if you have one or the back-to-back cable in the earlier message.

So Mega BITS or Mega BYTES is the question...
Is it possible to move the PC and bypass the existing wiring and use a patch cable connected directly to the switch?
That file transfered this time in 3mins
The 2.4G file from the Server to the GigE WS?  Drag and drop or COPY /B?  Did you use RobWills suggestion or direct link or an alternate switch?  Basically I was wanting to know what changed.

How about flow control on the switch ports?
You can also check the Broadcom utilities on the client WS to see if there are errors on that side...

Many Broadcom and Intel GigE devices have a cable tester built in to the NIC port.  If the link comes up at GigE it will test all four pairs for certain issues...  You can use the driver/utility NIC interface to do so...
I used copy and paste. I found, though, that performance significantly reduced when I did it again with the other machines on. This I did after I had matched the port settings on the switch to the NICs on each machine; that's what had changed.
The switch seems to have a cable tester built into it. I haven't tried it yet though. Certainly now I've sorted out the switch settings, when looking at Performance Monitor while transferring the large file it seems to have made quite a difference, running consistently at 100-150 Mbps and higher as opposed to before. It's still made no difference to PPM, though. I am beginning to think that this application isn't meant to be on a domain and has been designed to run on peer-to-peer networks.
It should be able to function well in both environments.  At least I have not run across very many apps that really behave differently.  100-150 for single large file is still pretty slow for a GigE pair of links.

Were you able to eliminate the drops as seen by the switch ports?  Did the NIC management utility show anything about the configuration?  How about the NIC drivers?  I know there are a lot of things to check...  I am sure once you are able to take care of each one things should be better.  How much though sort of depends on the app.  What about compacting the Access database and defragging the server drive.  2008 server should do that automatically anyway, but it might be worth double checking.  Is the database on the same drive letter/volume as the OS on the server or is the drive partitioned?  What sort of storage/RAID is being used?
I have just seen this on ppm's website:
Additional Copies

You may load a copy of the software onto another PC at no additional charge. As 'PPM' uses one data file, only one copy can be used at any one time and the one to be used must have the latest data loaded onto it before work commences. This can be done over a Network, via the Internet or by portable data storage device..

Does this mean what I think it means that really ppm should only be accessed by one machine at a time?
Hard to say. They are being a little vague if you ask me. Surely not, though... What good is it with only one user?

Maybe they are speaking about the license on a per seat basis.
That's what I was thinking, but it would explain a lot. The guy I spoke to said it needs administrative rights to run, which on a domain is a big no-no; you never give administrative rights to a bog-standard user. They stated they had never put it on a domain before and that I would have to figure out the rights problem. All I can think is that it's for very small practices. However, this would explain what's happening: if all the machines are accessing a single file, surely they will have to negotiate on the server for access to the file, and that explains the slowdown when more than one person is using the application.
Well, after all that I decided to have another go at trying to sort the problem out. I went through the switch stats again and found some collisions and errors on a port, so I checked the whole switch again and made sure that each port matched the NIC it was connected to. The result: I had all four machines running the database simultaneously. There were still some slowdowns, but I feel it's getting somewhere.
It sounds like you are indeed making good progress.  :-)  Sounding better all the time.

Your explanation about the app being a single file that is shared is true, but that is how Access applications work anyway.  It does negotiate with the server to lock and unlock regions of the file for shared access, but with most applications and a small number of users (< 10) it can work just fine.

Normally it is application dependent, but there is not a huge amount of special coding you have to do in an Access application to make it shareable. Some apps do share better than others, but 4-5 users should be fine in most cases if the server and the network are healthy.

What kinds of numbers do you get back from Performance Monitor now? I/Os per second for the drives, queue depth, disk transaction times, network transactions, server, etc... You can use Performance Monitor to record the statistics to a file and post it here if you would like: CPU, memory, network, physical/logical disk, server, etc. You can just record it all for a few minutes while the four workstations are running the most delayed/sluggish transactions. If you can reproduce and record the problem, in all likelihood it can be resolved. (A sketch of setting up such a log from the command line is below.)
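For what it's worth, a minimal sketch of recording those counters with the built-in logman tool; the collector name, counter list and output path are only examples, and the same log can of course be set up through the Performance Monitor GUI:

C:\> logman create counter PPM_Slow -c "\Processor(_Total)\% Processor Time" "\Memory\Available MBytes" "\PhysicalDisk(*)\*" "\Network Interface(*)\*" -si 5 -f csv -o C:\PerfLogs\PPM_Slow
C:\> logman start PPM_Slow
      (reproduce the slow consultant load on the workstations)
C:\> logman stop PPM_Slow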

There are always bottlenecks, since if there were not, everything would be instantaneous.  :-)  The trick is to push them around and eliminate what you can one at a time so that the users perception is that life is good.  Partly engineering and partly psychology.  Fixing the pieces that cause them the most distress can help and those also tend to be the easiest to reproduce and fix.

One thing about shared-file apps like Access: many of the users will experience the delays caused by the slowest machines. After all the network issues are resolved, if the two 1G NIC workstations are significantly faster than the 100M ones, then likely they will all benefit from being upgraded to 1G clients. Let your testing tell the truth. Benchmarks are only general and rough guidelines; your application is the true test.

If the disk queue gets consistently above 1X-2X the number of drives used for data, then you likely have an I/O bottleneck that an Intel G2 SSD can fix.  You have not described the server configuration so that part is not yet clear.  Once the network is solid the most likely targets include the disk I/O on the server (mechanical stuff is nearly always the slowest part).  The type and model of RAID (if used) and the logical/physical drive layout can be very important here as well...
OK, I will post the server specs on here and will also record some data and post it for you to look at. I've just had a call from the office; they are saying that PPM is running slower than yesterday. When I left it last night it was running better. There was one machine that has always been slower, as I told you, and it was the lady that uses that machine that phoned me and said things were slower.
Here are the server specs:
Quad Core Intel® Xeon® X3230 Processor, 2x4MB Cache, 2.66GHz, 1066MHz FSB
4GB Memory, DDR2, 667MHz
500GB, SATA, 3.5-inch, 7.2K RPM Hard Drive (Cabled)

The HD is divided into 3 partitions: C: for the OS, D: data partition and E: recovery partition.

The D: partition is the largest. I don't have exact numbers as I'm not in the office at the mo.

The drive set up is how it came from dell.
Thanks.  Good luck today!  :-)
I captured the data for the disk over a period when the application is loading consultant data and it's slow. I have attached the files.
Network-Data.csv
Disk-Data.csv
These traces would indicate that the host should not be I/O constrained in any way.  There is only about 30MB of disk I/O on the server and about 40MB of network I/O and that is spread over about 260 seconds.  Really low data rates.  Disk queue is empty most of the time and so it all looks very quiet.

Looks like the next best thing might be a packet trace from the server NIC and one at the same time from a workstation during the slow down.  Using a similar time frame as the perfmon traces...

MS Netmon v3.3 is likely the best tool in this case as it can capture the process name generating the traffic when run on the machine using the application.  In this case though it may not matter much which tool is used since the traffic would not be a socket application like IE, but file sharing Access.

NetMon v3.3
http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=983b941d-06cb-4658-b7f6-3088333d062f

If you prefer, Wireshark is also very good for capturing traffic.
www.wireshark.org

Since we do not know what we are looking for, capture everything on both sides for the trace. Please do not use the machines for anything else during the capture if at all possible, partly because it will make the traces harder to interpret and partly because you do not want any passwords or anything sensitive in the traces. Zip up the file when you are done and see how large it is. Depending on the size we might do something different with the file. Don't upload it here just yet.

What kind of data is in the Access database?  If it is medical data that contains specific patient information it would violate HIPAA to post it here unencrypted.

To get around this there are a few options. One way is to tell the packet capture tool to only capture, say, the first 68 or maybe up to 100 bytes of each frame. 68 bytes might get us enough to see what is going on for now, and it will not contain many (if any) bytes of actual file data. I did some tests and 68 and 132 bytes is where the headers end and the data starts. (This is sometimes called slicing the packets; a command-line sketch is below.) In NetMon, 68 bytes is the default size I believe, and the setting is under Tools->Options on the Capture tab. Another option could be to email it encrypted to somewhere besides here, then send the password in a separate message to another address.
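If Wireshark ends up being the tool, a minimal sketch of a sliced capture with its bundled dumpcap utility would look like this; the -s option sets the per-frame snapshot length, run "dumpcap -D" first to find the right interface number, and the output path is just an example:

C:\> dumpcap -D
C:\> dumpcap -i 1 -s 68 -w C:\traces\ppm-slow.pcap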

Also try checking the switch ports again for errors or anything out of the ordinary.  I looked for a firmware update for your switch and they have really only had one update since the switch initially shipped.  The release notes did not indicate there was anything similar or serious for the most part in that update.  I suppose it should not hurt, but odds are not good it would help much based on what they documented.

What did the cable tests show from the switch ports?
I have tested the cables and found shorts on the one from the server to the switch and the one from the gigabit WS to the switch, and have replaced them with brand new cables, retested, and they register as OK. I also found cable faults on the cables to the router and to the consistently slow machine. I can't replace them at this moment in time as I have to get more and, with the roads the way they are in this country at the mo, I'm not going to be able to get some for a while. Also, I have done a network capture from a machine running PPM slowly and have saved the data. What am I looking for?
I have run the expert add-on in Network Monitor, the TCP analyzer. When PPM is running slowly, from the WS to the server it takes 56.062ms, the average data rate is 4.498 KB/s (36.784 kb/s) and the retransmitted data is 648 bytes.

From the server to the WS it takes 55.859 ms, the average data rate is 89.390 KB/s (715.117 kb/s) and the retransmitted data is 0 bytes.
The expert states it is slow because of limited bandwidth. This is when two WS (this one included) are running PPM at the same time, specifically one is in the patient details screen and the other is loading the consultant data.

Now if I log the WS in the patient screen out, it instantly speeds up. I have also recorded it when PPM is running quickly with no other WS logged into the patient screen. It is as follows.
Time Taken From WS to Server: 41.234ms

Average Data rate from WS to Server: 97.690KB/s (781.517)kb/s

Retransmitted data 0 Bytes

Time taken from Server-WS:41.421ms

Average data rate: 6.370KB/s (50.964kb/s)

Retransmitted data: 266

A few things are standing out here assuming the numbers from NetMon are correct.

A) No local LAN-based client/server should have 40-60 msec of latency for any shared-file-based conversation. The goal is to be not much slower than the disk as a worst case. Anything averaging above 20 msec is bad, and with caching on the server you should be much lower, in the 1-20 msec range.

B) Similarly, you should not have any significant retransmitted data (maybe one or two per hundred or even thousand packets over several minutes). 266 is a huge number for a local link.

C) The speed difference between what file copy performance numbers you get and what is more typical is still an apparent issue.

D) The speed difference between 1 user and 2 would seem to indicate that the physical link (cabling/switch port/NIC/TCP settings) is making the network unreliable and unpredictably fast/slow. This has a strong tendency to mess with the TCP Round Trip Timer and the RTT Time Out (RTT/RTO), which drastically slows down the network.

Overall these in combination with the previously discussed opportunistic lock behavior (comment ID 26164576) would indicate to me that once the network is solid, the lag times should dramatically decrease.  My best guess is that with only one user it is working so well because the application is caching the Access database file locally.  This mostly takes the network out of the equation since the database I/O is ~40MB and that easily fits into the cache.

-------------------

After the current network issues are resolved, going by the listed configuration, sharing a single SATA hard drive with multiple users as well as the operating system is less than optimal.  If you get the opportunity, add a second drive (either the Intel G2 SSD or a WD Velociraptor depending on their storage requirements and budget) to the server and put most of their heavily used data on it.

This will help make the machine behave more predictably and that make the users generally happier.

-CoreyMac
How do I go about resolving the network issues? I know that I need to change some cables, but all the machines have been reinstalled and are clean. The NICs are all set to match the switch, and I have checked the switch again for errors etc.; there are none. The WS that was running slow has been reinstalled and I have found that it also has a gigabit NIC, so I have adjusted the switch settings accordingly and it is running quicker on its own, but PPM still runs the same. Could this be a DNS issue? Or is it just a case of network bandwidth and latency? Like I said, I'm at a bit of a loss now for what to do. I have also looked at oplocks on two of the WS; it was disabled, however enabling it has made no difference whatsoever, and I looked for it on the server but couldn't find it, and I followed the Microsoft page to the letter.
OK.  Maybe I misunderstood.  

I thought from the comment 26272464 that you still had a cable problem with the slow workstation and the router.  I missed that you had replaced the server cable already.   My bad.

Since this is really an iterative process, let's retest thoroughly to see exactly where we are.

1)  All the cables should be verifiably good or those machines should be disconnected for this test.
2)  All the NICs and switch ports should be set at auto speed/duplex unless we find a specific reason for one of the 10/100 NICs not negotiating correctly.  (I have not yet had a 1G NIC fail to negotiate correctly when the drivers and switch code were current.)
3)  Verify the drivers are all current for the NICs.
4)  Lets test/exercise the network paths for those where we believe the cabling is solid.
      put a copy of the same large file (1GB is enough) on each of the machines (this is a scratch file we will be erasing later)
      make sure this file is accessible from each of the other machines (share the drive)
      defragment the disk drive where this large file is, on each machine, after it is copied
      reboot each of the machines and start clean
      reset the switch counters or reboot the switch
      make sure the server is idle and not really being used at the moment
      open a command window (CMD.EXE) on each machine and set the prompt "prompt $d$t$g"
            this will record the date/time in the screen so you do not have to hand time it.
      I am assuming there are two GigE WS (WS1 and WS2) and one SRV

      from each WS machine, one machine at a time do the following:
      repeating each copy at least two times.  if the numbers are not similar each run, that is important so do it three or four times:
       NETSTAT -S
       COPY /B \\SRV-IP\Share\BIGFILE.DAT NUL:
       NETSTAT -S
       COPY /B BIGFILE.DAT NUL:
       COPY /B BIGFILE.DAT \\SRV-IP\Share\BIGFILE.TMP
       NETSTAT -S

       COPY /B \\WS2-IP\Share\BIGFILE.DAT NUL:
       NETSTAT -S
       COPY /B BIGFILE.DAT NUL:
       COPY /B BIGFILE.DAT \\WS2-IP\Share\BIGFILE.TMP
       NETSTAT -S

      From the server do the same to/from each WS:
       NETSTAT -S
       COPY /B \\WS1-IP\Share\BIGFILE.DAT NUL:
       NETSTAT -S
       COPY /B BIGFILE.DAT NUL:
       COPY /B BIGFILE.DAT \\WS1-IP\Share\BIGFILE.TMP
       NETSTAT -S

       COPY /B \\WS2-IP\Share\BIGFILE.DAT NUL:
       NETSTAT -S
       COPY /B BIGFILE.DAT NUL:
       COPY /B BIGFILE.DAT \\WS2-IP\Share\BIGFILE.TMP
       NETSTAT -S

This will take a few minutes to get through them all, but it will let you create a little spreadsheet matrix that should provide valuable insight into the relative health of the network between each machine.

Create something like this:

The grey boxes are where you fill in the timings and you can set the file size in the upper left hand corner.  The times you can get from the command prompts and the calculations should be done for you in the spreadsheet.

A couple of things this will answer.  Which links are solid and which are not (if any)?  How consistent is the performance?  If there are issues, which links are the problem and are they causing excessive retransmissions (the NETSTAT -S command)

Please post the output of the commands if you can, so that we can see all the numbers and you don't have to type so much...  :-)
This will let us proceed to the next steps.
Srv-WS-Throughput-Chart.xls
Superb thanxs for that will do tomorrow eve and post results asap cheers
Good luck and have a fun time...

:-)
OK, I decided to start at the beginning. According to the office staff, PPM was running fine before I installed the server but AFTER I had installed the switch. At that time the network was peer-to-peer, the PPM data was on a WS and shared, and the other PCs connected to the WS share via a mapped drive to get the data.

So I have transferred the data to the gigabit WS, removed it from the domain and shared the data as before. I have also removed a 100Mbps WS and connected them both, with brand new cables, to the 10/100 switch that was being used before I installed the Dell switch. I have also updated the network drivers on both these machines, and the new cables have both tested OK. Guess what? PPM runs EXACTLY the same. On the machine that is acting as the server, so to speak, it always runs fast, as the data is local to that machine; however, on the machine trying to access the data, the exact same slowdown happens if the person on the GB WS is in the patient details screen.
I will still run the tests that you suggested, but will do it at the weekend when I have more time. Cheers.
When running it where the one remote client is the only one is the performance similar to the local client only scenario?
Sorry, I don't understand what you mean by remote client. If you mean does the machine that uses the network to access the data run the same as the one with the data on it, when I isolated just the two machines as I said earlier, the answer is: PPM on the machine that accesses the data over the network runs slow, the same speed as if it were on the domain with the other machines, but the WS with the data on it runs fast regardless. When I had just these 2 WS networked together, I forgot to say, I ran diagnostics on the NICs and they came back OK.
Sorry about that.  I meant if there is only one user in PPM and they are across the network, does it work well then?
Yes. But as soon as another person goes to log in, that person's PPM is slow loading the consultant, but only if the person who is already using PPM is in the patient details screen or the diary. If the person already in PPM is in the main menu, it loads quickly and swaps consultant quickly. This is the case whether it was just the 2 WS I networked together separately or whether it's on the domain. Sorry if I haven't been too clear.
I have tried to run the netstat tests that you posted, however whenever I try to run them I just get "network path not found". I have checked that the file is shared and accessible, and it is. I can even type the path into "Run" and it will take me to the file, but if I copy the path from Run and paste it into the cmd prompt it just says network path not found.
Also, if I ping each computer using its hostname it replies, and it also replies using its IP address. And if I type in the copy command exactly, i.e. with NUL: at the end, it says syntax error.
Were there any Windows or application updates about the same time as the slow down?
No not that i could see
Excuse me, as I might not have been clear enough.

The "netstat" command is one word, not two.  The copy commands require you to replace the "\\SRV-IP\Share" piece with the IP address and share name for the machine in question.

If the machine that is the server is using IP 192.168.8.4 and the shared directory is shared as "DATAFILES" then the copy command from a WS would look like this when copying a file called "BIGFILE.DAT":

COPY /B \\192.168.8.4\DATAFILES\BIGFILE.DAT NUL:

Does this help?  You can test it on your current PC if you share part of your disk.  

Most machines already have the administrative share "C$" or "ADMIN$" available if you are a local
administrator for that system.

A "NET SHARE" command run locally will show what is shared for a system.
No probs thanks for that will try as soon as I'm in the office again which will be Monday eve prob rushed off my feet this weekend. Thanks again
Sorry havent posted for awhile have been really busy.  Am back in the office on Sat will run the test then.
NP.  Have a good week.
Have done some of the tests, however I couldn't get them all to run successfully. Will post the results when I get home.
Thanks.
Hi, sorry I haven't posted the netstat results yet, I had a bit of a disaster; will post them later tonight. Just to let you know, I set up a virtual network with VMware of 2 machines, put PPM and the data on one of them and shared it so the other could access it, and PPM ran slowly when one of the machines was in the patient details screen, exactly the same (almost to the second, in fact) as on the domain in the office. So that eliminates the cabling, the switch and the server, I think; really it is only the application. However, I will still post the results of the netstat.
Does the application use an Access front-end database where each user's data and forms are local, but the shared data is in another file? Or is everything in one shared set of files?
I think that it uses an Access front end and the data is in a remote location, i.e. the server. Also, another thing has happened: PPM have given me an updated accde file to install. This is local on each machine and replaces the old one, and it has solved the issue. They tell me that it opens the files in a different way and that has solved the problem; now they all open quickly regardless of who's in what.
That sounds great. It would be very useful to the others here if what they did could be shared with the group. I know there has been an awful lot of time invested by you and others to resolve their programming issue
It could be that all your effort to resolve the problem made it clear to them that they had an issue to take care of. That may have been what motivated them to re-design the application file/open method.

I would think your users would be very happy now with the new changes.
I would not agree with having the problem deleted.  

There is a very considerable amount of time spent in helping the user to analyze the problem and test various solutions.  (quite literally several hours) There were apparently some issues with the network/workstation configurations even though the feedback from the user was minimal.  The steps involved would be the same whether the user comes back to acknowledge the results or not and would be useful to another user.

All of this could assist another user with similar issues. The ultimate answer, after eliminating the various connectivity issues, was to prove that the application was poorly written and was its own worst enemy. Once that was explained to the vendor and they re-worked the program, the problem was finally solved (apparently, anyway).
I don't care about the points, just the information about the testing and reconfiguration of the workstations, server and cabling.  

The comment from the author 02/09/10 07:30 AM indicates the last piece of the problem was solved when the vendor finally changed their application.  That is likely why the author is not coming back.  

For him, the issue was resolved. It is not about credit for the solution, just the various steps that are involved in getting to the point where the vendor's application was the most likely remaining problem. Once the vendor was convinced that everything else had been done, they decided to take a look at their application and fix it.
I agree with CoreyMac.  There is no way I am deserving of or wanting any points for this question, but I still think it has valuable troubleshooting info for others with similar problems.  My preference would be to close the Question, but keep it in the Knowledge Base.
ASKER CERTIFIED SOLUTION
ee_auto
