olecg

Switching apps after inactivity or browsing file shares is extremely slow

Hi all,

We have about 1,000 users spread across different locations, connected by high-bandwidth links to a central datacenter. At the central site we have various resources such as Citrix MetaFrame servers, file servers, print servers, etc. The scenario and problem are as follows:

All users have a "home" directory on a central server. This server also holds a program share which is used by the MetaFrame server farm for old, rather "stupid" applications that can be run from a file share. In addition to the central servers, we also have a local file server at each location holding the location share, where users can store files to share within the "location". The problem is that accessing the home area is extremely slow, while accessing the local file share is really fast. OK, LAN bandwidth is of course better than WAN bandwidth, BUT if we access another "local" share at a different site (department) across the WAN, it is really fast. Accessing the home area is slow all the time. We believe there is something badly wrong with this central server, and we have made some changes which gave some interesting results:

We have a mixed environment where XP clients (only) use the different file servers in the "ordinary" way. At the same time they also have access to a thin-client environment where they can run published applications such as invoicing. When a user logs in to a terminal server and launches the invoicing program, it works perfectly. All terminal servers run the "invoicing" application from a share on this particular central server, which also holds the users' home areas. After a period of inactivity (15-20 minutes), the application freezes when the user tries to bring it up again. Exactly the same happens if a user, for example, opens a Word document and an Excel sheet from his or her home area on the same server (directly from the XP client and NOT within the thin client). My point is that whatever we access on this server seems to freeze after a period of inactivity.

So, we suspected something strange with the server, set up a brand new server and moved the applications to a new share on it. We changed the publishing properties in Citrix, but the situation was unchanged: the applications were still freezing after some inactivity. The server holding the home areas should now have had fewer connections / sessions / open files, but it didn't help. Then we moved some of the applications to a directory on each terminal server, so that a published application is launched from the local disk. YES! This works. Now no one is experiencing the freeze. That's perfect, but we still have the issue with browsing, opening and saving files on the server for the home areas. We have also moved some of the users' home areas to another server at the central site, but that didn't help either. Again, browsing / opening / saving files on the location servers is really fast all the time, no matter where you are located in the WAN.
The difference between the central server and all of the decentralized servers is the number of sessions / connections to it. Can this be the issue?

We have also done the registry fix for stopping the signing of files. This helped indeed. Not for one big file, but it made a big difference when copying a large file structure containing many files. As I write this question, I have also applied the registry fix for the SizReqBuf value, which is now set to 0000ffff. It's too early to see if that helps, but tomorrow will show.

Any help is highly appreciated, as management is pressing us daily to resolve this problem.

rgs.
olecg
harleyjd

"We have also done the registry fix for stopping the signing of files"

I don't think this is the same thing, but I always disable SMB signing in Group Policy at both the Domain and Domain Controller levels. Disable:

Microsoft network server: Digitally sign communications (always)
Microsoft network server: Digitally sign communications (if client agrees)
Microsoft network client: Digitally sign communications (if server agrees)
Domain member: Digitally encrypt or sign secure channel data (always)

These guys cause more headaches per machine than they can possibly be worth...
olecg

ASKER

Have done that in the domain through GPO, and on the DCs. As I said, copying a file structure containing a lot of files improved dramatically. Copying just one or two bigger files didn't change much in time.

BTW: I'm keeping other possibilities open too. At the moment I'm exploring browser issues, in case there are workstations holding browser roles they shouldn't have. Maybe I'll force some registry entries on all machines to set which ones can become the Master Browser and which can't.
Anyway, thanks for the quick reply.

olecg
OK. I'll think some more about this, but don't wait for me... it might take a while...
Have you attempted to monitor the network traffic? Observe the hits?
What speed is the network bus that the central server is on?
Monitor the network; Maybe the NIC is the issue.
olecg

ASKER

We have traced network traffic in all possible ways, but it doesn't show anything strange. All servers are IBM HS20 BladeServers, grouped 7 by 7 in each centre. Every centre has both internal and external GBit/s ports, and they are linked to a stacked Cisco 3750 module (two of them).
Port utilization is low, so we don't have any capacity problems. NetIQ endpoints are also installed on all servers, and tests show around 700 Mbit/s throughput in all directions.
We have also replaced the server completely with a new one, but the same problem is there. Firmware, NIC drivers etc. are all brand new. We know about a newer firmware for the Cisco 3750 modules which fixes a bug where packets below 68 bytes can be dropped in certain situations. I don't believe that's the right direction.

The network trace shows CIFS, and of course SMB, as the "main" traffic. We believe the cause must be directly or indirectly related to SMB handling, but I really don't have a clue.
I should also mention that Offline Folders are used on the private area. We are also investigating what happens when we turn off Offline Folders.

rgs.
olecg
olecg

ASKER

More input.

Since yesterday, we have taken several "panic" actions just to see if SOMETHING affects the situation, and I must tell you that something strange is going on:
The hang or freeze happens after a period of inactivity, or the first time we try to browse the file share. A colleague of mine did the following:

He cleared the "Get the DHCP setting from server" option and entered a DNS server from an external ISP instead. After that he created a rather long hosts file containing all the servers he communicates with. What happened? Nothing froze or hung. He worked all day long without a single problem, and today he wanted to switch back. So he deleted his hosts file and set the config to get the DNS settings from DHCP again, rebooted the PC and started to work. After a few minutes of inactivity, Lotus Notes hung. After that, Word also hung with an open document from the infamous file share.

SO, if this has something to do with name resolution, how can we verify that the settings are correct and that DNS is working as it should? Obviously it isn't, but at the moment everything seems fine from the server side.

rgs.
olecg
DCDIAG

Run it from each DC to confirm that they are able to talk to one another properly. That's the first indication of DNS errors.

Run nslookup from the workstations. Look up the DCs, the Citrix box, whatever.

Make sure DHCP is handing out the correct IPs for the DNS servers. Make sure the DNS server accepts dynamic registrations. There's heaps to do.
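
The workstation-side lookups can also be timed with a small script. This is only a sketch, assuming Python is available on the client; `time_lookup` and the host list are hypothetical names, not something from this thread. It resolves through the operating system's resolver, so a hosts file entry is honoured (nslookup, by contrast, queries the DNS server directly), and it reports how long each lookup takes.

```python
import socket
import time


def time_lookup(hostname):
    """Resolve hostname once; return (elapsed_seconds, addresses or None)."""
    start = time.perf_counter()
    try:
        infos = socket.getaddrinfo(hostname, None)
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the IP address as a string.
        addresses = sorted({info[4][0] for info in infos})
    except socket.gaierror:
        addresses = None  # the name could not be resolved
    elapsed = time.perf_counter() - start
    return elapsed, addresses


if __name__ == "__main__":
    # Hypothetical list: replace with your DCs, file servers and Citrix hosts.
    for name in ["localhost"]:
        elapsed, addresses = time_lookup(name)
        print(f"{name}: {elapsed * 1000:.1f} ms -> {addresses}")
```

Running it once right after a reboot and again after a period of inactivity would show whether the lookups themselves are slow, or whether the delay sits somewhere else.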

olecg

ASKER

I can't believe what caused the problem. I don't know how many articles and postings I have read about the WebClient service, but that was the problem.
We even tried disabling the WebClient service on quite a few PCs, but the users didn't report any change in the "delay".
Now, several days later, we have discovered that most of the test users browsed file shares on the local server and not over the WAN. You see, the remote file shares are behind a firewall, and I have now spotted the port 80 request to the file share in the trace, which I missed some days ago. In the trace I see that the client gets neither an accept nor a refusal from the server, and keeps trying for about 20 seconds before falling back to SMB.
After changing the firewall rule from dropping port 80 traffic to the server to rejecting it, the client gets a TCP "connection refused" message within a few milliseconds and carries on with the SMB traffic.
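
For anyone who wants to check their own setup, the difference between the two firewall behaviours can be seen with a single timed TCP connect. A minimal sketch in Python, assuming it is run from a client on the far side of the firewall; `probe_tcp` and the server name are hypothetical, and the timeout is the script's own, not the roughly 20 seconds the XP client waited.

```python
import socket
import time


def probe_tcp(host, port=80, timeout=5.0):
    """Try one TCP connect; return (outcome, elapsed_seconds).

    "open"    - something accepted the connection
    "refused" - a TCP reset came back, as with a rejecting rule
    "timeout" - no answer at all, as with a dropping rule
    """
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "open"
    except ConnectionRefusedError:
        outcome = "refused"
    except socket.timeout:
        outcome = "timeout"
    except OSError as exc:
        # For example a reject sent as ICMP unreachable instead of a reset.
        outcome = f"error: {exc}"
    return outcome, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical server name: replace with the file server behind the firewall.
    outcome, elapsed = probe_tcp("fileserver.example", 80)
    print(f"port 80: {outcome} after {elapsed:.3f} s")
```

A "timeout" result means every client that tries port 80 first will sit and wait before it moves on to SMB; a "refused" result comes back almost immediately.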

Anyway, thanks for the suggestions. I'm really disappointed in myself for not verifying the results from the test users in more detail.

rgs.
olecg.

PS: I'm new to this "game". What should I do now? Give or not give some points?
Well, in simple terms, if one of these suggestions contributed to your answering it, you should award a B at least. A's should be a "that was it exactly, thanks", and C's are generally frowned upon. The worst thing about a C is that it goes on your record, and if someone sees your profile they might not help you out, thinking you're going to give low points.

D's are reserved by the mods, I haven't seen one, but I'm pretty new to this meself.

All that said, it doesn't look like we directly contributed to the answer, so you should request a PAQ and refund in the community support area. This will give you your points back, and put the question into the database for everyone else to find if need be. https://www.experts-exchange.com/Community_Support/

Please read https://www.experts-exchange.com/help.jsp#hs5 before you post. It's pretty easy...

ASKER CERTIFIED SOLUTION
GhostMod