Capt_Ron

IIS FTP Dropping connection

We have a strange problem that is driving our networking and mainframe guys crazy.

Our mainframe FTPs files to a Windows 2008 R2 server running IIS. The mainframe initiates an FTP connection and starts copying files to the FTP server. At some random point the IIS server times out during a mainframe DIR command and resets the connection. The problem is neither consistent nor predictable. I've set the connection timeout settings to 120 seconds in IIS, but the timeouts are happening in less than that.

I know this is not a lot to go on, but does anyone have an idea where we can look? All indicators point to IIS as the problem.
Thanks
Ron
SOLUTION
Jackie Man
This solution is only available to members.
ASKER

Thank you.
I made a modification to the Data Channel Timeout and we'll see if it makes a difference.
ASKER CERTIFIED SOLUTION
They are using z/OS and they have enabled the keepalive, but the disconnecting is still happening.

I set the following timeouts to 120 seconds:
Control Channel
Data Channel
Anonymous

Not sure where to look now.

Their process is:
Open connection
Copy File
Close Connection

For each file they transfer.
Any firewalls between them?
Yes, but the traffic passes straight through.
The firewall/network team has been working on the issue and they can tell what is happening but not why.
Since the Windows side seems to be the side terminating the connection, I would run a packet trace from that side to see how long it is actually waiting until it sends the reset.
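
Once the trace is exported, the wait before the reset can be measured with something like the sketch below. It is only an illustration: the `Packet` record, its field names, and the sample timestamps are made up, not taken from any real capture tool or from this case.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Packet:
    # Hypothetical record for one captured packet; field names are assumptions.
    time: float        # capture timestamp in seconds
    src: str           # which side sent it: "mainframe" or "windows"
    rst: bool = False  # True if the TCP RST flag is set


def idle_before_reset(packets: List[Packet], resetter: str = "windows") -> Optional[float]:
    """Return the seconds between the first RST sent by `resetter` and the
    packet immediately before it, or None if the trace has no such RST."""
    ordered = sorted(packets, key=lambda p: p.time)
    for i, pkt in enumerate(ordered):
        if pkt.rst and pkt.src == resetter:
            if i == 0:
                return None  # nothing precedes the reset in this trace
            return pkt.time - ordered[i - 1].time
    return None


# Made-up sample: the last traffic is at 0.5 s and the reset arrives at 45.5 s.
trace = [
    Packet(0.0, "mainframe"),
    Packet(0.5, "windows"),
    Packet(45.5, "windows", rst=True),
]
print(idle_before_reset(trace))
```

If the number that comes out is well under the configured timeout, the reset is probably not the IIS idle timer firing.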

How many files are in the directory that the dir is being done against?

Is the directory physically on the ftp server, or is it a mount on another server?

If it is a mount point, which version of SMB? Older versions of SMB are really inefficient when doing directory listings. The first step is to get a list of all files; the second step is to get the file information (last updated date, create date, attributes) for each individual file separately. So if you have 1,000 files, there are at least 1,001 network requests.

Does the mainframe side have to do a dir? Can it do an ls instead? The difference is that ls just returns file names, whereas dir returns file names plus file info.
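
To show the difference at the protocol level, here is a rough sketch using Python's standard `ftplib`. The mainframe client is not Python, so this is only an illustration. In many clients `ls` sends NLST and `dir` sends LIST, but that mapping is an assumption worth checking on the z/OS side. The host name and credentials in the usage comment are placeholders.

```python
from ftplib import FTP
from typing import List


def names_only(ftp: FTP) -> List[str]:
    """Issue NLST: the server returns file names only."""
    return ftp.nlst()


def names_with_details(ftp: FTP) -> List[str]:
    """Issue LIST: the server returns one line per entry, names plus file info."""
    lines: List[str] = []
    ftp.retrlines("LIST", lines.append)
    return lines


# Usage (placeholder host and credentials, not from this thread):
# with FTP("ftp.example.com") as ftp:
#     ftp.login("user", "password")
#     print(names_only(ftp))
#     print(names_with_details(ftp))
```

Both commands still open a data connection, so switching to ls reduces the work per listing but does not remove the data channel from the picture.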
OK.
The Windows server is waiting for a max of 2 minutes (corresponds to the timeout values I set).
Each DIR command is only looking at a few files (5-10 at most).
The directory is physically on the server.
I have not received a response from the MF team regarding the DIR / LS command.
We have a meeting early next week with all the parties in the room to discuss.  I'll see if I can get more information before then and post it.
Thanks
Ron
Have you checked the networking hardware, such as routers and switches?
Yes.
Our networking team has been looking at it from their perspective.  They are using an app called Opnet and Panorama to monitor IIS, Network in and out, and router/switch traffic all from mainframe to the FTP server.

The meeting never happened unfortunately.  The MF team has seen the number of instances of failure go down to once every week and they closed the ticket accepting that.   I'd still like to figure this out though.
Thanks
Ron
Anything in the IIS log?
The IIS logs show timing out waiting for a command from the mainframe.
The Mainframe logs show a timeout from the FTP server.
The network logs show all is good.
OK, there are only a few possibilities when both sides say they are waiting on the other:

1) The mainframe did not really send the command.
2) Windows received it, but for some reason does not know it.
3) Firewall ate it.

Guess which one I'm voting for?  #3.

When the mainframe says it sent something, it sent it.  

The majority of the time if Windows said it did not receive something, it did not receive it.

When something disappears on the network, it's typically a firewall.

Now you just need to convince somebody to run a packet trace on both sides of the firewall, the mainframe side and the Windows side.
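
With both traces in hand, the comparison is basically a set difference. This is a minimal sketch and rests on assumptions: each packet is reduced to a made-up key of (TCP sequence number, payload length), and nothing in the path rewrites sequence numbers. If a device does rewrite them, you would need a different key, such as a payload hash.

```python
from typing import Iterable, List, Tuple

# Hypothetical packet key: (TCP sequence number, payload length).
PacketKey = Tuple[int, int]


def missing_after_firewall(sent_side: Iterable[PacketKey],
                           recv_side: Iterable[PacketKey]) -> List[PacketKey]:
    """Return packets seen on the sender's side of the firewall that never
    show up on the receiver's side, in the order they were sent."""
    received = set(recv_side)
    return [key for key in sent_side if key not in received]


# Made-up sample data: the middle packet never reaches the Windows side.
mainframe_side = [(1000, 6), (1006, 24), (1030, 6)]
windows_side = [(1000, 6), (1030, 6)]
print(missing_after_firewall(mainframe_side, windows_side))  # [(1006, 24)]
```

Anything this returns was sent by the mainframe and lost somewhere in between, which is what would separate #3 from #1 and #2.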
LOL.

I voted for #3 too. But the "firewall" guys say that all internal traffic bypasses the firewall. We are having them run a trace on both sides anyway, but the problems have been scarce now that we increased the timeouts.
I'll post back when the problem happens again and hopefully I'll get the MF and trace logs to compare with the IIS logs.

Thanks again
Ron