For months now we have had a process that transfers a CSV file over an sFTP connection across the public Internet, and it periodically fails. The only way to get it going again is to log into the server; you don't have to do anything but log in.
We have a routine that runs every 15 minutes on a Windows 2012 Server that transfers a CSV file to an sFTP server on the public Internet. We are using a product called GoAnywhere Director. Periodically, and not on any clearly discernible timetable, that job will fail and it will report "connection refused". Initially I thought the problem was on the sFTP server side and out of my control.
The sFTP admin reported no such errors, so I began taking packet captures. I ran captures on my Cisco ASA but never saw any packets coming from the Windows server to the sFTP server. I logged into the server and the next scheduled job was a success. I ran a packet capture on that server, but every transfer succeeded, so I left it running for a few days.
After a while it failed again. I logged into the server to troubleshoot, but just as before it started working again. I checked the packet capture and saw that we send an ACK, but Wireshark reports that the sFTP server sends a RST, ACK, followed by a spurious retransmission.
I don’t believe that the reset is actually from them, despite what the packet capture reports. The reason is that during successful periods the time between the initial SYN and the next packet is 0.036972 seconds, while during a failure it is 0.00795 seconds. Furthermore, the next device in line (the firewall) should have seen the packets come through, but it never sees anything. Please refer to the attachment.
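To put numbers on that timing argument, here is a minimal Python sketch that compares the two deltas from my capture. The function names and the 0.5 cutoff are my own assumptions for illustration, not a standard rule; the only real data are the two measured values.

```python
# Deltas measured in the capture: time from the initial SYN to the next packet.
GOOD_DELTA_S = 0.036972  # successful transfer
FAIL_DELTA_S = 0.00795   # failed transfer (RST, ACK)


def delay_ratio(observed_s, baseline_s):
    """Return how long the observed reply took relative to the baseline."""
    return observed_s / baseline_s


def looks_too_fast(observed_s, baseline_s, threshold=0.5):
    """Flag a reply that arrives in well under the normal round trip.

    The threshold is a hypothetical cutoff chosen for illustration only.
    """
    return delay_ratio(observed_s, baseline_s) < threshold


ratio = delay_ratio(FAIL_DELTA_S, GOOD_DELTA_S)
print(f"failure reply took {ratio:.0%} of the normal delay")
print("suspiciously fast:", looks_too_fast(FAIL_DELTA_S, GOOD_DELTA_S))
```

The reset shows up in roughly a fifth of the normal SYN-to-reply time, which is why I suspect something closer than the remote sFTP server is answering. It is only a hint, not proof, since a single pair of samples says nothing about normal jitter.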
Thinking it was a service provider issue, we switched to another provider (I have multiple links to choose from), but the issues continued. The next step was to move the process to another server and use a different application (WinSCP) to transmit the file, but it has the same problem as GoAnywhere Director on Windows 2012. This server runs Windows 2008 R2 (the original was 2012 R2).
I thought the problem was Explicit Congestion Notification (ECN), so I disabled that on the Windows 2012 server, but that didn't correct my issue.
Right now, when the issue occurs, I receive an email about the failed attempt and then I log into the server. That works, but that also means I can't take a day off...
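One thing that might help narrow this down is a small probe that runs independently of GoAnywhere Director and WinSCP and records exactly when a plain TCP connection is refused, so the timestamps can be lined up against the Wireshark and ASA captures. Below is a rough Python sketch using only the standard library. The function names are hypothetical, and the host and port would be whatever the real sFTP server uses.

```python
import socket
import time
from datetime import datetime, timezone


def probe(host, port, timeout=5.0):
    """Attempt one TCP connection and return (outcome, elapsed_seconds)."""
    start = time.monotonic()
    try:
        # Open and immediately close a TCP connection; no SSH/sFTP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "connected"
    except ConnectionRefusedError:
        # A RST came back in reply to our SYN.
        outcome = "refused"
    except socket.timeout:
        # Nothing came back before the timeout expired.
        outcome = "timeout"
    except OSError as exc:
        # Anything else, e.g. name resolution failure or no route.
        outcome = f"error: {exc}"
    return outcome, time.monotonic() - start


def log_line(host, port, outcome, elapsed_s):
    """Format one result with a UTC timestamp for matching against captures."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    return f"{stamp} {host}:{port} {outcome} {elapsed_s * 1000:.1f} ms"
```

Calling `probe()` every minute or so from a scheduled task and appending `log_line()` output to a file would show whether the refusals happen for any TCP client on the box or only for the transfer tools, and whether they stop at the moment someone logs in.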