TCP Time Wait - Higher on Ubuntu 18

I have been running Ubuntu 14 with custom TCP tuning parameters for a couple of years.  I applied all of the same tuning parameters via Puppet, but on Ubuntu 18 the number of sockets in TCP TIME_WAIT is very high.  What is the best method of finding the source of this high TIME_WAIT count?  The process that is using the TCP connections is a Java application.

Please let me know any other information I should provide.

The graph on the left is Ubuntu 18.04; the one on the right is Ubuntu 14.04.

Thank you,

Reade
tcp-time-wait.png
Reade Taylor Asked:

David Favor (Linux/LXD/WordPress/Hosting Savant) Commented:
This is almost always due to broken application code.

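TIME_WAIT is the state a socket enters on the side that closes the connection first. It lingers so that side can re-send the final ACK if the peer retransmits its FIN, and so that stray segments from the old connection expire.

If the application is broken, connections are not torn down cleanly + TIME_WAIT state sockets pile up.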

Broken == Application terminates incorrectly, either an error or success termination where socket isn't closed correctly.

Fix: Review your application code + change code so all exits correctly tear down sockets using close.
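
Before reviewing the code, it may help to see which remote endpoints the TIME_WAIT sockets point at, since a socket in that state is no longer attached to a process. The following is a minimal Python sketch, not a definitive tool: it reads /proc/net/tcp, assumes a little-endian host such as x86, and covers IPv4 only (a Java application may be using IPv6 sockets, which are listed in /proc/net/tcp6 and are not handled here). The function names are made up for illustration.

```python
import socket
import struct
from collections import Counter

# Subset of the kernel's TCP state codes as they appear in /proc/net/tcp.
TCP_STATES = {"01": "ESTABLISHED", "06": "TIME_WAIT", "08": "CLOSE_WAIT"}


def decode_ipv4(hex_addr):
    """Decode a field such as '0100007F:0035' (little-endian host assumed)."""
    ip_hex, port_hex = hex_addr.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def count_by_remote(lines, state="TIME_WAIT"):
    """Count sockets in the given state, grouped by remote (ip, port)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Skip the header row and anything that does not look like an entry.
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue
        if TCP_STATES.get(fields[3]) == state:
            counts[decode_ipv4(fields[2])] += 1
    return counts


if __name__ == "__main__":
    with open("/proc/net/tcp") as f:
        for (ip, port), n in count_by_remote(f).most_common(10):
            print(f"{n:6d}  {ip}:{port}")
```

If most of the count lands on one remote address and port, that is probably the client code path to review first.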

Reade Taylor (Author) Commented:
Hi.  Thanks for your response.  The developer is reviewing to confirm.  The pre-4.12 kernel in Ubuntu 14.04 supported net.ipv4.tcp_tw_recycle, which handled TIME_WAIT sockets differently; that setting was removed in kernel 4.12, so it no longer has any effect on Ubuntu 18.04.  So it appears to me that the TIME_WAIT sockets are piling up, but are eventually closed by the 60s timeout.
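
For anyone comparing the two hosts, here is a rough Python sketch of that check. It reports which of a few TIME_WAIT related sysctls exist on the running kernel (tcp_tw_recycle should show as not present on 4.12 and later), and estimates the steady-state TIME_WAIT count as close rate multiplied by the 60s hold time. The 500 closes per second figure is a hypothetical rate for illustration, not a measurement from these servers, and the function names are made up.

```python
from pathlib import Path

SYSCTL_DIR = Path("/proc/sys/net/ipv4")
NAMES = ("tcp_tw_recycle", "tcp_tw_reuse", "tcp_max_tw_buckets", "ip_local_port_range")
TIME_WAIT_SECONDS = 60  # hold time mentioned above


def read_sysctls(base=SYSCTL_DIR, names=NAMES):
    """Return {name: value}, with None for sysctls this kernel does not have."""
    values = {}
    for name in names:
        path = Path(base) / name
        values[name] = path.read_text().strip() if path.exists() else None
    return values


def expected_time_wait(closes_per_second, hold_seconds=TIME_WAIT_SECONDS):
    """Steady-state estimate: sockets entering TIME_WAIT per second times hold time."""
    return closes_per_second * hold_seconds


if __name__ == "__main__":
    for name, value in read_sysctls().items():
        shown = value if value is not None else "(not present)"
        print(f"net.ipv4.{name} = {shown}")
    # Hypothetical rate: 500 connections actively closed per second.
    print("estimated TIME_WAIT sockets:", expected_time_wait(500))
```

With those assumed numbers the estimate is 30,000 sockets, so a count in that range would be consistent with sockets simply waiting out the timeout rather than leaking.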

Reade
David Favor (Linux/LXD/WordPress/Hosting Savant) Commented:
You're welcome!

Around 30% of the TCP subsystem code had been rewritten as of Kernel 3.18. By the time Kernel 4.20 was released, that figure was likely approaching 60% or more.

There are two general ways to fix this problem.

1) Bandage over the problem using Kernel tunings. This approach will always fail again in the future, either from normal traffic increases or Kernel code rewrites.

2) Fix code causing problem. This approach... well... actually fixes the problem, so there's no reason to play games with Kernel tuning.