Memory consumption on Solaris-10

We have two Solaris 10 x86 servers, both running on VMware, with the following specifications:
bad-server - 4 GB memory and 2 vCPU
good-server - 16 GB memory and 4 vCPU

The same application runs on both servers; it runs queries and reads from a file. Queries are failing on bad-server, while good-server is fine. The sar reports on bad-server show that total memory utilization never goes above 30%.

On further investigation, we see that once PID 1243 (the application's process ID) reaches about 900 MB of RSS (from prstat output), queries start failing. We attached truss to that PID and found the line below:
/1243:   1.6180 open("/export/correctaddress/data/ltravel.wrk", O_RDONLY) = 23
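
To pick out slow calls from a longer truss capture, a small filter along these lines might help. This is only a sketch: it assumes every line has the same shape as the one above and that the number after the colon is a per-call time in seconds, as it is read in this post. The names `TRUSS_LINE` and `slow_calls` are made up for the example.

```python
import re

# Matches lines shaped like the one quoted in the post:
#   /1243:   1.6180 open("/path", O_RDONLY) = 23
# Assumption: the second field is a time in seconds for that call.
TRUSS_LINE = re.compile(
    r"^/(?P<ident>\d+):\s+(?P<secs>\d+\.\d+)\s+(?P<call>\w+)\((?P<args>.*)\)\s+=\s+(?P<ret>\S+)"
)

def slow_calls(lines, threshold=1.0):
    """Return (seconds, syscall, args) for calls at or above threshold seconds."""
    hits = []
    for line in lines:
        m = TRUSS_LINE.match(line.strip())
        if m and float(m.group("secs")) >= threshold:
            hits.append((float(m.group("secs")), m.group("call"), m.group("args")))
    return hits

sample = ['/1243:   1.6180 open("/export/correctaddress/data/ltravel.wrk", O_RDONLY) = 23']
for secs, call, args in slow_calls(sample):
    print(f"{secs:.4f}s {call}({args})")
```

Lines that do not match the pattern are skipped silently, so other truss output formats would need their own pattern.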


The open takes more than one second, and this fails the query; on good-server it takes around 0.030 seconds. When this happens, the application has to be restarted, after which its RSS is about 100 MB. After a couple of hours it is 300 MB, then 500 MB, and once it gets near 900 MB, queries fail again. As a result, the application has to be restarted every 7-8 hours. Can somebody explain this, given that total memory consumption never goes over 30%? We would have increased its CPU and memory to match good-server, but the system should tell us if it is crossing a threshold value.
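
The figures above imply a rough growth rate, which could be used to warn before the 900 MB mark is reached. The sketch below assumes linear growth and takes 7.5 hours as the midpoint of the reported 7-8 hour window; `growth_rate` and `hours_left` are hypothetical helper names, not part of any tool mentioned here.

```python
def growth_rate(samples):
    """Average MB/hour between the first and last (hours, rss_mb) sample."""
    (t0, r0), (t1, r1) = samples[0], samples[-1]
    return (r1 - r0) / (t1 - t0)

def hours_left(current_mb, limit_mb, rate):
    """Hours until limit_mb at a constant rate; 0.0 if the limit is already reached."""
    if current_mb >= limit_mb:
        return 0.0
    if rate <= 0:
        return float("inf")  # not growing, so the limit is never reached
    return (limit_mb - current_mb) / rate

# Figures from the post: about 100 MB after a restart, about 900 MB when
# queries fail. 7.5 h is an assumed midpoint of the 7-8 hour window.
samples = [(0.0, 100.0), (7.5, 900.0)]
rate = growth_rate(samples)
print(f"average growth: {rate:.1f} MB/hour")
print(f"hours left at 500 MB: {hours_left(500.0, 900.0, rate):.2f}")
```

Real RSS samples taken from prstat at regular intervals would give a better estimate than these two endpoints.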
Thanks in advance.
Asked by Dip Sh
Joseph Gan (System Admin) commented:
Are both servers in the same cluster on VMware?
Dip Sh (Author) commented:
They are in the same cluster, but in different storage groups. I got confirmation from the VMware admin that the storage policy and settings are the same for both datastores.
Joseph Gan (System Admin) commented:
This sounds like a disk (storage) issue rather than a memory issue.

Dip Sh (Author) commented:
I ran a write test and a read test on both servers with the "time dd ....... ..... ...." command, and both take almost the same time to complete (when there is no issue, i.e. when RSS is less than 800 MB). I didn't get a chance to run the dd test while the problem was occurring.
So I was not able to prove whether it is a storage issue.
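
Since the dd test could not be run during the problem window, one option is a small probe that keeps timing a read-only open of the same file and logs only the slow ones. This is a minimal sketch, assuming Python 3 is available on the host; `time_open` and `probe` are hypothetical names, and the 1-second threshold simply mirrors the truss observation.

```python
import os
import time

def time_open(path):
    """Return seconds taken by a read-only open() of path; the descriptor is closed afterwards."""
    start = time.perf_counter()
    fd = os.open(path, os.O_RDONLY)
    elapsed = time.perf_counter() - start
    os.close(fd)
    return elapsed

def probe(path, interval=60.0, threshold=1.0, rounds=None, sleep=time.sleep):
    """Time open() every `interval` seconds and report samples at or above threshold.

    rounds=None runs until interrupted; a number stops after that many samples.
    Returns the list of slow timings collected.
    """
    slow = []
    n = 0
    while rounds is None or n < rounds:
        elapsed = time_open(path)
        if elapsed >= threshold:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} slow open: {elapsed:.4f}s {path}")
            slow.append(elapsed)
        n += 1
        if rounds is None or n < rounds:
            sleep(interval)
    return slow
```

Running something like `probe("/export/correctaddress/data/ltravel.wrk")` from a separate process would show whether opens are slow for every process or only inside the application. If the probe stays fast while the application's own open is slow, that would point away from storage, though it would not prove it.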
Joseph Gan (System Admin) commented:
Keep in mind that storage groups under VMware are shared with other virtual servers. When the problem occurs, many servers may be using the same storage disks, which could cause a disk performance issue.
Dip Sh (Author) commented:
I just checked: there are 5 VMs in each datastore. At the VMware level, as well as the storage level, I do not see any memory or CPU utilization peaks in the past 7 days. Both datastores come from the same storage, with the same policies applied to both.
I can dig further, or open a case with the storage vendor. But before that, I would like to confirm there is no issue on the OS side, which I cannot do right now.
How do I explain that, once the application reaches a certain memory utilization, it starts taking longer to open one file?
Joseph Gan (System Admin) commented:
Compare the patch levels of the two servers. Your VMware admin should be able to show the storage performance at the times your server had issues. The 5 VMs in each datastore are not identical, so performance will vary.
Dip Sh (Author) commented:
Both servers are at kernel level 150401-13, update 11.
I will get both storage groups checked again tomorrow morning and then update you.
Joseph Gan (System Admin) commented:
Tomorrow is the weekend here :)
Dip Sh (Author) commented:
We have one more day to work :-)
You can check it once you are back.
Have a nice weekend :-)
Joseph Gan (System Admin) commented:
The author has checked and has no more questions.