Jay Pe

Memory consumption on Solaris-10

Hello,
We have two Solaris-10 x86 servers, both running on VMware, with the following specifications:
bad-server - 4 GB memory and 2 vCPU
good-server - 16 GB memory and 4 vCPU

An application runs on both servers; it runs queries and reads from a file. Queries are failing on bad-server, while good-server is fine. In the sar reports on bad-server, total memory utilization never goes above 30%.

Upon further investigation, we see that once PID 1243 (the process ID of that application) reaches 900 MB of RSS (from prstat output), queries start failing. We attached truss to that PID and found the line below:
/1243:   1.6180 open("/export/correctaddress/data/ltravel.wrk", O_RDONLY) = 23

It takes more than one second, and this makes the query fail, while on good-server the same call takes around 0.030 seconds. When this happens, the application has to be restarted, after which its RSS is about 100 MB. After a couple of hours it is 300 MB, then 500 MB, and once it gets near 900 MB, queries fail again. As a result, the application has to be restarted every 7-8 hours. Can somebody explain this, when memory consumption never goes above 30%? We would have increased its CPU and memory to match good-server, but we would expect the system to report it if some threshold were being crossed.
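
To spot calls like this across a longer truss capture, a small filter can help. The following is only a sketch: it assumes every line has the same shape as the excerpt above (an ID, a time in seconds, then the call), and the second sample line is illustrative, built from the 0.030-second figure mentioned for good-server. The names `TRUSS_LINE` and `slow_calls` are made up for this example.

```python
import re

# Matches lines shaped like the truss excerpt above:
#   /1243:   1.6180 open("...", O_RDONLY) = 23
TRUSS_LINE = re.compile(
    r'^\s*(?P<id>[\d/]*):\s+(?P<secs>\d+\.\d+)\s+'
    r'(?P<call>\w+)\((?P<args>.*)\)\s+=\s+(?P<ret>.+?)\s*$'
)

def slow_calls(lines, threshold=1.0):
    """Return (seconds, syscall, args) for every call at or above threshold."""
    hits = []
    for line in lines:
        m = TRUSS_LINE.match(line)
        if m and float(m.group("secs")) >= threshold:
            hits.append((float(m.group("secs")), m.group("call"), m.group("args")))
    return hits

sample = [
    '/1243:   1.6180 open("/export/correctaddress/data/ltravel.wrk", O_RDONLY) = 23',
    # Illustrative line only, using the good-server timing quoted in the post.
    '/1243:   0.0300 open("/export/correctaddress/data/ltravel.wrk", O_RDONLY) = 23',
]

for secs, call, args in slow_calls(sample):
    print(f"{secs:.4f}s {call}({args})")
```

Lines that do not match the pattern are skipped silently, so the threshold only applies to entries that carry a time column.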
Thanks in advance.
Joseph Gan

Are both servers in the same cluster on VMware?
Jay Pe

ASKER

They are in the same cluster, but in different storage groups. The VMware admin has confirmed that the storage policy and settings are the same for both datastores.
ASKER CERTIFIED SOLUTION
Joseph Gan
This solution is only available to members.
Jay Pe

ASKER

I ran a write test and a read test on both servers with the "time dd ....... ..... ...." command, and both take almost the same time to complete (when there is no issue, i.e. when RSS is less than 800 MB). I did not get a chance to run the dd test while the problem was occurring.
So I was not able to prove whether it is a storage issue.
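
Since the dd test could not be run during the problem window, one option is a tiny probe that times open() on the same file from a separate process, so it can be run quickly when queries start failing. This is a rough sketch, assuming a Python 3 interpreter is available on the host (stock Solaris-10 may not have one); `time_open` is a made-up helper name. If the probe stays fast while the application's own open() is slow, that would point away from storage, though it would not prove anything by itself.

```python
import os
import time

def time_open(path, repeats=5):
    """Open and close `path` read-only `repeats` times.

    Returns the time spent in each open() call, in seconds.
    """
    durations = []
    for _ in range(repeats):
        start = time.monotonic()
        fd = os.open(path, os.O_RDONLY)
        elapsed = time.monotonic() - start
        os.close(fd)
        durations.append(elapsed)
    return durations

# os.devnull is only a stand-in so the sketch runs anywhere; on the server
# the path would be the data file seen in the truss output.
for d in time_open(os.devnull, repeats=3):
    print(f"{time.strftime('%H:%M:%S')} open took {d:.4f}s")
```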
SOLUTION
This solution is only available to members.
Jay Pe

ASKER

I just checked: there are 5 VMs in each datastore. At the VMware level as well as the storage level, I do not see any peak in memory or CPU utilization in the past 7 days. Both datastores come from the same storage, with the same policies applied to both.
I can dig more, or perhaps open a case with the storage vendor. But before that, I would like to be sure that there is no issue on the OS side, which I cannot say right now.
How would I explain that, once the application reaches a certain memory utilization, it starts taking longer to open one file?
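
One way to back up that observation with data is to log the RSS reported by prstat next to the open() time seen at the same moment, then look for the RSS level at which opens first become slow. A minimal sketch follows; the sample values are placeholders loosely based on the figures quoted in this thread, not measurements, and `first_slow_rss` is a made-up name.

```python
def first_slow_rss(samples, slow_after=1.0):
    """Return the smallest RSS (MB) at which open() took slow_after seconds or more.

    `samples` is a list of (rss_mb, open_seconds) pairs. Returns None when
    no sample is slow.
    """
    slow = [rss for rss, secs in samples if secs >= slow_after]
    return min(slow) if slow else None

# Placeholder pairs of (rss_mb, open_seconds); illustrative values only.
samples = [(100, 0.03), (300, 0.03), (500, 0.04), (900, 1.62)]
print(first_slow_rss(samples))
```

If the same RSS level shows up across several restart cycles, that is at least evidence that the slowdown follows process size rather than time of day or storage load.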
Check the patch level on both servers. Your VMware admin should be able to show the storage performance at the time your server had issues. The 5 VMs in each datastore are not identical, so performance will vary between the two.
Jay Pe

ASKER

Both servers are at kernel level 150401-13, update 11.
I will get the storage for both checked again tomorrow morning and will then update you.
Tomorrow is weekend here :)
Jay Pe

ASKER

We have one more day to work :-)
You can check that once you are back.
Have a nice weekend :-)
Author has checked, and has no more questions.