Memory consumption on Solaris-10

Hello,
We have two Solaris-10 x86 servers, both running on VMware, with the following details:
bad-server - 4 GB memory and 2 vCPU
good-server - 16 GB memory and 4 vCPU

An application running on both servers runs queries and reads from a file. Queries are failing on bad-server, while good-server is fine. According to the sar reports on bad-server, total memory utilization never goes above 30%.

On further investigation, we found that once PID 1243 (the process ID of that application) reaches about 900 MB of RSS (according to prstat), queries start failing. We attached truss to that PID and found the line below:

/1243:   1.6180 open("/export/correctaddress/data/ltravel.wrk", O_RDONLY) = 23

The open() call takes more than one second, which makes the query fail, whereas on good-server it takes around 0.030 seconds. When this happens, the application has to be restarted, after which its total RSS is about 100 MB. After a couple of hours it is 300 MB, then 500 MB, and once it gets close to 900 MB the queries fail again, so the application has to be restarted every 7-8 hours. Can somebody explain this, given that overall memory consumption never goes above 30%? We could increase CPU and memory to match good-server, but first we would like some evidence that a threshold is being crossed.
Thanks in advance.
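The RSS progression described in the question (about 100 MB right after a restart, growing through 300 and 500 MB until queries fail near 900 MB) amounts to steady growth of roughly 100 MB per hour. A minimal Python sketch of how that could be quantified: fit a least-squares line to (hours since restart, RSS in MB) samples and extrapolate when the process will reach a given threshold. The sample times are hypothetical, since the question only says "a couple of hours", and the function names are illustrative; in practice you would feed in readings collected from prstat.

```python
# Minimal sketch: estimate RSS growth rate and time-to-threshold with a
# least-squares line. Sample times are hypothetical; replace them with real
# (hours since restart, RSS in MB) readings taken from prstat.


def fit_line(points):
    """Return (slope, intercept) of the least-squares line through (x, y) points.

    Needs at least two distinct x values.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx  # growth rate, MB per hour
    intercept = mean_y - slope * mean_x  # fitted RSS right after restart, MB
    return slope, intercept


def hours_until(points, threshold_mb):
    """Hours after restart at which the fitted line reaches threshold_mb.

    Returns None if the fitted RSS is not growing.
    """
    slope, intercept = fit_line(points)
    if slope <= 0:
        return None
    return (threshold_mb - intercept) / slope


if __name__ == "__main__":
    # Hypothetical readings: 100 MB at restart, then 300 and 500 MB.
    samples = [(0.0, 100.0), (2.0, 300.0), (4.0, 500.0)]
    rate, start = fit_line(samples)
    eta = hours_until(samples, 900.0)  # 900 MB is where queries start failing
    print(f"growth ~{rate:.0f} MB/h from ~{start:.0f} MB; 900 MB after ~{eta:.1f} h")
```

With these assumed sample values the fit gives 100 MB/h and about 8 hours to reach 900 MB, which is in line with the 7-8 hour restart interval reported above.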

Joseph Gan

Are both servers in the same cluster on VMware?

Jay Pe

ASKER

They are in the same cluster, but in different storage groups. The VMware admin confirmed that the storage policies and settings are the same for both datastores.

ASKER CERTIFIED SOLUTION

Joseph Gan

Jay Pe

ASKER

I ran a write test and a read test on both servers with the "time dd ....... ..... ...." command, and both take almost the same time to complete (when there is no issue, i.e. when RSS is below 800 MB). I didn't get a chance to run the dd test while the problem was occurring.
So I have not been able to prove whether it is a storage issue.
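Since the dd test was never run while the problem was occurring, one option is a small script that can be run during the problem window. The Python sketch below (assuming a Python 3 interpreter is available on the server; the function names are hypothetical) times a single read-only open() of a file, mirroring the open() call seen in truss, and then measures sequential read throughput. Because it runs as a separate process, comparing its open() time with the one truss shows for PID 1243 may help separate a slow filesystem from a problem specific to the application process. Note that repeated runs may be served from the page cache rather than from storage.

```python
import os
import sys
import time


def time_open(path):
    """Return seconds spent in a single read-only open(2) of path (close is not timed)."""
    start = time.perf_counter()
    fd = os.open(path, os.O_RDONLY)
    elapsed = time.perf_counter() - start
    os.close(fd)
    return elapsed


def time_read(path, block_size=1 << 20):
    """Read the whole file sequentially, unbuffered; return (bytes_read, seconds)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:  # empty bytes means end of file
                break
            total += len(chunk)
    return total, time.perf_counter() - start


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]  # e.g. the .wrk file from the truss output
    open_s = time_open(path)
    nbytes, read_s = time_read(path)
    mb = nbytes / (1 << 20)
    rate = mb / read_s if read_s > 0 else float("inf")
    print(f"open: {open_s:.4f} s")
    print(f"read: {mb:.1f} MB in {read_s:.3f} s ({rate:.1f} MB/s)")
```

For example, saved as open_timer.py (a hypothetical name): python3 open_timer.py /export/correctaddress/data/ltravel.wrk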

SOLUTION

Joseph Gan

Jay Pe

ASKER

I just checked: there are 5 VMs in each datastore. At both the VMware level and the storage level, I do not see any memory or CPU utilization peaks in the past 7 days. Both datastores come from the same storage, with the same policies applied to both.
I can dig further, or possibly open a case with the storage vendor. But before that, I would like to rule out an issue on the OS side, which I have not been able to do so far.
How do I explain that, once the process reaches a certain memory utilization, the application starts taking longer to open one file?

Joseph Gan

Check the patch level on both servers. Your VMware admin should be able to show you the storage performance at the time your server had issues. The 5 VMs in each datastore are not identical, so the storage performance each datastore delivers will vary.

Jay Pe

ASKER

Both servers are at kernel patch level 150401-13, Update 11.
I will have the storage for both servers checked again tomorrow morning and will update you.

Joseph Gan

Tomorrow is weekend here :)

Jay Pe

ASKER

We have one more working day :-)
You can check it once you are back.
Have a nice weekend :-)

Joseph Gan

Author has checked, and has no more questions.