AlphaLolz (United States) asked:

Problems determining how much SQL Server memory is actually in use.

We have an application that uses SQL Server 2014 SP3 on Windows Server 2016.  It's a thick-client application on Windows 10, and there are batch jobs running against the DB from another Windows Server 2016 server.

This application is an absolute DB memory hog.  At this point the vendor of the product has had us allocate 400 GB of RAM to the SQL Server service for the DB (the server has about 432 GB).

The vendor is going to give us a number of releases with code improvements to reduce memory consumption.  Unfortunately, my DBAs are telling me there's no way to determine the amount of memory actually in use from any of the SQL Server or Windows perfmon metrics.  We've engaged an SME from the Microsoft Support database team, and my DBAs say they're being told the same thing.  I'm looking for some set of measures we can get from SCOM or WMI that we can put together into a formula and use to track the amount of memory in use at, say, 15-second intervals, so we can determine whether their code is improving and the memory is no longer in the hundreds of GB.
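For concreteness, the kind of numbers we're trying to track, sketched as a DMV query (this assumes VIEW SERVER STATE permission on the instance; 'Total Server Memory (KB)' is what the engine has actually committed, versus the 400 GB target):

-- Process-level view of what the SQL Server service has committed.
SELECT
    pm.physical_memory_in_use_kb / 1048576.0 AS physical_memory_in_use_gb,
    pm.locked_page_allocations_kb / 1048576.0 AS locked_pages_gb,
    pm.memory_utilization_percentage
FROM sys.dm_os_process_memory AS pm;

-- The same perfmon counters SCOM can scrape, read from inside the engine.
SELECT
    RTRIM(counter_name) AS counter_name,
    cntr_value / 1048576.0 AS value_gb
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Memory Manager%'
  AND counter_name IN ('Total Server Memory (KB)', 'Target Server Memory (KB)');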
ASKER CERTIFIED SOLUTION by Scott Pletcher (United States). The accepted solution text is visible to Experts Exchange members only.
If you know the queries that the developers are tuning, you can check the execution plans and capture the memory grants for a particular query. Use this as a baseline and compare against the updated query.
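As a rough illustration of that approach (a sketch only; this DMV shows grants for requests that are currently executing, so it has to be sampled while the batch jobs are running):

-- Memory grants for currently executing requests, largest first.
-- granted_memory_kb is what was reserved for the query's sorts and hashes;
-- max_used_memory_kb shows how much of the grant was actually touched.
SELECT
    mg.session_id,
    mg.requested_memory_kb,
    mg.granted_memory_kb,
    mg.max_used_memory_kb,
    mg.query_cost,
    t.text AS query_text
FROM sys.dm_exec_query_memory_grants AS mg
CROSS APPLY sys.dm_exec_sql_text(mg.sql_handle) AS t
ORDER BY mg.granted_memory_kb DESC;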
AlphaLolz (ASKER):

We're running in the MS Azure environment, where the largest VM image available to us has about 432 GB of RAM.  Of that, we're carving out 400 GB for the DB service.  This isn't necessarily by choice; the vendor requested 500 GB (without really providing any evidence as to why).  At this point we're into politics, so we're giving them what they've asked for (as closely as possible).

What we believe the problem is: their application works with large data sets (tens of millions of records at least) and does updates that take locks.  As this happens, other jobs (doing the same) get bottled up behind the ones holding locks, and these accumulate until jobs complete.  At least that's our understanding, and we can see that during a performance test simulating just 17 users we had an average of 3,600 lock requests/sec (per the SQL Server perfmon objects) and a max of over 150,000 lock requests/sec (which seems dubious).  We believe the vendor has never worked with a company of our size and so they've been able to survive with some very poor algorithms and design.  Regardless, they're now trying to figure out how to partition their data.

In the meantime, we still want some sort of SCOM or WMI object we can measure.  These jobs can run for > 4 hours (some > 15 hours), so we really need something automated.
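One low-tech fallback, if SCOM or WMI doesn't pan out, would be logging the DMV numbers to a table on a 15-second loop, something like this sketch (the table name and schema here are placeholders):

-- Placeholder logging table; adjust the name and columns to suit.
CREATE TABLE dbo.MemoryAndBlockingLog
(
    sample_time            datetime2(0)  NOT NULL DEFAULT SYSDATETIME(),
    total_server_memory_gb decimal(10,2) NULL,
    blocked_request_count  int           NOT NULL
);
GO

-- Run from a SQL Agent job step (or a sqlcmd session) for the batch window.
WHILE 1 = 1
BEGIN
    INSERT INTO dbo.MemoryAndBlockingLog (total_server_memory_gb, blocked_request_count)
    SELECT
        (SELECT cntr_value / 1048576.0
           FROM sys.dm_os_performance_counters
          WHERE object_name LIKE '%Memory Manager%'
            AND counter_name = 'Total Server Memory (KB)'),
        (SELECT COUNT(*)
           FROM sys.dm_exec_requests
          WHERE blocking_session_id <> 0);

    WAITFOR DELAY '00:00:15';
END;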
Sorry; as far as the second comment goes, this is well beyond just query improvement.
Goodnight

Your SQL database: is it installed on a physical or a virtual server?
It's on a VM in Azure
It's still SQL Server, so I suggest my normal approach: review the missing-index and index-usage stats to verify, first and foremost, that (1) you have the best clustered index on every table, (2) you're not missing indexes, and (3) you don't need to drop and/or combine certain indexes.  All of that reduces overhead and locking.
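For reference, that kind of check could look roughly like this (a sketch, not necessarily the exact script; run it in the application database, and keep in mind the missing-index DMVs are only suggestions and reset on service restart):

-- Missing-index suggestions, weighted roughly by how often and how much they'd help.
SELECT TOP (25)
    d.statement AS table_name,
    d.equality_columns,
    d.inequality_columns,
    d.included_columns,
    s.user_seeks,
    s.avg_user_impact
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g
    ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS s
    ON s.group_handle = g.index_group_handle
WHERE d.database_id = DB_ID()
ORDER BY s.user_seeks * s.avg_user_impact DESC;

-- Indexes that are written to far more than they are read (drop/merge candidates).
SELECT
    OBJECT_NAME(u.object_id) AS table_name,
    i.name AS index_name,
    u.user_seeks, u.user_scans, u.user_lookups, u.user_updates
FROM sys.dm_db_index_usage_stats AS u
JOIN sys.indexes AS i
    ON i.object_id = u.object_id
   AND i.index_id  = u.index_id
WHERE u.database_id = DB_ID()
ORDER BY u.user_updates DESC;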
Indexes and stats are up to date.  I can't share the company I'm at, but our DBA organization is solid.  We're quite familiar with running enormous systems (DBs, SAP, integration, etc.).  Not Google sized, but one of the largest manufacturers in the world.

The issues we're dealing with regarding the DB in question are entirely down to poor application design.  They are attempting to update tens of millions of records from dozens of independent batch jobs against the same tables, with combinations of row and table locking.

The vendor is working on updates to their application (which need to provide several orders of magnitude improvements or their application will be determined to be a failure).  On our part, we want to be able to measure the impact of those improvements (which we hope to start seeing next week).
The app does seem terrible, and typically apps come with their own indexes.

This seems to confirm it:
We believe the vendor has never worked with a company of our size and so they've been able to survive with some very poor algorithms and design.  Regardless, they're now trying to figure out how to partition their data.

Yet, then you say your DBAs have done/corrected the indexing, which seems somewhat contradictory to me.  I consider partitioning part of indexing, but perhaps you all don't.

If you got no value out of the query I posted earlier on memory usage, then I don't see how else I/we could help.  Are you looking for anything else?
Also, have you tried changing the isolation level to snapshot? I have also supported a terrible application from OpenText, with poorly designed indexes and multiple duplicate indexes. We saw some improvement after changing the isolation levels.
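If you go that route, the database-level switches are roughly as follows (a sketch with a placeholder database name; note that row versioning pushes load onto tempdb's version store, so test it first):

-- Readers stop blocking writers for ordinary READ COMMITTED queries.
ALTER DATABASE [YourAppDb] SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;

-- Optionally also allow sessions to request SNAPSHOT isolation explicitly.
ALTER DATABASE [YourAppDb] SET ALLOW_SNAPSHOT_ISOLATION ON;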
Don't get me going on OT.  Their Livelink content server overall design is awful (IMO).

I only appreciate the products they have purchased vs. written (although I'm not certain they wrote the content server).  The Archive Server component (which they got by buying iXOS) is fine.  VIM is fine.  OMS is fine.

None of these comments except for the first are actually relevant to my question though.  My question is on measuring memory consumption.  It's entirely up to the vendor to improve their product.
So you want to see something the query / command I gave above didn't show?
I'm having delays in getting my DBA to spend the time to use this; however, I'm assuming it's right, so I'm considering this answer correct and useful.

Thanks