Link to home
Start Free TrialLog in
Avatar of amigan_99
amigan_99Flag for United States of America

asked on

NetworkEng Needs DBA Advice MS SQL Performance

So the consultant DBA has been blaming the network for months to management for high latency of writes to his DB. The VMWare datastore latencies from the host and VMs were all low like 2ms. No errors, no drops. Today I finally got to see what he was referring to. All of his databases have decent write latency except for one. The DBs on the same LUN all do fine except this one. Anyhow I think it's up to him to figure it out. BUT - I figured I throw the stats on the MS SQL database up here to the experts DBAs. Is there anything about these stats that looks out
of bed? Or any guess as to why one DB would show 64ms write latencies while the 26 others are between 1ms and 11ms for write latency? This is the biggest of the DBs - the others are more like 200GB. They have nowhere near the Total IO Write Stalls figure.Perhaps those could be from multiple users trying to write at the same time? There are probably 1500 or so users accessing it a day. DBA opinions sought!

Database Name ScubaPro
Logical File Name ScubaPro [ROWS]
Drive      E:
Size (GB) 857.99      
Total IO Read Stall  130,522,647
Total Reads      42,706,650
GB Read         1,167
Avg Read Stall (ms) 3      
Max Rec Read Stall Avg 30
Total IO Write Stall  2,711,425,650
Total Writes  41,904,190
GB Written 863.5
Avg Write Latency/Stall (ms) 64
Max Rec Write Stall Avg 30
Physical File Name  E:\MSSQLData\ScubaPro.mdf      
Read-Related Wait Stat  PAGEIOLATCH*
Write-Related Wait Stat  ASYNC_IO_COMPLETION
Sample Time      16:03.5
SOLUTION
Avatar of arnold
arnold
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of amigan_99

ASKER

Wow - what a thoughtful reply. Ok ' I'll give it a go from my neteng perspective. I've only heard some of these DB terms
in passing.

Do you have vcsa in the environment from which you manage the VMa, or do you use vsphere client.

A: vSphere Client

Who setup the VMware host/VMs.

A: The SysEng team. I'm new to the company so most was setup years before. But this database is on new hardware.

Writes delay on a single database, storage setup, all db's stored on the same location?

A: I believe some of these DBs are on different LUNs. But as I mentioned some are on the same LUN as the problem DB.

Database recovery model, DB properties, option full, simple or bulk?

A: No sure. I would guess that it's full because it's critical data.

VM resource prioritization, distribution. Granting VM with the SQL server a higher prioritization on CPU, storage and memory may help.

A: This is a prioritized as can be I think. Special server dedicated to critical app and this DB server and failover together if anything moves.

Using SQL tunning, you can capture a sample of queries to then analyze to see if there are indexes that if added could help?

A: Good thought.

How much ram us allocated to the VM with how many CPUs? 4,6?  

A: 512GB RAM and 21 CPUs  

VM's computer properties, advanced system settings, performance shoukd clear out asterisk and set to prioritize for performance, advanced
Prioritization should be for applications versus background.

A: Will ask the SysEng's about this.

Page file size? Fixed size or Windows managed?

A: Haven't ventured onto the box. But may be able to find out.

On the SQL server is it set to boost CPU?

A: Is that a VMWare setting?

If the database in question is rarely used, paging might

A: This database has 1500 or more users every day. Pounded.

Free space, amount of data written?

A: About 1TB free space - 860GB approx written and is approx the same as the file size. That's in the info I pasted.

Ref to prior question about recovery mode, under file/file group at what rate is the DB file grow settings,
is it percentage of DB file size or is fixed amounts of MB or GB?

A: Don't know.

Same question for the transaction log file if in full/bulk recovery mode.

A: Don't know. I know log file growth is a real problem. Somehow they keep running out of space and expanding.

If it is in a fixed amount, that might be the issue, on write the SQL server is request ion additional space, as it is
200GB each write triggers an additional space allocation. Changing it to 5% which will mean it will request 10GB and
endure a hit once while subsequent shoukd ho faster until another storage allocation is needed.You could compare this
DBs settings to the others......

A: I'll look into it with the DBAs. Interesting avenue.

Sounds a bit strange that it will be blamed on the network before the VM resources are blamed.

A: It was annoying and I finally said enough's enough and dug deep into the perf stats - exporting them and
doing my own analysis for the absolute values. And now I can prove perf is good from data store to the
target - low latency and 0 errors and 0 drops.

Is the datastore on the Nas/San  iscsi?

A: iSCSI - UCS with B200 M5 blade, 40 Gigs FI to core and to storage array. There is no congestion.
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Thanks again for the great analysis. Not sure I understand it all. But I don't see how you can say there are 55GB writes
in an hour as the entire db is 800-and something GB. Did you mean something else? Anyhow love all the ideas especially on the file growth process. I wasn't aware of that aspect.


Which VMware version?

A: 6.7

Answers to question such as recovery model criticality does not a definitive
that people will setup full for a point in time recovery.

A: I'll try to discover. It'll be funny when I start laying these pointed questions at them. :-)

DB and transaction log backup..

Files growth settings are important, based on your stats. You have 55GB writes per hour.
A 65ms gang suggests a request for additional storage for the DB file.

A: That's really intersting. So at the Datastore r/w latency measures in monitoring/advanced
- about once an hour I'll see a data point that's several times the average. Out of 180
data points in an hours one might be say 50ms while the rest are 10, 5, 1 - even less. I've
been racking my head about what could cause these once or so an our spikes. An allocation
of space request sounds mighty intriguing.

Is this DB being setup as a warehose sata storage into which data is being pumped in!

A: I don't know what this is.


How much ram in the physical box specs?

A: Will have to look up. But we're definitely constrained for phy RAM.

Access to the iscsi LUN over a separate network feed?

A: No. This is a trunk over dual 40Gbps port channels. But utilization is rarely even 10Gbps
in total over the 80Gbps of capacity for everything.

Network connections use Intel or vmware's native network adapter.
Intel is. An emulator that adds overhead.

A: The VMWare VMNic connect to Cisco UCS VNIC and are mapped to fabric interconnect vEth interfaces.

VM prioritization has to be check, login into the host, look at performance

A: How do I check that? I have access to vCenter and the CLI.

See resource distribution among the V s
Are they equally weighed for CPU, memory, storage?

A: So there are not many VMs on this host. Maybe just four or so. Everything is done give
this all the resource it could possibly ever need.

512Gb allocated to the VM would suggest on memory it has a higher prioritization.
21 vcpus ?

A: Yup 21

Your read/write ration seems too close to a one to one, meaning almost the same
amount of data read is written.Meaning there is a significant amount being written
as compared ... I.e data entry of new info.
Over a 16 hour period.


On an 800+gb DB, do you have maintenance such as rebuilding/reorganizing indexes on a monthly, basis?
Using Ola h's script. De fragmenting ..

If you use vcenter, look through the host there is the information
Host memory, CPU count.
Similarly it has pergormance/resource allocation.

The question about network deals with whether the VM settings have a network adapter using Intel which commonly the OS has the drivers for or uses the VMware native adapter which means the adapter 's drivers are installed along with VMware tools.
The Intel/other operates in an emulation...... Intel to VMware translation adapter.

Warehousing meaning a large amount of data stored for longer term storage.
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
I'm loaded for bear for my meeting today thanks to you. Haha. Really much more than I was expected in tossing out such a problem that's bedeviled this company for a long time. I'm going to go through and make a check list of your various thoughts so I have it not only for this MS SQL  problem but others in the future.

THANK YOU
Glad I could help you. I find it always puzzling how the issue gets passed.
Interesting discussion!  Just three points are missing:
Does the SQL Server use some replication or HADR setup for given DB?
How many indexes are updated in the DB in question comparing to other (faster) DBs?
How many triggers are fired during the update comparing to other DBs?

All the three points could increase the write latency.

Of course, if conditional updates do not have necessary indexes available it also affects the write latency.