Application upgrade on SQL 2014 is crawling.  Processes frequently suspend with PAGEIOLATCH_SH wait type.

Posted on 2016-09-10
Medium Priority
Last Modified: 2016-09-14
SQL 2014 Enterprise installed on Server 2012 R2
8 cpu core's & 24 gb mem
SQL instance is dedicated to ONE application database.  DB is 210 gb in size.
6 tempdb’s spread across C:\, D:\, E:\, each 12.5 gb for a total of 75 gb combined.  Trace flag 1117 applied.
System logs and application log stored on D:\
System DB’s (master, model, msdb) and application DB stored on E:\

We have a third party enterprise web application connected to a SQL database.  The objective is to upgrade the application to a newer version – this process executes a series of SQL scripts against the database.  

We made a copy of the production DB in our development environment.  Dev architecture & resources are identical to Prod.  The dev upgrade completed successfully in 6.5 hours.

The application upgrade is underway in production.  It appears to be moving much slower than what we encountered in dev/test.  Using SQL Activity Monitor we see our commands routinely become SUSPENDED with a wait type PAGEIOLATCH_SH.  
 Activity Monitor
Using Resource Monitor we see E:\ has a high amount of disk activity, specifically the .mdf data file of our application.
Resource Monitor
The prod upgrade is moving much slower than dev/test, at this rate it will take 10-12 hours to complete – double the amount of time in dev/test.  We did not observe so many SUSPENDED processes during dev/test.

I’ve been researching PAGEIOLATCH_SH on the www but so far nothing has led us to the culprit.  As a test we increased memory in the prod SQL server from 24 gb to 32 gb but did not receive noticeable performance gains.

What’s the best way to track down the ‘wait type’ we’re experiencing and how to best identify a solution?

Changing (or optimizing) the SQL queries executed during upgrade IS NOT an option.  We expect the same performance in prod as observed in dev.  

Thanks in advance for your expertise!
Question by:bmsande
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
LVL 16

Assisted Solution

by:Megan Brooks
Megan Brooks earned 500 total points
ID: 41793102
PAGEIOLATCH_SH would seem to indicate that you are doing a bunch of I/O. And you are, most likely, depending on just what those upgrade scripts actually do. If they rebuild 200 Gb worth of indexes, for example, that is a lot of I/O.

The resource monitor throughput numbers seem small: 5.5 Mb/sec total reads & writes on the application mdf? You must have quite a queue. That's something like 20 Gb per hour. If the upgrade needs to move ~200 Gb, that could take 10 hours. But it could very well need to move more than just the size of the database, and your estimate of 10-12 hours isn't surprising. So why is the I/O so slow?

My first thought was as to whether all the upgrade scripts were being run, and in the right order, but while a problem there might increase the amount of data to be moved, it shouldn't cut into bare throughput this way. Are you sure dev and prod are identical from an I/O throughput perspective?

The amount of memory and number of cores isn't going to matter so much when I/O is the bottleneck. Are both systems on the same SAN? How busy is the SAN? Is it shared with other servers? Are the CPUs virtualized? Is there contention there?
LVL 14

Accepted Solution

Nakul Vachhrajani earned 1000 total points
ID: 41793271
What is the value of "max degree of parallelism" on both the environments (Dev v/s Prod)?

It is possible that the production environments has threads running in parallel and causing a load on the I/O.

Also, are you seeing bottlenecks on the SAN? When we were prototyping a large upgrade in our development environment, we had seen PAGEIOLATCH_SH when the disks on our VM hosts were busy serving requests from lot of guests simultaneously. We shutdown VMs that were not critical and that helped improve the underlying disk performance by freeing up the queues.

--Turn on advanced options
sp_configure 'show advanced options',1

--Check the MAXDOP value 
sp_configure 'max degree of parallelism'

--Turn off advanced options
sp_configure 'show advanced options',0

Open in new window

LVL 43

Expert Comment

by:Eugene Z
ID: 41793295
is it VM or physical box ?
Check Vendor's doc for their "best practice" sql DB configuration..

for start: you must review and adjust your server setup

#0 did you install sp1 for sql 2014? install if vendors are ok
#1 "8 cpu core's & 24 gb mem" -- what is the Sql server min- Max memory? POS  needs (~8GB RAM)
#2 after you restored the DB on this  new server did you run DB maints:  CheckDB; reindex: update stats, etc?
if you did not -  try to run ..
#3 check with your Sysadmins if the drives for used DB are fast
#4 why do you spread  "6 tempdb’s across C:\, D:\, E:\, each 12.5 gb for a total of 75 gb combined. ?

try to get dedicated 2 drivers 1 for data 1 for log and set them there instead -- make 4 data files same size and add up to 8 if needed..
#5  did you have issues o n this box with tempdb? if not keep  Trace flag 1117  out of the picture (usually t-sql code optimization -extra indexes helps more than this flag)

#6 (if you can) try move User DB -data log files to some fast dedicated drives instead of E: drive where your system DBs ..
NFR key for Veeam Agent for Linux

Veeam is happy to provide a free NFR license for one year.  It allows for the non‑production use and valid for five workstations and two servers. Veeam Agent for Linux is a simple backup tool for your Linux installations, both on‑premises and in the public cloud.

LVL 14

Expert Comment

by:Nakul Vachhrajani
ID: 41793299
Oh, and what is the initial size and growth increment for your log files and tempdb (Dev v/s production)?

If the file has to grow frequently, it would need to frequently request the OS for space. Better to have a large log file with larger growth increments so that you don't keep waiting too frequently.
LVL 51

Assisted Solution

by:Vitor Montalvão
Vitor Montalvão earned 500 total points
ID: 41793830
8 cpu core's & 24 gb mem
 SQL instance is dedicated to ONE application database.  DB is 210 gb in size.
24GB to support a 210GB DB looks short to me. What is the size of the larger table?
LVL 14

Expert Comment

by:Nakul Vachhrajani
ID: 41799217
Thank-you! If  you do not mind sharing, can you please share what worked? Was it the MAXDOP configuration or an issue with the SAN?

Featured Post

Free Backup Tool for VMware and Hyper-V

Restore full virtual machine or individual guest files from 19 common file systems directly from the backup file. Schedule VM backups with PowerShell scripts. Set desired time, lean back and let the script to notify you via email upon completion.  

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Your data is at risk. Probably more today that at any other time in history. There are simply more people with more access to the Web with bad intentions.
Lotus Notes has been used since a very long time as an e-mail client and is very popular because of it's unmatched security. In this article we are going to learn about  RRV Bucket corruption and understand various methods to Fix "RRV Bucket Corrupt…
Using examples as well as descriptions, and references to Books Online, show the different Recovery Models available in SQL Server and explain, as well as show how full, differential and transaction log backups are performed
Via a live example, show how to backup a database, simulate a failure backup the tail of the database transaction log and perform the restore.
Suggested Courses

752 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question