Solved

Application upgrade on SQL 2014 is crawling.  Processes frequently suspend with PAGEIOLATCH_SH wait type.

Posted on 2016-09-10
6
145 Views
Last Modified: 2016-09-14
CONFIGURATION:
SQL 2014 Enterprise installed on Server 2012 R2
8 cpu core's & 24 gb mem
SQL instance is dedicated to ONE application database.  DB is 210 gb in size.
6 tempdb’s spread across C:\, D:\, E:\, each 12.5 gb for a total of 75 gb combined.  Trace flag 1117 applied.
System logs and application log stored on D:\
System DB’s (master, model, msdb) and application DB stored on E:\

BACKGROUND:
We have a third party enterprise web application connected to a SQL database.  The objective is to upgrade the application to a newer version – this process executes a series of SQL scripts against the database.  

We made a copy of the production DB in our development environment.  Dev architecture & resources are identical to Prod.  The dev upgrade completed successfully in 6.5 hours.

ISSUE:
The application upgrade is underway in production.  It appears to be moving much slower than what we encountered in dev/test.  Using SQL Activity Monitor we see our commands routinely become SUSPENDED with a wait type PAGEIOLATCH_SH.  
 Activity Monitor
Using Resource Monitor we see E:\ has a high amount of disk activity, specifically the .mdf data file of our application.
Resource Monitor
The prod upgrade is moving much slower than dev/test, at this rate it will take 10-12 hours to complete – double the amount of time in dev/test.  We did not observe so many SUSPENDED processes during dev/test.

I’ve been researching PAGEIOLATCH_SH on the www but so far nothing has led us to the culprit.  As a test we increased memory in the prod SQL server from 24 gb to 32 gb but did not receive noticeable performance gains.

QUESTION:
What’s the best way to track down the ‘wait type’ we’re experiencing and how to best identify a solution?

Changing (or optimizing) the SQL queries executed during upgrade IS NOT an option.  We expect the same performance in prod as observed in dev.  

Thanks in advance for your expertise!
0
Comment
Question by:bmsande
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
6 Comments
 
LVL 14

Assisted Solution

by:Megan Brooks
Megan Brooks earned 125 total points
ID: 41793102
PAGEIOLATCH_SH would seem to indicate that you are doing a bunch of I/O. And you are, most likely, depending on just what those upgrade scripts actually do. If they rebuild 200 Gb worth of indexes, for example, that is a lot of I/O.

The resource monitor throughput numbers seem small: 5.5 Mb/sec total reads & writes on the application mdf? You must have quite a queue. That's something like 20 Gb per hour. If the upgrade needs to move ~200 Gb, that could take 10 hours. But it could very well need to move more than just the size of the database, and your estimate of 10-12 hours isn't surprising. So why is the I/O so slow?

My first thought was as to whether all the upgrade scripts were being run, and in the right order, but while a problem there might increase the amount of data to be moved, it shouldn't cut into bare throughput this way. Are you sure dev and prod are identical from an I/O throughput perspective?

The amount of memory and number of cores isn't going to matter so much when I/O is the bottleneck. Are both systems on the same SAN? How busy is the SAN? Is it shared with other servers? Are the CPUs virtualized? Is there contention there?
0
 
LVL 13

Accepted Solution

by:
Nakul Vachhrajani earned 250 total points
ID: 41793271
What is the value of "max degree of parallelism" on both the environments (Dev v/s Prod)?

It is possible that the production environments has threads running in parallel and causing a load on the I/O.

Also, are you seeing bottlenecks on the SAN? When we were prototyping a large upgrade in our development environment, we had seen PAGEIOLATCH_SH when the disks on our VM hosts were busy serving requests from lot of guests simultaneously. We shutdown VMs that were not critical and that helped improve the underlying disk performance by freeing up the queues.

--Turn on advanced options
sp_configure 'show advanced options',1
RECONFIGURE
GO

--Check the MAXDOP value 
sp_configure 'max degree of parallelism'
GO

--Turn off advanced options
sp_configure 'show advanced options',0
RECONFIGURE
GO

Open in new window

0
 
LVL 42

Expert Comment

by:Eugene Z
ID: 41793295
is it VM or physical box ?
Check Vendor's doc for their "best practice" sql DB configuration..
-------------------------------------

for start: you must review and adjust your server setup


#0 did you install sp1 for sql 2014? install if vendors are ok
#1 "8 cpu core's & 24 gb mem" -- what is the Sql server min- Max memory? POS  needs (~8GB RAM)
#2 after you restored the DB on this  new server did you run DB maints:  CheckDB; reindex: update stats, etc?
if you did not -  try to run ..
#3 check with your Sysadmins if the drives for used DB are fast
#4 why do you spread  "6 tempdb’s across C:\, D:\, E:\, each 12.5 gb for a total of 75 gb combined. ?

try to get dedicated 2 drivers 1 for data 1 for log and set them there instead -- make 4 data files same size and add up to 8 if needed..
#5  did you have issues o n this box with tempdb? if not keep  Trace flag 1117  out of the picture (usually t-sql code optimization -extra indexes helps more than this flag)

#6 (if you can) try move User DB -data log files to some fast dedicated drives instead of E: drive where your system DBs ..
0
Resolve Critical IT Incidents Fast

If your data, services or processes become compromised, your organization can suffer damage in just minutes and how fast you communicate during a major IT incident is everything. Learn how to immediately identify incidents & best practices to resolve them quickly and effectively.

 
LVL 13

Expert Comment

by:Nakul Vachhrajani
ID: 41793299
Oh, and what is the initial size and growth increment for your log files and tempdb (Dev v/s production)?

If the file has to grow frequently, it would need to frequently request the OS for space. Better to have a large log file with larger growth increments so that you don't keep waiting too frequently.
0
 
LVL 49

Assisted Solution

by:Vitor Montalvão
Vitor Montalvão earned 125 total points
ID: 41793830
8 cpu core's & 24 gb mem
 SQL instance is dedicated to ONE application database.  DB is 210 gb in size.
24GB to support a 210GB DB looks short to me. What is the size of the larger table?
0
 
LVL 13

Expert Comment

by:Nakul Vachhrajani
ID: 41799217
Thank-you! If  you do not mind sharing, can you please share what worked? Was it the MAXDOP configuration or an issue with the SAN?
0

Featured Post

Efficient way to get backups off site to Azure

This user guide provides instructions on how to deploy and configure both a StoneFly Scale Out NAS Enterprise Cloud Drive virtual machine and Veeam Cloud Connect in the Microsoft Azure Cloud.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
how to make geography query faster?  SQL 7 45
SQL Instance service gone? 5 39
Split string into 3 separate fields 5 22
Data encryption options between SQL DBs 3 32
Ever needed a SQL 2008 Database replicated/mirrored/log shipped on another server but you can't take the downtime inflicted by initial snapshot or disconnect while T-logs are restored or mirror applied? You can use SQL Server Initialize from Backup…
Ever wondered why sometimes your SQL Server is slow or unresponsive with connections spiking up but by the time you go in, all is well? The following article will show you how to install and configure a SQL job that will send you email alerts includ…
Via a live example, show how to extract information from SQL Server on Database, Connection and Server properties
Video by: Steve
Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…

726 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question