Go Premium for a chance to win a PS4. Enter to Win

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 2369
  • Last Modified:

EMC + Emulex HBA + SQL Packet size

Hi,
   I have win2003 srvr with an Emulex Lp1050 going to a EMC Clariion. The Sql packet size is 4096 bytes with the max of 32xxx bytes possible. I want to increase it in SQL. Do I have too and how can I increase the packet size for the HBA and/or EMC so that the increase in SQL can be accomodated? I know how to do in SQl but not on the HBA and EMC.

Thanks.
0
sudipmis
Asked:
sudipmis
  • 7
  • 4
1 Solution
 
Duncan MeyersCommented:
Leave the HBA settings alone! They're optimised for CLARiiONs, and fiddling with the settings can break things pretty comprehensively. The other thing to note is that the SQL block size may be 4096 bytes, but SCSI commands (and hence, fibre channel) don't really care about what's happening at the OS and application level, so changing the FC card settings is pointless. The other thing to note is that a FC payload is 2112 bytes. End of story.

Before you do anything, read this: http://www.emc.com/techlib/abstract.jsp?id=1444&c=US&l=en then go into Navisphere, right click on the top level and select "Properties", then click on the tick box to turn on Statistics Logging. Keep an eye on the following properties: % dirty pages under the cache page, read and write bandwidth for the LUNs that you're interested in.

If you want to have a look at some performance related settings, you can have a look a write cache page size - try setting it to 4K - this aligns the cache page size to you SQL write size. It make for more efficient usage of write cache at the cost of extra processor overhead to manage the write cache. Default page size is 8K. Beware that this may have a negative impact on the performance of other attached hosts.

You can also check the alignment offset. For a Windows or Intel based Linux host, you set the offset to 63 when you create LUN in Navisphere. You can also use dispar.exe (W2K, W2K3) or diskpart.exe (W2K3 SP1) to set the offset when you create the partition in Windows. http://www.microsoft.com/technet/prodtechnol/exchange/guides/StoragePerformance/0e24eb22-fbd5-4536-9cb4-2bd8e98806e7.mspx?mfr=true has more information if your using W2K3 SP1. Google for articles if you're using W2K.

In my experience, most CLARiiON arrays are under utilized, and disc performance is generally not an issue.
0
 
sudipmisAuthor Commented:
meyersd,

  The reason I was inquring about the payload size was to see if the increase would resolve the sql error i recieve during heavy usage of our DB

SQL Error:
SQL Server has encountered 1 occurrence(s) of IO requests taking longer than 15 seconds to complete on file [x:\MSSQL\Data\Data.MDF] in database [Clinical] (12).  The OS file handle is 0x00000408.  The offset of the latest long IO is: 0x0000056dab0000

The drive it is trying to write too is an EMC lun.
Any suggestions?
-Sudip
0
 
Duncan MeyersCommented:
http://support.microsoft.com/?kbid=897284
http://www.microsoft.com/technet/prodtechnol/sql/2000/downloads/default.mspx
http://sqljunkies.com/WebLog/joesack/archive/2005/06.aspx

all discuss the issue you're seeing. You can get a pretty good ideas of how the CLARiiON is handling I/O by looking at the % dirty pages under the cache tab as mentioned above. If the % dirty pages is hitting 100% then you possibly have an array misconfiguration issue. What model CLARiiON is it?

You can also check I/O bandwidth to individual discs as well as to LUNs by looking at the properties pages and checking the statistics tab. Note that you won't get statistics unless you turn statistics gathering on as detailed above.

If the array is running well, then your next step is to investigate the configuration of the SQL database - there are MS tools available to help with this.
 
0
Veeam Disaster Recovery in Microsoft Azure

Veeam PN for Microsoft Azure is a FREE solution designed to simplify and automate the setup of a DR site in Microsoft Azure using lightweight software-defined networking. It reduces the complexity of VPN deployments and is designed for businesses of ALL sizes.

 
sudipmisAuthor Commented:
Thanks. I will check out the links.

EMC suggests using 4k page size for DB app's just as you did but my page size is 8kb. I have a Clariion cx700. Stat logging was disabled so I do not have a dirty page % yet. I will keep you posted.

Thanks.
0
 
Duncan MeyersCommented:
Hi sudipmis,

How's it going?
0
 
sudipmisAuthor Commented:
Hi Meyersd,
   
    I changed the page size from 4kb and 8kb but still kept getting an error. I have narrowed it down to a inhouse process that runs from 1am til 3am. Adding a 2nd hba seems like the solution.
SP A has 17% of dirty pages. All toher stat's look pretty normal.

-Sudip

0
 
Duncan MeyersCommented:
From what you've posted, the CX700 is extremely unlikely to be the culprit here unless you were experiencing forced write cache flushes, and I have to say that I'd be very surprised if a second HBA resolves the issue, as it is unlikely that your server has the I/O bandwidth and processor power to flood the 2Gb FC bus.

One thing you may want to check is that the LUN isn't being trespassed from one SP to the other. Check the SP event logs for messages like "LUN shut down for trespass" during the periods when you're seeing the error. If the LUN is being trespassed without good reason, then you may have a connectivity issue (a damaged FC cable for example) or failover has been incorrectly configured on the CX700.

Finally, the following document describes performance tuning for SQL:
http://msdn2.microsoft.com/en-us/library/ms187830.aspx
0
 
Duncan MeyersCommented:
Another thing to check is that your Emulex cards have been configured correctly. You must set the driver for 'CLARiiON' and PowerPath when you install it. This document has all the details: http://www.emulex.com/ts/docoem/emc/pdfs/win.pdf pp 66-
0
 
Duncan MeyersCommented:
Hi Sudip,

Thanks for the points! I was just wondering what fixed the issue?
0
 
sudipmisAuthor Commented:
Hi Meyersd,
   
     Although, we did not come to a true solution, we were able narrow down to the exact process which caused this error. We run a nightly extraction from our DB. During the process, it creates a temporary view from one of our larger tables (almost 2 million rows) among some other smaller ones. The extensive amount of reads and writes was triggering the events. We stopped the process for a couple of nights and all was quiet. So, we decided to separate that process on to a separate server. Thanks for you excellent help.
-Sudip
0
 
Duncan MeyersCommented:
My pleasure!
0

Featured Post

Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

  • 7
  • 4
Tackle projects and never again get stuck behind a technical roadblock.
Join Now