Solved

Storage Server 2003 crashes copying large file to MD1000

Posted on 2012-08-21
15
Medium Priority
928 Views
Last Modified: 2016-11-23
We have a Dell MD1000 RAID 5 array attached to a Dell PowerEdge 2950 running Storage Server 2003 with a PERC 5/E internal card connected with a short SCSI cable.  We can copy large (over 2GB) files to/from the internal hard drives in the 2950, but as soon as we try to copy one to the MD1000 it gets a bit past halfway and locks the server up.  By locks up, I mean it locks hard: we are unable to run the Task Manager or use CTRL-ALT-DEL.  We have to do a hard shutdown to get back into the server.

The Dell server administrator shows that the RAID array, hardware, firmware, and drivers are good, and there are no errors there or in the event logs. Unfortunately, when it locks up there's no way for me to see what is going on with the performance logs or processes to start a diagnosis.

All the drivers and firmware are updated to the latest available on Dell's website.  OS is updated with all patches and service packs.  Server is not virtualized and the drives are all NTFS.

Anyone have ideas or suggestions for troubleshooting?
Question by:convergint
15 Comments
 
LVL 20

Expert Comment

by:strivoli
ID: 38320116
Does the server run any AV?
 
LVL 47

Expert Comment

by:David
ID: 38320233
Some versions of FTP crap out at files > 2GB.  Some file systems (FAT16) can't deal with a file > 2GB.

Check the specs for the FTP client you are using and whatever file system / kernel / ftp server you are using on the Dell.
 
LVL 47

Expert Comment

by:David
ID: 38320245
The LINUX ext2 file system has a 2GB hard limit.  You need to use ext3, ext4, or another file system if you want to handle larger files.  This is NOT an FTP limit.  It is a hard limit on the largest file you can have.

 
LVL 20

Expert Comment

by:strivoli
ID: 38320264
dlethe: please read original question before posting your suggestions. Thank you.
 
LVL 47

Expert Comment

by:David
ID: 38320318
strivoli - Storage Server 2003 is an appliance; one can create an ext2 file system on it.  One can ftp back and forth.  There is no indication from the question that this is NOT a file system or FTP limit.
 
LVL 20

Expert Comment

by:strivoli
ID: 38320354
dlethe: from original question: "...the drives are all NTFS...". Period.
 
LVL 47

Expert Comment

by:David
ID: 38320494
strivoli - respectfully, that tells us nothing about whether the problem is the access method, or whether they are using a VM on top of NTFS.  I have a developer's agreement with LSI and know the firmware inside and out (much better than ANYBODY who works for Dell, I assure you), and it is likely going to be something unique to a pool, because that is what differentiates the internal and external drives.

I did not critique your response (which I could have).  Perhaps when you get your first million points in servers, like I have, or even a few million points overall, you will earn the right to second guess me.  Show me the courtesy and respect that I have earned.
 
LVL 10

Author Comment

by:convergint
ID: 38320741
Appreciate the comments but the server is only a Microsoft file server.  No FTP, no Linux, no FAT32/16, no other server services except a program called Peersync that replicates files between our offices.  Peersync only replicates at night.  We do have ESET antivirus on it but disabling it makes no difference.

It doesn't matter if we are trying to copy a large file to the MD1000 over the network or locally within the server between a drive and MD1000, it still crashes.  The internal drives are connected with PERC 5/i and the only thing on the PERC 5/E is the MD1000.  We have replaced the batteries on them about half a year ago as well.
 
LVL 20

Expert Comment

by:strivoli
ID: 38320829
Peersync replicates at night, but does the service run continuously? If it does, please try stopping the service.
 
LVL 47

Expert Comment

by:David
ID: 38320896
Peersync advertises that there is no upper limit, and even says a 30GB file is fine in their KB, but this looks suspiciously like a Peersync bug.  I would contact them just to make sure you have the latest patches and there isn't some obscure setting you have to make for this to work.

(Or maybe this is a known bug).

It seems rather arbitrary that the problem is limited to the MD1000, so I am questioning whether that is really the case, or whether it is limited to a specific volume that just happens to be on the MD1000.

Have you looked at the RAID controller logs?  Best practice for the controller is to do monthly (I do weekly) data consistency checks on the RAID volumes.  There is a small possibility that old RAID firmware/drivers combined with a multiple failure scenario can lock up the controller, but that is one of those unlikely perfect storm scenarios.

Still, check the support.dell.com site and make sure your firmware+drivers on controller are current as well.  The data consistency check protects against data loss in event of multiple XOR parity errors.
 
LVL 13

Expert Comment

by:rhinoceros
ID: 38323855
This seems similar to your case:
http://serverfault.com/questions/82148/windows-2003-storage-server-hanging-on-large-file-transfers


Transferring large files can overload the network. Disabling taskoffload and autotuninglevel may improve network performance and resolve the network hang. This is not tied to Microsoft's default settings; it depends on your network environment and bandwidth.

More info:
The Microsoft Windows Server 2003 Scalable Networking Pack release
http://support.microsoft.com/kb/912222

Information about the TCP Chimney Offload, Receive Side Scaling, and Network Direct Memory Access features in Windows Server 2008
http://support.microsoft.com/kb/951037
 
LVL 10

Author Comment

by:convergint
ID: 38325504
The server itself has internal drives and I can copy any size file all day long between the internal drives without any issues so it is definitely not a network or memory issue.  As soon as I try and copy a large file from the internal drive to the MD1000 directly attached with SCSI it will lock up.

Peersync is not the issue as it crashes with the service disabled.  The firmware and drivers are all up to date, they have been for a while as the server is a couple years old and Dell doesn't seem to be releasing anything new anymore.

Is there some faulty memory buffer on a PERC 5e card that might be causing large files to lock up the server?  This is the only explanation I can guess but have no idea how to test something like this other than swapping cards which is not possible at this time.
 
LVL 20

Assisted Solution

by:strivoli
strivoli earned 500 total points
ID: 38341217
I don't know the PERC 5e specifically, but I have worked with dozens of RAID controllers. Your idea about the buffer might not be so silly. You could set the controller to work with no cache at all. Try it with the write cache disabled, then the read cache disabled, then both disabled, and see how it behaves. The cache change can usually be done on the fly and has no side effects other than I/O operations taking longer without cache.
 
LVL 47

Accepted Solution

by:David
David earned 500 total points
ID: 38342156
The PERC controller has no concept of files, large or small.  It only knows about blocks.  There are read and write cache buffers, but I can't see how they could possibly get saturated with your configuration.

There is an easy (sorta) way to test.  Take Windows out of the equation.  Boot the system from a Linux USB flash drive; then you can safely do a 100% read test, just to make sure reading works fine.

The command would be dd if=/dev/sdb of=/dev/null bs=64k  (This reads from /dev/sdb, which is likely the right device since a USB flash drive will usually be /dev/sda, but confirm the device name before running it.  It reads raw blocks from the logical device(s) and throws the data away.)

Does it lock up?  If yes, you have a confirmed hardware/firmware issue (unreadable blocks most likely).  If not, then at least you know that reading works fine, and then the solution is most likely going to require analysis by a pro.
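
A minimal sketch of that read test wrapped in a small shell function, so the pass/fail result is printed explicitly. The function name `read_test` is made up for illustration, and `/dev/sdb` is only the assumed device name from the command above; check it first (for example with `cat /proc/partitions`). The function only reads from the source and discards the data.

```shell
#!/bin/sh
# Read-only test: read every block from a source and throw the data away.
# read_test is a hypothetical helper name, not part of any Dell or LSI tool.
read_test() {
    src="$1"
    # if=     source to read (a raw block device such as /dev/sdb, or a file)
    # of=     /dev/null, so nothing is written anywhere
    # bs=64k  same block size as the command quoted in the post
    if dd if="$src" of=/dev/null bs=64k; then
        echo "read test passed: $src"
    else
        echo "read test FAILED: $src" >&2
        return 1
    fi
}

# Example call (run as root from the live USB environment; device name is an assumption):
#   read_test /dev/sdb
```

If the machine hard-locks partway through, no result line will be printed at all, and that by itself points toward the hardware/firmware side rather than Windows.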

There are programmable settings that govern prefetch, retries, and cache buffers (both read and write) in the HDDs.  This particular controller doesn't mess with them.  It assumes they are correct because it is tweaked for Dell, so such tests are disabled because Dell won't sell disks that aren't programmed for these controllers.

If you do not have the Dell-branded disks which are certified for this controller with proper supported firmware, then it is a reasonable possibility that the disks will have to be reprogrammed with proper mode page settings.
 
LVL 10

Author Comment

by:convergint
ID: 38420165
We are not Linux people, so doing a test like the one you explained is difficult. On top of that, the server is mission critical, so it is hard to take it down for an extended test.

We will have to hope it holds out as it is due for an upgrade early next year.  Thanks for everyone's suggestions.