Troubleshooting Question

File Server Design

gouldandlamb asked:
Storage, Server Hardware
Our current file server infrastructure has a few problems and needs to be redesigned.  Here is the current design:

3 servers with the following specs:
Dell PE1850
Windows Server 2003 R2
2x 3.0GHz Xeon
4GB RAM
2x 36GB SCSI in RAID1 for OS
7x 146GB SCSI in RAID5 for file storage (two servers each use half of a PowerVault 220S enclosure; the third uses half of an HP MSA30, with the other half empty)

1 disaster recovery server (offsite) with the following specs:
Dell PE2600
1x 2.8GHz Xeon
4GB RAM
2x 36GB SCSI in RAID1 for OS
4x 300GB SCSI in RAID5 for file storage
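
For rough numbers, here is a quick usable-capacity calculation for those RAID5 arrays (a back-of-the-envelope Python sketch; it assumes single-parity RAID5 and ignores filesystem overhead and hot spares):

# Rough usable-capacity estimate for the RAID5 file-storage arrays.
def raid5_usable_gb(disk_count: int, disk_size_gb: int) -> int:
    """RAID5 stores (n - 1) disks' worth of data; one disk's capacity goes to parity."""
    return (disk_count - 1) * disk_size_gb

file_server_gb = raid5_usable_gb(7, 146)   # 876 GB usable on each file server
dr_server_gb = raid5_usable_gb(4, 300)     # 900 GB usable on the DR server

print(f"File server array: ~{file_server_gb} GB usable")
print(f"DR server array:   ~{dr_server_gb} GB usable")
# With almost 800 GB of replicated data (see below), every array is close to full.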

All 4 servers are running Microsoft DFS with Replication, which replicates almost 800GB of data among 10 DFS shares.  We aren't really using DFSR the way it was intended...  DFS referrals are only active on one server for each share, because there are times when multiple people need to access the same file; if a share had active targets on more than one server, one user's changes would be lost when the other saved the file.  So the servers are set up in a "load balancing-type" configuration:
Server1 receives referrals for Shares 1 through 4
Server2 receives referrals for Shares 5 through 8
Server3 receives referrals for Shares 9 and 10
All servers receive replicated data from the other servers.
If any single server/array fails, we can change (and have changed) the referrals to point to one of the other servers.  The DR server doesn't have any active referrals... it just receives all of the changed files.
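
To make that "load balancing-type" arrangement concrete, here is a small conceptual sketch in Python (illustrative only -- the real referrals live in DFS, not in a script; the names mirror the shares and servers above):

# Conceptual model of the referral layout and the manual failover step.
referrals = {
    "Share1": "Server1", "Share2": "Server1", "Share3": "Server1", "Share4": "Server1",
    "Share5": "Server2", "Share6": "Server2", "Share7": "Server2", "Share8": "Server2",
    "Share9": "Server3", "Share10": "Server3",
}

# Every production server holds a replica of every share; the DR server
# replicates everything but never takes referrals.
replica_servers = ["Server1", "Server2", "Server3"]

def fail_over(failed_server: str) -> None:
    """Repoint referrals from a failed server onto the surviving replicas."""
    survivors = [s for s in replica_servers if s != failed_server]
    for i, (share, server) in enumerate(sorted(referrals.items())):
        if server == failed_server:
            referrals[share] = survivors[i % len(survivors)]  # spread the load

fail_over("Server2")
print(referrals)  # Shares 5-8 now point at Server1 and Server3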

The file storage arrays are using NTFS compression and have about 55GB of free space, which we will fill within 6 months, and DFSR has a 1TB limitation, so it needs to be scrapped altogether.  Also, file access sometimes slows to a crawl: the servers appear to be running normally, yet it can take 30 to 45 seconds to browse the shares.  Then, without warning or intervention, access times improve and things are back to normal.  I think the DFSR processing has something to do with it, because the slowdowns sometimes coincide with Event ID 4202 (staging space above the high watermark) on one of the servers.  But I'm also unsure whether our usage even warrants multiple file servers.  We have 250 users accessing as many as 6 or 8 files at a time, so there's a potential for nearly 2,000 simultaneous reads/writes at any given time.
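
To put the growth and usage figures in perspective, here is the arithmetic spelled out (a rough Python sketch; it assumes growth stays roughly linear, and since the data is NTFS-compressed the uncompressed growth would be higher):

# Rough growth and sizing arithmetic based on the figures above.
current_data_gb = 800    # data replicated across the DFS shares
free_space_gb = 55       # free space left on the file-storage arrays
months_to_full = 6       # estimate for when that free space runs out

growth_gb_per_month = free_space_gb / months_to_full         # ~9.2 GB/month

target_gb = 1500                                              # the 1.5TB minimum
months_of_headroom = (target_gb - current_data_gb) / growth_gb_per_month

peak_open_files = 250 * 8                                     # 250 users, up to 8 files each

print(f"Growth rate:       ~{growth_gb_per_month:.1f} GB/month")
print(f"1.5TB would last:  ~{months_of_headroom:.0f} months at that rate")
print(f"Peak open files:   ~{peak_open_files}")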

So my questions are:
What is a good solution given a minimum of 1.5TB, high availability, disaster recovery and high usage?  Are multiple file servers recommended given our usage stats?  A SAN would probably be ideal, but how would it solve these issues beyond being able to dynamically allocate additional space?  The preferred solution would involve as little new hardware/software as possible, to keep costs down.

Tape backup is another factor... right now a full backup, which runs on Fridays, is about to exceed two LTO-3 tapes and takes about 36 hours.
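
For reference, the effective throughput of that full backup works out to something quite low (a rough estimate assuming roughly two LTO-3 tapes' native capacity of 400GB each, i.e. about 800GB written over the 36-hour window):

# Effective throughput of the Friday full backup (assumption: ~800 GB written).
data_gb = 2 * 400    # two LTO-3 tapes at 400 GB native capacity each
hours = 36

mb_per_s = (data_gb * 1024) / (hours * 3600)
print(f"Effective backup rate: ~{mb_per_s:.1f} MB/s")   # ~6.3 MB/s

# LTO-3 drives can stream on the order of 80 MB/s natively, which suggests
# the bottleneck is upstream of the tape drive (source disks or network).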

Thanks.