gouldandlamb

File Server Design

Our current file server infrastructure has a few problems and needs to be redesigned.  Here is the current design:

3 servers with the following specs:
Dell PE1850
Windows Server 2003 R2
2 x 3.0 GHz Xeon
4 GB RAM
2 x 36 GB SCSI in RAID 1 for OS
7 x 146 GB SCSI in RAID 5 for file storage (2 servers each have half of a PowerVault 220S; the third server has half of an HP MSA30, with the other half empty)

1 disaster recovery server (offsite) with the following specs:
Dell PE2600
1 x 2.8 GHz Xeon
4 GB RAM
2 x 36 GB SCSI in RAID 1 for OS
4 x 300 GB SCSI in RAID 5 for file storage

All 4 servers are running Microsoft DFS with Replication (DFSR), which replicates almost 800 GB of data across 10 DFS shares.  We aren't really using DFSR the way it was intended...  The DFS referrals are only active on 1 server for each share because there are times when multiple people need to access the same file; if they opened copies on different servers, one user's changes would be lost when the other saved the file.  Therefore the servers are set up in a "load balancing-type" configuration:
Server1 receives referrals for Shares1 through 4
Server2 receives referrals for Shares5 through 8
Server3 receives referrals for Shares9 and 10
All servers receive replicated data from the other servers.
If any single server/array fails, we can change the referrals to point to one of the other servers (and we have done so).  The DR server doesn't have any active referrals... it just receives all of the changed files.
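
To make the referral layout concrete, here is a minimal Python sketch of the mapping above and of the manual failover step. The server and share names are just the labels from the list; `fail_over` is a hypothetical helper that only models the bookkeeping and does not touch DFS itself.

```python
# Active DFS referral target for each share, as listed in the post.
referrals = {f"Share{n}": "Server1" for n in range(1, 5)}
referrals.update({f"Share{n}": "Server2" for n in range(5, 9)})
referrals.update({f"Share{n}": "Server3" for n in range(9, 11)})


def fail_over(referrals, failed, target):
    """Return a new mapping with every share on `failed` re-pointed at `target`.

    Hypothetical helper: it models the manual referral change only.
    """
    if failed == target:
        raise ValueError("target must differ from the failed server")
    return {
        share: (target if server == failed else server)
        for share, server in referrals.items()
    }


# Example: Server2 fails and its shares are pointed at Server1.
after = fail_over(referrals, failed="Server2", target="Server1")
for share, server in after.items():
    print(share, "->", server)
```

Because every server already holds a replicated copy of every share, the failover is only a change of referral target; no data has to move.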

The file storage arrays are using NTFS compression and have about 55 GB of free space, which we will fill within 6 months.  DFSR also has a 1 TB limitation, so it needs to be scrapped altogether.  In addition, file access sometimes slows to a crawl.  The servers appear to be running normally even though it might take 30 to 45 seconds to browse the shares.  Then, without warning or intervention, the access times improve and things are back to normal.  I think the DFSR processing has something to do with it, because there are times when the slowdowns coincide with Event ID 4202 (staging space above the high watermark) on one of the servers.  But I'm also not sure whether our usage warrants multiple file servers.  We have 250 users accessing as many as 6 or 8 files at a time, so there's a potential for nearly 2000 simultaneous reads/writes at a given time.
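
For reference, the figures in the paragraph above work out as follows. This is only back-of-the-envelope arithmetic on the numbers quoted in the post; the growth rate is the minimum implied by "fill within 6 months" and assumes steady growth.

```python
users = 250
files_per_user = (6, 8)  # "as many as 6 or 8 files at a time"

# Range of simultaneously open files if every user is active at once.
low, high = (users * n for n in files_per_user)
print(low, high)  # 1500 2000

free_gb = 55
months_to_full = 6  # poster's estimate

# Minimum monthly growth (on-disk, after NTFS compression) that would
# consume the remaining free space within that time.
growth_gb_per_month = free_gb / months_to_full
print(round(growth_gb_per_month, 1))  # 9.2
```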

So my questions are:
What is a good solution considering a minimum of 1.5 TB, high availability, disaster recovery and high usage?  Are multiple file servers recommended given these usage stats?  A SAN would probably be ideal, but how would it solve the issues, other than letting us dynamically allocate additional space?  The preferred solution would involve as little new hardware/software as possible, keeping costs down.

Tape backup is another factor... right now a full backup, which runs on Fridays, is about to exceed two LTO3 tapes and takes about 36 hours.
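
A rough throughput estimate puts the 36-hour window in perspective. The sketch below assumes the full backup covers roughly the 800 GB data set mentioned earlier (the real figure may differ because of NTFS and tape compression) and compares it with the LTO-3 rated native transfer rate of 80 MB/s.

```python
data_gb = 800            # assumption: roughly the replicated data set
window_hours = 36        # current duration of the Friday full backup
lto3_native_mb_s = 80    # LTO-3 rated native (uncompressed) transfer rate

# Average rate actually achieved over the backup window.
effective_mb_s = data_gb * 1000 / (window_hours * 3600)

# How long the same data would take if the drive were kept streaming.
hours_at_rated_speed = data_gb * 1000 / lto3_native_mb_s / 3600

print(f"{effective_mb_s:.1f} MB/s effective")          # 6.2 MB/s effective
print(f"{hours_at_rated_speed:.1f} h at rated speed")  # 2.8 h at rated speed
```

Under these assumptions the drive is being fed at well under a tenth of its rated speed, which suggests the bottleneck is on the source side (many small files, compression, network or disk contention) rather than the tape drive itself.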

Thanks.
ASKER CERTIFIED SOLUTION
dovidmichel
This solution is only available to members.
SOLUTION
gouldandlamb

ASKER

Thanks for the input dovid and John.

Does anyone know the MS recommendations for file server usage?  (Such as, number of users accessing X amount of data)
I've decided to go with two clustered servers (either Dell 2950 or HP DL380), both connected to an EqualLogic iSCSI array.  A second array will be located offsite for snapshot replication for DR/BC.

As for the tape archiving, I'd like to have one dedicated backup server that will back up the array as well as Exchange, SQL, etc.  I have an LTO4 autoloader (for array backup) and a single LTO3 drive (for Exchange, SQL, etc.).  But now I'm concerned about the time required for backups.  It looks like John's suggestion of staggering full backups might be a good idea.
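
One way to picture staggered full backups is a weekly table where each job gets its full backup on a different night and runs incrementals otherwise. The sketch below is only an illustration: the job names come from the post, while the chosen days and the `staggered_schedule` helper are assumptions, not John's actual suggestion or any backup product's API.

```python
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def staggered_schedule(jobs, full_days):
    """Map each day to {job: "full" or "incremental"}.

    full_days[job] is the one day of the week that job runs a full backup.
    """
    return {
        day: {
            job: ("full" if full_days[job] == day else "incremental")
            for job in jobs
        }
        for day in DAYS
    }


jobs = ["array", "Exchange", "SQL"]
# Assumed days: spread the fulls so no two land on the same night.
full_days = {"array": "Fri", "Exchange": "Sat", "SQL": "Sun"}

schedule = staggered_schedule(jobs, full_days)
for day in DAYS:
    print(day, schedule[day])
```

With the fulls spread out, no single night has to fit more than one full backup into the backup window.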
Forced accept.

Computer101
EE Admin