cartereverett (United States of America) asked:

Backup Performance and Fibre Channel Zoning

Hi,

We recently acquired an EMC VNX5300 and an FC tape library.
The original zoning configuration was not following EMC's best practice for single-HBA zoning, so I went ahead and created new zones.
Before the change, one of my backup jobs that used to take 11+ hours went down to 5+, but after creating the new zones the backup job is taking 11+ hours again.

To accomplish the backup, I am using a Windows 2003 server with Arcserve r16 on it. This server, BackupServer1, has a dual-port 8Gb HBA and connects to both FC switches.
The backup job backs up a LUN that is attached to a Solaris server. The Solaris server uses a single 4Gb HBA connected to SANSwitch1.

VNX_SPA and VNX_SPB are the storage processors for the VNX5300.

The tape library is an (Overland) IBM 3573 FC with dual LTO6 drives.  I have one drive attached to SANSwitch1 and the other attached to SANSwitch2.

The switches are EMC (Brocade) DS-300B 8Gb FC.

I am obviously not a storage person, and I have just started learning about fibre channel fabrics and zoning.

Does my new zoning configuration make sense?
Could it be improved?
Is anything about the way it is configured causing my backup issue?

I know there are other variables, such as the backup server itself, the backup software agent, and the library, that could be the culprit, but I need to start somewhere.

Thanks in advance for your help.

BEFORE

SANSwitch1 (3 Zones)

VNX_ESX
Members:  ESXi1, ESXi2, ESXi3, VNX_SPA_P2, VNX_SPB_P2
VNX_SOLARIS
Members: BackupServer1, TLTape1, VNX_SPA_P2, VNX_SPB_P2, SolarisServer
VNX_WIN
Members: BackupServer1, TLTape1, VNX_SPA_P2, VNX_SPB_P2

SANSwitch2 (3 Zones)

VNX_ESX
Members:  ESXi1, ESXi2, ESXi3, VNX_SPA_P3, VNX_SPB_P3
BACKUP_ZONE
Members: BackupServer1, DocImagingServer, VNX_SPA_P3, VNX_SPB_P3
VNX_WIN
Members: BackupServer1, TLTape2, VNX_SPA_P3, VNX_SPB_P3

AFTER

SANSwitch1

VNX_ESXi1
Members: ESXi1, VNX_SPA_P2, VNX_SPB_P2
VNX_ESXi2
Members: ESXi2, VNX_SPA_P2, VNX_SPB_P2
VNX_ESXi3
Members: ESXi3, VNX_SPA_P2, VNX_SPB_P2
VNX_SOLARIS
Members: SolarisServer, VNX_SPA_P2, VNX_SPA_P2
VRC_BACKUP
Members: VRCData, BackupServer1_1
BACKUP_TL
Members: BackupServer1_1, TLTape1
VNX_BACKUPSERVER
Members:  BackupServer1_1, VNX_SPA_P2, VNX_SPB_P2
            
SANSwitch2

VNX_ESXi1
Members: ESXi1, VNX_SPA_P3, VNX_SPB_P3
VNX_ESXi2
Members: ESXi2, VNX_SPA_P3, VNX_SPB_P3
VNX_ESXi3
Members: ESXi3, VNX_SPA_P3, VNX_SPB_P3
BACKUP_TL
Members: BackupServer1_2, TLTape2
VNX_BACKUPSERVER
Members:  BackupServer1, VNX_SPA_P3, VNX_SPB_P3
VNX_DOCIMAGINGSRV
Members: Poseidon, VNX_SPA_P3, VNX_SPB_P3
DOCIMGSRV_BACKUPSERVER
Members: DocImagingServer, BackupServer1_2
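
For reference, the zones and the currently enabled configuration on Brocade FOS switches such as these DS-300Bs can be listed read-only with the standard commands below (none of them make changes):

  zoneshow       - list all defined zones and their members
  cfgshow        - list the defined zone configurations plus the effective configuration
  cfgactvshow    - show only the configuration that is currently enabled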
ASKER CERTIFIED SOLUTION
Duncan Meyers (Australia)
"Before the change, one of my backup jobs that used to take 11+ hours went down to 5+, but after creating the new zones the backup job is taking 11+ hours again."

What happened to make it go down from 11+ to 5+ hours BEFORE you made the change?
cartereverett (ASKER):

When it used to take 11+ hours, the LUN I was backing up was on an old iSCSI SAN. When I moved the data to the new SAN and did the backup, it went from 11 to 6 hours.
So the change was the new SAN, and in my initial tests the zoning was not configured following single-HBA zoning best practices.

Something very strange happened last night. The AC unit in our server room stopped cooling, and since the ETA for the technician was 30, to avoid damage to the unit I completely shut down the VNX5300, including the additional DAEs. The unit was off for over an hour.
I did my regular backups last night and the backup went back to 6 hours.
Now I'm really confused...
Meyersd,

Thanks for the answer. Not to question your suggestion, but more to try to understand it: why should the switches be cross-connected differently?

I attached a diagram of how they are connected.
It's EMC best practice. It's to protect against a very rare fibre loop failure condition. Port 2 on each SP is logically on the same fibre loop.

I don't think the diagram came through.

Can you post more information about the backup job? How big is it? How is the SAN disk that the backup job uses configured? Is the backup going straight to tape or are you doing disk-to-disk-to-tape?
Thanks for the explanation. I created the new zones and now I have a 1:1 single-HBA zoning configuration.
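
As an illustration of what a 1:1 single-initiator zone looks like at the Brocade FOS command line, the commands below sketch the VNX_SOLARIS zone; the WWPNs, the annotations, and the configuration name SANSwitch1_cfg are placeholders, not the real fabric values:

  alicreate "SolarisServer", "10:00:00:00:c9:aa:bb:cc"     (Solaris HBA WWPN - placeholder)
  alicreate "VNX_SPA_P2", "50:06:01:60:aa:bb:cc:01"        (SP A port 2 WWPN - placeholder)
  alicreate "VNX_SPB_P2", "50:06:01:68:aa:bb:cc:01"        (SP B port 2 WWPN - placeholder)
  zonecreate "VNX_SOLARIS", "SolarisServer; VNX_SPA_P2; VNX_SPB_P2"
  cfgadd "SANSwitch1_cfg", "VNX_SOLARIS"
  cfgsave
  cfgenable "SANSwitch1_cfg"

cfgadd assumes the configuration already exists; cfgcreate would be used for a brand-new one, and cfgenable activates the saved configuration on the fabric.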

The backup job is about 1.3TB. I created a 2TB LUN on Pool1. Pool1 has two tiers: Performance (RAID 5 (4+1)) and Capacity (RAID 6 (6+2)).
There was an initial full backup created. Now a daily differential backup gets written to the LUN. The differential happens on the Solaris box; that portion takes about an hour to complete. I then do a full backup of everything on the LUN to tape.

One thing I forgot to mention is that I mapped a LUN to a Windows server and it is showing up twice (see attached picture), and some other DGC LUNZ devices are showing with red X's.
On the Windows server I have a dual-port HBA, with one port connected to each switch.
Duplicate-LUNs.jpg
Download PowerPath from Powerlink (https://powerlink.emc.com) and install it to resolve that problem. PowerPath handles path balancing and multi-path I/O. You can run it in unlicensed mode, but I'd recommend purchasing a license if this server is likely to need good storage performance.

You can also use Symantec Storage Foundation Basic as a freebie, but it is limited to two processors and 4 volumes under management. You can download it from the Symantec website.
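
Once PowerPath is in place, a quick way to sanity-check the paths from a command prompt is with the standard powermt commands below (output varies by PowerPath version):

  powermt config              - configure any newly discovered paths
  powermt display dev=all     - list each LUN with its paths and their state
  powermt save                - persist the current PowerPath configuration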
Try writing the backup directly to tape. If you've got LTO-4 or LTO-5 you should be able to get over 400GB/hour.
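
As a rough sanity check using the numbers already in this thread (1.3TB job, 6 and 11 hour run times, 400GB/hour streaming rate):

  1.3 TB / 400 GB/hour ≈ 3.3 hours   (what a continuously streaming drive should manage)
  1.3 TB / 6 hours     ≈ 215 GB/hour ≈ 60 MB/s
  1.3 TB / 11 hours    ≈ 120 GB/hour ≈ 33 MB/s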
The way it works is kind of cumbersome. The Solaris server runs a vendor-supported application. The application uses Oracle for its DB.
The vendor, via the application, provides a way to do the backup. I can back up either to tape or to disk.
Either way takes the Oracle DB offline during the backup and brings it back online afterwards. If I back up to tape, the application does a full backup to tape, which takes the Oracle DB offline for the 6+ or 11+ hours. We can't have the DB offline for more than two hours, so the only way to accomplish this is to use the "To disk" option, which does a differential backup to disk and only takes an hour to complete. I then do my full backup to tape from a Windows server.

So I can’t really do the backup directly to tape.

Hopefully I’m making sense…
By the way, I have an LTO6 tape library.
Just a thought: you can do an Oracle RMAN backup direct to a Data Domain box with partial source-based dedupe; that would really fly.
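
For illustration only, an RMAN incremental backup written to a disk destination (such as a Data Domain share mounted on the host) looks roughly like this; the mount path and tag are hypothetical, and any real job would have to fit within the vendor's supported procedure:

  RMAN> BACKUP INCREMENTAL LEVEL 1 DATABASE
          FORMAT '/backup/dd_mount/%d_%U.bkp'
          TAG 'nightly_diff';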

For the pool that you're writing to: how much RAID 5 and how much RAID 6 disk do you have?
And what tiering policy are you using?
3.5TB (RAID 5)
32TB (RAID 6)

"Start high then Auto-tier" policy
If you zone the tape library to the Solaris server, can you have the Solaris server dump its disk-based data direct to tape?
Thanks! Glad I could help.