Windows File Server boot from SAN issue

Hi, I have an HP BL35p-class blade booting off an HP EVA 6000 SAN.
Windows Server 2003 Enterprise SP2
QLogic fiber card

This server is a file server with about 2 TB of SAN storage attached. The issue is that, at random times during peak usage hours, users cannot access files. When that happens I also cannot RDC to the server at all, and it complains about not having any available system resources.

I'm upgrading the firmware on the SAN switches and the SAN itself to hopefully resolve the issue, as there is an HP advisory out about performance issues with the firmware we currently run on the SAN switches.

The problem is that I'm worried this issue is something else, since prior admins saw the same performance issue on an HP DL380 with local storage. That was the first file server, and its data was moved to the blade that is now on the SAN. My worry is that there is an issue with the file structure itself. The only thing I see is that several volumes were migrated off an old Novell server. Possibly something wasn't set up correctly in the data structure, which causes stability issues with Server 2003.

Any info from people who boot file servers off a SAN, or who have done Novell data migrations and experienced issues like this, would be helpful.

Thanks
ryan4496 Asked:

Smart_Man Commented:
It is hard to advise a guru, but sometimes Timon can help the Lion King, so I hope my thoughts are welcome.

I would really concentrate on spotting the bottleneck, using as many monitoring tools as I can: Task Manager, the Windows monitoring tools, the SAN utilities...

If it works fine for all the files at the usual times and only goes bad at peak time, then it looks like a performance issue.

Waiting for your reply.
2PiFL Commented:
We have the same problem with a similar setup. HP worked on it for months, only to recommend that we not boot from the SAN.
ryan4496 Author Commented:
I've used every tool at my disposal and nothing points to a bottleneck: EVAPerf, SANsurfer, Perfmon, Task Manager.

2PiFL, did you have the same symptoms of not being able to RDC and seeing system resource errors in Windows? I also see this warning pop up in Event Viewer. It always precedes the issue.
-----------------
Event Type:      Warning
Event Source:      Srv
Event Category:      None
Event ID:      2012
Date:            1/28/2008
Time:            9:15:42 PM
User:            N/A
Computer:      FileServer01
Description:
While transmitting or receiving data, the server encountered a network error. Occasional errors are expected, but large amounts of these indicate a possible error in your network configuration.  The error status code is contained within the returned data (formatted as Words) and may point you towards the problem.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 00 00 04 00 01 00 54 00   ......T.
0008: 00 00 00 00 dc 07 00 80   ....Ü..€
0010: 00 00 00 00 3d 02 00 c0   ....=..À
0018: 00 00 00 00 00 00 00 00   ........
0020: 00 00 00 00 00 00 00 00   ........
0028: 7a 09 00 00               z...    
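
The event description says the error status code is carried in the record's data, formatted as words. A minimal sketch, assuming the dump is a flat sequence of little-endian 32-bit values (the usual layout on x86 Windows, but an assumption here), that turns the hex dump above into words. The helper names `parse_dump` and `dwords` are hypothetical:

```python
import struct

# Hex dump copied from the Event ID 2012 record above (offset: bytes).
DUMP = """\
0000: 00 00 04 00 01 00 54 00
0008: 00 00 00 00 dc 07 00 80
0010: 00 00 00 00 3d 02 00 c0
0018: 00 00 00 00 00 00 00 00
0020: 00 00 00 00 00 00 00 00
0028: 7a 09 00 00
"""


def parse_dump(dump):
    """Collect the byte values from 'offset: xx xx ...' lines."""
    data = bytearray()
    for line in dump.splitlines():
        if ":" not in line:
            continue
        _, _, rest = line.partition(":")
        data.extend(int(token, 16) for token in rest.split())
    return bytes(data)


def dwords(data):
    """Interpret the bytes as little-endian unsigned 32-bit words (assumed layout)."""
    count = len(data) // 4
    return list(struct.unpack("<%dI" % count, data[:count * 4]))


raw = parse_dump(DUMP)
words = dwords(raw)
for index, word in enumerate(words):
    print("offset 0x%02X: 0x%08X" % (index * 4, word))
```

Under that assumption, the word at offset 0x14 decodes to 0xC000023D, which has the shape of an NTSTATUS error code and can be looked up in the NTSTATUS value table. The word at offset 0x0C, 0x800007DC, has 2012 (0x7DC) in its low 16 bits, matching the event ID.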

Smart_Man Commented:
I would not settle for simply not booting from the SAN; I would guess it can handle petabytes well.

Anyway, another monitoring/management tool for the SAN switches may be more helpful:

http://san-switch-monitoring.qarchive.org/

Waiting for your reply.

2PiFL Commented:
RDC not working was one of the many symptoms/problems, including the event log errors you mentioned. HP brought in several different teams, but this thing never really performed the way it should. We've had it for a year now in a non-production environment but are going "live" June 1. I will be talking to HP this afternoon about the performance problems and will pass along any info.
theraindog Commented:
I had a problem similar to the one you describe. It turned out to be a problem with the Windows swap file management, which we solved by tweaking the disk buffering.
Another solution would have been to use a local disk for swap.
We also tweaked the HBA (Emulex, not QLogic) by increasing the BB credit (buffer-to-buffer credit).
ryan4496 Author Commented:
Raindog, do you mean using a separate disk for swap? I just updated my SAN and rebuilt the file server on a new blade with a new OS install. If that doesn't fix my issue, I think my next step is to buy some internal SAS drives for the blade and use them for the swap file.
2PiFL Commented:
Sorry it took so long to get back to you. HP gave me a half dozen tests to run under various conditions, and they found a lot of errors on the Brocade switches. Are you using Brocade? Apparently ours weren't configured properly. HP says we need to configure "zoning", which looks to me like a combination of trunking and VLAN config in the Cisco world.

I can see a local disk for swap space making a big difference on boot time, especially if the problem is with the switches.

I'll post back when we configure the switches.
ryan4496 Author Commented:
Yes, we have Brocade switches as well; however, from what I see, every server has its own zone in my environment.
theraindog Commented:
Ryan, I meant exactly that: "wasting" a disk just for the swap file. And by the way, you could also keep a copy of the SAN disk; you never know...
Anyway, zoning means you have to create a "VLAN" containing just the WWN (a sort of MAC address) of the QLogic HBA and the HBA on the storage side.
When working with McData hardware, I often used an application that makes the configuration process much easier. If my memory doesn't fail me, it was named EFCM (Enterprise Fabric Connectivity Manager). It lets you manage the whole connection process; if you have to take care of an extended SAN with a lot of hosts connected, it is a must-have tool. Since McData was acquired by Brocade, the tool also became available for Brocade SAN switches/directors.
 
Just a hint: you should have something like:
a "Zone" called Whateveryoulikeforexampleserverxybootfromsan
containing two "addresses", e.g.
10000000C953D611 (WWPN of your HBA)
5005076304FFC68C (WWPN of the storage HBA you are presenting your disk/disks on)

This setup keeps you from "overlooking" something when setting up the machine, because each machine can see only some disks.
Further configuration can be done using "masking" as well.
It is like saying to the host HBA: you can see anything the switch allows you to see (and mount and use it),
and saying to the storage: present just these disks, on this HBA, for THIS host.
That way you can go around plugging in fiber patches without having to worry about breaking something.
Should you need further information on setting up zoning/masking, I have quite a bit of experience in this field.
This kind of setup also lets you set up a "dual path" for the disks you are making available, thus reducing the chance of a dead channel.
Just let me know if you need further clarification. I should have some "Redbooks", as IBM calls them, somewhere, to deepen your knowledge of the fabric world.
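
The zoning and masking rules described above can be modelled in a few lines. This is only a sketch of the logic, not a switch or array configuration: the zone name, the LUN numbers, the second host WWPN, and the function names are all made up for illustration, and only the two WWPNs from the hint above come from the thread.

```python
def normalize_wwpn(wwpn):
    """Return a 64-bit WWPN as lowercase, colon-separated byte pairs."""
    digits = wwpn.replace(":", "").lower()
    if len(digits) != 16 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError("not a 64-bit WWPN: %r" % wwpn)
    return ":".join(digits[i:i + 2] for i in range(0, 16, 2))


HOST_HBA = normalize_wwpn("10000000C953D611")      # host HBA from the example
STORAGE_PORT = normalize_wwpn("5005076304FFC68C")  # storage port from the example
OTHER_HOST = normalize_wwpn("10000000C9AAAAAA")    # hypothetical second host

# Zoning (switch side): each zone lists the WWPNs allowed to talk to each other.
ZONES = {"serverxy_boot_from_san": {HOST_HBA, STORAGE_PORT}}

# Masking (storage side): which LUNs a storage port presents to which host HBA.
# The LUN numbers here are hypothetical.
MASKING = {(STORAGE_PORT, HOST_HBA): {0, 1}}


def zoned_together(zones, a, b):
    """True if some zone contains both WWPNs."""
    return any(a in members and b in members for members in zones.values())


def visible_luns(zones, masking, host, storage):
    """LUNs the host can reach: zoning must allow the path AND masking must present them."""
    if not zoned_together(zones, host, storage):
        return set()
    return set(masking.get((storage, host), ()))


print(sorted(visible_luns(ZONES, MASKING, HOST_HBA, STORAGE_PORT)))
print(sorted(visible_luns(ZONES, MASKING, OTHER_HOST, STORAGE_PORT)))
```

The point of the two-layer check is that a host outside the zone sees nothing regardless of masking, and a host inside the zone still sees only the LUNs the storage explicitly presents to it.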
Smart_Man Commented:
Have you looked for SAN monitoring/configuration tools yet?

Waiting for your reply.
ryan4496 Author Commented:
Raindog, thanks for the extra info. Like I said, I have zones set up for each server on my SAN, so I think I'm good there. I rebuilt the server on another BL35, so we will see if that helps. My only concern is how the BL35s share the second fiber port on the blade fiber patch panel. Right now I just have to wait for the issue to resurface.

Smart Man, thanks. We actually have OpManager and I was surprised to see OpStor. Are you happy with it? I think I'm going to buy it to help manage our EVA 6000.