Can I test a QLogic HBA outside of the OS?

Hello Experts. My company provides hardware support for out-of-warranty and legacy
systems. My area is PowerEdge and ProLiant rack & blade servers. One of my customers
has an HP ProLiant DL380 G2 that boots to W2K3 from their EMC VMAX SAN. The server
has two QLogic 2340 FC HBAs in PCI Slot-1 and Slot-3 that are set as Ctlr :1 and
Ctlr :3 respectively in the BIOS Boot Controller Order. The storage controller is not
set for boot and the HP Integrated PCI IDE Controller is set as Ctlr :2.

The customer says that the server can no longer boot to the OS (after making some
undisclosed non-hardware changes) and they insist that it's due to faulty HBAs. I
suspect it's not, though I have no SAN or FC expertise. So, the question is: how do I
test the HBAs and get something to show that they are working properly? If the HBAs
can detect the fibre switch even if they do not detect any SAN devices, is that good
enough to show operability?

I attached reports that they say show the commands they ran on their end and the results
that were returned. These, I think, are supposed to indicate there's no issue on their
end. Can you please take a look?

The server is located in a caged area in a remote data center with no on-site company
personnel. Initially I visited the site just to check the adapters' configurations.
At the site I first ran Fast!UTIL by pressing Ctrl-Q during POST.
The settings were properly enabled (later confirmed by the storage admin over a remote
connection to the server) and the Selectable Boot Settings still displayed the target
WWN, but I got “No device present” when I chose the “Scan Fibre Devices” option.
I tried reseating and swapping the cards and rebooting many times, and it did, in fact,
reach the login security screen once, but I lost it when I was forced to reboot.

I left and days later revisited the site with two replacement HBAs configured to match the
ones installed in the server. Before bringing them to the site I had tested three HBAs in the
same model server at my lab, but from within the W2K3 OS and connected to an old spare EMC
DS-4100B Brocade switch, using QLogic’s SANsurfer software and the MS Fibre Channel
Information Tool utility, basically to determine if the HBAs can see the switch. (I have
attached the fcinfo output as well.) Same results on site, except these cards did not have any LUN
targets displayed in the Selectable Boot Settings at all in Fast!UTIL.

Later the customer sent me the following message:
“I updated the Brocade and storage array configuration last night.  I scanned for devices in both HBAs but did not get any device.  I tried rebooting a few times this morning but no luck.  Right now it’s stuck in a PXE boot menu.  Please let me know how you’d like to proceed.”

Can you tell from all of this whether or not this is a hardware issue?
If I go back to the site, what can I do to make that determination without the OS?
I can, however, bring the drive from my lab system, reconfigure their server to boot from it, and use the
utilities I set up on that drive to see if I can run some tests. (I'm desperate.)

I would agree with your assessment.

The issue is most likely a consequence of the changes that were made.
Could they have retired the DHCP/bootp server that used to start the boot process on this system?

There are specific requirements for booting Server 2003 from a SAN.
PhillyGee (Author) commented:
Thanks, Arnold.  According to the customer, the current setup has been the same since 2012.
The only thing to do in this situation is to make sure that the SAN properly allocates the LUN that is used to boot the system, and that the switches are properly zoned for the host HBAs.
The query from the HBA should reflect the resources allocated to it.
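
As a rough illustration of the zoning part of that check, here is a minimal Python sketch. It assumes you have already exported the zone membership from the switch into a plain dictionary; the zone names, the WWPNs, and the helper names `normalize_wwpn` and `zones_sharing` are all made up for the example and do not come from this server or from any vendor tool.

```python
# Hypothetical zoning data: zone name -> set of member WWPNs.
# Every WWPN below is a made-up placeholder, not a value from this thread.
EXAMPLE_ZONES = {
    "zone_host_hba1": {"21:00:00:e0:8b:00:00:01", "50:00:09:72:00:00:00:01"},
    "zone_other_host": {"21:00:00:e0:8b:00:00:99", "50:00:09:72:00:00:00:01"},
}


def normalize_wwpn(wwpn):
    """Lower-case a WWPN and strip separators so different notations compare equal."""
    return wwpn.replace(":", "").replace("-", "").lower()


def zones_sharing(zones, initiator, target):
    """Return the sorted names of zones that contain both the initiator and the target."""
    a, b = normalize_wwpn(initiator), normalize_wwpn(target)
    shared = []
    for name, members in zones.items():
        normalized = {normalize_wwpn(m) for m in members}
        if a in normalized and b in normalized:
            shared.append(name)
    return sorted(shared)
```

If the list comes back empty for the HBA's WWPN and the boot target's WWPN, the HBA would not be expected to find the boot LUN when it scans, however healthy the card is. LUN masking on the array is a separate check that this sketch does not cover.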

Unfortunately, I've heard "Nothing changed" too many times. What people who say that usually mean is that nothing changed on the system that is having issues. Often a change was made that they did not think would affect this system.

It helps to figure out what was going on before this issue arose.
I.e., were they performing something else when this server crashed? Once access to the SAN is blocked, the system would likely have hung or crashed.

Do you or they have documentation about this system?
It either has to have a local drive that boots it part way before switching to the SAN or have the HBAs set as bootable and reflected as the boot device.

Is there a local drive in the server?
PhillyGee (Author) commented:
No local drive for boot.  The only boot controllers set up in the BIOS are the HBAs.
Checking the SAN and the fabric's fibre switches, and validating the zoning, is the only remedy.
Something changed; it could be the SAN-to-switch cabling, or there may be switch error events.

With two connections both going down, while the HBAs see the switch and the switch sees the HBAs, this points to the zoning or the SAN's connection to the switch.

It would be a different matter if the OS on the LUN were corrupted. From the information provided, the HBAs are either not set as bootable or are not being presented with LUNs.
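
That reasoning can be written down as a small triage function. This is only a sketch in Python: the field names, the return labels, and the first two branches (adapter not passing POST, fabric login failing) are my own assumptions added to round out the logic. Only the last two branches reflect what is argued in this thread.

```python
from dataclasses import dataclass


@dataclass
class Observations:
    """What was seen on site; all field names are hypothetical."""
    hba_passes_post: bool    # adapter BIOS loads and Fast!UTIL opens with Ctrl-Q
    hba_sees_switch: bool    # the HBA logs in to the fabric
    switch_sees_hba: bool    # the switch shows the HBA's port as logged in
    boot_lun_visible: bool   # "Scan Fibre Devices" lists the boot target


def likely_fault_domain(obs):
    """Map the observations to the area most worth investigating first."""
    if not obs.hba_passes_post:
        return "hba-or-slot"                  # assumption, not from the thread
    if not (obs.hba_sees_switch and obs.switch_sees_hba):
        return "hba-cable-or-switch-port"     # assumption, not from the thread
    if not obs.boot_lun_visible:
        return "zoning-or-lun-presentation"   # the case described above
    return "os-or-boot-config"                # LUN is visible, so look at the OS image
```

For the situation described here (cards pass POST, both sides of the fabric login look fine, no devices found on scan), `likely_fault_domain(Observations(True, True, True, False))` returns `"zoning-or-lun-presentation"`.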

What you are describing is the boot process falling through to network boot as the last resort.
Are there other system/s using the San?

PhillyGee (Author) commented:
I was able to set up a small lab with equipment suitable for testing the HBAs' ability to detect devices in an FC loop, using HP StorageWorks appliances that do not require any special setup procedure beyond proper driver software to create LUNs and detect FC devices.

We used an HP StorageWorks MSA1500 Modular Smart Array, an Emulex Fibre Switch and HP StorageWorks MSA20 disk enclosure.  The host server is an HP ProLiant DL380 G3 server, the same model as LA-DMFILE02.

I tested the original cards from the customer's server as well as my company's stock of QLA2340 adapters. (See the attachment with POST & BIOS pictures and SANsurfer “Collect” reports from the host.)
They all performed as expected. Thanks for the help, Arnold.
Glad to hear you have some way to establish that the issue is not the HBA.
I often find the statement "We did not change anything" the most frustrating, especially when the items you are responsible for are just one component of the system.
In your case, as the HBA provider, you have to account for misconfigurations, errors, and changes on the SAN and switches.