Windows OS

Ike Smith🇺🇸

Is it RAID or is it Heat?

Hello friends,

 

I have a brand-new Windows Server 2025 machine with an Avago 9361-8i RAID card running 2 SATA SSDs for the OS in RAID 1, and 6 drives in a RAID 5 array.

 

I was in the process of using TeraCopy to copy a shared folder from their old server to the new RAID 5 array, which is just one huge share. The original copy went OK, but the users were still updating files on the old server, so I was using the TeraCopy option to only copy newer files. It barely has to write anything, just check whether files are newer (mind you, it's over 4TB of files).
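For readers unfamiliar with a "copy only newer files" pass, here is a minimal Python sketch of the idea. It is not TeraCopy's implementation; the function name `copy_newer` and the rule of comparing modification times only are assumptions made for illustration.

```python
import shutil
from pathlib import Path


def copy_newer(src_root: Path, dst_root: Path) -> list[Path]:
    """Copy files from src_root to dst_root only when the source is newer.

    Returns the relative paths that were actually copied.
    """
    copied = []
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(src_root)
        dst = dst_root / rel
        # Copy when the destination is missing or has an older modification time.
        if not dst.exists() or src.stat().st_mtime > dst.stat().st_mtime:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also carries over the modification time
            copied.append(rel)
    return copied
```

Even when almost nothing is copied, a pass like this still has to stat every file on both sides, so a multi-terabyte share keeps the storage controller busy with metadata reads for a long time.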

 

During this process, the server went offline. When I went on site, the system was showing a blue screen and both OS drives were showing red lights. In the BIOS, the RAID card said they were both unconfigured, bad, foreign. I used F2 to restore previous settings in the RAID card menu, then I could change the drives back to ‘good’ in the list. Upon reboot, the drives showed as good and I could boot into Windows again with no read or write errors at all. There was no damage to the RAID 5 array at all; it never lost the RAID 5.

 

About a week later the exact same thing happened while I was doing the exact same process: just copying the newer files and leaving the unchanged ones alone. Server crash, both OS drives red, bad, unconfigured. The same fix worked again, and again no errors, no corruption. The RAID 5 was still perfectly good and intact.

 

I bought a slot fan, thinking it's a cooling issue, and will put that in.

 

But the vendor I bought the server from thinks the RAID card needs a firmware/driver update. They asked me to install the MSM app, which I was able to do after a great deal of pain, as it has an issue with Java or something, but I got it installed anyway.

 

Here is my question: should I attempt to mess with the firmware/drivers? As old as that card is, I'm kind of shocked it might not be on the latest, but I've not yet moved forward with it.

 

The vendor is certain nothing is wrong with the card since the RAID 5 didn't fall apart, just the RAID 1.

 

Is my idea of getting more air on the card bad or good?

 

Is there something different I can look at to see what the issue might have been?

 

Mind you, it has only happened 2 times, doing the exact same thing, and it takes literally upwards of 2 days to do this operation, so the TeraCopy job runs and runs. But that should not cause a crash, unless it's heat? I'm just not sure.

 

I did set the server up as a DC and moved the 8 users onto it, but the server appears to be perfectly fine the rest of the time. I've tested power, and the UPS holds the system up on power loss without issue.

 

The server has redundant 1200W PSUs.

 

I'm open to ideas and suggestions, but I would REALLY REALLY prefer that anything tried is not destructive to the OS if I can help it.

 

Thank you all for your great advice

 

~ike



Ike Smith🇺🇸

ASKER

Follow up…

 

I don't know where to look for any logs that might help in this situation. The blue screen was always stuck at 0%, so I don't think I got any logs from it.


rindi🇨🇭

First of all, what hardware is the Server? Why aren't you virtualising the server (with Hyper-V for example)? 

 

“Real” servers which are dedicated as such from manufacturers like Dell or HP etc. are usually dimensioned so they are up to the load & don't overheat.

 

Virtualising servers these days is the way to go, as that makes it much simpler to move the system over to other hardware easily.

 

What are the disks in your RAID 5 array? RAID 5 shouldn't be used anymore. It was OK a decade or so ago when disks had low capacities, but with the high capacities you have these days it's a risky array type: rebuilds take ages with large disks, which increases the chance of a 2nd disk failing, which would kill your data. It is OK for disks up to 1TB in size, but above that avoid RAID 5. If you are using SSDs, then larger disks are less problematic.

Also, never use consumer-grade disks with a hardware RAID controller. You need enterprise-class disks for that. Consumer disks have a longer timeout when they encounter read errors; hardware RAID controllers don't wait that long, so they take the disk offline if there are too many retries. Enterprise-class disks are better matched to hardware RAID controllers. Most controllers also list which disks are best. Most disks also have a kind of color code: for example, blue & green labelled disks are consumer disks, red disks are usually meant for NAS devices, which use software RAID, while enterprise-grade disks have a yellow label.

 

Of course the firmware of the server, as well as the RAID controller & the disks should be as current as possible.


Updating the firmware and drivers for the RAID card is an easy check. See if it has an onboard battery and consider replacing it because a system crash shouldn't blow away your RAID config. Lastly, you can try moving the RAID card to a different slot and see if that helps. 

It's possible this is a heat problem, and it could be a firmware/driver issue, but I suspect a problem with that RAID adapter itself.




SMART tools should let you read the temperature from the disks.

Philip Elder🇨🇦

Avago? Are you sure about that?

That brand hasn't been around for a very long time.

 

Does the RAID controller have a battery backup so that the controller can be set to Write-Back instead of Write-Through for that RAID 5 array?


kevinhsieh🇺🇸

Not sure why a new server seems to have an older card installed. 

At any rate, the latest firmware and drivers should have been installed when the server was set up. It sounds like this isn't live yet, or I would ask about your backups. Upgrade the firmware for the RAID card, NICs, BIOS, etc. Also make sure the drivers are the latest.




Ike Smith🇺🇸

ASKER

I think I found the issue.

I finally found the temperature in MSM: the RAID card chip is at 97C and the server is nearly idle.

 

If anyone thinks the issue is OTHER than this, please let me know.

 

Ike


Ike Smith🇺🇸

ASKER

To answer your questions Rindi,

 

It's a server with:

Supermicro H13SSL-N
AMD EPYC 9124, 16 cores, 3 GHz
32GB DDR5 4800 MHz ECC
WD Blue enterprise drives, SA510 model
(2) 1TB drives for OS RAID 1
(6) 2TB drives for network share RAID 5
Supermicro case with dual 1200W PSUs
OS is Windows Server 2025

 

 

I only learned the RAID card model just last week. The server has been in a lab environment and barely in use for about a month, and this issue only came up last weekend, as I explained above.

 

As for why I didn't choose to virtualize: I just came on the scene at this office about 2 months ago and am following the instructions of the customer.


Ike Smith🇺🇸

ASKER

Hi Philip,

 

Yes, it's Avago branded in the MSM utility.

 

I see and realize this is a very old card, but if it works as intended I am ok with it.  

 

I will bring it up with my vendor though.    

 




kevinhsieh🇺🇸

The CPU seems like massive overkill for a file server, but I understand that you didn't order it.

 

It is still possible that newer firmware will fix the heating issue.


Ike Smith🇺🇸

ASKER

The last driver update on the card looks to be from 2022, so the card I am using can certainly use the update.

 

Is there any trick to updating the firmware on a RAID card? (I'm not doing it until the card is at an optimal temperature.)

 

But can I actually do the update through the MSM utility? It looks like it's possible. I've never updated a RAID card before, and I'm not sure it's even needed if it's a temperature issue.

 

As for RAID 5 not being used anymore: what RAID is used in its place, then? I forgot to ask that in my first reply above.

 

Ike


kevinhsieh🇺🇸

RAID 5 on six 2 TB SSD drives is fine. That is what I would pick.

 

Again, the firmware you are using could be causing the high temp issue.




For anything over 12TB, use RAID 6.

Ike Smith🇺🇸

ASKER

Hi Kevin,

 

Do you think, even though it's running hot, I could use the MSM utility to do the update?

I've never updated a RAID card before. Will I need to go back in and reset my RAID arrays?

The link I got from my vendor for the card is actually Broadcom branded and for the 3108 version. The numbering scheme looks correct from the readme.

Are there any considerations I need to worry about?

 

Thank you

 

ike


kevinhsieh🇺🇸

I don't have your card. I haven't looked at the docs. If the utility says you can do the upgrade from there, then that should work.

 

There is always an off chance that the upgrade process causes data loss, or bricks the controller. Chances are very low of that happening. Proceed accordingly.




kevinhsieh🇺🇸

David, the 12 TB RAID 5 array sizing recommendation only applies to disks with an unrecoverable bit error rate (UBER) of 10^-14. SSDs have a much lower error rate, so that RAID 5 recommendation is not applicable here.
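The arithmetic behind that rule of thumb can be sketched in a few lines of Python. At one unrecoverable error per 10^14 bits, an error is expected roughly every 12.5 TB read, which is where the 12 TB figure comes from. The function name is hypothetical, the model assumes independent bit errors, and the 10^-17 rate used for the SSD case is an illustrative assumption, not a figure from the WD Blue SA510 datasheet.

```python
import math


def p_unrecoverable_read(bytes_read: float, bit_error_rate: float) -> float:
    """Probability of at least one unrecoverable read error while reading
    bytes_read bytes, assuming independent errors at bit_error_rate per bit."""
    bits = bytes_read * 8
    # 1 - (1 - rate)^bits, written with log1p/expm1 to stay accurate for tiny rates.
    return -math.expm1(bits * math.log1p(-bit_error_rate))


TB = 1e12  # decimal terabyte, as used on drive labels

# Reading 12 TB from disks rated at 10^-14 (the case the rule of thumb targets).
hdd_case = p_unrecoverable_read(12 * TB, 1e-14)

# Rebuilding a 6 x 2 TB RAID 5 reads the 5 surviving drives in full (10 TB).
# The 1e-17 rate is an assumed example value for an SSD.
ssd_case = p_unrecoverable_read(10 * TB, 1e-17)

print(f"12 TB read at 1e-14: {hdd_case:.3f}")
print(f"10 TB rebuild at 1e-17 (assumed): {ssd_case:.5f}")
```

Under these assumptions the first case comes out at about 0.62 and the second well below 0.001, which is why the sizing rule depends so heavily on the error rate of the drives.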


Ike Smith🇺🇸

ASKER

….jumping in the water…  will report back….

 

ike


Ike Smith🇺🇸

ASKER

I survived.

 

Firmware update and driver update fully successful.

No errors, no issues.

The card still reads 92C after the last reboot. I saw in the operating temps section that it should be no higher than 50C, so I'm going to go forward with adding a slot fan. Those types of fans are so crappy and cheap; I wish they made a quality version of a slot fan, similar to how you get high-RPM CPU fans for servers vs. crappy cheap fans for home PCs. Anybody know of a high-end REAL fan for a PCI slot? Might be a market for it, since water cooling doesn't really seem practical on a server (or am I behind the times?).

I'll report the temp change once the fan is installed. I might be able to get it done Friday, but might not: this customer is a law firm and they stay pretty busy, but Fridays tend to be more relaxed. I don't have a key to this customer's office, so I have to be there during open hours.

Thank you everyone for the great info and advice. I'll report again soon.

 

ike

 




rindi🇨🇭

Maybe you should first consider removing the heatsink(s) from the RAID adapter and cleaning off all residue of old thermal transfer paste (or whatever has been used previously), as with age that paste tends to dry out & get brittle, which makes it do the opposite of what it is supposed to achieve. You need to do that on both the chips' surfaces & the heatsink(s), then reapply a very small drop of fresh paste (or use a new thermal transfer pad if that was previously used). If the thermal transfer from the chip to the heatsink doesn't work properly, additional fans won't help.

 

Also check for firmware updates to the disks, if available.


Philip Elder🇨🇦

As a system builder I find it insulting that a fellow system builder would pass off an old RAID Controller in a new system like that. A new one should be installed. Period.

 

That's unacceptable behaviour.


WD Blue is not designed for RAID usage; you need WD Reds.



rindi🇨🇭

I'm not that sure whether the disk color codes apply to SAS disks too. SAS is rarely used outside of servers, & servers are normally designed for RAID.

 

With SATA that is a different issue.


WD Blue SA510 SATA Internal Solid State Drive SSD - SATA III
Not SAS 10 (no such thing) but SA 510

Ike Smith🇺🇸

ASKER

Thank you all for the input,

 

an update:

 

The card runs at 67C now with the slot fan in, but I can verify that the in-case fans are spinning way down since the CPU is staying cool, and the RAID card is VERY far from the CPU. My vendor's tech support suggests using IPMI to raise the floor speed of the fans, which should keep more air moving over the card.

The case is absolutely silent at idle, so I know there isn't enough airflow.

As for the other stuff (WD Blue, older RAID cards): I have a 3-year warranty on this machine, so if these issues are as bad as some say, I'm pretty sure they will honor it, as they are a major system builder in the Dallas area.

But I found several forums where many use WD Blues in RAID and have reported no issues, so maybe not all agree on this matter? I'm not disputing anything here, just passing along that others may not fully agree that Blues can't do RAID. At this point I'm committed to the machine, so it's a moot point, but going forward I believe I'll be more discriminating, as some suggest I should be.

 

Ike




rindi🇨🇭

I must have misread “WD Blue enterprise drives  SA510”; it looked like “SAS 10” to me. Also, it says “Enterprise”, & that usually doesn't include consumer disks, which green & blue usually are.


rindi🇨🇭

If you use software RAID or FakeRAID controllers (which are similar to software RAID; those controllers are often integrated into cheap PC mainboards), you can use blue or green consumer disks. But for real RAID controllers you need enterprise-class disks, mainly because of the different timeout for retries I mentioned earlier.


ASKER CERTIFIED SOLUTION
Ike Smith🇺🇸

ASKER
