Asked by DesertDawg

SBS 2008 loses NICs

A couple of weeks ago there was a lengthy power outage in the area where my server is located, which drained the UPS battery and shut down the server.  When the power came back up, I re-started the server, which booted fine but didn't find the on board NICs, so it had no network services and was consequently useless.  It also, somewhat strangely, showed that there were only three days left to activate the operating system despite it having been in use for over three years!

After much tinkering and several re-boots, I did get the NICs showing up again in device manager and all of the services running.

Everything has been O.K. for two weeks but on Saturday I decided to replace the UPS as the old one was showing signs of a failing battery.  I shut down the system normally, replaced the UPS, but when I started the server I had the lost NICs and false activation notice problem all over again.  As before, after much tinkering and re-booting, I did get everything back up again but I need to figure out what is causing this.  The false activation alert seems to indicate that this is a software issue.

Hopefully, someone may have experienced this and rectified the problem.
DesertDawg (ASKER)

Apart from the regular monthly updates from Microsoft, the only other software change made to the server was to rectify losing the Internet connection.

http://blogs.technet.com/b/sbs/archive/2010/04/22/you-may-lose-the-default-gateway-on-sbs-2008-every-time-you-reboot.aspx
Sounds like your system battery is bad.

Have you checked the system time?
No problem with the clock.  Right on time so the CMOS battery seems to be O.K.  However, I'm not against changing the battery at the next shut down.
As for the on board NICs, your power outage may have done some damage to your system board.

Do you have any PCI NICs that you can use?

Do you use both NICs?
Rob Williams
Not actually your question, but SBS only supports one NIC, and you said NICs?  Multiple NICs on SBS 2008/2011 can cause all sorts of odd behaviors, though I am not familiar with this being one of them.
drudesill

The SBS licensing component for OEM software is sometimes tied to your primary NIC's MAC address.  When that NIC is not showing up and addressable, licensing will fail, hence the activation warning.
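
One way to test the MAC address theory would be to record what the operating system sees while the server is healthy and compare it after the next cold boot. Below is a minimal Python sketch, assuming Python is available on the server. The file name `nic_baseline.txt` and the helper names are made up for illustration, and `uuid.getnode()` is only a rough proxy for whatever the licensing component actually checks: it returns a hardware address from one of the machine's interfaces, or a random number with the multicast bit set if none can be read.

```python
import uuid
from pathlib import Path

BASELINE = Path("nic_baseline.txt")  # hypothetical file name


def format_mac(node: int) -> str:
    """Render a 48-bit integer as AA:BB:CC:DD:EE:FF."""
    raw = f"{node:012X}"
    return ":".join(raw[i:i + 2] for i in range(0, 12, 2))


def is_random_node(node: int) -> bool:
    """True if the multicast bit is set, which uuid.getnode() does
    when it could not read a real hardware address."""
    return bool((node >> 40) & 0x01)


def compare_with_baseline(current: str, baseline_path: Path = BASELINE) -> str:
    """Save the address on first run; afterwards report match or change."""
    if not baseline_path.exists():
        baseline_path.write_text(current)
        return "baseline saved"
    saved = baseline_path.read_text().strip()
    return "match" if saved == current else f"changed: {saved} -> {current}"


if __name__ == "__main__":
    node = uuid.getnode()
    if is_random_node(node):
        print("No hardware address found; the NIC may not be detected.")
    else:
        print(compare_with_baseline(format_mac(node)))
```

Run it once while everything is working to save the baseline, then again after a full power-down to see whether the address is missing or different.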

As RobWill mentioned, you may want to disable the second NIC. It's not supported in SBS 2008.

Are you keeping up with urgent firmware updates for the server if any exist?

David
O.K.  The Primary NIC MAC address issue makes sense.

The second on board NIC is disabled and has been since day one.  It's quite clearly disabled now that I've got it up again in device manager.

No firmware updates for the system board which is a Supermicro X7DVL-3 although there has been an Intel chipset upgrade that hasn't been applied.

I was about to try a PCI based NIC but once I got the system going again I was reluctant to try it.  The interesting thing about both failures is that they only occurred when the power was completely shut down.  It didn't happen during maintenance re-starts of which there have been two since the problem first arose.
It definitely sounds like a BIOS issue.  I don't know about the software being tied to the NIC, but if 3 or more key components change, the software will definitely want to re-activate.  Not suggesting they have changed, but it sounds like they are not detected, or are detected differently.
I was originally thinking along those lines re the BIOS but once I got the NICs up again the activation warning disappeared too.

There is a BIOS upgrade available to V 2.1a but I'm reluctant to install that if it doesn't help and the whole error process starts again.  I'll call Supermicro support first to find out the reason for the BIOS update and whether it has anything to do with on board NIC support.
Just a heads up.  I talked to Supermicro support who are reviewing the BIOS update specs to see if there is a network support component.  They agree that without that, a BIOS update is probably of no value.
DesertDawg,

Has the unit been subject to any abnormal heat?  Were/are the chassis internals dirty or dusty?  Can you perform a visual inspection of the electrolytic capacitors ("cans" with a "Y" or "X" impression on the top)?  Look for any capacitor that has a bulging appearance or that is leaking brownish or black crystallized residue.

Your observation that the failure follows a complete power-off but not a restart supports the possibility of a bad motherboard capacitor.

If the UPS was going bad and the output voltage went low to the server, then power supply output voltage could also go low and the current draw would increase.
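
The reasoning here rests on a switching power supply behaving roughly like a constant-power load, so the input current rises as the input voltage sags (I = P / V). Here is a rough sketch with made-up numbers. The 400 W load and the voltages are assumptions for illustration, not measurements from this server, and efficiency and power factor are ignored.

```python
def input_current(load_watts: float, input_volts: float) -> float:
    """Current drawn by an idealised constant-power load: I = P / V."""
    if input_volts <= 0:
        raise ValueError("input voltage must be positive")
    return load_watts / input_volts


if __name__ == "__main__":
    load = 400.0  # assumed server load in watts
    for volts in (120.0, 110.0, 100.0):
        amps = input_current(load, volts)
        print(f"{volts:.0f} V -> {amps:.2f} A")
```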

David
No evidence of unusual heat.  The server is in a rack cabinet and there is a data thermometer above it.  The Supermicro "Superdoctor" application shows no history of thermal events prior to the power going off.

I did a quick visual with the cover off but see no evidence of capacitor problems.  The server is quite clean externally and internally.

Supermicro just responded that there is no change to NIC support in the one BIOS update released for this system board.

I originally thought this was a "dirty shutdown" problem, especially when I first got the system back up and running.  However, the second shutdown was conducted correctly but the scenario repeated.

For the moment, unless there is someone else with a similar issue, I'll probably just keep a spare PCI NIC on top of the unit so that if it doesn't bring the on board NICs back up at the next shutdown I can install the new one.
Are you sure it is just the NIC that is not recognized upon reboot?  I would be suspicious of other devices, which you may not have noticed, if it is wanting to activate.
I am guessing that the power outage did more damage than we think.

Have you run system diags yet?

Check your power supply outputs as well.

Did you get a chance to try a PCI NIC?
Only the NICs were not showing up in device manager.  Everything else started normally.  Now that I have the NICs up again, everything is running as it should and the activation notice has returned to the usual "Windows is Activated" format.

The event logs have been running normally since I got the server back up correctly on Sunday.

Power supply is acting normally.

I haven't tried a PCI NIC yet.  Supermicro are suggesting an update of the on board Intel LAN drivers and INF file.
I would update the chipset driver as well.  To me it sounds like the BIOS is not advertising hardware properly or the software is not detecting it properly.  If the chipset driver were corrupted it might have that effect.
I agree.  The chipset INF file may have been corrupted when the power originally was shut down by the supplier.  Still, if that was the case, I'm not sure exactly how I've managed to get things running again!  

I'll download the drivers to a flash drive and install them at the next suitable opportunity.  Of course, a re-start will be required so I'll know right away if this remedies the problem.
I agree that updating the chipset INF file etc. is a good thing to do, but it is not the cause of your problem.

As you said, if the chipset driver or BIOS were corrupted you would never have been able to restart your server.

I have seen many on board devices go bad in my day, for many reasons.

Does your UPS shut down the server normally?

We have UPS units that will shut down the server after being on battery for a length of time.

I still think you are going to need the PCI NIC no matter what you update.
You may well be right.  I was close to trying the PCI NIC solution just before I got the server systems up and running again.

The UPS has been totally replaced by a new model that will soft shut down the server when on battery power.  The previous model was set up to do the same, but it appears that the routine never started because the battery was on the point of failing.  Quite why there was no audio or visual alert regarding the battery state, I don't know.
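
For anyone following along, the soft shutdown routine in UPS monitoring software is often a simple rule: once on battery, start a clean shutdown when the reported charge or estimated runtime drops below a threshold. Below is a minimal sketch of such a rule. The `UpsStatus` fields and the threshold values are hypothetical and do not correspond to any particular vendor's software.

```python
from dataclasses import dataclass


@dataclass
class UpsStatus:
    on_battery: bool
    charge_percent: float    # reported battery charge, 0-100
    runtime_minutes: float   # estimated runtime remaining


def should_shut_down(status: UpsStatus,
                     min_charge: float = 30.0,
                     min_runtime: float = 5.0) -> bool:
    """Return True when the server should begin a clean shutdown."""
    if not status.on_battery:
        return False
    return (status.charge_percent <= min_charge
            or status.runtime_minutes <= min_runtime)


if __name__ == "__main__":
    readings = [
        UpsStatus(on_battery=False, charge_percent=100, runtime_minutes=40),
        UpsStatus(on_battery=True, charge_percent=80, runtime_minutes=25),
        UpsStatus(on_battery=True, charge_percent=25, runtime_minutes=4),
    ]
    for reading in readings:
        print(reading, "->", should_shut_down(reading))
```

A rule like this only works if the monitor gets a reading before the battery gives out. A battery that collapses almost immediately can drop the load before the threshold is ever evaluated, which would be consistent with the routine not starting on the old unit.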
Another minor issue has cropped up here.  When the server ran for a few hours without network support, three critical alerts popped up in the console related to a lack of network support.

Now that I have the system running with network support, how do I remove these alerts?
What are the messages? Where did they appear?

Did you put in a PCI NIC?

Did you restart the server again after you had the alerts?
The 3 alerts appeared in the computer alerts section of the SBS console while the network support was down on Saturday.  They remained after the system was re-booted and started with network support on Sunday.

They are....

1) FSMO roles licensing error
2) BPA error
3) Licensing error for the additional server number check

All of these were rectified when the server was re-started with network support.

A new PCI NIC was not installed.

I had expected the alerts to have dropped out of the console by now but they're still there.
OK, check to see if DCOM is enabled.

From what I understand, the alerts will clear themselves after a few cycles of the services.

Yes, those errors will occur if the network support is not working.
Yes, DCOM is enabled.

I'd have thought that the errors would have cleared by themselves too, but I'll leave them for another few days to see what happens.
OK, keep me posted.

Sorry for the delay; tons of snow here.
Oh, no problem.  Hope you're O.K.

Alerts still up there this morning after 8 days.
Can you take a screenshot of the console and post it? I would like to see what you are getting.
I'll try to do that.  Funny thing is, the errors don't show up on the detailed network report.
Which network report are you talking about?
The detailed network report in the SBS 2008 console.
ASKER CERTIFIED SOLUTION
Member_2_6492660_1
This solution is only available to members.
O.K.  I switched off the alerts relating to the false alarms which removed them from the report.  I'll try switching them on again after a few days to see what happens but, for now, everything is fine.