XP Pro SP3 on Dell Inspiron 9400 notebook with OCZ SSD: BSOD 0xF4 after resume from standby

ChrisEddy asked:
Gentlemen,

I have just replaced the hard drive in this 3-year-old Dell Inspiron 9400 notebook computer with a new and very quick OCZ SSD, manually configured the partition with a 1024 offset, freshly installed the OS, freshly downloaded all of the latest drivers from Dell, and applied all currently available OS updates from Microsoft.

The problem is that the machine /reliably/ (4 out of 4 attempts) produces a BSOD 0xF4 when the power button is pressed to resume it from Standby.

Here's the sequence to recreate the problem:

0) Boot the machine normally into Windows and log in to an account which has administrative privileges.
1) Click on Start -> Shut Down -> Standby.
2) See display turn black, disk I/O light flashes then stops, then the power indicator light begins to flash on and off slowly.
3) Wait until the power light has made 2 slow flashes.
4) Press the power button.
5) See the Dell BIOS splash screen appear, then disappear.
6) Boom: See the BSOD 0xF4

The values reported after the STOP are:
(0x00000003, 0x865b3020, 0x865b3194, 0x805d2954)
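
For anyone following along, the four values can be labeled using the parameter layout that Microsoft's bug check reference gives for 0xF4 (CRITICAL_OBJECT_TERMINATION). This is only a minimal Python sketch under that assumption; the function name `describe_stop_f4` is made up for illustration and is not part of any tool.

```python
# Parameter meanings for bug check 0xF4 (CRITICAL_OBJECT_TERMINATION),
# following the layout in Microsoft's bug check reference.
OBJECT_TYPES = {0x3: "process", 0x6: "thread"}


def describe_stop_f4(p1, p2, p3, p4):
    """Return a labeled view of the four 0xF4 STOP parameters (hypothetical helper)."""
    return {
        "terminated object type": OBJECT_TYPES.get(p1, "unknown (0x%X)" % p1),
        "terminating object address": "0x%08X" % p2,
        "process image file name address": "0x%08X" % p3,
        "explanatory message address": "0x%08X" % p4,
    }


# Values reported on the blue screen in this thread.
info = describe_stop_f4(0x00000003, 0x865B3020, 0x865B3194, 0x805D2954)
for label, value in info.items():
    print("%-34s %s" % (label, value))
```

A first parameter of 0x3 says that a critical process (rather than a thread) terminated. The addresses alone do not say which process it was; that has to be read from one of the minidumps, for example with `!analyze -v` in WinDbg.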

Note that I've been in contact with OCZ before about this SSD+computer, because the BSOD produced previously was 0x77.  Their recommendation was to create the partition at an offset that is a multiple of 64, and to reflash the SSD with their current firmware.  This was done, the OS was reinstalled as described, and now I'm getting a different BSOD code.  They also asked whether the notebook computer uses a SATA2 controller (definitely compatible) or SATA1 (which may have troubles).
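
As a rough illustration of what "aligned" means here, the check below tests whether a partition's starting byte offset falls on a given boundary. The units of the offsets mentioned in this post are not stated, so the boundary is a parameter; the 512-byte sector size, the 4096-byte and 64 KiB boundaries, and the name `is_aligned` are all assumptions for the sketch, not values taken from this machine.

```python
SECTOR_BYTES = 512  # assumption: the drive reports 512-byte logical sectors


def is_aligned(start_sector, boundary_bytes):
    """True if a partition starting at start_sector begins on a multiple of boundary_bytes."""
    return (start_sector * SECTOR_BYTES) % boundary_bytes == 0


# Sector 63 is the traditional XP-era default start; 64 and 2048 are
# illustrative aligned starts, not values read from this drive.
for start in (63, 64, 2048):
    offset = start * SECTOR_BYTES
    print(start, offset, is_aligned(start, 4096), is_aligned(start, 64 * 1024))
```

The actual starting offset of a partition can be read from the OS (for example with `diskpart` or `wmic partition get StartingOffset`) and fed into a check like this.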

I've run Spinrite on the SSD, and there are lots of ECC errors being reported.  I've been in contact with Spinrite, and they chalk this up to the SSD being chatty (which they like), but since SSDs are new and magnetic disks are common, they want to stay focussed on magnetic disks.

When the machine boots back up, the OS reports that a serious error has occurred, and asks that a problem report be submitted, which I do.  Then an attractive but somewhat generic page is displayed with common causes: aging or failing hard disks, large file transfers from secondary media to the local hard disk, loss of power to a hard drive, hard-disk-intensive processes (e.g. antivirus scanners), and recently installed hardware that might have compatibility and performance problems.

This link has an interesting description of BSOD codes in general:
http://www.aumha.org/a/stop.php
It briefly describes a CRITICAL_OBJECT_TERMINATION.
It also contains a link to a 2007 article:
http://support.microsoft.com/default.aspx?scid=kb;en-us;330100
This link describes the symptom, but mentions the PATA drive not configured as a primary.  

I also found this link in EE:
http://www.experts-exchange.com/OS/Microsoft_Operating_Systems/Windows/XP/Q_22908820.html#a20146646
Unfortunately, it recommends a "nuke and pave" - which has already been done.

The System event log shows:
Error code 00000077, parameter1 c000000e, parameter2 c000000e, parameter3 00000000, parameter4 00fdf000.
Category (102), EventID: 1003
Note that BSOD x77 was being produced at resume from standby /before/ the partition was put onto an aligned boundary and the OS was freshly reinstalled.
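
The 0x77 entry can be labeled the same way. Bug check 0x77 is KERNEL_STACK_INPAGE_ERROR, and when the first parameter is not 0, 1 or 2, Microsoft's reference describes the parameters as status code, I/O status code, page file number, and offset into the page file. The sketch below assumes that layout; the lookup table is a small subset of NTSTATUS codes and `describe_stop_77` is a made-up name.

```python
# A few NTSTATUS codes that Microsoft's 0x77 reference calls out (subset only).
NTSTATUS_NAMES = {
    0xC000000E: "STATUS_NO_SUCH_DEVICE",
    0xC000009C: "STATUS_DEVICE_DATA_ERROR",
    0xC000009D: "STATUS_DEVICE_NOT_CONNECTED",
    0xC0000185: "STATUS_IO_DEVICE_ERROR",
}


def describe_stop_77(p1, p2, p3, p4):
    """Label 0x77 parameters for the case where parameter 1 is an NTSTATUS code."""
    return {
        "status": NTSTATUS_NAMES.get(p1, "0x%08X" % p1),
        "I/O status": NTSTATUS_NAMES.get(p2, "0x%08X" % p2),
        "page file number": p3,
        "offset into page file": "0x%08X" % p4,
    }


# Values from the System event log entry quoted in this post.
event = describe_stop_77(0xC000000E, 0xC000000E, 0x00000000, 0x00FDF000)
for label, value in event.items():
    print("%-22s %s" % (label, value))
```

Both status fields decode to STATUS_NO_SUCH_DEVICE, meaning the kernel could not reach the device holding the page file when it tried to page data back in. That fits a drive that is not available to the OS at the moment of resume.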

Currently in the c:\windows\Minidump folder, there are 4 minidump files.

I am hoping for guidance, to at least learn more about the problem, and possibly solve it.

Thank you in advance for your help!

Most Valuable Expert 2013

Commented:
FWIW are you on the A10 BIOS on the Inspiron?
The problem may well be with the bespoke Dell system drivers and their handling of the TRIM function with SSDs (see Wiki including the OCZ reference).
Dell haven't yet addressed TRIM, and the Inspirons already have a reputation for coming out of standby without hardware acceleration active, so the two are likely to be playing off each other.
Currently I think you may be stuck with this if you suspend to a "chatty" SSD :(

Author

Commented:
Thank you for the snappy response!

Yeah, I've been following Anand's SSD research for a while now, and think he has made some super points and enlightenment.  

Yes, I've already reflashed the Bios to A10.  

The chipset driver (Dell file R117079) is dated 1/17/2006.  The fixes and enhancements are described as: "Added support for systems with the Intel 945GM/945PM and ICH-7M chipset.".  Since this is all pre-SSD and pre-TRIM stuff, I don't think this is a factor.

I ran Everest to get some objective insight into what the machine actually has, and it shows the Motherboard chipset is: "Calistoga-PM i945PM"

The Intel driver download site for "Intel 945 Express Chipset Family" has a reference to "32-bit intel RST driver for F6 install", but I'd like to think that the Dell OS install disk or latest chipset package has a reasonably current and functional driver (which it is, and it is - /except/ when resuming from Standby), and is therefore unnecessary.

Note that OCZ has offered to RMA the drive, but I'd prefer to first learn more about the drive and its possible incompatibilities and problems.  I can reinstall the OS in about 2 clock hours, which is much less time than it takes to ship the drive to the manufacturer, have them process the exchange, and wait for the replacement drive to be returned.  Reinstalling is a much smaller time hit for me.

Author

Commented:
I've downloaded and installed a trial version of HDtune.

The benchmark shows the minimum read transfer rate is 96.7 MB/s, max is 110.1 MB/s.  Access time is 0.2 ms.

The SMART data shows this:

Attribute                        Current  Worst  Threshold  Data      Status
(05) Reallocated Sector Count    95       95     0          2048      warning
(09) Power on hours count        100      100    0          137       ok
(0C) Power cycle count           100      100    0          51        ok
(BB) Reported uncorr errors      100      100    0          0         ok
(C3) Hardware ECC recovered      110      110    0          45795342  ok
(C4) Reallocated event count     100      100    0          0         ok
(EA) (unknown attribute)         0        0      0          320       ok
(F1) (unknown attribute)         0        0      0          320       ok
(F2) (unknown attribute)         0        0      0          576       ok

Any thoughts on how significant or insignificant this may be?

Most Valuable Expert 2013

Commented:
Not sure how valid HDTune will be if it's looking for mechanical defects. If this were a conventional drive, the reallocated sectors would be worrying, particularly if it looked like the numbers were going up and the sector pool was being exhausted.
 
However I think it's unlikely the figures are relevant for the SSD.
See what you get with CrystalDisk which is more SSD aware.
http://crystalmark.info/software/CrystalDiskInfo/index-e.html 

Author

Commented:
Interesting tool!  Thanks!!

OK, the results are:

 - Ricoh MMC Host Controller [ATA]
 - Ricoh Memory Stick Controller [ATA]
 - Ricoh xD-Picture Card Controller [ATA]
 + Intel(R) 82801GBM/GHM (ICH7-M Family) Serial ATA Storage Controller - 27C4 [ATA]
   + Primary IDE Channel (0)
     - OCZ VERTEX-LE
   + Secondary IDE Channel (1)
     - TSSTcorp DVD+-RW TS-L632D

-- Disk List ---------------------------------------------------------------
 (1) OCZ VERTEX-LE : 100.0 GB [0-0-0, pd1]

----------------------------------------------------------------------------
 (1) OCZ VERTEX-LE
----------------------------------------------------------------------------
           Model : OCZ VERTEX-LE
        Firmware : 1.10
   Serial Number : f04440006
       Disk Size : 100.0 GB (8.4/100.0/100.0)
     Buffer Size : Unknown
     Queue Depth : 32
    # of Sectors : 195371568
   Rotation Rate : ---- (SSD)
       Interface : Serial ATA
   Major Version : ATA8-ACS
   Minor Version : ATA8-ACS version 6
   Transfer Mode : SATA/300
  Power On Hours : 138 hours
  Power On Count : 51 count
     Temparature : Unknown
   Health Status : Unknown
        Features : S.M.A.R.T., 48bit LBA, NCQ, TRIM
       APM Level : ----
       AAM Level : ----

-- S.M.A.R.T. --------------------------------------------------------------
ID Cur Wor Thr RawValues(6)       Attribute Name
01 110 110 __0 000002C0B518 Read Error Rate
05 _95 _95 __0 000000000800 Reallocated Sectors Count
09 100 100 __0 28820000008A Power-On Hours
0C 100 100 __0 000000000033 Power Cycle Count
AB __0 __0 __0 000000000000 Unknown
AC __0 __0 __0 000000000000 Unknown
AE __0 __0 __0 000000000013 Unknown
B1 __0 __0 __0 000000000000 Unknown
B5 __0 __0 __0 000000000000 Unknown
B6 __0 __0 __0 000000000000 Unknown
BB 100 100 __0 000000000000 Vendor Specific
C2 __0 __0 __0 000000000000 Temperature
C3 110 110 __0 000002C0B518 Unknown
C4 100 100 __0 000000000000 Reallocation Event Count
E7 _94 _94 __0 000000000001 Unknown
E9 __0 __0 __0 000000000000 Vendor Specific
EA __0 __0 __0 000000000140 Vendor Specific
F1 __0 __0 __0 000000000140 Vendor Specific
F2 __0 __0 __0 000000000240 Vendor Specific
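
CrystalDiskInfo prints the raw values in hexadecimal, while HD Tune showed them in decimal, so the two listings are easier to compare after a conversion. A minimal sketch follows; the helper name `raw_to_int` is made up, and treating only the low 4 bytes of attribute 09 as the hour count is an assumption about this drive's vendor-specific encoding.

```python
# Raw values copied from the CrystalDiskInfo listing above (ID -> 12 hex digits).
RAW = {
    "05": "000000000800",
    "09": "28820000008A",
    "0C": "000000000033",
    "C3": "000002C0B518",
    "EA": "000000000140",
    "F1": "000000000140",
    "F2": "000000000240",
}


def raw_to_int(hex_digits, low_bytes=6):
    """Convert a raw SMART value to an integer, keeping only the low `low_bytes` bytes."""
    value = int(hex_digits, 16)
    return value & ((1 << (8 * low_bytes)) - 1)


decoded = {}
for attr_id, digits in RAW.items():
    # Assumption: for attribute 09 only the low 4 bytes hold the hour count.
    width = 4 if attr_id == "09" else 6
    decoded[attr_id] = raw_to_int(digits, width)
    print(attr_id, decoded[attr_id])
```

Decoded this way, 05, 0C, EA, F1 and F2 come out as 2048, 51, 320, 320 and 576, the same as HD Tune's "Data" column. Power-on hours (138 vs 137) and the C3 count (46183704 vs 45795342) are slightly higher here, presumably because this reading was taken later.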


Most Valuable Expert 2013

Commented:
Again, I think this is pretty typical of what you would see for an SSD.
Having looked around, there seem to be some posts where users have tried (with varying degrees of success) to get the generic Intel drivers for the system board installed, as these are more recent than Dell's. Not sure if you want to go there.
ATM I still think what you are seeing is an error caused by the system "waking up" and finding the drive ready to go before it is.

Author

Commented:
I've posted a copy and paste of this output to the trouble ticket I have open with OCZ re: this, also seeking guidance and enlightenment.

I'll post their response when it arrives; going by past experience, that will take a couple of days...

Most Valuable Expert 2013

Commented:
No problem ~ have a personal interest as am looking to put an SSD into a Latitude about the same age as your 9400.

Author

Commented:
I figured that.  The basic idea is sweet.  But the devil is in the details ...


Well, I've exchanged several trouble ticket messages with the manufacturer (OCZ), and it turns out that there can be a subtle problem with the drive when installed into an older computer.

I've talked with Dell technical support, and confirmed that the computer does indeed have a SATA1 interface.  

The drive has a SATA2 interface, which has a higher rated bandwidth than a SATA1 interface.  The SATA1 interface does not throttle the data when the bandwidth gets too high (read: saturates).  

Here's a link to more information:
http://en.wikipedia.org/wiki/Serial_ATA#SATA_Revision_1.0_.28SATA_1.5Gb.2Fs.29
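
To put numbers on the rated bandwidths: SATA uses 8b/10b encoding, so every data byte costs 10 bits on the wire. The sketch below works out the usable ceiling for each link speed and compares it with the HD Tune maximum reported earlier in this thread. The function name is made up, and the calculation ignores protocol overhead beyond the encoding.

```python
def usable_mb_per_s(line_rate_gbit):
    """Payload bandwidth of a SATA link after 8b/10b encoding (10 line bits per data byte)."""
    return line_rate_gbit * 1e9 / 10 / 1e6


sata1 = usable_mb_per_s(1.5)  # SATA revision 1.0, 1.5 Gb/s line rate
sata2 = usable_mb_per_s(3.0)  # SATA revision 2.0, 3 Gb/s line rate
measured = 110.1              # HD Tune maximum read rate reported earlier in this thread

print("SATA1 ceiling: %.0f MB/s" % sata1)
print("SATA2 ceiling: %.0f MB/s" % sata2)
print("measured / SATA1 ceiling: %.2f" % (measured / sata1))
```

The measured maximum sits at roughly three quarters of the 150 MB/s SATA1 ceiling, which is about what one would expect from a 1.5 Gb/s link once command and framing overhead are counted.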

I've posted two suggestions to OCZ for possible firmware improvements.  The first is to set the drive to operate at the SATA1 bandwidth (which would go far in solving incompatibility problems like this, and still retain zero seek times).  The second is to introduce a start delay: the SSD would wait before it first begins to produce data, giving the SATA controller and associated drivers time to become ready before the data onslaught begins.

So the lesson learned is: installing a much faster SSD than what the computer originally had will not automatically work, because there are one or more corner cases where the SATA1 interface will get saturated and cause unintended results.  

So for now, I'm done.  

Either the customer is going to accept that the Standby functionality of the OS is not going to work with this drive, or the drive needs to be replaced with something slower.  

Most Valuable Expert 2013

Commented:
I preferred "the system "waking up" and finding the drive ready to go before it is." :)
But that makes perfect sense; shame there's no way to buffer the extra data.  Can't see the PC manufacturers wanting to provide a fix, although OCZ could gain market share by allowing their entry-level SSDs to go into older laptops.
 
Thanks for the info.

Author

Commented:
I'm really not thrilled at the thought of the SATA2 drive not autonegotiating the speed down to SATA1.  That is standard stuff in an IDE or SCSI drive.   Even dial-up modems will automatically evaluate and reevaluate the line quality, and determine the maximum reliable transmit and receive speeds.

Author

Commented:
The embedded SSD controller of the Vertex-LE drive is made by SandForce, which is currently State Of The Art (but wait until next month).  Depending on the benchmarks you look at, this is the fastest, or the Intel XM-E is fastest, or the Crucial RealSSD is fastest.  These are all super quick drives, and are all available right now.

There are other companies which offer SSDs with the SandForce controller.  OCZ is one of the resellers.  But OCZ is big enough and progressive enough that one would hope they could and would aggregate customer experience responses, and motivate a firmware update to occur sooner rather than later.

Interesting that we seem to finally be at a point in time when the drive is faster than the "computer", rather than the other way around.  One of my diagnostics showed an average read speed of about 100MB/sec, which suggests that the "computer", not the SSD, is limiting the disk bandwidth.