800grader asked:

BSOD on Windows 2008 with ExtremeZ-IP

I have done everything except resolve the error in Event Viewer; I'm not sure how to, or whether it is even the problem.

We have the latest firmware from Dell, patches, and the latest version of ExtremeZ-IP. There are at most 10 clients, all running the latest Leopard version, patched a few weeks ago.

I get a blue screen on the server and no error logs.

Suggestions?
sk_raja_raja

Paul Knight
Hmm, no dump file?
800grader (ASKER)

ExtremeZ-IP has one volume; its size is 850 GB.
Grader,

Is there no minidump file on the server?

/Fox
Yeah, we use Backup Exec; I will look into that.

Minidump attached


Mini102208-01.txt
The version is Backup Exec 11d (Version 11.0 Rev. 7170).
Can anyone tell me about the issue in Event Viewer?

Memory settings for this server are not optimized correctly.  You should reconfigure your server to maximize throughput for network applications (not file sharing).

OK... can you post the event log?
I am not able to open the attachment... can you copy and paste it?
Yes, I hope XML is OK...

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="ExtremeZ-IP" />
    <EventID Qualifiers="32768">2</EventID>
    <Level>3</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2008-10-22T15:11:36.000Z" />
    <EventRecordID>9569</EventRecordID>
    <Channel>Application</Channel>
    <Computer>MyComputer.mydomain.com</Computer>
    <Security />
  </System>
  <EventData>
    <Data>Memory settings for this server are not optimized correctly. You should reconfigure your server to maximize throughput for network applications (not file sharing).</Data>
  </EventData>
</Event>
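For anyone who wants to triage an exported event like this without reading the raw XML, here is a minimal Python sketch. It assumes the event was saved as XML text (a trimmed copy of the event above is embedded as `EVENT_XML`); `summarize_event` is a name made up for this illustration. The level mapping uses the standard Windows event levels, so Level 3 shows up as a warning rather than an error.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the xmlns attribute of the event above.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Trimmed copy of the event posted above (illustration only).
EVENT_XML = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="ExtremeZ-IP" />
    <EventID Qualifiers="32768">2</EventID>
    <Level>3</Level>
    <TimeCreated SystemTime="2008-10-22T15:11:36.000Z" />
    <Channel>Application</Channel>
  </System>
  <EventData>
    <Data>Memory settings for this server are not optimized correctly.</Data>
  </EventData>
</Event>"""

# Standard Windows event levels.
LEVELS = {1: "Critical", 2: "Error", 3: "Warning", 4: "Information"}


def summarize_event(xml_text):
    """Pull the fields that matter for triage out of one event record."""
    root = ET.fromstring(xml_text)
    system = root.find("e:System", NS)
    return {
        "provider": system.find("e:Provider", NS).get("Name"),
        "event_id": int(system.find("e:EventID", NS).text),
        "level": LEVELS.get(int(system.find("e:Level", NS).text), "Unknown"),
        "time": system.find("e:TimeCreated", NS).get("SystemTime"),
        "message": root.find("e:EventData/e:Data", NS).text,
    }


summary = summarize_event(EVENT_XML)
print(summary["provider"], summary["level"], summary["event_id"])
```
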
Grader,

You can safely ignore this message on Server 2008.

Summary:
During the initial launch or start-up of ExtremeZ-IP, the following error message may appear in the ExtremeZ-IP Operation Log, or the "Application Log" of the Windows Event Viewer:

Memory settings for this server are not optimized correctly. You should reconfigure your server to maximize throughput for network applications.

Description:
On a Windows 2003 or earlier system, this message is an indication that present memory optimization can 'choke' connections to Macintosh clients. On Windows Server 2008, the setting related to this message is no longer user configurable and this message can be disregarded.

Additionally, if no problems with throughput, connection, or performance are recognized, this warning may be disregarded. However, if any of these problems are recognized, it is recommended that throughput be optimized for "Network Applications," as opposed to "File Sharing."

This adjustment may be made by selecting:

Settings > Network and Dial-Up Connections > {NIC Card} Properties > File and Printer Sharing for Microsoft Networks > Properties

According to Microsoft (KB Article Q110255):

"The 'Maximize Throughput for File Sharing' option permits the system cache to use more available memory than it would otherwise. In this situation, the available memory can drop to levels that result in heavy swapping activity on the hard disks in order to accommodate requests from user or system applications that may subsequently need to be swapped into memory."

We make the recommendation that ExtremeZ-IP users enable throughput for "Network Applications" for the above memory reasons. Specifically, the File Sharing optimization helps Windows file sharing connections only. As a result, ExtremeZ-IP and other similar applications are forced to swap more frequently, causing performance to degrade. Of note, the setting of this option should only minimally impact the file sharing performance of your Windows users.

If Mac-centric connections are dealing with a limited chunk of memory, and a large number of change notifications (based on other systems accessing files), then it is possible that Mac users will experience decreased performance.

It is also the case that the Macintosh will continue to "ping" or "tickle" the shared resource at a set interval, whether the file sharing is actively doing something or not. Even if a volume is simply mounted on the desktop, the Mac will continue to talk to the AppleShare server to maintain the connection. If the Mac repeatedly makes these requests (which are being queued on the server) and waits for responses, the Mac will "hang" until the response is received. If there is a delay involved, a delay will be evident to the Mac user as well.

http://support.microsoft.com/default.aspx?scid=kb;EN-US;q110255

http://www.grouplogic.com/knowledge/index.cfm/fuseaction/view_Info/docID/68

/Fox
Thanx for the excellent answer!

I have not enabled AppleTalk in ExtremeZ-IP. I read somewhere that it's not needed. Is that correct? I got errors before when I had it enabled.
That's correct; your clients should be using either SMB or AFP to connect to your server.

/Fox
Ok.

What is the difference, and which is preferable?
I mean the difference in performance for the Mac clients. Would SMB be preferable?
Has anyone looked into the minidump?

Do you need additional information? Let me know, I need to solve this case ASAP.
I've posted all Minidumps here, http://85.227.21.8/minidump.zip

I do think it could be Backup Exec that is causing this, but I haven't been able to analyse the dump files.
Perfect,

I'm just installing the symbol files and I'll take a look. Backup Exec is well known to leak memory all over the place. I'll let you know.

/Fox
OK, here is your debug file:

PAGE_FAULT_IN_NONPAGED_AREA (50)
Invalid system memory was referenced.  This cannot be protected by try-except,
it must be protected by a Probe.  Typically the address is just plain bad or it
is pointing at freed memory.
Arguments:
Arg1: d47c6000, memory referenced.
Arg2: 00000000, value 0 = read operation, 1 = write operation.
Arg3: 818be773, If non-zero, the instruction address which referenced the bad memory
      address.
Arg4: 00000000, (reserved)

Debugging Details:
------------------


READ_ADDRESS: GetPointerFromAddress: unable to read from 81940868
Unable to read MiSystemVaType memory at 81920420
 d47c6000

FAULTING_IP:
nt!memcpy+33
818be773 f3a5            rep movs dword ptr es:[edi],dword ptr [esi]

MM_INTERNAL_CODE:  0

CUSTOMER_CRASH_COUNT:  1

DEFAULT_BUCKET_ID:  DRIVER_FAULT_SERVER_MINIDUMP

BUGCHECK_STR:  0x50

PROCESS_NAME:  ExtremeZ-IP.exe

CURRENT_IRQL:  0

TRAP_FRAME:  98ae06fc -- (.trap 0xffffffff98ae06fc)
ErrCode = 00000000
eax=d47d5572 ebx=98ae0800 ecx=00003d5d edx=00000000 esi=d47c5ffe edi=cda96ba8
eip=818be773 esp=98ae0770 ebp=98ae0778 iopl=0         nv up ei pl nz ac po nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010212
nt!memcpy+0x33:
818be773 f3a5            rep movs dword ptr es:[edi],dword ptr [esi] es:0023:cda96ba8=???????? ds:0023:d47c5ffe=????????
Resetting default scope

LAST_CONTROL_TRANSFER:  from 81863bb4 to 818ae155

STACK_TEXT:  
98ae06e4 81863bb4 00000000 d47c6000 00000000 nt!MmAccessFault+0x10a
98ae06e4 818be773 00000000 d47c6000 00000000 nt!KiTrap0E+0xdc
98ae0778 819b6039 cda96244 d47c569a 0000fed8 nt!memcpy+0x33
98ae07b8 81a36801 00000004 98ae0800 98ae07f8 nt!FsRtlNotifyUpdateBuffer+0x57
98ae0854 82a2889a 86a78440 86b193b8 98ae0938 nt!FsRtlNotifyFilterReportChange+0x560
98ae08d4 82a81cc8 00000000 ed5a20f8 86b190d8 Ntfs!NtfsReportDirNotify+0xa8
98ae0b28 82abcf57 8863c620 8763e028 876bba58 Ntfs!NtfsSetRenameInfo+0x14c7
98ae0ba8 82a29a54 8863c620 876bba58 1a0ab77e Ntfs!NtfsCommonSetInformation+0x5bf
98ae0c14 818c5053 86b19020 876bba58 876bba58 Ntfs!NtfsFsdSetInformation+0x104
98ae0c2c 81f92ba7 86a7b418 876bba58 00000000 nt!IofCallDriver+0x63
98ae0c50 81f92d64 98ae0c70 86a7b418 00000000 fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x251
98ae0c88 818c5053 86a7b418 876bba58 00000000 fltmgr!FltpDispatch+0xc2
98ae0ca0 819fe0bc 8177a6e2 00027718 0910f038 nt!IofCallDriver+0x63
98ae0d48 81860a7a 00027718 0910f080 86e62d78 nt!NtSetInformationFile+0x978
98ae0d48 77ac9a94 00027718 0910f080 86e62d78 nt!KiFastCallEntry+0x12a
WARNING: Frame IP not in any known module. Following frames may be wrong.
00640438 00000000 00000000 00000000 00000000 0x77ac9a94


STACK_COMMAND:  kb

FOLLOWUP_IP:
nt!KiTrap0E+dc
81863bb4 85c0            test    eax,eax

SYMBOL_STACK_INDEX:  1

SYMBOL_NAME:  nt!KiTrap0E+dc

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: nt

IMAGE_NAME:  ntkrpamp.exe

DEBUG_FLR_IMAGE_TIMESTAMP:  47918b12

FAILURE_BUCKET_ID:  0x50_nt!KiTrap0E+dc

BUCKET_ID:  0x50_nt!KiTrap0E+dc

Followup: MachineOwner
---------
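The register values in the trap frame above can be cross-checked with a little arithmetic. The sketch below is Python and rests on one assumption: that the three values after the return address on the `nt!memcpy` stack line are the call's first three arguments (destination, source, byte count), as is usual for `kb` output on x86. All numbers are copied from the dump; the variable names are only for illustration.

```python
PAGE_SIZE = 0x1000  # 4 KB pages on x86

# Values copied from the debugger output above.
fault_addr = 0xD47C6000  # Arg1: memory referenced
src = 0xD47C569A         # assumed memcpy source (2nd value on the stack line)
length = 0x0000FED8      # assumed memcpy byte count (3rd value)
esi = 0xD47C5FFE         # source pointer when the fault hit
ecx = 0x00003D5D         # dwords still to copy (rep movs counter)
eax = 0xD47D5572

copied = esi - src           # bytes already copied
remaining = length - copied  # bytes left when the fault hit

print(f"copied so far:      {copied:#x} bytes")
print(f"remaining:          {remaining:#x} bytes = {remaining // 4:#x} dwords")
print(f"src + length:       {src + length:#x}")
print(f"fault on page edge: {fault_addr % PAGE_SIZE == 0}")
print(f"dword read at esi crosses into faulting page: {esi < fault_addr < esi + 4}")
```

Under that assumption the numbers agree with each other: the remaining dword count equals `ecx`, and `src + length` equals `eax`. So the copy of 0xfed8 bytes had moved only 0x964 bytes when the 4-byte read at `esi` crossed onto the page at d47c6000, which was not valid memory. That points at the source buffer or length handed to `memcpy` from `nt!FsRtlNotifyUpdateBuffer`, but the arithmetic alone does not say which component supplied the bad values.
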
So we can safely say that ExtremeZ-IP is your issue. I am just going to check a couple of the other .dmp files. It looks like you have a memory leak in ExtremeZ-IP...

Sit tight... we're getting closer.

/Fox
Grader,

Do you have a support contract with Group Logic?

I just found this on the Group Logic website...

Summary:
A few customers have reported BSOD STOP x50 errors when running Windows 2008. ExtremeZ-IP.exe may be referenced as the process victimized in the MEMORY.DMP file.

DEFAULT_BUCKET_ID: VISTA_DRIVER_FAULT
BUGCHECK_STR: 0x50
PROCESS_NAME: ExtremeZ-IP.exe

Description:
ExtremeZ-IP runs as a user-mode application and can't cause a BSOD. The bugcheck is listing us as a victim.

We have had a few reports from users that updating LSI drivers resolved the problem.
- megasas.sys
- dump_megasas.sys

LSI also rebrands as Dell Perc and HP
- HpCISSs2.sys
- dump_HpCISSs2.sys

Microsoft lists other possible causes here:
http://msdn.microsoft.com/en-us/library/ms793437.aspx


Note:
If you are working with LSI, your Hardware Vendor and/or Microsoft and would like for us to also review any MEMORY.DMP files, please open a support case at:

http://www.grouplogic.com/support-center/?fa=submit-request

/Fox



Thanx Fox,

Case started at Grouplogic, will get back when I get an answer.

I will look into the LSI drivers, maybe I should have a chat with Dell. I'll see what Grouplogic say first.

This thread alone covers the subscription fee; amazing site.

Thanx in advance...

//Marcus
No problem.

You're right, EE should be in ALL engineering staff's tool kits :0)

/Fox
Busy week. The problem remains, but "only" one BSOD this week; of course, that's one too many...

I haven't managed to upgrade the PERC drivers. Windows tells me I'm using the latest drivers, but that's obviously not the case. I don't want to uninstall the driver and blow my RAID set... (I believe that could happen...)

This is the answer from Grouplogic:

You should have Dell and/or Microsoft review a complete dump.
Your dump has some key information missing

Also - I see the older drivers are still running.
81f38000 81f42000   megasas    (deferred)
    Image path: \SystemRoot\system32\drivers\megasas.sys
    Image name: megasas.sys
    Timestamp:        Fri May 25 18:19:58 2007 (4657610E)

82b56000 82b5e000   spldr      (deferred)
    Image path: \SystemRoot\System32\Drivers\spldr.sys
    Image name: spldr.sys
    Timestamp:        Thu Jun 21 20:29:17 2007 (467B17DD)

96af5000 96ba4000   spsys      (deferred)
    Image path: \SystemRoot\system32\drivers\spsys.sys
    Image name: spsys.sys
    Timestamp:        Thu Jun 21 20:33:02 2007 (467B18BE)

99279000 99357000   peauth     (deferred)
    Image path: \SystemRoot\system32\drivers\peauth.sys
    Image name: peauth.sys
    Timestamp:        Mon Oct 23 04:55:32 2006 (453C8384)

99357000 99361000   secdrv     (deferred)
    Image path: \SystemRoot\System32\Drivers\secdrv.SYS
    Image name: secdrv.SYS
    Timestamp:        Wed Sep 13 09:18:32 2006 (45080528)


Best Regards,
Charles Kim
ExtremeZ-IP Support
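The hex value in parentheses after each Timestamp in the module listing above is the driver's build time in seconds since the Unix epoch, so driver ages can be checked mechanically. Below is a minimal Python sketch; `LM_OUTPUT` is a trimmed copy of two entries from the listing, and `module_build_times` is a name invented for this example.

```python
import re
from datetime import datetime, timezone

# Two entries trimmed from the module listing above (illustration only).
LM_OUTPUT = """\
81f38000 81f42000   megasas    (deferred)
    Timestamp:        Fri May 25 18:19:58 2007 (4657610E)
99279000 99357000   peauth     (deferred)
    Timestamp:        Mon Oct 23 04:55:32 2006 (453C8384)
"""


def module_build_times(text):
    """Return {module name: build time in UTC} from a debugger module listing."""
    result = {}
    module = None
    for line in text.splitlines():
        # Header lines look like: "<start> <end>   <module>   (deferred)"
        header = re.match(r"^[0-9a-f]{8} [0-9a-f]{8}\s+(\S+)", line)
        if header:
            module = header.group(1)
            continue
        stamp = re.search(r"Timestamp:.*\(([0-9A-Fa-f]{8})\)", line)
        if stamp and module:
            seconds = int(stamp.group(1), 16)
            result[module] = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return result


times = module_build_times(LM_OUTPUT)
for name, built in sorted(times.items(), key=lambda item: item[1]):
    print(f"{name:10s} built {built:%Y-%m-%d %H:%M:%S} UTC")
```

The decoded values are in UTC, so they differ by a few hours from the local times the debugger printed, but the dates match: the megasas.sys in this dump was built in May 2007.
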


I will face Dell now in order to upgrade the drivers.

TBC

Perfect. Well, keep me updated on the progress...

/Fox
Update

We noticed yesterday what caused the BSOD on the server, or at least one cause. The BSOD occurs when a Macintosh user saves a Word document (not when using "Save As"). This user has Entourage 2004, but it may apply to 2008 as well.


Group Logic's answer:

Yes, this exactly matches another Stop x50 case I have where the BSOD occurs just after a Macintosh client saves a Word file.
The dump shows the last command we issue is an "exchange files" command.
This command simply swaps out the file currently being edited (temp file) - with the original file.

Microsoft did a dump analysis and confirmed the problem was caused by Microsoft's NTFS.sys driver.
http://support.microsoft.com/kb/957535

Note that the KB Article references a "Stop Error x24" where yours was a "Stop x50".
Microsoft confirmed that my other customer's "Stop x50 error" applies to this KB.

Dell may not be aware of this issue, so please share this email with them.
The hot fix isn't publicly available, so you must open a case with Microsoft in order to get access to the patch.

Best Regards,
Charles Kim
ExtremeZ-IP Support


I just received the hotfix and will apply it tonight.

TBC...

Perfect... well, I'll keep my fingers crossed. BSODs are always a pain in the rear to resolve, but it feels good when you do get to the bottom of it.

/Fox
Bad news!

Hotfix applied yesterday but the problem remains...

Will talk to Group Logic again today.

//Marcus
Marcus,

please do keep us posted

/Fox
Of course, Fox, I'll keep the thread updated until it's solved.

I've spoken to Dell now and they will look into the memory dump and the logs created by their own support tool.

I'm really looking forward to solving this....

//Marcus
After talking to Dell today I updated the BIOS (2.4.3) and the PERC6/I firmware (6.1.1-0047, A08). (I had overlooked this; the driver is OK though...)

We'll see tomorrow if that did the trick.

//Marcus


BIOS: http://tinyurl.com/5qnk9g
PERC6/I: http://tinyurl.com/5wdtso

Links above will probably not work forever...
Well that didn't help us...

I will continue tomorrow or next week. All Mac users have been told not to save Word documents, so the problem should not occur. Still, I would like to solve this; I don't feel like going back to Win2k3... (I don't feel like giving up either...)

This is the answer from Grouplogic.

I have several other cases who reported the problem was resolved after applying firmware updates and patches.
However, 2 cases remain open with Microsoft and Dell involved.

Both customers provided complete dumps which showed the BSOD occurs just after a Macintosh client saves a Word file.
The command we issue is an "exchange files" command (this is what you see in the crash log).
This common command simply swaps out the file currently being edited (temp file) - with the original file.

Microsoft's initial finding pointed to the NTFS.sys driver and they recommended running "chkdsk /r" and applying the following HF.
http://support.microsoft.com/kb/957535
However both customers reported the problem persists after applying the hot fix (neither was able to run a chkdsk).
The latest dumps are now under review by Microsoft and Dell.

We are recommending using 2003 until Microsoft comes up with an explanation and/or fix.

Best Regards,
Charles Kim
ExtremeZ-IP Support


//Marcus
Oh dear... well, I guess the problem has been identified (which is a good thing), or at least the source of the issue. I'm only sorry it couldn't be resolved more quickly.

Please keep us posted.

/Fox
Today I spoke to Dell again. They've got some people working on it.

They are puzzled, so we're not the only ones. They do see loads of network activity before the BSOD, and they have a firmware upgrade for the NIC:

http://support.dell.com/support/downloads/format.aspx?c=us&l=en&s=gen&deviceid=17555&libid=5&releaseid=R197246&vercnt=1&formatcnt=0&SystemID=PWE_2950&servicetag=&os=WLHS1&osl=en&catid=-1&impid=-1

And, although TOE is disabled in the BIOS, they have had cases where the TOE was interfering with internal traffic and causing problems... So I was told to remove it from the motherboard. (Honestly, I have no clue what it does or what it looks like; some kind of plastic with "TOE" on it...)

So I'll do this tomorrow morning, although this feels like chasing ghosts now... I really hope this thread will lead to something good for other users. The most sane thing to do would be to give up and use Windows 2003.

Not yet... :-)
Absolutely... keep pushing.

good luck

/Fox
Well, it looks like Grouplogic found something here. I've forwarded it to Dell; strange that they haven't noticed it.

Please share our analysis with Dell and tell them to feel free to contact us directly if they have any questions.


ExtremeZ-IP is NOT in the call stack of this dump.

ChildEBP RetAddr  Args to Child              
9cdf86e4 818a6b54 00000000 ca3a5000 00000000 nt!MmAccessFault+0x10a
9cdf86e4 81901693 00000000 ca3a5000 00000000 nt!KiTrap0E+0xdc (FPO: [0,0] TrapFrame @ 9cdf86fc)
9cdf8778 819f9061 ca3b413c ca39dc60 0000f7aa nt!memcpy+0x33
9cdf87b8 81a7990d 00000004 9cdf8800 9cdf87f8 nt!FsRtlNotifyUpdateBuffer+0x57
9cdf8854 82a1d91a 8657f748 86b73778 9cdf8938 nt!FsRtlNotifyFilterReportChange+0x560
9cdf88d4 82a76cc8 00000000 ca39d0f8 86b73498 Ntfs!NtfsReportDirNotify+0xa8 (FPO: [Non-Fpo])
9cdf8b28 82ab1fc7 87b130a8 87b16cc8 88175008 Ntfs!NtfsSetRenameInfo+0x14c7 (FPO: [Non-Fpo])
9cdf8ba8 82a1ead4 87b130a8 88175008 1e7b8725 Ntfs!NtfsCommonSetInformation+0x5bf (FPO: [Non-Fpo])
9cdf8c14 81907fd3 86b733e0 88175008 88175008 Ntfs!NtfsFsdSetInformation+0x104 (FPO: [Non-Fpo])
9cdf8c2c 81f8cba7 86b236e0 88175008 00000000 nt!IofCallDriver+0x63
9cdf8c50 81f8cd64 9cdf8c70 86b236e0 00000000 fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x251 (FPO: [Non-Fpo])
9cdf8c88 81907fd3 86b236e0 88175008 00000000 fltmgr!FltpDispatch+0xc2 (FPO: [Non-Fpo])
9cdf8ca0 81a411b8 62100758 0000582c 08e9f038 nt!IofCallDriver+0x63
9cdf8d48 818a3a1a 0000582c 08e9f080 86f20d78 nt!NtSetInformationFile+0x978
9cdf8d48 77839a94 0000582c 08e9f080 86f20d78 nt!KiFastCallEntry+0x12a (FPO: [0,3] TrapFrame @ 9cdf8d64)
WARNING: Frame IP not in any known module. Following frames may be wrong.
00640438 00000000 00000000 00000000 00000000 0x77839a94
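The claim that ExtremeZ-IP is not in the call stack can be checked by listing the modules that own the resolved frames. This is a minimal Python sketch over a few frames copied from the stack above; `modules_in_stack` is a hypothetical helper name, and the regular expression only understands the `module!symbol+0xNN` form shown here.

```python
import re

# A subset of the frames from the stack above (illustration only).
STACK_TEXT = """\
9cdf86e4 818a6b54 00000000 ca3a5000 00000000 nt!MmAccessFault+0x10a
9cdf8778 819f9061 ca3b413c ca39dc60 0000f7aa nt!memcpy+0x33
9cdf87b8 81a7990d 00000004 9cdf8800 9cdf87f8 nt!FsRtlNotifyUpdateBuffer+0x57
9cdf88d4 82a76cc8 00000000 ca39d0f8 86b73498 Ntfs!NtfsReportDirNotify+0xa8 (FPO: [Non-Fpo])
9cdf8b28 82ab1fc7 87b130a8 87b16cc8 88175008 Ntfs!NtfsSetRenameInfo+0x14c7 (FPO: [Non-Fpo])
9cdf8c50 81f8cd64 9cdf8c70 86b236e0 00000000 fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x251 (FPO: [Non-Fpo])
9cdf8d48 818a3a1a 0000582c 08e9f080 86f20d78 nt!NtSetInformationFile+0x978
00640438 00000000 00000000 00000000 00000000 0x77839a94
"""

# Matches "module!symbol+0xoffset"; unresolved frames such as 0x77839a94 do not match.
FRAME = re.compile(r"\b([A-Za-z_][\w.-]*)!(\w+)\+0x[0-9a-f]+")


def modules_in_stack(text):
    """Return module names in order of first appearance in the stack."""
    seen = []
    for module, _symbol in FRAME.findall(text):
        if module not in seen:
            seen.append(module)
    return seen


modules = modules_in_stack(STACK_TEXT)
print(modules)
```

Every resolved frame belongs to the kernel (`nt`), `Ntfs`, or the filter manager (`fltmgr`). The last frame is an unresolved user-mode return address, so this check says nothing about which user-mode code issued the system call; the dump names ExtremeZ-IP.exe as the process in whose context the fault happened.
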

It looks like you are running the LSI Megasas drivers - on windows 2008 server.
But they are listed as drivers for windows 2003 server, as well as being old.
"Perc" is Dell's modified LSI driver.


81f32000 81f3c000   megasas    (deferred)            
    Image path: \SystemRoot\system32\drivers\megasas.sys
    Image name: megasas.sys
    Timestamp:        Fri May 25 18:19:58 2007 (4657610E)
    CheckSum:         0000CAD7
    ImageSize:        0000A000
    Translations:     0000.04b0 0000.04e0 0409.04b0 0409.04e0

The re-branded LSI Megasas.sys drivers include this one:

93df6000 93e00000   dump_percsas   (deferred)            
    Image path: \SystemRoot\System32\Drivers\dump_percsas.sys
    Image name: dump_percsas.sys
    Timestamp:        Tue Jul 01 12:05:49 2008 (486A55DD)
    CheckSum:         0000BE4E
    ImageSize:        0000A000
    File version:     2.23.0.32
    Product version:  2.23.0.32
    File flags:       8 (Mask 3F) Private
    File OS:          40004 NT Win32
    File type:        3.7 Driver
    File date:        00000000.00000000
    Translations:     0409.04b0
    CompanyName:      LSI Corporation
    ProductName:      MEGASAS Storport Driver for Windows Server 2003 for x86
    InternalName:     percsas.sys
    OriginalFilename: percsas.sys
    ProductVersion:   2.23.0.32
    FileVersion:      2.23.0.32 built by: WinDDK
    FileDescription:  MEGASAS RAID Controller Driver for Windows Server 2003 for x86
    LegalCopyright:   Copyright © LSI Corporation        


Best Regards,
Charles Kim
ExtremeZ-IP Support
lol.. Charles Kim,

I have an open issue with him as well at the moment. Well, this does define the issue quite clearly as a RAID controller driver issue. Let's see what Dell come back with this time. It's like playing tennis with vendors sometimes.

/Fox
Bad news!

Dell didn't come up with any solution... Removing the TOE wasn't even possible, since the part didn't exist in my server. They say the megasas driver should work fine with 2008, and they didn't understand what Charles meant when he referred to it as an old driver.

And as I found out, they didn't open a case with MS, as I had thought. I'm not impressed by Dell in this case...

So maybe I'll open a case with MS or simply downgrade to 2003...

I'll keep posting...

//Marcus

Grouplogic have ordered a Dell server with our configuration in order to get more attention from MS and Dell. They have another customer with a similar problem, and that way they will be able to raise the problem with MS and Dell directly.

Of course it's impossible to get an ETA, but I hope it can be solved soon.

I put my trust in Charles Kim...

//Marcus
Well, I have everything crossed... I admire your perseverance though; I would have backed out to 2003 by now :0)

As always keep us updated.

/Fox
Well, not much concrete is happening...

Charles Kim from Grouplogic is getting a little bit further though. Here is his progress report:

"Microsoft has been able to reproduce in their lab and is investigating.
We suspect this is happening deep inside the NT Code which only they will be able to troubleshoot and fix (so there's no ETA I can provide as this will have to come from Microsoft)."

//Marcus
Thanks for the update

/Fox
Grader,

I received this via email this morning... thought it would make you smile.

Support for Services for Macintosh is Gone in Windows Server 2008

Let ExtremeZ-IP Be the Solution

If you have been relying on Services for Macintosh (SFM) built into your Windows Server for Mac to Windows file and print sharing, it is time to reevaluate your strategy. SFM has been completely removed from Windows Server 2008 and is no longer supported by Microsoft. No worries: all you need is ExtremeZ-IP File & Print Server. ExtremeZ-IP 5.3 is the first release to support Windows Server 2008, providing the most compatible file and print sharing experience for Mac clients on Microsoft's new operating system. Customers that rely on Services for Mac, Print Services for Mac, or AppleTalk and want to migrate to Windows 2008 now have a solution in ExtremeZ-IP. Read more about your SFM replacement options in this latest white paper from Group Logic, "The End of Services for Mac (SFM): Evaluating Your Replacement Options"

/Fox
Thanx for the laugh Fox...

//Marcus

ExtremeZ-IP 5.3 is completely compatible with Windows 2008. That does not mean that there are no issues with Windows 2008 Server itself. There is a bug in a Microsoft file system driver that can cause a BSOD on file renames. Because Microsoft Word on the Mac, as well as several other applications, uses temporary files, this bug is easily triggered with Macintosh clients. An FPExchangeFile is effectively a series of renames, deletes, and file creates (rename myfile.doc to myfile.doc.bak, rename myfile.doc.temp to myfile.doc, delete myfile.doc.bak, create myfile.doc.temp). Because this file exchange happens on every Word save, this fairly unusual condition in the kernel will still be triggered fairly frequently. Microsoft is aware of the problem and one would assume at some point they will release a fix for it.
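
To make the rename sequence described above concrete, here is a minimal Python sketch that performs the same four steps on ordinary files in a temporary directory. It is an illustration of the sequence only, not ExtremeZ-IP's actual code; `exchange_files` and the file contents are made up for the example.

```python
import os
import tempfile
from pathlib import Path


def exchange_files(original: Path, temp: Path) -> None:
    """Swap `temp` into the place of `original` using the four steps above.

    Illustration only; not ExtremeZ-IP's implementation.
    """
    backup = original.with_name(original.name + ".bak")
    os.rename(original, backup)  # myfile.doc      -> myfile.doc.bak
    os.rename(temp, original)    # myfile.doc.temp -> myfile.doc
    os.remove(backup)            # delete myfile.doc.bak
    temp.touch()                 # create a fresh myfile.doc.temp


with tempfile.TemporaryDirectory() as folder:
    doc = Path(folder) / "myfile.doc"
    tmp = Path(folder) / "myfile.doc.temp"
    doc.write_text("old contents")
    tmp.write_text("new contents")

    exchange_files(doc, tmp)

    result = doc.read_text()
    leftovers = sorted(p.name for p in Path(folder).iterdir())
    print(result, leftovers)
```

In the stack traces earlier in the thread the fault sits underneath `Ntfs!NtfsSetRenameInfo` and `Ntfs!NtfsReportDirNotify`, which is consistent with one of the rename steps being the trigger.
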

Geordie Korper, ExtremeZ-IP Product Manager
Geordie,

Thanks for the update; however, the issue remains that this should have been fully beta tested prior to release. Although I agree that this is an MS issue, the "bug" was not present before Grader installed the ExtremeZ-IP product. I can only urge you guys to push MS on this. ExtremeZ-IP is a good product, and experience has proven that it is stable in most environments. I can't help but feel this is another case of releasing a product to market too quickly because of market pressure.

//Fox
ExchangeFile testing is part of our standard release testing.  For ExchangeFile testing we have a custom test application (which along with several other of our custom testing tools is available from http://www.grouplogic.com/files/ez/testingtools/). We don't directly use Word for that particular test but my Quality Assurance team does do manual testing with Word and the Macintosh Business Unit has copies of ExtremeZ-IP in their labs.

Unfortunately this issue appears to be timing dependent and we never saw it in our test lab on our server. After we started receiving reports from the field I wrote an AppleScript that opened a Word document, added some text, saved the document, and then closed it. On our main QA test server they could run this hundreds of times without any problems.

Because the reports suggested the issue was occurring primarily on a specific vendor's hardware with a specific RAID controller, we went ahead and purchased one of those servers. With that server we could reproduce the issue with only a few repetitions of the test script. Later we were able to determine that on low-end commodity hardware or in virtual machines it can often be reproduced with only one or two saves. Once we were able to consistently reproduce the issue we opened a bug with Microsoft and eventually they were able to determine what the problem was.

I assure you that we are putting pressure on various parties and providing impact statements and the like. It isn't as simple as Microsoft changing one line of code in ntfs.sys and then shipping it. The amount of testing that is required to ship even a single component hotfix, particularly one in a filesystem driver, is much more than you would think. There is only so much I can say since we are under NDA with MS, but based upon public information one can come to the conclusion that the MS kernel development team might have other priorities right now:
http://blogs.technet.com/windowsserver/archive/2008/12/02/windows-server-2008-and-windows-vista-service-pack-2-sp2-customer-preview-program-cpp.aspx

     ...We are tracking to ship SP2 in the first half of 2009. We
      value your feedback, so please download the SP2 beta!
     
      Justin Graham Senior PM Windows Server

Believe me, I don't want my Support team having to take frantic calls from System Administrators if there is any way to avoid it. Everyone from the CEO down is aware of the issue and trying to figure out a way to mitigate the pain our customers are experiencing with this issue.  We are also actively pursuing workarounds. As I said it seems to be much more frequent on some hardware. If it is something like the disks being too slow for the CPU or vice versa we might be able to add delays somewhere to avoid the bug. Or we may be able to work around it by copying and deleting files instead of doing renames. All of this takes time to figure out and more importantly to test and could have performance repercussions. I promise you though we will keep working on it.
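
As a rough illustration of the copy-and-delete idea mentioned above, the following Python sketch replaces a document's contents without issuing any renames. This is purely hypothetical and is not the workaround Group Logic shipped; `exchange_by_copy` is a name invented here.

```python
import os
import shutil
import tempfile
from pathlib import Path


def exchange_by_copy(original: Path, temp: Path) -> None:
    """Put the contents of `temp` into `original` without renaming anything.

    Hypothetical sketch of the workaround idea; not Group Logic's code.
    """
    shutil.copyfile(temp, original)  # overwrite the original in place
    os.remove(temp)                  # delete the old temp file
    temp.touch()                     # recreate an empty temp file


with tempfile.TemporaryDirectory() as folder:
    doc = Path(folder) / "myfile.doc"
    tmp = Path(folder) / "myfile.doc.temp"
    doc.write_text("old contents")
    tmp.write_text("new contents")

    exchange_by_copy(doc, tmp)

    copied_result = doc.read_text()
    temp_size = tmp.stat().st_size
    print(copied_result, temp_size)
```

The trade-offs match the caution in the post: copying rewrites the whole file, so it costs more for large documents, and unlike a rename it is not atomic, so a failure part-way through could leave a partially written file.
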
Hi

Thank you all for the information. Of course, a hotfix isn't done in a day.

I'm fully aware that my problem is rather unusual. I've been a little unlucky.

The Mac people don't save Word documents on the server any more, so we get by. If it had been a big problem, we would have downgraded to W2K3.

Working as an admin you must be able to work around a problem. I could have done that, but since it's no longer really a big problem I want to solve this.

I will look into the SP2 beta.

Have a nice weekend!
ASKER CERTIFIED SOLUTION
Geordie_Korper

This solution is only available to members of Experts Exchange.
That's superb news. Grader, have you been able to test this yet?

/Fox
I've done the changes from Grouplogic.

I'm waiting for a moment to test this, hopefully this afternoon.

We'll see... :-)

Hi

We tested this today and the problem remains. I'm not sure if it's exactly the same problem, but the BSOD remains. (I wasn't around when they tested it.)

Opening a Word document on the ExtremeZ-IP volume and saving it causes a BSOD.

Memory dump here: http://85.227.21.8/memory.rar

//Marcus
You are the first person to report that the problem was not fixed for them when they upgraded. I think you have already done so but if not, please contact our Support department and they will try to figure out if you are having some other issue.
I misread the instructions from Grouplogic and edited the registry key (which then bypasses the fix from Grouplogic).

Now it works!!!

I'm not trying to be a drama queen, I'll blame it on the present workload... :-)

Many thanx to Fox and Charles Kim from Grouplogic, this was an interesting journey.

Ok to split points to you guys? (if I can't give you both 500)

//Marcus
Great work from Charles Kim at Grouplogic; he handled this case like a true professional. I can't say enough nice words about that guy.

I would also like to thank all the others who helped me solve this case, especially Fox.