Avatar of Edman555
Edman555

asked on 

Exchange 2010 - Disk IO errors causing database issues on both servers in DAG

We are in the process of migrating exchagne 2003 to 2010.  We purchased two new IBM x3650 m3 servers, installed Win Server 2008r2 enterprise, installed exchange 2010, configured DAG.  Everything is happy, the new servers are even sending out e-mail.
Added integration with Cisco Unity 7, everything still working fine.
I started migrating a handful of users, and saw the the process stopped halfway though a user.  I checked and the secondary database says failed and suspended.  Right click on it and say resume, and it re-synches.  Users again start moving from the old 2003 to the new 2010 servers.  This happens 3 or 4 times in the process of moving 10 users.  And then I notice that it's happening throughout the day as well, sometimes to the active database, and then it fails over to the backup.

IN event logs, I'm getting a ton of IO errors.  I open a case with IBM, they check it over and say there is no hardware issue.  I'm inclined to believe them because i'm getting errors on both servers.  How can this be?  These are some of the errors I"m getting in the application event log.

Log Name:      Application
Source:        MSExchange Mailbox Replication
Date:          5/16/2011 3:32:13 PM
Event ID:      1007
Task Category: Service
Level:         Warning
Keywords:      Classic
User:          N/A
Computer:      PGTEXCHANGE1.pgt.com
Description:
The Mailbox Replication service was unable to determine the set of active mailbox databases on a mailbox server.
Mailbox server: PGTEXCHANGE1.pgt.com
Error: MapiExceptionNetworkError: Unable to make admin interface connection to server. (hr=0x80040115, ec=-2147221227)
Diagnostic context:
    ......
    Lid: 10648   dwParam: 0x6D9      Msg: EEInfo: Generating component: 2
    Lid: 14744   dwParam: 0x6D9      Msg: EEInfo: Status: 1753
    Lid: 9624    dwParam: 0x6D9      Msg: EEInfo: Detection location: 501
    Lid: 13720   dwParam: 0x6D9      Msg: EEInfo: Flags: 0
    Lid: 11672   dwParam: 0x6D9      Msg: EEInfo: NumberOfParameters: 4
    Lid: 8856    dwParam: 0x6D9      Msg: EEInfo: prm[0]: Unicode string: ncacn_ip_tcp
    Lid: 8856    dwParam: 0x6D9      Msg: EEInfo: prm[1]: Unicode string: PGTEXCHANGE1.pgt.com
    Lid: 12952   dwParam: 0x6D9      Msg: EEInfo: prm[2]: Long val: -1988875570
    Lid: 12952   dwParam: 0x6D9      Msg: EEInfo: prm[3]: Long val: 382312662
    Lid: 24060   StoreEc: 0x80040115
    Lid: 23746  
    Lid: 31938   StoreEc: 0x80040115
    Lid: 19650  
    Lid: 27842   StoreEc: 0x80040115
    Lid: 20866  
    Lid: 29058   StoreEc: 0x80040115
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="MSExchange Mailbox Replication" />
    <EventID Qualifiers="32772">1007</EventID>
    <Level>3</Level>
    <Task>1</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2011-05-16T19:32:13.000000000Z" />
    <EventRecordID>2987756</EventRecordID>
    <Channel>Application</Channel>
    <Computer>PGTEXCHANGE1.pgt.com</Computer>
    <Security />
  </System>
  <EventData>
    <Data>PGTEXCHANGE1.pgt.com</Data>
    <Data>MapiExceptionNetworkError: Unable to make admin interface connection to server. (hr=0x80040115, ec=-2147221227)
Diagnostic context:
    ......
    Lid: 10648   dwParam: 0x6D9      Msg: EEInfo: Generating component: 2
    Lid: 14744   dwParam: 0x6D9      Msg: EEInfo: Status: 1753
    Lid: 9624    dwParam: 0x6D9      Msg: EEInfo: Detection location: 501
    Lid: 13720   dwParam: 0x6D9      Msg: EEInfo: Flags: 0
    Lid: 11672   dwParam: 0x6D9      Msg: EEInfo: NumberOfParameters: 4
    Lid: 8856    dwParam: 0x6D9      Msg: EEInfo: prm[0]: Unicode string: ncacn_ip_tcp
    Lid: 8856    dwParam: 0x6D9      Msg: EEInfo: prm[1]: Unicode string: PGTEXCHANGE1.pgt.com
    Lid: 12952   dwParam: 0x6D9      Msg: EEInfo: prm[2]: Long val: -1988875570
    Lid: 12952   dwParam: 0x6D9      Msg: EEInfo: prm[3]: Long val: 382312662
    Lid: 24060   StoreEc: 0x80040115
    Lid: 23746  
    Lid: 31938   StoreEc: 0x80040115
    Lid: 19650  
    Lid: 27842   StoreEc: 0x80040115
    Lid: 20866  
    Lid: 29058   StoreEc: 0x80040115</Data>
  </EventData>
</Event>


Log Name:      Application
Source:        MSExchangeRepl
Date:          5/16/2011 3:02:40 PM
Event ID:      4082
Task Category: Service
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      PGTEXCHANGE1.pgt.com
Description:
The replication network manager encountered an error while monitoring events. Error: Microsoft.Exchange.Cluster.Replay.AmClusterApiException: An Active Manager operation failed. Error: An error occurred while attempting a cluster operation. Error: Cluster API '"OpenCluster(PGTEXCHANGE1.pgt.com) failed with 0x6d9. Error: There are no more endpoints available from the endpoint mapper"' failed.. ---> System.ComponentModel.Win32Exception: There are no more endpoints available from the endpoint mapper
   --- End of inner exception stack trace ---
   at Microsoft.Exchange.Cluster.Replay.NetworkManager.DriveMapRefresh()
   at Microsoft.Exchange.Cluster.Replay.NetworkManager.TryDriveMapRefresh()
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="MSExchangeRepl" />
    <EventID Qualifiers="49156">4082</EventID>
    <Level>2</Level>
    <Task>1</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2011-05-16T19:02:40.000000000Z" />
    <EventRecordID>2987707</EventRecordID>
    <Channel>Application</Channel>
    <Computer>PGTEXCHANGE1.pgt.com</Computer>
    <Security />
  </System>
  <EventData>
    <Data>Microsoft.Exchange.Cluster.Replay.AmClusterApiException: An Active Manager operation failed. Error: An error occurred while attempting a cluster operation. Error: Cluster API '"OpenCluster(PGTEXCHANGE1.pgt.com) failed with 0x6d9. Error: There are no more endpoints available from the endpoint mapper"' failed.. ---&gt; System.ComponentModel.Win32Exception: There are no more endpoints available from the endpoint mapper
   --- End of inner exception stack trace ---
   at Microsoft.Exchange.Cluster.Replay.NetworkManager.DriveMapRefresh()
   at Microsoft.Exchange.Cluster.Replay.NetworkManager.TryDriveMapRefresh()</Data>
  </EventData>
</Event>


Log Name:      Application
Source:        Application Error
Date:          5/16/2011 3:02:36 PM
Event ID:      1005
Task Category: (100)
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      PGTEXCHANGE1.pgt.com
Description:
Windows cannot access the file C:\Windows\Cluster\clussvc.exe for one of the following reasons: there is a problem with the network connection, the disk that the file is stored on, or the storage drivers installed on this computer; or the disk is missing. Windows closed the program Microsoft Failover Cluster Service because of this error.

Program: Microsoft Failover Cluster Service
File: C:\Windows\Cluster\clussvc.exe

The error value is listed in the Additional Data section.
User Action
1. Open the file again. This situation might be a temporary problem that corrects itself when the program runs again.
2. If the file still cannot be accessed and
      - It is on the network, your network administrator should verify that there is not a problem with the network and that the server can be contacted.
      - It is on a removable disk, for example, a floppy disk or CD-ROM, verify that the disk is fully inserted into the computer.
3. Check and repair the file system by running CHKDSK. To run CHKDSK, click Start, click Run, type CMD, and then click OK. At the command prompt, type CHKDSK /F, and then press ENTER.
4. If the problem persists, restore the file from a backup copy.
5. Determine whether other files on the same disk can be opened. If not, the disk might be damaged. If it is a hard disk, contact your administrator or computer hardware vendor for further assistance.

Additional Data
Error value: C0000185
Disk type: 3
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Application Error" />
    <EventID Qualifiers="49152">1005</EventID>
    <Level>2</Level>
    <Task>100</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2011-05-16T19:02:36.000000000Z" />
    <EventRecordID>2987696</EventRecordID>
    <Channel>Application</Channel>
    <Computer>PGTEXCHANGE1.pgt.com</Computer>
    <Security />
  </System>
  <EventData>
    <Data>C:\Windows\Cluster\clussvc.exe</Data>
    <Data>Microsoft Failover Cluster Service</Data>
    <Data>C0000185</Data>
    <Data>3</Data>
  </EventData>
</Event>

Log Name:      Application
Source:        ESE
Date:          5/16/2011 10:33:07 AM
Event ID:      482
Task Category: General
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      PGTEXCHANGE1.pgt.com
Description:
edgetransport (804) Transport Mail Database: An attempt to write to the file "C:\Program Files\Microsoft\Exchange Server\V14\TransportRoles\data\Queue\trn.log" at offset 3878400 (0x00000000003b2e00) for 512 (0x00000200) bytes failed after 0 seconds with system error 1117 (0x0000045d): "The request could not be performed because of an I/O device error. ".  The write operation will fail with error -1022 (0xfffffc02).  If this error persists then the file may be damaged and may need to be restored from a previous backup.

For more information, click http://www.microsoft.com/contentredirect.asp.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="ESE" />
    <EventID Qualifiers="0">482</EventID>
    <Level>2</Level>
    <Task>1</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2011-05-16T14:33:07.000000000Z" />
    <EventRecordID>2987335</EventRecordID>
    <Channel>Application</Channel>
    <Computer>PGTEXCHANGE1.pgt.com</Computer>
    <Security />
  </System>
  <EventData>
    <Data>edgetransport</Data>
    <Data>804</Data>
    <Data>Transport Mail Database: </Data>
    <Data>C:\Program Files\Microsoft\Exchange Server\V14\TransportRoles\data\Queue\trn.log</Data>
    <Data>3878400 (0x00000000003b2e00)</Data>
    <Data>512 (0x00000200)</Data>
    <Data>-1022 (0xfffffc02)</Data>
    <Data>1117 (0x0000045d)</Data>
    <Data>The request could not be performed because of an I/O device error. </Data>
    <Data>0</Data>
  </EventData>
</Event>
ExchangeServer Hardware

Avatar of undefined
Last Comment
amitwani

8/22/2022 - Mon