

Domino 8.5.3 running on Dell servers attached to RDM IBM SAN with disk problems

Posted on 2013-01-10
Medium Priority
Last Modified: 2016-11-23
Our problem is that our Domino databases keep becoming corrupt. Not all of them, just a few at a time. Domino will report that it cannot read the database, and then in the file system we can see that the DB is 0 bytes.

This all started about 3 weeks ago; the first corrupt DB showed up after we upgraded Backup Exec (BE) to the latest version, 12.5. We've worked a lot with Symantec and haven't gotten anywhere. We've removed the remote agent from the Domino server and corruption still occurs.

We were running Domino Defrag but we've since disabled that.

We've sent SAN diagnostics to IBM and they didn't say anything was wrong.

The Domino server runs on a Windows Server 2008 R2 VM with the operating system on C:; D: is a Raw Device Mapping (RDM) to another, dedicated SAN array.

This was working perfectly for 6 months or so in this setup without a hitch...

We are perplexed as to what can cause this, and we aren't getting anywhere with our vendors.

I'm looking for advice on what to investigate and whether anyone else is running Domino in this way successfully.

FYI, the last reboot of the virtual Domino server triggered Check Disk, which reported finding a lot of empty space that was marked as allocated. I'm guessing those are the DBs disappearing.
Question by:ITDharam
LVL 46

Expert Comment

by:Sjef Bosman
ID: 38763760
What types of databases are corrupted? Is it always the same database? Is it only mail databases, or also application databases?

Could it be that someone used Notes to access the databases directly, i.e. bypassing the server?

Author Comment

ID: 38764255
We haven't been able to find a pattern to the corruption, and I should clarify that it isn't just NSF files: some NTF files and full-text indexes also become corrupt.

It isn't always the same database, although one database has become corrupted more than once.

For now we've just been deleting the file remnant, and then pulling down a fresh replica from another cluster member and it doesn't appear that we've ever lost anything at this point.

We have never been in the habit of opening files on the server file system through anything other than Domino Administrator, so no, I think it is safe to say that nobody is opening the files directly.

One oddity, and I haven't been able to get a 100% satisfactory answer on whether it is a problem or not. Server1 used to be a physical server whose D: drive was a dedicated RAID 10 array on our IBM MD3400 SAN. When we decided to virtualize the mail server, we set up a new OS install with Domino, repointed the existing LUN to our VMware cluster, and attached the disk to that VM as an RDM. It worked fine for several months. Symantec then pointed something out but did not say it is actually a problem; in fact, they said it is not a problem for Symantec BE. OK, getting to the point: what they pointed out is that the Domino server's LUN was (and still is) visible to our Backup Exec server. It shows up in Disk Management, but it isn't initialized and has no drive letter assigned. The reason is that in a SAN/BE/VMware environment, the BE server is supposed to have access to the VMware LUNs for SAN backups, and because of the grouping limitations, BE also sees the Domino LUN. I'm just throwing this out there because we're pretty much at a loss.

Thanks for the response.
LVL 46

Expert Comment

by:Sjef Bosman
ID: 38764768
Hope you find a working solution!

Author Comment

ID: 38765273
Ha ha, thanks for the help.  I knew it was a long shot.  I'll keep this open for a bit and see if anyone else responds otherwise you get the points for taking a moment out of your day.
LVL 46

Expert Comment

by:Sjef Bosman
ID: 38765313
Hehe :-)  Can you move back some steps? They all say that you're not supposed to do what you did, but apparently it worked for you. I can imagine you're reluctant to downgrade from 12.5, but it could be a good test in order to prove it is or it isn't the BE version you now use. A lot of work, you say... Yep, sorry...
LVL 15

Assisted Solution

akhafaf earned 600 total points
ID: 38767161
Hi there ITDharam,

You mentioned:
>>>Domino will report that it cannot read the database, and then in the filesystem we can see that the DB is 0 bytes<<<  Let me ask the following:
- Does this problem happen daily or only sometimes? If it happens daily, does it take place at a certain time of day, e.g. at 10:00 AM every day?
- What do you see in the Domino log files when you attempt to access these corrupted databases?
- Did you run the maintenance commands (fixup, updall and compact) on these databases? Do you have these commands scheduled? (On the Configuration tab, go to Programs and configure them.) Then check what happens.

Best Wishes

Author Comment

ID: 38779021
Gentlemen, sorry for the delay; I have multiple clients and I don't get to visit this problem daily.

sjef_bosman, I'd appreciate it if you could expand on the part where you say "They all say you aren't supposed to do what you did..." I'm sure that applies to a lot of things I've done, but can you clarify what you're referring to? Also, we removed the BE agent from this particular Domino server and we're now backing up from one of our other Domino servers, but this problem continues, so I don't think it is a Symantec problem...

akhafaf, here are your answers:
The problem does not occur daily, this first started 1 month ago, and since then we'll find DBs unavailable on a somewhat random basis, we may go up to 3 days without any files becoming corrupt, and then we might have 8 in a day.  We've also noticed that the corruption will happen on the same file multiple times and so far hasn't touched many others.
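To test the "somewhat random" observation (and akhafaf's time-of-day question) against the log itself, a minimal Python sketch like the one below could bucket corruption messages by hour and by day. It assumes each log line starts with a timestamp shaped like the ones in the messages quoted in this reply (MM/DD/YYYY hh:mm:ss AM/PM); the keyword list is a hypothetical starting point, not an exhaustive list of Domino corruption messages.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line shape, based on the messages quoted in this thread:
#   "01/13/2013 03:19:20 PM  <message text>"
LINE_RE = re.compile(r"^(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)\s+(.*)$")

# Hypothetical keyword list (lowercase); extend it with whatever your server logs.
KEYWORDS = ("is damaged", "cannot allocate space", "invalid on disk structure")


def corruption_events(lines):
    """Yield (timestamp, message) for log lines that look like corruption errors."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip continuation lines and anything without a timestamp
        stamp, message = match.groups()
        if any(key in message.lower() for key in KEYWORDS):
            yield datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p"), message


def tally_by_hour(lines):
    """Count corruption events per hour of day (0-23)."""
    return Counter(ts.hour for ts, _ in corruption_events(lines))


def tally_by_day(lines):
    """Count corruption events per calendar date."""
    return Counter(ts.date() for ts, _ in corruption_events(lines))
```

Each function consumes its input once, so open the exported log separately for each tally. A flat spread across hours would support the "random" observation, while a spike at one hour might point at a scheduled job running on or against the server.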

We'll get a series of messages
01/13/2013 03:19:20 PM  Warning: Fixup purged corrupt document UNID (534F80F6:425C50A3:87257A41:007D09EA) from D:\Lotus\Domino\data\mail\jgunn.nsf
01/13/2013 03:19:20 PM  Document NT001B92E6 in database D:\Lotus\Domino\data\mail\jgunn.nsf is damaged: This database cannot be read due to an invalid on disk structure
01/13/2013 03:19:20 PM  Document (UNID OF534F80F6:425C50A3-ON87257A41:007D09EA) in database D:\Lotus\Domino\data\mail\jgunn.nsf has been deleted

And another message that says "cannot allocate space"

At this point the file will show as 0 KB, and these messages will repeat if you attempt to open it.
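Since the damaged files end up at 0 bytes, a periodic scan of the data directory could catch them before users do. This is a minimal sketch, assuming Python can read the data directory; it only checks .nsf and .ntf files (full-text indexes are not covered), and the D:\Lotus\Domino\data path is taken from the log messages above.

```python
from pathlib import Path

# Only Notes databases (.nsf) and templates (.ntf) are checked; full-text
# index files are not covered by this sketch.
WATCH_SUFFIXES = {".nsf", ".ntf"}


def zero_byte_databases(data_dir):
    """Return sorted paths of .nsf/.ntf files under data_dir whose size is 0 bytes."""
    hits = []
    for path in Path(data_dir).rglob("*"):
        # Case-insensitive suffix check, since Windows file names are case-insensitive.
        if path.is_file() and path.suffix.lower() in WATCH_SUFFIXES:
            if path.stat().st_size == 0:
                hits.append(path)
    return sorted(hits)
```

For example, `for p in zero_byte_databases(r"D:\Lotus\Domino\data"): print(p)` would list candidates for the dbcache flush / delete / new replica routine.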

Compact was scheduled daily, and Defrag was set as a scheduled program but was disabled.  The primary domino admin believes Defrag was making the problem worse.

I've been told that in some cases running fixup -f will fix access but only at a certain stage and I don't have details on that, I'm told this tends to be a temporary fix.

What we end up having to do is run dbcache flush, delete the file from Domino Administrator, and then create an accelerated replica, and we're back up and running.

Here is something else interesting: we're getting found.000, found.001 folders in the root of the data drive. I always thought these were associated with disk corruption, so we contacted IBM and sent them the logs of our SAN. They said that while they found one disk that was reporting 'predictive failure warnings', they didn't see anything that could cause problems. We replaced the drive anyway and still encountered problems. We sent the logs again, and this time they said one of the HBAs in the server isn't connected. We determined that the HBA was bad and that it was only connected via one SAN path, which happens to be its default path, so I'm not sure the HBA was failing on an active connection; if that had happened, I'd imagine a few corrupt DBs would be a blessing.
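Check Disk puts the fragments it recovers into FOUND.000, FOUND.001, ... folders at the volume root (typically as FILEnnnn.CHK files), so tracking how much lands there after each run could be one rough gauge of whether the on-disk damage is growing. A minimal sketch, assuming the root of the data volume is readable from Python:

```python
import re
from pathlib import Path

# Check Disk names its recovery folders FOUND.000, FOUND.001, ... at the volume root.
FOUND_RE = re.compile(r"^found\.\d{3}$", re.IGNORECASE)


def chkdsk_found_dirs(volume_root):
    """Return (folder name, file count, total bytes) for each FOUND.nnn folder."""
    results = []
    for entry in sorted(Path(volume_root).iterdir()):
        if entry.is_dir() and FOUND_RE.match(entry.name):
            files = [f for f in entry.rglob("*") if f.is_file()]
            total = sum(f.stat().st_size for f in files)
            results.append((entry.name, len(files), total))
    return results
```

For example, `print(chkdsk_found_dirs("D:\\"))` after each reboot would show whether new FOUND folders or recovered bytes keep appearing.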

Thanks for the response, I hope to hear back on some brilliant ideas!
LVL 46

Assisted Solution

by:Sjef Bosman
Sjef Bosman earned 700 total points
ID: 38779421
About the non-standard thing: that's what I read in your last paragraph here, the way you virtualised Domino. Hence my remark: if it worked, and then someone advised you to modify your configuration, you could revert to an earlier configuration that worked.

Are you backing up from Domino server B, and the file corruption occurs on A? Did you remove Symantec from server A, including the Extension Manager inserts in notes.ini ?

Author Comment

ID: 38779480
Well, I wasn't specifically told it was a problem.  I asked Symantec and they said that it is specifically NOT a problem for BE.  I haven't found anything worthwhile, besides my intuition, that says this is a problem.

The setup was working for approx 6 months with no errors in this configuration, and then we updated from 12 to 12.5 and 2 days later this corruption started.  We've removed BE Remote agent, and I've just confirmed there is nothing in the notes.ini referring to Symantec or Extension Manager so in effect we have reverted to a prior configuration that was known to be working.

I spoke with the Domino admin, and she says that as this problem continues, it is becoming apparent that the corruption does recur on the same DBs; however, there is no apparent reason for it or connection between those DBs. And it is still the case that one DB will corrupt, and then 5 more will corrupt the next day, that type of thing...
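To make the "same DBs keep corrupting" observation concrete, a sketch like this could rank databases by the number of distinct days on which the log reported them as damaged. Counting days rather than lines matters because the "is damaged" message quoted above names a single document, so one incident can produce many lines. It assumes the log wording matches the line quoted earlier and that database paths contain no spaces.

```python
import re
from collections import defaultdict

# Assumed wording, taken from the log line quoted earlier in this thread:
#   "01/13/2013 03:19:20 PM  Document NT001B92E6 in database
#    D:\Lotus\Domino\data\mail\jgunn.nsf is damaged: ..."
# Assumes database paths contain no spaces.
DAMAGED_RE = re.compile(
    r"^(\d{1,2}/\d{1,2}/\d{4}) .*?in database (\S+\.n[st]f) is damaged",
    re.IGNORECASE,
)


def repeat_offenders(lines, min_days=2):
    """Rank databases by the number of distinct days they were logged as damaged."""
    days_seen = defaultdict(set)
    for line in lines:
        match = DAMAGED_RE.match(line.strip())
        if match:
            day, path = match.groups()
            days_seen[path.lower()].add(day)  # one entry per day, however many documents
    ranked = sorted(
        ((path, len(days)) for path, days in days_seen.items()),
        key=lambda item: (-item[1], item[0]),
    )
    return [(path, n) for path, n in ranked if n >= min_days]
```

If the repeat offenders turn out to share something (size, template, location on the volume), that would be a lead worth following; if not, it at least documents the pattern for the vendors.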


Accepted Solution

Andrew_Luder earned 700 total points
ID: 38797501
This may help. I found the Windows 2008 Microsoft hotfix discussed below fixed my Domino 8.5.3 "cannot allocate space" and "insufficient memory" issues  with large and/or heavily fragmented databases.

DominoDefrag - News: How fragmentation on incorrectly formatted NTFS volumes affects Domino!!Projects%5Cpmt.nsf&documentId=027517F9D756864D86257A670069EC1E&action=openDocument

Also make sure Domino data and temp directories are excluded from any Windows file level anti-virus scanning (which you've covered I think)

Author Closing Comment

ID: 38803582
Andrew_Luder, thanks for that, this may be the most relevant direction to take, but we'll never know.

After nearly a month of fairly regular corruptions I did go ahead and rebuild the server from scratch and the corruption hasn't happened since (about 4 days now but we're pretty hopeful)

I took the RDM volume from the original VM, wiped it, formatted it with VMFS, and built the new VM on that same storage (10 disks in a RAID 10 array). So the other suggestion, that allocating an RDM to a virtual machine while the BE server can still see that LUN is the cause of this problem, is high on my list as well.

Thanks for the help gentlemen, until next time...
