Mongo restore fails on large collection's index creation

I have a production mongo 2.6 DB running across 9 Linux instances in AWS. It is replicated and sharded across 2 regions, and all disks are in RAID10. I'm working to restore the prod DB into a staging environment that has replication turned on but sharding not configured yet. The dump is 103GB, 91GB of which resides in a single collection.

Attempting to restore the entire DB in stage fails reliably when the restore reaches the creation of indexes on the large collection. The error is not very informative, and unfortunately this is all I get even with log verbosity turned up:

Socket recv() conn closed? 
SocketException: remote: error: 9001 socket exception [CLOSED] server [] 
DBClientCursor::init call() failed 
User Assertion: 10276:DBClientBase::findN: transport error: ns: c_knowledgebase.$cmd query: { getlasterror: 1 } assertion: 10276 DBClientBase::findN: transport error: ns: c_knowledgebase.$cmd query: { getlasterror: 1 }

The nature of the error made me suspect the restore exhausted a resource. To work around the issue, I tried various permutations of the following:

1. using beefy EC2 instances on SSDs
2. reviewing common mongo Linux tuning suggestions (overcommit and ulimit settings)
3. restoring the DB in a phased approach (restoring the large collection as a separate step and creating the indexes manually one at a time)

If I do #1 and #2 together I'm able to get the entire DB to restore. This doesn't solve my problem, however, as I need to understand what the root problem is, and more importantly I need a consistent DB restore procedure that runs on hardware matching prod 1:1. I'm hoping someone can offer some insight into why a restore fails under these conditions.
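
For reference, the phased approach in #3 can be sketched roughly as follows. This is only an illustration, not the exact procedure I ran: it assumes the large collection was first restored with mongorestore's `--noIndexRestore` option, and that the dump contains the usual `<collection>.metadata.json` file listing the index definitions. The function name `index_build_commands`, the collection name `big_collection`, and the sample index are hypothetical. It turns the metadata into one mongo shell `ensureIndex` statement per index so they can be run one at a time.

```python
import json


def index_build_commands(metadata_text, collection, background=True):
    """Return one mongo shell ensureIndex statement per secondary index.

    metadata_text is the content of a mongodump <collection>.metadata.json
    file. The _id index is skipped because mongod creates it automatically.
    """
    meta = json.loads(metadata_text)
    commands = []
    for spec in meta.get("indexes", []):
        if spec.get("name") == "_id_":
            continue
        # Keep index options such as name/unique/sparse; drop fields that
        # are not options (key pattern, namespace, index version).
        options = {k: v for k, v in spec.items() if k not in ("key", "ns", "v")}
        if background:
            # Assumption: a background build is acceptable in staging.
            options["background"] = True
        commands.append(
            "db.getCollection(%s).ensureIndex(%s, %s)"
            % (
                json.dumps(collection),
                json.dumps(spec["key"]),  # key order is preserved
                json.dumps(options, sort_keys=True),
            )
        )
    return commands


# Hypothetical metadata, shaped like what mongodump writes.
sample = json.dumps({
    "options": {},
    "indexes": [
        {"v": 1, "key": {"_id": 1}, "name": "_id_", "ns": "c_knowledgebase.big_collection"},
        {"v": 1, "key": {"a": 1, "b": -1}, "name": "a_1_b_-1", "ns": "c_knowledgebase.big_collection"},
    ],
})

for statement in index_build_commands(sample, "big_collection"):
    print(statement)
```

Each printed statement can be pasted into the mongo shell on its own, which makes it easier to see which index build (if any) triggers the socket error.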

Dan Craciun (IT Consultant) commented:
From my 8 months of experience with MongoDB 2.6, I can tell you that it's a bit like a black box: when it works it's OK, when it does not, you usually have no idea why.

The people behind MongoDB make their living by selling support subscriptions, so the error messages are not very informative.

You could try the user mailing list, but don't get your hopes up too high.

machrisod (Author) commented:
Thanks for the reply. You're using 2.6 as well? Any reason you haven't gone to 3.0 yet? Part of the reason I need a good backup strategy is so that I can lift the hardware and then the version to 3.0. I'm experienced with other DBs, but relatively new to Mongo.
Dan Craciun (IT Consultant) commented:
I've monitored the user list and WiredTiger seems to have problems. Still too new for production.

I've tried it on a test system and you can do an in-place upgrade to 3. But I would not rush into it. If you don't want to switch to WiredTiger, I see no reason to upgrade.

My impression is that after losing Viber, the management pushed the developers to make a general DBMS out of Mongo. The result was backwards incompatibility with 2.4, somewhat chaotic driver versions (at least for C and C++), and a lack of developer presence on both the list and the IRC channel.

The only large web app that I know of that runs on MongoDB is running on 2.4, with an express prohibition on upgrades.

My opinion is that if you run MongoDB for any mission-critical app you need to pay for support. Community support is lacking at best.
machrisod (Author) commented:
I see what you're saying. I've heard others warn about the mongo support model as well. I'm finding community support to be a little lacking too. For my current problem, at least, I was able to find a workaround: pre-seeding the nodes with a filesystem copy of the DB and then running the restore on top of that DB for all collections except the big one. Luckily the big collection changes so infrequently that leaving it as restored from the filesystem is OK. At any rate, I think the backup strategy I've inherited is wrong in general. Dump and restore in mongo is no different than in relational DB land: great for portability, but horrible for large DBs. Dump and restore might be part of the overall strategy going forward, but it's simply not going to cut it given the size we've grown to.

Thanks again for the reply.
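
In case it helps anyone else, the "restore everything except the big collection" step can be sketched like this. It is a rough sketch under assumptions rather than my exact script: it assumes the standard mongodump layout of one `<collection>.bson` file per collection inside the database's dump directory, and it restores collection by collection with `--db` and `--collection`. The function name `restore_commands`, the default host, and the choice to skip `system.*` files are all hypothetical and should be adjusted to your setup.

```python
import os


def restore_commands(dump_dir, db_name, skip, host="localhost:27017"):
    """Build one mongorestore argument list per collection in dump_dir.

    Collections named in `skip` (for example the large, pre-seeded one)
    are left out, as are system.* files (an assumption of this sketch).
    """
    commands = []
    for filename in sorted(os.listdir(dump_dir)):
        if not filename.endswith(".bson"):
            continue  # ignore *.metadata.json and anything else
        collection = filename[: -len(".bson")]
        if collection in skip or collection.startswith("system."):
            continue
        commands.append([
            "mongorestore",
            "--host", host,
            "--db", db_name,
            "--collection", collection,
            os.path.join(dump_dir, filename),
        ])
    return commands
```

Each returned list can be passed to `subprocess.check_call`, so a failure on one collection stops the run at a known point instead of somewhere inside a 103GB restore.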
Dan Craciun (IT Consultant) commented:
Again, the backup problem is the intended behavior.
If you need a more robust backup solution, then pay for support and you get access to Cloud Manager Backup with all the bells and whistles.
Or get Ops Manager (with an Enterprise Advanced subscription) and run your own backup server.

MongoDB ended the enthusiastic phase and entered the "make money" phase. We'll see if they manage to make it an enterprise tool or if the project will slowly die.
Dan Craciun (IT Consultant) commented:
To see the direction MongoDB is going, see this:

Looks like any new and useful feature will be enterprise only...