Best solution to sync HUGE QTY of files across slow WAN?

Hello creative experts!

I have 1.7 million files (4 TB) in one of our remote offices.  

We presently use Vice-Versa to replicate the remote office data to headquarters.

Vice-versa is installed on a server here in HQ.

The initial "comparing source vs. destination" part of the run takes 21 hours over our 50 Mbps WAN connection.

The actual file copy of changed files typically takes about 3 hours.

I'm guessing it's so slow because it's holding a chatty conversation across a slow WAN connection to determine which files were added and deleted. [This is a total guess, as I don't know how the software is written.]
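A rough calculation with the numbers from this post is consistent with that guess. The sketch below only does arithmetic on the figures quoted above; the link's actual round-trip latency is not stated anywhere in the thread, so reading the per-file figure as "about a round trip or two per file" is an assumption.

```python
# Back-of-the-envelope numbers taken from the post above.
FILES = 1_700_000      # files in the remote office
COMPARE_HOURS = 21     # duration of the compare phase
COPY_HOURS = 3         # duration of the copy phase
LINK_MBPS = 50         # WAN speed in megabits per second


def ms_per_file(files: int, hours: float) -> float:
    """Average wall-clock time spent per file, in milliseconds."""
    return hours * 3600 * 1000 / files


def max_gb_transferred(mbps: float, hours: float) -> float:
    """Upper bound on data moved at full line rate, in gigabytes (10^9 bytes)."""
    bytes_per_second = mbps * 1_000_000 / 8
    return bytes_per_second * hours * 3600 / 1e9


if __name__ == "__main__":
    print(f"{ms_per_file(FILES, COMPARE_HOURS):.1f} ms per file during compare")
    print(f"{max_gb_transferred(LINK_MBPS, COPY_HOURS):.1f} GB at most during copy")
```

The compare phase works out to roughly 44.5 ms per file, and the 3-hour copy phase can move at most about 67.5 GB, so the bulk of each run goes to deciding what to copy rather than to copying.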

What's the best solution?

Is there a solution that perhaps has agent software running on the other side such that each side determines local changes and THEN compares notes?
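To make the "each side scans locally, then compares notes" idea concrete, here is a minimal sketch. It assumes each side can run a small script that builds a manifest (relative path mapped to size and modification time) and that one manifest is then shipped across the WAN as a single file. The function names `build_manifest` and `compare_manifests` are hypothetical and do not come from Vice-Versa or any other product.

```python
import os


def build_manifest(root: str) -> dict[str, tuple[int, int]]:
    """Map each relative file path under root to (size, mtime in whole seconds)."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            # Normalise separators so manifests from different OSes compare equal.
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            manifest[rel] = (st.st_size, int(st.st_mtime))
    return manifest


def compare_manifests(source, dest):
    """Return (to_copy, to_delete) given source and destination manifests."""
    # Copy anything missing from the destination or whose size/mtime differs.
    to_copy = sorted(p for p, meta in source.items() if dest.get(p) != meta)
    # Delete anything the destination has that the source no longer has.
    to_delete = sorted(p for p in dest if p not in source)
    return to_copy, to_delete
```

With this split, the expensive directory walk happens on local disk at both ends, and only the manifest and the changed files cross the slow link.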

Is there a solution that works something like OneDrive? (For example: there's no day-long process evaluating local vs. cloud before figuring out what to sync. I presume that if you delete a local folder, the agent sends the path of what to delete to the cloud and it's deleted. Likewise, if a file gets added locally, just that file gets uploaded.)

As I type this: I wonder if there's a Microsoft solution which could leverage our E3 Office 365 subscription for our 400 users?

We started getting quotes for cloud backup and it was surprisingly expensive: maybe $90,000 per year for 8TB  - that's super approximate but gives me an order of magnitude.

David Favor (Linux/LXD/WordPress/Hosting Savant) commented:
Seems like the problem relates to Vice-Versa, which... fails to implement sensible file comparison algorithms.

For example, rsync (standard everywhere) first checks file metadata such as timestamps and sizes, and only syncs files which have changed... then rsync only sends the parts of those files which have changed, rather than the entire file.
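The "only syncs parts of files" idea can be illustrated with a deliberately simplified sketch. This is not rsync's actual algorithm, which uses a rolling checksum so it can match blocks at any offset; the version below only compares blocks at the same fixed offsets, and `BLOCK_SIZE` is an arbitrary illustrative value.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size, not rsync's default


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Hash each fixed-size block of data."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Indices of blocks in `new` that differ from the same-offset block in `old`."""
    old_digests = block_digests(old, block_size)
    new_digests = block_digests(new, block_size)
    return [i for i, digest in enumerate(new_digests)
            if i >= len(old_digests) or old_digests[i] != digest]
```

Only the blocks listed by `changed_blocks` would need to cross the WAN; the receiver already holds the rest.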

Walking a directory of 1.7M files should be fairly quick. No more than a few minutes.

As an experiment, install one of the many rsync ports for Windows + test time required to do your file sync.
Plus one for rsync - it would always be my first choice 'go to' for file replication / sync.

David Favor (Linux/LXD/WordPress/Hosting Savant) commented:
Just did a couple of checks on various machines, related to file scan time.

Linux machine - 14 minutes to scan 12,441,746 files.

Recent iMac - 2 minutes to scan 1,738,534 files.
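The post does not say how these scans were run. One way to take a comparable measurement on your own file server is the sketch below, which counts regular files with `os.scandir` and times the walk. `count_files` and `timed_scan` are hypothetical helper names, and results depend heavily on disk type and filesystem cache state.

```python
import os
import time


def count_files(root: str) -> int:
    """Count regular files under root without following symlinks."""
    total = 0
    stack = [root]
    while stack:
        current = stack.pop()
        # Unreadable directories raise OSError here; a real run may want to catch it.
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    total += 1
    return total


def timed_scan(root: str) -> tuple[int, float]:
    """Return (file count, elapsed seconds) for a full scan of root."""
    start = time.perf_counter()
    n = count_files(root)
    return n, time.perf_counter() - start
```

Running `timed_scan` against the share locally on the remote server, and then against the same share mounted across the WAN, would show how much of the 21 hours is network latency rather than disk time.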

So comparing timestamps should be very fast on any OS... even Windows...

If you have small files, using rsync's whole-file option (--whole-file) sometimes actually saves time, because only the size and last-modified time are checked and then the entire file is sent, skipping the work of computing which parts changed.
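For the suggested experiment, a small helper like the one below could assemble the rsync command line so the same test can be repeated with and without whole-file mode. `--archive`, `--stats`, `--whole-file` and `--dry-run` are standard rsync options; the paths and host name in the example are made up, and `--delete` is deliberately left out so a test run cannot remove anything.

```python
import shlex


def rsync_command(source: str, dest: str,
                  whole_file: bool = False, dry_run: bool = True) -> list[str]:
    """Build an rsync argument list for a timing test (nothing is executed here)."""
    cmd = ["rsync", "--archive", "--stats"]
    if whole_file:
        cmd.append("--whole-file")   # skip the delta algorithm, send whole files
    if dry_run:
        cmd.append("--dry-run")      # report what would transfer, change nothing
    cmd += [source, dest]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths and host, for illustration only.
    print(shlex.join(rsync_command("/data/share/",
                                   "backup@hq.example.com:/srv/replica/",
                                   whole_file=True)))
```

Timing a dry run first shows how long the comparison alone takes before any file data is sent.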

Your mention of a 50 Mbps WAN connection suggests a solution. A link this slow suggests you're running on-premises storage, where hosted remote storage would work better.

OVH Storage Server Pricing is very inexpensive. $100/month USD for 18TB storage or $200/month USD for 60TB RAID storage, from a quick scan of their pricing.

David Favor (Linux/LXD/WordPress/Hosting Savant) commented:
rsync - Unsung hero of the Internet.
mike2401 (Author) commented:
Thanks David.

Yes, it's from a remote office to HQ over our VPLS WAN (50 Mbps).

Were those impressive times you mentioned for rsync across the internet or a WAN (or from a local PC to an external USB3 hard drive)?

The pricing you mentioned is dirt cheap. Though I've never heard of OVH, I'll have to check out raw storage from some more familiar names like Amazon :-)

Thanks so much!