Mike R.

asked on

Sudden, dramatic performance drops with GlusterFS

I'm new to GlusterFS in general. We have chosen it as the distributed file system on a new set of HA file servers.

The setup is:
  • 2 × Supermicro SuperStorage Server 6049PE1CR36L with 24 × 4 TB spinning disks and NVMe for cache and SLOG
  • HBAs, not RAID cards
  • Ubuntu 18.04 server (on both systems)
  • ZFS file storage
  • GlusterFS 5.10

Step one was to install Ubuntu, ZFS, and gluster. This all went without issue.
We have three identical ZFS raidz2 pools on both servers.
We have three mirrored GlusterFS volumes, one attached to the corresponding raidz2 pool on each server, created like this:

gluster volume create homes replica 2 transport tcp server1:/zpool-homes/homes server2:/zpool-homes/homes force

The Gluster volumes are then mounted as (for example) "/glusterfs/homes -> /zpool/homes", i.e.:
(on server1) server1:/homes     44729413504 16032705152 28696708352  36% /glusterfs/homes
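
For completeness, each volume was mounted and verified with roughly the following (the native FUSE client; exact options on our systems may differ slightly):

mount -t glusterfs server1:/homes /glusterfs/homes
gluster volume info homes
gluster volume status homes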


The problem is, the performance has deteriorated terribly.

We needed to copy all of our data from the old server to the new GlusterFS volumes (approx. 60 TB).
We decided to do this with multiple rsync commands (around 400 simultaneous rsyncs).
The copy went well for the first 4 days, with an average across all rsyncs of 150-200 MBytes per second.
Then, suddenly, on the fourth day, it dropped to about 50 MBytes/s.
Then, by the end of the day, down to ~5MBytes/s (five).
I've stopped the rsyncs, and I can still copy an individual file across to the GlusterFS shared directory at 100 MB/s.
But actions such as "ls -la" or "find" take forever!
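
To put rough numbers on the metadata slowness, I've been timing things along these lines (paths shortened, purely illustrative):

time ls -la /glusterfs/homes/user | wc -l
time find /glusterfs/homes/user -type f | wc -l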

Are there obvious flaws in my setup to correct? How can I better troubleshoot this?
David Favor

Start with GlusterFS + ext4, to see if all performance problems magically resolve.

The GlusterFS code works very well.

The ZFS code... well... I avoid ZFS personally, because in long-term testing ZFS seems to lose its mind + just stalls sometimes, and beyond that ZFS becomes glitchy.

During stalls performance zeros out. During glitchiness performance drops to close to zero.

Likely the problem you're seeing relates to ZFS, rather than GlusterFS.
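
A quick way to test that theory is a small ext4-backed test volume alongside the existing ones. A rough sketch, where the device names and paths are placeholders rather than your actual layout:

mkfs.ext4 /dev/sdX1                 # a spare partition or disk on each server
mkdir -p /bricks/ext4test
mount /dev/sdX1 /bricks/ext4test
gluster volume create ext4test replica 2 transport tcp server1:/bricks/ext4test/brick server2:/bricks/ext4test/brick
gluster volume start ext4test
mkdir -p /mnt/ext4test
mount -t glusterfs server1:/ext4test /mnt/ext4test

Then rerun the same rsync and find tests against /mnt/ext4test and compare.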
Mike R.

ASKER

@David Favor

Thanks for the comment. This is the second time I've heard this.

Although...

Right now I have a ZFS datastore included in a Gluster mirror, and mounted under /glusterfs.

I.e.
ZFS Datastore = /zpool/homes (identical on two servers)
Gluster volume /homes as a mirrored pair with server1:/zpool/homes and server2:/zpool/homes as bricks in the volume
server1:/zpool/homes mounted to server1:/glusterfs/homes

Two observations:

1. If I do a find /glusterfs/homes/user and a find /zpool/homes/user, the zpool one finishes in about a minute, the gluster one takes 3 hours :O

2. AFTER I do a find /glusterfs/homes/user, if I do ANOTHER find /glusterfs/homes/user... it speeds up about 20 times :)

This makes me think it has something to do (perhaps) with the GlusterFS file indexing?
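
If it is metadata caching, I'm guessing the relevant knobs are something like the ones below (just what I've found in the docs so far, nothing applied yet):

gluster volume get homes all | grep -i cache
gluster volume set homes features.cache-invalidation on
gluster volume set homes performance.cache-invalidation on
gluster volume set homes performance.md-cache-timeout 600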
ASKER CERTIFIED SOLUTION
David Favor
Note: Your above detail just means you're connecting to a ZFS instance which might be working at the moment + will likely go south (impossible-to-debug performance problems) sometime in the future.

If this is a real/money project (pays your bills), likely best to use ext4.
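
If you want to see where the time goes on the Gluster side before rebuilding anything, the built-in profiler helps (standard gluster CLI commands):

gluster volume profile homes start
# reproduce a slow ls or find from a client
gluster volume profile homes info
gluster volume profile homes stop

High LOOKUP/STAT latencies on the bricks point at the backend filesystem (the ZFS side); low brick latencies with a still-slow client point more at the volume or network settings.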
A few possibilities:

- I'm unsure how you use rsync, but periodically rsyncing the same directory with default options would mechanically produce a drastic fall in performance: the more data to sync, the more time lost in checksum calculations (see the rsync sketch after this list).

- Parallel writes to the drives require an adequate disk setup: raidz is not your best bet in that respect. RAID 10 (striped mirrors) would be much more adequate.

- Your NVMe drives must be at least partially allocated to the ZIL (write log), not just the L2ARC read cache. I'd recommend partitioning them with a small partition on each drive for the write log and a bigger one for reads, then providing two striped (RAID 0) sets, one for the read cache and one for the write log (see the partitioning sketch after this list).

- ZFS works MUCH better using entire drives on BSD or Solaris; FreeBSD would be the most adequate. The Linux port is barely past childhood.

- Your first observation seems to incriminate GlusterFS, but not necessarily. Is the find on gluster slow even AFTER you do the find on the local ZFS? It seems possible the gluster find was slow because ZFS itself was slow, and since you tested ZFS afterwards, the data was already loaded in the cache, which made the local find look fast.
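
On the rsync point above: a flag like --whole-file skips rsync's delta-transfer checksumming entirely, which is usually worth it on a fast local network (generic example, not tuned for your data):

rsync -a --whole-file /old/homes/ /glusterfs/homes/

On the NVMe point: a rough partitioning sketch, with device names, partition sizes, and the pool name (taken from your volume create line) all as placeholders:

sgdisk -n1:0:+20G -n2:0:0 /dev/nvme0n1     # small partition for the write log, the rest for L2ARC
sgdisk -n1:0:+20G -n2:0:0 /dev/nvme1n1
zpool add zpool-homes log /dev/nvme0n1p1 /dev/nvme1n1p1      # two log devices, writes spread across both
zpool add zpool-homes cache /dev/nvme0n1p2 /dev/nvme1n1p2    # two cache devices for L2ARC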