marrowyung
 asked on

optimize PostgreSQL pg_dumpall speed

hi,


I have created a ticket:


https://www.experts-exchange.com/questions/29228175/pg-upgrade-from-PostgreSQL-9-3-6-to-11-14.html#a43359764


and this topic is related to that.


If I use pg_dumpall to back up and restore the DB from 9.3.6 to 11.14:


pg_dumpall -U postgres -p 5432 --jobs=8 > <output file.sql>


it says:

pg_dumpall: illegal option -- jobs=8
Try "pg_dumpall --help" for more information.



For pg_dump, we can speed up the backup by adding --jobs=N; how can we do the same thing for pg_dumpall?
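For context, a common workaround (a sketch only, assuming a superuser named postgres, port 5432, and placeholder paths and database names) is to use pg_dumpall just for the global objects and pg_dump in directory format for each database, since only the directory format supports --jobs:

```shell
# pg_dumpall has no --jobs option; parallelism is only available via
# pg_dump's directory format (-Fd). A common workaround:

# 1) Dump only the global objects (roles, tablespaces) with pg_dumpall.
pg_dumpall -U postgres -p 5432 --globals-only > globals.sql

# 2) Dump each database in directory format with parallel workers.
#    "mydb" and /backup are placeholders for your actual names/paths.
pg_dump -U postgres -p 5432 -Fd --jobs=8 -f /backup/mydb.dir mydb

# 3) On the new server: restore globals first, create the database,
#    then restore it, again in parallel (pg_restore supports --jobs too).
psql -U postgres -p 5432 -f globals.sql
createdb -U postgres -p 5432 mydb
pg_restore -U postgres -p 5432 --jobs=8 -d mydb /backup/mydb.dir
```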




PostgreSQL

Last Comment: David Favor
ASKER CERTIFIED SOLUTION
lcohan

SOLUTION
David Favor

lcohan

Just be aware of differences between the two:
"pg_dumpall also dumps global objects that are common to all databases. (pg_dump does not save these objects.) This currently includes information about database users and groups, tablespaces, and properties such as access permissions that apply to databases as a whole. "
all from here: https://www.postgresql.org/docs/9.4/app-pg-dumpall.html  
marrowyung

ASKER
David,

"This is impossible, as pg_dumpall has no --jobs option."

That is what I was wondering about; you can see it from the message above.

"Then run pg_dump against each database, as pg_dump does support the --jobs option."

If we only have one database, --jobs can't help much in this case?

Does pg_dump only back up one DB at a time?

And are we saying there is no way to speed up pg_dumpall?
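On the per-database point, a minimal sketch (assumed user, port, and backup path) of looping pg_dump over every non-template database. Note that pg_dump does handle one database per invocation, but with the directory format --jobs still parallelizes the tables within that single database, so it helps even with only one DB:

```shell
# pg_dump dumps one database per invocation; --jobs (with -Fd) dumps
# multiple tables of that one database in parallel. To cover every
# database, loop over the catalog:

for db in $(psql -U postgres -p 5432 -At \
            -c "SELECT datname FROM pg_database WHERE NOT datistemplate"); do
    pg_dump -U postgres -p 5432 -Fd --jobs=8 -f "/backup/${db}.dir" "$db"
done
```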

marrowyung

ASKER
hi,

Has anyone tried pg_dumpall in v13? I wonder whether using it to dump from 9.3.6 and restore to 11.14 would be much faster.
Is the pg_dumpall from the v13 bin folder much better?

marrowyung

ASKER
One thing I can say for now is that pg_dumpall from v13 seems much faster. We are generating a 100 GB dump to try it out later and compare against pg_dumpall from v11.

It can't be tuned like pg_dump anyway...
David Favor

All backup utilities tend to run slowly, as they back up all tables one by one, sequentially.

The speedup I'm working on with mysqldump, which applies to all database dumpers, is to parallelize backups:

1) Run some sort of queuing system.

2) The queuing system runs one process per core, limited by the number of cores specified for the job or the maximum number of queues.

3) For me, I run the queue workers at nice -19 to unload the system.

4) Now pull data showing the table row counts and sort in descending order, largest to smallest.

5) Then spawn one worker per table, so the largest tables start backing up first.

The testing I've done with this approach suggests backups speed up dramatically.
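The steps above can be sketched for PostgreSQL (all names and paths are placeholders; xargs -P plays the role of the queuing system, and pg_class.reltuples supplies an approximate row count). Note this is an illustration, not a drop-in tool: unlike pg_dump -Fd --jobs, separate per-table dumps do not form a single consistent snapshot.

```shell
# Largest-table-first parallel dump, sketched for PostgreSQL.
# Step 4: list ordinary tables in schema public, sorted by estimated
#         row count, descending.
# Steps 2+5: xargs -P 8 acts as a simple queue running 8 workers at once,
#         so the biggest tables start first.
# Step 3: nice -n 19 runs each worker at the lowest CPU priority.
# Caveat: per-table dumps are NOT one consistent snapshot; pg_dump's
#         own -Fd --jobs mode handles snapshot consistency for you.

psql -U postgres -p 5432 -At -c \
  "SELECT relname FROM pg_class
   WHERE relkind = 'r' AND relnamespace = 'public'::regnamespace
   ORDER BY reltuples DESC" mydb |
xargs -P 8 -I {} \
  nice -n 19 pg_dump -U postgres -p 5432 -t 'public.{}' \
       -f '/backup/mydb-{}.sql' mydb
```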
marrowyung

ASKER
But it seems pg_dumpall can't be parallelized, right?
marrowyung

ASKER
Thanks, both.

David Favor

You're welcome!

Hang in there!