Link to home
Start Free TrialLog in
Avatar of rstaveley
rstaveleyFlag for United Kingdom of Great Britain and Northern Ireland

asked on

A never vacuumed PostgreSQL 7.0.3 database getting sluggish

I have a ~5 year old  PostgreSQL 7.0.3 installation, which I don't believe has ever had the tender lover care of a database administrator, and yet it is reasonably complex. Performance has been reported to have deteriorated over the years, partly no doubt because if the increase in size of the data set.

Before I get into looking how the schema might be rationalised by (say) spooling data off non-current data into archives, I want to look into ways to improve performance, while leaving things intact. The database can afford to have some downtime over night say.

I looked briefly at taking a dump from the database and importing it into PostgreSQL 8, but there were oodles of errors thrown up, which I'd need to pick through, so I wanted to see if I could improve matters with the existing 7.0.3.

Here are my questions:

(1) Am I reasonably safe vacuuming it now, or would I be wiser to dump and regenerate it from scratch?
(2) Am I reasonably safe setting up a daily vacuum on cron thereafter?
(3) Is there anything else I ought to consider?
SOLUTION
Avatar of F. Dominicus
F. Dominicus
Flag of Germany image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of rstaveley

ASKER

Thanks, fridom and gheist. I have an 8.0.7 installation that I'm tring to reinstate the dump onto now.

> Please post select version(); just for statistical purposes.

My 7.0.3 installation is: "PostgreSQL 7.0.3 on i686-pc-linux-gnu, compiled by gcc 2.96". It is on Red Hat Linux release 7.3 (Valhalla) (with neglected up2date)

My 8.0.7 installation is: "PostgreSQL 8.0.7 on x86_64-redhat-linux-gnu, compiled by GCC x86_64-redhat-linux-gcc (GCC) 4.0.2 20051125 (Red Hat 4.0.2-8)". It is on Fedora Core release 4 (Stentz).

Reinstating hasn't been as bad as I initially feared. I fixed a bunch of user errors (changed a bunch of \connect statments to get users right) and handled some invalid UNICODE byte sequences by making the encoding of the database latin1 to match that of the original database.

I'm now left with the following:

(A) Errors about carriage returns
      ERROR:  literal carriage return found in data
      HINT:  Use "\r" to represent carriage return.
  - Do you reckon I need to do these by hand (i.e. perl script) or is there a PostgreSQL trick I should use for these?

(B) Two plpgsql errors
   (i) ERROR:  function "plpgsql_call_handler" already exists with same argument types
   (ii) ERROR:  unterminated - CONTEXT:  compile of PL/pgSQL function "uploaduser" near line 31
  - Are there known problems with upgrading plpgsql functions? I took a look at the functions with Navicat and saw no definitions. Has the formatting been chaged for these?

Sorry this question has drifted from its original subject.
Increasing points for the follow-on.
Stop increasing points, thank you.

My points against Linux:
ext2 will lose data on power failure
ext3,jfs,xfs etc will do every write two times.

So I recommend FreeBSD, it's filesystem does commit when program asks and does not lose commited data on poweroff.

(a) use pg_dump -Fc to use binary format of dump
(b) http://www.postgresql.org/docs/8.1/interactive/sql-createlanguage.html
Very interesting the points you raised about the file system, I have ext3 on both of the Linux boxes. I have a trusty old FreeBSD (4.9?) box at home, which I am very fond of, but the database is a citizen amongst other processes which would require some work to be ported.

I'll try the binary dump, but I'm surprised to hear that it is more portable than a text dump?!

With reference to (b) the language is created already. Or was I missing something in the link? Or was that a cue to RTFM ? :-)
Now FreeBSD is at 6.0 ... but still same from admins view as 4.9.

(a) strange or not - but binary dump is portable upwards. i.e if you want to downgrade it will not work.

(b) then you have to make us rewrite one simple procedure/function. Sure post refered to RTFM thing, but since you already did that, i cannot deny anymore there is a problem experts have to solve.
What is you point against using Linux. What do you think do the extended file system offer? You suggest a two fold migration
PSQL 7.x -> 8.1
and Linux -> FreeBSD in the same step

Very poor idea.

The plpsql problem is that there seems to be a
CREATE FUNCTION plpgsql_call_handler

And this thing is included with more then one table so you may have to replace it with
CREATE OR REPLACE FUNCTION ..

I would not just dump in binary format. Dump in text format also. Text format can be handeled by every tools down to an editor. If you do not need the text dump fine but what if something get's wrong with the binary dump?

Regards
Friedrich
Yes, I'll be hanging in with Linux. FreeBSD would be too much change.

I see the text dump has this:

\connect - postgres
CREATE FUNCTION "plpgsql_call_handler" ( ) RETURNS opaque AS '/usr/lib/pgsql/plpgsql.so' LANGUAGE 'C';
CREATE TRUSTED PROCEDURAL LANGUAGE 'plpgsql' HANDLER "plpgsql_call_handler" LANCOMPILER 'PL/pgSQL';
\connect - fred
...

I now see that plpgsql_call_handler is needed to provide plpgsql language support from http://www.faqs.org/docs/ppbook/c19610.htm . With these lines corrected I do not need to create the language prior to importing the dump. My Fedora system with the PostgreSQL 8.0 installation has plpgsql.so in /usr/lib64/pgsql/plpgsql.so, so I just need to change the path. Aside from being able to change users, this is a plus point of being able to keep with the text file format - though I guess I could symlink and hang in with the binary format if needs be.

Sorry - I'm still dithering.

gheist suggests the binary format to preserve the format of the plpgsql functions. This would be very handy, assuming it works. There are a hundred or so of them in this database, which I'd avoid putting in by hand.

fridom suggests dumping in the text format *also*. Is that as a fall-back in case the binary dump fails, or is it possible to reinstate a database using both dumps? If so, how would I be able to mix them?
It depends...

I'd suggest using binary and do pg_restore with -r switch - i.e when importing sort operations - create schema, constraints and import data after. maybe old pg_dump messes something up.

What does the -r switch do? I can't see it at http://www.postgresql.org/docs/8.0/interactive/app-pgrestore.html or in `pg_restore --help`

Version: pg_restore (PostgreSQL) 8.0.7
OK, I've reached a resolution for this now.

Vacuuming took very little time and was successful. Thanks for the reassurance.

With gheist and fridom's kind encouragement I persevered with an import into Postgres 8.0, and the problems fell away with perseverence.

For the sake of the EE PAQ database, here's the drill I now have for migrating my live 7.0.3 database to 8.0.7, when I bytes the bullet and upgrade. The database's name is morrell:

(1) Create a morrell database on my 8.0.7 installation with encoding latin1 and plpgsql support. I needed the latin1 encoding, because the original database was latin1 (I suspect that this is the 7.0.3 default) and it contains French accents in it, which throw up errors line "ERROR:  invalid byte sequence for encoding "UNICODE": 0xe27465" when â is encountered. I created the language externally, because the create language in the 7.0.3 dump had paths specific to that installation in it (specifically the whereabouts of the .so file).

      -bash-3.00$ dropdb morrell;createdb -E latin1 morrell;createlang plpgsql morrell
      DROP DATABASE
      CREATE DATABASE

(2) Take a snap-shot from the 7.0.3 database (my snap shot was prior to the vacuum, to be on the safe side).

      pg_dump morrell > morrell.database.pre-vacuum.sql

(3) Clean carriage returns replacing them with backslash followed by the carriage return, because I read that release 7.4 disallowed carriage returns unless they are escaped - see http://www.postgresql.org/docs/8.1/static/release-7-4.html - when I didn't do this I was getting messages saying ERROR:  literal carriage return found in data. HINT:  Use "\r" to represent carriage return. I could have converted to \r, but I chose to convert to \^M for maximum transparency.

      # ^M is generated at the bash prompt using Ctrl+V followed by Ctrl+M
      sed 's/^M/\\^M/' morrell.database.pre-vacuum.sql > x.sql

(3) I needed to comment out all \connect lines (because I'm using one all-powerful user for everything).

      sed 's/^\\connect/-- \\connect/' x.sql > x2.sql;mv x2.sql x.sql

(4) Comment out the dump's create lang code, because it is already created in (1) and the dumps code is system-specific.

      sed 's/CREATE FUNCTION \"plpgsql_call_handler\"/-- CREATE FUNCTION \"plpgsql_call_handler\"/' x.sql > x2.sql;mv x2.sql x.sql
      sed 's/CREATE TRUSTED PROCEDURAL LANGUAGE/-- CREATE TRUSTED PROCEDURAL LANGUAGE/' x.sql > x2.sql;mv x2.sql x.sql

(5) A hand-edit was required for one final error, in which a plpgsql function compares a string with '' to see if it is empty. I needed to change '' to \'\'. The 7.0.3 dump does not escape the single quotes. Happily there was only of of these. I didn't want to use sed/perl for this

      IF (paddleNumber<> \'\') THEN

(6) I the populated the 8.0.7 database with no reported errors

      psql morrell < x.sql 2> stderr.log > stdout.log

It is looking good, but I'm disappointed not to be able to see my plpgsql functions listed in Navicat. That may be a Navicat problem (I'm not very familiar with it yet), or it may be that the functions need to be reformatted to conform to PostgreSQL and/or Navicat conventions. I'll run some tests now to see if they work - the function code is certainly in there if I look at the data dumped from...

      pg_dump morrell > morrell.database.post-import.sql
Well I think you did it right, the steps taken sound reasonable to me. Good luck
Friedrich
Thanks for the encouragement, Friedrich, and all your help both of you.
You can practice your procedure couple of times before altering working environment.

You can avoid most vacuuming so:

do >
vacuum full analyze verbose;
record the report
repeat after a week

Now count freed pages in second report and record doubled sum into postgresql.conf

max_fsm_pages = 80000

I am not sure about 7.0 having that setting, etc etc. so all the pressure to actualize databse version.
As far as I know for sure
7.2, 7.3 - you have to tune max_fsm_pages to avoid lengthy database blocking
7.4 - You can tune - fsm page pointer is 6 bytes , vacuum does not block
8.0+ - It will be hard to find if/when tuning is needed.