Link to home
Start Free TrialLog in
Avatar of simonphoenix
simonphoenix

asked on

PostgreSQL: pg_clog/0736: No such file or directory

I'm a Linux and Postgres newbie that need to figure out why I suddenly get this error message on my website:
Query failed: ERROR: could not access status of transaction 1935960436 DETAIL: Could not open file "pg_clog/0736": No such file or directory.
I have searched Google but failed to find anything useful. Is 0736 a special code error og actually a missing file?
In my postgres directory: var/lib/postgresql/8.2/main/pg_clog
I have the following files:
0000 0001 0002 0003 0004 0005 0006 0007 0008 0009 000a 000b 000c

It is only the forum page that fails to run in table debatforum.
On that table I can't run:
select * from debatforum - I get the above error
But I can run this:
select * from debatforum limit 1

So there must be some corrupt data I need to get rid of.

When trying to help then please tell me where to run any commands, e.g. vacuum in phpPgAdmin tool, or directly on the Linux prompt (I use puTTy to connect as root)
ASKER CERTIFIED SOLUTION
Avatar of gheist
gheist
Flag of Belgium image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of simonphoenix
simonphoenix

ASKER

I copied the pg_clog directory and tried to use your:
dd if=/dev/zero of=pg_clog/0736 bs=32768 count=512
, but got this error:
dd: opening `pg_clog/0736': No such file or directory

Am I missing a file named 0736? I only have the ones mentioned in main question. I read about creating an empty clog file manually, but how is that done? Can I copy one of the existing and renaming to 0736 and use your command?
What is your Linux distribution?
it could be different pg_clog and data directory is in use.

You may succeed using "locate pg_clog" if locate is installed and indexed.
Well, now I made the command on my backup directory and got a file named 0736. I copied this into the pg_clog directory and restarted the server.
Now it runs again.
I'm still baffled - what happened to 0736 in the first place? Was it ever there before?
Power failure or application crash happened. That should be evident from "messages" log.

What type of filesystem you are using?
What is your distribution?

We can improve matters fro next time bad things happen.
Filesystem: you mean PostgreSQL 8.2.7?
Distribution: Ubuntu Linux 7.10
Interface: Webmin 1.410 / puTTy (SSH)
filesystem type: from mount -v command output like ext3 or ntfs.


http://www.linux-security.cn/ebooks/ulk3-html/0596005652/understandlk-CHP-18-SECT-7.html
Filesystem is ext3
Into /etc/fstab edit respective line which leads to postgresql data:
/dev/sda1 / ext3 defaults,noatime,data=journal 1 1

(doubles written data compared to writeback)

Have you found out what caused damage?
What happens when using you config change?
Do I need to restart postgresql server?

A possible cause for the damage:
The virtual server maxed out its memory and deadlocked. After restart it seemed to work fine for a few days.
You have to remount filesystem.

So problem falls under "power fault" and having well-journaled filesystem will prevent future damage if you power off computer.

Did you trace guilty application (Like apache+php memory leak)? Did you apply any corrective actions(MaxRequestsPerChild 1000)???
Our System Administrator said that mysqld and apache were dead due to out of memory. But mysql is not used at the moment.

Do you refer to apache2.conf file when you mention MaxRequestsPerChild?

The value is set to 0 by default for:
<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    MaxClients          150
    MaxRequestsPerChild   0
</IfModule>
and
<IfModule mpm_worker_module>
    StartServers          2
    MaxClients          150
    MinSpareThreads      25
    MaxSpareThreads      75
    ThreadsPerChild      25
    MaxRequestsPerChild   0
</IfModule>
Set MaxRequestsPerChild value to something else (as documented in comments around where you copied module parameters from) and disable mysql (using services-admin)
These actions will prevent future memory leaks.

You have to ensure that postgresql databases and logs are always synced to disk to accomodate future power failures. Your admin must be able to help.
I have implemented your suggestions (MaxRequestsPerChild set to 1000 and /dev/sda1 / ext3 defaults,noatime,data=journal 1 1), and restarted apache2 and postgresql.

I really appriciate your help - well earned points. I love when possible reasons for preventing future errors are provided. That is the way to help, thanks.
I understand that you shall not tamper with this kind of thing if you are a rookie, but I had to write this since I see a lot of implied knowledge about Linux and Postgresql that I can't decifer when reading documentation. It is not user friendly, at least not until you have had some training or experience.
Thanks again
I am not sure if Ubuntu reacts adequately to / mount options change
You should redo mkinitrd and grub/lilo at least (I do not know how this is done in Ubuntu)
Read tune2fs manual page ("man tune2fs" -e continue) - in worst case reboot will end up in root not mounting and running fsck & co off the livecd.