asked on

why small number records table grow so big?

backgroud: I did a project two years ago for a client, the database is very tiny so I didn't bother to optimize/maintain it.

the autovacuum is not started ( here is the log information):
2010-10-19 14:59:34 EDT WARNING: autovacuum not started because of misconfiguration
2010-10-19 14:59:34 EDT HINT: Enable options "stats_start_collector" and "stats_row_level".

basically this postgresql 8.1.9 server ( Suse Linux Server OS) is running without maintenance for two years. the table with biggest records is a increase-only log table, which has about only 80,000 records.

Recently it started to have performance issue, so I revisited the server, and found the pgsql/data/dboid/ folder has increased to 3G now. Using "SELECT relfilenode, relpages,relname FROM pg_class" query, I found one table vmserver has used more than 2G space ( over 1G for table, 1G for a index vmserver_idx1.)

here is the table definition:

\d vmserver;
Table "public.vmserver"
Column | Type | Modifiers
-------------+-----------------------------+------------------------------------------------------------
id | bigint | not null default nextval(('vmserver_seq'::text)::regclass)
name | character varying(100) | not null default ''::character varying
ip_address | character varying(100) | not null default ''::character varying
cpu_percent | integer | default 0
ram_percent | integer | default 0
isactive | boolean | default false
last_update | timestamp without time zone | default ('now'::text)::timestamp without time zone
Indexes:
"vmserver_ppk" PRIMARY KEY, btree (ip_address)
"vmserver_idx2" UNIQUE, btree (name)
"vmserver_idx1" btree (ip_address, last_update)

It has 6 records : ( Yes, 6 records.)
select * from vmserver;
id | name | ip_address | cpu_percent | ram_percent | isactive | last_update
----+---------+----------------+-------------+-------------+----------+----------------------------
5 | server1 | 192.168.123.204 | 13 | 78 | t | 2010-11-17 07:58:22.950904
4 | server3 | 192.168.123.137 | 11 | 75 | t | 2010-11-17 07:58:28.743459
1 | server2 | 192.168.123.136 | 24 | 88 | t | 2010-11-17 07:58:33.780282
3 | server6 | 192.168.123.143 | 15 | 92 | t | 2010-11-17 07:58:34.781151
6 | server7 | 192.168.123.144 | 46 | 88 | t | 2010-11-17 07:58:39.104961
2 | server4 | 192.168.123.138 | 14 | 87 | t | 2010-11-17 07:58:41.794194

There are a few application access this database:
1. a .Net application , which only run select query again it:
DataSet ds = db.SelectQuery("select 'Server '||id||':'||name||':'||ip_address as name, id from vmserver where isactive=true order by id ");

2. a perl script running on each server to report their status every 20 seconds, It only run two querys:
SELECT id,cast (date_part('epoch',now()-last_update) as int ) as update_time FROM vmserver where ip_address='$ip'
update vmserver set cpu_percent=100-$cpu_idle,last_update=now(),ram_percent=$ram_percent where ip_address='$ip'

3. two store procedure which will be called by a web application also SELECT this table.

So I run reindex "reindex database mydb"; now the index file shrinked to 16K. but the vmserver table is still over 1G ( it actually has two data file now since postgres database can only use 1G data file). I run "vacuum vmserver;", I am supprised, it didn't shink the file.
run "SELECT relfilenode, relpages,relname FROM pg_class where relname='vmserver' " query, the relpages number is still about 160000 after I vacuum it.

Finally I run:
create table backup_vm_server as select * from vmserver;
TRUNCATE vmserver ;
insert into vmserver select * from backup_vm_server ;

This shrinked the table file.

Question 1: Why the vmserver_idx1 grow so huge? I know I should drop it ( it's just a habid I want every select query running on index, seems very bad idea in this case, since the table is so small and frenquently change on index field.) I will drop it after I fully understand it. Although last_update is keep changing, I still think the database should be smart enough not let the index file grow to 1G. Anybody can explain it?

Question 2: Why vmserver table grow so huge with 6 records? All the two years, There is no records insert/deleted, only select/update. This is even weird than first question. And the vacuum won't correct it ( the vacuum command took almost 10 minutes.)

I don't need any solution, I can turn on the autovacuum or setup a cronjob or something to do maintenance, I just need to have a better understanding why this is happenning.

Hatrix76

That is all very easy to explain:

in postgresql, an update does internally:
DELETE
INSERT

so for every update you have have one dead tuple (the deleted one). For performance reasons deleted tuples are only marked as deleted but not deleted from the filesystem. The vacuum - process recoups this space.

-- pg-doc --
VACUUM reclaims storage occupied by deleted tuples. In normal PostgreSQL operation, tuples that are deleted or obsoleted by an update are not physically removed from their table; they remain present until a VACUUM is done. Therefore it's necessary to do VACUUM periodically, especially on frequently-updated tables.
-- pg-doc --

So you have to run vacuum frequently (best solution is to run the autovacuum daemon).

Best
Ray

Hatrix76

regarding the index, an index on a small table, where the whole table fit's into a page, rather slows down than speed things up, because the sequential search through one memory page, which is in memory, is nothing compared to what has to be done with the index:

It has to be maintained and upgraded on every update
It has to be shrunk with vacuum
It cost's more to search in the index than sequential in one memory page

you can use:

explain <sql-query>

in postgres and read/learn a bit about interpreting the output, this helps a lot to find the most performant versions of your queries.

best

jackiechen858

ASKER

Thanks so much Hatrix76. I agree I should delete the index. also I will setup autovacuum. It's interesting to learn that Postgresql do DELETE&INSERT for UPDATE, I tried to find some information about it, but couldn't.

Would you mind to tell me where you get this information? Is this the unique implementation for Postgresql database, or common practice for all the databases? Doesn't sound very reasonable to me.

Thanks.

ASKER CERTIFIED SOLUTION

Hatrix76

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

jackiechen858

ASKER

Thanks!