Link to home
Start Free TrialLog in
Avatar of sternocera
sternocera

asked on

PostgreSQL: accrueing information using automated partitioning, dynamically generating tables

Hello,

I'm creating a database front-end system which has to accrue sales.

At the moment, there is one table that represents discrete transactions, and another that represents individual items sold. They have a one-to-many relationship. Both tables can contain an arbitrary amount of data, though, potentially a massive amount of data ( in the order of tens of millions of rows, though it can only grow, never shrink).  99% of the time, the user will only want to access recent data. It starts off with no data.

I've been told that  this necessitates partitioning. I'd like to create a table that can *dynamically* generate new tables for every, say , month worth of data, to keep things efficient. This needs to be totally automated and encapsulated to the user, as this database is used by a front-end for non-technical people.

Here is a the manual page for PostgreSQL partitioning: http://www.postgresql.org/docs/current/static/ddl-partitioning.html

My questions are:

1.is this "the way it's done"? This has to be a common sort of problem. There has to be an approach that is generally favoured. Is this it?
2.If so, how? The manual doesn't cover creating tables dynamically.
3.If not, what should I do instead?

Thanks in advance,
Sternocera
ASKER CERTIFIED SOLUTION
Avatar of grant300
grant300

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of sternocera
sternocera

ASKER

Bill,

At a push, I can see the second table ( The "many" table) containing about 20 million rows (but probably more like 5 million). Since this, as you've said, isn't that unreasonable an amount of data for a modern CPU to sift through, the current and archive solution seems very attractive. Also, the user will need that data very infrequently, so the fact that it's possibly sub-optimal is not a concern.

Fiddling after the fact isn't really an option, because this thing has to work in the field, largely without support.

Moving a few tens of thousands of rows from the current to the archive shouldn't be too computationally expensive, right? I'd like it to happen almost instantaneously, maybe under 5 seconds on a modern CPU. Not having much experience with databases, I'm not sure that this is the case. Please advise.

Anyway, I think that you addressed my main concern, so I'll give you your points. Thanks.

Regards,
Sternocera

You have to move a logical chunk of rows at a time, based on date, so you can update the rules (or triggers) that maintain the partitions.

Moving 50K rows in 5 seconds is probably unrealistic, but 20 to 25 seconds is reasonable.  If you need to make it faster than that, you probably have to cut the number of rows down, maybe doing the move to archive once per day.  Remember, no matter how much CPU you have, you still have disk I/O to deal with and the disks, while they have gotten bigger, definitely have not gotten much faster.

BTW, we have had very good success on the two-chip dual-core Opteron machines.  They have great memory bandwidth and low latencies and really scream.  IBM makes a very nice, reasonably priced box called the x3655 that will take up to 8 drives in a 2-U form factor.

VERY IMPORTANT:  You have to use a hardware RAID controller with a battery backed cache.  Even if you don't RAID the disks, setting the controller up with write caching makes these little boxes much, much faster.

Regards,
Bill