PostgreSQL: accruing information using automated partitioning, dynamically generating tables

Posted on 2007-10-17
Last Modified: 2010-08-05

I'm creating a database front-end system which has to accrue sales.

At the moment, there is one table that represents discrete transactions and another that represents individual items sold; they have a one-to-many relationship. Both tables can contain an arbitrary, potentially massive amount of data (on the order of tens of millions of rows, and they can only grow, never shrink). 99% of the time, the user will only want to access recent data. The database starts off empty.

I've been told that this necessitates partitioning. I'd like to create a table that can *dynamically* generate new child tables for every month's worth of data, say, to keep things efficient. This needs to be totally automated and transparent to the user, as this database sits behind a front-end for non-technical people.

Here is the manual page for PostgreSQL partitioning:

My questions are:
1. Is this "the way it's done"? This has to be a common sort of problem, so there must be a generally favoured approach. Is this it?
2. If so, how? The manual doesn't cover creating tables dynamically.
3. If not, what should I do instead?

Thanks in advance,
Question by:sternocera
    LVL 19

    Accepted Solution

    First, while your data volume (tens of millions of rows) is not trivial, it is not "massive", assuming you are not trying to run this on some broken-down old machine.  Admittedly, you don't want the users having to sift through years' worth of data when they are only interested in the most recent rows most of the time.

    You can automate the process of adding partitions using stored procedures, but I don't believe there is a way to create a table that partitions itself automatically.

    The closest you could come is to put an insert trigger on the table and, at a certain date threshold each month, have it create the next month's partition.  Quite frankly, I am not sure that is the best way to go.
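    A rough sketch of that trigger-based approach, using PostgreSQL's inheritance-style partitioning. All table and column names here (`sales_items`, `sold_at`) are hypothetical, and `format()` plus `EXECUTE ... USING` require a reasonably recent PostgreSQL release; older versions would build the SQL by string concatenation instead:

```sql
-- Hypothetical parent table; monthly child partitions inherit from it.
CREATE TABLE sales_items (
    id       bigserial,
    sale_id  bigint,
    sold_at  timestamp NOT NULL,
    amount   numeric
);

-- Trigger function: create the partition for the row's month if it
-- does not exist yet, then route the insert into it.
CREATE OR REPLACE FUNCTION sales_items_insert() RETURNS trigger AS $$
DECLARE
    part  text := 'sales_items_' || to_char(NEW.sold_at, 'YYYYMM');
    lo    timestamp := date_trunc('month', NEW.sold_at);
BEGIN
    IF NOT EXISTS (SELECT 1 FROM pg_class WHERE relname = part) THEN
        -- NOTE: concurrent inserts could race to create the same
        -- table; production code would need to handle that error.
        EXECUTE format(
            'CREATE TABLE %I (
                 CHECK (sold_at >= %L AND sold_at < %L)
             ) INHERITS (sales_items)',
            part, lo, lo + interval '1 month');
    END IF;
    EXECUTE format('INSERT INTO %I SELECT ($1).*', part) USING NEW;
    RETURN NULL;  -- the row is stored in the child; skip the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sales_items_partition
    BEFORE INSERT ON sales_items
    FOR EACH ROW EXECUTE PROCEDURE sales_items_insert();
```

    The CHECK constraints matter: they are what lets the planner skip partitions whose date range cannot match a query's WHERE clause.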

    What you might want to do is opt for a much simpler mechanism, namely two partitions, Current and Archive.  On a periodic basis you can move rows from the "Current" partition to the "Archive" partition and change the definition (date range) on the partitions so that the query planner knows where things are actually located.  If after two or three years you decide the Archive partition is getting too large, you can create Archive2 and stuff it in the middle between Archive and Current.  In fact, you could create yearly partitions but fiddle with the definitions so that the Current partition really does hold the current data, leaving a corresponding hole in the current year's partition.
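    A minimal sketch of that two-partition layout (table names and the cutoff date are hypothetical), again using inheritance with CHECK constraints so constraint exclusion lets the planner scan only the partition it needs:

```sql
-- Parent table; queries go against this name.
CREATE TABLE sales (
    id      bigserial,
    sold_at timestamp NOT NULL,
    amount  numeric
);

-- "Archive" holds everything before the cutoff, "Current" everything
-- on or after it. The cutoff moves each time rows are migrated.
CREATE TABLE sales_archive (
    CHECK (sold_at <  DATE '2007-10-01')
) INHERITS (sales);

CREATE TABLE sales_current (
    CHECK (sold_at >= DATE '2007-10-01')
) INHERITS (sales);

-- With constraint exclusion enabled, a query for recent rows only
-- touches sales_current:
SET constraint_exclusion = on;
SELECT * FROM sales WHERE sold_at >= DATE '2008-01-01';
```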

    In any event, you can generate and/or regenerate the rules for the partitions using stored procedures that create the text of the rule and then execute it.  You can also create a procedure for migrating the older data from the Current partition to the Archive or YearX partition.
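    The migration procedure might look something like this sketch (names hypothetical, and it assumes the CHECK constraints on both partitions have been dropped or widened beforehand, since rows in transit would otherwise violate them):

```sql
-- Move all rows older than the cutoff from Current to Archive in one
-- transaction, returning the number of rows moved. Afterwards the
-- CHECK constraints would be re-created with the new boundary.
CREATE OR REPLACE FUNCTION archive_before(cutoff timestamp)
RETURNS bigint AS $$
DECLARE
    moved bigint;
BEGIN
    INSERT INTO sales_archive
        SELECT * FROM sales_current WHERE sold_at < cutoff;
    GET DIAGNOSTICS moved = ROW_COUNT;
    DELETE FROM sales_current WHERE sold_at < cutoff;
    RETURN moved;
END;
$$ LANGUAGE plpgsql;
```

    Because the INSERT and DELETE run inside one function call (hence one transaction), readers never see the rows duplicated or missing.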

    If I were you, I think I would start with the Current and Archive partitions and build the tools necessary to manipulate that simple scheme then enlarge it if and when it becomes necessary.


    Author Comment


    At a push, I can see the second table (the "many" table) containing about 20 million rows (but probably more like 5 million). Since that, as you've said, isn't an unreasonable amount of data for a modern machine to sift through, the Current/Archive solution seems very attractive. Also, the user will need the archived data very infrequently, so the fact that access to it is possibly sub-optimal is not a concern.

    Fiddling after the fact isn't really an option, because this thing has to work in the field, largely without support.

    Moving a few tens of thousands of rows from Current to Archive shouldn't be too computationally expensive, right? I'd like it to happen almost instantaneously, ideally under 5 seconds on a modern machine. Not having much experience with databases, I'm not sure whether that's realistic. Please advise.

    Anyway, I think that you addressed my main concern, so I'll give you your points. Thanks.


    LVL 19

    Expert Comment

    You have to move a logical chunk of rows at a time, based on date, so you can update the rules (or triggers) that maintain the partitions.

    Moving 50K rows in 5 seconds is probably unrealistic, but 20 to 25 seconds is reasonable.  If you need to make it faster than that, you probably have to cut the number of rows down, maybe doing the move to archive once per day.  Remember, no matter how much CPU you have, you still have disk I/O to deal with and the disks, while they have gotten bigger, definitely have not gotten much faster.

    BTW, we have had very good success on the two-chip dual-core Opteron machines.  They have great memory bandwidth and low latencies and really scream.  IBM makes a very nice, reasonably priced box called the x3655 that will take up to 8 drives in a 2-U form factor.

    VERY IMPORTANT:  You have to use a hardware RAID controller with a battery-backed cache.  Even if you don't RAID the disks, setting the controller up with write caching makes these little boxes much, much faster.

