Solved

Creating rules in Postgres

Posted on 2007-12-04
8
695 Views
Last Modified: 2008-02-01
Please help me understand this rule.  I created it but I am wondering if some of the criteria in the WHERE clause is even necessary, particularly "setdata.seriesid=new.seriesid".   I think I need to understand "NEW" better.  It seems to be that setdata.seriesid will always equal new.seriesid, right?

Here is what I'm trying to do.  I want to update a record and set SETDATA.OTHERIDS after or while a row is being inserted.  I get this OTHERIDS  value from the get_otherids() function, which queries a table called SERIESDATA.

I have tested everything and it seems to be correct but I've only tested on a very small dataset.  I will end up using this on tables that have over 40,000 rows so I want to make sure the rule only acts on the row in question so it is very fast.  The last thing I want is for a user to sit around for 10 minutes for each row they insert.

One more thing.  You may wonder why not do the updating at the application level at the same time the row is initially being inserted.  I would like to avoid rolling out a new version of the application with changes.  I figured it would be fine to just create a rule or trigger (though I've read that a rule is better in my case) so that this change is transparent to the user.  I'm not sure what best practice would be for using rules and triggers.



SERIESDATA TABLE:

   id    | otherids |  group  

---------+-----------+----------

 2256798 | 888       | groupQ

 1234567 | 789       | groupR

 7654321 | 562       | 

 2943747 | 511       | groupR

 2944517 | 123       | groupR
 

insert into setdata(seriesid,name,group) values(1234567,'John','groupR');

insert into setdata(seriesid,name,group) values(7654321,'Sue','groupR');
 

SETDATA TABLE:

 seriesid |  name  | otherids |  group  

-----------+--------+-----------+----------

   1234567 | John   | 789       | groupR

   7654321 | Sue    | 562       | groupQ
 

Here is the rule:

CREATE OR REPLACE RULE upd_setdata_on_insert AS

    ON INSERT TO setdata DO UPDATE setdata SET otherids = get_otherids(new.id)

  WHERE (setdata.group::text = 'groupR'::text OR setdata.group::text = 'groupQ'::text) AND setdata.seriesid = new.seriesid AND setdata.otherids IS NULL;
 

Here it the function:

CREATE OR REPLACE FUNCTION get_otherids(integer)

  RETURNS character varying AS '

declare

  serid alias for $1;

  other varchar(10);

begin

  select into other otherids from seriesdata where seriesdata.id=serid;

  return other;

end;

' LANGUAGE 'plpgsql' VOLATILE;

Open in new window

0
Comment
Question by:mock5c
  • 3
  • 3
  • 2
8 Comments
 
LVL 19

Expert Comment

by:grant300
Comment Utility
Normally you would use a trigger for this kind of thing but there is nothing wrong with a rule.  PostgreSQL seems to run rules faster than triggers in some cases (e.g. for partitioning) but we are not talking about staggering differences either way.

If you are worried about performance, I would dump the get_ortherids() function and incorporate the code in the trigger.  You just don't need the overhead of an additional function call even if it does make the code a little bit easier to read/maintain.  In fact, you can rewrite the view with a correlated UPDATE query and probably get the best performance and do it in a single SQL statement.

The "new" pseudo table contains the values of the after image; what the INSERTed or UPDATEd row(s) will look like after the verb.  In your case, you are updating the target row(s) based on values in the "new" table row(s).  You are essentially joining the "new" pseudo table to the base table in your UPDATE statement.  Your WHERE clause looks O.K. but it might not perform quite as well as it could.  In order for the optimizer to do the least amount of work in the join (assuming multi-row INSERTs), I would test the values of new.group rather than setdata.group and new.ortherids rather than setdata.otherids.  Because "new" is the driving table, the more rows you can eliminate from it (logically), the fewer rows remain to be joined to setdata to be updated.

I would experiment with the single update statement below to see what the optimizer does.  You might insert a bunch of rows with only one or two with NULL otherids and see what the optimizer says.  The alternative is to move the two SARGS (WHERE clauses) up into the ON clause for the JOIN between new and setdata.

Regards,
Bill
CREATE OR REPLACE RULE upd_setdata_on_insert AS

     ON INSERT TO setdata DO

        UPDATE setdata

    	   SET otherids = sd.otherids

    	  FROM new

    	  JOIN setdata st

    	    ON st.seriesid = new.seriesid

    	  JOIN seriesdata sd

    	    ON sd.id = new.id

         WHERE new.group::text in ('groupR'::text, 'groupQ'::text)

           AND new.otherids IS NULL;

 

Open in new window

0
 

Author Comment

by:mock5c
Comment Utility
What's confusing to me is the statement "ON INSERT TO" leads me to believe a record has already been inserted and now we will perform an update ("DO UPDATE") (so the rule actually fires after the insert has occurred).  So since the values have already been inserted, wouldn't that mean setdata.seriesid IS new.seriesid, therefore that statement is unnecessary in the update query?  I need to rely on setdata.seriesid already having a value (from the insert) and then when I update (through rule or trigger), I look at my newly inserted record's seriesid and find the corresponding otherid from the seriesdata table (I use the get_otherids() funct) and then update the newly inserted record otherids (initially null) to that otherids value I just got.

My rule (or trigger) should only update a single record which is in the setdata table.  I need the criteria to be very specific because I want to update one specific record as well as have it happen very quickly.  I don't want the query to look through 40,000 records just to update a single record.

0
 
LVL 19

Expert Comment

by:grant300
Comment Utility
The thing is, you need to know WHICH setdata record to update after it has been inserted.  When you refer to the setdata table in a rule or trigger, you are referring to the entire table unless you identify the specific row.  The "new" pseudo table has the values of the row that was just inserted so, with setdata.seriesid = new.seriesid, you are referring to the row in setdata that you just inserted.

You are always going to look through 40,000 records to update a single one; the question is how efficiently you can do it.  In this case, I presume there is a unique index on setdata.seriesid so the optimizer will use it.  Add to that the fact that the record is still in memory in a page on the dirty page list, it won't take long to find it.

If you do NOT have a unique index in setdata.seriesid, you need to be using whatever the primary key is to go back and update the record.

There are a couple of advantages to the UPDATE/JOIN syntax I suggested.  The first is that it is going to be the fastest at run time.  The second is that it works equally well whether you insert one row or 1000, e.g. INSERT / SELECT.  It is very poor practice to create a trigger or rule that does not handle multiple record SQL operations gracefully; it always comes back to bite you in the behind.

Regards,
Bill

0
 

Author Comment

by:mock5c
Comment Utility
Obviously I'm not thinking today.  Yes, I understand what you are saying in paragraph one of your last response.  So I've decided to try the update/join syntax in the rule that you suggested.  I get this error:

ERROR:  relation "*NEW*" does not exist

I'm assuming that NEW is a temporary table that postgres creates when a rule is executed.
0
Zoho SalesIQ

Hassle-free live chat software re-imagined for business growth. 2 users, always free.

 
LVL 19

Accepted Solution

by:
grant300 earned 150 total points
Comment Utility
Let's see: Sybase, Oracle, DB2-UDB, PostgreSQL.  Kind of hard to keep them all straight.

I think I wrote the update in Oracle-ish syntax where NEW is indeed a "pseudo" table you can work from.

Somebody jump in if I am wrong (please) but I think a PostgreSQL rule implicitly fires for each row inserted or updated.  Some other databases fire triggers once per verb, and others give you a choice)

Try removing the "NEW" relation from the statement and simply use the prefix where required.  I have clipped in a new version for you.

BTW, it is important to try this with both single and multiple row insert statements.

Regards,
Bill
CREATE OR REPLACE RULE upd_setdata_on_insert AS

     ON INSERT TO setdata DO

        UPDATE setdata

    	   SET otherids = sd.otherids

    	  FROM setdata st

    	  JOIN seriesdata sd

    	    ON st.seriesid = new.seriesid

    	   AND sd.id = new.id

         WHERE new.group::text in ('groupR'::text, 'groupQ'::text)

           AND new.otherids IS NULL;

Open in new window

0
 

Author Comment

by:mock5c
Comment Utility
Still the same problem:  relation "*NEW*" does not exist

It seems like it might have something to do with the joins.  I don't know why that would be.  So if I switch the rule back to:

CREATE OR REPLACE RULE upd_setdata_on_insert AS
    ON INSERT TO setdata DO UPDATE setdata SET otherids = get_otherids(new.id)
  WHERE setdata.otherids IS NULL AND setdata.seriesid = new.seriesid AND
(setdata.group::text = 'groupR'::text OR setdata.group::text = 'groupQ'::text);

This creates the rule but I had to bring that function back into it to get other_ids from the seriesdata table.  I'm trusting that the correct record would be found in the where clauses.

I also found some documentation about NEW and OLD.  In Postgres these keywords refer to data that has just been inserted or deleted, respectively.

So now I'm looking at the possibility of using a trigger instead of a rule.  I don't really know the difference between the two except for the fact that triggers refer exclusively to the table being acted on whereas rules can act on external tables (according to the docs I just read).

But now when I look at trigger syntax in this book example:

create trigger employee_update
   before delete
   on employee
   for each row execute procedure trig_delete_check_emp();

Does that mean for every single row in the employee table, execute the trigger??  What if your tables are huge?

I'm increasing points since I've basically asked another question here.

0
 
LVL 22

Expert Comment

by:earth man2
Comment Utility
CREATE OR REPLACE FUNCTION get_otherids( int serid ) RETURNS text AS $$
begin
  return select otherids from seriesdata where seriesdata.id=serid;
end;
$$ LANGUAGE plpgsql VOLATILE;

the function is so simple it could be written as a SQL function instead of plpgsql.
Or as Bill has suggested above you could import that SQL into the rule.

CREATE OR REPLACE RULE upd_setdata_on_insert AS ON INSERT TO setdata DO
   UPDATE setdata  SET otherids = sd.otherids  FROM seriesdata sd WHERE
sd.id = NEW.id AND
NEW.group::text in ('groupR'::text, 'groupQ'::text) AND
NEW.otherids IS NULL;

One problem I see is that the select from object SERIESDATA could possibly return more than one row, so you may need an aggregate function like "stragg"  or an order clause to retrieve the correct row.  Alternatively you may need a constraint to stop the return of multiple rows.
0
 
LVL 22

Assisted Solution

by:earth man2
earth man2 earned 50 total points
Comment Utility
trigger FOR EACH ROW means act on ONLY those rows affected by the query as opposed to ONCE for the SQL statement.
0

Featured Post

Maximize Your Threat Intelligence Reporting

Reporting is one of the most important and least talked about aspects of a world-class threat intelligence program. Here’s how to do it right.

Join & Write a Comment

Best database to use for Maps is PostgreSQL. This is an open source database. Comes as a package with most Linux OS. For more info visit the following site: http://www.postgresql.org/ (http://www.postgresql.org/) This requires some add-o…
Many developers have database experience, but are new to PostgreSQL. It has some truly inspiring capabilities. I have several years' experience with Microsoft's SQL Server. When I began working with MySQL, I wanted a quick-reference to MySQL (htt…
Steps to create a PostgreSQL RDS instance in the Amazon cloud. We will cover some of the default settings and show how to connect to the instance once it is up and running.
This video gives you a great overview about bandwidth monitoring with SNMP and WMI with our network monitoring solution PRTG Network Monitor (https://www.paessler.com/prtg). If you're looking for how to monitor bandwidth using netflow or packet s…

728 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now