asked on

Creating rules in Postgres

Please help me understand this rule. I created it but I am wondering if some of the criteria in the WHERE clause is even necessary, particularly "setdata.seriesid=new.seriesid". I think I need to understand "NEW" better. It seems to be that setdata.seriesid will always equal new.seriesid, right?

Here is what I'm trying to do. I want to update a record and set SETDATA.OTHERIDS after or while a row is being inserted. I get this OTHERIDS value from the get_otherids() function, which queries a table called SERIESDATA.

I have tested everything and it seems to be correct but I've only tested on a very small dataset. I will end up using this on tables that have over 40,000 rows so I want to make sure the rule only acts on the row in question so it is very fast. The last thing I want is for a user to sit around for 10 minutes for each row they insert.

One more thing. You may wonder why not do the updating at the application level at the same time the row is initially being inserted. I would like to avoid rolling out a new version of the application with changes. I figured it would be fine to just create a rule or trigger (though I've read that a rule is better in my case) so that this change is transparent to the user. I'm not sure what best practice would be for using rules and triggers.

SERIESDATA TABLE:
   id    | otherids |  group  
---------+-----------+----------
 2256798 | 888       | groupQ
 1234567 | 789       | groupR
 7654321 | 562       | 
 2943747 | 511       | groupR
 2944517 | 123       | groupR
 
insert into setdata(seriesid,name,group) values(1234567,'John','groupR');
insert into setdata(seriesid,name,group) values(7654321,'Sue','groupR');
 
SETDATA TABLE:
 seriesid |  name  | otherids |  group  
-----------+--------+-----------+----------
   1234567 | John   | 789       | groupR
   7654321 | Sue    | 562       | groupQ
 
Here is the rule:
CREATE OR REPLACE RULE upd_setdata_on_insert AS
    ON INSERT TO setdata DO UPDATE setdata SET otherids = get_otherids(new.id)
  WHERE (setdata.group::text = 'groupR'::text OR setdata.group::text = 'groupQ'::text) AND setdata.seriesid = new.seriesid AND setdata.otherids IS NULL;
 
Here it the function:
CREATE OR REPLACE FUNCTION get_otherids(integer)
  RETURNS character varying AS '
declare
  serid alias for $1;
  other varchar(10);
begin
  select into other otherids from seriesdata where seriesdata.id=serid;
  return other;
end;
' LANGUAGE 'plpgsql' VOLATILE;

Open in new window

grant300

Normally you would use a trigger for this kind of thing but there is nothing wrong with a rule. PostgreSQL seems to run rules faster than triggers in some cases (e.g. for partitioning) but we are not talking about staggering differences either way.

If you are worried about performance, I would dump the get_ortherids() function and incorporate the code in the trigger. You just don't need the overhead of an additional function call even if it does make the code a little bit easier to read/maintain. In fact, you can rewrite the view with a correlated UPDATE query and probably get the best performance and do it in a single SQL statement.

The "new" pseudo table contains the values of the after image; what the INSERTed or UPDATEd row(s) will look like after the verb. In your case, you are updating the target row(s) based on values in the "new" table row(s). You are essentially joining the "new" pseudo table to the base table in your UPDATE statement. Your WHERE clause looks O.K. but it might not perform quite as well as it could. In order for the optimizer to do the least amount of work in the join (assuming multi-row INSERTs), I would test the values of new.group rather than setdata.group and new.ortherids rather than setdata.otherids. Because "new" is the driving table, the more rows you can eliminate from it (logically), the fewer rows remain to be joined to setdata to be updated.

I would experiment with the single update statement below to see what the optimizer does. You might insert a bunch of rows with only one or two with NULL otherids and see what the optimizer says. The alternative is to move the two SARGS (WHERE clauses) up into the ON clause for the JOIN between new and setdata.

Regards,
Bill

CREATE OR REPLACE RULE upd_setdata_on_insert AS
     ON INSERT TO setdata DO
        UPDATE setdata
    	   SET otherids = sd.otherids
    	  FROM new
    	  JOIN setdata st
    	    ON st.seriesid = new.seriesid
    	  JOIN seriesdata sd
    	    ON sd.id = new.id
         WHERE new.group::text in ('groupR'::text, 'groupQ'::text)
           AND new.otherids IS NULL;

Open in new window

mock5c

ASKER

What's confusing to me is the statement "ON INSERT TO" leads me to believe a record has already been inserted and now we will perform an update ("DO UPDATE") (so the rule actually fires after the insert has occurred). So since the values have already been inserted, wouldn't that mean setdata.seriesid IS new.seriesid, therefore that statement is unnecessary in the update query? I need to rely on setdata.seriesid already having a value (from the insert) and then when I update (through rule or trigger), I look at my newly inserted record's seriesid and find the corresponding otherid from the seriesdata table (I use the get_otherids() funct) and then update the newly inserted record otherids (initially null) to that otherids value I just got.

My rule (or trigger) should only update a single record which is in the setdata table. I need the criteria to be very specific because I want to update one specific record as well as have it happen very quickly. I don't want the query to look through 40,000 records just to update a single record.

grant300

The thing is, you need to know WHICH setdata record to update after it has been inserted. When you refer to the setdata table in a rule or trigger, you are referring to the entire table unless you identify the specific row. The "new" pseudo table has the values of the row that was just inserted so, with setdata.seriesid = new.seriesid, you are referring to the row in setdata that you just inserted.

You are always going to look through 40,000 records to update a single one; the question is how efficiently you can do it. In this case, I presume there is a unique index on setdata.seriesid so the optimizer will use it. Add to that the fact that the record is still in memory in a page on the dirty page list, it won't take long to find it.

If you do NOT have a unique index in setdata.seriesid, you need to be using whatever the primary key is to go back and update the record.

There are a couple of advantages to the UPDATE/JOIN syntax I suggested. The first is that it is going to be the fastest at run time. The second is that it works equally well whether you insert one row or 1000, e.g. INSERT / SELECT. It is very poor practice to create a trigger or rule that does not handle multiple record SQL operations gracefully; it always comes back to bite you in the behind.

Regards,
Bill

mock5c

ASKER

Obviously I'm not thinking today. Yes, I understand what you are saying in paragraph one of your last response. So I've decided to try the update/join syntax in the rule that you suggested. I get this error:

ERROR: relation "*NEW*" does not exist

I'm assuming that NEW is a temporary table that postgres creates when a rule is executed.

ASKER CERTIFIED SOLUTION

grant300

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

mock5c

ASKER

Still the same problem: relation "*NEW*" does not exist

It seems like it might have something to do with the joins. I don't know why that would be. So if I switch the rule back to:

CREATE OR REPLACE RULE upd_setdata_on_insert AS
ON INSERT TO setdata DO UPDATE setdata SET otherids = get_otherids(new.id)
WHERE setdata.otherids IS NULL AND setdata.seriesid = new.seriesid AND
(setdata.group::text = 'groupR'::text OR setdata.group::text = 'groupQ'::text);

This creates the rule but I had to bring that function back into it to get other_ids from the seriesdata table. I'm trusting that the correct record would be found in the where clauses.

I also found some documentation about NEW and OLD. In Postgres these keywords refer to data that has just been inserted or deleted, respectively.

So now I'm looking at the possibility of using a trigger instead of a rule. I don't really know the difference between the two except for the fact that triggers refer exclusively to the table being acted on whereas rules can act on external tables (according to the docs I just read).

But now when I look at trigger syntax in this book example:

create trigger employee_update
before delete
on employee
for each row execute procedure trig_delete_check_emp();

Does that mean for every single row in the employee table, execute the trigger?? What if your tables are huge?

I'm increasing points since I've basically asked another question here.

earth man2

CREATE OR REPLACE FUNCTION get_otherids( int serid ) RETURNS text AS $$
begin
return select otherids from seriesdata where seriesdata.id=serid;
end;
$$ LANGUAGE plpgsql VOLATILE;

the function is so simple it could be written as a SQL function instead of plpgsql.
Or as Bill has suggested above you could import that SQL into the rule.

CREATE OR REPLACE RULE upd_setdata_on_insert AS ON INSERT TO setdata DO
UPDATE setdata SET otherids = sd.otherids FROM seriesdata sd WHERE
sd.id = NEW.id AND
NEW.group::text in ('groupR'::text, 'groupQ'::text) AND
NEW.otherids IS NULL;

One problem I see is that the select from object SERIESDATA could possibly return more than one row, so you may need an aggregate function like "stragg" or an order clause to retrieve the correct row. Alternatively you may need a constraint to stop the return of multiple rows.

SOLUTION

earth man2

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial