I have a very basic question. When you have the following query in Oracle10gr2, do we need to have index on table a for rid and bid to perform better? I know when you have a where clause based on rid and bid the index helps, but your query joins those tables to get c2 and c3 in the select clause. ( these bid and rid are foreign keys on table a and table a contains around 100 million rows)
select a.c1,r.c2,b.c3 from a left join r on a.rid = r.rid left join b on a.bid = b.bid where a.cid between :v1 and :v2
It depends on: - How many rows exist in tables r and b - How selective the data is in these tables - How many rows will be retrieved from table a i.e. identified by the where condition
It is common practise to create indexes on foreigh key columns as this usual has a positive impact on performance when deleting data from parent tables i.e. if you want to delete a row from table r, then assuming you have a foreign key check constraint on table a, a lookup will be performed on a to ensure no data exists, or to delete it if the cascade delete options was used. There are also locking implications for using indexes on foreign keys.
....should have added of course that there there may also be a down side to having indexes on tables with many rows - the index maintenance overhead due to inserting all these rows!
THis is dataware house environment. I might have 15 to 16 dimension keys on the fact table a. When I am not using a where clause and using the above query, do you think I still need to have all foreign key indexes, you know the impact on the load and space due to these indexes.
Actually, no. It is unnecessary to index a.rid or a.bid.
You are doing 2 left joins on the table a, which means that the query will run through ALL rows in a which match the where clause. It will then take each a.rid value and try to match it with rows from table r, and similarly will match a.bid with rows from table b.
It is essential that r.rid and b.bid are indexed. Presumably these are primary keys so will be indexed.
BTW if the joins were inner joins (so that only matching rows of table a are returned) it is possible that indexing a.rid and a.bid might help.
Table a will have matching rows of rid/bid but it can also have null values, that is why we had to use left join. If I join thru inner join are you sure that index will help to perform the query better though I don't use these rid and bid in my where clause.
What I mean is if I perform the following query, any way it needs to bring all the records from table a for a given cid because there is no where condition on bid and rid. So left join or inner join should not make much difference. But if you think indexes with inner join performance will be better, then I can think of adding a bid and rid values as 0 and add 0 to the rows in table a where these bid and rid (foreign keys are nulls). This is a huge change to our system. Before I make this decission it is very important for me be clear on this. select a.c1,r.c2,b.c3 from a join r on a.rid = r.rid join b on a.bid = b.bid where a.cid between :v1 and :v2
If you want to return all rows of table a which match the where condition, you should use a left join. Then, as I said, you won't need indexes on a.rid and a.bid.
However, if you only want those rows of table a which match the where condition AND which have corresponding rows in tables r and b, then you should use an inner join. Note that even though the conditions a.rid = r.rid and a.bid = b.bid are not part of the where clause, for an inner join they are treated as if they were part of the where clause.
You should use whichever of these gives you the data you are looking for.
If you want to achieve the effect of the left join by "cheating" - adding dummy values to the other tables and using an inner join - I would say, don't. I can't imagine it would run any faster and you would have added spurious data which might cause problems somewhere else.
For any given SQL query, the database engine tries to define an optimum execution plan - which table it looks at first, how it matches rows together and so on. If the execution plan was to start with table b, say, and for each row in b look for a row in table a and then check whether the where clause was satisfied, then an index on a.bid would improve the performance. However, the database would choose this execution plan only (a) if it was an inner join (b) if table b was much shorter than table a and (c) if the where clause was not very selective (in other words if a high percentage of rows in table a match the where condition).
In the case you describe - you do want all the rows in table matching the where condition - just use the left join and forget anything else.