Link to home
Start Free TrialLog in
Avatar of KarlisB
KarlisBFlag for Latvia

asked on

optimize query

Greetings!

I have a following query:
as you see, i have joined atfirst one "item" id table with a auto generated table of dates.
as a result i got table where every item_id has assigned about 365 last dates.
After that i join it to a massive table which contains every day statistics for items. in the end query returns item_id's and dates where data is missing and is not present in statistics table.

I wish to know, is there anything i can do to optimize the query?
or is there another way how to get id's of missing items at certain dates?

item table has about 12 000 records;(after filtering)
statistic table has about 50 000 000 records

first time query call takes 160 sec, after cashing 20 secs ;
am using multicolumn index acording to all 3 fields in where clause on statistics table;

query plan:
"Aggregate  (cost=375007.39..375007.40 rows=1 width=0)"
"  ->  Nested Loop Anti Join  (cost=0.01..375007.39 rows=1 width=0)"
"        ->  Nested Loop  (cost=0.00..3130.29 rows=30000 width=8)"
"              ->  Seq Scan on preces  (cost=0.00..2455.29 rows=30 width=4)"
"                    Filter: (sort_a AND (NOT stop_prece) AND (id = id))"
"              ->  Function Scan on generate_series a  (cost=0.00..12.50 rows=1000 width=4)"
"        ->  Index Scan using realizacija_pkey on realizacija  (cost=0.01..12.90 rows=1 width=8)"
"              Index Cond: ((realizacija.veikala_id = 1) AND (preces.id = realizacija.prece_id) AND (realizacija.datums = (('now'::text)::date - a.a)))"

database type: Postgres

select count(*) 
from(
     select * 
     from(
               select current_date - a as dat from generate_series(2,370) as a) Q1
          join(
               select id, artikuls  from preces where sort_a=true and stop_prece=false) Q2 
          on q2.id=q2.id
          
     left join(
              select prece_id,datums
              from realizacija 
              where veikala_id=1) Q3 
     on q3.datums=q1.dat and q2.id=q3.prece_id) as Q1 
where datums is null

Open in new window

SOLUTION
Avatar of Guy Hengel [angelIII / a3]
Guy Hengel [angelIII / a3]
Flag of Luxembourg image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of KarlisB

ASKER

speed is almost the same, your example is average 20ms slower.
I am far from being a PostGreSQL specialist, I only posted because nobody else had so far.
do you get the same execution plan?

you have a index on realizacija covering both fields dat and prece_id (1 single index with both fields!)?
if not: create one
if yes: in which order are the 2 fields in the index compared to the selectivity of the values in the fields?
  => to explain: if a single value of datums alone makes less rows to be returned from that table then prece_id, that field should be first in the index (for this query), otherwise it should be prece_id being first, followed by datums

Avatar of KarlisB

ASKER

realizacija table has an multicolumn index in following order:

veikala_id integer (contains about 20 unique values)
prece_id integer (contains about 8000-60000 unique values)
datums date (contains 400 unique dates)

yes, 3 fields in one index. this way select query's are fastest
can you add another index, with only:
prece_id integer (contains about 8000-60000 unique values)
datums date (contains 400 unique dates)

and eventually, add veikala_id at the end.

because this query does not use veikala_id as selective criteria, really.
also, compare the explain plans of your and my syntax for the 2 queries and 2 additional index scenarios, please.
Avatar of KarlisB

ASKER

with field order
it looks like query runs up to 3 secs faster.

"because this query does not use veikala_id as selective criteria, really."
can you explain?

i always thought that it should be fields with less unique values to be out first as primary fields for index.
Avatar of KarlisB

ASKER

doh,  forgot to post order ...


prece_id integer (contains about 8000-60000 unique values)
datums date (contains 400 unique dates)
veikala_id integer (contains about 20 unique values)
Avatar of KarlisB

ASKER

just noticed something, after re-indexing index i used at first, query-planer is using it again.
speed is same as your's ...
looks like field order does not mater ... .
or i might be mistaking something...
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of KarlisB

ASKER

realizacija is.
Avatar of KarlisB

ASKER

it might be that query speed is limited to disk read speed? since table itself is 2-3 GB heavy,
index itself takes up 700 -1000 MB.
 although am having  set up raid 5 with 3 hdd's.
>first time query call takes 160 sec, after cashing 20 secs ;
I fear you won't get any better, unless you plan to run this as a scheduled job, results stored in a table, so the user(s) just query that table ...
or are the parameters for generate_series(2,370)  "dynamic, aka user-input" ?
Avatar of KarlisB

ASKER

sad, but it's true,
  current_date - generate_series(2,370)
is dynamically generated, there is no other way...
Avatar of KarlisB

ASKER

select current_date - a as dat from generate_series(2,370) as a

generates a temporary table with all dates in 368 day range ...
>select current_date - a as dat from generate_series(2,370) as a
you could store that result in a table, and update it once, per day...
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial