KarlisB
asked on
optimize query
Greetings!
I have the following query (posted below).

As you can see, I first join an "item" id table with an auto-generated table of dates. As a result I get a table where every item_id is assigned roughly the last 365 dates. After that I join it to a massive table which contains daily statistics for the items. In the end the query returns the item_ids and dates where data is missing, i.e. not present in the statistics table.

I would like to know: is there anything I can do to optimize the query? Or is there another way to get the ids of items that are missing on certain dates?

The item table has about 12,000 records (after filtering); the statistics table has about 50,000,000 records. The first call of the query takes 160 seconds; after caching, 20 seconds. I am using a multicolumn index over all three fields in the WHERE clause on the statistics table.
query plan:
"Aggregate (cost=375007.39..375007.40 rows=1 width=0)"
" -> Nested Loop Anti Join (cost=0.01..375007.39 rows=1 width=0)"
" -> Nested Loop (cost=0.00..3130.29 rows=30000 width=8)"
" -> Seq Scan on preces (cost=0.00..2455.29 rows=30 width=4)"
" Filter: (sort_a AND (NOT stop_prece) AND (id = id))"
" -> Function Scan on generate_series a (cost=0.00..12.50 rows=1000 width=4)"
" -> Index Scan using realizacija_pkey on realizacija (cost=0.01..12.90 rows=1 width=8)"
" Index Cond: ((realizacija.veikala_id = 1) AND (preces.id = realizacija.prece_id) AND (realizacija.datums = (('now'::text)::date - a.a)))"
database type: Postgres
select count(*)
from (
    select *
    from (
        -- last ~368 days, starting the day before yesterday
        select current_date - a as dat
        from generate_series(2, 370) as a
    ) q1
    cross join (
        -- the original "on q2.id = q2.id" condition is always true,
        -- so this is effectively a cross join of items x dates
        select id, artikuls
        from preces
        where sort_a = true and stop_prece = false
    ) q2
    left join (
        select prece_id, datums
        from realizacija
        where veikala_id = 1
    ) q3 on q3.datums = q1.dat and q3.prece_id = q2.id
) as q
where datums is null
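For comparison, the same "missing rows" count can also be expressed with NOT EXISTS instead of a LEFT JOIN ... IS NULL. This is only a sketch using the table and column names from the thread; whether the planner produces a better plan for it would need to be checked with EXPLAIN:

```sql
-- sketch of an anti-join via NOT EXISTS (same semantics as the query above)
SELECT count(*)
FROM preces p
CROSS JOIN generate_series(2, 370) AS a
WHERE p.sort_a AND NOT p.stop_prece
  AND NOT EXISTS (
      SELECT 1
      FROM realizacija r
      WHERE r.veikala_id = 1
        AND r.prece_id  = p.id
        AND r.datums    = current_date - a
  );
```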
SOLUTION
I am far from being a PostgreSQL specialist; I only posted because nobody else had so far.

Do you get the same execution plan?
Do you have an index on realizacija covering both fields datums and prece_id (one single index with both fields)?
If not: create one.
If yes: in which order are the two fields in the index, compared to the selectivity of the values in those fields?
To explain: if a single value of datums alone makes fewer rows be returned from that table than a single value of prece_id, then datums should come first in the index (for this query); otherwise prece_id should come first, followed by datums.
ASKER
The realizacija table has a multicolumn index in the following order:
veikala_id integer (about 20 unique values)
prece_id integer (about 8,000-60,000 unique values)
datums date (about 400 unique dates)
Yes, three fields in one index; this way SELECT queries are fastest.
Can you add another index with only:
prece_id integer (about 8,000-60,000 unique values)
datums date (about 400 unique dates)
and possibly add veikala_id at the end? This query does not really use veikala_id as a selective criterion.
Also, please compare the explain plans of your syntax and mine, for the two queries and the two additional index scenarios.
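A minimal sketch of the suggested index, using the column names from this thread; the index name here is made up:

```sql
-- hypothetical index name; columns ordered as suggested above,
-- with veikala_id appended at the end
CREATE INDEX realizacija_prece_datums_idx
    ON realizacija (prece_id, datums, veikala_id);
```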
ASKER
With the field order, it looks like the query runs up to 3 seconds faster.
"because this query does not use veikala_id as selective criteria, really."
Can you explain? I always thought that fields with fewer unique values should come first as the leading fields of an index.
ASKER
Doh, forgot to post the order ...
prece_id integer (about 8,000-60,000 unique values)
datums date (about 400 unique dates)
veikala_id integer (about 20 unique values)
ASKER
Just noticed something: after re-indexing the index I used at first, the query planner is using it again.
The speed is the same as yours ...
It looks like the field order does not matter ...
Or I might be mistaken about something...
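For reference, the rebuild step mentioned above can be done with REINDEX; the index name here is taken from the posted execution plan, and it is an assumption that this was the index rebuilt:

```sql
-- rebuild the original three-column index (name taken from the posted plan)
REINDEX INDEX realizacija_pkey;
```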
SOLUTION
ASKER
realizacija is.
ASKER
It might be that the query speed is limited by disk read speed, since the table itself is 2-3 GB and the index takes up 700-1000 MB, although I have a RAID 5 set up with three HDDs.
> the first call of the query takes 160 seconds; after caching, 20 seconds
I fear you won't get any better, unless you plan to run this as a scheduled job with the results stored in a table, so that the user(s) just query that table ...
Or are the parameters for generate_series(2,370) "dynamic", i.e. user input?
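A minimal sketch of the scheduled-job idea, assuming the query and table names from this thread; the result table name is made up, and the job would be re-run once per day:

```sql
-- hypothetical result table, refreshed by a daily job
CREATE TABLE missing_stats AS
SELECT q2.id AS prece_id, q1.dat AS datums
FROM (SELECT current_date - a AS dat FROM generate_series(2, 370) AS a) q1
CROSS JOIN (SELECT id FROM preces WHERE sort_a AND NOT stop_prece) q2
LEFT JOIN realizacija r
       ON r.veikala_id = 1 AND r.prece_id = q2.id AND r.datums = q1.dat
WHERE r.datums IS NULL;
```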
ASKER
Sad, but it's true:
current_date - generate_series(2,370)
is dynamically generated; there is no other way...
ASKER
select current_date - a as dat from generate_series(2,370) as a
generates a temporary table with all dates in a 368-day range ...
>select current_date - a as dat from generate_series(2,370) as a
You could store that result in a table and update it once per day...
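A sketch of that suggestion, with an assumed table name; the refresh would run once per day:

```sql
-- hypothetical helper table holding the 368-day date series
CREATE TABLE date_series (dat date PRIMARY KEY);

-- daily refresh
TRUNCATE date_series;
INSERT INTO date_series
SELECT current_date - a FROM generate_series(2, 370) AS a;
```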
ASKER CERTIFIED SOLUTION