optimize query

Greetings!

I have the following query (posted below):
As you can see, I first joined an "item" id table with an auto-generated table of dates.
As a result I get a table where every item_id has roughly the last 365 dates assigned to it.
After that I join it against a massive table which contains per-day statistics for the items. In the end the query returns the item_id's and dates where data is missing, i.e. not present in the statistics table.

I would like to know: is there anything I can do to optimize the query?
Or is there another way to get the id's of items with missing data on certain dates?

The item table has about 12 000 records (after filtering);
the statistics table has about 50 000 000 records.

The first query call takes 160 sec; after caching, 20 sec.
I am using a multicolumn index covering all 3 fields in the WHERE clause on the statistics table.

query plan:
"Aggregate  (cost=375007.39..375007.40 rows=1 width=0)"
"  ->  Nested Loop Anti Join  (cost=0.01..375007.39 rows=1 width=0)"
"        ->  Nested Loop  (cost=0.00..3130.29 rows=30000 width=8)"
"              ->  Seq Scan on preces  (cost=0.00..2455.29 rows=30 width=4)"
"                    Filter: (sort_a AND (NOT stop_prece) AND (id = id))"
"              ->  Function Scan on generate_series a  (cost=0.00..12.50 rows=1000 width=4)"
"        ->  Index Scan using realizacija_pkey on realizacija  (cost=0.01..12.90 rows=1 width=8)"
"              Index Cond: ((realizacija.veikala_id = 1) AND (preces.id = realizacija.prece_id) AND (realizacija.datums = (('now'::text)::date - a.a)))"

database type: Postgres

select count(*)
from (
    select *
    from (select current_date - a as dat from generate_series(2,370) as a) Q1
    join (select id, artikuls from preces where sort_a=true and stop_prece=false) Q2
        on q2.id = q2.id
    left join (
        select prece_id, datums
        from realizacija
        where veikala_id = 1
    ) Q3
        on q3.datums = q1.dat and q2.id = q3.prece_id
) as Q1
where datums is null

Guy Hengel [angelIII / a3], Billing Engineer, commented:
What about this:
select count(*) 
from ( select current_date - a as dat from generate_series(2,370) as a) Q1
join ( select id, artikuls  from preces where sort_a=true and stop_prece=false) Q2 
  on q2.id=q2.id
where not exists(
              select null
              from realizacija r
              where r.veikala_id=1
               and r.datums=q1.dat 
               and q2.id=r.prece_id
    ) 

KarlisB (Author) commented:
Speed is almost the same; your example is on average 20 ms slower.
Guy Hengel [angelIII / a3], Billing Engineer, commented:
I am far from being a PostgreSQL specialist; I only posted because nobody else had so far.
Do you get the same execution plan?

Do you have an index on realizacija covering both fields, datums and prece_id (one single index with both fields!)?
If not: create one.
If yes: in which order are the 2 fields in the index, compared to the selectivity of the values in those fields?
  => To explain: if a single value of datums alone filters the table down to fewer rows than a single value of prece_id does, that field should come first in the index (for this query); otherwise prece_id should come first, followed by datums.
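For illustration, a sketch of the two orderings being compared (the index names here are made up):

-- hypothetical names; keep whichever ordering lets its leading column filter out more rows
create index realizacija_datums_prece_idx on realizacija (datums, prece_id);
create index realizacija_prece_datums_idx on realizacija (prece_id, datums);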
KarlisB (Author) commented:
The realizacija table has a multicolumn index with the columns in the following order:

veikala_id integer (contains about 20 unique values)
prece_id integer (contains about 8000-60000 unique values)
datums date (contains 400 unique dates)

Yes, 3 fields in one index; this way the select queries are fastest.
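For reference, a sketch of what that existing index definition might look like (the name is assumed; if it is actually the table's primary key, as the plan's realizacija_pkey suggests, it was created through the PK constraint instead):

-- assumed name and definition for the existing 3-column index
create index realizacija_veikala_prece_datums_idx
    on realizacija (veikala_id, prece_id, datums);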
Guy Hengel [angelIII / a3], Billing Engineer, commented:
Can you add another index, with only:
prece_id integer (contains about 8000-60000 unique values)
datums date (contains 400 unique dates)

and, optionally, add veikala_id at the end.

Because this query does not really use veikala_id as a selective criterion.
Also, please compare the explain plans of your syntax and mine, for the 2 queries and the 2 additional index scenarios.
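As a sketch, that suggested additional index would be something like this (the name is made up):

-- hypothetical name for the suggested index
create index realizacija_prece_datums_veikala_idx
    on realizacija (prece_id, datums, veikala_id);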
KarlisB (Author) commented:
With that field order it looks like the query runs up to 3 seconds faster.

"Because this query does not really use veikala_id as a selective criterion."
Can you explain?

I always thought that the fields with fewer unique values should come first as the leading fields of an index.
KarlisB (Author) commented:
Doh, forgot to post the order...

prece_id integer (contains about 8000-60000 unique values)
datums date (contains 400 unique dates)
veikala_id integer (contains about 20 unique values)
KarlisB (Author) commented:
Just noticed something: after re-indexing the index I used at first, the query planner is using it again.
The speed is the same as yours...
Looks like the field order does not matter...
Or I might be mistaken about something...
Guy Hengel [angelIII / a3], Billing Engineer, commented:
>I always thought that the fields with fewer unique values should come first as the leading fields of an index.
No. Fields like prece_id that have many different values should be first, as that limits the search more effectively.

However, as it does not seem to make any difference for you, it apparently won't matter here.

>the statistics table has about 50 000 000 records
Which one is that, actually?
KarlisB (Author) commented:
realizacija is.
KarlisB (Author) commented:
It might be that the query speed is limited by disk read speed? The table itself is 2-3 GB,
and the index takes up 700-1000 MB,
although I have set up RAID 5 with 3 HDDs.
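For reference, the sizes being discussed can be checked with the standard size functions, roughly like this:

-- table size with and without its indexes
select pg_size_pretty(pg_total_relation_size('realizacija')) as with_indexes,
       pg_size_pretty(pg_relation_size('realizacija'))       as table_only;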
Guy Hengel [angelIII / a3], Billing Engineer, commented:
>The first query call takes 160 sec; after caching, 20 sec.
I fear you won't get any better, unless you plan to run this as a scheduled job with the results stored in a table, so the user(s) just query that table...
Or are the parameters for generate_series(2,370) "dynamic", i.e. user input?
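A minimal sketch of that precompute idea, assuming a helper table and the NOT EXISTS form posted above (the table name is made up; the refresh would be scheduled with cron/pgAgent or similar):

-- one-time setup
create table missing_stats_cache (prece_id integer, datums date);

-- refresh once per day
truncate missing_stats_cache;
insert into missing_stats_cache (prece_id, datums)
select q2.id, q1.dat
from (select current_date - a as dat from generate_series(2,370) as a) q1
-- cross join pairs every item with every date (same effect as the always-true join condition above)
cross join (select id from preces where sort_a=true and stop_prece=false) q2
where not exists (
    select 1
    from realizacija r
    where r.veikala_id = 1
      and r.datums = q1.dat
      and r.prece_id = q2.id
);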
KarlisB (Author) commented:
Sad, but it's true:
  current_date - generate_series(2,370)
is dynamically generated; there is no other way...
KarlisB (Author) commented:
select current_date - a as dat from generate_series(2,370) as a

generates a temporary table with all dates in a 368-day range...
Guy Hengel [angelIII / a3], Billing Engineer, commented:
>select current_date - a as dat from generate_series(2,370) as a
You could store that result in a table and update it once per day...
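A minimal sketch of that idea (the table name is made up):

-- one-time setup
create table date_range (dat date);

-- refresh once per day
truncate date_range;
insert into date_range
select current_date - a from generate_series(2,370) as a;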
KarlisB (Author) commented:
Do the maths, my Angel.
In simple terms, selecting the dates from a table (which has 370 records) takes 30-65 ms; generating them takes 30 ms.
So why make another table if there is no gain?
I suppose I can't do anything more to gain speed, except table partitioning, which I won't resort to by any means.

Ok, thanks for trying.