MYSQL Slow Nested Set Categories to Products Query

stilliard
stilliard used Ask the Experts™
on
Hi all,
Wondered if anyone could help me speed up this sql query im working on.
The basis of the query is to list categories from a table who have products relating to them, or categories relating to them who at some nested sub level will eventually relate to products.
To do this i was using a parent-child sort of adjacency list model, but i have now added support for the nested set model, using left right and level columns, so i can easily select all categories and sub categories relating to a category and then check if they are the parent category of any products.  
The code im using i have attempted to optimize but im very new to using this "nested list model" and so im not 100% im using it correctly, or maybe not efficiently.
The query has to run on arround 1,000 categories, and arround 30,000 products. But its taking minutes to load.
Any Ideas to change the sql, or go about i differently are welcome!

  SELECT parent.id, parent.category_name, parent.relatedid, parent.sort_order,
  IF((
    SELECT node.id
    FROM categories node
    WHERE node.lft BETWEEN parent.lft AND parent.rgt
    AND EXISTS (
        SELECT null FROM products p
        WHERE p.category = node.id
    )
    LIMIT 0, 1
  ) > 0, 1, 0) AS productsFound
  FROM categories parent
  HAVING productsFound > 0
  ORDER BY ABS(parent.sort_order), parent.sort_order, parent.category_name, parent.id

Open in new window

Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Expert of the Quarter 2010
Expert of the Year 2010

Commented:
Try this?

  SELECT parent.id, parent.category_name, parent.relatedid, parent.sort_order,
    case when exists (
      select 1 from categories node inner join products p on p.category = node.id
      WHERE node.lft BETWEEN parent.lft AND parent.rgt) then 1 else 0 end AS productsFound
  FROM categories parent
  HAVING productsFound > 0
  ORDER BY ABS(parent.sort_order), parent.sort_order, parent.category_name, parent.id

Author

Commented:
I like the idea of using join istead of exist cyberkiwi, unfortunatly the new query is a little slower.
The query i posted works in 0.07 to 0.08 seconds on a much smaller test section of data.
While your new query unfortunatly takes between 0.10 and 0.12 seconds.

Any other ideas?
Expert of the Quarter 2010
Expert of the Year 2010

Commented:
I could be wrong, but I wouldn't take the timing from a small dataset as an indication for performance on a larger one.
If anything the query parsing and optimising takes a huge chunk of time on a small dataset.
Top Expert 2011
Commented:
which version of mysql ?
SELECT distinct parent.id, parent.category_name, parent.relatedid, parent.sort_order,
    1 AS productsFound
  FROM categories parent
  inner join (select node.lft 
                from categories node 
               inner join products p on p.category = node.id
               group by node.lft) as node
   on node.lft BETWEEN parent.lft AND parent.rgt
  
  ORDER BY ABS(parent.sort_order), parent.sort_order, parent.category_name, parent.id

Open in new window

Author

Commented:
@cyberkiwi :
You're right it was not a fair sample, my apologies, so i tested your query and unfortunately it took 1.2 minutes to load with all my products.

However...

@Lowfatspread :
BRILLIANT !
Your query worked like a dream, 0.004 seconds, awesome!

Cheers for this, wish i could award much more than 500 points for this, been working on it for ages!

Thanks both of you for your help on this, gotta give it to Lowfatspread though.

Author

Commented:
YES
Top Expert 2011

Commented:
i was wondering wether the distinct was actually necessary....
only you know how your data is structured and whether multiple rows could arise now or in the future...

Author

Commented:
The distinct was not really neccessary, but as you just explained, i can see why you put it in there.
The new query works very fast, but i forgot something while righting the original query.
On the products table, we have a category field for the category id, but we also have a category_2, category_3, category_4, category_5.
So for this im just showing the top level categories for example, and im using a union statement to join the roducts table 5 times selecting the category id from the products table into the same field 5 times.

FAST QUERY, but only one category:
[code]
  SELECT distinct parent.id, parent.category_name, parent.relatedid, parent.sort_order,
  IF((
    SELECT 1
    FROM categories sub
    WHERE sub.status != 'deleted'
    AND sub.status != 'draft'
    AND sub.relatedid = parent.id
    LIMIT 0, 1
  ) > 0, 1, 0) AS subsFound
  FROM categories parent
  INNER JOIN (
    SELECT node.lft
    FROM categories node
    INNER JOIN products p ON p.category = node.id
    WHERE node.status != 'deleted'
    AND node.status != 'draft'
    AND p.status != 'deleted'
    AND p.status != 'draft'
    GROUP BY node.lft
  ) AS node
  ON node.lft BETWEEN parent.lft AND parent.rgt
  WHERE parent.relatedid = '0'
  ORDER BY ABS(parent.sort_order), parent.sort_order, parent.category_name, parent.id
[/code]

New SLOW SQL, but 5 related categorys to each product :
[code]
  SELECT distinct parent.id, parent.category_name, parent.relatedid, parent.sort_order,
  IF((
    SELECT 1
    FROM categories sub
    WHERE sub.status != 'deleted'
    AND sub.status != 'draft'
    AND sub.relatedid = parent.id
    LIMIT 0, 1
  ) > 0, 1, 0) AS subsFound
  FROM categories parent
  INNER JOIN (
    SELECT node.lft
    FROM categories node
    INNER JOIN (
          SELECT category AS cat_id, status, id FROM products
        UNION DISTINCT
          SELECT category_2 AS cat_id, status, id FROM products
        UNION DISTINCT
          SELECT category_3 AS cat_id, status, id FROM products
        UNION DISTINCT
          SELECT category_4 AS cat_id, status, id FROM products
        UNION DISTINCT
          SELECT category_5 AS cat_id, status, id FROM products
    ) p ON p.cat_id = node.id
    WHERE node.status != 'deleted'
    AND node.status != 'draft'
    AND p.status != 'deleted'
    AND p.status != 'draft'
    GROUP BY node.lft
  ) AS node
  ON node.lft BETWEEN parent.lft AND parent.rgt
  WHERE parent.relatedid = '0'
  ORDER BY ABS(parent.sort_order), parent.sort_order, parent.category_name, parent.id
[/code]

Any speed tips, let me know cheers, although what you have helped me with will work great for now.
Top Expert 2011

Commented:
have you considered these?

  SELECT distinct parent.id, parent.category_name, '0' as relatedid, parent.sort_order, 
  IF((
    SELECT 1
    FROM categories sub
    WHERE sub.status != 'deleted' 
    AND sub.status != 'draft'
    AND sub.relatedid = parent.id
    LIMIT 0, 1
  ) > 0, 1, 0) AS subsFound 
  FROM categories parent
  INNER JOIN (
    SELECT node.lft 
    FROM categories node 
    INNER JOIN (select category from
               (select case when status in ('deleted','draft') then nullif(1,1) 
                            else case n when 1 then category 
                                        when 2 then category_2
                                        when 3 then category_3
                                        when 4 then category_4
                                        else category_5
                                        end
                             end as category
                  from products
                      ,(select 1 as n union all select 2 union all select 3
                        union all select 4 union all select 5) as x
               ) as Y 
                Where category is not null
                group by category
              ) as p
      ON p.category = node.id
    WHERE node.status != 'deleted' 
    AND node.status != 'draft'  
    GROUP BY node.lft
  ) AS node
  ON node.lft BETWEEN parent.lft AND parent.rgt 
  WHERE parent.relatedid = '0'
  ORDER BY ABS(parent.sort_order), parent.sort_order, parent.category_name, parent.id 

OR

SELECT distinct parent.id, parent.category_name, '0' as relatedid, parent.sort_order, 
  coalesce(case when sub.num > 1 then 1 else 0 end,0)  
   AS subsFound 
  FROM categories parent
  left outer join (
       select relatedid,count(nullif(NULLIF(status,'deleted'),'draft')) as num
         from categories
        group by relatedid
          ) as sub
     on sub.relatedid=parent.id

  INNER JOIN (
    SELECT node.lft 
    FROM categories node 
    INNER JOIN (select category from
               (select case when status in ('deleted','draft') then nullif(1,1) 
                            else case n when 1 then category 
                                        when 2 then category_2
                                        when 3 then category_3
                                        when 4 then category_4
                                        else category_5
                                        end
                             end as category
                  from products
                      ,(select 1 as n union all select 2 union all select 3
                        union all select 4 union all select 5) as x
               ) as Y 
                Where category is not null
                group by category
              ) as p
      ON p.category = node.id
    WHERE node.status != 'deleted' 
    AND node.status != 'draft'  
    GROUP BY node.lft
  ) AS node
  ON node.lft BETWEEN parent.lft AND parent.rgt 
  WHERE parent.relatedid = '0'
  ORDER BY ABS(parent.sort_order), parent.sort_order, parent.category_name, parent.id 

Open in new window

Author

Commented:
Cheers Lowfatspread,
Both those querys speed it up alot, the second ones faster, 0.21 seconds compared to 0.24 for the first, so not much in it, but every little helps.
Thanks for your help with these querys!

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial