SQL query on a simple table to return realted items

Posted on 2011-10-03
Last Modified: 2012-05-12
Hi all,

I'm trying to do a query on the following table:

CREATE TABLE `filter_index` (
  `link_id` INT(10) UNSIGNED NOT NULL,
  PRIMARY KEY (`fid`,`link_id`)

Open in new window

It's a many 2 many table - about 6 million rows - fid range is about 650.

Now, the query I'm hoping to do is to lookup an fid (say 20) and return all fid's of which had link_id's associated with the fid 20.

Think of it as 'Ok, now you've picked filter option 20, here's the rest of the filter options available to you from the remaining pool'

Experimental, if there's a smarter way to do this, I'm all ears :)

Question by:dolythgoe
    LVL 59

    Assisted Solution

    by:Kevin Cross
    Cheers, David!

    Is this what you are looking for?
    It uses the initial fid to find link_id's that you then use in the outer query.

    SELECT `fid`, `link_id`
    FROM `filter_index` fi1
       SELECT 1
       FROM `filter_index` fi2
       WHERE fi2.`fid` = 20
       AND fi2.`link_id` = fi1.`link_id`

    Open in new window


    Author Comment

    Hi Kevin!

    Cool thanks for that - I did an explain:

    id  select_type         table  type    possible_keys  key      key_len  ref                             rows     Extra                    
    1   PRIMARY             fi1    index                  PRIMARY  6                                        4040187  Using where; Using index  
    2   DEPENDENT SUBQUERY  fi2    eq_ref  PRIMARY        PRIMARY  6        const,rampants_dbp.fi1.link_id  1        Using index              

    So it's doing a full table scan, do you think it's possible to improve on that with forcing an index?

    Thanks again for your help :)
    LVL 24

    Accepted Solution

    Try with a join instead of a subquery:

    select fi1.fid, fi1.link_id
    from `filter_index` fi1 join `filter_index` fi2 on (fi1.link_id=fi2.link_id)
    where fi2.fid=20

    I think you will still need a separate index on link_id though.

    Author Comment

    Ok, so that query was really fast but now I have the annoyance of the GROUP BY or DISTINCT command to make sure it only gives me the unique fid's. Don't need the link_id's

    I tried a few things:

    FROM `filter_index` fi1 JOIN `filter_index` fi2 ON (fi1.link_id=fi2.link_id)
    WHERE fi2.fid=20
    GROUP BY fi1.fid

    Open in new window

    8 seconds (without GROUP BY = 0.05)

    Which the query profiler in SQL yog says most of the time is spent 'Sending data' <- I'm still trying to understand what that actually means!


    Now, adding an index to link_id...

    Re-run the query with GROUP BY


    0.045 seconds! MUCH BETTER!

    FROM `filter_index` fi1 JOIN `filter_index` fi2 ON (fi1.link_id=fi2.link_id)
    WHERE fi2.fid=20
    GROUP BY fi1.fid

    Open in new window


    id  select_type  table  type  possible_keys      key        key_len  ref                       rows   Extra                                        
    1   SIMPLE       fi2    ref   PRIMARY,NewIndex1  PRIMARY    2        const                     13322  Using index; Using temporary; Using filesort  
    1   SIMPLE       fi1    ref   NewIndex1          NewIndex1  4        rampants_dbp.fi2.link_id  3                                                    

    Much better - uses filesort and temporary but that's cool as it's in memory.

    From here I want to entertain the small matter of 'AND' with different fids.

    So this is for an AJAX filter 'grey out' function - so I can ask mysql to give me all appliceable filter options that are still available if a number of filters are picked. That doesn't stop there though, the fid's are in groups so (1,2,3,4,5,6) are colours for example and (7,8,9,10,11,12) are sizes <- like ecommerce product selector.

    I'll give you the points but if you could share some brainpower on this I would be most grateful!

    SO there are 20 groups, 650 filter_ids and 1-2 million link_id's

    Each group has a number of filter_id's which have a number of link_ids.

    If filter_id's are picked within a group GROUP 1 (2, 4, 5) selected and GROUP 2 (8,9,10) selected, I want to be able to query so it looks at like:

    SHOW ME ALL AVALIABLE fids FROM links subset WHEN fid IS (1 OR 2 OR 3) AND (3 OR 4 OR 5)

    An example that uses this type of thing is - have a play with their product filter and you'll see what I mean - it greys out other options as and when you pick filter options. Great functionality but at how much computing cost I wonder..

    I can't thank you enough for helping me out!

    LVL 59

    Expert Comment

    by:Kevin Cross
    Very good catch on the JOIN.

    Author Comment

    Ahha so I've put together some answers from previous posts and came up with this (think Johann picked up on this one):

    SELECT fi1.fid
    FROM `filter_index` fi1 
    JOIN `filter_index` fi2 ON (fi1.link_id=fi2.link_id)
    JOIN `filter_index` fi3 ON (fi1.link_id=fi3.link_id)
    JOIN `filter_index` fi4 ON (fi1.link_id=fi4.link_id)
    JOIN `filter_index` fi5 ON (fi1.link_id=fi5.link_id)
    WHERE fi2.fid IN (50, 25, 30)
    AND fi3.fid IN (38)
    AND fi4.fid IN (5)
    AND fi5.fid IN (4)
    GROUP BY fi1.fid

    Open in new window

    Where I'm joining for every group and can use IN() to do the ORs and the JOINs to do the ANDs..

    This seems to work pretty fast - is that the holy grail I seek do you think?
    LVL 59

    Expert Comment

    by:Kevin Cross
    Seems so given your requirement and millions of rows. Very nice work, johanntagle!
    LVL 24

    Expert Comment

    Thanks mwvisa1!

    dolythgoe - yes most of the time joins do better than subqueries provided the join columns are also indexed.
    LVL 24

    Expert Comment

    oh, and glad I could help! =)

    Write Comment

    Please enter a first name

    Please enter a last name

    We will never share this with anyone.

    Featured Post

    How your wiki can always stay up-to-date

    Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
    - Increase transparency
    - Onboard new hires faster
    - Access from mobile/offline

    Fore-Foreword Today (2016) Maxmind has a new approach to the distribution of its data sets.  This article may be obsolete.  Instead of using the examples here, have a look at the MaxMind API ( …
    Foreword This is an old article.  Instead of using the MySQL extension that was used in the original code examples, please choose one of the currently supported database extensions instead.  More information is available here: MySQLi / PDO (http://…
    Internet Business Fax to Email Made Easy - With eFax Corporate (, you'll receive a dedicated online fax number, which is used the same way as a typical analog fax number. You'll receive secure faxes in your email, fr…
    Get a first impression of how PRTG looks and learn how it works.   This video is a short introduction to PRTG, as an initial overview or as a quick start for new PRTG users.

    760 members asked questions and received personalized solutions in the past 7 days.

    Join the community of 500,000 technology professionals and ask your questions.

    Join & Ask a Question

    Need Help in Real-Time?

    Connect with top rated Experts

    7 Experts available now in Live!

    Get 1:1 Help Now