We help IT Professionals succeed at work.

MySQL Database design and query help

hbiz
hbiz asked
on
I have the following tables:

users (id, first_name, last_name)
category (id, name)
rank(id, user_id, rank)

Each user can belong to several categories.  And all users are in the rank table and have a value between 0.0 and 1.0, where 0 is the lowest rank and 1 is the highest. I’d like to setup additional tables to create the following webpage:

A visitor to the page (identified by either one of the recorded ids in the user table, or a numeric representation of their ip address) chooses a category and is presented with two randomly chosen users from the users table such that:

1) the visiting user_id has not seen this pairing in a period of 24 hours
2) the two users belong to the chosen category
3) the two users are within 1 rank value of each other.  Let me explain that last criteria - if all the ranks were sorted, the two chosen users would have adjacent ranks.

This is a hard one and I can’t for the life of me figure it out how to design my tables and query to answer this efficiently

I truly appreciate any help on this front.

Thanks
Comment
Watch Question

Kevin CrossChief Technology Officer
CERTIFIED EXPERT
Most Valuable Expert 2011

Commented:
hbiz,

Hopefully we can help. It appears this is academic in nature, so will try my best to explain so you learn HOW for yourself. Therefore, some of my responses will be probing questions, whose answers should help you determine the table design.

1) if you need to keep pairings unique over a specific time period, you need to consider add a column that has a timestamp. Additionally, where are you storing the "pairings?"
2) how are users associated to categories?
3) after handling the other aspects of this, it would seem like you can handle the adjacent requirement by selecting candidates for pairing based on an order by rank. I guess you could use that to pick a pair, then if they have been together, then move to the next two. Remember MySQL has LIMIT which has an OFFSET. So you could skip two rows, then grab the next two easily.

Hope that helps get started.

Author

Commented:
mwvisa1,

thanks for your responses but this is def not an academic question, it's for a personal project I'm working on. I have tried many variations already but they all suffer from poor efficiency. I phrased the question in that matter and omitted the details of my specific project in an effort to get to an answer to the core algorithm.

but to prove that here's my query at the moment which I don't think is efficient enough (note it is using different tables that are specific to my project but the general concept is the same as my initial question)



SELECT lowerImages.image_id AS lower_image,
       max(higherImages.image_id) AS higher_image
FROM global_rank AS lowerImages, global_rank AS higherImages
WHERE  lowerImages.rank  <  higherImages.rank

AND 1  NOT IN (select 1 from ranked_up where
    lowerImages.image_id = ranked_up.image_id
    AND ranked_up.user_id = $user_id
    AND ranked_up.created_at > DATE_SUB(NOW(), INTERVAL 1 DAY))

AND 1 NOT IN (
    SELECT 1 from matchups where user_id = $userId
            AND lower_image_id = lowerImages.image_id
            AND higher_image_id = higherImages.image_id
            UNION
            SELECT 1 from matchups where user_id = $user_id
            AND lower_image_id = higherImages.image_id
            AND higher_image_id = lowerImages.image_id
)
GROUP BY 1


Hopefully this proves that I'm trying to solve a real world problem and not doing homework :)

thanks for your time.





PS I forgot to add one more criteria to the initial quesiton:

4) the visiting user is presented with two users and with the one on the left having lower rank the one on the right. The user can click on the left image and that will swap the two users' ranks (essentially ranking up user on the left). This is tracked in a table called rankedUp.  

The final criteria in selecting the two random users is that the one on the left (lower rank) must not have been ranked up by the visiting user in the last 24 hours.


Kevin CrossChief Technology Officer
CERTIFIED EXPERT
Most Valuable Expert 2011

Commented:
hbiz, thanks for sharing the query. I am not sure it is clear to me how you are storing the pairings still, but it sounds like from your comment that the 24 hour test is based on rank up and not necessarily pairing. As far as performance, here are some notes (some are best practices and not necessarily required for efficiency):

SELECT lowerImages.image_id AS lower_image,
       max(higherImages.image_id) AS higher_image
-- I would use ANSI-compliant joins
-- global_rank AS lowerImages JOIN global_rank AS higherImages

FROM global_rank AS lowerImages, global_rank AS higherImages
-- this would become the ON of the JOIN above
-- the Cartesian Product based on "<" will have poorer performance, typically *[1]

WHERE  lowerImages.rank  <  higherImages.rank

-- I would try out NOT EXISTS or JOIN
-- A JOIN to rows that are "<=" may work better *[2]

AND 1  NOT IN (select 1 from ranked_up where
    lowerImages.image_id = ranked_up.image_id
    AND ranked_up.user_id = $user_id
    AND ranked_up.created_at > DATE_SUB(NOW(), INTERVAL 1 DAY))

-- I see now, these are the pairings. *[3]
AND 1 NOT IN (
    SELECT 1 from matchups where user_id = $userId
            AND lower_image_id = lowerImages.image_id
            AND higher_image_id = higherImages.image_id
            UNION
            SELECT 1 from matchups where user_id = $user_id
            AND lower_image_id = higherImages.image_id
            AND higher_image_id = lowerImages.image_id
)

-- This is VERY poor practice that you are allowed to get away with in MySQL
-- First, using column index does not work here, so this is not column 1, but literal '1'
-- MySQL lets you NOT include all columns in SELECT that are not aggregated
-- We may get rid of the need for this totally, but otherwise include ALL non-aggregate columns

GROUP BY 1

One way to escape the GROUP BY and possibly help with performance:

SELECT lo.image_id AS lower_image
-- rank should be indexed DESC or combination of rank, image_id if you truly need MAX image_id
-- sub-queries sometimes perform better, depending on the scenario.
     , (SELECT hi.image_id
         FROM global_rank AS hi
         WHERE hi.rank > lo.rank
         AND NOT EXISTS (
            SELECT 1
            FROM matchups AS m
            WHERE m.lower_image_id IN (lo.image_id, hi.image_id)
            AND m.higher_image_id IN (hi.image_id, lo.image_id)
         )
         ORDER BY hi.rank DESC LIMIT 1)  AS higher_image
FROM global_rank AS lo
-- this works if every user will be in ranked_up
JOIN (
   SELECT image_id
   FROM ranked_up
   WHERE user_id = $user_id
   AND created_at <= DATE_SUB(NOW(), INTERVAL 1 DAY)
) AS rk ON rk.image_id = lo.image_id
;


Hopefully the combination of techniques shown helps you find the best combination for performance. I would use EXPLAIN in front of your current SELECT to get a better idea of what it is doing or not doing  performance wise. Then you can try on something like I showed.

Kevin
Commented:
Thanks Kevin for the suggestions, I've reworked my approach from scratch and perhaps  you can give me feedback on this new design

Tables:

user
images
matchups ( tracks which image pairs have been displayed to which user)
global_rank (tracks rankings for (user,rank_type) pairs)
global_rank_random_map - a table that has a  global_rank_id for every entry in the global_rank table. This is kept in sync using triggers

the rankedUp table has been removed and instead I've added a swapped column in the matchups table to determine if the visiting user has swapped the rankings of the two images.

And here's the query to get those random images:

SELECT low.image_id as low_image,
       low.rank as low_rank,
       high.image_id as high_image,
       high.rank as high_rank

FROM global_rank low,
     (SELECT grm.global_rank_id AS id from global_rank_random_map grm,
                 (SELECT CEIL(RAND(NOW())* (SELECT MAX(grm2.id)
                  FROM global_rank_random_map grm2 WHERE grm2.rank_type = 2)) AS id) boo
      WHERE boo.id = grm.id) random_low,
     global_rank high

WHERE       low.id = random_low.id
            AND (   high.rank = (SELECT MIN(rank) as rank FROM global_rank WHERE rank > low.rank and rank_type = 2)
                 OR high.rank = (SELECT MAX(rank) as rank FROM global_rank WHERE rank < low.rank and rank_type = 2))

            AND NOT EXISTS (
                            SELECT 1 from matchups where user_id = 3494329885 AND lower_image_id = low.image_id and higher_image_id = high.image_id
                            UNION
                            SELECT 1 from matchups where user_id = 3494329885 AND lower_image_id = high.image_id and higher_image_id = low.image_id
                         )
            AND NOT EXISTS (
                              SELECT 1 FROM matchups
                              WHERE ((low.rank < high.rank and low.image_id = matchups.lower_image_id)  or (low.rank > high.rank and high.image_id = matchups.lower_image_id))
                              AND matchups.user_id = 3494329885
                              AND matchups.swap_images = 1
                              AND matchups.created_at > DATE_SUB(NOW(), INTERVAL 1 DAY)
                            )


This query is not guaranteed to return a result but running it several times should eventually get you something. It's the most efficient way I could think of getting random images that satisfy the criteria.

And to re-iterate, the requirements are:

1- two random images
2- image 1 and image 2 are within 1 rank of each other if you were to sort the global rankings
3- the two images have not had their ranks swapped by the current user id in the last 24 hours
4- the two images have not been matched up and presented to the current user

Author

Commented:
seems to be working for me.

Explore More ContentExplore courses, solutions, and other research materials related to this topic.