We help IT Professionals succeed at work.

how do I formulate a MySQL query to do the following?

bitt3n
bitt3n asked
on
372 Views
Last Modified: 2010-04-21
I want to formulate a query using the following tables:

table user_words, containing fields user_id, word_id
table word_tags, containing fields word_id, tag_id

I will use the following constant variables in the query:
$requesting_user_id
$tag_id1
$tag_id2
$tag_id3

I want the query to select word_ids based on the following criteria:

For each selected word_id, there exists in the word_tags table a record with that word_id and a tag_id that equals one each of $tag_id1, $tag_id2, and $tag_id3.
For each selected word_id, there exists a user_id such that the record $requesting_user_id/word_id exists in user_words

So far I have this query, which I expect should work if I am using only $tag_id1 and not $tag_id2 or $tag_id3:

SELECT user_words.word_id FROM user_words, word_tags WHERE word_tags.tag_id='$tag_id1' AND word_tags.word_id=user_words.word_id AND user_words.user_id='$requesting_user_id'

I want to extend this query so that it returns only those records that also satisfy the $tag_id2 and $tag_id3 criteria. How do I do this?
Comment
Watch Question

Commented:
Is this what you mean?

SELECT user_words.word_id FROM user_words, word_tags WHERE (word_tags.tag_id='$tag_id1' OR word_tags.tag_id='$tag_id2' OR word_tags.tag_id='$tag_id3')AND word_tags.word_id=user_words.word_id AND user_words.user_id='$requesting_user_id'

This will select the record if word_tags.tag_id = any of one the thee id's...

Author

Commented:
no I need it to select the record only if word_tags.tag_id = ALL of the three id's, ie there are three records in word_tags with word_id=$requesting_user_id, and each of these records has word_tags.tag_id= $tag_id1, $tag_id2, and $tag_id3 respectively.

Commented:
I'm still a bit confused but try the code below. If that doesn't work, perhaps you could post the table structure and some sample data?

Try this:

SELECT user_words.word_id
FROM user_words
WHERE user_words.user_id='$requesting_user_id' AND (
  SELECT count(*)
  FROM word_tags
  WHERE word_tags.tag_id='$tag_id1' AND
        word_tags.word_id=user_words.word_id) > 0 AND (
  SELECT count(*)
  FROM word_tags
  WHERE word_tags.tag_id='$tag_id2' AND
        word_tags.word_id=user_words.word_id) > 0 AND (
  SELECT count(*)
  FROM word_tags
  WHERE word_tags.tag_id='$tag_id3' AND
        word_tags.word_id=user_words.word_id) > 0

Author

Commented:
that looks like it works, although I still want to play with it a bit since my tables don't have a lot of data yet.

to be more specific, lets say we have a user_words table with (user_id,word_id) pairs:

1,20
2,34
3,34
5,18
5,39

and a word_tags table with (word_id,tag_id) pairs

20,934
39,230
34,392
20,674
39,392
4,732
20,112

then if we run the query using $requesting_user_id=1 and $tag_id1 = 934 and $tag_id2=674 and $tag_id3=112, then the query will return word_id=20.

if we run the query with $requesting_user_id=5 and $tag_id1=230 and $tag_id2=563 and #tag_id3=392, no result will be returned because the record (39,563) is not in the word_tags table.

I'm planning on using an indefinite number of tag variables (from 1 to approximately 10), and constructing the query in PHP to accommodate whatever number of tags is required for any given search.

It would be nice if there were some more compact way of doing it that doesn't involve subqueries, but that might not be possible. The word_tags table is probably going to end up being fairly large, so I am concerned whether this query might be expensive in that case. (For example, each of the subqueries must go through the whole table to count all the instances, when it would be better if it stops once it finds a single instance, since each (word_id,tag_id) pair will only occur once in the table.)

If there are any ideas on how to do this without the subqueries that would be great. Otherwise the solution provided does appear to do the trick. Thanks for your help.
Commented:
This one is on us!
(Get your first solution completely free - no credit card required)
UNLOCK SOLUTION

Author

Commented:
ok great. that looks like exactly what I want.

I have a question about DISTINCT. Given that in word_tags, any given (word_id,tag_id) pair will always appear a maximum of one time, is it true that using DISTINCT is for purposes of efficiency, since once the query finds one instance of this pair it will stop looking for another? That is, DISTINCT does not affect the result, correct? (This is just for purposes of my own understanding, since it would make sense to use it for efficiency if that is what DISTINCT is for.)

also let's say that in the future I wanted to combine two of these queries, so that I return all the u.word_ids for one set of tags (934,674,112 in the above case), and for some other set of tags (203,409,230, or something like that), and order them numerically using an ORDER BY clause. I assume I would just use two subqueries somehow in this case? I would not want any duplicate results (ie, if both the first and second queries return u.word_id 23, I would only want the query to return once instance of u.word_id=23). (If doing this is complicated, I am happy to open a new question about it, since you answered the original question already.)

Commented:
DISTINCT limits the resulting output so that no duplicate records are returned. Note that it is the OUTPUT rows that is distinct not the SOURCE that must be distinct.  If you have a table where one column has the same value for every row, and you only select that column, you will only get one row returned, but if you select that same column plus another column then you will get a row for each distinct column pair.

I don't know if it is more efficient or not, I used it so that it would not return 3 rows for each user_id. Otherwise the query would return a row for each tag_id found (934, 674, 112).

Expanding the query...

If you wanted to search for either tag_id set for the same user_id:

SELECT DISTINCT u.word_id
FROM user_words u INNER JOIN word_tags w1 on u.word_id=w1.word_id
 INNER JOIN word_tags w2 on u.word_id=w2.word_id
 INNER JOIN word_tags w3 on u.word_id=w3.word_id
WHERE u.user_id=1 AND
 (w1.tag_id=934 AND
 w2.tag_id=674 AND
 w3.tag_id=112)
 OR
 (w1.tag_id=209 AND
 w2.tag_id=409 AND
 w3.tag_id=230)


If you wanted to return a record for each set found for the same user_id (will include a record for each user_id, for each tag_id set):

SELECT DISTINCT u.word_id, CONCAT(w1.tag_id, w2.tag_id, w3.tag_id) as tag_id_combined
FROM user_words u INNER JOIN word_tags w1 on u.word_id=w1.word_id
 INNER JOIN word_tags w2 on u.word_id=w2.word_id
 INNER JOIN word_tags w3 on u.word_id=w3.word_id
WHERE u.user_id=1 AND
 (w1.tag_id=934 AND
 w2.tag_id=674 AND
 w3.tag_id=112)
 OR
 (w1.tag_id=209 AND
 w2.tag_id=409 AND
 w3.tag_id=230)


If you want to return a list of all users that have either set (will include a record for each user_id, for each tag_id set):

SELECT DISTINCT u.user_id, u.word_id, CONCAT(w1.tag_id, w2.tag_id, w3.tag_id) as tag_id_combined
FROM user_words u INNER JOIN word_tags w1 on u.word_id=w1.word_id
 INNER JOIN word_tags w2 on u.word_id=w2.word_id
 INNER JOIN word_tags w3 on u.word_id=w3.word_id
WHERE (w1.tag_id=934 AND
 w2.tag_id=674 AND
 w3.tag_id=112) OR
 (w1.tag_id=209 AND
 w2.tag_id=409 AND
 w3.tag_id=230)

Author

Commented:
Ok, I get it, that looks great.

This is just a minor detail, but fooling around with some data, it doesn't actually look like DISTINCT is necessary in the original query using joins. If I run the query without DISTINCT, I still only get one instance of word_id=20. I think this is because the query is selecting word_ids that satisfy all three conditions simultaneously, rather than selecting word_ids that satisfy any of the conditions independently. If I remove one of the requirements, eg w3.tag_id=112, I then get 3 records returned (word_id=20 three times), because three rows in table w3 now satisfy the query (instead of just the one with tag_id=112). Selecting for word_id, w1.tag_id, w2.tag_id, w3.tag_id having removed that requirement, I get

word_id       tag_id       tag_id       tag_id
20       934       674       934
20       934       674       674
20       934       674       112

Continuing with this pattern, removing 2 requirements similarly gives 9 results (3X3) and removing all three gives 27 results (3X3X3). So it looks like as long as the pair combinations are unique, so that there is only one tag_id for any given word_id in the word_tags table, and I limit each joined word_tags table by tag_id, I do not need DISTINCT, unless I am misunderstanding something (which is hardly impossible).

On the other hand, I think DISTINCT will be important for doing queries using the syntax in your subsequent examples, such as

WHERE (w1.tag_id=934 AND
w2.tag_id=674 AND
w3.tag_id=112) OR
(w1.tag_id=209 AND
w2.tag_id=409 AND
w3.tag_id=230)

because in this case, I may actually want to specify something like

WHERE (w1.tag_id=934 AND
w2.tag_id=674 AND
w3.tag_id=112) OR
(w1.tag_id=209)

and in that case, the (w1.tag_id=209) part will give me 3X3 more results than I want, because I am joining two tables (w2 and w3) that are not being limited to a specific tag id in the OR (w1.tag_id=209) part of the clause.

Does this sound right?



 

Commented:
Sounds right, because you are specifying which table must equal each value (w1.tag_id=934, w2.tag_id=674, etc...) it appears that DISTINCT is not required as long as there is no duplicate data. Say you had two records where tag_id=112, then it would return two records.

Author

Commented:
Thanks this was excellent help.

Gain unlimited access to on-demand training courses with an Experts Exchange subscription.

Get Access
Why Experts Exchange?

Experts Exchange always has the answer, or at the least points me in the correct direction! It is like having another employee that is extremely experienced.

Jim Murphy
Programmer at Smart IT Solutions

When asked, what has been your best career decision?

Deciding to stick with EE.

Mohamed Asif
Technical Department Head

Being involved with EE helped me to grow personally and professionally.

Carl Webster
CTP, Sr Infrastructure Consultant
Empower Your Career
Did You Know?

We've partnered with two important charities to provide clean water and computer science education to those who need it most. READ MORE

Ask ANY Question

Connect with Certified Experts to gain insight and support on specific technology challenges including:

  • Troubleshooting
  • Research
  • Professional Opinions
Unlock the solution to this question.
Join our community and discover your potential

Experts Exchange is the only place where you can interact directly with leading experts in the technology field. Become a member today and access the collective knowledge of thousands of technology experts.

*This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

OR

Please enter a first name

Please enter a last name

8+ characters (letters, numbers, and a symbol)

By clicking, you agree to the Terms of Use and Privacy Policy.