
mySQL FULLTEXT search with custom weighting

Posted on 2011-09-29
Last Modified: 2012-05-12
Hi all,

I've been playing around with FULLTEXT to apply my own weighting.

Here's the query:

SELECT link_id, link_title,
	((MATCH(link_title) AGAINST ('word search' IN BOOLEAN MODE)* 5) +
	(MATCH(link_tags) AGAINST ('word search' IN BOOLEAN MODE)) +
	(MATCH(link_category) AGAINST ('word search' IN BOOLEAN MODE)* 3)) AS score
FROM textdump
WHERE
	MATCH(link_title, link_tags, link_category) AGAINST ('word search' IN BOOLEAN MODE)
ORDER BY
	score DESC
LIMIT
	0,16;



There are 4 FULLTEXT indexes:

link_title,
link_tags,
link_category,
link_title, link_tags, link_category

The table has 1.6 million rows.

The query takes 8 seconds. I'm wondering if there's a way to apply a weighting within the MATCH() function itself, so I don't need the three separate MATCH() expressions in the SELECT, or just a much better way to speed it up!

EXPLAIN doesn't say much:

id  select_type  table     type      possible_keys                 key                           key_len  ref  rows  Extra                        
1   SIMPLE       textdump  fulltext  fulltext_title_tags_category  fulltext_title_tags_category  0             1     Using where; Using filesort  

Any pointers much appreciated!


Question by:dolythgoe

Accepted Solution

by: Kevin Cross (earned 2000 total points)
ID: 36819023
I do not believe there is a better way to get the score, except that I do not think you want IN BOOLEAN MODE in the SELECT list. IN BOOLEAN MODE belongs in the WHERE condition, whereas the SELECT should return a relevance "score" that you can then weight according to which column matched. Because of the nature of FULLTEXT, I am not sure what more you could do unless you have a more restrictive filter you can apply to the data first.
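
For illustration, here is a minimal sketch of the question's weighted query with IN BOOLEAN MODE kept only in the WHERE clause. It assumes the same textdump table and the four FULLTEXT indexes listed in the question, and the 5/1/3 multipliers are the question's own. It is untested against the real data, so whether it runs any faster is an open question. As far as I know, natural-language mode on MyISAM also ignores words that appear in half or more of the rows, so very common terms may contribute a score of 0.

```sql
-- SELECT: natural-language mode (no modifier) returns a relevance value
--         per column, which is then multiplied by that column's weight.
-- WHERE:  boolean mode against the combined index only filters the rows.
SELECT link_id, link_title,
       (MATCH(link_title)    AGAINST ('word search') * 5) +
       (MATCH(link_tags)     AGAINST ('word search')) +
       (MATCH(link_category) AGAINST ('word search') * 3) AS score
FROM textdump
WHERE MATCH(link_title, link_tags, link_category)
      AGAINST ('word search' IN BOOLEAN MODE)
ORDER BY score DESC
LIMIT 0,16;
```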

Author Comment

by:dolythgoe
ID: 36819039
Hmm, you say a more restrictive filter. Perhaps a subselect might work better in this case? Get a quick match first and then do a series of smaller matches with scores within those? Just speculating, but it's a theory...

Author Comment

by:dolythgoe
ID: 36819045
..actually that's what the WHERE is doing anyway, forget that!

Expert Comment

by:Kevin Cross
ID: 36819055
Yes, I was trying to think of a good way to show it, but that is hard without knowing the data. I also saw in your other question that you are using PHP, so I am trying to see if there is something you can do with that. One thought is that you start with just:

SELECT link_id, link_title, MATCH(link_title) AGAINST ('word search') AS score
FROM textdump
WHERE MATCH(link_title) AGAINST ('word search' IN BOOLEAN MODE)
ORDER BY score DESC
LIMIT 0,16;

If you do not get 16 rows, then you can move on to grab the next rows from link_category matches.

Not sure if that works, but just a thought.
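
A rough sketch of what the follow-up query for that second step might look like, assuming the title-only query returned fewer than 16 rows. It reuses the table and columns from the question. The NOT MATCH(...) condition is my own addition to skip rows the title query already returned, and the application (PHP here) is assumed to append these rows after the title matches and trim the combined list to 16.

```sql
-- Second stage: rows that match on link_category but not on link_title,
-- so nothing already returned by the title-only query is repeated.
SELECT link_id, link_title,
       MATCH(link_category) AGAINST ('word search') AS score
FROM textdump
WHERE MATCH(link_category) AGAINST ('word search' IN BOOLEAN MODE)
  AND NOT MATCH(link_title) AGAINST ('word search' IN BOOLEAN MODE)
ORDER BY score DESC
LIMIT 0,16;
```

This only gives the same ordering as the single weighted query if title matches should always outrank category matches, which depends on the data.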

Expert Comment

by:Kevin Cross
ID: 36819070
Yes, you are sub-selecting based on what matches, but then you have to get scores for that subset, order the whole row set, and then limit to 16 rows. If the match set is much larger than 16 rows, the ORDER BY step can be your biggest headache. If the weighting of the scores always puts 16 link_title matches at the top, my suggestion may work out.

Author Comment

by:dolythgoe
ID: 36819083
Yes, this one is actually for a search itself. But you know what, I'm kind of glad this has turned out not to be the best way, as I developed a full set of indexing scripts to create many-to-many ID tables mapping word_id to link_id. That approach is scalable, scans pretty quickly (as long as you assign enough tmp_table_size memory), and also scores in the way I want. I was getting scared that fulltext was faster, and I guess it is when using its own scoring and index, but without being able to influence the weighting used in its algorithm you're stuck with MySQL's interpretation.
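
To make the idea concrete, here is a minimal sketch of how such word-to-link tables and a weighted lookup could look. The table names (word, word_link), the weight column, and the choice to store the 5/3/1 column weights at index time are all assumptions for illustration, not the actual scripts.

```sql
-- Hypothetical schema: one row per distinct word.
CREATE TABLE word (
    word_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    word    VARCHAR(64)  NOT NULL,
    PRIMARY KEY (word_id),
    UNIQUE KEY uq_word (word)
);

-- Hypothetical many-to-many table: which word occurs in which link.
-- weight is set by the indexing script (5 = title, 3 = category, 1 = tags),
-- so a word found in two columns of one link gets two rows.
CREATE TABLE word_link (
    word_id INT UNSIGNED     NOT NULL,
    link_id INT UNSIGNED     NOT NULL,
    weight  TINYINT UNSIGNED NOT NULL,
    PRIMARY KEY (word_id, link_id, weight)
);

-- Score each link by summing the weights of the search words it contains.
SELECT wl.link_id, SUM(wl.weight) AS score
FROM word AS w
JOIN word_link AS wl ON wl.word_id = w.word_id
WHERE w.word IN ('word', 'search')
GROUP BY wl.link_id
ORDER BY score DESC
LIMIT 0,16;
```

The GROUP BY followed by an ORDER BY on the aggregate will typically need a temporary table, which is presumably where MySQL's tmp_table_size setting comes into play.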

A few upgrades to fulltext would be handy: a way of indexing across multiple columns and adding a multiplier to the score for those columns would be awesome!

Thanks for your help again on this subject...never an easy one!

Expert Comment

by:Kevin Cross
ID: 36819103
Very nice. Glad you got what you needed.
Best regards and happy coding,

Kevin

Author Closing Comment

by:dolythgoe
ID: 36819135
More of a confirmation than a solution - fulltext is great for some things but lacks flexibility in places. Might send this idea to MySQL..

Thanks Kevin