Solved

mySQL FULLTEXT search with custom weighting

Posted on 2011-09-29
8
551 Views
Last Modified: 2012-05-12
Hi all,

I've been playing around with FULLTEXT to apply my own weighting.

Here's the query:

SELECT link_id, link_title,
	((MATCH(link_title) AGAINST ('word search' IN BOOLEAN MODE)* 5) +
	(MATCH(link_tags) AGAINST ('word search' IN BOOLEAN MODE)) +
	(MATCH(link_category) AGAINST ('word search' IN BOOLEAN MODE)* 3)) AS score
FROM textdump
WHERE
	MATCH(link_title, link_tags, link_category) AGAINST ('word search' IN BOOLEAN MODE)
ORDER BY
	score DESC
LIMIT
	0,16;

Open in new window


There's 4 FULLTEXT indexes:

link_title,
link_tags,
link_category,
link_title, link_tags, link_category

On 1.6 million rows

This takes 8 seconds and I'm wondering if there's a way to apply a weighting within the MATCH() function or something so I don't need to use the 3 SELECT MATCH()'S to speed it up - or just a much better way!

EXPLAIN doesn't say much:

id  select_type  table     type      possible_keys                 key                           key_len  ref  rows  Extra                        
1   SIMPLE       textdump  fulltext  fulltext_title_tags_category  fulltext_title_tags_category  0             1     Using where; Using filesort  

Any pointers much appreciated!


0
Comment
Question by:dolythgoe
  • 4
  • 4
8 Comments
 
LVL 59

Accepted Solution

by:
Kevin Cross earned 500 total points
ID: 36819023
I do not believe there is a better way to get the score other than I do not believe you want the IN BOOLEAN MODE there. The IN BOOLEAN MODE is for the WHERE condition where as you want the SELECT to return a "score" which you can then weight based on which column it was. Because of the nature of FULLTEXT, I am not sure what more you could do unless you have a more restrictive filter you can put on the data first.
0
 

Author Comment

by:dolythgoe
ID: 36819039
hmm you say more restrictive filter - perhaps a subselect might work better in this case? get a quick match first and then do a series fo smaller matches with scores within those? Just speculating but a theory...
0
 

Author Comment

by:dolythgoe
ID: 36819045
..actually that's what the WHERE is doing anyway, forget that!
0
PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

 
LVL 59

Expert Comment

by:Kevin Cross
ID: 36819055
Yes, I was trying to think of a good way to show it, but not knowing the data. I also saw in your other question you are using PHP and trying to see if there is something you can do with that. One thought is that you start with just:

SELECT link_id, link_title, (MATCH(link_title) AGAINST ('word search') score
FROM textdump
WHERE MATCH(link_title) AGAINST ('word search' IN BOOLEAN MODE)
ORDER BY score DESC
LIMIT 0,16;

If you do not get 16 rows, then you can move on to grab the next rows from link_category matches.

Not sure if that works, but just a thought.
0
 
LVL 59

Expert Comment

by:Kevin Cross
ID: 36819070
Yes, you are sub selecting based on what matches, but then you have to get scores for that sub select, order by the whole row set, then limit to 16 rows. If this is a much more than 16 then the order by process can be your biggest headache. If the weighting of the scores always pushes a out 16 matches on link_title my suggestion may work out.
0
 

Author Comment

by:dolythgoe
ID: 36819083
Yes, this one is actually for a search itself but you know what, I'm kind of glad this has turned out not to be the best way as I developed a full set of indexing scripts to create many-to-many id tables for word_id 2 link_id which is scalable and scans pretty quick (aslong as you assign enough temp_table_size memory) which also scores in the way I want. I was getting scared that fulltext was faster and I guess it is when using it's own scoring and index but without being able to influence the weighting used in its algorithm you're stuck with MySQL's interpretation.

A few upgrades to fulltext would be handy, a way of indexing across multiple columns and add a multiplier to the score for those columns would be awesome!

Thanks for your help again on this subject...never an easy one!
0
 
LVL 59

Expert Comment

by:Kevin Cross
ID: 36819103
Very nice. Glad you got what you needed.
Best regards and happy coding,

Kevin
0
 

Author Closing Comment

by:dolythgoe
ID: 36819135
More of a confirmation than a solution - fulltext is great for some things but lacks flexibility in places. Might send this idea to MySQL..

Thanks Kevin
0

Featured Post

Free Tool: Site Down Detector

Helpful to verify reports of your own downtime, or to double check a downed website you are trying to access.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Does the idea of dealing with bits scare or confuse you? Does it seem like a waste of time in an age where we all have terabytes of storage? If so, you're missing out on one of the core tools in every professional programmer's toolbox. Learn how to …
Creating and Managing Databases with phpMyAdmin in cPanel.
Two types of users will appreciate AOMEI Backupper Pro: 1 - Those with PCIe drives (and haven't found cloning software that works on them). 2 - Those who want a fast clone of their boot drive (no re-boots needed) and it can clone your drive wh…
Finds all prime numbers in a range requested and places them in a public primes() array. I've demostrated a template size of 30 (2 * 3 * 5) but larger templates can be built such 210  (2 * 3 * 5 * 7) or 2310  (2 * 3 * 5 * 7 * 11). The larger templa…

856 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question