Solved

mySQL FULLTEXT search with custom weighting

Posted on 2011-09-29
8
545 Views
Last Modified: 2012-05-12
Hi all,

I've been playing around with FULLTEXT to apply my own weighting.

Here's the query:

SELECT link_id, link_title,
	((MATCH(link_title) AGAINST ('word search' IN BOOLEAN MODE)* 5) +
	(MATCH(link_tags) AGAINST ('word search' IN BOOLEAN MODE)) +
	(MATCH(link_category) AGAINST ('word search' IN BOOLEAN MODE)* 3)) AS score
FROM textdump
WHERE
	MATCH(link_title, link_tags, link_category) AGAINST ('word search' IN BOOLEAN MODE)
ORDER BY
	score DESC
LIMIT
	0,16;

Open in new window


There's 4 FULLTEXT indexes:

link_title,
link_tags,
link_category,
link_title, link_tags, link_category

On 1.6 million rows

This takes 8 seconds and I'm wondering if there's a way to apply a weighting within the MATCH() function or something so I don't need to use the 3 SELECT MATCH()'S to speed it up - or just a much better way!

EXPLAIN doesn't say much:

id  select_type  table     type      possible_keys                 key                           key_len  ref  rows  Extra                        
1   SIMPLE       textdump  fulltext  fulltext_title_tags_category  fulltext_title_tags_category  0             1     Using where; Using filesort  

Any pointers much appreciated!


0
Comment
Question by:dolythgoe
  • 4
  • 4
8 Comments
 
LVL 59

Accepted Solution

by:
Kevin Cross earned 500 total points
ID: 36819023
I do not believe there is a better way to get the score other than I do not believe you want the IN BOOLEAN MODE there. The IN BOOLEAN MODE is for the WHERE condition where as you want the SELECT to return a "score" which you can then weight based on which column it was. Because of the nature of FULLTEXT, I am not sure what more you could do unless you have a more restrictive filter you can put on the data first.
0
 

Author Comment

by:dolythgoe
ID: 36819039
hmm you say more restrictive filter - perhaps a subselect might work better in this case? get a quick match first and then do a series fo smaller matches with scores within those? Just speculating but a theory...
0
 

Author Comment

by:dolythgoe
ID: 36819045
..actually that's what the WHERE is doing anyway, forget that!
0
 
LVL 59

Expert Comment

by:Kevin Cross
ID: 36819055
Yes, I was trying to think of a good way to show it, but not knowing the data. I also saw in your other question you are using PHP and trying to see if there is something you can do with that. One thought is that you start with just:

SELECT link_id, link_title, (MATCH(link_title) AGAINST ('word search') score
FROM textdump
WHERE MATCH(link_title) AGAINST ('word search' IN BOOLEAN MODE)
ORDER BY score DESC
LIMIT 0,16;

If you do not get 16 rows, then you can move on to grab the next rows from link_category matches.

Not sure if that works, but just a thought.
0
Complete VMware vSphere® ESX(i) & Hyper-V Backup

Capture your entire system, including the host, with patented disk imaging integrated with VMware VADP / Microsoft VSS and RCT. RTOs is as low as 15 seconds with Acronis Active Restore™. You can enjoy unlimited P2V/V2V migrations from any source (even from a different hypervisor)

 
LVL 59

Expert Comment

by:Kevin Cross
ID: 36819070
Yes, you are sub selecting based on what matches, but then you have to get scores for that sub select, order by the whole row set, then limit to 16 rows. If this is a much more than 16 then the order by process can be your biggest headache. If the weighting of the scores always pushes a out 16 matches on link_title my suggestion may work out.
0
 

Author Comment

by:dolythgoe
ID: 36819083
Yes, this one is actually for a search itself but you know what, I'm kind of glad this has turned out not to be the best way as I developed a full set of indexing scripts to create many-to-many id tables for word_id 2 link_id which is scalable and scans pretty quick (aslong as you assign enough temp_table_size memory) which also scores in the way I want. I was getting scared that fulltext was faster and I guess it is when using it's own scoring and index but without being able to influence the weighting used in its algorithm you're stuck with MySQL's interpretation.

A few upgrades to fulltext would be handy, a way of indexing across multiple columns and add a multiplier to the score for those columns would be awesome!

Thanks for your help again on this subject...never an easy one!
0
 
LVL 59

Expert Comment

by:Kevin Cross
ID: 36819103
Very nice. Glad you got what you needed.
Best regards and happy coding,

Kevin
0
 

Author Closing Comment

by:dolythgoe
ID: 36819135
More of a confirmation than a solution - fulltext is great for some things but lacks flexibility in places. Might send this idea to MySQL..

Thanks Kevin
0

Featured Post

Ransomware-A Revenue Bonanza for Service Providers

Ransomware – malware that gets on your customers’ computers, encrypts their data, and extorts a hefty ransom for the decryption keys – is a surging new threat.  The purpose of this eBook is to educate the reader about ransomware attacks.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Introduction Since I wrote the original article about Handling Date and Time in PHP and MySQL (http://www.experts-exchange.com/articles/201/Handling-Date-and-Time-in-PHP-and-MySQL.html) several years ago, it seemed like now was a good time to updat…
Load balancing is the method of dividing the total amount of work performed by one computer between two or more computers. Its aim is to get more work done in the same amount of time, ensuring that all the users get served faster.
Windows 10 is mostly good. However the one thing that annoys me is how many clicks you have to do to dial a VPN connection. You have to go to settings from the start menu, (2 clicks), Network and Internet (1 click), Click VPN (another click) then fi…
Internet Business Fax to Email Made Easy - With eFax Corporate (http://www.enterprise.efax.com), you'll receive a dedicated online fax number, which is used the same way as a typical analog fax number. You'll receive secure faxes in your email, fr…

911 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

15 Experts available now in Live!

Get 1:1 Help Now