Solved

mySQL FULLTEXT search with custom weighting

Posted on 2011-09-29
8
554 Views
Last Modified: 2012-05-12
Hi all,

I've been playing around with FULLTEXT to apply my own weighting.

Here's the query:

SELECT link_id, link_title,
	((MATCH(link_title) AGAINST ('word search' IN BOOLEAN MODE)* 5) +
	(MATCH(link_tags) AGAINST ('word search' IN BOOLEAN MODE)) +
	(MATCH(link_category) AGAINST ('word search' IN BOOLEAN MODE)* 3)) AS score
FROM textdump
WHERE
	MATCH(link_title, link_tags, link_category) AGAINST ('word search' IN BOOLEAN MODE)
ORDER BY
	score DESC
LIMIT
	0,16;

Open in new window


There's 4 FULLTEXT indexes:

link_title,
link_tags,
link_category,
link_title, link_tags, link_category

On 1.6 million rows

This takes 8 seconds and I'm wondering if there's a way to apply a weighting within the MATCH() function or something so I don't need to use the 3 SELECT MATCH()'S to speed it up - or just a much better way!

EXPLAIN doesn't say much:

id  select_type  table     type      possible_keys                 key                           key_len  ref  rows  Extra                        
1   SIMPLE       textdump  fulltext  fulltext_title_tags_category  fulltext_title_tags_category  0             1     Using where; Using filesort  

Any pointers much appreciated!


0
Comment
Question by:dolythgoe
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 4
8 Comments
 
LVL 59

Accepted Solution

by:
Kevin Cross earned 500 total points
ID: 36819023
I do not believe there is a better way to get the score other than I do not believe you want the IN BOOLEAN MODE there. The IN BOOLEAN MODE is for the WHERE condition where as you want the SELECT to return a "score" which you can then weight based on which column it was. Because of the nature of FULLTEXT, I am not sure what more you could do unless you have a more restrictive filter you can put on the data first.
0
 

Author Comment

by:dolythgoe
ID: 36819039
hmm you say more restrictive filter - perhaps a subselect might work better in this case? get a quick match first and then do a series fo smaller matches with scores within those? Just speculating but a theory...
0
 

Author Comment

by:dolythgoe
ID: 36819045
..actually that's what the WHERE is doing anyway, forget that!
0
NEW Veeam Agent for Microsoft Windows

Backup and recover physical and cloud-based servers and workstations, as well as endpoint devices that belong to remote users. Avoid downtime and data loss quickly and easily for Windows-based physical or public cloud-based workloads!

 
LVL 59

Expert Comment

by:Kevin Cross
ID: 36819055
Yes, I was trying to think of a good way to show it, but not knowing the data. I also saw in your other question you are using PHP and trying to see if there is something you can do with that. One thought is that you start with just:

SELECT link_id, link_title, (MATCH(link_title) AGAINST ('word search') score
FROM textdump
WHERE MATCH(link_title) AGAINST ('word search' IN BOOLEAN MODE)
ORDER BY score DESC
LIMIT 0,16;

If you do not get 16 rows, then you can move on to grab the next rows from link_category matches.

Not sure if that works, but just a thought.
0
 
LVL 59

Expert Comment

by:Kevin Cross
ID: 36819070
Yes, you are sub selecting based on what matches, but then you have to get scores for that sub select, order by the whole row set, then limit to 16 rows. If this is a much more than 16 then the order by process can be your biggest headache. If the weighting of the scores always pushes a out 16 matches on link_title my suggestion may work out.
0
 

Author Comment

by:dolythgoe
ID: 36819083
Yes, this one is actually for a search itself but you know what, I'm kind of glad this has turned out not to be the best way as I developed a full set of indexing scripts to create many-to-many id tables for word_id 2 link_id which is scalable and scans pretty quick (aslong as you assign enough temp_table_size memory) which also scores in the way I want. I was getting scared that fulltext was faster and I guess it is when using it's own scoring and index but without being able to influence the weighting used in its algorithm you're stuck with MySQL's interpretation.

A few upgrades to fulltext would be handy, a way of indexing across multiple columns and add a multiplier to the score for those columns would be awesome!

Thanks for your help again on this subject...never an easy one!
0
 
LVL 59

Expert Comment

by:Kevin Cross
ID: 36819103
Very nice. Glad you got what you needed.
Best regards and happy coding,

Kevin
0
 

Author Closing Comment

by:dolythgoe
ID: 36819135
More of a confirmation than a solution - fulltext is great for some things but lacks flexibility in places. Might send this idea to MySQL..

Thanks Kevin
0

Featured Post

Back Up Your Microsoft Windows Server®

Back up all your Microsoft Windows Server – on-premises, in remote locations, in private and hybrid clouds. Your entire Windows Server will be backed up in one easy step with patented, block-level disk imaging. We achieve RTOs (recovery time objectives) as low as 15 seconds.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Introduction In this article, I will by showing a nice little trick for MySQL similar to that of my previous EE Article for SQLite (http://www.sqlite.org/), A SQLite Tidbit: Quick Numbers Table Generation (http://www.experts-exchange.com/A_3570.htm…
Introduction In this installment of my SQL tidbits, I will be looking at parsing Extensible Markup Language (XML) directly passed as string parameters to MySQL 5.1.5 or higher. These would be instances where LOAD_FILE (http://dev.mysql.com/doc/refm…
A short tutorial showing how to set up an email signature in Outlook on the Web (previously known as OWA). For free email signatures designs, visit https://www.mail-signatures.com/articles/signature-templates/?sts=6651 If you want to manage em…
Exchange organizations may use the Journaling Agent of the Transport Service to archive messages going through Exchange. However, if the Transport Service is integrated with some email content management application (such as an antispam), the admini…

749 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question