Solved

MYSQL FullTextSearch Doesnt Return for all matches

Posted on 2013-01-04
7
425 Views
Last Modified: 2013-01-07
Hi Guys,

I have this query.

SELECT ItemId,MenuId,SubMenuId,Text, PageTitle,
MATCH(Content) AGAINST ('software') AS SearchScore
FROM SubMenuItems WHERE MATCH(Content) AGAINST ('software')
ORDER BY SearchScore DESC

now running it agains a fulltext enabled myisam table.

This is in the content column.

We are located in Central New Jersey. Our support team is also located in Central New Jersey. You will always get a perfect english speaking person on the phone when you call us. We will be able to help you remotely using the software TeamViewer to access your computer and help you with your needs. TeamViewer is extremely safe, and no one can connect to your computer without your permission. You have to initiate a session. Each session has a different random generated password and id, to insure that no one that had access to your computer once, will be able to access it again without new information from you the customer.

The word software is in the text, but is not returned by my query.


if i do this query

SELECT ItemId,MenuId,SubMenuId,Text, PageTitle,
MATCH(Content) AGAINST ('person') AS SearchScore
FROM SubMenuItems WHERE MATCH(Content) AGAINST ('person')
ORDER BY SearchScore DESC

it does return the content. WHY? am i totally missing something here?
0
Comment
Question by:tim_carter
  • 3
  • 3
7 Comments
 
LVL 24

Accepted Solution

by:
johanntagle earned 500 total points
ID: 38748221
I'm assuming this is a very small table?  Please see the latter paragraphs at http://dev.mysql.com/doc/refman/5.0/en/fulltext-natural-language.html.  To quote from there:

Every correct word in the collection and in the query is weighted according to its significance in the collection or query. Consequently, a word that is present in many documents has a lower weight (and may even have a zero weight), because it has lower semantic value in this particular collection. Conversely, if the word is rare, it receives a higher weight. The weights of the words are combined to compute the relevance of the row.

Such a technique works best with large collections (in fact, it was carefully tuned this way). For very small tables, word distribution does not adequately reflect their semantic value, and this model may sometimes produce bizarre results.

If you want a simple keyword search use boolean mode see http://dev.mysql.com/doc/refman/5.0/en/fulltext-boolean.html.  Basically your match clause will be something like:

 match(content) against ('+software' in boolean mode);
0
 
LVL 16

Expert Comment

by:theo kouwenhoven
ID: 38750250
Try

WHERE Content LIKE '%software%')
0
 
LVL 1

Author Closing Comment

by:tim_carter
ID: 38751604
I ended up doing like this

SELECT ItemId,MenuId,SubMenuId,Text, PageTitle,
                MATCH(Content, Code, Text, PageTitle) AGAINST ('+".$rSearchFor."' in boolean mode) AS SearchScore
            FROM SubMenuItems
            WHERE
                MATCH(Content, Code, Text, PageTitle) AGAINST ('+".$rSearchFor."' in boolean mode)
                    OR
                MATCH(Content, Code, Text, PageTitle) AGAINST ('".$rSearchFor."')    
                    OR
                Content LIKE '%".$rSearchFor."%'
                    OR
                Text LIKE '%".$rSearchFor."%'
                    OR
                PageTitle LIKE '%".$rSearchFor."%'
            ORDER BY
                SearchScore DESC

I will still be working on soundex though. thanks
0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 
LVL 24

Expert Comment

by:johanntagle
ID: 38752216
I wonder why you still need to use the LIKE '%keyword%' pattern.  That basically kills the benefits of using a fulltext index as MySQL will now need to do a full table scan every time.  Is it because of the minimum number of chars a word must have to be indexed by fulltext and/or certain reserved words and numbers it won't index?  I think you should finetune fulltext and know how to work around the limitations instead.  For example - fulltext won't allow you to search for "c++".  The workaround is to create another column with the same content but with "c++" replaced by "cplusplus", index that column instead of the original, but display the original when showing results.  Of course that also requires you to replace the user-entered search params with the same.  Last time I checked using a fulltext index will cause MySQL to ignore any index in another column and won't index pure numbers, so one-time when I also needed to add a "where company_id=12345" in my select statement, I added a "companyid12345" in my extra column instead and adjusted my match clause to include it in the boolean search.

Alternatively, you can employ a 3rd party engine like sphinx (http://sphinxsearch.com) instead.
0
 
LVL 1

Author Comment

by:tim_carter
ID: 38753092
well i was just doing a simple search like 'each'. That did not get returned by fulltext.
0
 
LVL 24

Expert Comment

by:johanntagle
ID: 38753231
It's because 'each' is a stopword.  See http://dev.mysql.com/doc/refman/5.1/en/fulltext-stopwords.html.  Also see http://dev.mysql.com/doc/refman/5.1/en/fulltext-fine-tuning.html for how to edit/override the stopword list and other fine-tuning methods.
0
 
LVL 1

Author Comment

by:tim_carter
ID: 38753249
haha ok, thanks so muc
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

Many companies are looking to get out of the datacenter business and to services like Microsoft Azure to provide Infrastructure as a Service (IaaS) solutions for legacy client server workloads, rather than continuing to make capital investments in h…
Password hashing is better than message digests or encryption, and you should be using it instead of message digests or encryption.  Find out why and how in this article, which supplements the original article on PHP Client Registration, Login, Logo…
Video by: Steve
Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…

708 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

13 Experts available now in Live!

Get 1:1 Help Now