Link to home
Start Free TrialLog in
Avatar of LeeDerbyshire
LeeDerbyshireFlag for United Kingdom of Great Britain and Northern Ireland

asked on

How to generate permutations/combinations of search terms for SQL statements.

Hello all. I have a problem that I expect some of you have seen many times. I want to search descriptions in a SQL 2000 table for the best match with a number of search terms. Suppose someone supplies (for example) five words, I'd like to find rows where all five are present, then rows where any four are present, then rows where any three are present. And so on. I'm sure there's a name for this sort of thing, but my own searches haven't found anything that describes exactly what I'm looking for. Have any of you done this before? Since it's SQL 2000, I'm guessing there aren't any nice built-in functions for this. If someone can even describe how to generate the combinations, I can probably come up with the SQL by myself.
Avatar of EvilPostIt
EvilPostIt
Flag of United Kingdom of Great Britain and Northern Ireland image

Please could you give the structure of your search table?
Avatar of LeeDerbyshire

ASKER

The table is a stock table in our ERP system. There are lots of fields, most of which I'm not interested in, but I would like to get back the stock code field SC01001 based on the results of a search in a description split across two fields named SC01002 (first line of description) and SC01003 (second line). I only mention these field names in case you have it in mind to suggest a complete SQL statement. If not, then the names aren't really important.
Is this the sort of thing you want? This populates one table with a list of sentences and another table with words to find and then enumerates how many of the search words exist in each of the sentences...

create table #temp1 (descrip varchar(255))

create table #search (string varchar(255))

insert into #temp1 values ('this may only be test data')
insert into #temp1 values ('hello may i only say bye')
insert into #temp1 values ('this is only bye')
insert into #temp1 values ('hello')
insert into #temp1 values ('good')

insert into #search values('this')
insert into #search values('only')
insert into #search values('bye')
insert into #search values('data')
insert into #search values('test')

select t.descrip,COUNT(*) from #temp1 t
join (select * from #search) s on charindex(s.string,t.descrip,0) >0
group by t.descrip

Open in new window


As you can see from the join this also joins the second table based upon a select.
Thanks, I'll try it out on Monday.
SOLUTION
Avatar of TommySzalapski
TommySzalapski
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Thanks. I'd started thinking about for .. next loops, and couldn't find a way to do it. Then I thought I'd need something recursive, but my mind started falling apart. So, since I've earned some credit here, I thought I'd ask.

Funny, I thought it would be fairly easy at first...
Have i got the wrong idea here then? The code that I posted does this in a single select with a sub-query rather than having to loop at all...
I don't know... I don't know enough SQL to be able to tell by looking at it what it is doing. And I won't have the time to study it and try it out until next week.

I only mentioned loops because that was the first way I started thinking about creating combinations of terms.

I'm trying to create something where a user will search for, say, a five-word description of something. If all five words are found in a record, then great - just display it. But if not, move on to try to find four of the words. If four are not found, then try three. And so on.
Did you want me to take you through my code and explain in a little more detail?
I should be able to work it out when I have the time to go through it slowly. If I get stuck, I'll be sure to ask.
Ok, just let me know. More than happy to assist if you need it.
EvilPostIt, yours doesn't do quite the same thing mine does. Mine finds the sets of all possible combinations of words. So if you supply 4 words, it will find everything with:
word1
word2
etc
word1, word2, word3
word1, word2, word4
word1, word3, word4
etc

All separately. I think yours finds all the ones matching 3 words at the same time.

I'm not sure which works best for the asker. Mine is also just pseudo code since I'm not an expert at SQL (I'm here from the misc. programming zone, and for the math since he said "combinations")
Mine finds all matches, whether it be 1 match or 5 matches. The second column displays how many matches there were. The create statements and the insert statements setup the environment.

The select statement give the following output.

User generated image
I'm trying to do something like, say, eBay. The user will search for (e.g.)
'big red wooden cube'
If it is found in the table, then it will display it. But if not, then it will do something like:
Found:
2 results for 'big red wooden'
1 result for 'big red cube'
3 results for 'red wooden cube'
10 results for 'big red'
12 results for 'big wooden'
11 results for 'big cube'
13 results for 'red wooden'
and so on.
I thought it would be easy to express this in code, but I discovered it's beyond me.
My algorithm will do exactly that (just count how many items are in resultset and print the items from wordsUsed) but I don't have the knowledge to turn it into SQL. I'll leave that to EvilPostIt if he wants.

Note of course that if you specify N words, you will get 2^N - 1 lines of output (10 words will give 1023, 20 words will give just over a million) so keep that in mind. You may want to put a limit on how many words it will allow so you don't get stuck running forever and need to kill things.
Okay, thanks. I can probably make some crude SQL from that, but I'm sure EvilPostIt could come up with something more elegant than I could.

Anyway, I'm at home now, so I won't get to try anything until Monday.
Hmmmm, the only issue im having at the moment is displaying the matching words due to the aggregate statement...

Trying to do this in a single line of code is the issue here... Having a play now.
I think i can do it but unfortunately you are using SQL 2000, i need to use the xml function which exists in 2005+
There's no need for a single line of sql, unless that is more satisfying :) .  But yes, SQL 2000 does limit things a little.

I'd have thought this was a common thing to want to do, but I don't even know the name for it, so my searches haven't found much.
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Well, my project suddenly got a bit more complicated, and I now have to do a bit more work with my search terms before I can send them to SQL. But the suggestions here were looking very promising before the project team moved the goalposts.