Solved

How to generate permutations/combinations of search terms for SQL statements.

Posted on 2013-06-28
22
1,312 Views
Last Modified: 2013-07-04
Hello all. I have a problem that I expect some of you have seen many times. I want to search descriptions in a SQL 2000 table for the best match with a number of search terms. Suppose someone supplies (for example) five words, I'd like to find rows where all five are present, then rows where any four are present, then rows where any three are present. And so on. I'm sure there's a name for this sort of thing, but my own searches haven't found anything that describes exactly what I'm looking for. Have any of you done this before? Since it's SQL 2000, I'm guessing there aren't any nice built-in functions for this. If someone can even describe how to generate the combinations, I can probably come up with the SQL by myself.
0
Comment
Question by:LeeDerbyshire
  • 9
  • 9
  • 3
  • +1
22 Comments
 
LVL 16

Expert Comment

by:EvilPostIt
ID: 39283862
Please could you give the structure of your search table?
0
 
LVL 31

Author Comment

by:LeeDerbyshire
ID: 39283874
The table is a stock table in our ERP system. There are lots of fields, most of which I'm not interested in, but I would like to get back the stock code field SC01001 based on the results of a search in a description split across two fields named SC01002 (first line of description) and SC01003 (second line). I only mention these field names in case you have it in mind to suggest a complete SQL statement. If not, then the names aren't really important.
0
 
LVL 16

Expert Comment

by:EvilPostIt
ID: 39283945
Is this the sort of thing you want? This populates one table with a list of sentences and another table with words to find and then enumerates how many of the search words exist in each of the sentences...

create table #temp1 (descrip varchar(255))

create table #search (string varchar(255))

insert into #temp1 values ('this may only be test data')
insert into #temp1 values ('hello may i only say bye')
insert into #temp1 values ('this is only bye')
insert into #temp1 values ('hello')
insert into #temp1 values ('good')

insert into #search values('this')
insert into #search values('only')
insert into #search values('bye')
insert into #search values('data')
insert into #search values('test')

select t.descrip,COUNT(*) from #temp1 t
join (select * from #search) s on charindex(s.string,t.descrip,0) >0
group by t.descrip

Open in new window


As you can see from the join this also joins the second table based upon a select.
0
 
LVL 31

Author Comment

by:LeeDerbyshire
ID: 39283957
Thanks, I'll try it out on Monday.
0
 
LVL 37

Assisted Solution

by:TommySzalapski
TommySzalapski earned 166 total points
ID: 39284038
The most efficient way to do this would probably be to do the single word searches first (easy for loop) and then keep joining those results to get the rest of the sets.
A recursive function like this would get you there.
The basic idea is you get each single word results and then go through the list branching two ways each time (so you hit all possible combinations) and joining the current list with the one for the current word in the list (if that word is to be added).

Function Findall()
{
  for i from 1 to N (where n is the number of words)
    Find word[i]
    Save results into table[i]
  end for

  results = [empty set]
  wordsUsed = [empty list]
  JoinAll(1, results, wordsUsed)
}

Function JoinAll(index, results, wordsUsed)
{
  if (index == N+1) then
    output results and wordsUsed
    quit function
  end if
  //branch one, do not include this one
  JoinAll(index+1, results, wordsUsed)
  //branch two, include this one
  if (wordsUsed is empty) then
    results = table[index]
  else
    join results with table[index]
  end if
  add words[i] to wordsUsed
  JoinAll(index + 1, results, wordsUsed
}

Open in new window

0
 
LVL 31

Author Comment

by:LeeDerbyshire
ID: 39284078
Thanks. I'd started thinking about for .. next loops, and couldn't find a way to do it. Then I thought I'd need something recursive, but my mind started falling apart. So, since I've earned some credit here, I thought I'd ask.

Funny, I thought it would be fairly easy at first...
0
 
LVL 16

Expert Comment

by:EvilPostIt
ID: 39284111
Have i got the wrong idea here then? The code that I posted does this in a single select with a sub-query rather than having to loop at all...
0
 
LVL 31

Author Comment

by:LeeDerbyshire
ID: 39284197
I don't know... I don't know enough SQL to be able to tell by looking at it what it is doing. And I won't have the time to study it and try it out until next week.

I only mentioned loops because that was the first way I started thinking about creating combinations of terms.

I'm trying to create something where a user will search for, say, a five-word description of something. If all five words are found in a record, then great - just display it. But if not, move on to try to find four of the words. If four are not found, then try three. And so on.
0
 
LVL 16

Expert Comment

by:EvilPostIt
ID: 39284205
Did you want me to take you through my code and explain in a little more detail?
0
 
LVL 31

Author Comment

by:LeeDerbyshire
ID: 39284224
I should be able to work it out when I have the time to go through it slowly. If I get stuck, I'll be sure to ask.
0
 
LVL 16

Expert Comment

by:EvilPostIt
ID: 39284228
Ok, just let me know. More than happy to assist if you need it.
0
Highfive Gives IT Their Time Back

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

 
LVL 37

Expert Comment

by:TommySzalapski
ID: 39284312
EvilPostIt, yours doesn't do quite the same thing mine does. Mine finds the sets of all possible combinations of words. So if you supply 4 words, it will find everything with:
word1
word2
etc
word1, word2, word3
word1, word2, word4
word1, word3, word4
etc

All separately. I think yours finds all the ones matching 3 words at the same time.

I'm not sure which works best for the asker. Mine is also just pseudo code since I'm not an expert at SQL (I'm here from the misc. programming zone, and for the math since he said "combinations")
0
 
LVL 16

Expert Comment

by:EvilPostIt
ID: 39284332
Mine finds all matches, whether it be 1 match or 5 matches. The second column displays how many matches there were. The create statements and the insert statements setup the environment.

The select statement give the following output.

Query output.
0
 
LVL 31

Author Comment

by:LeeDerbyshire
ID: 39284338
I'm trying to do something like, say, eBay. The user will search for (e.g.)
'big red wooden cube'
If it is found in the table, then it will display it. But if not, then it will do something like:
Found:
2 results for 'big red wooden'
1 result for 'big red cube'
3 results for 'red wooden cube'
10 results for 'big red'
12 results for 'big wooden'
11 results for 'big cube'
13 results for 'red wooden'
and so on.
I thought it would be easy to express this in code, but I discovered it's beyond me.
0
 
LVL 37

Expert Comment

by:TommySzalapski
ID: 39284355
My algorithm will do exactly that (just count how many items are in resultset and print the items from wordsUsed) but I don't have the knowledge to turn it into SQL. I'll leave that to EvilPostIt if he wants.

Note of course that if you specify N words, you will get 2^N - 1 lines of output (10 words will give 1023, 20 words will give just over a million) so keep that in mind. You may want to put a limit on how many words it will allow so you don't get stuck running forever and need to kill things.
0
 
LVL 31

Author Comment

by:LeeDerbyshire
ID: 39284433
Okay, thanks. I can probably make some crude SQL from that, but I'm sure EvilPostIt could come up with something more elegant than I could.

Anyway, I'm at home now, so I won't get to try anything until Monday.
0
 
LVL 16

Expert Comment

by:EvilPostIt
ID: 39284438
Hmmmm, the only issue im having at the moment is displaying the matching words due to the aggregate statement...

Trying to do this in a single line of code is the issue here... Having a play now.
0
 
LVL 16

Expert Comment

by:EvilPostIt
ID: 39284451
I think i can do it but unfortunately you are using SQL 2000, i need to use the xml function which exists in 2005+
0
 
LVL 31

Author Comment

by:LeeDerbyshire
ID: 39284478
There's no need for a single line of sql, unless that is more satisfying :) .  But yes, SQL 2000 does limit things a little.

I'd have thought this was a common thing to want to do, but I don't even know the name for it, so my searches haven't found much.
0
 
LVL 16

Assisted Solution

by:EvilPostIt
EvilPostIt earned 167 total points
ID: 39284570
This works but it uses a loop, if you can upgrade SQL Server 2000 (Which is now out of support btw) I'm pretty sure i can do this in a single statement.

create table #temp1 (descrip varchar(255))

create table #search (string varchar(255))

insert into #temp1 values ('this may only be test data')
insert into #temp1 values ('hello may i only say bye')
insert into #temp1 values ('this is only bye')
insert into #temp1 values ('hello')
insert into #temp1 values ('good')

insert into #search values('this')
insert into #search values('only')
insert into #search values('bye')
insert into #search values('data')
insert into #search values('test')

drop table #results
drop table #finalresults
create table #results (descrip varchar(255), string varchar(255))
create table #finalresults (descrip varchar(255), string varchar(255))

insert into #results
select t.descrip,s.string from #temp1 t
join (select string from #search) s on CHARINDEX(s.string,t.descrip,0)>0

select * from #results

declare @exit bit
declare @var1 varchar(max)
declare @descrip varchar(255)

set @exit=0

while (select COUNT(*) from #results)>0
begin
		select @descrip='',@var1=''
		select @descrip=descrip from #results
		select @var1=@var1+string+' ' from #results where descrip=@descrip
		insert into #finalresults values(@descrip,@var1)
		delete from #results where descrip=@descrip		
end
select * from #finalresults

select t.descrip,fr.string,COUNT(*) from #temp1 t
join (select string from #search) s on CHARINDEX(s.string,t.descrip,0)>0
join #finalresults fr on t.descrip=fr.descrip
group by t.descrip,fr.string

Open in new window


I'd have thought this was a common thing to want to do, but I don't even know the name for it, so my searches haven't found much.

Most is quite common apart from the concatenation of the search parameters as there is no aggregate function for this...
0
 
LVL 26

Accepted Solution

by:
Zberteoc earned 167 total points
ID: 39284803
How about this:

Step 1:
Create a function, fnTally, that generates a set of numbers from 1 to 10000. This will be used in parsing or other situations. Normally a table is generated for this purpose but the function will do just fine as it gives the same performance or better. The function:
CREATE FUNCTION [dbo].[fnTally]()
RETURNS TABLE --WITH SCHEMABINDING 
AS
/*******************************************************************************\
Function	: fnTally

Purpose		: returns a set with numbers from 1 to 10,000 
			  to be used in parsing and sequential data generation whithout loop
			  
Parameters	: no parameters

Invoke		:
	
		select * from [dbo].[fnTally]()
		select N from [dbo].[fnTally]()
		select substring('abcdef',N,1) as chr from [dbo].[fnTally]() where N<len('abcdef') -- parsing a string
		select dateadd(dd, N, '2007-01-01') as dte from [dbo].[fnTally]() --gets dates for about 30 years

\*******************************************************************************/
RETURN
	WITH 
	E1(N) AS 
	( --10E+1 or 10 rows
		 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
		 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
		 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
	),                         
   E2(N) AS 
   ( --10E+2 or 100 rows	
		SELECT 1 FROM E1 a, E1 b
	),
   E4(N) AS 
   ( --10E+4 or 10,000 rows max
		SELECT 1 FROM E2 a, E2 b
	)
			 SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) as N FROM E4
	;

Open in new window

The function above will be used in another function that will parse a string based on a separator to a data set:
CREATE FUNCTION [dbo].[fnParseStringToSet]
(
	@str VARCHAR(8000), 
	@sep CHAR(1)
)
--WARNING!!! DO NOT USE MAX DATA-TYPES HERE!  IT KILLS PERFORMANCE!
RETURNS TABLE 
AS
/*******************************************************************************\
Function	: fnParseStringToSet

Purpose		: parses a string by a separator and returns a two columns set with
				each element and its position; uses the fnTally for parsing
			  
Parameters	: @str - the string to parse
			  @sep - the separator character

Invoke		:
	
		select * from [dbo].[fnParseStringToSet]('ab cd ef cd ef cd ef cd ef cd ef cd ef cd ef cd ef cd ef cd ef ',' ')

\*******************************************************************************/
RETURN
	with cteStart(N1) AS 
	(--returns N+1 (starting position of each "element" just once for each delimiter)
		SELECT 0 UNION ALL
		SELECT t.N+1 FROM [dbo].[fnTally]() t WHERE SUBSTRING(@str,t.N,1) = @sep and t.N<=len(@str)
	),
	cteLen(N1,L1) AS
	(--returns start and length (for use in substring)
		 SELECT s.N1,
				ISNULL(NULLIF(CHARINDEX(@sep,@str,s.N1),0)-s.N1,8000)
		   FROM cteStart s
	)
	--so the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
	SELECT 
		ROW_NUMBER() OVER(ORDER BY l.N1)	as ElemID,
		SUBSTRING(@str, l.N1, l.L1)			as Elem
	FROM 
		cteLen l
	;

Open in new window


Now your query becomes(I used the same temp table as in the post above):
if object_id('tempdb..#temp1') is not null
	drop table #temp1
create table #temp1 (descrip varchar(255))

insert into #temp1 values ('this may only be test data')
insert into #temp1 values ('hello may i only say bye')
insert into #temp1 values ('this is only bye')
insert into #temp1 values ('hello')
insert into #temp1 values ('good')

declare @str varchar(8000)='may only be'

select * from [dbo].[fnParseStringToSet](@str,' ')

select 
	*,
	@str as s
from
	#temp1 t
	inner join [dbo].[fnParseStringToSet](@str,' ') w
		on t.descrip like '%'+w.Elem+'%'
order by 
	descrip

-- OR
select 
	descrip,
	count(*) as finds,
	@str as s
from
	#temp1 t
	inner join [dbo].[fnParseStringToSet](@str,' ') w
		on t.descrip like '%'+w.Elem+'%'
group by 
	descrip
order by 
	finds desc

Open in new window

0
 
LVL 31

Author Closing Comment

by:LeeDerbyshire
ID: 39299374
Well, my project suddenly got a bit more complicated, and I now have to do a bit more work with my search terms before I can send them to SQL. But the suggestions here were looking very promising before the project team moved the goalposts.
0

Featured Post

Complete VMware vSphere® ESX(i) & Hyper-V Backup

Capture your entire system, including the host, with patented disk imaging integrated with VMware VADP / Microsoft VSS and RCT. RTOs is as low as 15 seconds with Acronis Active Restore™. You can enjoy unlimited P2V/V2V migrations from any source (even from a different hypervisor)

Join & Write a Comment

This article describes how to use the timestamp of existing data in a database to allow Tableau to calculate the prior work day instead of relying on case statements or if statements to calculate the days of the week.
For both online and offline retail, the cross-channel business is the most recent pattern in the B2C trade space.
Viewers will learn how the fundamental information of how to create a table.
Viewers will learn how to use the SELECT statement in SQL to return specific rows and columns, with various degrees of sorting and limits in place.

758 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

18 Experts available now in Live!

Get 1:1 Help Now