Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people, just like you, are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
Solved

How to generate permutations/combinations of search terms for SQL statements.

Posted on 2013-06-28
22
1,401 Views
Last Modified: 2013-07-04
Hello all. I have a problem that I expect some of you have seen many times. I want to search descriptions in a SQL 2000 table for the best match with a number of search terms. Suppose someone supplies (for example) five words, I'd like to find rows where all five are present, then rows where any four are present, then rows where any three are present. And so on. I'm sure there's a name for this sort of thing, but my own searches haven't found anything that describes exactly what I'm looking for. Have any of you done this before? Since it's SQL 2000, I'm guessing there aren't any nice built-in functions for this. If someone can even describe how to generate the combinations, I can probably come up with the SQL by myself.
0
Comment
Question by:LeeDerbyshire
  • 9
  • 9
  • 3
  • +1
22 Comments
 
LVL 16

Expert Comment

by:EvilPostIt
ID: 39283862
Please could you give the structure of your search table?
0
 
LVL 31

Author Comment

by:LeeDerbyshire
ID: 39283874
The table is a stock table in our ERP system. There are lots of fields, most of which I'm not interested in, but I would like to get back the stock code field SC01001 based on the results of a search in a description split across two fields named SC01002 (first line of description) and SC01003 (second line). I only mention these field names in case you have it in mind to suggest a complete SQL statement. If not, then the names aren't really important.
0
 
LVL 16

Expert Comment

by:EvilPostIt
ID: 39283945
Is this the sort of thing you want? This populates one table with a list of sentences and another table with words to find and then enumerates how many of the search words exist in each of the sentences...

create table #temp1 (descrip varchar(255))

create table #search (string varchar(255))

insert into #temp1 values ('this may only be test data')
insert into #temp1 values ('hello may i only say bye')
insert into #temp1 values ('this is only bye')
insert into #temp1 values ('hello')
insert into #temp1 values ('good')

insert into #search values('this')
insert into #search values('only')
insert into #search values('bye')
insert into #search values('data')
insert into #search values('test')

select t.descrip,COUNT(*) from #temp1 t
join (select * from #search) s on charindex(s.string,t.descrip,0) >0
group by t.descrip

Open in new window


As you can see from the join this also joins the second table based upon a select.
0
Free learning courses: Active Directory Deep Dive

Get a firm grasp on your IT environment when you learn Active Directory best practices with Veeam! Watch all, or choose any amount, of this three-part webinar series to improve your skills. From the basics to virtualization and backup, we got you covered.

 
LVL 31

Author Comment

by:LeeDerbyshire
ID: 39283957
Thanks, I'll try it out on Monday.
0
 
LVL 37

Assisted Solution

by:TommySzalapski
TommySzalapski earned 166 total points
ID: 39284038
The most efficient way to do this would probably be to do the single word searches first (easy for loop) and then keep joining those results to get the rest of the sets.
A recursive function like this would get you there.
The basic idea is you get each single word results and then go through the list branching two ways each time (so you hit all possible combinations) and joining the current list with the one for the current word in the list (if that word is to be added).

Function Findall()
{
  for i from 1 to N (where n is the number of words)
    Find word[i]
    Save results into table[i]
  end for

  results = [empty set]
  wordsUsed = [empty list]
  JoinAll(1, results, wordsUsed)
}

Function JoinAll(index, results, wordsUsed)
{
  if (index == N+1) then
    output results and wordsUsed
    quit function
  end if
  //branch one, do not include this one
  JoinAll(index+1, results, wordsUsed)
  //branch two, include this one
  if (wordsUsed is empty) then
    results = table[index]
  else
    join results with table[index]
  end if
  add words[i] to wordsUsed
  JoinAll(index + 1, results, wordsUsed
}

Open in new window

0
 
LVL 31

Author Comment

by:LeeDerbyshire
ID: 39284078
Thanks. I'd started thinking about for .. next loops, and couldn't find a way to do it. Then I thought I'd need something recursive, but my mind started falling apart. So, since I've earned some credit here, I thought I'd ask.

Funny, I thought it would be fairly easy at first...
0
 
LVL 16

Expert Comment

by:EvilPostIt
ID: 39284111
Have i got the wrong idea here then? The code that I posted does this in a single select with a sub-query rather than having to loop at all...
0
 
LVL 31

Author Comment

by:LeeDerbyshire
ID: 39284197
I don't know... I don't know enough SQL to be able to tell by looking at it what it is doing. And I won't have the time to study it and try it out until next week.

I only mentioned loops because that was the first way I started thinking about creating combinations of terms.

I'm trying to create something where a user will search for, say, a five-word description of something. If all five words are found in a record, then great - just display it. But if not, move on to try to find four of the words. If four are not found, then try three. And so on.
0
 
LVL 16

Expert Comment

by:EvilPostIt
ID: 39284205
Did you want me to take you through my code and explain in a little more detail?
0
 
LVL 31

Author Comment

by:LeeDerbyshire
ID: 39284224
I should be able to work it out when I have the time to go through it slowly. If I get stuck, I'll be sure to ask.
0
 
LVL 16

Expert Comment

by:EvilPostIt
ID: 39284228
Ok, just let me know. More than happy to assist if you need it.
0
 
LVL 37

Expert Comment

by:TommySzalapski
ID: 39284312
EvilPostIt, yours doesn't do quite the same thing mine does. Mine finds the sets of all possible combinations of words. So if you supply 4 words, it will find everything with:
word1
word2
etc
word1, word2, word3
word1, word2, word4
word1, word3, word4
etc

All separately. I think yours finds all the ones matching 3 words at the same time.

I'm not sure which works best for the asker. Mine is also just pseudo code since I'm not an expert at SQL (I'm here from the misc. programming zone, and for the math since he said "combinations")
0
 
LVL 16

Expert Comment

by:EvilPostIt
ID: 39284332
Mine finds all matches, whether it be 1 match or 5 matches. The second column displays how many matches there were. The create statements and the insert statements setup the environment.

The select statement give the following output.

Query output.
0
 
LVL 31

Author Comment

by:LeeDerbyshire
ID: 39284338
I'm trying to do something like, say, eBay. The user will search for (e.g.)
'big red wooden cube'
If it is found in the table, then it will display it. But if not, then it will do something like:
Found:
2 results for 'big red wooden'
1 result for 'big red cube'
3 results for 'red wooden cube'
10 results for 'big red'
12 results for 'big wooden'
11 results for 'big cube'
13 results for 'red wooden'
and so on.
I thought it would be easy to express this in code, but I discovered it's beyond me.
0
 
LVL 37

Expert Comment

by:TommySzalapski
ID: 39284355
My algorithm will do exactly that (just count how many items are in resultset and print the items from wordsUsed) but I don't have the knowledge to turn it into SQL. I'll leave that to EvilPostIt if he wants.

Note of course that if you specify N words, you will get 2^N - 1 lines of output (10 words will give 1023, 20 words will give just over a million) so keep that in mind. You may want to put a limit on how many words it will allow so you don't get stuck running forever and need to kill things.
0
 
LVL 31

Author Comment

by:LeeDerbyshire
ID: 39284433
Okay, thanks. I can probably make some crude SQL from that, but I'm sure EvilPostIt could come up with something more elegant than I could.

Anyway, I'm at home now, so I won't get to try anything until Monday.
0
 
LVL 16

Expert Comment

by:EvilPostIt
ID: 39284438
Hmmmm, the only issue im having at the moment is displaying the matching words due to the aggregate statement...

Trying to do this in a single line of code is the issue here... Having a play now.
0
 
LVL 16

Expert Comment

by:EvilPostIt
ID: 39284451
I think i can do it but unfortunately you are using SQL 2000, i need to use the xml function which exists in 2005+
0
 
LVL 31

Author Comment

by:LeeDerbyshire
ID: 39284478
There's no need for a single line of sql, unless that is more satisfying :) .  But yes, SQL 2000 does limit things a little.

I'd have thought this was a common thing to want to do, but I don't even know the name for it, so my searches haven't found much.
0
 
LVL 16

Assisted Solution

by:EvilPostIt
EvilPostIt earned 167 total points
ID: 39284570
This works but it uses a loop, if you can upgrade SQL Server 2000 (Which is now out of support btw) I'm pretty sure i can do this in a single statement.

create table #temp1 (descrip varchar(255))

create table #search (string varchar(255))

insert into #temp1 values ('this may only be test data')
insert into #temp1 values ('hello may i only say bye')
insert into #temp1 values ('this is only bye')
insert into #temp1 values ('hello')
insert into #temp1 values ('good')

insert into #search values('this')
insert into #search values('only')
insert into #search values('bye')
insert into #search values('data')
insert into #search values('test')

drop table #results
drop table #finalresults
create table #results (descrip varchar(255), string varchar(255))
create table #finalresults (descrip varchar(255), string varchar(255))

insert into #results
select t.descrip,s.string from #temp1 t
join (select string from #search) s on CHARINDEX(s.string,t.descrip,0)>0

select * from #results

declare @exit bit
declare @var1 varchar(max)
declare @descrip varchar(255)

set @exit=0

while (select COUNT(*) from #results)>0
begin
		select @descrip='',@var1=''
		select @descrip=descrip from #results
		select @var1=@var1+string+' ' from #results where descrip=@descrip
		insert into #finalresults values(@descrip,@var1)
		delete from #results where descrip=@descrip		
end
select * from #finalresults

select t.descrip,fr.string,COUNT(*) from #temp1 t
join (select string from #search) s on CHARINDEX(s.string,t.descrip,0)>0
join #finalresults fr on t.descrip=fr.descrip
group by t.descrip,fr.string

Open in new window


I'd have thought this was a common thing to want to do, but I don't even know the name for it, so my searches haven't found much.

Most is quite common apart from the concatenation of the search parameters as there is no aggregate function for this...
0
 
LVL 26

Accepted Solution

by:
Zberteoc earned 167 total points
ID: 39284803
How about this:

Step 1:
Create a function, fnTally, that generates a set of numbers from 1 to 10000. This will be used in parsing or other situations. Normally a table is generated for this purpose but the function will do just fine as it gives the same performance or better. The function:
CREATE FUNCTION [dbo].[fnTally]()
RETURNS TABLE --WITH SCHEMABINDING 
AS
/*******************************************************************************\
Function	: fnTally

Purpose		: returns a set with numbers from 1 to 10,000 
			  to be used in parsing and sequential data generation whithout loop
			  
Parameters	: no parameters

Invoke		:
	
		select * from [dbo].[fnTally]()
		select N from [dbo].[fnTally]()
		select substring('abcdef',N,1) as chr from [dbo].[fnTally]() where N<len('abcdef') -- parsing a string
		select dateadd(dd, N, '2007-01-01') as dte from [dbo].[fnTally]() --gets dates for about 30 years

\*******************************************************************************/
RETURN
	WITH 
	E1(N) AS 
	( --10E+1 or 10 rows
		 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
		 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
		 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
	),                         
   E2(N) AS 
   ( --10E+2 or 100 rows	
		SELECT 1 FROM E1 a, E1 b
	),
   E4(N) AS 
   ( --10E+4 or 10,000 rows max
		SELECT 1 FROM E2 a, E2 b
	)
			 SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) as N FROM E4
	;

Open in new window

The function above will be used in another function that will parse a string based on a separator to a data set:
CREATE FUNCTION [dbo].[fnParseStringToSet]
(
	@str VARCHAR(8000), 
	@sep CHAR(1)
)
--WARNING!!! DO NOT USE MAX DATA-TYPES HERE!  IT KILLS PERFORMANCE!
RETURNS TABLE 
AS
/*******************************************************************************\
Function	: fnParseStringToSet

Purpose		: parses a string by a separator and returns a two columns set with
				each element and its position; uses the fnTally for parsing
			  
Parameters	: @str - the string to parse
			  @sep - the separator character

Invoke		:
	
		select * from [dbo].[fnParseStringToSet]('ab cd ef cd ef cd ef cd ef cd ef cd ef cd ef cd ef cd ef cd ef ',' ')

\*******************************************************************************/
RETURN
	with cteStart(N1) AS 
	(--returns N+1 (starting position of each "element" just once for each delimiter)
		SELECT 0 UNION ALL
		SELECT t.N+1 FROM [dbo].[fnTally]() t WHERE SUBSTRING(@str,t.N,1) = @sep and t.N<=len(@str)
	),
	cteLen(N1,L1) AS
	(--returns start and length (for use in substring)
		 SELECT s.N1,
				ISNULL(NULLIF(CHARINDEX(@sep,@str,s.N1),0)-s.N1,8000)
		   FROM cteStart s
	)
	--so the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
	SELECT 
		ROW_NUMBER() OVER(ORDER BY l.N1)	as ElemID,
		SUBSTRING(@str, l.N1, l.L1)			as Elem
	FROM 
		cteLen l
	;

Open in new window


Now your query becomes(I used the same temp table as in the post above):
if object_id('tempdb..#temp1') is not null
	drop table #temp1
create table #temp1 (descrip varchar(255))

insert into #temp1 values ('this may only be test data')
insert into #temp1 values ('hello may i only say bye')
insert into #temp1 values ('this is only bye')
insert into #temp1 values ('hello')
insert into #temp1 values ('good')

declare @str varchar(8000)='may only be'

select * from [dbo].[fnParseStringToSet](@str,' ')

select 
	*,
	@str as s
from
	#temp1 t
	inner join [dbo].[fnParseStringToSet](@str,' ') w
		on t.descrip like '%'+w.Elem+'%'
order by 
	descrip

-- OR
select 
	descrip,
	count(*) as finds,
	@str as s
from
	#temp1 t
	inner join [dbo].[fnParseStringToSet](@str,' ') w
		on t.descrip like '%'+w.Elem+'%'
group by 
	descrip
order by 
	finds desc

Open in new window

0
 
LVL 31

Author Closing Comment

by:LeeDerbyshire
ID: 39299374
Well, my project suddenly got a bit more complicated, and I now have to do a bit more work with my search terms before I can send them to SQL. But the suggestions here were looking very promising before the project team moved the goalposts.
0

Featured Post

Master Your Team's Linux and Cloud Stack!

The average business loses $13.5M per year to ineffective training (per 1,000 employees). Keep ahead of the competition and combine in-person quality with online cost and flexibility by training with Linux Academy.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Whether you’re a college noob or a soon-to-be pro, these tips are sure to help you in your journey to becoming a programming ninja and stand out from the crowd.
International Data Corporation (IDC) prognosticates that before the current the year gets over disbursing on IT framework products to be sent in cloud environs will be $37.1B.
Via a live example, show how to set up a backup for SQL Server using a Maintenance Plan and how to schedule the job into SQL Server Agent.
Viewers will learn how to use the INSERT statement to insert data into their tables. It will also introduce the NULL statement, to show them what happens when no value is giving for any given column.

808 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question