Solved

Search for dictionary words in another table

Posted on 2010-11-26
20
266 Views
Last Modified: 2012-06-27
I have one table with dictionary words (4000 words) like this:

tableDictionary
DicID
Title
Description
Hits

And I have a table with content (articles).
tableContent
ContentID
Title
Shorttext
Fulltext

I need to find what Titles from tableDictdionary is actually found in tableContent.Title,Shorttext and fulltext and if found update tableDictionary.Hits with number of occurences. 0,1,2,3,4 ...

Both tables are indexed fulltext if this help?

I just need to run this query once to update the tableDictionary.Hits column.

Pls ask me if you need more info.

I need full code that is working.

Best regards

Sjur
0
Comment
Question by:wahooo
  • 8
  • 8
  • 4
20 Comments
 
LVL 10

Expert Comment

by:hosneylk
ID: 34216465
something like this maybe?
UPDATE TABLEDICTIONARY
SET hits = (SELECT ISNULL((LEN(TC.Title) - LEN(REPLACE(TC.Title, TABLEDICTIONARY.Title, ''))) / LEN(TABLEDICTIONARY.Title), 0) +
                   ISNULL((LEN(TC.Shorttext) - LEN(REPLACE(TC.Shorttext, TABLEDICTIONARY.Title, ''))) / LEN(TABLEDICTIONARY.Title), 0) +
                   ISNULL((LEN(TC.Fulltext) - LEN(REPLACE(TC.Fulltext, TABLEDICTIONARY.Title, ''))) / LEN(TABLEDICTIONARY.Title), 0)
            FROM tableContent TC)

Open in new window

0
 

Author Comment

by:wahooo
ID: 34216505
The problem is the shorttext and fulltext column is text datatype.

SJur

0
 
LVL 10

Expert Comment

by:hosneylk
ID: 34216519
how about casting it to varchar(max)

e.g.
cast (shorttext as varchar(max))
0
Optimizing Cloud Backup for Low Bandwidth

With cloud storage prices going down a growing number of SMBs start to use it for backup storage. Unfortunately, business data volume rarely fits the average Internet speed. This article provides an overview of main Internet speed challenges and reveals backup best practices.

 

Author Comment

by:wahooo
ID: 34216526
its too long article text... long articles, lots of text..
0
 
LVL 26

Expert Comment

by:Zberteoc
ID: 34217118
As I know starting with SQL 2005 the text type was deprecated and replaced with varchar(max). If your database is a restore from an earlier version it might be possible that the text type was just kept for backwards compatibility(this would be true for 2005 only, in 2008 there is no text data type anymore) in which case text is synonym to varchar(max). If that is the case the string functions should work on it.

If for some reason don't then you should create a temp table or a another permanent table where the shorttext and fulltext columns will be of varchar(max) type and insert to it from your original table and then do the search.

I would like to hear a little about how do you think that hit count should be build. Is it how many times you find title in, lets say, fulltext?
0
 

Author Comment

by:wahooo
ID: 34217390
I have convertet Text -> varchar(MAX).

But I get this error when running the query:
-------------
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
The statement has been terminated.
0
 
LVL 26

Expert Comment

by:Zberteoc
ID: 34217604
You have a subquery in either the select list or where clause or in the on clause that returns more than one row. That is not allowed, Think of something like:

SELECT * FROM YourTable
WHERE YouCoumn = (SELECT Col FROM AnotherTable)

In this cas the (SELECT Col FROM AnotherTable) query should not retun more than 1 value(row)

To make it easier to help you better post your query here.
0
 

Author Comment

by:wahooo
ID: 34217611
UPDATE utBFDic_copy
SET hits = (SELECT ISNULL((LEN(TC.Title) - LEN(REPLACE(TC.Title, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0) +
                   ISNULL((LEN(TC.Shorttext) - LEN(REPLACE(TC.Shorttext, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0) +
                   ISNULL((LEN(TC.Fulltext) - LEN(REPLACE(TC.Fulltext, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0)
            FROM utBFcontent_copy TC)
0
 
LVL 26

Expert Comment

by:Zberteoc
ID: 34217903
I thought you had to compare the title from one table in the other. Here you use the same table. ANyway as I said you cannot asssign to a field, hits, a set of values from the subquery.n It has to be only one value that is one row one column.

The
(SELECT ISNULL((LEN(TC.Title) - LEN(REPLACE(TC.Title, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0) +
                   ISNULL((LEN(TC.Shorttext) - LEN(REPLACE(TC.Shorttext, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0) +
                   ISNULL((LEN(TC.Fulltext) - LEN(REPLACE(TC.Fulltext, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0)
            FROM utBFcontent_copy TC)

Returns more than one rows. That is the problem.

I will ask againg. Can you be mroe exact to what you want to count as hits and how? I am not sure that query does the right thing.
0
 
LVL 10

Expert Comment

by:hosneylk
ID: 34218134
try adding a sum to it
0
 
LVL 10

Expert Comment

by:hosneylk
ID: 34218140

UPDATE utBFDic_copy
SET hits = (SELECT SUM(ISNULL((LEN(TC.Title) - LEN(REPLACE(TC.Title, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0)) +
                   SUM(ISNULL((LEN(TC.Shorttext) - LEN(REPLACE(TC.Shorttext, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0)) +
                   SUM(ISNULL((LEN(TC.Fulltext) - LEN(REPLACE(TC.Fulltext, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0))
            FROM utBFcontent_copy TC)

Open in new window

0
 

Author Comment

by:wahooo
ID: 34238174
hosneylk: I get this error message on your query..
----------------------------
Multiple columns are specified in an aggregated expression containing an outer reference. If an expression being aggregated contains an outer reference, then that outer reference must be the only column referenced in the expression.

0
 

Author Comment

by:wahooo
ID: 34255471
Zberteoc:

What I want is this:

In table utBFDic_copy I have about 4000 records. utBFDic_copy.Title is the key Title (Dictionary word).

I then have utBFContent_copy table with about 2000 articles. I want to find how many actual Dictionary words are found in this table. From columns utBFcontent_copy.Title, utBFcontent_copy.Shorttext and utBFContent_fulltext.
The number of occurences found I want to be inserted in utBFDic_copy.Hits.

Sjur
0
 
LVL 26

Expert Comment

by:Zberteoc
ID: 34259801
The big question is will this search be done by each word in title utBFDic_copy table all the rows in utBFContent_copy table?

If yes this looks like a CROSS JOIN to me, each row in the first table matched against all the rows in teh second table, which could take a very long time if the tables are big. The total rows will be the multiplication of the row numbers in the 2 tables.

Is this the case?
0
 

Author Comment

by:wahooo
ID: 34259853
yes this is the case. One way of looping through each title in utBFDic_copy table and match with all records in utBFContent_copy table.

this job I just need to do once. I can also do it directly on the db server, Do you have any solutions that you think will work?
0
 
LVL 26

Expert Comment

by:Zberteoc
ID: 34259977
Actually I think I found a way to avoid the cross join but it will still take a long tome to run.

I commented out the SELECT line because is not really needed unless you want to test the results first before execute teh update.
update t set
		Hits=occ.title_occurences
-- select occ.*		
from 
	utBFDic_copy t
	inner join
	(
		select
			ttl.Title,
			SUM(
				(	-- length of both texts
					len(cnt.Shorttext)+len(cnt.Fulltext)-
					-- minus the length of both texts after removing the Title
					len(replace(cnt.Shorttext, ttl.Title, ''))+len(replace(cnt.Fulltext, ttl.Title, ''))
				) -- the difference will be the total length occupied by all occurences of title in both columns
				 /
					-- divided by the length of Title will give the numbor of occurrences of Title in the 2 texts
					len(ttl.Title)
			)							as title_occurences
		from 
			 utBFDic_copy ttl
			 inner join utBFContent_copy cnt
				on cnt.Shorttext like '%'+ttl.Title+'%'
				or cnt.Fulltext like '%'+ttl.Title+'%'
		group by 
			ttl.Title
	) occ
		on occ.Title=t.Title

Open in new window

0
 
LVL 26

Expert Comment

by:Zberteoc
ID: 34260012
NO, no matter what you do you NEVER do a loop through the rows to match to another table in SQL. That is the performance killer by definition in SQL world.

There is always a way do use the set base approach, which as you can see in the code I posted is one single update statement using some joins.

Even if the joins take long time to execute it will be incomparable faster than the row by row approach.
0
 
LVL 26

Expert Comment

by:Zberteoc
ID: 34260036
I hope at least that the utBFDic_copy table has an index on Title column. If not create one right away before you execute the query. It will improve performance a lot.
0
 
LVL 26

Accepted Solution

by:
Zberteoc earned 500 total points
ID: 34260141
One small modification. I changed the inner join to a left join in the update so we can set to 0 the hits for Titles that are not found at all in the 2 texts (occ.title_occurences will be null in this case):
update t set
		Hits=isnull(occ.title_occurences,0)
-- select occ.*		
from 
	utBFDic_copy t
	left join
	(
		select
			ttl.Title,
			SUM(
				(	-- length of both texts
					len(cnt.Shorttext)+len(cnt.Fulltext)-
					-- minus the length of both texts after removing the Title
					len(replace(cnt.Shorttext, ttl.Title, ''))+len(replace(cnt.Fulltext, ttl.Title, ''))
				) -- the difference will be the total length occupied by all occurences of title in both columns
				 /
					-- divided by the length of Title will give the numbor of occurrences of Title in the 2 texts
					len(ttl.Title)
			)							as title_occurences
		from 
			 utBFDic_copy ttl
			 inner join utBFContent_copy cnt
				on cnt.Shorttext like '%'+ttl.Title+'%'
				or cnt.Fulltext like '%'+ttl.Title+'%'
		group by 
			ttl.Title
	) occ
		on occ.Title=t.Title

Open in new window

0
 

Author Comment

by:wahooo
ID: 34262031
ok let me try this tomorrow, thanks a lot so far. I will let you know tomorrow and of course give points if successful :)

0

Featured Post

Free Tool: SSL Checker

Scans your site and returns information about your SSL implementation and certificate. Helpful for debugging and validating your SSL configuration.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Long way back, we had to take help from third party tools in order to encrypt and decrypt data.  Gradually Microsoft understood the need for this feature and started to implement it by building functionality into SQL Server. Finally, with SQL 2008, …
In this article we will get to know that how can we recover deleted data if it happens accidently. We really can recover deleted rows if we know the time when data is deleted by using the transaction log.
Microsoft Active Directory, the widely used IT infrastructure, is known for its high risk of credential theft. The best way to test your Active Directory’s vulnerabilities to pass-the-ticket, pass-the-hash, privilege escalation, and malware attacks …
With Secure Portal Encryption, the recipient is sent a link to their email address directing them to the email laundry delivery page. From there, the recipient will be required to enter a user name and password to enter the page. Once the recipient …

827 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question