Solved

Search for dictionary words in another table

Posted on 2010-11-26
20
265 Views
Last Modified: 2012-06-27
I have one table with dictionary words (4000 words) like this:

tableDictionary
DicID
Title
Description
Hits

And I have a table with content (articles).
tableContent
ContentID
Title
Shorttext
Fulltext

I need to find what Titles from tableDictdionary is actually found in tableContent.Title,Shorttext and fulltext and if found update tableDictionary.Hits with number of occurences. 0,1,2,3,4 ...

Both tables are indexed fulltext if this help?

I just need to run this query once to update the tableDictionary.Hits column.

Pls ask me if you need more info.

I need full code that is working.

Best regards

Sjur
0
Comment
Question by:wahooo
  • 8
  • 8
  • 4
20 Comments
 
LVL 10

Expert Comment

by:hosneylk
ID: 34216465
something like this maybe?
UPDATE TABLEDICTIONARY
SET hits = (SELECT ISNULL((LEN(TC.Title) - LEN(REPLACE(TC.Title, TABLEDICTIONARY.Title, ''))) / LEN(TABLEDICTIONARY.Title), 0) +
                   ISNULL((LEN(TC.Shorttext) - LEN(REPLACE(TC.Shorttext, TABLEDICTIONARY.Title, ''))) / LEN(TABLEDICTIONARY.Title), 0) +
                   ISNULL((LEN(TC.Fulltext) - LEN(REPLACE(TC.Fulltext, TABLEDICTIONARY.Title, ''))) / LEN(TABLEDICTIONARY.Title), 0)
            FROM tableContent TC)

Open in new window

0
 

Author Comment

by:wahooo
ID: 34216505
The problem is the shorttext and fulltext column is text datatype.

SJur

0
 
LVL 10

Expert Comment

by:hosneylk
ID: 34216519
how about casting it to varchar(max)

e.g.
cast (shorttext as varchar(max))
0
PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

 

Author Comment

by:wahooo
ID: 34216526
its too long article text... long articles, lots of text..
0
 
LVL 26

Expert Comment

by:Zberteoc
ID: 34217118
As I know starting with SQL 2005 the text type was deprecated and replaced with varchar(max). If your database is a restore from an earlier version it might be possible that the text type was just kept for backwards compatibility(this would be true for 2005 only, in 2008 there is no text data type anymore) in which case text is synonym to varchar(max). If that is the case the string functions should work on it.

If for some reason don't then you should create a temp table or a another permanent table where the shorttext and fulltext columns will be of varchar(max) type and insert to it from your original table and then do the search.

I would like to hear a little about how do you think that hit count should be build. Is it how many times you find title in, lets say, fulltext?
0
 

Author Comment

by:wahooo
ID: 34217390
I have convertet Text -> varchar(MAX).

But I get this error when running the query:
-------------
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
The statement has been terminated.
0
 
LVL 26

Expert Comment

by:Zberteoc
ID: 34217604
You have a subquery in either the select list or where clause or in the on clause that returns more than one row. That is not allowed, Think of something like:

SELECT * FROM YourTable
WHERE YouCoumn = (SELECT Col FROM AnotherTable)

In this cas the (SELECT Col FROM AnotherTable) query should not retun more than 1 value(row)

To make it easier to help you better post your query here.
0
 

Author Comment

by:wahooo
ID: 34217611
UPDATE utBFDic_copy
SET hits = (SELECT ISNULL((LEN(TC.Title) - LEN(REPLACE(TC.Title, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0) +
                   ISNULL((LEN(TC.Shorttext) - LEN(REPLACE(TC.Shorttext, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0) +
                   ISNULL((LEN(TC.Fulltext) - LEN(REPLACE(TC.Fulltext, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0)
            FROM utBFcontent_copy TC)
0
 
LVL 26

Expert Comment

by:Zberteoc
ID: 34217903
I thought you had to compare the title from one table in the other. Here you use the same table. ANyway as I said you cannot asssign to a field, hits, a set of values from the subquery.n It has to be only one value that is one row one column.

The
(SELECT ISNULL((LEN(TC.Title) - LEN(REPLACE(TC.Title, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0) +
                   ISNULL((LEN(TC.Shorttext) - LEN(REPLACE(TC.Shorttext, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0) +
                   ISNULL((LEN(TC.Fulltext) - LEN(REPLACE(TC.Fulltext, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0)
            FROM utBFcontent_copy TC)

Returns more than one rows. That is the problem.

I will ask againg. Can you be mroe exact to what you want to count as hits and how? I am not sure that query does the right thing.
0
 
LVL 10

Expert Comment

by:hosneylk
ID: 34218134
try adding a sum to it
0
 
LVL 10

Expert Comment

by:hosneylk
ID: 34218140

UPDATE utBFDic_copy
SET hits = (SELECT SUM(ISNULL((LEN(TC.Title) - LEN(REPLACE(TC.Title, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0)) +
                   SUM(ISNULL((LEN(TC.Shorttext) - LEN(REPLACE(TC.Shorttext, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0)) +
                   SUM(ISNULL((LEN(TC.Fulltext) - LEN(REPLACE(TC.Fulltext, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0))
            FROM utBFcontent_copy TC)

Open in new window

0
 

Author Comment

by:wahooo
ID: 34238174
hosneylk: I get this error message on your query..
----------------------------
Multiple columns are specified in an aggregated expression containing an outer reference. If an expression being aggregated contains an outer reference, then that outer reference must be the only column referenced in the expression.

0
 

Author Comment

by:wahooo
ID: 34255471
Zberteoc:

What I want is this:

In table utBFDic_copy I have about 4000 records. utBFDic_copy.Title is the key Title (Dictionary word).

I then have utBFContent_copy table with about 2000 articles. I want to find how many actual Dictionary words are found in this table. From columns utBFcontent_copy.Title, utBFcontent_copy.Shorttext and utBFContent_fulltext.
The number of occurences found I want to be inserted in utBFDic_copy.Hits.

Sjur
0
 
LVL 26

Expert Comment

by:Zberteoc
ID: 34259801
The big question is will this search be done by each word in title utBFDic_copy table all the rows in utBFContent_copy table?

If yes this looks like a CROSS JOIN to me, each row in the first table matched against all the rows in teh second table, which could take a very long time if the tables are big. The total rows will be the multiplication of the row numbers in the 2 tables.

Is this the case?
0
 

Author Comment

by:wahooo
ID: 34259853
yes this is the case. One way of looping through each title in utBFDic_copy table and match with all records in utBFContent_copy table.

this job I just need to do once. I can also do it directly on the db server, Do you have any solutions that you think will work?
0
 
LVL 26

Expert Comment

by:Zberteoc
ID: 34259977
Actually I think I found a way to avoid the cross join but it will still take a long tome to run.

I commented out the SELECT line because is not really needed unless you want to test the results first before execute teh update.
update t set
		Hits=occ.title_occurences
-- select occ.*		
from 
	utBFDic_copy t
	inner join
	(
		select
			ttl.Title,
			SUM(
				(	-- length of both texts
					len(cnt.Shorttext)+len(cnt.Fulltext)-
					-- minus the length of both texts after removing the Title
					len(replace(cnt.Shorttext, ttl.Title, ''))+len(replace(cnt.Fulltext, ttl.Title, ''))
				) -- the difference will be the total length occupied by all occurences of title in both columns
				 /
					-- divided by the length of Title will give the numbor of occurrences of Title in the 2 texts
					len(ttl.Title)
			)							as title_occurences
		from 
			 utBFDic_copy ttl
			 inner join utBFContent_copy cnt
				on cnt.Shorttext like '%'+ttl.Title+'%'
				or cnt.Fulltext like '%'+ttl.Title+'%'
		group by 
			ttl.Title
	) occ
		on occ.Title=t.Title

Open in new window

0
 
LVL 26

Expert Comment

by:Zberteoc
ID: 34260012
NO, no matter what you do you NEVER do a loop through the rows to match to another table in SQL. That is the performance killer by definition in SQL world.

There is always a way do use the set base approach, which as you can see in the code I posted is one single update statement using some joins.

Even if the joins take long time to execute it will be incomparable faster than the row by row approach.
0
 
LVL 26

Expert Comment

by:Zberteoc
ID: 34260036
I hope at least that the utBFDic_copy table has an index on Title column. If not create one right away before you execute the query. It will improve performance a lot.
0
 
LVL 26

Accepted Solution

by:
Zberteoc earned 500 total points
ID: 34260141
One small modification. I changed the inner join to a left join in the update so we can set to 0 the hits for Titles that are not found at all in the 2 texts (occ.title_occurences will be null in this case):
update t set
		Hits=isnull(occ.title_occurences,0)
-- select occ.*		
from 
	utBFDic_copy t
	left join
	(
		select
			ttl.Title,
			SUM(
				(	-- length of both texts
					len(cnt.Shorttext)+len(cnt.Fulltext)-
					-- minus the length of both texts after removing the Title
					len(replace(cnt.Shorttext, ttl.Title, ''))+len(replace(cnt.Fulltext, ttl.Title, ''))
				) -- the difference will be the total length occupied by all occurences of title in both columns
				 /
					-- divided by the length of Title will give the numbor of occurrences of Title in the 2 texts
					len(ttl.Title)
			)							as title_occurences
		from 
			 utBFDic_copy ttl
			 inner join utBFContent_copy cnt
				on cnt.Shorttext like '%'+ttl.Title+'%'
				or cnt.Fulltext like '%'+ttl.Title+'%'
		group by 
			ttl.Title
	) occ
		on occ.Title=t.Title

Open in new window

0
 

Author Comment

by:wahooo
ID: 34262031
ok let me try this tomorrow, thanks a lot so far. I will let you know tomorrow and of course give points if successful :)

0

Featured Post

Announcing the Most Valuable Experts of 2016

MVEs are more concerned with the satisfaction of those they help than with the considerable points they can earn. They are the types of people you feel privileged to call colleagues. Join us in honoring this amazing group of Experts.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In SQL Server, when rows are selected from a table, does it retrieve data in the order in which it is inserted?  Many believe this is the case. Let us try to examine for ourselves with an example. To get started, use the following script, wh…
How to leverage one TLS certificate to encrypt Microsoft SQL traffic and Remote Desktop Services, versus creating multiple tickets for the same server.
This Micro Tutorial will give you a basic overview how to record your screen with Microsoft Expression Encoder. This program is still free and open for the public to download. This will be demonstrated using Microsoft Expression Encoder 4.
Windows 10 is mostly good. However the one thing that annoys me is how many clicks you have to do to dial a VPN connection. You have to go to settings from the start menu, (2 clicks), Network and Internet (1 click), Click VPN (another click) then fi…

813 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

9 Experts available now in Live!

Get 1:1 Help Now