Solved

Search for dictionary words in another table

Posted on 2010-11-26
20
264 Views
Last Modified: 2012-06-27
I have one table with dictionary words (4000 words) like this:

tableDictionary
DicID
Title
Description
Hits

And I have a table with content (articles).
tableContent
ContentID
Title
Shorttext
Fulltext

I need to find what Titles from tableDictdionary is actually found in tableContent.Title,Shorttext and fulltext and if found update tableDictionary.Hits with number of occurences. 0,1,2,3,4 ...

Both tables are indexed fulltext if this help?

I just need to run this query once to update the tableDictionary.Hits column.

Pls ask me if you need more info.

I need full code that is working.

Best regards

Sjur
0
Comment
Question by:wahooo
  • 8
  • 8
  • 4
20 Comments
 
LVL 10

Expert Comment

by:hosneylk
Comment Utility
something like this maybe?
UPDATE TABLEDICTIONARY

SET hits = (SELECT ISNULL((LEN(TC.Title) - LEN(REPLACE(TC.Title, TABLEDICTIONARY.Title, ''))) / LEN(TABLEDICTIONARY.Title), 0) +

                   ISNULL((LEN(TC.Shorttext) - LEN(REPLACE(TC.Shorttext, TABLEDICTIONARY.Title, ''))) / LEN(TABLEDICTIONARY.Title), 0) +

                   ISNULL((LEN(TC.Fulltext) - LEN(REPLACE(TC.Fulltext, TABLEDICTIONARY.Title, ''))) / LEN(TABLEDICTIONARY.Title), 0)

            FROM tableContent TC)

Open in new window

0
 

Author Comment

by:wahooo
Comment Utility
The problem is the shorttext and fulltext column is text datatype.

SJur

0
 
LVL 10

Expert Comment

by:hosneylk
Comment Utility
how about casting it to varchar(max)

e.g.
cast (shorttext as varchar(max))
0
 

Author Comment

by:wahooo
Comment Utility
its too long article text... long articles, lots of text..
0
 
LVL 26

Expert Comment

by:Zberteoc
Comment Utility
As I know starting with SQL 2005 the text type was deprecated and replaced with varchar(max). If your database is a restore from an earlier version it might be possible that the text type was just kept for backwards compatibility(this would be true for 2005 only, in 2008 there is no text data type anymore) in which case text is synonym to varchar(max). If that is the case the string functions should work on it.

If for some reason don't then you should create a temp table or a another permanent table where the shorttext and fulltext columns will be of varchar(max) type and insert to it from your original table and then do the search.

I would like to hear a little about how do you think that hit count should be build. Is it how many times you find title in, lets say, fulltext?
0
 

Author Comment

by:wahooo
Comment Utility
I have convertet Text -> varchar(MAX).

But I get this error when running the query:
-------------
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
The statement has been terminated.
0
 
LVL 26

Expert Comment

by:Zberteoc
Comment Utility
You have a subquery in either the select list or where clause or in the on clause that returns more than one row. That is not allowed, Think of something like:

SELECT * FROM YourTable
WHERE YouCoumn = (SELECT Col FROM AnotherTable)

In this cas the (SELECT Col FROM AnotherTable) query should not retun more than 1 value(row)

To make it easier to help you better post your query here.
0
 

Author Comment

by:wahooo
Comment Utility
UPDATE utBFDic_copy
SET hits = (SELECT ISNULL((LEN(TC.Title) - LEN(REPLACE(TC.Title, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0) +
                   ISNULL((LEN(TC.Shorttext) - LEN(REPLACE(TC.Shorttext, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0) +
                   ISNULL((LEN(TC.Fulltext) - LEN(REPLACE(TC.Fulltext, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0)
            FROM utBFcontent_copy TC)
0
 
LVL 26

Expert Comment

by:Zberteoc
Comment Utility
I thought you had to compare the title from one table in the other. Here you use the same table. ANyway as I said you cannot asssign to a field, hits, a set of values from the subquery.n It has to be only one value that is one row one column.

The
(SELECT ISNULL((LEN(TC.Title) - LEN(REPLACE(TC.Title, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0) +
                   ISNULL((LEN(TC.Shorttext) - LEN(REPLACE(TC.Shorttext, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0) +
                   ISNULL((LEN(TC.Fulltext) - LEN(REPLACE(TC.Fulltext, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0)
            FROM utBFcontent_copy TC)

Returns more than one rows. That is the problem.

I will ask againg. Can you be mroe exact to what you want to count as hits and how? I am not sure that query does the right thing.
0
 
LVL 10

Expert Comment

by:hosneylk
Comment Utility
try adding a sum to it
0
What Security Threats Are You Missing?

Enhance your security with threat intelligence from the web. Get trending threat insights on hackers, exploits, and suspicious IP addresses delivered to your inbox with our free Cyber Daily.

 
LVL 10

Expert Comment

by:hosneylk
Comment Utility

UPDATE utBFDic_copy

SET hits = (SELECT SUM(ISNULL((LEN(TC.Title) - LEN(REPLACE(TC.Title, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0)) +

                   SUM(ISNULL((LEN(TC.Shorttext) - LEN(REPLACE(TC.Shorttext, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0)) +

                   SUM(ISNULL((LEN(TC.Fulltext) - LEN(REPLACE(TC.Fulltext, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0))

            FROM utBFcontent_copy TC)

Open in new window

0
 

Author Comment

by:wahooo
Comment Utility
hosneylk: I get this error message on your query..
----------------------------
Multiple columns are specified in an aggregated expression containing an outer reference. If an expression being aggregated contains an outer reference, then that outer reference must be the only column referenced in the expression.

0
 

Author Comment

by:wahooo
Comment Utility
Zberteoc:

What I want is this:

In table utBFDic_copy I have about 4000 records. utBFDic_copy.Title is the key Title (Dictionary word).

I then have utBFContent_copy table with about 2000 articles. I want to find how many actual Dictionary words are found in this table. From columns utBFcontent_copy.Title, utBFcontent_copy.Shorttext and utBFContent_fulltext.
The number of occurences found I want to be inserted in utBFDic_copy.Hits.

Sjur
0
 
LVL 26

Expert Comment

by:Zberteoc
Comment Utility
The big question is will this search be done by each word in title utBFDic_copy table all the rows in utBFContent_copy table?

If yes this looks like a CROSS JOIN to me, each row in the first table matched against all the rows in teh second table, which could take a very long time if the tables are big. The total rows will be the multiplication of the row numbers in the 2 tables.

Is this the case?
0
 

Author Comment

by:wahooo
Comment Utility
yes this is the case. One way of looping through each title in utBFDic_copy table and match with all records in utBFContent_copy table.

this job I just need to do once. I can also do it directly on the db server, Do you have any solutions that you think will work?
0
 
LVL 26

Expert Comment

by:Zberteoc
Comment Utility
Actually I think I found a way to avoid the cross join but it will still take a long tome to run.

I commented out the SELECT line because is not really needed unless you want to test the results first before execute teh update.
update t set

		Hits=occ.title_occurences

-- select occ.*		

from 

	utBFDic_copy t

	inner join

	(

		select

			ttl.Title,

			SUM(

				(	-- length of both texts

					len(cnt.Shorttext)+len(cnt.Fulltext)-

					-- minus the length of both texts after removing the Title

					len(replace(cnt.Shorttext, ttl.Title, ''))+len(replace(cnt.Fulltext, ttl.Title, ''))

				) -- the difference will be the total length occupied by all occurences of title in both columns

				 /

					-- divided by the length of Title will give the numbor of occurrences of Title in the 2 texts

					len(ttl.Title)

			)							as title_occurences

		from 

			 utBFDic_copy ttl

			 inner join utBFContent_copy cnt

				on cnt.Shorttext like '%'+ttl.Title+'%'

				or cnt.Fulltext like '%'+ttl.Title+'%'

		group by 

			ttl.Title

	) occ

		on occ.Title=t.Title

Open in new window

0
 
LVL 26

Expert Comment

by:Zberteoc
Comment Utility
NO, no matter what you do you NEVER do a loop through the rows to match to another table in SQL. That is the performance killer by definition in SQL world.

There is always a way do use the set base approach, which as you can see in the code I posted is one single update statement using some joins.

Even if the joins take long time to execute it will be incomparable faster than the row by row approach.
0
 
LVL 26

Expert Comment

by:Zberteoc
Comment Utility
I hope at least that the utBFDic_copy table has an index on Title column. If not create one right away before you execute the query. It will improve performance a lot.
0
 
LVL 26

Accepted Solution

by:
Zberteoc earned 500 total points
Comment Utility
One small modification. I changed the inner join to a left join in the update so we can set to 0 the hits for Titles that are not found at all in the 2 texts (occ.title_occurences will be null in this case):
update t set

		Hits=isnull(occ.title_occurences,0)

-- select occ.*		

from 

	utBFDic_copy t

	left join

	(

		select

			ttl.Title,

			SUM(

				(	-- length of both texts

					len(cnt.Shorttext)+len(cnt.Fulltext)-

					-- minus the length of both texts after removing the Title

					len(replace(cnt.Shorttext, ttl.Title, ''))+len(replace(cnt.Fulltext, ttl.Title, ''))

				) -- the difference will be the total length occupied by all occurences of title in both columns

				 /

					-- divided by the length of Title will give the numbor of occurrences of Title in the 2 texts

					len(ttl.Title)

			)							as title_occurences

		from 

			 utBFDic_copy ttl

			 inner join utBFContent_copy cnt

				on cnt.Shorttext like '%'+ttl.Title+'%'

				or cnt.Fulltext like '%'+ttl.Title+'%'

		group by 

			ttl.Title

	) occ

		on occ.Title=t.Title

Open in new window

0
 

Author Comment

by:wahooo
Comment Utility
ok let me try this tomorrow, thanks a lot so far. I will let you know tomorrow and of course give points if successful :)

0

Featured Post

Do You Know the 4 Main Threat Actor Types?

Do you know the main threat actor types? Most attackers fall into one of four categories, each with their own favored tactics, techniques, and procedures.

Join & Write a Comment

Suggested Solutions

If you have heard of RFC822 date formats, they can be quite a challenge in SQL Server. RFC822 is an Internet standard format for email message headers, including all dates within those headers. The RFC822 protocols are available in detail at:   ht…
In this article I will describe the Detach & Attach method as one possible migration process and I will add the extra tasks needed for an upgrade when and where is applied so it will cover all.
Internet Business Fax to Email Made Easy - With eFax Corporate (http://www.enterprise.efax.com), you'll receive a dedicated online fax number, which is used the same way as a typical analog fax number. You'll receive secure faxes in your email, fr…
You have products, that come in variants and want to set different prices for them? Watch this micro tutorial that describes how to configure prices for Magento super attributes. Assigning simple products to configurable: We assigned simple products…

771 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

10 Experts available now in Live!

Get 1:1 Help Now