Solved

Search for dictionary words in another table

Posted on 2010-11-26
20
270 Views
Last Modified: 2012-06-27
I have one table with dictionary words (4000 words) like this:

tableDictionary
DicID
Title
Description
Hits

And I have a table with content (articles).
tableContent
ContentID
Title
Shorttext
Fulltext

I need to find what Titles from tableDictdionary is actually found in tableContent.Title,Shorttext and fulltext and if found update tableDictionary.Hits with number of occurences. 0,1,2,3,4 ...

Both tables are indexed fulltext if this help?

I just need to run this query once to update the tableDictionary.Hits column.

Pls ask me if you need more info.

I need full code that is working.

Best regards

Sjur
0
Comment
Question by:wahooo
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 8
  • 8
  • 4
20 Comments
 
LVL 10

Expert Comment

by:hosneylk
ID: 34216465
something like this maybe?
UPDATE TABLEDICTIONARY
SET hits = (SELECT ISNULL((LEN(TC.Title) - LEN(REPLACE(TC.Title, TABLEDICTIONARY.Title, ''))) / LEN(TABLEDICTIONARY.Title), 0) +
                   ISNULL((LEN(TC.Shorttext) - LEN(REPLACE(TC.Shorttext, TABLEDICTIONARY.Title, ''))) / LEN(TABLEDICTIONARY.Title), 0) +
                   ISNULL((LEN(TC.Fulltext) - LEN(REPLACE(TC.Fulltext, TABLEDICTIONARY.Title, ''))) / LEN(TABLEDICTIONARY.Title), 0)
            FROM tableContent TC)

Open in new window

0
 

Author Comment

by:wahooo
ID: 34216505
The problem is the shorttext and fulltext column is text datatype.

SJur

0
 
LVL 10

Expert Comment

by:hosneylk
ID: 34216519
how about casting it to varchar(max)

e.g.
cast (shorttext as varchar(max))
0
The Eight Noble Truths of Backup and Recovery

How can IT departments tackle the challenges of a Big Data world? This white paper provides a roadmap to success and helps companies ensure that all their data is safe and secure, no matter if it resides on-premise with physical or virtual machines or in the cloud.

 

Author Comment

by:wahooo
ID: 34216526
its too long article text... long articles, lots of text..
0
 
LVL 27

Expert Comment

by:Zberteoc
ID: 34217118
As I know starting with SQL 2005 the text type was deprecated and replaced with varchar(max). If your database is a restore from an earlier version it might be possible that the text type was just kept for backwards compatibility(this would be true for 2005 only, in 2008 there is no text data type anymore) in which case text is synonym to varchar(max). If that is the case the string functions should work on it.

If for some reason don't then you should create a temp table or a another permanent table where the shorttext and fulltext columns will be of varchar(max) type and insert to it from your original table and then do the search.

I would like to hear a little about how do you think that hit count should be build. Is it how many times you find title in, lets say, fulltext?
0
 

Author Comment

by:wahooo
ID: 34217390
I have convertet Text -> varchar(MAX).

But I get this error when running the query:
-------------
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
The statement has been terminated.
0
 
LVL 27

Expert Comment

by:Zberteoc
ID: 34217604
You have a subquery in either the select list or where clause or in the on clause that returns more than one row. That is not allowed, Think of something like:

SELECT * FROM YourTable
WHERE YouCoumn = (SELECT Col FROM AnotherTable)

In this cas the (SELECT Col FROM AnotherTable) query should not retun more than 1 value(row)

To make it easier to help you better post your query here.
0
 

Author Comment

by:wahooo
ID: 34217611
UPDATE utBFDic_copy
SET hits = (SELECT ISNULL((LEN(TC.Title) - LEN(REPLACE(TC.Title, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0) +
                   ISNULL((LEN(TC.Shorttext) - LEN(REPLACE(TC.Shorttext, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0) +
                   ISNULL((LEN(TC.Fulltext) - LEN(REPLACE(TC.Fulltext, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0)
            FROM utBFcontent_copy TC)
0
 
LVL 27

Expert Comment

by:Zberteoc
ID: 34217903
I thought you had to compare the title from one table in the other. Here you use the same table. ANyway as I said you cannot asssign to a field, hits, a set of values from the subquery.n It has to be only one value that is one row one column.

The
(SELECT ISNULL((LEN(TC.Title) - LEN(REPLACE(TC.Title, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0) +
                   ISNULL((LEN(TC.Shorttext) - LEN(REPLACE(TC.Shorttext, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0) +
                   ISNULL((LEN(TC.Fulltext) - LEN(REPLACE(TC.Fulltext, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0)
            FROM utBFcontent_copy TC)

Returns more than one rows. That is the problem.

I will ask againg. Can you be mroe exact to what you want to count as hits and how? I am not sure that query does the right thing.
0
 
LVL 10

Expert Comment

by:hosneylk
ID: 34218134
try adding a sum to it
0
 
LVL 10

Expert Comment

by:hosneylk
ID: 34218140

UPDATE utBFDic_copy
SET hits = (SELECT SUM(ISNULL((LEN(TC.Title) - LEN(REPLACE(TC.Title, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0)) +
                   SUM(ISNULL((LEN(TC.Shorttext) - LEN(REPLACE(TC.Shorttext, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0)) +
                   SUM(ISNULL((LEN(TC.Fulltext) - LEN(REPLACE(TC.Fulltext, utBFDic_copy.Title, ''))) / LEN(utBFDic_copy.Title), 0))
            FROM utBFcontent_copy TC)

Open in new window

0
 

Author Comment

by:wahooo
ID: 34238174
hosneylk: I get this error message on your query..
----------------------------
Multiple columns are specified in an aggregated expression containing an outer reference. If an expression being aggregated contains an outer reference, then that outer reference must be the only column referenced in the expression.

0
 

Author Comment

by:wahooo
ID: 34255471
Zberteoc:

What I want is this:

In table utBFDic_copy I have about 4000 records. utBFDic_copy.Title is the key Title (Dictionary word).

I then have utBFContent_copy table with about 2000 articles. I want to find how many actual Dictionary words are found in this table. From columns utBFcontent_copy.Title, utBFcontent_copy.Shorttext and utBFContent_fulltext.
The number of occurences found I want to be inserted in utBFDic_copy.Hits.

Sjur
0
 
LVL 27

Expert Comment

by:Zberteoc
ID: 34259801
The big question is will this search be done by each word in title utBFDic_copy table all the rows in utBFContent_copy table?

If yes this looks like a CROSS JOIN to me, each row in the first table matched against all the rows in teh second table, which could take a very long time if the tables are big. The total rows will be the multiplication of the row numbers in the 2 tables.

Is this the case?
0
 

Author Comment

by:wahooo
ID: 34259853
yes this is the case. One way of looping through each title in utBFDic_copy table and match with all records in utBFContent_copy table.

this job I just need to do once. I can also do it directly on the db server, Do you have any solutions that you think will work?
0
 
LVL 27

Expert Comment

by:Zberteoc
ID: 34259977
Actually I think I found a way to avoid the cross join but it will still take a long tome to run.

I commented out the SELECT line because is not really needed unless you want to test the results first before execute teh update.
update t set
		Hits=occ.title_occurences
-- select occ.*		
from 
	utBFDic_copy t
	inner join
	(
		select
			ttl.Title,
			SUM(
				(	-- length of both texts
					len(cnt.Shorttext)+len(cnt.Fulltext)-
					-- minus the length of both texts after removing the Title
					len(replace(cnt.Shorttext, ttl.Title, ''))+len(replace(cnt.Fulltext, ttl.Title, ''))
				) -- the difference will be the total length occupied by all occurences of title in both columns
				 /
					-- divided by the length of Title will give the numbor of occurrences of Title in the 2 texts
					len(ttl.Title)
			)							as title_occurences
		from 
			 utBFDic_copy ttl
			 inner join utBFContent_copy cnt
				on cnt.Shorttext like '%'+ttl.Title+'%'
				or cnt.Fulltext like '%'+ttl.Title+'%'
		group by 
			ttl.Title
	) occ
		on occ.Title=t.Title

Open in new window

0
 
LVL 27

Expert Comment

by:Zberteoc
ID: 34260012
NO, no matter what you do you NEVER do a loop through the rows to match to another table in SQL. That is the performance killer by definition in SQL world.

There is always a way do use the set base approach, which as you can see in the code I posted is one single update statement using some joins.

Even if the joins take long time to execute it will be incomparable faster than the row by row approach.
0
 
LVL 27

Expert Comment

by:Zberteoc
ID: 34260036
I hope at least that the utBFDic_copy table has an index on Title column. If not create one right away before you execute the query. It will improve performance a lot.
0
 
LVL 27

Accepted Solution

by:
Zberteoc earned 500 total points
ID: 34260141
One small modification. I changed the inner join to a left join in the update so we can set to 0 the hits for Titles that are not found at all in the 2 texts (occ.title_occurences will be null in this case):
update t set
		Hits=isnull(occ.title_occurences,0)
-- select occ.*		
from 
	utBFDic_copy t
	left join
	(
		select
			ttl.Title,
			SUM(
				(	-- length of both texts
					len(cnt.Shorttext)+len(cnt.Fulltext)-
					-- minus the length of both texts after removing the Title
					len(replace(cnt.Shorttext, ttl.Title, ''))+len(replace(cnt.Fulltext, ttl.Title, ''))
				) -- the difference will be the total length occupied by all occurences of title in both columns
				 /
					-- divided by the length of Title will give the numbor of occurrences of Title in the 2 texts
					len(ttl.Title)
			)							as title_occurences
		from 
			 utBFDic_copy ttl
			 inner join utBFContent_copy cnt
				on cnt.Shorttext like '%'+ttl.Title+'%'
				or cnt.Fulltext like '%'+ttl.Title+'%'
		group by 
			ttl.Title
	) occ
		on occ.Title=t.Title

Open in new window

0
 

Author Comment

by:wahooo
ID: 34262031
ok let me try this tomorrow, thanks a lot so far. I will let you know tomorrow and of course give points if successful :)

0

Featured Post

NEW Veeam Agent for Microsoft Windows

Backup and recover physical and cloud-based servers and workstations, as well as endpoint devices that belong to remote users. Avoid downtime and data loss quickly and easily for Windows-based physical or public cloud-based workloads!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Use this article to create a batch file to backup a Microsoft SQL Server database to a Windows folder.  The folder can be on the local hard drive or on a network share.  This batch file will query the SQL server to get the current date & time and wi…
Ever needed a SQL 2008 Database replicated/mirrored/log shipped on another server but you can't take the downtime inflicted by initial snapshot or disconnect while T-logs are restored or mirror applied? You can use SQL Server Initialize from Backup…
Monitoring a network: why having a policy is the best policy? Michael Kulchisky, MCSE, MCSA, MCP, VTSP, VSP, CCSP outlines the enormous benefits of having a policy-based approach when monitoring medium and large networks. Software utilized in this v…
In this video you will find out how to export Office 365 mailboxes using the built in eDiscovery tool. Bear in mind that although this method might be useful in some cases, using PST files as Office 365 backup is troublesome in a long run (more on t…

617 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question