Solved

counting strings in ntext field

Posted on 2004-08-16
Last Modified: 2012-05-05
I'm trying to count the occurrences of a string in an ntext field.

For example, the ntext field test has "ted, bob, test, sue, ted".
I want to count the occurrences of "ted" in test.

I've looked through the various answers provided here re: counting substrings and haven't been successful in modifying them to work for this purpose.
Question by:slinman2
4 Comments
 
LVL 9

Expert Comment

by:paelo
ID: 11813875
The major problem as I see it is working with the text field.  I think there is some way of chunking it up and parsing the entire thing but anything I can conceive would be terribly inefficient.

If it's possible to store the text in a varchar(8000), then you can do something like this:

--declarations
DECLARE @X varchar(30),
 @N varchar(8000)

--set find value, convert text field
SELECT @X='Ted', @N=CONVERT(Varchar(8000),textfld)

--print number of occurrences of that string (including instances within a word, i.e. this will count the TED in faTED)
PRINT (LEN(@N)-LEN(REPLACE(@N,@X,'')))/LEN(@X)
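
One caveat with the LEN-based count: LEN ignores trailing spaces, so the arithmetic can be off when the search string or the text ends in a space. A minimal sketch of the same idea using DATALENGTH instead, with a hard-coded sample string standing in for the converted column (the sample value is an assumption for illustration only):

```sql
-- Same replace-and-measure trick, but using DATALENGTH (bytes),
-- which counts trailing spaces that LEN would ignore.
DECLARE @X varchar(30),
        @N varchar(8000)

-- Sample value standing in for CONVERT(varchar(8000), textfld)
SELECT @X = 'ted', @N = 'ted, bob, test, sue, ted'

-- (24 - 18) / 3 = 2
PRINT (DATALENGTH(@N) - DATALENGTH(REPLACE(@N, @X, ''))) / DATALENGTH(@X)
```

For varchar values DATALENGTH is one byte per character, so the division still yields a whole count. Anything beyond the first 8000 characters of the original field is still not counted.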


-Paul.
 
LVL 15

Expert Comment

by:jdlambert1
ID: 11813883
Only ways I know of to do this are to use a cursor in a Stored Procedure, or create the logic in an Extended Stored Procedure.

In an SP, create two counter variables, one to track the starting position in the string and the other to count the hits. Plug the first one into SubString's starting position and loop (with a cursor) until you get to the end of the string, then store the total and go to the next string.
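
A rough sketch of that two-counter loop for a single row, assuming a throwaway temp table named #demo (hypothetical, not from the question). It uses a plain WHILE loop rather than a cursor; a cursor would only be needed to step through multiple rows:

```sql
-- Hypothetical test table holding one ntext value
CREATE TABLE #demo (id int, test ntext)
INSERT #demo (id, test) SELECT 1, N'ted, bob, test, sue, ted'

DECLARE @search nvarchar(30), @pos int, @hits int, @chars int
SET @search = N'ted'
SELECT @pos = 1, @hits = 0

-- DATALENGTH returns bytes; ntext stores 2 bytes per character
SELECT @chars = DATALENGTH(test) / 2 FROM #demo WHERE id = 1

WHILE @pos <= @chars - LEN(@search) + 1
BEGIN
    -- SUBSTRING accepts ntext and returns nvarchar
    IF EXISTS (SELECT 1 FROM #demo
               WHERE id = 1
               AND SUBSTRING(test, @pos, LEN(@search)) = @search)
        SET @hits = @hits + 1   -- hit counter
    SET @pos = @pos + 1         -- starting-position counter
END

PRINT @hits   -- prints 2 for the sample row
DROP TABLE #demo
```

This touches the row once per character position, so expect it to be slow on large values.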
 
LVL 12

Accepted Solution

by:
kselvia earned 125 total points
ID: 11814386
--This is one way to do it. It is slow, but probably faster than a cursor/loop
--You need a table of numbers. Run this once to create one;

SELECT TOP 50000 identity(int,1,1) as ID
INTO Numbers
FROM sysobjects s1, sysobjects s2, sysobjects s3

--Some test data
create table t (id int, test ntext )

--Generate 2 ntext entries
insert t (id, test) select 1, 'ted, bob, test, sue, ted, '  -- ted occurs twice
insert t (id, test) select 2, 'jim, kevin, joe, ted, bob, ' -- ted occurs once

DECLARE @ptrval binary(16)
DECLARE @repeat int

-- Generate 1000 duplications of initial data for row 1
SELECT @ptrval = TEXTPTR(test) , @repeat = 0
FROM t
WHERE id = 1
WHILE @repeat < 1000
BEGIN
      UPDATETEXT t.test @ptrval 0 0 'ted, bob, test, sue, ted, '
      SET @repeat = @repeat + 1
END

-- Generate 1000 duplications of initial data for row 2
SELECT @ptrval = TEXTPTR(test) , @repeat = 0
FROM t
WHERE id = 2
WHILE @repeat < 1000
BEGIN
      UPDATETEXT t.test @ptrval 0 0 'jim, kevin, joe, ted, bob, '
      SET @repeat = @repeat + 1
END

-- Sample - 2 rows of text data
SELECT * FROM t
id          test
----------- ---------------------------------------------------...
1           ted, bob, test, sue, ted, ted, bob, test, sue, ted,...
2           jim, kevin, joe, ted, bob, jim, kevin, joe, ted, bob...

--Count the number of times 'ted,' occurs.

DECLARE @search varchar(10)
SET @search = 'ted,'

SELECT tid TextRow, count(1) Occurs , @search SearchString
FROM (
      SELECT
      t.id tid, n.id nid, substring(test, n.id, Len(@search)) part
      FROM t, Numbers n
      WHERE n.ID <= datalength(test) / 2   -- datalength is in bytes; ntext uses 2 bytes per character
      AND substring(test, n.id, Len(@search)) = @search
) Lookup
GROUP BY tid
ORDER BY tid

TextRow     Occurs      SearchString
----------- ----------- ------------
1           2002        ted,
2           1001        ted,



--P.S. I tried to do this more efficiently by breaking the text into varchar strings, but if the
--text broke in the middle of 'ted' it was not matched. The version below could be made to break on
--word separators (, or blank) but that will take more work and the version above will solve the problem.

SELECT tid TextRow, sum(Occurs) Occurs
FROM (
      SELECT tid, (Len(part) - Len (Replace(part,'ted',''))) / len ('ted') Occurs
      FROM  (
            SELECT TOP 100 PERCENT
            t.id tid, n.id nid, datalength(test) dl, substring(test, (n.id -1) * 4000 + 1 , 4000) part
            FROM t, Numbers n
            WHERE (n.ID -1) * 4000 + 1 < datalength(test)
            ORDER BY t.ID
            ) CountOccurs
      ) Lookup
GROUP BY tid

TextRow     Occurs      
----------- -----------
1           2000            <-- missed two because a ted straddled a 4000-character boundary
2           1001
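
One possible way to avoid the boundary misses without splitting on separators is to let consecutive chunks overlap by LEN(@search) - 1 characters. A match that straddles one chunk's end then falls entirely inside the next chunk, and the overlap is too short to hold a whole match, so nothing is counted twice. A sketch only, assuming the t and Numbers tables built above; the @step variable and the Chunks alias are names introduced here:

```sql
DECLARE @search nvarchar(10), @step int
SET @search = N'ted'

-- Chunks are 4000 characters long but start only @step characters apart,
-- so each chunk overlaps the previous one by LEN(@search) - 1 characters.
SET @step = 4000 - (LEN(@search) - 1)

SELECT tid TextRow, SUM(Occurs) Occurs
FROM (
      -- DATALENGTH (bytes) instead of LEN, so trailing spaces are counted
      SELECT tid,
             (DATALENGTH(part) - DATALENGTH(REPLACE(part, @search, N'')))
             / DATALENGTH(@search) Occurs
      FROM (
            SELECT t.id tid,
                   SUBSTRING(test, (n.ID - 1) * @step + 1, 4000) part
            FROM t, Numbers n
            -- datalength is in bytes; ntext uses 2 bytes per character
            WHERE (n.ID - 1) * @step + 1 <= DATALENGTH(test) / 2
           ) Chunks
     ) Lookup
GROUP BY tid
ORDER BY tid
```

On the test data above this is intended to bring row 1 back to 2002. It has not been run against the original poster's data.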

 

Author Comment

by:slinman2
ID: 11854922
Thanks for all the responses.  I took the approach of kselvia and it worked out great.  thanks.