Link to home
Start Free TrialLog in
Avatar of volking
volking

asked on

TEXTPTR? PATINDEX? READTEXT?

Need advice on the "best-approach" to accomplish ... this ...

TEXT var types in a MS SQL 8.0 database.
My TEXT fields can contain (literally) up to 2 gb each.

Using TSQL in a Stored Proc, I need to

"walk" through all records meeting a condition
and for each record
"walk" through every instance of the words "Hello World" appearing anywhere in the TEXT column
and for each instance ...
capture the 500 characters BEFORE "Hello World" and the 500 characters AFTER "Hello World"

So, if a single record's 2gb TEXT field has 10 instances of "Hello World", then I need each instance's 500chars+"Hello World"+500chars

I assume I'll start with a #tempTable, and use a cursor to walk the records ...
Finding 10 instances, I'll write 500chars+"Hello World"+500chars to the temp table .... 10 times ....

But ... Using TSQL ... how the heck do I walk the 2gb of each TEXT field to find the 10 instances?????

Advice is welcome.
Speed will be important ...

Avatar of volking
volking

ASKER

Also concerned about calculating 500+"Hello World"+500 when "Hello World" is the very-first or very-last words in the TEXT field.
Avatar of Anthony Perkins
Do yourself a big favor and look up Full-Text Search in BOL.  You are going about it the wrong way.
Full-Text Search will resolve the first part of your question.  You can then add all the rows to a Table variable and cycle through them using PATINDEX and SUBSTRING to get the values you need.  There is no need for TEXTPTR, let alone READTEXT.
Avatar of volking

ASKER

@acperkins - thanks for the advice ... how does it address the underlying problem?

Finding which particular records contain "Hello World" SOMEPLACE is rather easy. You're right, Full-Text search would work great.

HOWEVER ... let's assume the record is identified ... How to I locate ALL TEN INSTANCES of the words "Hello World" burried SOMEPLACE in a 2 gigabyte text bucket ... using TSQL?

Any Idea's?
Avatar of volking

ASKER

@acperkins - remember that PATINDEX does not accept a Start-Look-At offset location.
Avatar of volking

ASKER

@acperkins - Obviously, using a solution based in the Client is not feasible, since such would require over-the-wire transmissions of theoretically dozens-and-dozens of gigabytes from the server back to the client for additional sub-processing. A solution that works at-the-server is the only reasonable approach? Right?
ASKER CERTIFIED SOLUTION
Avatar of Anthony Perkins
Anthony Perkins
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of volking

ASKER

declare @StartChar as bigint
set @StartChar = (select PATINDEX('%Hello World%', TextField) from table where key = 'value')
while @StartChar <> 0
BEGIN
    select substring(TextField, @StartChar, 1000)
    from table
    where key = 'value'

    set @StartChar = (select PatIndex('%Hello World%', Substring(TextField, @StartChar, 20000000000000000000)) from table where key = 'value')
END
Avatar of volking

ASKER

Rats! Sometimes it's SO SIMPLE!
I've written your sample about 5 times.
Every time I wrote it, it failed!

BUT, I did not use a BIGINT
Changed my code from int to bigint.
Shazam!
Worked first time!

So close, yet so far!