We help IT Professionals succeed at work.

We've partnered with Certified Experts, Carl Webster and Richard Faulkner, to bring you a podcast all about Citrix Workspace, moving to the cloud, and analytics & intelligence. Episode 2 coming soon!Listen Now

x

TEXTPTR? PATINDEX? READTEXT?

volking
volking asked
on
Medium Priority
773 Views
Last Modified: 2008-01-09
Need advice on the "best-approach" to accomplish ... this ...

TEXT var types in a MS SQL 8.0 database.
My TEXT fields can contain (literally) up to 2 gb each.

Using TSQL in a Stored Proc, I need to

"walk" through all records meeting a condition
and for each record
"walk" through every instance of the words "Hello World" appearing anywhere in the TEXT column
and for each instance ...
capture the 500 characters BEFORE "Hello World" and the 500 characters AFTER "Hello World"

So, if a single record's 2gb TEXT field has 10 instances of "Hello World", then I need each instance's 500chars+"Hello World"+500chars

I assume I'll start with a #tempTable, and use a cursor to walk the records ...
Finding 10 instances, I'll write 500chars+"Hello World"+500chars to the temp table .... 10 times ....

But ... Using TSQL ... how the heck do I walk the 2gb of each TEXT field to find the 10 instances?????

Advice is welcome.
Speed will be important ...

Comment
Watch Question

Author

Commented:
Also concerned about calculating 500+"Hello World"+500 when "Hello World" is the very-first or very-last words in the TEXT field.
CERTIFIED EXPERT
Top Expert 2012

Commented:
Do yourself a big favor and look up Full-Text Search in BOL.  You are going about it the wrong way.
CERTIFIED EXPERT
Top Expert 2012

Commented:
Full-Text Search will resolve the first part of your question.  You can then add all the rows to a Table variable and cycle through them using PATINDEX and SUBSTRING to get the values you need.  There is no need for TEXTPTR, let alone READTEXT.

Author

Commented:
@acperkins - thanks for the advice ... how does it address the underlying problem?

Finding which particular records contain "Hello World" SOMEPLACE is rather easy. You're right, Full-Text search would work great.

HOWEVER ... let's assume the record is identified ... How to I locate ALL TEN INSTANCES of the words "Hello World" burried SOMEPLACE in a 2 gigabyte text bucket ... using TSQL?

Any Idea's?

Author

Commented:
@acperkins - remember that PATINDEX does not accept a Start-Look-At offset location.

Author

Commented:
@acperkins - Obviously, using a solution based in the Client is not feasible, since such would require over-the-wire transmissions of theoretically dozens-and-dozens of gigabytes from the server back to the client for additional sub-processing. A solution that works at-the-server is the only reasonable approach? Right?
CERTIFIED EXPERT
Top Expert 2012
Commented:
>> how does it address the underlying problem?<<
Which one?  There was two:
1. "I assume I'll start with a #tempTable, and use a cursor to walk the records ..."
2. "Finding 10 instances, I'll write 500chars+"Hello World"+500chars to the temp table .... 10 times ...."

As I stated "Full-Text Search will resolve the first part of your question.  You can then add all the rows to a Table variable and cycle through them using PATINDEX and SUBSTRING to get the values you need. "

Assuming that you now have all the rows in a variable of type table with an IDENTITY column (ID) you can now cycle through all the rows by iterateing over the ID's and also cycling through all the values within the text column.  This second part you can use PATINDEX with SUBSTRING(YourTextColumn, @Pos, LEN(YourTextColumn)) or if you prefer CHARINDEX with @Pos. Where @Pos is the offset you are iterating.

Not the solution you were looking for? Getting a personalized solution is easy.

Ask the Experts

Author

Commented:
declare @StartChar as bigint
set @StartChar = (select PATINDEX('%Hello World%', TextField) from table where key = 'value')
while @StartChar <> 0
BEGIN
    select substring(TextField, @StartChar, 1000)
    from table
    where key = 'value'

    set @StartChar = (select PatIndex('%Hello World%', Substring(TextField, @StartChar, 20000000000000000000)) from table where key = 'value')
END

Author

Commented:
Rats! Sometimes it's SO SIMPLE!
I've written your sample about 5 times.
Every time I wrote it, it failed!

BUT, I did not use a BIGINT
Changed my code from int to bigint.
Shazam!
Worked first time!

So close, yet so far!
Access more of Experts Exchange with a free account
Thanks for using Experts Exchange.

Create a free account to continue.

Limited access with a free account allows you to:

  • View three pieces of content (articles, solutions, posts, and videos)
  • Ask the experts questions (counted toward content limit)
  • Customize your dashboard and profile

*This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

OR

Please enter a first name

Please enter a last name

8+ characters (letters, numbers, and a symbol)

By clicking, you agree to the Terms of Use and Privacy Policy.