I need to find commonly used phrases in a text field in a SQL table (SQL 2008 R2)

Posted on 2014-11-12
Last Modified: 2014-11-13
I need to find commonly used phrases in a text field in a SQL table (SQL 2008 R2). The field contains reports written by medical professionals. The output I would like is a list of the phrases in the text and how often each one is used. For example, I would want the output to look something like this:
# of Occurrences  Phrase
120 - Continue with plan of care
119 - The patient complained of pain located
100 - Restrict lifting to a maximum of
...

I could run loops to extract phrases of every length, dump them into a table, and then query for a count of each one, but I know that will take a lot of time. Is there a more efficient, faster way to do this?
Question by:loneieagle2
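One way to avoid writing every phrase to a table and then querying for each one is to count the phrases in memory in a single pass, using a hash map keyed by phrase. The sketch below is Python rather than T-SQL and assumes the report text can be fetched from the table as plain strings. `count_phrases`, `min_words` and `max_words` are hypothetical names, and splitting on whitespace is a simplification (punctuation stays attached to words).

```python
from collections import Counter


def count_phrases(reports, min_words=2, max_words=6):
    """Count every contiguous phrase of min_words..max_words words.

    Hypothetical helper: `reports` is any iterable of report strings,
    for example rows fetched from the SQL table.
    """
    counts = Counter()
    for text in reports:
        # Lowercase and split on whitespace so "Pain" and "pain" match.
        words = text.lower().split()
        for length in range(min_words, max_words + 1):
            # Slide a window of `length` words across the report.
            for start in range(len(words) - length + 1):
                counts[" ".join(words[start:start + length])] += 1
    return counts


# Made-up sample reports, for illustration only.
reports = [
    "Continue with plan of care",
    "Patient stable. Continue with plan of care",
]
for phrase, n in count_phrases(reports, 3, 5).most_common(3):
    print(n, "-", phrase)
```

Memory grows with the number of distinct phrases, so with about a million 200-word reports you would probably need to cap `max_words` or process the rows in batches.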

Expert Comment

by:Gustav Brock
Have you tried it? A pass-through query should be quite fast:

    Select Count(*) From TheTable Where TheField Like '%Continue with plan of care%'

/gustav

Author Comment

by:loneieagle2
Gustav,

I guess my question wasn't clear. I don't know what the phrases are yet, so I have to analyze the text to determine what the phrases are and how often they are used. My idea was to find the individual words (by splitting on the spaces between words) and store them in a table, then go through and find the two-word phrases and add them to the table, and so on until I have the longest phrases. Then I can run the query you suggest. Since the average text field contains about 200 words and there are about a million records in the table, I would be making 200 million passes through whatever method I use to extract the phrases.

Bob
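A refinement of the level-by-level idea above, swapped in here and not suggested in the thread, is Apriori-style pruning. A (k+1)-word phrase can occur at least `min_count` times only if its first k words and its last k words each occur at least that often, so each level only needs to count candidates whose two shorter halves survived the previous level. A minimal Python sketch, with `frequent_phrases`, `min_count` and `max_words` as hypothetical names:

```python
from collections import Counter


def frequent_phrases(reports, min_count=2, max_words=10):
    """Level-wise phrase counting with Apriori-style pruning (hypothetical helper).

    Returns {phrase: count} for phrases of 2..max_words words that occur
    at least min_count times across all reports.
    """
    docs = [text.lower().split() for text in reports]
    result = {}
    frequent = None  # frequent phrases (as word tuples) from the previous level
    for length in range(2, max_words + 1):
        counts = Counter()
        for words in docs:
            for start in range(len(words) - length + 1):
                gram = tuple(words[start:start + length])
                # Skip candidates whose prefix or suffix was not frequent.
                if frequent is not None and (
                    gram[:-1] not in frequent or gram[1:] not in frequent
                ):
                    continue
                counts[gram] += 1
        frequent = {g for g, n in counts.items() if n >= min_count}
        if not frequent:
            break  # no longer phrase can be frequent either
        result.update({" ".join(g): counts[g] for g in frequent})
    return result
```

Each level still scans every word position, but rare phrases are never extended to the next length, and the loop stops as soon as no phrase of the current length reaches `min_count`.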

Assisted Solution

by:Gustav Brock
Gustav Brock earned 250 total points
I see. I don't think there is a smarter way of doing this than the method you describe.

If it is a one-time operation, you could write some loops and let them run overnight.

/gustav

Accepted Solution

by:Dale Fye (Access MVP)
Dale Fye (Access MVP) earned 250 total points
I agree with Gustav. If you knew which phrases were in use, you could save those phrases to a table and then query for those specific phrases. Without first knowing the phrases, however, you would have to parse each record into 2-, 3-, 4-, 5-, ... word phrases and save those to a table.

This would be a very time-consuming task, involving loops through all of the records, then loops over the length of the phrase (number of words) and over the start point within each record, and the phrases would overlap. A single record of only 10 words would have 45 possible "phrases" of 2 to 10 words in length.
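The figure of 45 follows from counting start and end points: a phrase of k words can start at n - k + 1 positions, so phrases of 2 to n words give (n-1) + (n-2) + ... + 1 = n(n-1)/2. A small Python check that enumerates the phrases directly (`enumerate_phrases` is a hypothetical helper and the sample sentence is made up):

```python
def enumerate_phrases(words, min_len=2):
    """List every contiguous run of at least min_len words."""
    return [
        " ".join(words[start:end])
        for start in range(len(words))
        for end in range(start + min_len, len(words) + 1)
    ]


words = "the patient complained of pain located in the lower back".split()
phrases = enumerate_phrases(words)
print(len(phrases))                        # 45 for this 10-word record
print(len(words) * (len(words) - 1) // 2)  # 45 again, from n(n-1)/2
```

Because the count grows with the square of the record length, a 200-word report gives 200 * 199 / 2 = 19,900 phrases of 2 or more words.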

Expert Comment

by:Dale Fye (Access MVP)
@loneieagle2

Oops, I forgot to refresh the thread and didn't see your first response to Gustav before posting.

Author Closing Comment

by:loneieagle2
Thanks. I was hoping for something better, but it looks like what I call the "Sledge Hammer" method is going to be the best.

Dale,

I can use what you told me to estimate fairly closely how many phrases I will be creating, to see whether this is even feasible.

Bob
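For that feasibility check, a rough Python sketch that totals the phrases for a given cap on phrase length, using the thread's figures of about 200 words per report and about a million reports. `estimate_phrases` and the cap values are hypothetical choices for illustration:

```python
def estimate_phrases(avg_words, n_records, max_len, min_len=2):
    """Rough total phrase count when phrases are capped at max_len words.

    Uses the average report length. The per-record count is a convex
    function of length, so when report lengths vary this is a lower bound.
    """
    # A phrase of k words can start at avg_words - k + 1 positions.
    per_record = sum(max(avg_words - k + 1, 0) for k in range(min_len, max_len + 1))
    return per_record * n_records


# Figures from the thread: ~200 words per report, ~1 million reports.
for cap in (5, 10, 200):
    print(cap, estimate_phrases(200, 1_000_000, cap))
```

Capping phrases at a handful of words keeps the total in the hundreds of millions to low billions, while allowing phrases up to the full report length pushes it toward 20 billion.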