loneieagle2
asked on
I need to find commonly used phrases in a text field in a SQL table (SQL 2008 R2)
I need to find commonly used phrases in a text field in a SQL table (SQL 2008 R2). The text contains reports written by medical professionals. The output I would like is a list of the phrases in the text and how often each one is used. For example, I would want the output to look something like this:
# of Occurrences Phrase
120 - Continue with plan of care
119 - The patient complained of pain located
100 - Restrict lifting to a maximum of
..
...
I can run loops to extract phrases of every length, dump them into a table, and then query for a count of each one, but I know that will take a lot of time. Is there a faster, more efficient way to do this?
ASKER
Gustav,
I guess my question wasn't clear. I don't know what the phrases are yet, so I have to analyze the text to determine what the phrases are and how often they are used. My idea was to find individual words (by parsing on the spaces between words) and store those in a table, then go through and find two-word phrases and put them in the table, and so on until I have the longest phrases. Then I can run the query you suggest. Since the average text field contains about 200 words and there are about a million records in the table, I would be making about 200 million passes through whatever method I use to extract the phrases.
Bob
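The word-by-word approach described above can be done in a single pass per record, collecting phrases of every length at once instead of making a separate pass for each length. Below is a minimal sketch of that idea in Python, run outside SQL Server; it is an illustration, not something proposed in the thread. The function name `count_phrases`, the sample `reports`, and the 3-to-6-word range are all assumptions made for the example.

```python
from collections import Counter


def count_phrases(texts, min_words=3, max_words=6):
    """Count every run of min_words..max_words consecutive words across all texts."""
    counts = Counter()
    for text in texts:
        # Split on whitespace, as suggested in the post above.
        words = text.split()
        for n in range(min_words, max_words + 1):
            # A text of W words contains W - n + 1 phrases of length n.
            for start in range(len(words) - n + 1):
                counts[" ".join(words[start:start + n])] += 1
    return counts


# Hypothetical sample data standing in for the report text column.
reports = [
    "Continue with plan of care",
    "Patient stable. Continue with plan of care",
    "Restrict lifting to a maximum of 10 lbs",
]

for phrase, hits in count_phrases(reports).most_common(3):
    print(hits, "-", phrase)
```

Note that shorter phrases contained in a longer common phrase (such as "plan of care") are counted too, so the raw output would likely need filtering before it looks like the listing in the question.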
@loneieagle2
Oops, I forgot to refresh the thread and didn't see your first response to Gustav before posting.
ASKER
Thanks, I was hoping for something better, but it looks like what I call the "Sledge Hammer" method is going to be the best.
Dale,
I can use what you told me to find out pretty close to how many phrases I will be creating to see if it is even feasible.
Bob
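For a rough feasibility check, the number of extracted phrases can be bounded with simple arithmetic: a record of W words contains W - n + 1 phrases of length n. The sketch below is only that arithmetic, not Dale's method, which is not shown in this thread. The 1,000,000 records and 200 words per record come from the earlier post; the maximum phrase length of 6 and the function name `estimate_phrase_rows` are assumptions.

```python
def estimate_phrase_rows(records, words_per_record, min_words, max_words):
    """Estimate rows produced when every phrase of min_words..max_words is extracted.

    Assumes every record has exactly words_per_record words, so this is a
    rough figure for total extracted rows, not for distinct phrases.
    """
    per_record = sum(
        max(words_per_record - n + 1, 0)  # no phrases if the text is shorter than n
        for n in range(min_words, max_words + 1)
    )
    return records * per_record


# 1,000,000 records of about 200 words, phrases of 1 to 6 words (6 is an assumed cap).
print(estimate_phrase_rows(1_000_000, 200, 1, 6))
```

The count of distinct phrases after grouping would be smaller, by an amount that depends on how repetitive the reports are.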
Select Count(*) From TheTable Where TheField Like '%Continue with plan of care%'
/gustav
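One thing to keep in mind with the query above: it counts rows that contain the phrase, not how many times the phrase occurs. The sketch below shows the difference using Python's built-in `sqlite3` module as a stand-in for SQL Server, which is an assumption made only so the example can run on its own. The sample rows are invented, and whether `LIKE` is case-sensitive on SQL Server depends on the column's collation.

```python
import sqlite3

# In-memory SQLite database standing in for the real SQL Server table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TheTable (TheField TEXT)")
conn.executemany(
    "INSERT INTO TheTable (TheField) VALUES (?)",
    [
        ("Continue with plan of care.",),
        ("Pain improved. Continue with plan of care. Continue with plan of care.",),
        ("Restrict lifting to a maximum of 10 lbs.",),
    ],
)

phrase = "Continue with plan of care"

# Same shape as the query above: counts rows containing the phrase.
(row_count,) = conn.execute(
    "SELECT COUNT(*) FROM TheTable WHERE TheField LIKE ?",
    ("%" + phrase + "%",),
).fetchone()

# Total occurrences, counting repeats within a single row.
occurrences = sum(
    text.count(phrase) for (text,) in conn.execute("SELECT TheField FROM TheTable")
)

print(row_count, occurrences)
```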