Solved

Complex: Data Mining some FT indexed text with Sql Server 2008

Posted on 2011-03-23
3
238 Views
Last Modified: 2012-05-11
Hi Experts.  

I'm doing some predictive data mining.  What I want to predict is something called a ConditonRed.  These occurances are always preceded by a number of incidents and all kinds of data are available for attributes and my mining model development is coming along well.

I want to add some simple text mining into my model.  One of the data bits available to me is the raw text of what a service rep types up about an incident.  I have a theory that I can improve my predictions based on the emotional content of the notes people type about the incidents. SO:

Consider two tables.  The incident table with two columns: IncidentID and Notes. Notes is an FT Indexed column.    The emotive word table with one column: EmotiveWord.  There are about 1000 words in the table but just assume only Green, Red, Blue.

I want to generate an output table with columns IncidentID, w_Green, w_Red, w_Blue where each of the w_ columns are just BITs that indicate whether the Notes column for that IncidentID contains the word.  Such an exercise would give me access to a whole world of possibilities based on real textual analysis done outside of the model to be used as an input to the model.

Thoughts - How would you go about it ?



0
Comment
Question by:_Scotch_
  • 2
3 Comments
 
LVL 2

Expert Comment

by:ericpeckham
ID: 35219651
There may be a better way to do it, but this will work:

   
;WITH GreenIncidents (IncidentID) AS
(
	SELECT IncidentID FROM Incident
	WHERE CONTAINS(TextContent, 'Green')
), RedIncidents (IncidentID) AS
(
	SELECT IncidentID FROM Incident
	WHERE CONTAINS(TextContent, 'Red')
), BlueIncidents (IncidentID) AS
(
	SELECT IncidentID FROM Incident
	WHERE CONTAINS(TextContent, 'Blue')
)
SELECT 
	IncidentID = COALESCE(g.IncidentID, r.IncidentID, b.IncidentID),
	w_Green = CONVERT(BIT, CASE WHEN g.IncidentID IS NOT NULL THEN 1 ELSE 0 END),
	w_Red = CONVERT(BIT, CASE WHEN r.IncidentID IS NOT NULL THEN 1 ELSE 0 END),
	w_Blue = CONVERT(BIT, CASE WHEN b.IncidentID IS NOT NULL THEN 1 ELSE 0 END)
FROM
	GreenIncidents g
	FULL JOIN RedIncidents r ON g.IncidentID = r.IncidentID
	FULL JOIN BlueIncidents b ON g.IncidentID = b.IncidentID

Open in new window

0
 
LVL 2

Accepted Solution

by:
ericpeckham earned 500 total points
ID: 35219659
Oy, I forgot to replace my column name with yours - substitute Notes for TextContent.  :-)
0
 
LVL 3

Author Closing Comment

by:_Scotch_
ID: 35226937
Thanks I'll try that...
0

Featured Post

Announcing the Most Valuable Experts of 2016

MVEs are more concerned with the satisfaction of those they help than with the considerable points they can earn. They are the types of people you feel privileged to call colleagues. Join us in honoring this amazing group of Experts.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Load balancing is the method of dividing the total amount of work performed by one computer between two or more computers. Its aim is to get more work done in the same amount of time, ensuring that all the users get served faster.
I have a large data set and a SSIS package. How can I load this file in multi threading?
Via a live example, show how to shrink a transaction log file down to a reasonable size.
Viewers will learn how to use the UPDATE and DELETE statements to change or remove existing data from their tables. Make a table: Update a specific column given a specific row using the UPDATE statement: Remove a set of values using the DELETE s…

809 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question