_Scotch_

Complex: Data mining some full-text indexed text with SQL Server 2008

Hi Experts.  

I'm doing some predictive data mining. What I want to predict is something called a ConditonRed. These occurrences are always preceded by a number of incidents, and all kinds of data are available as attributes. My mining model development is coming along well.

I want to add some simple text mining to my model. One of the pieces of data available to me is the raw text that a service rep types up about an incident. My theory is that I can improve my predictions by using the emotional content of the notes people type about the incidents. So:

Consider two tables. The Incident table has two columns: IncidentID and Notes. Notes is a full-text indexed column. The emotive word table has one column: EmotiveWord. There are about 1000 words in the table, but for now assume it holds only Green, Red, and Blue.

I want to generate an output table with columns IncidentID, w_Green, w_Red, w_Blue, where each w_ column is a BIT indicating whether the Notes column for that IncidentID contains the word. That would let me feed the results of real textual analysis, done outside the model, into the model as inputs.

Thoughts? How would you go about it?



ericpeckham

There may be a better way to do it, but this will work:

   
;WITH GreenIncidents (IncidentID) AS
(
	-- Incidents whose full-text indexed Notes column contains 'Green'
	SELECT IncidentID FROM Incident
	WHERE CONTAINS(Notes, 'Green')
), RedIncidents (IncidentID) AS
(
	SELECT IncidentID FROM Incident
	WHERE CONTAINS(Notes, 'Red')
), BlueIncidents (IncidentID) AS
(
	SELECT IncidentID FROM Incident
	WHERE CONTAINS(Notes, 'Blue')
)
SELECT 
	-- Note: incidents that match none of the words are not returned
	IncidentID = COALESCE(g.IncidentID, r.IncidentID, b.IncidentID),
	w_Green = CONVERT(BIT, CASE WHEN g.IncidentID IS NOT NULL THEN 1 ELSE 0 END),
	w_Red = CONVERT(BIT, CASE WHEN r.IncidentID IS NOT NULL THEN 1 ELSE 0 END),
	w_Blue = CONVERT(BIT, CASE WHEN b.IncidentID IS NOT NULL THEN 1 ELSE 0 END)
FROM
	GreenIncidents g
	FULL JOIN RedIncidents r ON g.IncidentID = r.IncidentID
	-- Join Blue on whichever earlier side is present, so an incident found
	-- only in Red and Blue still comes back as a single row
	FULL JOIN BlueIncidents b ON COALESCE(g.IncidentID, r.IncidentID) = b.IncidentID
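
With around 1000 words, hand-writing one CTE per word will not scale. CONTAINS does not accept a column as its search condition, but it does accept a variable, so one possible approach is to loop over the EmotiveWord table and collect the matches in long format first. This is only a rough sketch and I have not run it against your data. The temp table #IncidentWord, the INT type for IncidentID, and the NVARCHAR(100) word length are my assumptions, and it assumes the words contain no double quotes.

```sql
-- Assumed shape: one row per (incident, matched word).
CREATE TABLE #IncidentWord
(
    IncidentID  INT           NOT NULL,  -- assumed type
    EmotiveWord NVARCHAR(100) NOT NULL   -- assumed length
);

DECLARE @word   NVARCHAR(100);
DECLARE @search NVARCHAR(102);

DECLARE word_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT EmotiveWord FROM EmotiveWord;

OPEN word_cursor;
FETCH NEXT FROM word_cursor INTO @word;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Wrap the word in double quotes so it is treated as a simple term.
    SET @search = N'"' + @word + N'"';

    INSERT INTO #IncidentWord (IncidentID, EmotiveWord)
    SELECT i.IncidentID, @word
    FROM Incident AS i
    WHERE CONTAINS(i.Notes, @search);

    FETCH NEXT FROM word_cursor INTO @word;
END

CLOSE word_cursor;
DEALLOCATE word_cursor;

-- Pivot back to one BIT column per word, shown here for the three sample words.
-- The LEFT JOIN keeps incidents that matched nothing (all bits 0).
SELECT
    i.IncidentID,
    w_Green = CONVERT(BIT, MAX(CASE WHEN iw.EmotiveWord = N'Green' THEN 1 ELSE 0 END)),
    w_Red   = CONVERT(BIT, MAX(CASE WHEN iw.EmotiveWord = N'Red'   THEN 1 ELSE 0 END)),
    w_Blue  = CONVERT(BIT, MAX(CASE WHEN iw.EmotiveWord = N'Blue'  THEN 1 ELSE 0 END))
FROM Incident AS i
LEFT JOIN #IncidentWord AS iw ON iw.IncidentID = i.IncidentID
GROUP BY i.IncidentID;
```

The long-format table may turn out to be the more convenient input anyway, since a wide table with a column per word gets close to SQL Server's column limit at 1000 words. Also bear in mind that any emotive word that is on the full-text stoplist will probably never match.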


ASKER CERTIFIED SOLUTION
ericpeckham

_Scotch_

ASKER

Thanks, I'll try that...