
Complex: Data Mining some FT indexed text with Sql Server 2008

Hi Experts.

I'm doing some predictive data mining. What I want to predict is something called a ConditonRed. These occurrences are always preceded by a number of incidents, all kinds of data are available as attributes, and my mining model development is coming along well.

I want to add some simple text mining into my model. One of the data bits available to me is the raw text of what a service rep types up about an incident. I have a theory that I can improve my predictions based on the emotional content of the notes people type about the incidents. SO:

Consider two tables. The incident table has two columns, IncidentID and Notes; Notes is a full-text (FT) indexed column. The emotive word table has one column, EmotiveWord. It holds about 1000 words, but for this example assume only Green, Red and Blue.
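
For concreteness, the two tables could look roughly like the sketch below. Only the column names IncidentID, Notes and EmotiveWord come from the description above; the data types, the constraint name PK_Incident and the catalog name IncidentCatalog are assumptions made for illustration.

```sql
-- Hypothetical schema: types, constraint names and the catalog name are assumed.
CREATE TABLE dbo.Incident (
    IncidentID int NOT NULL CONSTRAINT PK_Incident PRIMARY KEY,
    Notes      nvarchar(max) NULL   -- raw text typed by the service rep
);

CREATE TABLE dbo.EmotiveWord (
    EmotiveWord nvarchar(100) NOT NULL PRIMARY KEY
);

-- A full-text index needs a catalog and a unique, single-column key index.
CREATE FULLTEXT CATALOG IncidentCatalog;

CREATE FULLTEXT INDEX ON dbo.Incident (Notes)
    KEY INDEX PK_Incident
    ON IncidentCatalog;

-- The three example words.
INSERT INTO dbo.EmotiveWord (EmotiveWord) VALUES (N'Green');
INSERT INTO dbo.EmotiveWord (EmotiveWord) VALUES (N'Red');
INSERT INTO dbo.EmotiveWord (EmotiveWord) VALUES (N'Blue');
```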

I want to generate an output table with columns IncidentID, w_Green, w_Red, w_Blue, where each w_ column is a BIT that indicates whether the Notes column for that IncidentID contains the word. This would let me do real textual analysis outside of the model and feed the results in as model inputs.

Thoughts? How would you go about it?

ericpeckham

There may be a better way to do it, but this will work:

-- One CTE per word: the incidents whose Notes contain that word.
-- Incidents that contain none of the words are not returned.
;WITH GreenIncidents (IncidentID) AS
(
	SELECT IncidentID FROM Incident
	WHERE CONTAINS(Notes, 'Green')
), RedIncidents (IncidentID) AS
(
	SELECT IncidentID FROM Incident
	WHERE CONTAINS(Notes, 'Red')
), BlueIncidents (IncidentID) AS
(
	SELECT IncidentID FROM Incident
	WHERE CONTAINS(Notes, 'Blue')
)
SELECT 
	IncidentID = COALESCE(g.IncidentID, r.IncidentID, b.IncidentID),
	w_Green = CONVERT(BIT, CASE WHEN g.IncidentID IS NOT NULL THEN 1 ELSE 0 END),
	w_Red = CONVERT(BIT, CASE WHEN r.IncidentID IS NOT NULL THEN 1 ELSE 0 END),
	w_Blue = CONVERT(BIT, CASE WHEN b.IncidentID IS NOT NULL THEN 1 ELSE 0 END)
FROM
	GreenIncidents g
	FULL JOIN RedIncidents r ON g.IncidentID = r.IncidentID
	-- Join on whichever ID is present so a Red-only row still pairs with Blue.
	FULL JOIN BlueIncidents b ON COALESCE(g.IncidentID, r.IncidentID) = b.IncidentID
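
Writing one CTE per word gets unwieldy with around 1000 words. One possible alternative, sketched here under assumptions rather than tested against your data, is to loop over the EmotiveWord table and collect matches in long format (one row per incident and word), passing each word to CONTAINS through a variable. The temp table #IncidentWord, the cursor name and the nvarchar lengths are made up for the example, and it assumes each word is a single term with no embedded double quotes.

```sql
-- Long-format result: one row per (incident, word) match.
CREATE TABLE #IncidentWord (
    IncidentID  int           NOT NULL,
    EmotiveWord nvarchar(100) NOT NULL
);

DECLARE @word   nvarchar(100);
DECLARE @search nvarchar(110);

DECLARE word_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT EmotiveWord FROM EmotiveWord;

OPEN word_cursor;
FETCH NEXT FROM word_cursor INTO @word;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Wrap the word in double quotes so it is treated as a simple term.
    SET @search = N'"' + @word + N'"';

    -- CONTAINS accepts a variable as its search condition.
    INSERT INTO #IncidentWord (IncidentID, EmotiveWord)
    SELECT IncidentID, @word
    FROM Incident
    WHERE CONTAINS(Notes, @search);

    FETCH NEXT FROM word_cursor INTO @word;
END

CLOSE word_cursor;
DEALLOCATE word_cursor;

-- Every word found for every incident.
SELECT IncidentID, EmotiveWord
FROM #IncidentWord
ORDER BY IncidentID, EmotiveWord;
```

From there the rows could be turned into w_ columns with PIVOT or dynamically built SQL, or left in long format if that suits the model input better.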

_Scotch_

ASKER

Thanks, I'll try that...