Go Premium for a chance to win a PS4. Enter to Win

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 243
  • Last Modified:

Complex: Data Mining some FT indexed text with Sql Server 2008

Hi Experts.  

I'm doing some predictive data mining.  What I want to predict is something called a ConditonRed.  These occurances are always preceded by a number of incidents and all kinds of data are available for attributes and my mining model development is coming along well.

I want to add some simple text mining into my model.  One of the data bits available to me is the raw text of what a service rep types up about an incident.  I have a theory that I can improve my predictions based on the emotional content of the notes people type about the incidents. SO:

Consider two tables.  The incident table with two columns: IncidentID and Notes. Notes is an FT Indexed column.    The emotive word table with one column: EmotiveWord.  There are about 1000 words in the table but just assume only Green, Red, Blue.

I want to generate an output table with columns IncidentID, w_Green, w_Red, w_Blue where each of the w_ columns are just BITs that indicate whether the Notes column for that IncidentID contains the word.  Such an exercise would give me access to a whole world of possibilities based on real textual analysis done outside of the model to be used as an input to the model.

Thoughts - How would you go about it ?



0
_Scotch_
Asked:
_Scotch_
  • 2
1 Solution
 
ericpeckhamCommented:
There may be a better way to do it, but this will work:

   
;WITH GreenIncidents (IncidentID) AS
(
	SELECT IncidentID FROM Incident
	WHERE CONTAINS(TextContent, 'Green')
), RedIncidents (IncidentID) AS
(
	SELECT IncidentID FROM Incident
	WHERE CONTAINS(TextContent, 'Red')
), BlueIncidents (IncidentID) AS
(
	SELECT IncidentID FROM Incident
	WHERE CONTAINS(TextContent, 'Blue')
)
SELECT 
	IncidentID = COALESCE(g.IncidentID, r.IncidentID, b.IncidentID),
	w_Green = CONVERT(BIT, CASE WHEN g.IncidentID IS NOT NULL THEN 1 ELSE 0 END),
	w_Red = CONVERT(BIT, CASE WHEN r.IncidentID IS NOT NULL THEN 1 ELSE 0 END),
	w_Blue = CONVERT(BIT, CASE WHEN b.IncidentID IS NOT NULL THEN 1 ELSE 0 END)
FROM
	GreenIncidents g
	FULL JOIN RedIncidents r ON g.IncidentID = r.IncidentID
	FULL JOIN BlueIncidents b ON g.IncidentID = b.IncidentID

Open in new window

0
 
ericpeckhamCommented:
Oy, I forgot to replace my column name with yours - substitute Notes for TextContent.  :-)
0
 
_Scotch_Author Commented:
Thanks I'll try that...
0

Featured Post

What is SQL Server and how does it work?

The purpose of this paper is to provide you background on SQL Server. It’s your self-study guide for learning fundamentals. It includes both the history of SQL and its technical basics. Concepts and definitions will form the solid foundation of your future DBA expertise.

  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now