I'm doing some predictive data mining. What I want to predict is something called a ConditionRed. These occurrences are always preceded by a number of incidents, all kinds of data are available to use as attributes, and development of my mining model is coming along well.
I want to add some simple text mining to my model. One of the pieces of data available to me is the raw text a service rep types up about each incident. My theory is that I can improve my predictions based on the emotional content of the notes people type about the incidents. So:
Consider two tables. The incident table has two columns, IncidentID and Notes, where Notes is a full-text indexed column. The emotive word table has one column, EmotiveWord. There are about 1000 words in that table, but for this example assume there are only three: Green, Red, and Blue.
I want to generate an output table with the columns IncidentID, w_Green, w_Red, and w_Blue, where each w_ column is simply a BIT indicating whether the Notes column for that IncidentID contains the word. Such an exercise would open up a whole world of possibilities: real textual analysis done outside the model could then be used as an input to the model.
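To make the desired output shape concrete, here is a minimal sketch of the transformation in plain Python rather than SQL. It assumes the incidents have already been pulled out as (IncidentID, Notes) pairs and the emotive words as a simple list; the function name `emotive_word_flags` and the sample notes are hypothetical. It also swaps in a case-insensitive whole-word regular expression instead of the full-text index, so it will not reproduce full-text behaviour such as inflectional matching or noise-word handling; it only illustrates one 0/1 flag per word per incident.

```python
import re
from typing import Dict, List, Optional, Tuple


def emotive_word_flags(
    incidents: List[Tuple[int, Optional[str]]],
    emotive_words: List[str],
) -> List[Dict[str, int]]:
    """Return one row per incident with a 0/1 flag (like a BIT) per emotive word."""
    # Assumption: a case-insensitive whole-word regex stands in for the
    # full-text index; \b keeps "red" from matching inside "bored".
    patterns = {
        word: re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        for word in emotive_words
    }
    rows = []
    for incident_id, notes in incidents:
        row = {"IncidentID": incident_id}
        for word, pattern in patterns.items():
            # 1 if the word appears anywhere in Notes, else 0; NULL notes count as empty.
            row["w_" + word] = 1 if pattern.search(notes or "") else 0
        rows.append(row)
    return rows


# Hypothetical sample data standing in for the incident table.
incidents = [
    (1, "Customer was red in the face and very angry."),
    (2, "Feeling blue about the outage; green light to retry."),
    (3, "Routine password reset; user seemed bored."),
]

for row in emotive_word_flags(incidents, ["Green", "Red", "Blue"]):
    print(row)
```

With roughly 1000 words, the same loop simply produces roughly 1000 w_ columns, so in practice a "long" layout (IncidentID, EmotiveWord, Flag) that is pivoted only when feeding the model may be easier to manage.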
Any thoughts? How would you go about it?