Solved

Complex: Data Mining some FT indexed text with Sql Server 2008

Posted on 2011-03-23
3
235 Views
Last Modified: 2012-05-11
Hi Experts.  

I'm doing some predictive data mining.  What I want to predict is something called a ConditonRed.  These occurances are always preceded by a number of incidents and all kinds of data are available for attributes and my mining model development is coming along well.

I want to add some simple text mining into my model.  One of the data bits available to me is the raw text of what a service rep types up about an incident.  I have a theory that I can improve my predictions based on the emotional content of the notes people type about the incidents. SO:

Consider two tables.  The incident table with two columns: IncidentID and Notes. Notes is an FT Indexed column.    The emotive word table with one column: EmotiveWord.  There are about 1000 words in the table but just assume only Green, Red, Blue.

I want to generate an output table with columns IncidentID, w_Green, w_Red, w_Blue where each of the w_ columns are just BITs that indicate whether the Notes column for that IncidentID contains the word.  Such an exercise would give me access to a whole world of possibilities based on real textual analysis done outside of the model to be used as an input to the model.

Thoughts - How would you go about it ?



0
Comment
Question by:_Scotch_
  • 2
3 Comments
 
LVL 2

Expert Comment

by:ericpeckham
ID: 35219651
There may be a better way to do it, but this will work:

   
;WITH GreenIncidents (IncidentID) AS
(
	SELECT IncidentID FROM Incident
	WHERE CONTAINS(TextContent, 'Green')
), RedIncidents (IncidentID) AS
(
	SELECT IncidentID FROM Incident
	WHERE CONTAINS(TextContent, 'Red')
), BlueIncidents (IncidentID) AS
(
	SELECT IncidentID FROM Incident
	WHERE CONTAINS(TextContent, 'Blue')
)
SELECT 
	IncidentID = COALESCE(g.IncidentID, r.IncidentID, b.IncidentID),
	w_Green = CONVERT(BIT, CASE WHEN g.IncidentID IS NOT NULL THEN 1 ELSE 0 END),
	w_Red = CONVERT(BIT, CASE WHEN r.IncidentID IS NOT NULL THEN 1 ELSE 0 END),
	w_Blue = CONVERT(BIT, CASE WHEN b.IncidentID IS NOT NULL THEN 1 ELSE 0 END)
FROM
	GreenIncidents g
	FULL JOIN RedIncidents r ON g.IncidentID = r.IncidentID
	FULL JOIN BlueIncidents b ON g.IncidentID = b.IncidentID

Open in new window

0
 
LVL 2

Accepted Solution

by:
ericpeckham earned 500 total points
ID: 35219659
Oy, I forgot to replace my column name with yours - substitute Notes for TextContent.  :-)
0
 
LVL 3

Author Closing Comment

by:_Scotch_
ID: 35226937
Thanks I'll try that...
0

Featured Post

Zoho SalesIQ

Hassle-free live chat software re-imagined for business growth. 2 users, always free.

Join & Write a Comment

If you have heard of RFC822 date formats, they can be quite a challenge in SQL Server. RFC822 is an Internet standard format for email message headers, including all dates within those headers. The RFC822 protocols are available in detail at:   ht…
Ever needed a SQL 2008 Database replicated/mirrored/log shipped on another server but you can't take the downtime inflicted by initial snapshot or disconnect while T-logs are restored or mirror applied? You can use SQL Server Initialize from Backup…
Familiarize people with the process of utilizing SQL Server functions from within Microsoft Access. Microsoft Access is a very powerful client/server development tool. One of the SQL Server objects that you can interact with from within Microsoft Ac…
Using examples as well as descriptions, and references to Books Online, show the different Recovery Models available in SQL Server and explain, as well as show how full, differential and transaction log backups are performed

758 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

20 Experts available now in Live!

Get 1:1 Help Now