?
Solved

Complex: Data Mining some FT indexed text with Sql Server 2008

Posted on 2011-03-23
3
Medium Priority
?
241 Views
Last Modified: 2012-05-11
Hi Experts.  

I'm doing some predictive data mining.  What I want to predict is something called a ConditonRed.  These occurances are always preceded by a number of incidents and all kinds of data are available for attributes and my mining model development is coming along well.

I want to add some simple text mining into my model.  One of the data bits available to me is the raw text of what a service rep types up about an incident.  I have a theory that I can improve my predictions based on the emotional content of the notes people type about the incidents. SO:

Consider two tables.  The incident table with two columns: IncidentID and Notes. Notes is an FT Indexed column.    The emotive word table with one column: EmotiveWord.  There are about 1000 words in the table but just assume only Green, Red, Blue.

I want to generate an output table with columns IncidentID, w_Green, w_Red, w_Blue where each of the w_ columns are just BITs that indicate whether the Notes column for that IncidentID contains the word.  Such an exercise would give me access to a whole world of possibilities based on real textual analysis done outside of the model to be used as an input to the model.

Thoughts - How would you go about it ?



0
Comment
Question by:_Scotch_
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
3 Comments
 
LVL 2

Expert Comment

by:ericpeckham
ID: 35219651
There may be a better way to do it, but this will work:

   
;WITH GreenIncidents (IncidentID) AS
(
	SELECT IncidentID FROM Incident
	WHERE CONTAINS(TextContent, 'Green')
), RedIncidents (IncidentID) AS
(
	SELECT IncidentID FROM Incident
	WHERE CONTAINS(TextContent, 'Red')
), BlueIncidents (IncidentID) AS
(
	SELECT IncidentID FROM Incident
	WHERE CONTAINS(TextContent, 'Blue')
)
SELECT 
	IncidentID = COALESCE(g.IncidentID, r.IncidentID, b.IncidentID),
	w_Green = CONVERT(BIT, CASE WHEN g.IncidentID IS NOT NULL THEN 1 ELSE 0 END),
	w_Red = CONVERT(BIT, CASE WHEN r.IncidentID IS NOT NULL THEN 1 ELSE 0 END),
	w_Blue = CONVERT(BIT, CASE WHEN b.IncidentID IS NOT NULL THEN 1 ELSE 0 END)
FROM
	GreenIncidents g
	FULL JOIN RedIncidents r ON g.IncidentID = r.IncidentID
	FULL JOIN BlueIncidents b ON g.IncidentID = b.IncidentID

Open in new window

0
 
LVL 2

Accepted Solution

by:
ericpeckham earned 2000 total points
ID: 35219659
Oy, I forgot to replace my column name with yours - substitute Notes for TextContent.  :-)
0
 
LVL 3

Author Closing Comment

by:_Scotch_
ID: 35226937
Thanks I'll try that...
0

Featured Post

How Blockchain Is Impacting Every Industry

Blockchain expert Alex Tapscott talks to Acronis VP Frank Jablonski about this revolutionary technology and how it's making inroads into other industries and facets of everyday life.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Ever needed a SQL 2008 Database replicated/mirrored/log shipped on another server but you can't take the downtime inflicted by initial snapshot or disconnect while T-logs are restored or mirror applied? You can use SQL Server Initialize from Backup…
This article shows gives you an overview on SQL Server 2016 row level security. You will also get to know the usages of row-level-security and how it works
Via a live example, show how to shrink a transaction log file down to a reasonable size.
Via a live example, show how to setup several different housekeeping processes for a SQL Server.
Suggested Courses

777 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question