Text answers processing and categorisation

Hi all,

I'm due to program a natural-language categorisation engine soon and could really do with some pointers. People enter free-text answers to questions, and the system must categorise each answer and then rate it as positive or negative.


Q: What do you like about this website?
A1: I like the colours.
A2: Content Rocks!
A3: Nothing at all.

I need to categorise the answers into specific areas like layout / content - and tally positive and negative feedback.

This is a rather simplistic example, but if I can develop this then I can expand on the categories etc.

How could I implement this solution? Any ideas?

I've done natural language processing before (computational linguistics) and I've also *heard* about some Bayes' theorem stuff, but sadly I wouldn't know where to start with the programming for these. A friend has also mentioned a content management rating system for another website, but I simply don't know where to start researching these issues.

I'm a mssql / c# / vb.net / vb6 developer with lots of experience so ideally I'd love some pointers in these technologies if possible. I'm not great at the maths end of things, so I'd prefer some practical examples of the ideas if possible.

Some source code or even a step-by-step description of a project implemented using these ideas would be absolutely fantastic.

I'm really depending on you for this one guys! Thanks...


With regard to "*heard* about some Bayes theorem stuff", you may want to take a look at www.spambayes.org. It is an open-source Python implementation of a Bayesian classifier that sorts your email into either spam (bad) or ham (good). If you were to implement a similar method in your web application, you'd have to "train" it first on answers that have already been labelled.
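
To make "training" a little more concrete, here is a minimal sketch of the idea for a single word. It is not how SpamBayes itself is written; the training answers, labels and function names below are all made up for illustration. Training just means counting how often each word turns up in answers you have already labelled, and Bayes' theorem then turns those counts into "how likely is this label, given that the answer contains this word?".

```python
from collections import Counter

# Hypothetical hand-labelled answers (assumed for illustration only).
TRAINING = [
    ("i like the colours", "positive"),
    ("content rocks", "positive"),
    ("i like the layout", "positive"),
    ("nothing at all", "negative"),
    ("i hate the colours", "negative"),
]

def train(examples):
    """Count, per label, how many answers contain each word."""
    label_counts = Counter()   # label -> number of answers with that label
    word_counts = {}           # label -> Counter(word -> answers containing it)
    for text, label in examples:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        counts.update(set(text.split()))  # set(): count each word once per answer
    return label_counts, word_counts

def p_label_given_word(word, label, label_counts, word_counts):
    """Bayes' theorem: P(label | word) = P(word | label) * P(label) / P(word)."""
    total = sum(label_counts.values())
    numerators = {}
    for lab, n in label_counts.items():
        p_word_given_lab = word_counts[lab][word] / n
        p_lab = n / total
        numerators[lab] = p_word_given_lab * p_lab
    evidence = sum(numerators.values())  # P(word), summed over all labels
    if evidence == 0:
        return None  # the word never appeared in the training answers
    return numerators[label] / evidence

label_counts, word_counts = train(TRAINING)
print(p_label_given_word("like", "positive", label_counts, word_counts))
print(p_label_given_word("colours", "positive", label_counts, word_counts))
```

With this toy data "like" only ever appears in positive answers, so it points strongly at "positive", while "colours" appears on both sides and tells you very little on its own. A real classifier combines the evidence from every word in the answer rather than looking at just one.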

Hope this helps,
DaveyByrne (Author) commented:
Thanks Jake,

Do you have any experience with Bayes' theorem? I just need someone to explain it to me in layman's terms, and I reckoned the best way to do this was to look at someone else's implementation.

Sadly this isn't as easy as I thought.

Any chance you could explain it to me in plain English?


Sorry, I have no experience with the theorem itself, but I do know the spambayes implementation works excellently for determining spam. One issue I can foresee is that email only has two classes, "ham" and "spam", whereas your specification has a variable number of categories. I'll grind some more grey matter out tomorrow.
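
On the number of categories: a naive Bayes classifier is not limited to two classes. It computes one score per label and picks the highest, so the same code handles two labels or twenty. Below is a rough sketch under that assumption, using a hypothetical `NaiveBayes` class and made-up training answers (neither comes from SpamBayes). It uses add-one smoothing so an unseen word does not zero out a score, and it runs two independent classifiers: one for the topic and one for positive / negative.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Minimal multi-class naive Bayes text classifier (illustrative sketch)."""

    def __init__(self):
        self.doc_counts = Counter()               # label -> number of answers
        self.word_counts = defaultdict(Counter)   # label -> word frequencies
        self.vocab = set()                        # every word seen in training

    def train(self, text, label):
        words = text.lower().split()
        self.doc_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            # Work in log space so many small probabilities do not underflow.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for w in words:
                # Add-one (Laplace) smoothing for words unseen under this label.
                count = self.word_counts[label][w] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical training answers, assumed for illustration only.
topics = NaiveBayes()
for text, label in [
    ("i like the colours", "layout"),
    ("the fonts and colours look great", "layout"),
    ("content rocks", "content"),
    ("the articles are well written", "content"),
    ("menus are hard to find", "navigation"),
    ("links and menus are confusing", "navigation"),
]:
    topics.train(text, label)

sentiment = NaiveBayes()
for text, label in [
    ("i like the colours", "positive"),
    ("content rocks", "positive"),
    ("nothing at all", "negative"),
    ("i hate the menus", "negative"),
]:
    sentiment.train(text, label)

answer = "nice colours"
print(topics.classify(answer), sentiment.classify(answer))
```

Keeping topic and sentiment as separate classifiers means each answer gets one label from each, which can then be tallied per category. With only a handful of training answers the results will be unreliable; the more labelled answers you feed in, the better it should get.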


DaveyByrne (Author) commented:
Thanks for the tip anyway Jake.
