Solved

How should I use Artificial Intelligence to sort out relevant statements from non-relevant ones?

Posted on 2016-09-05
Last Modified: 2016-09-07
I'm trying to build a program that sorts a stream of statements into relevant and non-relevant ones with regard to a particular subject domain. What algorithms and frameworks would be helpful?

I shall clarify further with an example.

Let me pick a subject like economics. Given a group of sentences and phrases, I should be able to decide for each one whether it belongs to the field of economics or not. If I see something about cooking or the weather, I should put it in the irrelevant category, and if I see something about profits and GDP, I should put it in the relevant category. I understand that I will need some sort of knowledge base for that particular domain, i.e. economics.

I need pointers to where I can start.
How do I go about collecting the domain data?
What basic process structure should the system have?
I'm planning to use Java for the implementation.

Tutorials would also be very much appreciated.
Question by:Cynthia Wasonga

Expert Comment

by:dpearson
ID: 41787242
To solve that generally is a pretty hard problem.  You'd want to start with a natural language parser (to understand the English text) and then categorize its outputs.

However, if you want a simpler shortcut, you could also look at WordNet (https://wordnet.princeton.edu/), which is a semantic network for words. That means that given "profit" you can look up what "type" of word it is (or a list of options, one per sense) and see that it can be related to economics.

Might give you what you need without getting into a full natural language processor.
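
As a rough illustration of that lookup idea, here is a minimal Java sketch. It does not use WordNet or any WordNet library: the `TinyLexicon` class, the `belongsTo` method and every entry in the map are hypothetical, hand-written stand-ins for the kind of "broader term" links a semantic network provides. Real WordNet data and relations will differ.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy stand-in for a semantic-network lookup; entries are hand-written, not WordNet data. */
final class TinyLexicon {
    // word -> broader term ("is a kind of"); hypothetical illustrative entries only
    private static final Map<String, String> BROADER = new HashMap<>();
    static {
        BROADER.put("profit", "income");
        BROADER.put("income", "financial gain");
        BROADER.put("financial gain", "economics");
        BROADER.put("gdp", "economic indicator");
        BROADER.put("economic indicator", "economics");
        BROADER.put("soup", "dish");
        BROADER.put("dish", "food");
        BROADER.put("rain", "precipitation");
        BROADER.put("precipitation", "weather");
    }

    /** Walks the "broader term" chain upward and reports whether it reaches the domain. */
    static boolean belongsTo(String word, String domain) {
        Set<String> seen = new HashSet<>(); // guards against cycles in the map
        String current = word.toLowerCase(Locale.ROOT);
        while (current != null && seen.add(current)) {
            if (current.equals(domain)) {
                return true;
            }
            current = BROADER.get(current); // null when the word has no broader term
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(belongsTo("profit", "economics")); // true
        System.out.println(belongsTo("soup", "economics"));   // false
    }
}
```

With a real lexical resource you would replace the hand-written map with lookups against that resource, and you would have to deal with words that have several senses.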

Doug

Author Comment

by:Cynthia Wasonga
ID: 41787283
Thanks dpearson,

I've found several tools for natural language processing at this site: https://opensource.com/business/15/7/five-open-source-nlp-tools

I'll look into those after doing some more research on NLP as a whole. I want to understand the details. WordNet also looks good. It might come in handy when learning about NLP.

Again, thanks.

Accepted Solution

by:dpearson
ID: 41787288
Yes, if you're game to really jump in, those tools should all get you started (although Lucene shouldn't really be on the list, in my opinion).

If you're not familiar with NLP - basically when you run these tools they'll give you a parse graph (often called a parse tree) - the logical structure of a sentence broken up into grammatical elements.  Then once you have identified the nouns (likely the most important words for classifying sentences as relevant or not) you can look up their semantic meaning - either within the NLP tool itself (some may include this) or via an external tool like WordNet.
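
To make the overall flow concrete, here is a minimal Java sketch of the "pick out the important words, then look them up" step. It deliberately swaps in something much simpler than a parser: plain tokenization and a lookup in a small hand-written word list, with no part-of-speech tagging at all. The `RelevanceSketch` class, its methods, the word list and the threshold of one matching term are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Keyword-based relevance check; a stand-in for parser output plus a semantic lookup. */
final class RelevanceSketch {
    // Hypothetical domain vocabulary; a real one would come from a lexical
    // resource such as WordNet or from labelled example sentences.
    private static final Set<String> ECONOMICS_TERMS = new HashSet<>(Arrays.asList(
            "profit", "profits", "gdp", "inflation", "market", "markets",
            "tax", "taxes", "trade"));

    /** Counts how many tokens of the sentence appear in the domain vocabulary. */
    static int countDomainTerms(String sentence) {
        int hits = 0;
        // Crude tokenization: lower-case, then split on anything that is not a letter.
        for (String token : sentence.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (ECONOMICS_TERMS.contains(token)) {
                hits++;
            }
        }
        return hits;
    }

    /** Treats a sentence as relevant if it contains at least one domain term. */
    static boolean isRelevant(String sentence) {
        return countDomainTerms(sentence) >= 1;
    }

    public static void main(String[] args) {
        System.out.println(isRelevant("Quarterly profits rose as GDP grew."));  // true
        System.out.println(isRelevant("Simmer the soup for ten minutes."));     // false
    }
}
```

A real NLP tool would replace the tokenization step with proper parsing, so that only the nouns are looked up, and the fixed word list would give way to a semantic lookup.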

Have fun exploring - it's very relevant stuff to learn about in the modern world.

Doug