Solved

Entity Extractor

Posted on 2008-06-12
7
412 Views
Last Modified: 2013-11-23
I need to extract entities from text. Is there an entity extraction library, VCL component, API, etc. that can be incorporated into a Delphi program? I am using Delphi 2006 VCL.
0
Comment
Question by:mcmahling
7 Comments
 
LVL 28

Expert Comment

by:2266180
ID: 21768987
It depends on what entities you are talking about. There are dedicated, specialized "extractors" called parsers (for HTML, XML, CSV, etc.) that will parse out the data.

For generic data you have at least one option: regular expressions. You will still have to write the regex yourself in order to use it (which is not that hard once you get familiar with it).
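
As a rough illustration of the regular-expression approach mentioned above, here is a minimal sketch in Python (not Delphi; the same pattern could be ported to whatever regex library your Delphi version has available). It assumes that "entities" are runs of capitalized words, which is only a crude heuristic. The names `CAPITALIZED_RUN` and `candidate_entities` are made up for this example.

```python
import re

# Heuristic (an assumption, not a real entity model): one or more
# consecutive capitalized words, e.g. "New York" or "Angela Merkel".
CAPITALIZED_RUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def candidate_entities(text):
    """Return runs of capitalized words as rough entity candidates."""
    return CAPITALIZED_RUN.findall(text)


if __name__ == "__main__":
    sample = "The mayor of New York met Angela Merkel in Berlin."
    print(candidate_entities(sample))
    # ['The', 'New York', 'Angela Merkel', 'Berlin']
```

Note the false positive "The": a pattern like this only proposes candidates, and sentence-initial words, acronyms, and lowercase entities all need extra handling.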
0
 

Author Comment

by:mcmahling
ID: 21769299
Thanks for your response. I would be extracting entities from .txt documents. Using regular expressions would still require me to have an ontology that the regular expression must access to determine which words are entities. I don't readily see how to use regular expressions without that sort of semantic library. I was hoping something already existed so that I would not have to do all that work.
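
To make the "regular expression plus word list" idea concrete, here is a minimal sketch in Python (not Delphi) of a lookup against a tiny hand-made name list, often called a gazetteer. The `GAZETTEER` contents and the `tag_entities` helper are hypothetical and only for illustration; a usable list would be far larger and would come from an external source.

```python
import re

# Hypothetical, hand-made gazetteer mapping known names to entity types.
GAZETTEER = {
    "Angela Merkel": "PERSON",
    "Berlin": "PLACE",
    "New York": "PLACE",
}

# Build one alternation from the known names. Longer names are listed
# first so that a multi-word entry wins over any shorter overlapping one.
_names = sorted(GAZETTEER, key=len, reverse=True)
_PATTERN = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in _names) + r")\b")


def tag_entities(text):
    """Return (name, type) pairs for every gazetteer entry found in text."""
    return [(m.group(0), GAZETTEER[m.group(0)]) for m in _PATTERN.finditer(text)]


if __name__ == "__main__":
    print(tag_entities("Angela Merkel flew from Berlin to New York."))
```

This finds only names that are already in the list, which is exactly the limitation described above: the quality of the result depends on the word list, not on the regex.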
0
 
LVL 28

Expert Comment

by:2266180
ID: 21769378
You cannot have a computer guess what you want to extract. You need to tell it somehow. Like it or not, that's the way things work. The computer doesn't know that a text file contains text and words and numbers and whatever. For the computer all files are the same: a series of 0s and 1s.

Not even I know what you mean by "entities". How do you expect a computer to know?

Regular expressions are the most generic way of extracting arbitrary data from something. If that doesn't help, nothing does.
0
 
LVL 17

Expert Comment

by:TheRealLoki
ID: 21809345
Can you explain more about these entities?
What is the source text? Is it just random text, or is it in an existing structure, such as XML or CSV?
Are these "entities" going to be consistent within your app? Or do you really need a solution that gives you an extreme amount of flexibility?
I've written commercial EDI engines, so let me know your full requirements.
0
 

Author Comment

by:mcmahling
ID: 21811874
The entities are people, places, or things. The text is free, unstructured text coming from news articles. I have heard of the open source product Annie, but I wondered whether there was another product made especially for integration with Delphi.
0
 
LVL 37

Expert Comment

by:Geert Gruwez
ID: 21822106
It looks like you only need an AI to read the news articles and let the AI extract the entities ...
AI = probably human brain instead of Artificial Intelligence
0
 
LVL 3

Accepted Solution

by:
Mamouri earned 500 total points
ID: 21855941
Hi

This issue relates to the field of Data Mining and Text Extraction. Take a look at this library: http://www.dewresearch.com/data-miner.html
The component is a general data mining library and could be used for a variety of data mining tasks.
0

Question has a verified solution.
