Entity Extractor

Posted on 2008-06-12
Last Modified: 2013-11-23
I need to extract entities from text. Is there an entity extraction library, VCL component, API, etc. that can be incorporated into a Delphi program? I am using Delphi 2006 (VCL).
Question by:mcmahling

Expert Comment

by:2266180
ID: 21768987
It depends on what entities you are talking about. There are dedicated, specialized "extractors", namely parsers for HTML, XML, CSV, etc., which will parse out the data.

For generic data you have at least one option: regular expressions. You will still have to write the regex yourself, though (which is not that hard once you get familiar with it).
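To make the regex suggestion concrete, here is a minimal sketch in Python (the thread itself contains no code, and Python keeps the example short; the same patterns work in most regex engines). The two patterns, for ISO-style dates and e-mail addresses, are illustrative assumptions, not anything the commenter specified.

```python
import re

# Illustrative patterns for two kinds of "generic data" with a predictable
# shape; real patterns would be tuned to the actual input.
PATTERNS = {
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),               # e.g. 2008-06-12
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),   # e.g. user@example.com
}

def extract_patterns(text):
    """Return every match of each named pattern, in order of appearance."""
    return {name: rx.findall(text) for name, rx in PATTERNS.items()}

if __name__ == "__main__":
    sample = "Posted on 2008-06-12 by someone@example.com, modified 2013-11-23."
    print(extract_patterns(sample))
    # {'date': ['2008-06-12', '2013-11-23'], 'email': ['someone@example.com']}
```

Patterns like these only work for data with a fixed shape; names of people or places have no such shape.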

Author Comment

by:mcmahling
ID: 21769299
Thanks for your response. I would be extracting entities from .txt documents. Using regular expressions would still require an ontology that the regular expressions consult to determine which words are entities. I don't readily see how to use regular expressions without that sort of semantic library. I was hoping something already existed so that I would not have to do all that work.
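The "ontology" point can be sketched as a dictionary (gazetteer) lookup compiled into a single regex. This is a minimal Python sketch under stated assumptions: the `GAZETTEER` entries and the helper names `build_matcher` and `tag_entities` are hypothetical, and a usable system would need a far larger list plus handling for ambiguous names.

```python
import re

# Hypothetical mini-gazetteer mapping surface forms to entity types.
# A real system would load thousands of entries from a curated list or ontology.
GAZETTEER = {
    "Ada Lovelace": "person",
    "London": "place",
    "New York": "place",
    "New York Times": "thing",
    "Delphi 2006": "thing",
}

def build_matcher(gazetteer):
    """Compile all gazetteer names into one case-sensitive alternation."""
    # Longest names first: alternation takes the first branch that matches,
    # so "New York Times" must be tried before its prefix "New York".
    names = sorted(gazetteer, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in names)
    return re.compile(r"\b(?:" + alternation + r")\b")

def tag_entities(text, gazetteer=GAZETTEER):
    """Return (surface form, entity type, start offset) for each match."""
    matcher = build_matcher(gazetteer)
    return [(m.group(0), gazetteer[m.group(0)], m.start())
            for m in matcher.finditer(text)]

if __name__ == "__main__":
    print(tag_entities("Ada Lovelace never visited New York, but she lived in London."))
    print(tag_entities("She read the New York Times."))
```

The regex only does the matching; all of the knowledge about what counts as an entity lives in the word list, which is exactly the part that has to be built or obtained separately.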

Expert Comment

by:2266180
ID: 21769378
You cannot have a computer guess what you want to extract; you need to tell it somehow. Like it or not, that's the way things work. The computer doesn't know that a text file contains text, words, numbers, and so on. To the computer, all files are the same: a series of 0s and 1s.

Not even I know what you mean by "entities". How do you expect a computer to know?

Regular expressions are the most generic way of extracting arbitrary data from something. If they don't help, nothing will.

Expert Comment

by:TheRealLoki
ID: 21809345
Can you explain more about these entities?
What is the source text? Is it just random text, or is it in an existing structure, such as XML or CSV?
Are these "entities" going to be consistent within your app, or do you really need a solution that gives you an extreme amount of flexibility?
I've written commercial EDI engines, so let me know your full requirements.

Author Comment

by:mcmahling
ID: 21811874
The entities are people, places, or things. The text is free, unstructured text from news articles. I have heard of the open-source product ANNIE, but I wondered whether there was another product made especially for integration with Delphi.
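For people, places, and things in news prose, a common first step is to propose candidate entities before classifying them. The sketch below is a crude capitalization heuristic in Python, not a reproduction of ANNIE; the pattern, the sentence splitter, and the sentence-start rule are all assumptions. It only finds candidates and does not say whether each is a person, place, or thing (that still needs a gazetteer or a trained model).

```python
import re

# A run of one or more capitalized words, e.g. "London" or "Ada Lovelace".
CANDIDATE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")

# Split after sentence-final punctuation followed by whitespace (crude).
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")

def candidate_entities(text):
    """Return capitalized word runs that might be named entities."""
    candidates = []
    for sentence in SENTENCE_BREAK.split(text):
        for m in CANDIDATE.finditer(sentence):
            # A lone capitalized word at the start of a sentence is usually
            # capitalized only by position ("Yesterday", "The"), so skip it.
            if m.start() == 0 and len(m.group(0).split()) == 1:
                continue
            candidates.append(m.group(0))
    return candidates

if __name__ == "__main__":
    text = ("Yesterday the mayor of New York met Ada Lovelace. "
            "Officials in London were surprised.")
    print(candidate_entities(text))
    # ['New York', 'Ada Lovelace', 'London']
```

The sentence-start rule trades recall for precision: a genuine one-word entity that opens a sentence, as in "London was quiet.", is dropped.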

Expert Comment

by:Geert Gruwez
ID: 21822106
It looks like you need an AI to read the news articles and extract the entities...
AI = probably a human brain rather than artificial intelligence

Accepted Solution

by: Mamouri (earned 500 total points)
ID: 21855941
Hi,

This issue relates to the field of data mining and text extraction. Take a look at this library: http://www.dewresearch.com/data-miner.html
The component is a general data mining library and could be used for a variety of data mining tasks.