Solved

Entity Extractor

Posted on 2008-06-12
Last Modified: 2013-11-23
I need to extract entities from text. Is there an entity extraction library, VCL component, API, etc. that can be incorporated into a Delphi program? I am using Delphi 2006 VCL.
Question by:mcmahling
 
LVL 28

Expert Comment

by:2266180
ID: 21768987
It depends on what entities you are talking about. There are dedicated, specialized "extractors" called parsers (for HTML, XML, CSV, etc.) that will parse out the data.

For generic data you have at least one option: regular expressions. You will still have to write the regex yourself (which is not that hard once you get familiar with it).
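
As a rough illustration of the regex route, here is a minimal sketch in Python (not Delphi; the same patterns should carry over to whichever regex library you use from Delphi). The two patterns and the `extract` helper are made up for this example and only cover entities with a fixed textual shape, such as dates and e-mail addresses.

```python
import re

# Hypothetical patterns, chosen only for illustration.
PATTERNS = {
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),            # e.g. 2008-06-12
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),  # e.g. a@b.com
}

def extract(text):
    """Return {label: [matches]} for every pattern in PATTERNS."""
    return {label: rx.findall(text) for label, rx in PATTERNS.items()}

sample = "Posted 2008-06-12 by editor@example.com; revised 2013-11-23."
result = extract(sample)
print(result)
```

This works only because dates and addresses follow a predictable format; it says nothing about what a word means.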
 

Author Comment

by:mcmahling
ID: 21769299
Thanks for your response. I would be extracting entities from .txt documents. Using regular expressions would still require me to have an ontology that the regular expressions consult to determine which words are entities. I don't readily see how to use regular expressions without that sort of semantic library. I was hoping something already existed so that I would not have to do all that work.
 
LVL 28

Expert Comment

by:2266180
ID: 21769378
You cannot have a computer guess what you want to extract. You need to tell it somehow. Like it or not, that's the way things work. The computer doesn't know that a text file contains text and words and numbers and whatever. For the computer all files are the same: a series of 0s and 1s.

Not even I know what you mean by "entities". How do you want a computer to know?

Regular expressions are the most generic way of extracting arbitrary data from something. If that doesn't help, nothing does.
 
LVL 17

Expert Comment

by:TheRealLoki
ID: 21809345
Can you explain more about these entities?
What is the source text? Is it just random text, or is it in an existing structure, such as XML or CSV?
Are these "entities" going to be consistent within your app? Or do you really need a solution that gives you an extreme amount of flexibility?
I've written commercial EDI engines, so let me know your full requirements.
 

Author Comment

by:mcmahling
ID: 21811874
The entities are people, places, or things. The text is free, unstructured text coming from news articles. I have heard of the open source product Annie, but I wondered whether there was another product made especially for integration with Delphi.
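
To make the difficulty concrete, here is a minimal sketch in Python (not Delphi) of the kind of heuristic this would need: a regex finds runs of capitalized words, and small hand-made word lists decide what each run is. The `PLACES` and `TITLES` lists and the `find_entities` helper are hypothetical and far too small for real news text; this is not how Annie or any particular product works.

```python
import re

# Hypothetical, hand-made word lists; a real system would need far larger ones.
PLACES = {"Paris", "New York", "Brussels"}
TITLES = ("Mr.", "Mrs.", "Ms.", "Dr.")

# One or more capitalized words in a row, e.g. "New York" or "Jane Doe".
CAP_RUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*")

def find_entities(text):
    """Label capitalized runs as 'place' (list lookup) or 'person' (after a title)."""
    entities = []
    for m in CAP_RUN.finditer(text):
        phrase = m.group()
        before = text[:m.start()].rstrip()
        if phrase in PLACES:
            entities.append((phrase, "place"))
        elif before.endswith(TITLES):
            entities.append((phrase, "person"))
    return entities

sample = "Yesterday Dr. Jane Doe flew to New York and met Mr. Smith in Brussels."
found = find_entities(sample)
print(found)
```

The sketch misses people mentioned without a title, places not in the list, and "things" entirely, which is why the word lists (the ontology) are the real work.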
 
LVL 37

Expert Comment

by:Geert Gruwez
ID: 21822106
It looks like you just need an AI to read the news articles and extract the entities...
AI = probably a human brain rather than Artificial Intelligence
 
LVL 3

Accepted Solution

by:
Mamouri earned 500 total points
ID: 21855941
Hi

This issue relates to the field of data mining and text extraction. Take a look at this library: http://www.dewresearch.com/data-miner.html
The component is a general data mining library and can be used for a variety of data mining tasks.
