Solved

Entity Extractor

Posted on 2008-06-12
Last Modified: 2013-11-23
I need to extract entities from text. Is there an entity extraction library, VCL component, API, etc. that can be incorporated into a Delphi program? I am using Delphi 2006 VCL.
Question by:mcmahling
7 Comments
 
LVL 28

Expert Comment

by:2266180
ID: 21768987
depends on what entities you are talking about. there are dedicated, specialized "extractors" called parsers (for html, xml, csv, etc.) which will parse out the data.

for generic data you have at least one option: regular expressions. but you will still have to write the regex yourself in order to use it (which is not that hard once you get familiar with it).
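
A minimal sketch of the regular-expression approach, written in Python only to keep it short; the same patterns would need a regex library in Delphi 2006. The two patterns and the function name `extract_patterns` are assumptions made for illustration, not something from this thread.

```python
import re

# Hypothetical patterns chosen for illustration only.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")                # e.g. 2008-06-12
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")    # e.g. a@b.com


def extract_patterns(text):
    """Return every ISO-style date and email-like token found in text."""
    return {
        "dates": DATE_RE.findall(text),
        "emails": EMAIL_RE.findall(text),
    }


sample = "Posted on 2008-06-12 by someone@example.com, modified 2013-11-23."
result = extract_patterns(sample)
print(result)
```

This works well for data with a predictable shape. It says nothing about which words are names of people or places, because those have no fixed shape.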
 

Author Comment

by:mcmahling
ID: 21769299
Thanks for your response. I would be extracting entities from .txt documents. Using regular expressions would still require me to have an ontology that the regular expression must access to determine which words are entities. I don't readily see how to use regular expressions without that sort of semantic library. I was hoping something already existed so that I would not have to do all that work.
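
To make the point about needing a word list concrete, here is a rough sketch of a lookup-based extractor in Python. The `GAZETTEER` dictionary is a tiny hand-made stand-in for the ontology described above, and `find_entities` is a hypothetical helper; a usable list would be far larger and would have to come from somewhere.

```python
import re

# Hypothetical gazetteer: a hand-made list standing in for a real ontology.
GAZETTEER = {
    "london": "PLACE",
    "new york": "PLACE",
    "ada lovelace": "PERSON",
}


def find_entities(text, gazetteer=GAZETTEER):
    """Return (matched text, label) pairs in order of appearance."""
    # Try longer names first so a multi-word entry wins over a shorter one.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.group(0), gazetteer[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


sample = "Ada Lovelace travelled from London to New York."
entities = find_entities(sample)
print(entities)
```

The regular expression here only does the matching. All of the knowledge about what counts as an entity lives in the list, which is exactly the work a ready-made library would be expected to supply.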
 
LVL 28

Expert Comment

by:2266180
ID: 21769378
you cannot have a computer guess what you want to extract. you need to tell it somehow. like it or not, that's the way things work. the computer doesn't know that a text file contains text and words and numbers and whatever. for the computer all files are the same: a series of 0s and 1s.

not even I know what you mean by "entities". how do you want a computer to know?

regular expressions are the most generic way of extracting arbitrary data from something. if that doesn't help, nothing does.

 
LVL 17

Expert Comment

by:TheRealLoki
ID: 21809345
Can you explain more about these entities?
What is the source text? Is it just random text, or is it in an existing structure, such as XML or CSV?
Are these "entities" going to be consistent within your app? Or do you really need a solution that gives you an extreme amount of flexibility?
I've written commercial EDI engines, so let me know your full requirements.
 

Author Comment

by:mcmahling
ID: 21811874
The entities are people, places, or things. The text is free, unstructured text coming from news articles. I have heard of the open source product Annie, but I wondered whether there was another product made especially for integration with Delphi.
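
For free news text, one crude heuristic is to treat runs of capitalized words as entity candidates. The Python sketch below is an assumption-laden illustration of that idea, not how Annie or any other product works; `candidate_entities` is a hypothetical name. It skips capitalized words that start a sentence and cannot tell a person from a place.

```python
import re

# One or more consecutive capitalized words, e.g. "Ada Lovelace".
CANDIDATE_RE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def candidate_entities(text):
    """Return capitalized word runs that do not start a sentence."""
    candidates = []
    for m in CANDIDATE_RE.finditer(text):
        prefix = text[:m.start()].rstrip()
        # At the start of the text or after sentence punctuation,
        # capitalization carries no signal, so skip the run.
        if not prefix or prefix[-1] in ".!?":
            continue
        candidates.append(m.group(0))
    return candidates


sample = "Yesterday the mayor of Paris met Ada Lovelace. Reporters waited outside."
candidates = candidate_entities(sample)
print(candidates)
```

The obvious weakness is that an entity named at the start of a sentence is dropped along with ordinary sentence-initial words, so the output is a list of candidates to check, not a finished result.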
 
LVL 38

Expert Comment

by:Geert Gruwez
ID: 21822106
It looks like you only need an AI to read the news articles and let the AI extract the entities ...
AI = probably human brain instead of Artificial Intelligence
 
LVL 3

Accepted Solution

by:
Mamouri earned 1500 total points
ID: 21855941
Hi

This issue relates to the field of Data Mining and Text Extraction. Take a look at this library: http://www.dewresearch.com/data-miner.html
The component is a general Data Mining library and could be used for a variety of data mining tasks.


Question has a verified solution.
