Solved

Open source search engine to analyse Enron e-mail corpus?

Posted on 2008-10-14
Last Modified: 2013-12-09
I'm trying to develop a system to analyse the Enron e-mail corpus (downloadable from http://www.cs.cmu.edu/~enron/).

First of all, I'd like to know which open source search engines are suitable for this, and their pros and cons. I need to apply certain filters, for example to remove duplicate e-mails and to search for certain types of e-mail, so I imagine this affects the choice of search engine. If you could point me to any good articles on this, I'd appreciate it.
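
To make the duplicate filter concrete, here is a minimal sketch of one way it might work, independent of whichever search engine is chosen. It assumes that two e-mails count as duplicates when they share the same sender, subject, and whitespace-normalised body; that rule, and the names `fingerprint` and `drop_duplicates`, are illustrative assumptions rather than anything defined by the corpus or by Lucene.

```python
import hashlib
from email import message_from_string


def fingerprint(raw_message: str) -> str:
    """Hash the sender, subject, and whitespace-normalised body of one e-mail."""
    msg = message_from_string(raw_message)
    payload = msg.get_payload()
    # Multipart messages return a list of parts; this sketch only
    # fingerprints plain-text bodies and treats anything else as empty.
    body = payload if isinstance(payload, str) else ""
    parts = [
        (msg.get("From") or "").strip().lower(),
        (msg.get("Subject") or "").strip().lower(),
        " ".join(body.split()),
    ]
    return hashlib.sha1("\n".join(parts).encode("utf-8")).hexdigest()


def drop_duplicates(raw_messages):
    """Yield each message whose fingerprint has not been seen before."""
    seen = set()
    for raw in raw_messages:
        key = fingerprint(raw)
        if key not in seen:
            seen.add(key)
            yield raw
```

A filter like this could run before indexing, so that only the surviving messages are handed to the search engine. Copies of one message stored in several mailbox folders would collapse to a single entry under this rule, which may or may not be what the analysis needs.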

I've heard a lot about Lucene, but I don't know whether it's my best option for this particular task.
I'm also trying to work out the best environment to do it in. I know Lucene is Java-based, but are versions available for other languages?

I did a little Java a few years ago, but nothing major, and I'd need a complete refresher. I am quite fond of C, though. The only constraint is that whatever I develop would ideally need to be object oriented rather than command line.

I'd most enjoy developing this as some form of website, if whatever you suggest can fit into one. I've got some experience of HTML, ASP and SQL, so a solution that fits that bill would be great.


Thanks for whatever feedback you give.
Question by:G1ST
4 Comments
 
LVL 15

Accepted Solution

by:
ericpete earned 250 total points
ID: 22722936
Lucene is what EE uses, and it's pretty good.
 
LVL 5

Assisted Solution

by:wickedpassion
wickedpassion earned 250 total points
ID: 22760821