Solved

Open source search engine to analyse Enron e-mail corpus?

Posted on 2008-10-14
4
330 Views
Last Modified: 2013-12-09
I'm in the process of trying to develop a system to analyse the Enron e-mail corpus (downloadable here http://www.cs.cmu.edu/~enron/).

First of all I'd like to know about the available open source search engines suitable for doing this...and their pros and cons. I'd need to apply certain filters, to say, remove duplication of e-mail and search for certain types of e-mails too...so I imagine this affects the decision of search engine to use. If you could point me in the direction for any good articles for this I'd appreciate it.

I've heard a lot about Lucene, but I don't know if that's my best option for this particular task?
I'm trying to work out the best environment to do it in as well and I know Lucene is Java based, but may also have other versions available?

I did a little Java a few years ago, but nothing major and I'd need a complete refresh, but I am quite fond of C. The only problem is that whatever I develop would need to be object oriented rather than command line ideally.

Perhaps if I could fit whatever you suggest into some form of website I'd enjoy developing that the most. I've got some experience of HTML, ASP and SQL so if your solution fits that bill that would be great.


Thanks for whatever feedback you give.
0
Comment
Question by:G1ST
4 Comments
 
LVL 15

Accepted Solution

by:
ericpete earned 250 total points
ID: 22722936
Lucene is what EE uses, and it's pretty good.
0
 
LVL 5

Assisted Solution

by:wickedpassion
wickedpassion earned 250 total points
ID: 22760821
0

Featured Post

How our DevOps Teams Maximize Uptime

Our Dev teams are like yours. They’re continually cranking out code for new features/bugs fixes, testing, deploying, responding to production monitoring events and more. It’s complex. So, we thought you’d like to see what’s working for us. Read the use case whitepaper.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Learn by example how to specify CSS selectors for Selenium WebDriver test automation software.
There’s a good reason for why it’s called a homepage – it closely resembles that of a physical house and the only real difference is that it’s online. Your website’s homepage is where people come to visit you. It’s the family room of your website wh…
Viewers will get an overview of the benefits and risks of using Bitcoin to accept payments. What Bitcoin is: Legality: Risks: Benefits: Which businesses are best suited?: Other things you should know: How to get started:
Video by: Mark
This lesson goes over how to construct ordered and unordered lists and how to create hyperlinks.

735 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question