10.14.2008 at 11:40AM PDT, ID: 23813974 | Points: 500
Open source search engine to analyse Enron e-mail corpus?

Asked by G1ST in Search Engines

I'm trying to develop a system to analyse the Enron e-mail corpus (downloadable from http://www.cs.cmu.edu/~enron/).

First of all, I'd like to know which open source search engines are suitable for this, and what their pros and cons are. I'd need to apply certain filters, for example to remove duplicate e-mails and to search for certain types of e-mail, so I imagine this affects the choice of search engine. If you could point me to any good articles on this, I'd appreciate it.
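
To make the duplicate filter concrete: it can be prototyped independently of whichever search engine is chosen. The following is only a minimal Python sketch, assuming each message is available as a raw RFC 822 text string (headers, blank line, body). The names `fingerprint` and `dedupe` are hypothetical, and treating two messages as duplicates when sender, subject and whitespace-normalized body match is just one possible definition, not an established method for this corpus.

```python
import hashlib
from email import message_from_string


def fingerprint(raw_message: str) -> str:
    """Hash the sender, subject and whitespace-normalized body of one message."""
    msg = message_from_string(raw_message)
    payload = msg.get_payload()
    # Multipart messages return a list of parts; fall back to the raw text then.
    body = payload if isinstance(payload, str) else raw_message
    normalized_body = " ".join(body.split()).lower()
    key = "\n".join([
        (msg.get("From") or "").strip().lower(),
        (msg.get("Subject") or "").strip().lower(),
        normalized_body,
    ])
    return hashlib.sha1(key.encode("utf-8", errors="replace")).hexdigest()


def dedupe(raw_messages):
    """Return messages in original order, keeping the first copy of each fingerprint."""
    seen = set()
    unique = []
    for raw in raw_messages:
        fp = fingerprint(raw)
        if fp not in seen:
            seen.add(fp)
            unique.append(raw)
    return unique
```

Running `dedupe` over the messages before indexing would keep the filter separate from the search engine, so the same step works whichever engine ends up being used.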

I've heard a lot about Lucene, but I don't know whether it's my best option for this particular task.
I'm also trying to work out the best environment to do this in. I know Lucene is Java based, but are versions available for other languages?

I did a little Java a few years ago, but nothing major, and I'd need a complete refresher. I am quite fond of C, though. The only problem is that whatever I develop would ideally need to be object oriented rather than command line.

If I could fit whatever you suggest into some form of website, I'd enjoy developing that the most. I have some experience with HTML, ASP and SQL, so if your solution fits that bill, that would be great.


Thanks for whatever feedback you give.
 
 
10.15.2008 at 09:28AM PDT, ID: 22722936

10.20.2008 at 11:40AM PDT, ID: 22760821
