Solved

Open source search engine to analyse Enron e-mail corpus?

Posted on 2008-10-14
Medium Priority
355 Views
Last Modified: 2013-12-09
I'm developing a system to analyse the Enron e-mail corpus (downloadable from http://www.cs.cmu.edu/~enron/).

First, I'd like to know which open source search engines are suitable for this, and what their pros and cons are. I'd need to apply certain filters, for example to remove duplicate e-mails and to search for certain types of e-mail, so I imagine this affects the choice of search engine. If you could point me to any good articles on this, I'd appreciate it.

I've heard a lot about Lucene, but I don't know whether it's my best option for this particular task.
I'm also trying to work out the best environment to do it in. I know Lucene is Java based, but are versions available for other languages?

I did a little Java a few years ago, but nothing major, so I'd need a complete refresher. I am quite fond of C, though. The only problem is that whatever I develop would ideally need to be object oriented rather than command line.

I'd most enjoy building this as some form of website, if whatever you suggest can fit into one. I have some experience of HTML, ASP and SQL, so a solution that fits that bill would be great.


Thanks for whatever feedback you give.
Question by:G1ST
2 Comments
 
LVL 15

Accepted Solution

by:
Eric AKA Netminder earned 1000 total points
ID: 22722936
Lucene is what EE uses, and it's pretty good.
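
The de-duplication filter mentioned in the question can be applied before indexing, whichever engine ends up being used. Below is a minimal sketch in Python using only the standard library; it is not Lucene-specific. The function names `fingerprint` and `dedupe` are hypothetical, and the rule for what counts as a duplicate (same From, To, Subject, Date and whitespace-normalised body) is an assumption that may need tuning for the corpus.

```python
import hashlib
from email import message_from_string


def fingerprint(raw_message: str) -> str:
    """Hash the fields assumed to identify an e-mail, ignoring other headers."""
    msg = message_from_string(raw_message)
    body = msg.get_payload()
    if not isinstance(body, str):
        # Multipart message: fall back to hashing the whole raw text.
        body = raw_message
    # Assumption: two copies are the same e-mail when these four headers
    # and the body match once runs of whitespace are collapsed.
    fields = [str(msg.get(name, "")) for name in ("From", "To", "Subject", "Date")]
    fields.append(body)
    normalised = "\n".join(" ".join(field.split()) for field in fields)
    return hashlib.sha1(normalised.encode("utf-8", errors="replace")).hexdigest()


def dedupe(raw_messages):
    """Yield each distinct message once, keeping the first copy seen."""
    seen = set()
    for raw in raw_messages:
        key = fingerprint(raw)
        if key not in seen:
            seen.add(key)
            yield raw
```

Only the messages that survive `dedupe` would then be passed to the indexer. Hashing a few chosen fields rather than the whole file means copies that differ only in other headers or in whitespace collapse into one entry.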
 
LVL 5

Assisted Solution

by:wickedpassion
wickedpassion earned 1000 total points
ID: 22760821

Question has a verified solution.
