Solved

Open source search engine to analyse Enron e-mail corpus?

Posted on 2008-10-14
Last Modified: 2013-12-09
I'm trying to develop a system to analyse the Enron e-mail corpus (downloadable from http://www.cs.cmu.edu/~enron/).

First of all, I'd like to know which open source search engines are suitable for this, and their pros and cons. I'd need to apply certain filters, for example to remove duplicate e-mails and to search for certain types of e-mail, so I imagine this affects the choice of search engine. If you could point me towards any good articles on this, I'd appreciate it.
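As a rough illustration of the de-duplication filter mentioned above, here is a minimal sketch in Python that could run before any search engine indexes the messages. It assumes each e-mail is available as raw RFC 822 text, and it treats two messages as duplicates when sender, subject and whitespace-normalised body match. That matching rule and the names fingerprint and deduplicate are assumptions for illustration only, not something the corpus or any search engine prescribes.

```python
import hashlib
from email import message_from_string


def fingerprint(raw_message: str) -> str:
    """Hash the sender, subject and whitespace-normalised body of one e-mail.

    Assumption: these three fields are enough to call two messages duplicates.
    """
    msg = message_from_string(raw_message)
    body = msg.get_payload()
    if not isinstance(body, str):
        # Multipart message: fall back to the whole raw text.
        body = raw_message
    parts = [
        (msg.get("From") or "").strip().lower(),
        (msg.get("Subject") or "").strip().lower(),
        " ".join(body.split()),  # collapse runs of whitespace
    ]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


def deduplicate(raw_messages):
    """Return messages in their original order, skipping repeated fingerprints."""
    seen = set()
    unique = []
    for raw in raw_messages:
        key = fingerprint(raw)
        if key not in seen:
            seen.add(key)
            unique.append(raw)
    return unique
```

Hashing a normalised form rather than the raw file means copies that differ only in Message-ID or spacing collapse to one entry. A stricter or looser rule (for example, including the date) would only change the list built inside fingerprint.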

I've heard a lot about Lucene, but I don't know whether it's my best option for this particular task.
I'm also trying to work out the best environment to do it in. I know Lucene is Java based, but are versions for other languages available?

I did a little Java a few years ago, but nothing major, and I'd need a complete refresher. I am quite fond of C, though. The only problem is that whatever I develop would ideally need to be object oriented rather than command line.

If I could fit whatever you suggest into some form of website, I'd enjoy developing that the most. I've got some experience of HTML, ASP and SQL, so if your solution fits that bill, that would be great.


Thanks for whatever feedback you give.
Question by:G1ST
4 Comments
 
LVL 15

Accepted Solution

by:ericpete
ericpete earned 250 total points
ID: 22722936
Lucene is what EE uses, and it's pretty good.
 
LVL 5

Assisted Solution

by:wickedpassion
wickedpassion earned 250 total points
ID: 22760821
