glopezz

Best DB for inverted index in .net

Hello,

I have the following task:

Build an inverted index of the company's intranet text documents so that a C# .NET application can perform fast searches.

I'm totally new to the inverted index concept, so I'd like to know which database platform is best suited for building this inverted index and what would be a good way to create it in C#.

Thanks a lot!
Ralf Klatt

Hi,

Have you heard about "Nutch" yet? Nutch is an open-source Web search engine that can be used at global, local, and even personal scale. Its initial design goal was to enable a transparent alternative for global Web search in the public interest; one of its signature features is the ability to “explain” its result rankings. Recent work has emphasized how it can also be used for intranets; by local communities with richer data models, such as the Creative Commons metadata-enabled search for licensed content; and on a personal scale to index a user's files, email, and web-surfing history. The authors also report on several other research projects built on Nutch. In their paper at http://labs.commerce.net/wiki/images/0/06/CN-TR-04-04.pdf, they present how the architecture of the Nutch system enables it to be more flexible and scalable than other comparable systems today.

As you're asking for a C# solution, it may be interesting for you that it has even been re-implemented in several languages: C++, C#, Python, Perl and Ruby.
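
Since you mentioned being new to the concept: an inverted index is just a map from each term to the set of documents that contain it, so a query becomes a lookup plus a set intersection instead of a scan over every file. Below is a minimal in-memory sketch in Java (the language Nutch itself is written in). The class name `InvertedIndex` and its `add`/`search` methods are made up for illustration and are not Nutch or Lucene API; a real engine adds persistence, ranking, stemming and much more. The same structure carries over to C# with `Dictionary<string, HashSet<string>>`.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration class; not part of Nutch or Lucene.
class InvertedIndex {
    // term -> ids of the documents that contain the term
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Lowercase the text and split it on runs of non-letter, non-digit characters.
    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
    }

    // Index one document: record docId under every term found in its text.
    void add(String docId, String text) {
        for (String term : tokenize(text)) {
            if (term.isEmpty()) continue; // split may yield a leading empty string
            postings.computeIfAbsent(term, t -> new HashSet<>()).add(docId);
        }
    }

    // Return the ids of documents that contain ALL query terms (boolean AND).
    Set<String> search(String query) {
        Set<String> result = null;
        for (String term : tokenize(query)) {
            if (term.isEmpty()) continue;
            Set<String> docs = postings.getOrDefault(term, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(docs); // copy so the index is not modified
            } else {
                result.retainAll(docs);       // intersect with the next term's documents
            }
        }
        return result == null ? new TreeSet<>() : result;
    }
}
```

Usage would look like `idx.add("a.txt", "...")` for each document, followed by `idx.search("inverted index")`, which returns only the ids of documents containing both words.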


Best regards,
Raisor

ASKER

Thanks a lot Raisor,

I've read the PDF document and it's VERY interesting for my purposes. I understand it is built on top of Apache Lucene. However, when accessing nutch.org I get redirected to the Apache Incubator page. I couldn't find any links to download or test it. The API Docs link is broken as well.

Any clue on where to download Nutch from?

Thanks a lot!

glopezz
Hi,

All downloads and information are available at http://sourceforge.net/projects/nutch


Best regards,
Raisor
Hi,

Sorry ... I just realized that they have removed all files at SourceForge ... I'll have another look somewhere else ... I'll let you know!


Best regards,
Raisor
Hi,

It seems that they are restructuring everything ... it also seems that it will get even better/bigger ...

Here are some links for further info:

http://www.vb-development.de/nutch/nutch_news.pdf
http://osuosl.org/news_folder/nutch

... yeah finally ;-)) the downloads: http://nutch.sourceforge.net/release/

Haven't tested yet but should work now!


Best regards,
Raisor
ASKER CERTIFIED SOLUTION
Ralf Klatt

ASKER

Hi Raisor, thanks for all the work in finding Nutch.

I just have a final question: the SourceForge site says Nutch is developed entirely in Java, with no mention of C#. I just wondered if you knew something about the C# version.

If not, I can maybe propose that the company use the Java version instead of C#.

Thanks!!!
glopezz