Apache Solr best performance - pull all data as stored or index only and pull data from DB

Hi All,

We are planning to utilize Apace Solr (sunspot) for our website built using ruby on rails. We already have Solr setup and working using the sunspot Solr gem for rails.

When we first implemented Solr our assumption was that all indexing and data would pull from Solr, but obviously that's not the case. Solr is intended to be an index server not a database. With that said, there is an option to store data in Solr so the data pulled will come from the Solr server and not the DB direct. I've read various articles that say its ok to store the data on Solr as well as index.

The main reasons we decided to use Solr were to improve search speed, take some load of the database and for the faceted search capability that Solr offers.

We are a bit confused as to what the best practice would be to achieve these goals, more specifically to gain performance.

Should we:

1. store all needed fields displayed in the search results in Solr?
2. Use a combination of both (some stored data from Solr and other data from the DB)?
3. Only use Solr for indexing and pull all data from the DB?

Some additional info. Our database of product for our web search has been denormalized. This DB is specifically for read only purposes on the website. Total rows a little over 380,000. We are on rails 4.2, Ruby 2.2. MySQL Version 5.5.41 so we are pretty current on the software side.
binovpdAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

gheistCommented:
SolR is for keeping indexes and returning references to data it finds, be it web links, rowids or whatever.
Readonly database is perfect for this purpose.
Current or not - what mysql tuning scripts tell about your database? It should be able to retrieve rows in no time in common scenarios.

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
binovpdAuthor Commented:
Hi gheist,

So are you saying its best practice to only use Solr for indexing and not to store data in Solr, but instead pull data from the DB? I wasnt clear on what you meant.
gheistCommented:
SOLR inevitably stores some data in its search index, it is not the best memory cache out ther to store all data.
binovpdAuthor Commented:
Thank you for the reply. After researching this more you can use SOLR as a datastore but it depends on the application. SOLR works best if you have a denormalized database from which you are pulling the data. This is the case with our scenario. We update a "ready only table" for our products with most information that would need to be retrieved on our web application. We use store on SOLR and its much faster via SOLR direct than going via ActiveRecord. hitting the DB and pulling the information (Ruby on rails).
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
MySQL Server

From novice to tech pro — start learning today.