Pre-caching of large databases

I often work with large datasets (over 100GB), and so caching is of course extremely important.

My question is about how MySQL (or other similar DBMSs) handle large data sets.  I assume most DBMSs use some caching mechanism such as LRU to reduce swapping pages from disk.  But I want to know if MySQL has any "pre-fetching" or "pre-caching" option for when a database file is initially opened.

A caching mechanism like LRU is effective when a lot of pages are already cached in memory.  But when you first open a file that isn't the case.  Only after a significant number of accesses would a lot of pages be cached in memory, so I would imagine that when you initially open a large database, each record lookup will almost always require a disk access (i.e. there will be cache-misses almost 100% of the time because the file was just opened so there are no pages in the cache.)

I was wondering how MySQL (or other similar DBMSs) handles this, if at all.  Is there an option to prefetch an even distribution of pages or something when the database is opened?
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

In mysqlsql this depends on the used storage-engine.

Might post that question in the innodb-forum


Here is a bit of discussion about this (PostgreSQL wise):

I would say this problem is a bit research-like.
Ok, pre-fetching or pre-caching but what?
If you have 100GB of data, how would the server at startup know which indexes, etc should load into memory? And even so, there is no more than 16 GB of memory available (or less). I agree that caching would make things work faster, but then you need to somehow know what to cache at startup.

A suggestion might be to remember all the select within the last 2-3 days
in an sql file and then run that when you start the db.
This way there will be some information in the cache and it might move
more quickly at the next queries.


Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
chsalviaAuthor Commented:

That's a good point.  How would the server know which data to prefetch?  I suppose one strategy would be to use an LRU (Least Recently Used) based cache, and just write all the row numbers in the LRU cache to disk every time the database closes.  Then, when the database is reopened, read in all the row numbers saved to disk, and cache each of those rows.  This would basically save all the most recently used row numbers to disk, so that when you reopen the database, all commonly accessed rows are already cached.

Apparently, some DBMSs attempt to do row pre-fetching, but others don't.  I don't know what sort of pre-fetching strategy they use, or if it is any way similar to what I suggested above.  In the discussion you linked to, it seems that postgreSQL doesn't, but Oracle does.  I don' t know if MySQL does.  Regardless, I suppose that a custom solution could be implemented on top of a database like postgreSQL, even if the DBMS itself doesn't support row pre-caching.  But, what do you think about the above strategy I mentioned?


What you suggests, of course can be done.
Take the MySQL or PostgreSQL sources and see if you can implement it :)
I know that PostgreSQL has a query planner based on a genetic algorithm, the more queries it gets the better it becomes. I think there is no such thing in MySQL yet.

It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today

From novice to tech pro — start learning today.