For a while now we have been experiencing sudden PLE (page life expectancy) drops on our production servers. Usually the value is normal, over 4K, but several times a day it suddenly drops and stays low for minutes until it grows back again. When these drops happen they are accompanied by spikes in wait time over the same period and, obviously, by lag experienced by users. The wait time is caused by high I/O activity due to storage-to-memory transfers.
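For anyone who wants to compare numbers, the PLE value can be sampled with a query along these lines. This is only a minimal sketch against the standard `sys.dm_os_performance_counters` DMV, not necessarily what our monitoring tool runs; the column aliases are made up for illustration.

```sql
-- Sketch: read the current Page Life Expectancy (in seconds).
-- Buffer Manager rows give the instance-wide value;
-- Buffer Node rows give one value per NUMA node.
SELECT
    RTRIM([object_name]) AS counter_object,  -- illustrative alias
    RTRIM(instance_name) AS numa_node,       -- empty for Buffer Manager
    cntr_value           AS ple_seconds,
    SYSDATETIME()        AS sampled_at
FROM sys.dm_os_performance_counters
WHERE counter_name LIKE 'Page life expectancy%'
  AND ([object_name] LIKE '%Buffer Manager%'
       OR [object_name] LIKE '%Buffer Node%');
```

Running it on a schedule (for example every minute from an Agent job) and storing the rows would show whether a drop hits all NUMA nodes at once or only one of them.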
We can rule out missing indexes, index fragmentation, and poor queries; we know how to deal with those problems. I understand that heavy usage by our users can cause this at times, but the drops don't seem to follow that pattern. They can happen during regular usage, during the lunch period when normally fewer users are active, or even at night.
One thing we found on the net is that this may be a known SQL Server 2012 SP1 issue, which is supposed to be fixed by SP1 CU4:
except that in our case it happens multiple times a day. We have already scheduled the upgrade to SP2, but I thought I should ask here as well in case someone can shed some light on this matter.
Some pictures here:
Wait time spike at 10-minute interval (sometimes the growth can be dramatic, to over 20K):