We have a 2008 R2 SQL Server with a number of minor databases on them. One database, called ProdData had a situation last night that I struggle to find the root cause (or even a theory).
On our production floor, we have about 120 stations running a VB App which communicates with the SQL Database to update production, downtime, etc.
Within the ProdData database, there is a table called downtime, which simply writes a single record everytime downtime is recorded on one of the production stations. It records the ID, Down Start, Down End, StationName, and the duration. There are probably no more than 250-300 appends to this table every 8 hours.
Around 10PM last night during a shift change, the stations were filing their production numbers. To do the final filing, they have to access 4 separate tables to assemble the data needed to file into our main system.
What we noticed was, when the stations attempted to collect the data from the downtime table, they were timing out and shutting down, thus not completing the filing process. When we linked into the network to troubleshoot, we found that we could not access the downtime table. All other tables responded as they usually do. The Downtime table is actually the smallest table, at only 190,000 records. We purged 70,000 records, and there was no immediate improvement.
5 hours later, everything is back to normal. Like nothing was ever wrong.
- All stations could continue to write to the downtime table without issue all night. We could not find a noticable pattern to suggest which station may have triggered the issue, or even what time the issue may have started.
- No stations could query the table in a reasonable amount of time, thus the timeouts.
- When we attempted to access the table, a select * would take 9 minutes
- On neither the workstations, nor the server did we observe:
- Any event log items both Windows or SQL
- No CPU, Memory or Network abnormalities. No spikes at all.
- Now reviewing the data in the downtime table, we can find no data abnormalities in the last 2 weeks of data. No special characters, missing fields, incomplete dates, nothing unexpected at all.
We have already made code changes to the VB App to accomodate a timeout and just omit downtime information if necessary, however the root cause of the issue is a complete and unexplainable mystery here.
So, anyone have any ideas?
It is the worst kind of problem to me that has no explaination or theory to fall back on to fix.