Link to home
Start Free TrialLog in
Avatar of Steve Marshall
Steve MarshallFlag for United Kingdom of Great Britain and Northern Ireland

asked on

SEHException (Randomly) On Connect To Access Database

Just a question to address a problem that has come up in a piece of software that has been live for 4.5 years without this issue, and the code of which has been unchanged since 10/07/2020.

All of a sudden we have started getting numerous SEHException (System.Runtime.InteropServices.SEHException) errors from various DataAdapters when attempting to retrieve data from Microsoft Access databases. The error code is always "(0x80004005): External component has thrown an exception at System.Data.OleDb.DataSourceWrapper.InitializeAndCreateSession".

As far as I can work out, the 0x80004005 equates to Microsoft's really helpful "Unspecified Error" which does not help a whole lot. We have started getting this error on various Databases, with various DataAdapters, against various Tables, with varying actions against those Tables. The error is not EVERY time, but is random throughout the day, and will appear in batches lasting a couple of minutes when we will get a burst of about 20 errors,

We have investigated environmental changes, and we cannot spot any change to the environment that might be causing this issue. No change to Antivirus, no change to the software (as mentioned), no changes to the databases. The only thing we did spot were a couple of .NET Framework updates a week or so ago, but searching the internet has shown to increase in people complaining about errors of this type after those updates

As I said, the software has been stable for 4.5 years, and the current version of the software was released on 10/07/2020, so has been running for 2 months unchanged with no issues whatsoever.

Does anyone have any ideas of what might be causing this issue and where it might have come from?

I know its not a lot to go on, but its all I have and I have run out of things to check. I am sure it must be something "environmental" as the software and its components are unchanged. The only other things that I can think of is a change to Microsoft Access itself (we are using Office 365, so updates supposedly go in quarterly supposedly - but last time Microsoft kindly broke Access - with the broken UPDATE query issue - that went into our server off schedule), or a change to the OleDb Driver.

I am stumped!

Steve
Avatar of AndyAinscow
AndyAinscow
Flag of Switzerland image

At a guess an update from Microsoft has added an extra feature (microsoft speak, normal speak - bug) to the consumer PC's.
How do you access your data?
I'm curios about the external component error, never had this one.
Avatar of Steve Marshall

ASKER

AndyAinscow: That was my first instinct, due to the fact that my software is unchanged. I appreciate that it would be out of the ordinary for Microsoft to ship untested software with an unexpected new feature ... ;-)

Ste5an: The surprising thing is nothing flash or complicated at all. No hand-rolled cleverness, just a plain old DataTable and DataAdapter with a Fill Method, defined in a DataSet from within Visual Studio using the designer in the IDE. Even the fill is nothing special being a simple select of a number of fields from a single table. In fact, the full Stack from the error message contains a rererence to only one line from my application, and that is the line within my data-service that calls the Fill method on the DataAdapter.

The error is always the same 0x80004005, and is always in method InitializeAndCreateSession. It seems to me like the DataAdapter is failing to even connect to the database - so certainly not anywhere in my code but in the automagically generated code somewhere. The connection strings have not changed, and we can find no trace to any update to the Database Engine drivers for Access having changed either (not to say that they haven't though knowing Microsoft's knack of sneaking things in as part of something else. We KNOW that a number of updates to the Frameworks have been installed, and we believe that an update to Office 365 (v2016) has been installed (as the appearance of the Ribbon in the Office Apps has changed slightly), though we can find no clear entry in the updates log that this has been done.

What concerns me more is that there is no big "kick off" on the internet, which there normally is when Microsoft release one of their new undocumented (unwanted) features - but this time we seem to be the only ones having an issue.

I just want to be sure that there is nothing obvious that we have missed before I start to try to code round the issue by rolling my own ADO classes to deal with the databases. Just really odd that a piece of software that has been 100% stable for 4.5 years, and that is completely unchanged at all in 2.5 months suddenly goes haywire like this.

Steve

Without looking into the .NET code, InitializeAndCreateSession sounds like COM+ interfacing code.
This would indicate a problem with the installation and the registry. On the otherhand, then you would have permanent issues.

So, I would test on a separate system to rule out hardware errors on the machine.
Below is an example of the details from a trapped exception, just to see if it displays anything to you that I cannot see. To me it is basically saying "cannot connect to Access Database", and nothing more. The only two lines that chance message to message are the last two lines, depending on which of my Data Services is called and what it is retrieving:

System.Runtime.InteropServices.SEHException (0x80004005): External component has thrown an exception. at System.Data.OleDb.DataSourceWrapper.InitializeAndCreateSession(OleDbConnectionString constr, SessionWrapper& sessionWrapper) at System.Data.OleDb.OleDbConnectionInternal..ctor(OleDbConnectionString constr, OleDbConnection connection) at System.Data.OleDb.OleDbConnectionFactory.CreateConnection(DbConnectionOptions options, DbConnectionPoolKey poolKey, Object poolGroupProviderInfo, DbConnectionPool pool, DbConnection owningObject) at System.Data.ProviderBase.DbConnectionFactory.CreateConnection(DbConnectionOptions options, DbConnectionPoolKey poolKey, Object poolGroupProviderInfo, DbConnectionPool pool, DbConnection owningConnection, DbConnectionOptions userOptions) at System.Data.ProviderBase.DbConnectionFactory.CreateNonPooledConnection(DbConnection owningConnection, DbConnectionPoolGroup poolGroup, DbConnectionOptions userOptions) at System.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection) at System.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions) at System.Data.ProviderBase.DbConnectionClosed.TryOpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions) at System.Data.ProviderBase.DbConnectionInternal.OpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory) at System.Data.OleDb.OleDbConnection.Open() at System.Data.Common.DbDataAdapter.FillInternal(DataSet dataset, DataTable[] datatables, Int32 startRecord, Int32 maxRecords, String srcTable, IDbCommand command, CommandBehavior behavior) at System.Data.Common.DbDataAdapter.Fill(DataTable[] dataTables, Int32 startRecord, Int32 maxRecords, IDbCommand command, CommandBehavior behavior) at System.Data.Common.DbDataAdapter.Fill(DataTable dataTable) at GPDataCapture.ManufacturingJobDataSetTableAdapters.PlannedJobAdapter.FillByMachineCodeAndMachineJobId(PlannedJobDataTable dataTable, String MachineCode, Int32 MachineJobId) at GPDataCapture.MachineJobService.ReadByMachineJobId(Int32 machineJobId)
>>The only thing we did spot were a couple of .NET Framework updates a week or so ago

Could you de-install those and see  if the problem continues or goes away ?
Is a network router starting to fail ?
After some research:

- What bitness has your build? Is the correct version ACE installed?
- What Office version is it? Use the correct version in the connection string. E.g. Provider=Microsoft.ACE.OLEDB.16.0; for Office 16.
I can try to convince the SysAdmin to try the removal of the updates, but the big problem we have is that we are a Production Facility with virtually no down time unless we schedule in a weekend - so we will not get a time slot for a couple of weeks as they plan a few weeks in advance. Not an ideal IT environment really, but it is what it is.

As to the Network Router failing, we did in fact have a 3Com unit fail yesterday. Luckily we were already well through a process of upgrading the network to a 10Gb Ubiquiti infrastructure, so it was a simple 5 minute swap out and bring back on the Ubiquity replacement unit. There was a partial outage for about 10 minutes, which I think was a decent response! Now that about 95% of our network is based on Ubiquiti we have a much better ability to interrogate and monitor the network traffic which should help. We have already identified that the on-site CCTV was causing bottlenecks so additional units with fibre links have been installed to take that traffic away.

The crazy thing, and this just baffles me, is yesterday building up to the failure of the 3Com unit, we did not have a single failure all day. The first failure was at 22:11, several hours after the unit had been replaced. Only two errors at that time though, and then nothing throughout the rest of the night-shift until we got 6 errors in a tight batch at 08:10 this morning and then silence since. Hence my description being "random" errors. It just defies explanation. We are currently interrogating all of the logs to see if anything was happening elsewhere at those times.

After all of the work we have done on the network, it does seem that the errors are fewer and less frequent, but I don't know if that is just wishful thinking and me hoping that all the work we are putting in is making a difference!

I will keep monitoring the situation and when the new units have had time to bed in I will post again. We do have some work going in on the software in parallel to try to ease the situation (for that phrase read "code round"!!). We are stopping short of "rolling our own" DAL, but if we have to go that route to wrestle back more control then that is where we will go next.

Steve

Ste5an:

Thanks for the reply. We have the 32bit version of Office 365 (2016) installed on a 64bit server. This was on advice direct from Microsoft themselves when we initially went to 365 - I seem to remember the phrase being "the 64bit version does not work very well".

As to the installed Database Engine, well that is ACE.OLEDB.12.0, and is being used to connect to Access 2016 databases. We had spotted this inconsistency and it was raised with the SysAdmin, but the reply was that the same driver was installed prior to the issues starting, so he does not believe that is the source of the issues. He has concerns that installing the ACE.OLEDB.16.0 drivers may "cause issues elsewhere" and that currently "the issues are in one very specific area". He basically does not want to risk the issue base widening, which I sort of understand but it does not help us at all.

We are currently trying to find a definitive statement that you can (or cannot) have multiple ACE.OLEDB.x.0 drivers installed side by side without causing conflicts to try and change his point of view. This would mean that we could install it and then swap the offending application to using the 16.0 driver to see is the issues go away. Currently we have found nothing that is a "clear enough" statement that will change the SysAdmin's stance.

Steve
Well, this discovery (which should have been spotted pretty much immediately by the person searching!!) seems to seal our fate with the SysAdmin unfortunately:

https://www.microsoft.com/en-us/download/details.aspx?id=54920

The note under the "Install Instructions" sort of cuts us off at the knees:

Note: Before installing this download, you must first remove any previous versions of the Access Database Engine from your computer by using Add/Remove Programs in the Control Panel.

Therefore, ACE.OLEDB.12.0 must be removed before installing ACE.OLEDB.16.0, so hardly "no impact on previous versions". I have changed the search now to try to establish if the ACE.OLEDB.16.0 is completely backwards compatible so that we can achieve the "no impact" via that route instead.

Steve.
If you have installed Office, then you don't need to install ACE separately.
Thanks Ste5an. I will get a check done to see what version of the Database Engine is actually on the server then.

Steve

Okay - that did not take long to come back. What is actually installed on the server is:

Microsoft Access Database Engine 2010 (English)
Installed On: 19/09/2020
Version: 14.0.7015.1000

The install date is interesting as that shows it has been recently updated, which none of the logs show!! However, the update was AFTER the issues started, so it has clearly come in as part of something else.

Not sure what the version number means or what version(s) of ACE.OLEDB and therefore which connection strings will be supported. A bit more digging needed.

As we have Office 365 (2016) installed and with your comment I was expecting the Database Engine 2016 to be there, but that would be far too straight-forward for Microsoft.

I am going to bash up a quick test that allows me to specify the ACE.OLEDB version and build the connection string with that to see if I can get the connections updated to ACE.OLEDB.16.0 and they still work so that we can eliminate this wrinkle from the situation.

Steve
I strongly recommend that you and your SysAdmin (IT department) setup a test server using only Office 2016 and test the application to identify the reason and possible solution.
Literally just in the process of that as we speak!

Steve

Ste5an,

We just built a quick test app using a connection with ACE.OLEDB.16.0 to the most heavily loaded database table, and coded a tight loop retrieving records and logging. No issues. So, although that does not prove a fix for the SEHException, it does prove that we can connect using the 16.0 driver which is the one we should be using for Access 2016.

We are going to build a new version of our app using the 16.0 driver throughout and release a beta version to a test base of 2 out of 11 of our Production machines (the most heavily used, and one light use instance).

This will remove one more wrinkle from the environment. If the occasional errors from the two machines running the beta (both of which have reported errors already today) fall silent, then we will publish to all machines.

One of the developers has reported back saying that some of the Adapters in the application are set to "Optimistic" and others to "Pessimistic" and he is sure that all started life as "Pessimistic". I am not convinced that this has anything to do with the issue as it is the initial connection to the database that is failing and not the underlying queries. It is, though, an inconsistency that will need to be looked at.

I will report back. Thanks for the help and pointers so far.

Steve


You're welcome.

Just an additional thought:

As Andy pointed out to look into networking, when you use network access, then using the newer version can cover some network issue scenarios as MS added some resilience features on that behalf in the newer versions.

Thus your IT department and maybe you should consider monitoring the network in greater depth for some time.
Thanks Ste5an. I did reply to Andy in an earlier post saying that we did have a 3Com unit fail and that we had been through an (almost complete) exercise of replacing all hubs, routers, and the main back-plane of the network with Ubiquiti units. This allows us to monitor our entire network at a much finer level. We have already identified that the CCTV was causing problems and we installed additional units in order to separate that off and we have removed that issue now. We have identified a couple of other issues which we are also working to resolve.

We have every intention of monitoring the network closely, and in fact have been as the Ubiquiti infrastructure was rolled out. We have the software configured to collect statistics for a rolling 7 day period down to the individual port level, and it is this that has allowed us to identify the issues so far.

That said, the SEHException issue is still around (even though it "feels" like it is at a lesser level, that may just be wishful thinking as it was random at best anyway) even after all that so we are still looking.

Steve
Also, we have built and released a Beta Version of our software on ACE.OLEDB.16.0 and that is released to 2 of 11 of our Production Machines.

Steve

You mentioned a batch of failures all about the same time 08:10 and a couple at 22:11.
What times do your staff start/finish work where the problems arise?  Any relation to those times you noticed?  (I don't see why but could people logging on / not logging off have anything to do with it).
Andy - we have been monitoring the logs to try and see if anything correlates. Nothing so far.

The times do not match to shift start/end times, nor to office staff arriving/departing. We seem to get the bursts at various times - just little burst with a varying amount of reports, and then it just falls silent again. Generally the burst will last no more that a couple of minutes.

We did initially try to match up with people going into meetings in Microsoft Teams because that is one change that has recently come in due to the need for social distancing etc, so the amount of audio and video flying round our network has shot up from nothing to loads (all meetings were F2F before), but we could not even get that to tally up.

I am convinced it is something environmental somewhere - we just have to find it. We have found a couple of issues as I have said in earlier posts, and we have addressed them. We have literally just identified an old D-Link External CCTV camera which seems to generate more data on the network than others (thanks to the port monitoring that we now have from the Ubiquiti software), so that will get checked out and swapped out for a new Ubiquiti one if necessary.

Steve
With the clustering you experience I'm guessing network traffic/issue is the cause.  If I get any further ideas I'll post them for you to mull over them.
A long time ago one used to say, for Access, up to 5 simultaneous users was no problem, between 5 and 10 was in the critical area and more than 10 users simultaneously would overload Access (jet engine).  I know that Access went away from the Jet Engine some time ago but I believe the scaling problems still exist.  After all SQL Server is available both as a full blown paid for version and a light version for free for intermediate size organisations.

Could you have a problem with too many users?  Has the loading on Access increased in recent times?
Your use-case, as far as I understand it, has another limiting factor of higher precedence:

DR and business continuity. Backup and restore using Access database files is problematic, as such a file can go corrupt beyond repair more easily than a RDBMS solution using SQL Server or comparable products. E.g. it is prone to network errors when you need to write to the file.
Here RDBMS server based solutions or more resilient as they work differently. Also they offer a broader range of tools for backup and restore.

When Access can handle your current workload, then SQL Server Express can do this also.

And for your numbers: They only apply for old-school Access solutions using a single shared file (no split architecture).
Andy,

In reply to your first point: I agree 100%, which is why we have been monitoring the network traffic at the port level. We have identified and removed issues as they are spotted, and this may be why the issue feels like it is easing a little.

In reply to your second point: In fact the reverse is true. With the situation the way it is the number of users has in fact reduced. We are a relatively small company and do not even have enough employees to get near the "theoretical" limit of 255 connections to an Access database. I say "theoretical" as this is the documented limit, but after years of working with Access I do not believe a word of that claim!

Ste5an,

I fully understand your argument, and have been arguing the case for a migration to SQL Server for quite a while now. I have managed it in certain areas and we are having no issues with data that is in SQL. Unfortunately the company I work for is a bit like an oil tanker trying to turn. Initially it is very resistant to changing direction, but when it decides to go it wants to go immediately.

I believe that I have "proven" my arguments for migration away from Access completely (we have both data in Access Databases and also Access based applications) in that all the issues we ever get are either accessing data in Access Databases, or with the Access Applications breaking due to some feature released by Microsoft. A while ago Microsoft released an "update" to Access such that it completely broke UPDATE queries - they just did not work at all. We had a day's outage on the Access applications whist we updated every "UPDATE" statement to a Recordset creation and then using the .Edit ... .Update to make changes. The .NET applications and SQL Server data did not miss a beat. Even this fact that the company was almost crippled for a day has not made the management think "Hmmm, maybe there is a problem with Access".

I think their attitude is changing as these kinds of problem are getting more frequent as Microsoft take less and less care over an application that they are soon to wash their hands of.

I am not sure what I more I can do to convince them that a migration to .NET and SQL Server is what needs to happen, and sooner rather than later. It is a bit like banging my head against a brick wall at times, but it is what it is and I have to deal with it. I think our team is just "too good" at keeping the old decrepit systems working that it creates the illusion that all is well when it isn't. The powers that be seem to quickly forget the times they get burned for days on end, probably because it is the IT team that absorbs most of the stress/pressure/strain etc.

I hear you loud and clear, and I agree 100%, and have been arguing for this for the longest time.

Steve

BTW, I don't want to make the call too early, but the two machines that have been migrated to the new Beta (using the ACE.OLEDB.16.0 driver) have fallen silent, the other 9 machines have not. Things are looking promising.

Just a quick update. A lot quieter on the SEHException front, but it has not gone away. We are continuing to interrogate the network for issues. The occurrence of the errors remains sporadic, in batches, and for very short periods of time. They also now seem to be in very specific parts of the application rather than from all over the place. They are still always in InitializeAndCreateSession, and still the 0x80004005 - which is of no specific use.

We are also looking at the specific areas to see if there is anything that can be done, which is always a challenge when your software has been stable and functioning for 4.5 years and you are 99% sure that something external to your area of control has changed to cause the problem.

One thing that has crossed my mind is that maybe due to the way that DataAdapters work (and the way that we are using them), coupled with the was that the GC operates we are hitting the Access connection limit? But then I ask myself why that would have suddenly changed - the way the application works with the databases has not changed.

We always get the required adapter with a "Using dtoAdapter As <T>" where <T> is the DataSet/Adapter in the application. In the Using ... End Using we set the connection string, and then call the appropriate Fill method to retrieve the data into a DataTable. We do not explicitly close the connection in the Adapter, nor do we call .Dispose on it as the Using deals with this. My thought is that maybe the GC is not collecting the finished Adapters in a timely fashion and so the connections are remaining and not "going away" even though the application has finished with them, and maybe because of this we are hitting the limit and so connections are being rejected.

If this is the case I see two possible solutions:

  1. Explicitly call the .Connection.Close() just prior to the "End Using" to force the connection to be closed and (hopefully) release correctly. Still leaves us open to the GC, but hopefully as the connection has been explicitly closed it might help.
  2. Maintain a single connection to the database within the application and explicitly use that for each and every access to the database (set the .Connection property directly) rather that allowing the DataAdapter to create its own connection each time, and so greatly reduce the connections made.

With the first I am not sure that will necessarily help, and with the second I am not sure if trying to do everything with a single connection is opening us up to nastier issues.

Steve
 
If I understand you correctly, then you're not using using for IDisposable classes in all places. This would be a code flaw and should/must be corrected. E.g.
"Using dtoAdapter As <T>" where <T> is the DataSet/Adapter in the application. In the Using ... End Using we set the connection string,
This sounds like the using for the connection is missing.

Using a central factory method or service provider class for the connection makes sense for multiple reasons.
Ste5an:

The connection is a property of the created DataAdapter, and is opened automatically by the DataAdapter when the .Fill is called. The entire object is then (according to documentation) automatically disposed when the "End Using" is hit. An edited example from our code would be:

Using dtoTable As New CrewDataSet.CrewDataTable
    Using dtoAdapter As New CrewDataSetTableAdapters.CrewAdapter
        With dtoAdapter
            .Connection.ConnectionString = My.Configuration.CoreDBConnectionString
            .Fill(dtoTable)
        End With
    End Using

    If (dtoTable.Rows.Count = 0) Then
        Return New List(Of Crew)
    End If

    Return LoadListFromTable(dtoTable)
End Using

Open in new window


All of the above is wrapped in a Try Catch block that catches the SEHException or OleDbException and deals with them with logging, retries, caching etc etc.

Have we misunderstood the documentation/examples and we need to explicitly manage the Connection as well. I don't remember ever seeing such a thing in an example. Are you suggesting something along the following lines to explicitly close and dispose the connection?

Using dtoTable As New CrewDataSet.CrewDataTable
    Using dtoAdapter As New CrewDataSetTableAdapters.CrewAdapter
        With dtoAdapter
            .Connection.ConnectionString = My.Configuration.CoreDBConnectionString
            .Fill(dtoTable)
            .Connection.Close()
            .Connection.Dispose()
        End With
    End Using

    If (dtoTable.Rows.Count = 0) Then
        Return New List(Of Crew)
    End If

    Return LoadListFromTable(dtoTable)
End Using

Open in new window


Thanks,
Steve
An update. We had a 36 (well, 35 hours and 41 minutes!!) of silence from all machines w/ regard to the SEHExceptions - not a one. Then this morning at 07:50 a two minute burst of 9 messages and then at 08:46 a two minute burst of only two messages.

It has to be something environmental, we just have to try and find it now but the system is telling us nothing.

I have updated the software to explicitly close the connections associated with a DataAdapter now, but held off releasing it as everything fell silent Thursday afternoon (case of don't poke a sleeping beast!!).

This is really starting to do my crust in now!! I will release the updated version of the software later this morning to see if that makes a blind bit of difference, but I will not be holding my breath ... it is just totally random (and infuriating).

Steve
Gut feeling: Some updates or network peaks. These times sound like employees booting their machines.


btw, for switching to SQL Server: The time needed to investigate this problem would be already the budget worth to migrate.
Ste5an,

I agree on the updates and network peaks. Also, you know (and I know) that the migration to SQL-Server is well overdue and more than makes sense and is more than justified ... but try as I might to convince those above me of these facts they are not listening. I don't know why they have their heads in the sand, but they do.

I may just walk away if they continue not to listen to reason ... go work somewhere that actually want to use my skills and will take notice of me.

Steve
I guessed earlier it is a network problem and still feel that waay with all the info you supply.  However if it is a network problem then is updating to SQL Server going to make a difference?
Andy,

All the time we have been having these problems, the same piece of software has not reported a single issue when accessing data from SQL-Server (we have data in both). I have been arguing for migration to SQL for several years, but it generally falls on deaf ears. I have migrated some data into SQL and all new tables go into SQL. It's just convincing them to allow us to migrate the remainder into SQL. I think they are concerned due to the changes that need to be made to the older Access applications (add dbSeeChanges to recordset creations etc etc). None of the work is what a developer would call "significant", but it seems to scare them for some reason.

Anyhow, I have the updated version live now. I will see if that does anything to help, but I will keep pushing for a migration because now that they have had the pain of this problem for a few weeks they might be more inclined to listen. Quite why they have an "IT Team" when they don't listen to that team when it comes it IT issues is beyond me, but I am sure we are not the only company that is like that!

Steve

BTW - when it comes to it, how do I close and give credit to both you and Ste5an as I think you have both had inputs that I have implemented to move us forward with this.

I guessed earlier it is a network problem and still feel that waay with all the info you supply.  However if it is a network problem then is updating to SQL Server going to make a difference?
The SQL Server drivers are more resilient to network errors then Access file access is.
Thanks for clarifying Ste5an.
We have identified another issue. We found one PC that was swamping the network regularly and was the second highest I/O on the entire server. This made no sense so we investigated.

What we found was an "amateur" piece of programming which was an attempt by a user to get some "extra" data they wanted. Due to the lack of knowledge they were extracting ~20 years worth of data from several tables. To make things worse they were refreshing every 60 seconds and it was taking 30 seconds to refresh the data due to volume.

We have taken the app, refined the queries to extract only data that is relevant to the user and reduced the refresh rate right down to 10 minutes as the data does not need to be that accurate. The application now loads in the blink of an eye.

The user was someone who had extended privileges due to his role/position, but we have now reviewed and adjusted his access. Very much a case of "a little knowledge is (very) dangerous". Yet another argument to move to SQL Server!!

I am not saying that this was the cause of the issues, but it certainly was not going to help.

I really have a dislike for these tools that allow users to effectively "draw a picture of what they want" and it auto-generates queries. I consider them very dangerous if the person does not understand the data - which in the most part they don't.

Steve
At least it was only reading.  I've had users trying to modify entries in tables directly in the past.  Not good.  (I've also experienced someone not doing backups as recommended - found out when the disc crashed.)
Andy,

At least we don't have the backup issue - we run Terminal Services so everything is on the server. The users do not have the ability to do or save anything directly on the PC itself. They are effectively dumb terminals. Users, however, always find ever inventive ways of making the lives of IT hell!! You gotta love 'em!

I wonder if Access is stupid enough to lock when reading? Maybe out problem is locking preventing the connection, or something else related to that app. Maybe that was the cause of the issues ...

Slowly, slowly, things are getting better ...

Steve

>>I wonder if Access is stupid enough to lock when reading?

Not certain if this is still correct.  SQL will lock records for writing on a record level.  Access locks on a 'page' level - over multiple records.  Also there is pessimistic vs optimistic locking.  So it could be the case that a record isn't available for reading even if that specific record isn't being written to in access.
Andy,

Yes, the page locking by Access was at the back of my mind when I typed that, but as you say not sure if it is still the case that Access is that stupid.

Anyhow - we have already had our first 2 minute burst this morning, so that was not the root cause. The search continues!

So far we have:

  • Upgraded hubs and routers
  • Partitioned the network
  • Moved CCTV
  • Upgraded cameras
  • Reworked program logic
  • fixed some amateur programming

As Ste5an has said, the cost so far more than justifies a migration to SQL Server!!

Steve



What I find REALLY odd is that the bursts, though at seemingly random times, are always for about two minutes. I have just analysed the error logs and more accurately the bursts last approximately 75 seconds (based on the timestamp of the first report to the timestamp of the last report in the batch) give or take a second or two.

This cannot be coincidence, it is far too regular.

Steve
Do you have a virus installed somewhere?

I would highly doubt it. We have AV software installed, the definitions are always up to date, and we have real-time protection active also. We are also on a protected network with a single breakout point to the internet which is protected by a hardware firewall. We have never had a sniff of anything getting through. We also have the full Barracuda email protection in place to prevent anything getting in that way.

Steve

Okay, latest update. We had almost 5 days of silence, and then the flood gates opened. It appeared that we had 183 message reports yesterday. I say "appears" because that is the number of messages that were received. However, when we look into the logs, only 71 messages (still not good!!) were raised by the system and sent. Looking back over the received emails it would appear that the Office 365 mail delivery system is making the situation appear worse than it is by duplicating messages. Our system is generating a single email, and sometimes it is being delivered multiple times, sometimes rapidly, sometimes minutes apart. We know it is the same message as the timestamp in the message is identical - it is just being delivered multiple times for some reason. That is another issue that is for someone else to solve. Microsoft are doing their best to make this difficult to solve.

However, it would seem that there ARE other reports of this issue out there - I was just not finding them before. It seems that Microsoft have, on more than one occasion, released a patch for Windows, .NET Framework, or Office that has broken the Access Database Engine which is the OleDb Driver used by .Net to get to Access.

Now, it seems to be the accepted solution is generally when you get this problem to reinstall the Access Database Engine again to correct the issue created with the driver. I remember Ste5an mentioning something about this before.

The question I have is that I can only see the Microsoft Access Database Engine 2010 (English) - version 14.0.7015.1000 installed on our system, but at Ste5an's suggestion I am no longer using this version I am using 2016. I cannot, however, see any reference to the 2016 version being installed. Ste5an did say that if Office 2016 was installed then the 2016 OleDb Driver would be installed, and he was right on that point as I am using it!

Is there a way I can update the Driver as suggested in other posts ? Can I download and install the Access Database Engine 2016 and install? Are there likely to be any issues caused ? Is it possible that there are conflicts in having the two versions installed together ?

Steve

Here's the URL for 2016:

Microsoft Access Database Engine 2016 Redistributable

Just  download the correct version (32/64 bit). Installing may require running it from the command line using the /quiet (EDIT: not quite) parameter.

Are there likely to be any issues caused ?
I have not tested it, which is basically already the answer: test it on a test machine (VM). Cause Office installations can be special.

it possible that there are conflicts in having the two versions installed together ?
Have not tested it. As it's the engine and a separate driver probably. Considering production strategy by MS probably not. When there is a reason for having both, then test it before. When you don't need both do an uninstall of the old version first.
Thanks Ste5an.
One more area for me to investigate. I have been looking in detail at the errors from Monday as there was a large batch from which to draw conclusions. I would estimate that about 95%+ of the errors are coming from tasks that are being carried out in a background thread (BackgroundWorker) which would explain why the Production Staff are not running around complaining like crazy that the software is not working. Of that 95% I would estimate that about 75% of those are when the system is re-querying the Manufacturing Planning data to determine the "new" planning order of the jobs on the Machine. The system then compares "current" and "new" lists to indicate to the machine operators where the planning has changed.

Now, as this is in a BackgroundWorker, are you aware of anything that I should be aware of regards using and accessing an Access database? I am aware of cross-threaded accesses to UI Controls and I am not doing that, the refresh and display of any changes takes place on the main UI thread. The worker creates its own database connection and its own objects for its data. I am certainly not aware of any issues working with Access in a background thread, but this is a common theme through the messages that I have now only had the time to draw out from the messages.

Just a thought ... probably another blind alley of investigation, but I am checking anything now!

Steve

The most obvious thing I can think of is records being locked when written to (two threads writing simultaneously), the second is optimistic vs pessimistic locking with a thread starting an edit then blocking the record from being read for a 'long' time.
Andy,

I have error handling and retries w/ pauses to address this, and then if all retries fail it notifies us by email. I also have error caching - the system will send a burst of emails to "wake us up" to the issue, and then after a defined number of emails (10) or a specified period of time (15 mins) it will enter summary mode where after a defined period of time (every 45 mins) it will send us just a summary of the error and how many time it has occurred since first instance. The summary mode is to stop email swamping. If an entry in the cache falls silent (no reports for 2 hours) it will be removed and a final summary sent. The errors are unique down to the Exception Type, Module, Function, and Line Number.

The problem wit the SEHException is that the error is always on InitializeAndCreateSession when connecting - it does not even get connected to the Access Database, so it cannot be a locked record as it is not getting that far. If it were a locked record it would be reporting that as the issue.

Steve

Thanks.  You're more closely involved than I am, I'd forgotten the specifics of the problem.


One vague idea.  Access uses a 'locking' file (.ldb or .lccdb) when a connection exists and, in theory, removes it when the last connection is closed.  Could the creation / modification / deletion of this separate file be involved with your problem?
Andy,

I am not surprised you had forgotten ... it has been dragging on for a while. It does not help that the reported message is so generic "External Component has thrown an Exception".

Regards you suggestion, nothing would surprise me with Access anymore. It really is a piece of software that I wish businesses did not use as a cheap alternative to a proper solution. In my book it is cheap for a reason ...

One plus is that the longer this drags on, the more I seem to be being listened to with respect to a migration to SQL Server!!

Steve

[..] tasks that are being carried out in a background thread (BackgroundWorker) [..]
How clean is the involved code? No warnings, every IDisposable is used with a using clause, no dangling objects in GC, error propagation?

Especially error propagation is nasty, cause one single error at on place can lead to subsequent errors.
Thus certain background tasks are better encapsulated into its own process as clean-up is then much simpler. When possible I prefer console applications, which are invoked by task scheduler.
Ste5an, you ask how clean is the code?

Well, I guess it is as clean as two developers and a static code analysis tool implementing Microsoft Guidelines and StyleCop can make it. As far as we are aware (and the tool is aware), and the code has all been re-reviewed and checked during these problems, the code is clean. All items are either within a Using clause or explicitly released, including all connections, tables, queries, and adapters. We have been "tinkering around the edges" changing the way things are released, but in essence we have changed nothing. We have updated hardware, and updated drivers.

Regards error propagation, we never re-throw Exceptions, and we always handle and do not suppress (swallow) exceptions. Any exception is either handled and cleaned, or is reported after retries.

The only possible "dirtiness" that could be considered present by a purist is the fact that after the error the application continues, but each "action" is in its own function, so start/error/recovery/complete is self contained.

Again, from my initial post, this code has been working this way without any issue whatsoever for 4.5 years, and the software was completely unchanged for 6 weeks prior to the errors starting. The errors did start after Windows and Office updates were forced into our Server by the lovely Microsoft, and I still maintain that it is these that have changed/broken something that we are yet to find.

The prior stability of the software makes it hard, therefore, for me to accept that the issue is being caused by the way the code is doing anything, though I am still looking and open to suggestions. I will revisit the code again, and check for the items you raise just in case I have so far been blind to something.

There is clearly an issue that has come up since Microsoft released one of their unwanted features, but as far as we are aware the software is trapping and handling the "unrecoverable" SEHException in the best way it can by reporting and releasing all objects.

I will review it all as you suggest.

Steve
Ste5an,

I forgot to mention. We are currently building services (exactly as you say) to take some of the work away from the application. I still can't help bit think that this is just masking the fact that there is an issue in the environment elsewhere, but we are doing it anyway. Some tasks though must remain in the application as they occur based on the current running context, which a separate service would not have.

I am also pushing for the purchase of RedGate's ".Net Developer Bundle" that includes the ANTS Performance Profiler and the ANTS Memory Profiler, which will allow more in depth checking of the software for memory leaks which might show up something. This has to go for approval, so I will not be holding my breath.

Steve
Just a quick update. We have slowed down the refresh rate on an old Access based application, and the issue has seemingly gone away .... but it has gone away for a few days before. Our suspicions are now falling directly on that application because tracking back the issues started back in first lockdown when a couple of users were given direct access to it as they could not see the status screen that it feeds on site as they were at home. This never registered as a cause with me as it was not me that granted the access, and when asked what had changed in the environment this was never even mentioned.

We are now extending other applications to populate a table specific to this application to provide it with the data it needs in a way that is consistent and convenient with the applications generating that data, rather than it acting bully-boy and wading in and grabbing what it wants. This way it accesses only the one table and leaves everything else alone.

It is also at the top of the list for reworking, so that it is in .Net and can respond to notifications properly and refresh when necessary rather than polling for data.

Steve
ASKER CERTIFIED SOLUTION
Avatar of Steve Marshall
Steve Marshall
Flag of United Kingdom of Great Britain and Northern Ireland image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial