I have a 2-node SQL Server cluster running on Microsoft Cluster Service. It has been running without issue for at least a year, with at least one failover per month (Microsoft Updates). This last month, however, when the cluster failed over from Node 1 to Node 2, Node 2 would not start the services. Forcing everything back to Node 1 brought the services back online. Investigation on Node 2 shows an event from MSSQLSERVER, ID 9003: "The log scan number passed to log scan in database 'master' is invalid."
Everything I have read points to a corrupted master database, but if the database were corrupt, why does it run without issue on Node 1? I have not attempted to force a failover back to Node 2, as I don't know whether doing so will in fact corrupt the master database and cause it to stop running on Node 1 as well. I have good full backups of all databases, including master, so I may just force a failover and deal with whatever happens. I am just confused as to why it would run fine on Node 1 and not on Node 2 when both servers are identical.
I have also considered evicting Node 2 and re-adding it, but I don't really know what difference that would make.