Update: 2015-03-20 09:24 PST – Thank you Didi for informing us that Microsoft has released two updated hotfixes KB3002657-v2 and KB3033395-v2 that shouldn’t cause this problem. I have not had a chance to try these patches yet.
Update: 2015-03-12 09:45 PST – Now that I’ve had a chance to sleep I’ve updated this post to include some of the specific errors we saw and gather some additional information from around the web. Hopefully it will help others diagnose this problem.
We installed our Windows Updates this week and this little gem, KB3002657, came with them.
After this patch was installed on our Windows 2003 Domain Controllers and we rebooted them and all of the other servers in our organization we randomly started having authentication issues with certain services.
Outlook Anywhere and Mac’s running Outlook could no longer authenticate to Exchange 2010 but everyone could login to webmail. To add to this once I had patched and rebooted our Exchange 2010 servers Windows Outlook clients could no longer authenticate and no e-mail was being delivered it was just backing up in the queues. I dug into the Exchange Event Viewer and looked under ‘Security’ and found thousands of these errors:
An account failed to log on. Subject: Security ID: NULL SID Account Name: - Account Domain: - Logon ID: 0x0 Logon Type: 3 Account For Which Logon Failed: Security ID: NULL SID Account Name: <EXCHANGE CAS SERVER HOSTNAME>$ Account Domain: <OUR DOMAIN> Failure Information: Failure Reason: An Error occured during Logon. Status: 0xc000006d Sub Status: 0x0 Process Information: Caller Process ID: 0x0 Caller Process Name: - Network Information: Workstation Name: <EXCHANGE CAS SERVER HOSTNAME> Source Network Address: <EXCHANGE CAS SERVER IP> Source Port: 54244 Detailed Authentication Information: Logon Process: NtLmSsp Authentication Package: NTLM Transited Services: - Package Name (NTLM only): - Key Length: 0 This event is generated when a logon request fails. It is generated on the computer where access was attempted. The Subject fields indicate the account on the local system which requested the logon. This is most commonly a service such as the Server service, or a local process such as Winlogon.exe or Services.exe. The Logon Type field indicates the kind of logon that was requested. The most common types are 2 (interactive) and 3 (network). The Process Information fields indicate which account and process on the system requested the logon. The Network Information fields indicate where a remote logon request originated. Workstation name is not always available and may be left blank in some cases. The authentication information fields provide detailed information about this specific logon request. - Transited services indicate which intermediate services have participated in this logon request. - Package name indicates which sub-protocol was used among the NTLM protocols. - Key length indicates the length of the generated session key. This will be 0 if no session key was requested.
A departments NAS would no longer authenticate users. The couldn’t access any of their files. I didn’t have access to this device so I couldn’t review and logs.
A few of our websites that use ‘Integrated Authentication’ (Kerberos) wouldn’t authenticate users but if we switched the site to ‘Basic Authentication’ it would work fine. When ‘Integrated Authentication’ was enabled the IIS logs would show error “401.1” for every failed login.
Another department couldn’t login to a MSSQL 2005 Database using their AD accounts but applications that used MSSQL accounts worked fine. On our SQL server we saw the following errors:
Date 11/03/2015 7:55:26 AM Log SQL Server (Current - 12/03/2015 9:53:00 AM) Source Logon Message Login failed for user ''. The user is not associated with a trusted SQL Server connection. [CLIENT: XXX.XXX.XXX.XXX] Date 11/03/2015 7:55:26 AM Log SQL Server (Current - 12/03/2015 9:53:00 AM) Source Logon Message Error: 18452, Severity: 14, State: 1. Date 11/03/2015 7:55:26 AM Log SQL Server (Current - 12/03/2015 9:53:00 AM) Source Logon Message SSPI handshake failed with error code 0x8009030c while establishing a connection with integrated security; the connection has been closed. [CLIENT: XXX.XXX.XXX.XXX] Date 11/03/2015 7:55:26 AM Log SQL Server (Current - 12/03/2015 9:53:00 AM) Source Logon Message Error: 17806, Severity: 20, State: 2.
These errors lead us to believe there was a trust issue going on but on. We checked our DCs and they were all happy and replicating BUT there was a red herring. We logged into one of our DCs and ran “nltest.exe /server:<DC HOSTNAME> /sc_verify:<DOMAIN FQDN>”.
All of our DCs returned the expected results:
C:\>nltest.exe /server:<DC HOSTNAME> /sc_verify:<DOMAIN FQDN> Flags: b0 HAS_IP HAS_TIMESERV Trusted DC Name \<FQDN OF A DC> Trusted DC Connection Status Status = 0 0x0 NERR_Success Trust Verification Status = 0 0x0 NERR_Success The command completed successfully
Except one. That one DC returned this:
C:\>nltest.exe /server:<DC HOSTNAME> /sc_verify:<DOMAIN FQDN> I_NetLogonControl failed: Status = 1355 0x54b ERROR_NO_SUCH_DOMAIN
That lead us down the wrong path of thinking one of our DCs had some how fallen off the domain even though users were authenticating against it successfully and replication was working properly.
After spending an few hours troubleshooting what we thought was a DC problem it all came down to this bloody hotfix.
Removing KB3002657 (and per Microsofts suggestion KB3046049) from our Domain Controllers and another reboot the problem was solved.
After some reflection I don’t think removing KB3046049 was necessary. I’d recommend starting with KB3002657 first.
As of this morning our users are reporting that everything is back to normal. Didn’t have to reboot my Exchange, SQL or Webservers. Once the DCs had the patch removed and were rebooted everything just magically started working again.
Finally some credit. Thank you to Eds for his post here and to those who responded: http://serverfault.com/questions/674541/has-march-2015-patch-tuesday-broken-2003-shares they helped me narrow down and solve the problem for our organization before Microsoft called me back and confirmed this was the solution.
They suggest re-configuring local policies so that Windows LAN Manager Authentication is set to “Send LM & NTLM responses”. According to gpedit.msc on our Domain Controllers the default setting should be “Send LM & NTLM responses”. For some odd reason on ours we’ve manually changed it to “Send NTLM responses only”. I can’t comment if this would work or not for all of the above cases I came across or if you need to push this setting change out to all of your clients and/or servers.