KB3002657 breaks everything!

Update: 2015-03-20 09:24 PST – Thank you, Didi, for informing us that Microsoft has released two updated hotfixes, KB3002657-v2 and KB3033395-v2, that shouldn’t cause this problem. I have not had a chance to try these patches yet.

Update: 2015-03-12 09:45 PST – Now that I’ve had a chance to sleep, I’ve updated this post to include some of the specific errors we saw, along with some additional information gathered from around the web. Hopefully it will help others diagnose this problem.

 

We installed our Windows Updates this week and this little gem, KB3002657, came with them.

After this patch was installed on our Windows 2003 Domain Controllers, and we had rebooted them along with all of the other servers in our organization, we started having seemingly random authentication issues with certain services.

Outlook Anywhere and Macs running Outlook could no longer authenticate to Exchange 2010, but everyone could log in to webmail. To add to this, once I had patched and rebooted our Exchange 2010 servers, Windows Outlook clients could no longer authenticate and no e-mail was being delivered; it was just backing up in the queues. I dug into the Event Viewer on the Exchange server, looked under ‘Security’ and found thousands of these errors:

An account failed to log on.

Subject:
	Security ID:		NULL SID
	Account Name:		-
	Account Domain:		-
	Logon ID:		0x0

Logon Type:			3

Account For Which Logon Failed:
	Security ID:		NULL SID
	Account Name:		<EXCHANGE CAS SERVER HOSTNAME>$
	Account Domain:		<OUR DOMAIN>

Failure Information:
	Failure Reason:		An Error occured during Logon.
	Status:			0xc000006d
	Sub Status:		0x0

Process Information:
	Caller Process ID:	0x0
	Caller Process Name:	-

Network Information:
	Workstation Name:	<EXCHANGE CAS SERVER HOSTNAME>
	Source Network Address:	<EXCHANGE CAS SERVER IP>
	Source Port:		54244

Detailed Authentication Information:
	Logon Process:		NtLmSsp 
	Authentication Package:	NTLM
	Transited Services:	-
	Package Name (NTLM only):	-
	Key Length:		0

This event is generated when a logon request fails. It is generated on the computer where access was attempted.

The Subject fields indicate the account on the local system which requested the logon. This is most commonly a service such as the Server service, or a local process such as Winlogon.exe or Services.exe.

The Logon Type field indicates the kind of logon that was requested. The most common types are 2 (interactive) and 3 (network).

The Process Information fields indicate which account and process on the system requested the logon.

The Network Information fields indicate where a remote logon request originated. Workstation name is not always available and may be left blank in some cases.

The authentication information fields provide detailed information about this specific logon request.
	- Transited services indicate which intermediate services have participated in this logon request.
	- Package name indicates which sub-protocol was used among the NTLM protocols.
	- Key length indicates the length of the generated session key. This will be 0 if no session key was requested.
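With thousands of these events, it can help to pull out only the fields that matter for triage: the status code, the logon type and the authentication package. Below is a minimal Python sketch of that idea, not something we ran at the time. It assumes the event has been copied or exported as plain text in the layout shown above, and the function name parse_logon_failure is hypothetical.

```python
import re

# Status code seen in the event above; 0xC000006D is STATUS_LOGON_FAILURE.
KNOWN_STATUS = {"0xc000006d": "STATUS_LOGON_FAILURE"}

# Labels we care about for triage (all appear in the event text above).
WANTED = {"Logon Type", "Status", "Sub Status", "Workstation Name",
          "Logon Process", "Authentication Package"}


def parse_logon_failure(event_text):
    """Extract selected 'Label: value' fields from a failed-logon event's text."""
    fields = {}
    for line in event_text.splitlines():
        # Match "Label:<whitespace>value"; section headers with no value are skipped.
        match = re.match(r"^\s*([^:]+):\s*(\S.*?)\s*$", line)
        if match and match.group(1).strip() in WANTED:
            fields[match.group(1).strip()] = match.group(2)
    return fields
```

In our case every event would come back with Logon Type 3, Status 0xc000006d and Authentication Package NTLM, which points at network logons over NTLM rather than at any one application.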

A department’s NAS would no longer authenticate users. They couldn’t access any of their files. I didn’t have access to this device so I couldn’t review any logs.

A few of our websites that use ‘Integrated Authentication’ (Kerberos, with NTLM as a fallback) wouldn’t authenticate users, but if we switched the site to ‘Basic Authentication’ it worked fine. With ‘Integrated Authentication’ enabled, the IIS logs showed error “401.1” for every failed login.

Another department couldn’t log in to an MSSQL 2005 database using their AD accounts, but applications that used MSSQL accounts worked fine. On our SQL server we saw the following errors:

Date		11/03/2015 7:55:26 AM
Log		SQL Server (Current - 12/03/2015 9:53:00 AM)

Source		Logon

Message
Login failed for user ''. The user is not associated with a trusted SQL Server connection. [CLIENT: XXX.XXX.XXX.XXX]


Date		11/03/2015 7:55:26 AM
Log		SQL Server (Current - 12/03/2015 9:53:00 AM)

Source		Logon

Message
Error: 18452, Severity: 14, State: 1.




Date		11/03/2015 7:55:26 AM
Log		SQL Server (Current - 12/03/2015 9:53:00 AM)

Source		Logon

Message
SSPI handshake failed with error code 0x8009030c while establishing a connection with integrated security; the connection has been closed. [CLIENT: XXX.XXX.XXX.XXX]

Date		11/03/2015 7:55:26 AM
Log		SQL Server (Current - 12/03/2015 9:53:00 AM)

Source		Logon

Message
Error: 17806, Severity: 20, State: 2.
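To see how widespread these failures are, the error numbers can be tallied from an exported copy of the log. The following is a minimal Python sketch under that assumption (the log saved as plain text with the “Error: N, Severity: S, State: T.” lines intact). The function name count_logon_errors is hypothetical, and the short descriptions are paraphrased from the messages quoted above.

```python
import re
from collections import Counter

# Short descriptions paraphrased from the log messages quoted above.
ERROR_HINTS = {
    18452: "login not associated with a trusted SQL Server connection",
    17806: "SSPI handshake failed",
}

ERROR_LINE = re.compile(r"Error:\s*(\d+),\s*Severity:\s*(\d+),\s*State:\s*(\d+)")


def count_logon_errors(log_text):
    """Count 'Error: N, Severity: S, State: T.' lines by error number."""
    return Counter(int(m.group(1)) for m in ERROR_LINE.finditer(log_text))


def summarize(log_text):
    """Return one readable line per error number, most frequent first."""
    counts = count_logon_errors(log_text)
    return [f"{num} x{n}: {ERROR_HINTS.get(num, 'no description')}"
            for num, n in counts.most_common()]
```

A log dominated by 18452 and 17806 while SQL-authenticated logins keep working is consistent with what we saw: only Windows-integrated logins were failing.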

These errors led us to believe there was a trust issue going on. We checked our DCs and they were all happy and replicating, BUT there was a red herring. We logged into one of our DCs and ran “nltest.exe /server:<DC HOSTNAME> /sc_verify:<DOMAIN FQDN>”.

All of our DCs returned the expected results:

C:\>nltest.exe /server:<DC HOSTNAME> /sc_verify:<DOMAIN FQDN>

Flags: b0 HAS_IP  HAS_TIMESERV
Trusted DC Name \\<FQDN OF A DC>
Trusted DC Connection Status Status = 0 0x0 NERR_Success
Trust Verification Status = 0 0x0 NERR_Success
The command completed successfully

Except one. That one DC returned this:

C:\>nltest.exe /server:<DC HOSTNAME> /sc_verify:<DOMAIN FQDN>

I_NetLogonControl failed: Status = 1355 0x54b ERROR_NO_SUCH_DOMAIN

That led us down the wrong path of thinking one of our DCs had somehow fallen off the domain, even though users were authenticating against it successfully and replication was working properly.
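For anyone repeating the nltest check across many DCs, here is a minimal Python sketch of how the output could be classified automatically. It relies only on the two output shapes shown above. The function names classify_sc_verify and run_sc_verify are hypothetical, and run_sc_verify assumes nltest.exe is on the PATH of a Windows host. As our experience shows, a “failed” result from one DC is a lead to investigate, not proof of the root cause.

```python
import re
import subprocess


def classify_sc_verify(output):
    """Return ('ok', None), ('failed', status_name) or ('unknown', None)."""
    # Failure shape: "I_NetLogonControl failed: Status = 1355 0x54b ERROR_NO_SUCH_DOMAIN"
    failure = re.search(r"failed: Status = (\d+) (0x[0-9a-fA-F]+) (\S+)", output)
    if failure:
        return ("failed", failure.group(3))
    # Success shape: "Trust Verification Status = 0 0x0 NERR_Success"
    if re.search(r"Trust Verification Status\s*=\s*0 0x0 NERR_Success", output):
        return ("ok", None)
    return ("unknown", None)


def run_sc_verify(dc_hostname, domain_fqdn):
    """Run nltest against one DC (Windows only) and classify its output."""
    result = subprocess.run(
        ["nltest.exe", f"/server:{dc_hostname}", f"/sc_verify:{domain_fqdn}"],
        capture_output=True, text=True,
    )
    return classify_sc_verify(result.stdout + result.stderr)
```

Looping run_sc_verify over a list of DC hostnames gives a quick pass/fail table instead of eyeballing each console window.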

After spending a few hours troubleshooting what we thought was a DC problem, it all came down to this bloody hotfix.

After removing KB3002657 (and, per Microsoft’s suggestion, KB3046049) from our Domain Controllers and rebooting them again, the problem was solved.

After some reflection I don’t think removing KB3046049 was necessary. I’d recommend starting with KB3002657.

As of this morning our users are reporting that everything is back to normal. I didn’t have to reboot my Exchange, SQL or web servers. Once the DCs had the patch removed and were rebooted, everything just magically started working again.

Finally, some credit. Thank you to Eds and to those who responded to his post; they helped me narrow down and solve the problem for our organization before Microsoft called me back and confirmed this was the solution: http://serverfault.com/questions/674541/has-march-2015-patch-tuesday-broken-2003-shares

Also give this a read: http://www.infoworld.com/article/2895900/security/microsoft-netlogon-patch-kb-3002657-woes-continue-kb-3032359-cisco-anyconnect-fix-confirmed.html

They suggest reconfiguring local policies so that the LAN Manager authentication level is set to “Send LM & NTLM responses”. According to gpedit.msc on our Domain Controllers, the default setting should be “Send LM & NTLM responses”. For some odd reason we had manually changed ours to “Send NTLM response only”. I can’t say whether this would work for all of the cases above, or whether you’d need to push this setting change out to all of your clients and/or servers.
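The LAN Manager authentication level policy is stored as the LmCompatibilityLevel registry value under HKLM\SYSTEM\CurrentControlSet\Control\Lsa, so it can be compared across machines without opening gpedit.msc on each one. The Python sketch below maps the numeric value to the policy label. It is only an illustration: the function names describe_level and read_local_level are hypothetical, and read_local_level works only on a Windows host with Python installed, which is an assumption on my part and was not part of our troubleshooting.

```python
# Policy labels for "Network security: LAN Manager authentication level",
# keyed by the LmCompatibilityLevel registry value.
LM_LEVELS = {
    0: "Send LM & NTLM responses",
    1: "Send LM & NTLM - use NTLMv2 session security if negotiated",
    2: "Send NTLM response only",
    3: "Send NTLMv2 response only",
    4: "Send NTLMv2 response only. Refuse LM",
    5: "Send NTLMv2 response only. Refuse LM & NTLM",
}


def describe_level(level):
    """Translate an LmCompatibilityLevel number into its policy label."""
    return LM_LEVELS.get(level, f"unknown level {level}")


def read_local_level():
    """Read LmCompatibilityLevel on this host; None if the value is not set."""
    import winreg  # Windows-only module, imported here so the mapping runs anywhere
    path = r"SYSTEM\CurrentControlSet\Control\Lsa"
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
            value, _value_type = winreg.QueryValueEx(key, "LmCompatibilityLevel")
            return value
    except FileNotFoundError:
        return None
```

In these terms, the setting the article suggests is level 0 and the one we had configured is level 2.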

29 thoughts on “KB3002657 breaks everything!”

  1. Everything in my company was broken too, SQL Server 2008 R2 lost trust with the domain.

    My apps with Integrated Security stopped working; this was a total mess.

    I’m trying your solution and will give you feedback soon.
    • Chances are removing the KB from your DCs will solve the problem. It fixed our SQL issues. When we were troubleshooting the SQL problem, the logs made it seem like the SQL server had lost trust with the domain. Strangely, some other AD SQL accounts (SharePoint) were working fine. That being said, no one could actually log in to SharePoint, because IIS was affected instead.
      • This indeed broke our document management application between SQL and AD. It looked like trust issues, and we did see NULL SID errors, but little else. We thought it might have had something to do with VMware, but it was not related. Removing the patch cleared it up immediately.

        Chris
  2. This update broke Outlook authentication and access to our internal IIS-based intranet. After opening a case with MS, we had to remove it from our 2003 domain controllers. We only had to remove KB3002657.

    Thanks for the article. Yesterday, when this was going on, I could find little info about it.

    Todd
    • Welcome. I didn’t clue in until I rebooted our Exchange servers and everything went south. I found a StackExchange article with someone else reporting a similar problem and their suggestion was to remove the KB. Then I finally got a hold of Microsoft and they confirmed KB3002657 was also the problem. The rep I spoke with said they were getting a ton of calls.

  3. Bro, you made my day with this…

    I was going crazy with my SQL Server 2008 server, which is a member of my 2003 domain.

    I tried rebooting my controllers and the SQL server without success.

    I knew some updates had been installed overnight, so I looked for these KBs online and found your post.

    Thanks a lot!!!
  4. Any luck figuring out why it broke everything, or how to fix it without uninstalling?

    We are experiencing the same situation.

    Thx
  5. Same situation in our environment with Windows Server 2003 domain controllers.

    It’s also impossible to connect to our VMware vCenter 5.0 using Active Directory authentication.

    I uninstalled the patch and I can confirm that a reboot is required, because the problem is not resolved at this moment. I will wait until the end of the work day to reboot the DCs.
    • We only had to remove the patch from our DCs and reboot them. If you have redundant DCs and the impact is large enough, you could remove it and reboot one DC at a time.
  6. The same issue happened to our Outlook/Exchange on Windows 2008.

    After removing the patch, email is fine.
  7. Two different companies with W2003-based DCs, one with Ex2003, another using Ex2010, both experiencing the same symptoms. I spent almost the whole day trying to understand what had happened. Your article helped me to identify two suspicious hotfixes. Thx.

    BTW, another problem area not covered in your article: it was not possible to connect via RDP to any domain member server if the RDP-Tcp Security Layer was set to Negotiate or SSL (TLS 1.0). The temporary workaround was to choose “RDP Security Layer”.
  8. Experienced the same symptoms this morning on our SharePoint 2010 Server running on Windows Server 2008 R2. Simply uninstalled the patch from our 2003 DCs, rebooted both systems just to be sure, and everything was running smoothly again.

    Thanks for your article – you saved the day :)
  9. Thank you for your blog.
    Because of this patch, Mac OS could no longer mount Windows shares.

    Maybe it would be better to use Samba 4 as a DC soon…
  10. Eric/Todd and whoever else is working on this resolution, many thanks.  1.5 days trying to figure out why users could no longer RDP into our servers.  Even our network printers could not scan any longer.  Removing KB3002657 from our 2003 domain controllers resolved the issues!  Thanks again!

    Johnny
  11. Thanks for this!  I’ve been screwing around with this problem off and on for a couple of days, and I’m so glad you posted this and I stumbled across it.
  12. For us, there were no problems when authentication was purely within the 2003 domain; no problems within the 2008R2 domain; but cross-domain authentication, from 2008R2 to 2003, was causing inconsistent issues (RDP remote access, file shares, MSSQL); I think all were NTLM – not Kerberos.

    There were very few security logs which were relevant. An occasional EventID 537 “An error occurred during logon” or EventID 672 “An error occurred during logon”.

    After exhausting other avenues, we uninstalled KB3002657 from all domain controllers (and rebooted), both 2008R2 and 2003. The inconsistent issues all disappeared! Thanks for your confirmation of a problem with 3002657.

  13. We found that we didn’t properly configure ServicePrincipalName settings for each DNS entry for our fileshare, which was causing authentication to fail over to NTLM instead of using Kerberos. Adding the correct SPNs via setspn.exe appears to have resolved the issue for us:

    #help
    setspn /?
    
    #list spn:
    setspn -L <AD object name>
    
    #add spn:
    setspn -S HOST/<DNS-name> <AD object name>
    setspn -S HOST/<DNS-name.with.full.dns.suffix> <AD object name>
    

    Expect to see a ServicePrincipalName listed twice for each DNS entry pointing to your host services: one using the NetBIOS name and one using the FQDN.

    More reading on Kerberos and SPNs:

    http://blogs.technet.com/b/askds/archive/2011/08/09/kerberos-and-load-balancing.aspx

  14. We also had big issues with our Websense filtering application, which uses ISA 2006: all users were getting a login prompt that denied them access even after they entered their credentials. We also lost access to file shares and our in-house HR web system, plus a whole host of SQL access problems. I checked and thought we were using Kerberos, but in some cases, if Kerberos fails, it falls back to NTLM. We removed the patch from our DCs and it worked.
  15. This manifested in breaking our connection from Cisco Unity to Exchange.  Thanks for the information, removing it from our domain controllers solved it!

  16. Thank you. We were in the middle of a 2003 to 2012 migration, then all of a sudden everything broke.  Spent countless hours looking at the wrong thing; removed the patch this morning and everything is normal again.  Lesson learned: never patch in the middle of a migration :(
