KB3002657 breaks everything!

Update: 2015-03-20 09:24 PST – Thank you Didi for informing us that Microsoft has released two updated hotfixes KB3002657-v2 and KB3033395-v2 that shouldn’t cause this problem. I have not had a chance to try these patches yet.

Update: 2015-03-12 09:45 PST – Now that I’ve had a chance to sleep I’ve updated this post to include some of the specific errors we saw and gather some additional information from around the web. Hopefully it will help others diagnose this problem.

 

We installed our Windows Updates this week and this little gem, KB3002657, came with them.

After this patch was installed on our Windows 2003 Domain Controllers, and we had rebooted them along with all of the other servers in our organization, we started having seemingly random authentication issues with certain services.

Outlook Anywhere and Macs running Outlook could no longer authenticate to Exchange 2010, but everyone could still log in to webmail. To add to this, once I had patched and rebooted our Exchange 2010 servers, Windows Outlook clients could no longer authenticate either and no e-mail was being delivered; it was just backing up in the queues. I dug into the Event Viewer on the Exchange servers, looked under ‘Security’ and found thousands of these errors:

An account failed to log on.

Subject:
	Security ID:		NULL SID
	Account Name:		-
	Account Domain:		-
	Logon ID:		0x0

Logon Type:			3

Account For Which Logon Failed:
	Security ID:		NULL SID
	Account Name:		<EXCHANGE CAS SERVER HOSTNAME>$
	Account Domain:		<OUR DOMAIN>

Failure Information:
	Failure Reason:		An Error occured during Logon.
	Status:			0xc000006d
	Sub Status:		0x0

Process Information:
	Caller Process ID:	0x0
	Caller Process Name:	-

Network Information:
	Workstation Name:	<EXCHANGE CAS SERVER HOSTNAME>
	Source Network Address:	<EXCHANGE CAS SERVER IP>
	Source Port:		54244

Detailed Authentication Information:
	Logon Process:		NtLmSsp 
	Authentication Package:	NTLM
	Transited Services:	-
	Package Name (NTLM only):	-
	Key Length:		0

This event is generated when a logon request fails. It is generated on the computer where access was attempted.

The Subject fields indicate the account on the local system which requested the logon. This is most commonly a service such as the Server service, or a local process such as Winlogon.exe or Services.exe.

The Logon Type field indicates the kind of logon that was requested. The most common types are 2 (interactive) and 3 (network).

The Process Information fields indicate which account and process on the system requested the logon.

The Network Information fields indicate where a remote logon request originated. Workstation name is not always available and may be left blank in some cases.

The authentication information fields provide detailed information about this specific logon request.
	- Transited services indicate which intermediate services have participated in this logon request.
	- Package name indicates which sub-protocol was used among the NTLM protocols.
	- Key length indicates the length of the generated session key. This will be 0 if no session key was requested.
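
With thousands of these events it helps to boil them down to a few fields. Here is a minimal Python sketch, assuming you have each event’s description text (as shown above) available as a string. The function names and the sample host name CAS01$ are made up for illustration. It only splits the “name: value” lines and counts events by failed account, status code and authentication package.

```python
from collections import Counter


def parse_failed_logon(event_text):
    """Return {(section, field): value} for one 'An account failed to log on' event."""
    fields = {}
    section = ""
    for raw in event_text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # explanatory prose or blank line, not a "name: value" pair
        key, value = key.strip(), value.strip()
        indented = raw[:1] in (" ", "\t")
        if not indented:
            # An unindented "Name:" with no value starts a new section;
            # an unindented "Name: value" (e.g. Logon Type) is a top-level field.
            section = key if value == "" else ""
            if value == "":
                continue
        fields[(section, key)] = value
    return fields


def summarise(events):
    """Count events by (failed account, status code, authentication package)."""
    counts = Counter()
    for text in events:
        f = parse_failed_logon(text)
        counts[(
            f.get(("Account For Which Logon Failed", "Account Name")),
            f.get(("Failure Information", "Status")),
            f.get(("Detailed Authentication Information", "Authentication Package")),
        )] += 1
    return counts


# Hypothetical, shortened sample in the same layout as the event above.
SAMPLE = "\n".join([
    "An account failed to log on.",
    "",
    "Subject:",
    "\tSecurity ID:\t\tNULL SID",
    "Logon Type:\t\t\t3",
    "Account For Which Logon Failed:",
    "\tAccount Name:\t\tCAS01$",
    "Failure Information:",
    "\tStatus:\t\t\t0xc000006d",
    "Detailed Authentication Information:",
    "\tAuthentication Package:\tNTLM",
])

if __name__ == "__main__":
    for key, count in summarise([SAMPLE, SAMPLE]).items():
        print(count, key)
```

In our case every row would have looked the same: the CAS server’s own computer account failing over NTLM with status 0xc000006d, which is the generic logon failure code.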

A department’s NAS would no longer authenticate users. They couldn’t access any of their files. I didn’t have access to this device so I couldn’t review any logs.

A few of our websites that use ‘Integrated Authentication’ (Kerberos) wouldn’t authenticate users, but if we switched the site to ‘Basic Authentication’ it would work fine. When ‘Integrated Authentication’ was enabled the IIS logs would show error “401.1” for every failed login.

Another department couldn’t log in to an MSSQL 2005 database using their AD accounts, but applications that used MSSQL accounts worked fine. On our SQL server we saw the following errors:

Date		11/03/2015 7:55:26 AM
Log		SQL Server (Current - 12/03/2015 9:53:00 AM)

Source		Logon

Message
Login failed for user ''. The user is not associated with a trusted SQL Server connection. [CLIENT: XXX.XXX.XXX.XXX]


Date		11/03/2015 7:55:26 AM
Log		SQL Server (Current - 12/03/2015 9:53:00 AM)

Source		Logon

Message
Error: 18452, Severity: 14, State: 1.




Date		11/03/2015 7:55:26 AM
Log		SQL Server (Current - 12/03/2015 9:53:00 AM)

Source		Logon

Message
SSPI handshake failed with error code 0x8009030c while establishing a connection with integrated security; the connection has been closed. [CLIENT: XXX.XXX.XXX.XXX]

Date		11/03/2015 7:55:26 AM
Log		SQL Server (Current - 12/03/2015 9:53:00 AM)

Source		Logon

Message
Error: 17806, Severity: 20, State: 2.
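
If you want to check quickly whether a SQL Server log shows the same pattern, something like the following Python sketch may help. It assumes the log has been saved as plain text with the same message wording as above; the function name scan_sql_log is made up for illustration. It simply counts the error numbers and any SSPI handshake error codes it finds.

```python
import re
from collections import Counter

# Patterns based on the message text quoted above.
ERROR_RE = re.compile(r"Error:\s*(\d+),\s*Severity:\s*(\d+),\s*State:\s*(\d+)")
SSPI_RE = re.compile(r"SSPI handshake failed with error code (0x[0-9a-fA-F]+)")


def scan_sql_log(text):
    """Return (Counter of error numbers, Counter of SSPI error codes)."""
    errors = Counter()
    sspi_codes = Counter()
    for line in text.splitlines():
        match = ERROR_RE.search(line)
        if match:
            errors[int(match.group(1))] += 1
        match = SSPI_RE.search(line)
        if match:
            sspi_codes[match.group(1).lower()] += 1
    return errors, sspi_codes


# Sample lines copied from the log entries above.
SAMPLE_LOG = "\n".join([
    "Login failed for user ''. The user is not associated with a trusted SQL Server connection. [CLIENT: XXX.XXX.XXX.XXX]",
    "Error: 18452, Severity: 14, State: 1.",
    "SSPI handshake failed with error code 0x8009030c while establishing a connection with integrated security; the connection has been closed. [CLIENT: XXX.XXX.XXX.XXX]",
    "Error: 17806, Severity: 20, State: 2.",
])

if __name__ == "__main__":
    errors, sspi_codes = scan_sql_log(SAMPLE_LOG)
    print("errors:", dict(errors))
    print("sspi codes:", dict(sspi_codes))
```

Seeing 18452 and 17806 together with SSPI code 0x8009030c (a logon-denied error) is what pointed us at Windows authentication rather than SQL Server itself.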

These errors led us to believe there was a trust issue going on. We checked our DCs and they were all happy and replicating, BUT there was a red herring. We logged into one of our DCs and ran “nltest.exe /server:<DC HOSTNAME> /sc_verify:<DOMAIN FQDN>”.

All of our DCs returned the expected results:

C:\>nltest.exe /server:<DC HOSTNAME> /sc_verify:<DOMAIN FQDN>

Flags: b0 HAS_IP  HAS_TIMESERV
Trusted DC Name \\<FQDN OF A DC>
Trusted DC Connection Status Status = 0 0x0 NERR_Success
Trust Verification Status = 0 0x0 NERR_Success
The command completed successfully

Except one. That one DC returned this:

C:\>nltest.exe /server:<DC HOSTNAME> /sc_verify:<DOMAIN FQDN>

I_NetLogonControl failed: Status = 1355 0x54b ERROR_NO_SUCH_DOMAIN

That led us down the wrong path of thinking one of our DCs had somehow fallen off the domain, even though users were authenticating against it successfully and replication was working properly.
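
If you have a lot of DCs to compare, the check can be scripted. This is a rough Python sketch, assuming nltest.exe is on the PATH and prints “Status = <number> <hex> <name>” lines like the ones above; the function names are made up for illustration. Keep in mind that in our case the odd one out turned out to be a red herring, so treat a failure here as a hint and not a diagnosis.

```python
import re
import subprocess

# Matches e.g. "Status = 0 0x0 NERR_Success" or "Status = 1355 0x54b ERROR_NO_SUCH_DOMAIN".
STATUS_RE = re.compile(r"Status = (\d+) (0x[0-9a-fA-F]+) (\S+)")


def nltest_statuses(output):
    """Return [(numeric status, status name), ...] found in nltest output."""
    return [(int(code), name) for code, _hex, name in STATUS_RE.findall(output)]


def secure_channel_ok(output):
    """True only if at least one status line was found and all of them are 0."""
    statuses = nltest_statuses(output)
    return bool(statuses) and all(code == 0 for code, _name in statuses)


def run_nltest(dc_hostname, domain_fqdn):
    """Run the same command as in the post and return its text output."""
    result = subprocess.run(
        ["nltest.exe", "/server:" + dc_hostname, "/sc_verify:" + domain_fqdn],
        capture_output=True,
        text=True,
    )
    return result.stdout + result.stderr


# Outputs copied from the two cases above (DC name line omitted).
GOOD = "\n".join([
    "Flags: b0 HAS_IP  HAS_TIMESERV",
    "Trusted DC Connection Status Status = 0 0x0 NERR_Success",
    "Trust Verification Status = 0 0x0 NERR_Success",
    "The command completed successfully",
])
BAD = "I_NetLogonControl failed: Status = 1355 0x54b ERROR_NO_SUCH_DOMAIN"

if __name__ == "__main__":
    print("good:", secure_channel_ok(GOOD), nltest_statuses(GOOD))
    print("bad: ", secure_channel_ok(BAD), nltest_statuses(BAD))
```

To use it for real you would loop over your own DC host names, call run_nltest for each one and pass the result to secure_channel_ok.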

After spending a few hours troubleshooting what we thought was a DC problem, it all came down to this bloody hotfix.

Once we removed KB3002657 (and, per Microsoft’s suggestion, KB3046049) from our Domain Controllers and rebooted them again, the problem was solved.

After some reflection I don’t think removing KB3046049 was necessary. I’d recommend starting with KB3002657 on its own.

As of this morning our users are reporting that everything is back to normal. I didn’t have to reboot my Exchange, SQL or web servers. Once the DCs had the patch removed and were rebooted everything just magically started working again.

Finally, some credit. Thank you to Eds for his post and to those who responded to it. They helped me narrow down and solve the problem for our organization before Microsoft called me back and confirmed this was the solution: http://serverfault.com/questions/674541/has-march-2015-patch-tuesday-broken-2003-shares

Also give this a read: http://www.infoworld.com/article/2895900/security/microsoft-netlogon-patch-kb-3002657-woes-continue-kb-3032359-cisco-anyconnect-fix-confirmed.html

They suggest re-configuring local policies so that the LAN Manager authentication level is set to “Send LM & NTLM responses”. According to gpedit.msc on our Domain Controllers the default setting should be “Send LM & NTLM responses”, but for some odd reason we had manually changed ours to “Send NTLM responses only”. I can’t say whether this would work for all of the above cases, or whether you would need to push this setting change out to all of your clients and/or servers.
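
To see what a machine is actually set to without clicking through gpedit.msc, you can read the value behind that policy. This is a small Python sketch; note that the registry location (LmCompatibilityLevel under HKLM\SYSTEM\CurrentControlSet\Control\Lsa) and the level-to-label table are not from the article above but from my understanding of how this policy is stored, so verify them against Microsoft’s documentation before relying on them. The function names are made up for illustration.

```python
# Assumed mapping of LmCompatibilityLevel values to the policy labels.
LM_LEVELS = {
    0: "Send LM & NTLM responses",
    1: "Send LM & NTLM - use NTLMv2 session security if negotiated",
    2: "Send NTLM response only",
    3: "Send NTLMv2 response only",
    4: "Send NTLMv2 response only. Refuse LM",
    5: "Send NTLMv2 response only. Refuse LM & NTLM",
}


def describe_lm_level(level):
    """Translate a numeric level into the label shown in the policy editor."""
    return LM_LEVELS.get(level, "unknown level %r" % (level,))


def read_lm_level():
    """Read the configured level from the local registry (Windows only).

    Raises FileNotFoundError if the value has never been set explicitly.
    """
    import winreg  # imported here so the rest of the file loads on any OS

    path = r"SYSTEM\CurrentControlSet\Control\Lsa"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
        value, _value_type = winreg.QueryValueEx(key, "LmCompatibilityLevel")
    return value


if __name__ == "__main__":
    for level in sorted(LM_LEVELS):
        print(level, describe_lm_level(level))
```

On a Windows box you would call describe_lm_level(read_lm_level()) and compare the result across your DCs, clients and servers.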

Workaround for 0x80248015 when trying to do Windows Updates on Server 2003

If you’re like us you probably noticed this Patch Tuesday that a bunch, if not all, of your Windows 2003 servers are not able to load the Microsoft Update page. When trying to load the page the error 0x80248015 is displayed.

After some Googling I’ve pieced together a workaround that will get you some updates this Patch Tuesday. It seems switching back to Windows Update does the trick.

To switch back to Windows Update you just need to do the following:

  1. Log in to the server
  2. Run ‘services.msc’
  3. Stop the ‘Automatic Updates’ service
  4. Delete C:\Windows\SoftwareDistribution
  5. Start the ‘Automatic Updates’ service
  6. Launch ‘Internet Explorer’ and click ‘Tools’ and ‘Windows Update’
  7. Use Windows Update. DO NOT upgrade to Microsoft Update

OR if you’re like me and don’t want to have to do that for 50+ servers you can do it from a batch file:

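rem Stop the Automatic Updates service so its cache folder can be deleted
net stop wuauserv
rem Remove the update cache; the service recreates it when it starts
rd /s /q "%SystemRoot%\SoftwareDistribution"
net start wuauserv
rem Open Windows Update (do not upgrade to Microsoft Update)
%SystemRoot%\system32\wupdmgr.exe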