Exchange users unable to share calendars post AD/Exchange migration

We just recently went through an AD forest migration AND an Exchange 2010 -> 2016 migration across forests at the same time. Good times.

One of the many issues that came up after the migration was that the majority of our users were unable to share their calendars with other users.

When trying to share by manually editing the calendar permissions, users would get the error “One or more users cannot be added to the folder access list. Non-local users cannot be given rights on this server.”

 

If users tried to go the invite route by right clicking their calendar, choosing ‘Share’ and ‘Share Calendar’ they would get “Calendar sharing is not available with the following entries because of permission settings on your network:”

 

If you took a look at our GAL you’d see all of the users you couldn’t share with had a circle with a line through their entry:

 

I ended up stumbling across a solution by accident when trying to fix this on my own account. It turned out my account was a ‘Shared’ mailbox and not a ‘User’ mailbox. I converted it with the below PowerShell and then my account started working again:

Set-Mailbox -Identity <USERNAME> -Type Regular -DomainController <DC FQDN>

This worked great for me but my situation was unique. Other users with the issue were already ‘User’ mailboxes. I took another problematic account and ran the above command on it and got this:

Set-Mailbox -Identity <USERNAME2> -Type Regular -DomainController <DC FQDN>
WARNING: Couldn't convert the mailbox because the mailbox "USERNAME2" is already of the type "Regular".

Despite that warning message, this user’s mailbox was fixed once the user closed and re-opened Outlook.

I re-ran the command against their mailbox and the output was this:

Set-Mailbox -Identity <USERNAME2> -Type Regular -DomainController <DC FQDN>

WARNING: Couldn't convert the mailbox because the mailbox "USERNAME2" is already of the type "Regular".
WARNING: The command completed successfully but no settings of '<DOMAIN FQDN>/User Accounts/Staff/USERNAME2' have been
 modified.

Why didn’t I get that second warning about not making any changes the first time I ran it? Simple. It’s because something was changed and Microsoft doesn’t think I need to know that.

Digging into the account attributes I figured out what changed. It’s called ‘msExchRecipientDisplayType’ and was introduced in Exchange 2007. This attribute determines what kind of recipient the mailbox is in the Address Book.

Pre-AD Migration msExchRecipientDisplayType was set to 1073741824 which is an “ACLable Mailbox User”.

Post-AD Migration msExchRecipientDisplayType was set to 0 which is a “Mailbox User”.

It makes sense now why you can’t apply permissions (ACLs) to a “Mailbox User” when an “ACLable Mailbox User” type exists.

We used Microsoft’s own tools (ADMT, Exchange 2016) to migrate our users from one forest and Exchange to another. Somewhere in that migration the attribute was wiped out and not transferred on 2941 out of 3123 mailboxes.

Here is how you can identify all users in your environment with this attribute set to “0”:

Get-AdUser -Filter * -Properties Name,msExchRecipientDisplayType -Server <DC FQDN> | Where-Object { $_.msExchRecipientDisplayType -eq "0" } | Select Name,msExchRecipientDisplayType

Our environment is a mix of Shared, User, Room and Equipment mailboxes, and there were affected accounts in all four categories. If we ran a simple script that looked for “msExchRecipientDisplayType=0” and changed it to “1073741824”, we might end up with the wrong value for a mailbox depending on what type it’s supposed to be. Based on my reading, msExchRecipientDisplayType should be 1073741824 for Shared and User mailboxes, 7 for a Room mailbox and 8 for an Equipment mailbox.
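As a minimal sketch of why a type-aware fix matters, the values quoted above can be written as a lookup table. The numbers are only the ones mentioned in this post, not an authoritative list, and the helper name Get-ExpectedDisplayType is made up for illustration.

```powershell
# Hypothetical helper: maps a RecipientTypeDetails value to the
# msExchRecipientDisplayType value quoted in this post.
# These numbers come from the paragraph above and are not an authoritative list.
$expectedDisplayType = @{
    'UserMailbox'      = 1073741824
    'SharedMailbox'    = 1073741824
    'RoomMailbox'      = 7
    'EquipmentMailbox' = 8
}

function Get-ExpectedDisplayType {
    param([Parameter(Mandatory = $true)][string]$RecipientTypeDetails)

    if ($expectedDisplayType.ContainsKey($RecipientTypeDetails)) {
        return $expectedDisplayType[$RecipientTypeDetails]
    }
    # Unknown types return $null so the caller can skip them rather than guess.
    return $null
}

Get-ExpectedDisplayType -RecipientTypeDetails 'RoomMailbox'
```

A blanket "0 becomes 1073741824" script would get Room and Equipment mailboxes wrong, which is the risk described above.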

We decided the best way to fix this was simply re-applying the user type that a mailbox already was. This made the PowerShell much simpler. Here’s what we ran:

Get-Recipient -ResultSize Unlimited | Where-Object { $_.RecipientTypeDetails -eq "SharedMailbox" } | Set-Mailbox -Type Shared -DomainController <DC FQDN>
Get-Recipient -ResultSize Unlimited | Where-Object { $_.RecipientTypeDetails -eq "UserMailbox" } | Set-Mailbox -Type Regular -DomainController <DC FQDN>
Get-Recipient -ResultSize Unlimited | Where-Object { $_.RecipientTypeDetails -eq "RoomMailbox" } | Set-Mailbox -Type Room -DomainController <DC FQDN>
Get-Recipient -ResultSize Unlimited | Where-Object { $_.RecipientTypeDetails -eq "EquipmentMailbox" } | Set-Mailbox -Type Equipment -DomainController <DC FQDN>

These commands together fixed 2890 of 2941 broken mailboxes.

This will generate a simple report of which mailboxes weren’t converted and what type they are:

$domainController = "<DC FQDN>"

# Find accounts whose msExchRecipientDisplayType is still 0
$brokenUsers = Get-AdUser -Filter * -Properties msExchRecipientDisplayType -Server $domainController | Where-Object { $_.msExchRecipientDisplayType -eq 0 }

# List each one with its Exchange recipient type
foreach ($user in $brokenUsers) {
    Get-User $user.DistinguishedName -DomainController $domainController | Select-Object Name,RecipientTypeDetails
}

I took one of the accounts that wasn’t fixed and ran this command:

Set-Mailbox -Identity <USERNAME> -Type Regular -DomainController <DC FQDN>

This corrected the account. No idea why the batch command didn’t. I ran this command for all of the regular mailboxes that didn’t fix in the batch and it worked fine. That left me with a bunch of shared mailboxes that were still broken and one user account that would not fix.

Running this on the shared mailboxes did not help:

Set-Mailbox -Identity <USERNAME> -Type Shared -DomainController <DC FQDN>

I checked one of the problematic accounts in ADSIEdit.msc and it had the correct msExchRecipientDisplayType value of 1073741824 despite PowerShell telling me it was set to 0.

Since there were only 23 accounts left that were problematic I used ADSIEdit to verify and fix the remainders.
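For anyone who would rather script this than click through ADSIEdit, a rough sketch might look like the following. It assumes the ActiveDirectory module is available and that 1073741824 really is the right value for the account in question; <USERNAME> and <DC FQDN> are placeholders as elsewhere in this post.

```powershell
# Sketch only: scripted equivalent of checking and editing the attribute by hand.
# Assumes the ActiveDirectory module and that 1073741824 is correct for this account.
Import-Module ActiveDirectory

# Read the current value from one specific DC so every check hits the same copy of the directory.
Get-ADUser -Identity '<USERNAME>' -Properties msExchRecipientDisplayType -Server '<DC FQDN>' |
    Select-Object Name, msExchRecipientDisplayType

# Preview the change; remove -WhatIf once the preview looks right.
Set-ADUser -Identity '<USERNAME>' -Replace @{ msExchRecipientDisplayType = 1073741824 } -Server '<DC FQDN>' -WhatIf
```

Writing Exchange attributes directly bypasses the Exchange cmdlets, so treat it as a last resort for the handful of accounts that Set-Mailbox does not fix.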

We ran into a few users who still had problems using the e-mail invite method to share their calendar. This was fixed by having them clear their Outlook auto-complete via these steps:

  1. On the File tab, choose Options > Mail.
  2. Under Send messages, choose Empty Auto-Complete List
  3. Choose Yes to confirm you want to empty the list.
  4. Close/Re-open Outlook
  5. Try again

 


NetApp SnapManager for Exchange failing with VSS_E_WRITERERROR_RETRYABLE

We ran into a problem with Exchange 2010, a FAS2240-4 and SnapManager for Exchange 7.1 where backups that used to fail randomly every now and then started failing consistently.

Our DFM server would send us an e-mail when the failure occurred that looked like this:

CLIENT APP ERROR Backup: SME Version 7.1: (111) on dora02 at Sun Sep 04 22:09:31 PDT 2016

The backup failure would also knock the databases offline and require us to re-sync them the next day.

Digging into the SME logs we found the following:

[22:10:42.635]  *****BACKUP DETAIL SUMMARY*****

 [22:10:42.635]  Backup group set #1: 

 [22:10:42.635]  Backup SG/DB [FK] Error: SnapManager detected the following Exchange writer error. Please retry SnapManager operation.
 VSS_E_WRITERERROR_RETRYABLE: The writer failed due to an error that might not occur if another snapshot copy is created.


 [22:10:42.635]  Backup SG/DB [LQ] Error: SnapManager detected the following Exchange writer error. Please retry SnapManager operation.
 VSS_E_WRITERERROR_RETRYABLE: The writer failed due to an error that might not occur if another snapshot copy is created.


 [22:10:42.635]  Backup SG/DB [RZ] Error: SnapManager detected the following Exchange writer error. Please retry SnapManager operation.
 VSS_E_WRITERERROR_RETRYABLE: The writer failed due to an error that might not occur if another snapshot copy is created.


 [22:10:42.635]  Backup SG/DB [AE] Error: SnapManager detected the following Exchange writer error. Please retry SnapManager operation.
 VSS_E_WRITERERROR_RETRYABLE: The writer failed due to an error that might not occur if another snapshot copy is created.


 [22:10:42.635]  Backup SG/DB [Public Folders (<SERVER>)] Error: SnapManager detected the following Exchange writer error. Please retry SnapManager operation.
 VSS_E_WRITERERROR_RETRYABLE: The writer failed due to an error that might not occur if another snapshot copy is created.




 [22:10:42.635]  ***SNAPMANAGER BACKUP JOB ENDED AT: [09-04-2016_22.10.42]

 [22:10:42.635]  Failed to backup storage groups/databases.

NetApp provides this page for what they call “Common VSS errors”: https://kb.netapp.com/support/index?page=content&id=1010785&locale=en_US

None of the suggestions there helped us.

In the end I found this forum post for a different product: https://community.emc.com/thread/168678?tstart=0 and applied the registry edits they suggested here: https://community.emc.com/message/705346#705346

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\TcpWindowSize=256000
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\GlobalMaxTcpWindowSize=16777216
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\KeepAliveInterval=1000
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\KeepAliveTime=600000

and then rebooted our Exchange server running SME.
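If you would rather apply those four values from PowerShell than from regedit, a sketch along these lines should do it. It assumes an elevated session on the Exchange server and that the values are stored as DWORDs; the function name Set-TcpipParameter is made up for this example.

```powershell
# Sketch: write the four TCP values quoted above as DWORDs.
# Assumes an elevated session; Set-TcpipParameter is a hypothetical helper name.
$tcpipKey = 'HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters'

$tcpipSettings = @{
    TcpWindowSize          = 256000
    GlobalMaxTcpWindowSize = 16777216
    KeepAliveInterval      = 1000
    KeepAliveTime          = 600000
}

function Set-TcpipParameter {
    param([switch]$WhatIf)

    foreach ($name in $tcpipSettings.Keys) {
        # -Force overwrites the value if it already exists.
        New-ItemProperty -Path $tcpipKey -Name $name -PropertyType DWord `
            -Value $tcpipSettings[$name] -Force -WhatIf:$WhatIf | Out-Null
    }
}

# Preview first, then run without -WhatIf and reboot:
# Set-TcpipParameter -WhatIf
```

The function is defined but not called, so nothing changes until you run it yourself.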

Since making the change roughly 20 days ago we haven’t had a single failed backup.

Exchange permission issues for a single user on a generic mailbox

We created a new generic mailbox in Exchange 2010. Created a group (Global) and added the users to it. We then created a Universal group, added the Global group to it and then added the Universal group to the mailbox with full mailbox permissions.

After the obligatory 24 hour wait for Exchange to update its permissions, 4 of the 5 users in the Global group had access to the generic mailbox. The 5th user could not access the mailbox via Outlook or webmail and got permission denied errors.

We tried removing/re-adding them from the group with no success. We tried migrating their mailbox back and forth with no success.

Finally, after some digging, we think we figured out the problem.

The user in question had originally been migrated from Exchange 2003 to 2010. Then at some point while on Exchange 2010 we had to purge/re-create their Mailbox.

The process of purging/re-creating their mailbox changed their LegacyDN to the new, correct, format for Exchange 2010 and dumped their old Exchange 2003 LegacyDN.

To view a user’s legacy DN run the following PowerShell command:

Get-MailboxStatistics -Identity <USERNAME> | Format-List LegacyDN

# For Exchange 2007 or newer you'll see a LegacyDN like this:
LegacyDN : /O=<ORGANIZATION>/OU=EXCHANGE ADMINISTRATIVE GROUP (<RANDOM LETTERS/NUMBER>)/CN=RECIPIENTS/CN=<USERNAME>

# For Exchange 2003 or older you'll see a LegacyDN like this:
LegacyDN : /O=<ORGANIZATION>/OU=<DOMAIN>/CN=RECIPIENTS/CN=<USERNAME>

What we ended up doing is re-creating their LegacyDN from Exchange 2003 as a new X500 record and then they were instantly able to access the generic mailbox.
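A minimal sketch of adding the old LegacyDN as an X500 proxy address could look like this. The helper name ConvertTo-X500Address is made up for illustration, the DN is the placeholder format shown above, and the Set-Mailbox line assumes the Exchange Management Shell on Exchange 2010.

```powershell
# Hypothetical helper: turns a legacy DN into an X500 proxy address string.
function ConvertTo-X500Address {
    param([Parameter(Mandatory = $true)][string]$LegacyDN)
    return "X500:$LegacyDN"
}

# Placeholder DN in the Exchange 2003 format shown above.
$oldDn = '/O=<ORGANIZATION>/OU=<DOMAIN>/CN=RECIPIENTS/CN=<USERNAME>'
$x500  = ConvertTo-X500Address -LegacyDN $oldDn

# In the Exchange Management Shell, append it without touching the existing addresses:
# Set-Mailbox -Identity '<USERNAME>' -EmailAddresses @{ Add = $x500 }
```

Using the Add form keeps the user's existing SMTP addresses intact instead of replacing the whole list.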

It could always be coincidence…..


NetApp SnapManager for Exchange – Error Code: 0xc004146f

We use SnapManager for Exchange and NetApp Single Mailbox Recovery to back up our Exchange 2010 environment.

We’ve noticed that randomly our backups will fail with a verification error. The backups are still good and the failure usually happens 2-3 times in a row and then doesn’t happen again for a month.

NetApp has a KB article (login required) for this which basically tells you to verify that a set of applicable Exchange DLL and EXE files are all the same version on all of our Exchange servers that SME runs on.

We did this and they are. In our case only a single server is involved when it comes to SME, so the versions on our other Exchange servers are irrelevant (I think).

Here is a snippet of what our log looks like:

[02:47:01.595]  ErrCheckLogs: failed with ERROR -1811

[02:47:01.595]  Operation terminated with error -1811 ChecUnknown, in 0.0 seconds.

[02:47:01.595]  WARNING: Database/log consistency checking returned with error code 0xFFFFF8ED.

[02:47:01.611]  Error Code: 0xc004146f
Transaction log verification failed.


[02:47:01.611]  Error Code: 0xc004146f
Transaction log verification failed.


[02:47:01.611]  Error Code: 0xc004146f
Transaction log verification failed.

[02:47:01.611]  Updating SnapInfo file at C:\Exchange\Logs\lun_Exchange_Logs_AE\sme_snapinfo\EXCH__MYSERVER\SG__AE\08-31-2015_22.00.12__Daily\SnapInfo__08-31-2015_22.00.12.sme with LogConsistentCheck=Failure

[02:47:02.126]  Verification failed. Error code: 0xc004146f, LUN: C:\Exchange\Databases\lun_Exchange_AE\
[02:47:02.126]  An autosupport message is sent on failure to the storage system of LUN [C:\Exchange\Databases\lun_Exchange_AE\].

[02:47:02.126]  ***LOG VERIFICATION STATUS: AE

[02:47:02.126]  Failed to verify physical integrity of the transaction logs.

[02:47:02.126]  ***LOG CONSISTENT VERIFICATION: Public Folders (MYSERVER)

[02:47:02.126]  Start to verify the logs in SnapInfo directory...

I’ve opened a case with NetApp and it appears they have an internal bug for this issue which you can learn nothing about here: http://mysupport.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=132153

 

Update 2015-09-18

Heard back from NetApp and they recommend the following:

  1. Disable circular logging (this is enabled in our case)
  2. There are log files with bad generation numbers. Run the SnapManager for Exchange backup without verification, which will commit and truncate the transaction logs and remove the log files with the bad generation numbers. Then re-run the job with verification enabled.

 

Update 2015-09-22

After finally digging into the circular logging configuration on our Exchange system it turns out we only had it enabled for 2 of our 5 mail stores. I then went back through all of our failure logs and the only databases that ever generated a verification error were the 2 configured for circular logging.

I’ve disabled circular logging on both of those mail stores and we’ll see if that solves this problem.
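For reference, checking and changing this from the Exchange Management Shell might look roughly like the sketch below. It only covers mailbox databases, and '<DATABASE NAME>' is a placeholder.

```powershell
# Sketch for the Exchange Management Shell (Exchange 2010), mailbox databases only.
# List the databases that have circular logging turned on.
Get-MailboxDatabase | Where-Object { $_.CircularLoggingEnabled } |
    Select-Object Name, Server, CircularLoggingEnabled

# Turn it off for one database; '<DATABASE NAME>' is a placeholder.
Set-MailboxDatabase -Identity '<DATABASE NAME>' -CircularLoggingEnabled $false
```

As far as I know, on a database that isn't part of a DAG the change only takes effect after the database is dismounted and mounted again, so plan for a short outage.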

KB3002657 breaks everything!

Update: 2015-03-20 09:24 PST – Thank you Didi for informing us that Microsoft has released two updated hotfixes KB3002657-v2 and KB3033395-v2 that shouldn’t cause this problem. I have not had a chance to try these patches yet.

Update: 2015-03-12 09:45 PST – Now that I’ve had a chance to sleep I’ve updated this post to include some of the specific errors we saw and gather some additional information from around the web. Hopefully it will help others diagnose this problem.

 

We installed our Windows Updates this week and this little gem, KB3002657, came with them.

After this patch was installed on our Windows 2003 Domain Controllers, and we had rebooted them and all of the other servers in our organization, we started having random authentication issues with certain services.

Outlook Anywhere and Macs running Outlook could no longer authenticate to Exchange 2010, but everyone could log in to webmail. To add to this, once I had patched and rebooted our Exchange 2010 servers, Windows Outlook clients could no longer authenticate and no e-mail was being delivered; it was just backing up in the queues. I dug into the Event Viewer on the Exchange server, looked under ‘Security’ and found thousands of these errors:

An account failed to log on.

Subject:
	Security ID:		NULL SID
	Account Name:		-
	Account Domain:		-
	Logon ID:		0x0

Logon Type:			3

Account For Which Logon Failed:
	Security ID:		NULL SID
	Account Name:		<EXCHANGE CAS SERVER HOSTNAME>$
	Account Domain:		<OUR DOMAIN>

Failure Information:
	Failure Reason:		An Error occured during Logon.
	Status:			0xc000006d
	Sub Status:		0x0

Process Information:
	Caller Process ID:	0x0
	Caller Process Name:	-

Network Information:
	Workstation Name:	<EXCHANGE CAS SERVER HOSTNAME>
	Source Network Address:	<EXCHANGE CAS SERVER IP>
	Source Port:		54244

Detailed Authentication Information:
	Logon Process:		NtLmSsp 
	Authentication Package:	NTLM
	Transited Services:	-
	Package Name (NTLM only):	-
	Key Length:		0

This event is generated when a logon request fails. It is generated on the computer where access was attempted.

The Subject fields indicate the account on the local system which requested the logon. This is most commonly a service such as the Server service, or a local process such as Winlogon.exe or Services.exe.

The Logon Type field indicates the kind of logon that was requested. The most common types are 2 (interactive) and 3 (network).

The Process Information fields indicate which account and process on the system requested the logon.

The Network Information fields indicate where a remote logon request originated. Workstation name is not always available and may be left blank in some cases.

The authentication information fields provide detailed information about this specific logon request.
	- Transited services indicate which intermediate services have participated in this logon request.
	- Package name indicates which sub-protocol was used among the NTLM protocols.
	- Key length indicates the length of the generated session key. This will be 0 if no session key was requested.

A department’s NAS would no longer authenticate users. They couldn’t access any of their files. I didn’t have access to this device so I couldn’t review any logs.

A few of our websites that use ‘Integrated Authentication’ (Kerberos) wouldn’t authenticate users but if we switched the site to ‘Basic Authentication’ it would work fine. When ‘Integrated Authentication’ was enabled the IIS logs would show error “401.1” for every failed login.

Another department couldn’t log in to an MSSQL 2005 database using their AD accounts, but applications that used MSSQL accounts worked fine. On our SQL server we saw the following errors:

Date		11/03/2015 7:55:26 AM
Log		SQL Server (Current - 12/03/2015 9:53:00 AM)

Source		Logon

Message
Login failed for user ''. The user is not associated with a trusted SQL Server connection. [CLIENT: XXX.XXX.XXX.XXX]


Date		11/03/2015 7:55:26 AM
Log		SQL Server (Current - 12/03/2015 9:53:00 AM)

Source		Logon

Message
Error: 18452, Severity: 14, State: 1.




Date		11/03/2015 7:55:26 AM
Log		SQL Server (Current - 12/03/2015 9:53:00 AM)

Source		Logon

Message
SSPI handshake failed with error code 0x8009030c while establishing a connection with integrated security; the connection has been closed. [CLIENT: XXX.XXX.XXX.XXX]

Date		11/03/2015 7:55:26 AM
Log		SQL Server (Current - 12/03/2015 9:53:00 AM)

Source		Logon

Message
Error: 17806, Severity: 20, State: 2.

These errors led us to believe there was a trust issue going on. We checked our DCs and they were all happy and replicating, BUT there was a red herring. We logged into one of our DCs and ran “nltest.exe /server:<DC HOSTNAME> /sc_verify:<DOMAIN FQDN>”.

All of our DCs returned the expected results:

C:\>nltest.exe /server:<DC HOSTNAME> /sc_verify:<DOMAIN FQDN>

Flags: b0 HAS_IP  HAS_TIMESERV
Trusted DC Name \\<FQDN OF A DC>
Trusted DC Connection Status Status = 0 0x0 NERR_Success
Trust Verification Status = 0 0x0 NERR_Success
The command completed successfully

Except one. That one DC returned this:

C:\>nltest.exe /server:<DC HOSTNAME> /sc_verify:<DOMAIN FQDN>

I_NetLogonControl failed: Status = 1355 0x54b ERROR_NO_SUCH_DOMAIN

That led us down the wrong path of thinking one of our DCs had somehow fallen off the domain, even though users were authenticating against it successfully and replication was working properly.

After spending a few hours troubleshooting what we thought was a DC problem, it all came down to this bloody hotfix.

After removing KB3002657 (and, per Microsoft’s suggestion, KB3046049) from our Domain Controllers and rebooting them again, the problem was solved.

After some reflection I don’t think removing KB3046049 was necessary. I’d recommend starting with KB3002657 first.
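To see which Domain Controllers still have either hotfix installed, a sketch like this might help. It assumes Get-HotFix can query each DC remotely and that the hotfix is reported by the underlying WMI class on your OS version; $domainControllers is a placeholder list.

```powershell
# Sketch: report which DCs still have the two hotfixes installed.
# $domainControllers is a placeholder list; needs rights to query each DC remotely.
$domainControllers = @('<DC HOSTNAME>')
$hotfixIds = @('KB3002657', 'KB3046049')

foreach ($dc in $domainControllers) {
    # Get-HotFix raises an error when an ID is not installed, so those errors are suppressed.
    Get-HotFix -Id $hotfixIds -ComputerName $dc -ErrorAction SilentlyContinue |
        Select-Object @{ Name = 'Server'; Expression = { $dc } }, HotFixID, InstalledOn
}
```

A DC that returns nothing either doesn't have the hotfixes or couldn't be queried, so an empty result is worth double-checking by hand.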

As of this morning our users are reporting that everything is back to normal. Didn’t have to reboot my Exchange, SQL or Webservers. Once the DCs had the patch removed and were rebooted everything just magically started working again.

Finally, some credit. Eds’ post and the responses to it helped me narrow down and solve the problem for our organization before Microsoft called me back and confirmed this was the solution. Thank you to Eds and to everyone who responded: http://serverfault.com/questions/674541/has-march-2015-patch-tuesday-broken-2003-shares

Also give this a read: http://www.infoworld.com/article/2895900/security/microsoft-netlogon-patch-kb-3002657-woes-continue-kb-3032359-cisco-anyconnect-fix-confirmed.html

They suggest re-configuring local policies so that Windows LAN Manager Authentication is set to “Send LM & NTLM responses”. According to gpedit.msc on our Domain Controllers the default setting should be “Send LM & NTLM responses”. For some odd reason we had manually changed ours to “Send NTLM responses only”. I can’t comment on whether this would work for all of the above cases I came across, or whether you need to push this setting change out to all of your clients and/or servers.
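That policy is stored in the registry as LmCompatibilityLevel, so one way to check what a machine is actually using is to read the value and translate it. This is a sketch; the helper name Get-LmLevelName is made up, and the labels follow the wording of the “Network security: LAN Manager authentication level” policy as I understand it.

```powershell
# Labels for the LAN Manager authentication level policy, keyed by LmCompatibilityLevel.
$lmLevelNames = @{
    0 = 'Send LM & NTLM responses'
    1 = 'Send LM & NTLM - use NTLMv2 session security if negotiated'
    2 = 'Send NTLM response only'
    3 = 'Send NTLMv2 response only'
    4 = 'Send NTLMv2 response only. Refuse LM'
    5 = 'Send NTLMv2 response only. Refuse LM & NTLM'
}

# Hypothetical helper: translate a numeric level into the policy label.
function Get-LmLevelName {
    param([Parameter(Mandatory = $true)][int]$Level)

    if ($lmLevelNames.ContainsKey($Level)) { return $lmLevelNames[$Level] }
    return "Unknown level $Level"
}

# On a Windows machine, the configured value (if one is set) lives here:
# $lsa = Get-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\Lsa'
# Get-LmLevelName -Level $lsa.LmCompatibilityLevel
Get-LmLevelName -Level 2
```

If the value is missing from the registry, the machine is on its OS default, which differs between Windows versions.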