Event ID 20292 from DHCP-Server

Checking over our DHCP server we were seeing quite a few of these errors appearing in the ‘Microsoft-Windows-DHCP Server Events/Admin’ event log:

Researching this error I came across this forum post: https://social.technet.microsoft.com/Forums/en-US/15d00412-3dfc-4520-a74e-1f32fe1329ef/windows-server-2012-dhcp-event-id-20291?forum=winserveripamdhcpdns

Which lead me to this KB article: https://support.microsoft.com/en-ca/help/2955135/event-id-20291-is-logged-in-the-system-log-when-a-client-computer-is-m

The hotfix that Microsoft mentions is from November 2014 and has been installed on our server for a very long time. We never noticed this error back in 2014 when the hotfix was installed so we were not able to “first remove the failover relationship, install the update to both DHCP nodes and restart them, and then reestablish the failover relationship” per Microsoft’s article.

The article leads me to believe you have to deconfigure failover on all subnets, destroy the failover relationship, re-create the failover relationship and then re-configure failover on each subnet.

Turns out you can just right click ‘Deconfigure failover’ and then right click ‘Configure failover’ on the specific subnets having the issue and re-use the existing failover relationship to resolve this issue assuming you’ve installed the November 2014 hotfix.

Microsoft RAS VPN and VXLAN not quite working

I’m not overly knowledgeable about advanced networking but I figured I’d share this since I couldn’t find anything online about it at the time.

We run a Microsoft Remote Access Server (RAS) for our VPN server. We provide L2TP primarily for users.

Due to a limitation in the Windows VPN client our RAS server has two network interfaces, one directly on the internet with a public IP (VLAN1) and one internally with a private IP (VLAN2).

The private IP relays VPN users DHCP/DNS requests to our internal DHCP and DNS servers.

RAS handles the authentication instead of RADIUS and we have our internal routes published via RIP to the RAS server so they can be provided to VPN clients when they connect.

I believe this is a fairly common design.

On the network end, our original design involved spanning VLAN1 and 2 all the way from our edge into our data center so the VM could pretty much sit directly on them. This worked fine.

As part of a major network redesign we performed we changed the VLAN spanning design over to using a VXLAN from our edge into our data center.

After making this change we ran into the strangest VPN issues. Users could connect and ping anything they wanted, do DNS lookups and browse most HTTP websites. HTTPS websites would partially load or fail to load and network share (SMB) access would partially work (you could get to the DFS root but not down to an actual file server).

After many hours of troubleshooting we determined our problem.

The MTU of most devices is configured to default to 1500 bytes. When we started tunneling the traffic through a VXLAN the tunneling added 52 bytes to the packet size making the total packet size 1552 bytes which is just over what most network cards are expecting. This caused large packets to drop (loading a HTTPS website, connecting to a share) but small packets (pings, some HTTP websites) to work fine.

I believe the final solution from our network team was to enable Jumbo packets from end to end of the VXLAN tunnel so it could transmit slightly larger than normal packets.

If you have any specific questions I can relay them to our Network Team and try to get you an answer. No promises :)

DHCP stops serving IPs when audit log is full

We run two DHCP servers in a HA configuration. The HA is configured to split the scopes in half. Depending on how high up the scope your IP is will determine which DHCP server you get your IP from. We have DHCP audit logging enabled.

DHCP1 handles 0-127 and DHCP2 handles 128-254 (we mostly use /24’s right now).

We started getting reports of random devices on the network not being able to connect or login to the domain. By the time a technician got to the PC to check it the issue was resolved magically.

We dug into the DHCP servers and found the DHCP audit log on DHCP1 was full (36MB in size). The log on DHCP2 was not full (yet, only 34MB in size).

Stopping DHCP on DHCP1, renaming the audit log and then starting DHCP on DHCP1 again appeared to resolve the issue.

The thing that had us scratching our heads is we’ve had this problem before and we had re-configured DHCP on these servers to allow the log files to grow to 250MB but things had stopped at 36MB.

We used this PowerShell to make the change a long while ago and restarted the DHCP service: https://docs.microsoft.com/en-us/powershell/module/dhcpserver/set-dhcpserverauditlog?view=win10-ps

Per the above link it states “-MaxMBFileSize Specifies the maximum size of the audit log, in megabytes (MB).”

It turns out this PowerShell command simply changes the registry value for HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\DHCPServer\Parameters\DhcpLogFilesMaxSize which you can just do manually if you’d prefer.

I have no idea how I found it but after some digging I found this article for Server 2008 (we’re using 2012R2): https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2008-R2-and-2008/cc726869(v=ws.10)

It states:

 

Dynamic Host Configuration Protocol (DHCP) servers include several logging features and server parameters that provide enhanced auditing capabilities. You can specify the following features:

  • The file path in which the DHCP server stores audit log files. DHCP audit logs are located by default at %windir%\System32\Dhcp.
  • A maximum size restriction (in megabytes) for the total amount of disk space available for all audit log files created and stored by the DHCP service.
  • An interval for disk checking that is used to determine how many times the DHCP server writes audit log events to the log file before checking for available disk space on the server.
  • A minimum size requirement (in megabytes) for server disk space that is used during disk checking to determine if sufficient space exists for the server to continue audit logging.

 

I’ve bolded and italicized the relevant line. The article also specifically references the registry key the PowerShell command changes.

This leads me to believe the PowerShell documentation is incorrect and “-MaxMBFileSize” specifies the maximum size of all audit logs added together. Not a maximum size per individual audit log.

I checked the directory size of “%windir%\system32\dhcp” on both servers and they were very close to 250MB.

We’ve since made the following change:

I will update this article if this does not resolve the issue for us.

 

Update 2019-01-10: I can confirm this resolved the issue for us. The log file for the following day reached 54MB with no issue.