Microsoft RAS VPN and VXLAN not quite working

I’m not overly knowledgeable about advanced networking but I figured I’d share this since I couldn’t find anything online about it at the time.

We run a Microsoft Remote Access Server (RAS) as our VPN server, primarily providing L2TP for users.

Due to a limitation in the Windows VPN client, our RAS server has two network interfaces: one directly on the internet with a public IP (VLAN1) and one internal with a private IP (VLAN2).

The private interface relays VPN users’ DHCP/DNS requests to our internal DHCP and DNS servers.

RAS handles the authentication instead of RADIUS and we have our internal routes published via RIP to the RAS server so they can be provided to VPN clients when they connect.

I believe this is a fairly common design.

On the network end, our original design involved spanning VLAN1 and 2 all the way from our edge into our data center so the VM could pretty much sit directly on them. This worked fine.

As part of a major network redesign, we replaced the spanned VLANs with a VXLAN running from our edge into our data center.

After making this change we ran into the strangest VPN issues. Users could connect and ping anything they wanted, do DNS lookups and browse most HTTP websites. HTTPS websites would partially load or fail to load and network share (SMB) access would partially work (you could get to the DFS root but not down to an actual file server).

After many hours of troubleshooting we determined our problem.

Most devices default to an MTU of 1500 bytes. Once we started tunneling the traffic through a VXLAN, the encapsulation added 52 bytes to each packet, so a full-size packet became 1552 bytes, which is just over what most network cards are expecting. Large packets were dropped (loading an HTTPS website, connecting to a share) while small packets (pings, DNS lookups, some HTTP websites) got through fine.
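
To put numbers on that, here is a minimal PowerShell sketch of the arithmetic. The function name and parameters are made up for illustration; the 52-byte overhead is simply the figure from this post, and 9000 is only a typical jumbo frame size, not necessarily what our network team configured.

```powershell
# Hypothetical helper: does a packet still fit once the tunnel overhead is added?
function Test-FitsInTunnel {
    param(
        [int]$PacketSize,          # size of the original (inner) packet in bytes
        [int]$Overhead = 52,       # encapsulation overhead; 52 is the figure from this post
        [int]$UnderlayMtu = 1500   # MTU of the network carrying the tunnel
    )
    ($PacketSize + $Overhead) -le $UnderlayMtu
}

Test-FitsInTunnel -PacketSize 1500                      # False: 1552 > 1500
Test-FitsInTunnel -PacketSize 60                        # True: a default Windows ping is a 60-byte packet
Test-FitsInTunnel -PacketSize 1500 -UnderlayMtu 9000    # True: assumed jumbo frame MTU
```

To check a real path, "ping -f -l 1472 <host>" on Windows sends a full 1500-byte packet (1472 bytes of data plus 28 bytes of IP and ICMP headers) with the Don't Fragment flag set.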

I believe the final solution from our network team was to enable jumbo frames from end to end of the VXLAN tunnel so it could carry the slightly larger than normal packets.

If you have any specific questions I can relay them to our Network Team and try to get you an answer. No promises :)

Some users cannot log in to new NPS-based VPN server

Our environment previously used a Windows 2003 Server running RAS to offer our employees VPN. That server went away for multiple reasons, and we built a brand new 2012 R2 server running NPS and RAS.

Since switching over we’ve had a few employees unable to log in to the new VPN server. They keep getting “Invalid Username/Password”. Strangely, these users had access to a different account that would work from their personal device, which eliminated client-side issues as the culprit.

Checking the Event Logs on the VPN server we found this event:

Log Name:      System
Source:        RemoteAccess
Date:          8/23/2017 10:03:12 AM
Event ID:      20271
Task Category: None
Level:         Warning
Keywords:      Classic
User:          N/A
Computer:      <SERVER FQDN>
Description:
CoId={NA}: The user <DOMAIN>\<USERNAME> connected from xxx.xxx.xxx.xxx but failed an authentication attempt due to the following reason: The remote connection was denied because the user name and password combination you provided is not recognized, or the selected authentication protocol is not permitted on the remote access server.

We had the user log in to Webmail to verify their username and password. Everything was fine.

That led us into the text-based logs. We found these:

10.x.x.x,<DOMAIN>\<USERNAME>,08/23/2017,09:58:49,RAS,<SERVERNAME>,44,1634,32,<SERVERNAME>,4,10.x.x.x,7,1,5,6,61,5,64,3,65,1,30,104.x.x.x,67,104.x.x.x,31,207.x.x.x,66,207.x.x.x,4108,10.x.x.x,4128,<SERVERNAME>,8132,2,4147,311,4148,MSRASV5.20,4160,MSRASV5.20,4159,MSRAS-0-DESKTOP-DG5CEKG,8158,{552971A4-83C8-4208-A1F7-54834C94498F},4154,Microsoft Routing and Remote Access Service Policy,4155,1,4129,<DOMAIN>\<USERNAME>,25,311 1 104.x.x.x 08/09/2017 14:23:47 1633,4127,4,4130,<DOMAIN FQDN>/User Accounts/<OU>/<OU>/<USERNAME>,4149,ORG VPN Access,8136,1,4136,1,4142,0

10.x.x.x,<DOMAIN>\<USERNAME>,08/23/2017,09:58:49,RAS,<SERVERNAME>,44,1634,25,311 1 104.x.x.x 08/09/2017 14:23:47 1633,8153,0,8111,0,6,2,4130,<DOMAIN FQDN>/User Accounts/<OU>/<OU>/<USERNAME>,4294967206,12,4294967207,2,4294967209,120,4294967210,50,28,3600,7,1,8136,1,4149,ORG VPN Access,4154,Microsoft Routing and Remote Access Service Policy,4155,1,4129,<DOMAIN>\<USERNAME>,4127,4,4120,0x00564955,4136,2,4142,0

10.x.x.x,<DOMAIN>\<USERNAME>,08/23/2017,09:58:49,RAS,<SERVERNAME>,32,<SERVERNAME>,4,10.x.x.x,6,2,7,1,5,6,61,5,64,3,65,1,30,104.x.x.x,67,104.x.x.x,31,207.x.x.x,66,207.x.x.x,25,311 1 104.x.x.x 08/09/2017 14:23:47 1633,44,1634,8,10.x.x.x2,12,1400,28,3600,50,2045,51,1,55,1503507529,45,3,40,1,4108,10.x.x.x,4128,<SERVERNAME>,4147,311,4148,MSRASV5.20,4160,MSRASV5.20,4159,MSRAS-0-DESKTOP-DG5CEKG,8158,{552971A4-83C8-4208-A1F7-54834C94498F},8132,2,4120,0x00564955,4294967206,4,4154,Microsoft Routing and Remote Access Service Policy,4155,1,4136,4,4142,0

10.x.x.x,<DOMAIN>\<USERNAME>,08/23/2017,09:58:51,RAS,<SERVERNAME>,32,<SERVERNAME>,4,10.x.x.x,6,2,7,1,5,6,61,5,64,3,65,1,30,104.x.x.x,67,104.x.x.x,31,207.x.x.x,66,207.x.x.x,25,311 1 104.x.x.x 08/09/2017 14:23:47 1633,44,1634,8,10.x.x.x2,12,1400,28,3600,50,2045,51,1,55,1503507529,45,3,46,0,43,1068,42,1185,48,19,47,26,49,1,40,2,4108,10.x.x.x,4128,<SERVERNAME>,4147,311,4148,MSRASV5.20,4160,MSRASV5.20,4159,MSRAS-0-DESKTOP-DG5CEKG,8158,{552971A4-83C8-4208-A1F7-54834C94498F},8132,2,4120,0x00564955,4294967206,4,4154,Microsoft Routing and Remote Access Service Policy,4155,1,4136,4,4142,0

The tip-off here was “Microsoft Routing and Remote Access Service Policy”. That was not the name of our VPN access policy. In fact that policy is located on a completely separate tab in NPS.
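
These lines are hard to read by eye. As far as I can tell they are in the IAS log format: six fixed fields (NAS IP, user name, date, time, service, computer name) followed by attribute ID/value pairs, where 4154 appears to hold the connection request policy name and 4149 the network policy name. Under those assumptions, a rough sketch for pulling attributes out of a line could look like this. The function name is made up, the sample line is an abbreviated version of the first entry above, and the naive comma split assumes no value contains a comma.

```powershell
# Hypothetical helper: split one IAS-format log line into an attribute table.
function ConvertFrom-IasLogLine {
    param([string]$Line)

    # Assumption: values never contain commas, so a plain split is enough.
    $fields = $Line -split ','
    $attributes = @{}

    # Fields 0-5 are the fixed header; everything after is ID,value pairs.
    for ($i = 6; ($i + 1) -lt $fields.Count; $i += 2) {
        $attributes[$fields[$i]] = $fields[$i + 1]
    }
    $attributes
}

# Abbreviated sample based on the first log line above
$sample = '10.x.x.x,<DOMAIN>\<USERNAME>,08/23/2017,09:58:49,RAS,<SERVERNAME>,44,1634,4154,Microsoft Routing and Remote Access Service Policy,4149,ORG VPN Access,4136,1,4142,0'

$record = ConvertFrom-IasLogLine -Line $sample
$record['4154']   # Microsoft Routing and Remote Access Service Policy
$record['4149']   # ORG VPN Access
```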

Turns out the issue was an AD account setting: the Network Access Permission option on the Dial-in tab of the user account.

After some digging I found out that this AD attribute is called ‘msnpallowdialin’ and can have the following values:

  • TRUE: Allow access
  • FALSE: Deny access
  • <not set>: Control access through NPS Network Policy

Knowing this I wrote a quick PowerShell script to tell me how many accounts we had configured incorrectly:

$usersTrue = Get-ADUser -SearchBase "OU=<OU>,OU=<OU>,DC=DOMAIN,DC=FQDN" -Filter * -Properties msnpallowdialin -Server <DC FQDN> |Where-Object {$_.msnpallowdialin -eq $TRUE}
$usersFalse = Get-ADUser -SearchBase "OU=<OU>,OU=<OU>,DC=DOMAIN,DC=FQDN" -Filter * -Properties msnpallowdialin -Server <DC FQDN> |Where-Object {$_.msnpallowdialin -eq $FALSE}
$usersUnset = Get-ADUser -SearchBase "OU=<OU>,OU=<OU>,DC=DOMAIN,DC=FQDN" -Filter * -Properties msnpallowdialin -Server <DC FQDN> |Where-Object {$_.msnpallowdialin -eq $NULL}

Write-Output "TRUE = $($usersTrue.Count)"
Write-Output "FALSE = $($usersFalse.Count)"
Write-Output "BLANK = $($usersUnset.Count)"

Turns out we had 142 accounts that were incorrect and 1783 accounts that were correct. All of the accounts that were incorrect have been around a LONG time.

To change this property on all accounts that were set to TRUE or FALSE we used the following script:

# Clear the attribute on every account explicitly set to TRUE
$usersTrue = Get-ADUser -SearchBase "OU=<OU>,OU=<OU>,DC=DOMAIN,DC=FQDN" -Filter * -Properties msnpallowdialin -Server <DC FQDN> | Where-Object {$_.msnpallowdialin -eq $TRUE}

foreach ($user in $usersTrue) {
    Set-ADUser $user -Clear msnpallowdialin -Server <DC FQDN>
}

# Clear the attribute on every account explicitly set to FALSE
$usersFalse = Get-ADUser -SearchBase "OU=<OU>,OU=<OU>,DC=DOMAIN,DC=FQDN" -Filter * -Properties msnpallowdialin -Server <DC FQDN> | Where-Object {$_.msnpallowdialin -eq $FALSE}

foreach ($user in $usersFalse) {
    Set-ADUser $user -Clear msnpallowdialin -Server <DC FQDN>
}

Re-running the count afterwards should show TRUE and FALSE at 0:

$usersTrue = Get-ADUser -SearchBase "OU=<OU>,OU=<OU>,DC=DOMAIN,DC=FQDN" -Filter * -Properties msnpallowdialin -Server <DC FQDN> |Where-Object {$_.msnpallowdialin -eq $TRUE}
$usersFalse = Get-ADUser -SearchBase "OU=<OU>,OU=<OU>,DC=DOMAIN,DC=FQDN" -Filter * -Properties msnpallowdialin -Server <DC FQDN> |Where-Object {$_.msnpallowdialin -eq $FALSE}
$usersUnset = Get-ADUser -SearchBase "OU=<OU>,OU=<OU>,DC=DOMAIN,DC=FQDN" -Filter * -Properties msnpallowdialin -Server <DC FQDN> |Where-Object {$_.msnpallowdialin -eq $NULL}

Write-Output "TRUE = $($usersTrue.Count)"
Write-Output "FALSE = $($usersFalse.Count)"
Write-Output "BLANK = $($usersUnset.Count)"

I didn’t bother making variables of the repeating values. You can just search/replace these scripts. You need to change “OU=<OU>,OU=<OU>,DC=DOMAIN,DC=FQDN” to be the OU of where your users are and “<DC FQDN>” to the FQDN of one of your Domain Controllers.
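
If you would rather not search/replace, a variant with variables and a single query might look like the sketch below. The helper name and the stand-in user objects are made up so the example runs without a domain; the commented-out lines show where the real Get-ADUser call would go.

```powershell
# Hypothetical helper: turn an msnpallowdialin value into the label used above.
function Get-DialInState {
    param($Value)
    if ($null -eq $Value) { 'BLANK' }
    elseif ($Value) { 'TRUE' }
    else { 'FALSE' }
}

# With a domain available, $users would come from one query, for example:
#   $searchBase = "OU=<OU>,OU=<OU>,DC=DOMAIN,DC=FQDN"
#   $server     = "<DC FQDN>"
#   $users = Get-ADUser -SearchBase $searchBase -Filter * -Properties msnpallowdialin -Server $server
# Made-up stand-in objects so the sketch runs anywhere:
$users = @(
    [pscustomobject]@{ SamAccountName = 'user1'; msnpallowdialin = $true }
    [pscustomobject]@{ SamAccountName = 'user2'; msnpallowdialin = $false }
    [pscustomobject]@{ SamAccountName = 'user3'; msnpallowdialin = $null }
    [pscustomobject]@{ SamAccountName = 'user4'; msnpallowdialin = $null }
)

# One pass over the results instead of three separate queries.
$counts = @{ 'TRUE' = 0; 'FALSE' = 0; 'BLANK' = 0 }
foreach ($user in $users) {
    $counts[(Get-DialInState $user.msnpallowdialin)]++
}

Write-Output "TRUE = $($counts['TRUE'])"
Write-Output "FALSE = $($counts['FALSE'])"
Write-Output "BLANK = $($counts['BLANK'])"
```

Querying once also means all three counts come from the same snapshot of the directory.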

DFS not working properly over VPN for personal computers

We recently switched to a new VPN server after macOS dropped support for PPTP, and because we were way overdue to do it anyway. Since then, personal computers have been unable to access network shares via DFS.

They could go directly to the file server and that would work.

Users who connected to the VPN with organization-owned laptops were able to use DFS.

After some digging it turned out the issue was that our old VPN allowed WINS (yes yes I know) and our new VPN has WINS disabled (by design, see… we’re trying).

The proper solution to this problem is to re-configure DFS to use DNS only: https://support.microsoft.com/en-us/help/244380/how-to-configure-dfs-to-use-fully-qualified-domain-names-in-referrals

Unfortunately I didn’t have the time to implement this.

What we ended up doing was re-configuring the DHCP scope to set the VPN users’ DNS suffix to ‘vpn.mydomain.com’.

I then added aliases for all of our file servers and DFS servers under ‘vpn.mydomain.com’. Example:

  • fileserver1.vpn.mydomain.com, CNAME, fileserver1.it.mydomain.com
  • dfsserver1.vpn.mydomain.com, CNAME, dfsserver1.it.mydomain.com

This is a crappy, hacky workaround that isn’t really sustainable, but it will work for now until we can sit down and plan changing our DFS over to use DNS only.

Domain computers worked fine because we use Group Policy to push out multiple DNS search suffixes. DHCP doesn’t allow you to do this with Windows PCs, so when a personal computer tried to look up ‘fileserver1’ it would try WINS (if implemented), then append its single DNS suffix (vpn.mydomain.com), and then fail to find the file server, resulting in a “Network Path Not Found” error.
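
As a rough illustration of why the CNAME aliases help, the sketch below lists the fully qualified names a client would try for a short name given its suffix list. The function name is made up, and the domain computer suffix list is an assumption based on the example names above, not our actual Group Policy setting.

```powershell
# Hypothetical helper: list the FQDNs a client would try for a short name.
function Get-DnsCandidateName {
    param(
        [string]$ShortName,
        [string[]]$SearchSuffix
    )
    foreach ($suffix in $SearchSuffix) {
        '{0}.{1}' -f $ShortName, $suffix
    }
}

# Personal computer on the VPN: only the single suffix handed out by DHCP,
# so the lookup only succeeds if an alias exists under vpn.mydomain.com.
Get-DnsCandidateName -ShortName 'fileserver1' -SearchSuffix 'vpn.mydomain.com'

# Domain computer: Group Policy pushes several suffixes (assumed list).
Get-DnsCandidateName -ShortName 'fileserver1' -SearchSuffix 'it.mydomain.com', 'mydomain.com'
```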