Some users cannot log in to new NPS-based VPN server

Our environment previously used a Windows Server 2003 machine running RAS to provide VPN access to our employees. That server was retired for multiple reasons, and we built a brand-new Windows Server 2012 R2 server running NPS and RAS.

Since switching over, a few employees have been unable to log in to the new VPN server. They keep getting "Invalid Username/Password". Strangely, these users had access to a different account that worked from the same personal device, which ruled out client-side issues as the culprit.

Checking the Event Logs on the VPN server we found this event:

Log Name:      System
Source:        RemoteAccess
Date:          8/23/2017 10:03:12 AM
Event ID:      20271
Task Category: None
Level:         Warning
Keywords:      Classic
User:          N/A
Computer:      <SERVER FQDN>
Description:
CoId={NA}: The user <DOMAIN>\<USERNAME> connected from xxx.xxx.xxx.xxx but failed an authentication attempt due to the following reason: The remote connection was denied because the user name and password combination you provided is not recognized, or the selected authentication protocol is not permitted on the remote access server.

We had the user log in to Webmail to verify their username and password. Everything was fine.

That led us to the NPS text-based logs, where we found these entries:

10.x.x.x,<DOMAIN>\<USERNAME>,08/23/2017,09:58:49,RAS,<SERVERNAME>,44,1634,32,<SERVERNAME>,4,10.x.x.x,7,1,5,6,61,5,64,3,65,1,30,104.x.x.x,67,104.x.x.x,31,207.x.x.x,66,207.x.x.x,4108,10.x.x.x,4128,<SERVERNAME>,8132,2,4147,311,4148,MSRASV5.20,4160,MSRASV5.20,4159,MSRAS-0-DESKTOP-DG5CEKG,8158,{552971A4-83C8-4208-A1F7-54834C94498F},4154,Microsoft Routing and Remote Access Service Policy,4155,1,4129,<DOMAIN>\<USERNAME>,25,311 1 104.x.x.x 08/09/2017 14:23:47 1633,4127,4,4130,<DOMAIN FQDN>/User Accounts/<OU>/<OU>/<USERNAME>,4149,ORG VPN Access,8136,1,4136,1,4142,0

10.x.x.x,<DOMAIN>\<USERNAME>,08/23/2017,09:58:49,RAS,<SERVERNAME>,44,1634,25,311 1 104.x.x.x 08/09/2017 14:23:47 1633,8153,0,8111,0,6,2,4130,<DOMAIN FQDN>/User Accounts/<OU>/<OU>/<USERNAME>,4294967206,12,4294967207,2,4294967209,120,4294967210,50,28,3600,7,1,8136,1,4149,ORG VPN Access,4154,Microsoft Routing and Remote Access Service Policy,4155,1,4129,<DOMAIN>\<USERNAME>,4127,4,4120,0x00564955,4136,2,4142,0

10.x.x.x,<DOMAIN>\<USERNAME>,08/23/2017,09:58:49,RAS,<SERVERNAME>,32,<SERVERNAME>,4,10.x.x.x,6,2,7,1,5,6,61,5,64,3,65,1,30,104.x.x.x,67,104.x.x.x,31,207.x.x.x,66,207.x.x.x,25,311 1 104.x.x.x 08/09/2017 14:23:47 1633,44,1634,8,10.x.x.x2,12,1400,28,3600,50,2045,51,1,55,1503507529,45,3,40,1,4108,10.x.x.x,4128,<SERVERNAME>,4147,311,4148,MSRASV5.20,4160,MSRASV5.20,4159,MSRAS-0-DESKTOP-DG5CEKG,8158,{552971A4-83C8-4208-A1F7-54834C94498F},8132,2,4120,0x00564955,4294967206,4,4154,Microsoft Routing and Remote Access Service Policy,4155,1,4136,4,4142,0

10.x.x.x,<DOMAIN>\<USERNAME>,08/23/2017,09:58:51,RAS,<SERVERNAME>,32,<SERVERNAME>,4,10.x.x.x,6,2,7,1,5,6,61,5,64,3,65,1,30,104.x.x.x,67,104.x.x.x,31,207.x.x.x,66,207.x.x.x,25,311 1 104.x.x.x 08/09/2017 14:23:47 1633,44,1634,8,10.x.x.x2,12,1400,28,3600,50,2045,51,1,55,1503507529,45,3,46,0,43,1068,42,1185,48,19,47,26,49,1,40,2,4108,10.x.x.x,4128,<SERVERNAME>,4147,311,4148,MSRASV5.20,4160,MSRASV5.20,4159,MSRAS-0-DESKTOP-DG5CEKG,8158,{552971A4-83C8-4208-A1F7-54834C94498F},8132,2,4120,0x00564955,4294967206,4,4154,Microsoft Routing and Remote Access Service Policy,4155,1,4136,4,4142,0

The tip-off here was "Microsoft Routing and Remote Access Service Policy". That was not the name of our VPN access policy. In fact, that policy is located on a completely separate tab in NPS.
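Spotting a policy name in lines this long is easier if you split them up. Below is a minimal Python sketch that breaks one log line into its six fixed header fields and the attribute-number/value pairs that follow. It assumes the IAS log format, that no value contains a comma (true of the lines above), and, as an assumption consistent with the values shown, that attribute 4149 holds the network policy name and 4154 the connection request policy name. The sample line is a shortened, hypothetical version of the entries above.

```python
# Minimal sketch: split an NPS text log line (IAS format) into header fields
# and attribute-number/value pairs. Assumes no value contains a comma.

HEADER_FIELDS = ["nas_ip", "user", "date", "time", "service", "computer"]


def parse_ias_line(line: str) -> tuple[dict[str, str], dict[str, str]]:
    fields = line.strip().split(",")
    header = dict(zip(HEADER_FIELDS, fields[:6]))
    pairs = fields[6:]
    if len(pairs) % 2:
        raise ValueError("odd number of attribute fields; a value may contain a comma")
    # If an attribute number repeats, the last value wins; fine for a quick look.
    attrs = dict(zip(pairs[0::2], pairs[1::2]))
    return header, attrs


# Shortened, hypothetical sample based on the log entries above.
sample = ("10.x.x.x,<DOMAIN>\\<USERNAME>,08/23/2017,09:58:49,RAS,<SERVERNAME>,"
          "4149,ORG VPN Access,"
          "4154,Microsoft Routing and Remote Access Service Policy,"
          "4136,1")

header, attrs = parse_ias_line(sample)
print(header["user"])  # <DOMAIN>\<USERNAME>
print(attrs["4149"])   # ORG VPN Access
print(attrs["4154"])   # Microsoft Routing and Remote Access Service Policy
```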

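It turns out the issue was an AD account setting: the Network Access Permission option on the Dial-in tab of the user's account.

After some digging, I found out that this AD attribute is called 'msnpallowdialin' and can have the following values:

  • TRUE: Allow access
  • FALSE: Deny access
  • Not set: Control access through NPS Network Policy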

Knowing this I wrote a quick PowerShell script to tell me how many accounts we had configured incorrectly:

$usersTrue = Get-ADUser -SearchBase "OU=<OU>,OU=<OU>,DC=DOMAIN,DC=FQDN" -Filter * -Properties msnpallowdialin -Server <DC FQDN> |Where-Object {$_.msnpallowdialin -eq $TRUE}
$usersFalse = Get-ADUser -SearchBase "OU=<OU>,OU=<OU>,DC=DOMAIN,DC=FQDN" -Filter * -Properties msnpallowdialin -Server <DC FQDN> |Where-Object {$_.msnpallowdialin -eq $FALSE}
$usersUnset = Get-ADUser -SearchBase "OU=<OU>,OU=<OU>,DC=DOMAIN,DC=FQDN" -Filter * -Properties msnpallowdialin -Server <DC FQDN> |Where-Object {$_.msnpallowdialin -eq $NULL}

Write-Output "TRUE = $($usersTrue.Count)"
Write-Output "FALSE = $($usersFalse.Count)"
Write-Output "BLANK = $($usersUnset.Count)"

It turns out we had 142 accounts that were set incorrectly and 1783 that were set correctly. All of the incorrect accounts had been around a LONG time.

To clear this property on all accounts that were set to TRUE or FALSE, we used the following script, which finishes by re-running the counts as a check:

$usersTrue = Get-ADUser -SearchBase "OU=<OU>,OU=<OU>,DC=DOMAIN,DC=FQDN" -Filter * -Properties msnpallowdialin -Server <DC FQDN> |Where-Object {$_.msnpallowdialin -eq $TRUE}

foreach ($user in $usersTrue) {

    Set-ADUser $user -clear msnpallowdialin -Server <DC FQDN>

}

$usersFalse = Get-ADUser -SearchBase "OU=<OU>,OU=<OU>,DC=DOMAIN,DC=FQDN" -Filter * -Properties msnpallowdialin -Server <DC FQDN> |Where-Object {$_.msnpallowdialin -eq $FALSE}

foreach ($user in $usersFalse) {

    Set-ADUser $user -clear msnpallowdialin -Server <DC FQDN>

}

$usersTrue = Get-ADUser -SearchBase "OU=<OU>,OU=<OU>,DC=DOMAIN,DC=FQDN" -Filter * -Properties msnpallowdialin -Server <DC FQDN> |Where-Object {$_.msnpallowdialin -eq $TRUE}
$usersFalse = Get-ADUser -SearchBase "OU=<OU>,OU=<OU>,DC=DOMAIN,DC=FQDN" -Filter * -Properties msnpallowdialin -Server <DC FQDN> |Where-Object {$_.msnpallowdialin -eq $FALSE}
$usersUnset = Get-ADUser -SearchBase "OU=<OU>,OU=<OU>,DC=DOMAIN,DC=FQDN" -Filter * -Properties msnpallowdialin -Server <DC FQDN> |Where-Object {$_.msnpallowdialin -eq $NULL}

Write-Output "TRUE = $($usersTrue.Count)"
Write-Output "FALSE = $($usersFalse.Count)"
Write-Output "BLANK = $($usersUnset.Count)"

I didn't bother making variables of the repeating values; you can just search/replace these scripts. Change "OU=<OU>,OU=<OU>,DC=DOMAIN,DC=FQDN" to the distinguished name of the OU that contains your users, and "<DC FQDN>" to the FQDN of one of your domain controllers.

DFS not working properly over VPN for personal computers

We recently switched to a new VPN server, both because macOS dropped support for PPTP and because we were way overdue to do it anyway. Since then, personal computers have been unable to access network shares via DFS.

Going directly to the file server worked.

Users who connected to the VPN with organization-owned laptops were able to use DFS.

After some digging, it turned out the issue was that our old VPN allowed WINS (yes, yes, I know) and our new VPN has WINS disabled (by design; see… we're trying).

The proper solution to this problem is to reconfigure DFS to use DNS only: https://support.microsoft.com/en-us/help/244380/how-to-configure-dfs-to-use-fully-qualified-domain-names-in-referrals

Unfortunately I didn’t have the time to implement this.

What we ended up doing was reconfiguring the DHCP scope to set the VPN users' DNS suffix to 'vpn.mydomain.com'.

I then added aliases for all of our file servers and DFS servers under ‘vpn.mydomain.com’. Example:

  • fileserver1.vpn.mydomain.com, CNAME, fileserver1.it.mydomain.com
  • dfsserver1.vpn.mydomain.com, CNAME, dfsserver1.it.mydomain.com

This is a crappy, hacky workaround that isn't really sustainable, but it will work for now, until we can sit down and plan moving our DFS over to DNS only.

Domain computers worked fine because we use Group Policy to push out multiple DNS search suffixes. DHCP doesn't let you do this for Windows PCs, so when a personal PC looks up 'fileserver1' it tries WINS (if available) and then appends its single DNS suffix (vpn.mydomain.com). Without a matching record under that suffix, the lookup fails and the user gets a "Network Path Not Found" error; the CNAMEs above provide those matching records.
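To make the suffix logic concrete, here is a deliberately simplified Python model of short-name lookup with a DNS suffix search list. It ignores WINS, NetBIOS broadcasts and DNS devolution. The zone contents and the domain PCs' suffix list are hypothetical stand-ins built from the example hostnames above.

```python
# Simplified model of short-name resolution with a DNS suffix search list.
# Ignores WINS, NetBIOS and DNS devolution; all zone data is hypothetical.


def resolve_short_name(name: str, suffixes: list[str], records: dict[str, str]):
    """Try name.suffix for each suffix in order; return the canonical name or None."""
    for suffix in suffixes:
        candidate = f"{name}.{suffix}"
        if candidate in records:
            return records[candidate]
    return None


# Maps each DNS name to its canonical name (an A record maps to itself).
records_before = {"fileserver1.it.mydomain.com": "fileserver1.it.mydomain.com"}
records_after = dict(records_before)
records_after["fileserver1.vpn.mydomain.com"] = "fileserver1.it.mydomain.com"  # the CNAME

domain_pc_suffixes = ["mydomain.com", "it.mydomain.com"]  # assumed Group Policy list
personal_pc_suffixes = ["vpn.mydomain.com"]                # single suffix from DHCP

print(resolve_short_name("fileserver1", domain_pc_suffixes, records_before))
print(resolve_short_name("fileserver1", personal_pc_suffixes, records_before))
print(resolve_short_name("fileserver1", personal_pc_suffixes, records_after))
```

The domain PC finds the server on its second suffix. The personal PC only succeeds once the CNAME exists under its one suffix, which is why every file server and DFS server needs its own alias.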

Networking randomly dies on a 2012 R2 vSphere VM

Strange issue. Simple solution.

We had a Windows Server 2012 R2 domain controller sitting on vSphere 5.5 (2068190) that would randomly lose its network connection.

When you logged in to the system locally, the network interface appeared to be up, but you could not connect to anything outside of the VM.

If I rebooted the VM, it would work for a few hours or less, and then the network would drop out again.

Digging through the event viewer I came across these:

Log Name:      System
Source:        Microsoft-Windows-Iphlpsvc
Date:          2/15/2016 7:01:51 PM
Event ID:      4202
Task Category: None
Level:         Error
Keywords:      
User:          SYSTEM
Computer:      MYSERVER.MYDOMAIN
Description:
Unable to update the IP address on Isatap interface isatap.{FBE3D830-A8CB-4C9C-809E-25DD9DB086F5}. Update Type: 0. Error Code: 0x57.


Log Name:      System
Source:        Microsoft-Windows-Iphlpsvc
Date:          2/15/2016 4:43:33 PM
Event ID:      4202
Task Category: None
Level:         Error
Keywords:      
User:          SYSTEM
Computer:      MYSERVER.MYDOMAIN
Description:
Unable to update the IP address on Isatap interface isatap.{FBE3D830-A8CB-4C9C-809E-25DD9DB086F5}. Update Type: 1. Error Code: 0x490.
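The error codes in those events appear to be standard Win32 error codes written in hex. This small Python sketch converts them to decimal and names them; only the two codes seen above are mapped. On a Windows machine, `net helpmsg 87` should print the matching message text. These codes say what failed, not why, so treat them as symptoms rather than a diagnosis.

```python
# Decode the hex error codes from the Iphlpsvc 4202 events above.
# Only the two codes seen in those events are mapped.
WIN32_ERRORS = {
    87: "ERROR_INVALID_PARAMETER",  # "The parameter is incorrect."
    1168: "ERROR_NOT_FOUND",        # "Element not found."
}


def describe(code_hex: str) -> str:
    code = int(code_hex, 16)  # int() accepts the 0x prefix when base is 16
    return f"{code_hex} = {code} ({WIN32_ERRORS.get(code, 'unknown')})"


for code_hex in ("0x57", "0x490"):
    print(describe(code_hex))
```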

The VM had an E1000 NIC attached to it. I figured the issue was the VM NIC model and found some support for my theory here: https://community.spiceworks.com/topic/504405-windows-server-2012-r2-guest-os-on-vmware-keeps-losing-gateway-connection

The solution appears to have been removing the E1000 NIC and adding either an E1000E NIC or, in my case, a VMXNET 3 NIC.

Palo Alto firewall displays "Session timed out" when you try to log in

If you are getting this error message, read this article BEFORE you try rebooting your firewall.


I ran into this problem recently with a Palo Alto PA-200 series firewall. When I tried to log in via the WebUI, I got the error "Session timed out". I could SSH into the firewall, and internet traffic was flowing. I'd had this problem before, and a quick reboot of my PA-200 solved it. Not so this time.

This time when I rebooted the firewall, it did not come back up. Well, not fully at least. I could ping it, SSH into it and load the WebUI (still getting the "Session timed out" error when trying to log in), but all network traffic stopped flowing. This was bad.

I did some searching online and found that the issue can occur if you run low on disk space on your Palo Alto.

I logged into my PA-200 via SSH and ran the following:

<USERNAME>@<FIREWALL>> show system disk-space

Filesystem            Size  Used Avail Use% Mounted on
/dev/sda3             1.9G  1.9G  0M   100% /
/dev/sda5             6.6G  1.3G  5.1G  20% /opt/pancfg
/dev/sda6             1.9G  752M  1.1G  42% /opt/panrepo
tmpfs                 1.3G   67M  1.2G   6% /dev/shm
/dev/sda8             2.4G  1.4G  874M  62% /opt/panlogs

You can see the problem. The root (/) volume of the PA-200 was out of disk space.
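If you save that output now and then (for example, by copying it out of an SSH session), a small Python check like the one below can flag a filling volume before it takes the WebUI down. It is a sketch that assumes the df-style column layout shown above, with Use% in the fifth column and the mount point in the sixth. The 90% default threshold is an arbitrary choice.

```python
# Flag filesystems at or above a usage threshold in "show system disk-space"
# output. Assumes the df-style layout shown above (Use% is the fifth column).


def full_filesystems(output: str, threshold: int = 90) -> list[tuple[str, int]]:
    flagged = []
    for line in output.splitlines():
        cols = line.split()
        if len(cols) < 6:
            continue  # blank or malformed line
        pct = cols[4].rstrip("%")
        if not pct.isdigit():
            continue  # header line ("Use%")
        if int(pct) >= threshold:
            flagged.append((cols[5], int(pct)))  # (mount point, percent used)
    return flagged


SAMPLE = """\
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda3             1.9G  1.9G  0M   100% /
/dev/sda5             6.6G  1.3G  5.1G  20% /opt/pancfg
/dev/sda6             1.9G  752M  1.1G  42% /opt/panrepo
tmpfs                 1.3G   67M  1.2G   6% /dev/shm
/dev/sda8             2.4G  1.4G  874M  62% /opt/panlogs
"""

print(full_filesystems(SAMPLE))  # [('/', 100)]
```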

I reviewed two Palo Alto KB articles on freeing up disk space.

Neither of them helped. They didn’t clear up any disk space on the root volume.

To fix the problem, I had to call Palo Alto support. They generated a support key that gives them, and only them, access to the root file system on a Palo Alto device. From there, they cleared out a few logs in /var/log that were eating up all of the disk space. The main culprit was /var/log/secure. If you're familiar with Linux systems, you'll know that this log records successful and failed login attempts. In the Palo Alto's case, it was full of successful and failed SSH logins.

Once Palo Alto cleared out those log files, we gave our PA-200 another reboot, and it came back up as normal.

Then came the part where we wanted to prevent this from happening again. I knew for a fact we had never created a rule that allowed access to the PA-200's SSH service, and there was no way someone internally was hammering the PA-200 trying to break into it. Fortunately, I had a really good support rep on the phone who knew exactly where to look: Management Profiles.

If you log in to your Palo Alto via the WebUI and go to 'Network' > 'Interfaces', you'll see a column labelled 'Management Profile'. In our case, we had a management profile assigned to our public interface that allowed SSH. This is how the internet at large was reaching our PA-200's SSH service. That's not the best part, though. The best part is that traffic allowed via a Management Profile isn't logged, so you can't even tell this is happening by looking at your traffic logs. Awesome, right?

We changed our management profile to only allow ICMP (pings) and called it a day.
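After a change like this, it's worth confirming from the outside that SSH no longer answers on the public interface. A minimal Python check is sketched below. It only tells you whether a TCP connection to the port succeeded from wherever you run it, so run it from a host outside your network.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False
```

For example, `tcp_port_open("203.0.113.10", 22)` should return False once the management profile only allows ICMP. 203.0.113.10 is a documentation placeholder; substitute your firewall's public IP.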

I've been told Palo Alto is aware of this issue, but it only really affects the PA-200 since it has the smallest hard drive. Palo Alto isn't making it a priority to fix, even though something as simple as logrotate, or even truncating the log once 50 MB has been written to it, would do the job.

If you have this exact problem, I really hope you have an active Palo Alto support contract. If you don't, you're screwed: Palo Alto is the only one who can access the root file system.

I'm hoping they will eventually fix this problem in a future PAN-OS release.

Access Supermicro BMC via SSH tunnels

I’ve got a server at home with a Supermicro motherboard that has a BMC in it. The BMC allows me to access a web interface on a dedicated network interface on the motherboard which will let me control the server in the event the OS has frozen or the hardware has powered down. This is extremely useful if something goes wrong at home while I’m out of town and need to power cycle my server remotely.

The problem is that my VPN server is a VM hosted on the server in question, so if the server is down I can't VPN home and access the BMC. I could forward the ports through my firewall, but there is a more secure way of doing things, so why not use it?

I got my hands on a Raspberry Pi 2 with the intent of setting it up so I could reach it via SSH if my main server at home was offline. From the RPi2, I could then load up a browser and access the BMC interface on my server. One big problem, though: while I could access the web interface and power cycle the server, I could not access the Java-based KVM that comes with the BMC. The KVM lets you use the server as if you were physically in front of it with a keyboard and mouse, AND connect media to the server remotely, such as an ISO for diagnostic software if needed. Unfortunately, no matter how much I tried, I could not get the Java WebApp to work on my RPi2.

Instead, I opted to use SSH tunnels to connect to the BMC via the RPi2. Again, this worked great for the WebUI but failed for the Java KVM. I did find a workaround, though, and it's pretty simple. When you're on the Supermicro BMC page and start the Java KVM, you get a download for a "launch.jnlp" file. Save that to your local computer.

Open the launch.jnlp file in your favourite editor and you’ll see something like this:

<jnlp spec="1.0+" codebase="http://mybmc.mydomain.ca:80/">
  <information>
    <title>ATEN Java iKVM Viewer</title>
    <vendor>ATEN</vendor>
    <description>Java Web Start Application</description>
  </information>

  <security>
   <all-permissions/>
  </security>

  <resources>
    <property name="jnlp.packEnabled" value="true"/>
    <property name="jnlp.versionEnabled" value="true"/>
    <j2se version="1.6.0+" java-vm-args="-Xmx128M -Xms128M -Xss1M -XX:PermSize=32M -XX:MaxPermSize=32M"/>
    <jar href="iKVM__V1.69.21.0x0.jar" download="eager" main="true"/>
  </resources>

  <resources os="Windows" arch="x86">
    <nativelib href="libwin_x86__V1.0.5.jar" download="eager"/>
  </resources>
  <resources os="Windows" arch="x86_64">
    <nativelib href="libwin_x86_64__V1.0.5.jar" download="eager"/>
  </resources>
  <resources os="Windows" arch="amd64">
    <nativelib href="libwin_x86_64__V1.0.5.jar" download="eager"/>
  </resources>

  <resources os="Linux" arch="i386">
    <nativelib href="liblinux_x86__V1.0.5.jar" download="eager"/>
  </resources>
  <resources os="Linux" arch="x86">
    <nativelib href="liblinux_x86__V1.0.5.jar" download="eager"/>
  </resources>
  <resources os="Linux" arch="x86_64">
    <nativelib href="liblinux_x86_64__V1.0.5.jar" download="eager"/>
  </resources>
  <resources os="Linux" arch="amd64">
    <nativelib href="liblinux_x86_64__V1.0.5.jar" download="eager"/>
  </resources>

  <resources os="Mac OS X" arch="x86_64">
    <nativelib href="libmac_x86_64__V1.0.5.jar" download="eager"/>
  </resources>

  <resources os="SunOS" arch="sparc">
    <nativelib href="libsun_SPARC__V1.0.5.jar" download="eager"/>
  </resources>

  <application-desc main-class="tw.com.aten.ikvm.KVMMain">
    <argument>mybmc.mydomain.ca</argument>
    <argument>xxxxxxxxxxxxxxxx</argument>
    <argument>xxxxxxxxxxxxxxxx</argument>
    <argument>mybmc.mydomain.ca</argument>
    <argument>5900</argument>
    <argument>623</argument>
    <argument>2</argument>
    <argument>0</argument>
  </application-desc>
</jnlp>

You want to edit a few lines:

Line 1
From: <jnlp spec="1.0+" codebase="http://mybmc.mydomain.ca:80/">
To: <jnlp spec="1.0+" codebase="http://localhost:5901/">

Line 51
From: <argument>mybmc.mydomain.ca</argument>
To: <argument>localhost</argument>

Line 54
From: <argument>mybmc.mydomain.ca</argument>
To: <argument>localhost</argument>

And you’re done. Save and close the file.
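Since you have to repeat those edits every time you download a fresh launch.jnlp, they can also be scripted. Below is a Python sketch using the standard library's ElementTree. It sets the codebase to a local tunnel port and replaces every `<argument>` equal to the BMC hostname with localhost. The function name is hypothetical, the 5901 default matches the port used here, and ElementTree drops any XML declaration or comments when it writes the file back out. If Java Web Start objects to the result, fall back to editing by hand.

```python
import xml.etree.ElementTree as ET


def rewrite_jnlp(jnlp_text: str, bmc_host: str, web_port: int = 5901) -> str:
    """Point a launch.jnlp at SSH tunnels on localhost (sketch)."""
    root = ET.fromstring(jnlp_text)
    # Line 1 edit: the codebase now goes through the local web tunnel.
    root.set("codebase", f"http://localhost:{web_port}/")
    # Line 51 and 54 edits: any <argument> naming the BMC becomes localhost.
    for arg in root.iter("argument"):
        if arg.text and arg.text.strip() == bmc_host:
            arg.text = "localhost"
    return ET.tostring(root, encoding="unicode")
```

To use it, read launch.jnlp into a string, call `rewrite_jnlp(text, "mybmc.mydomain.ca")` with your BMC's hostname, and write the result back to the file.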

You'll notice I changed the web port on Line 1 from 80 to 5901, because I know 5901 isn't in use on my local system. Now all I had to do was set up my SSH tunnels so that local port 5901 forwarded to the BMC's port 80, and local ports 5900 and 623 forwarded to the same ports on the BMC, all via my RPi2.
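With OpenSSH, those forwards map onto `-L local_port:host:remote_port` options, and `-N` keeps the session to port forwarding only. The Python sketch below builds such a command. The jump host `pi@rpi2.example.net` is a hypothetical placeholder for your RPi2, and the port pairs are the ones described above. Keep in mind that `-L` forwards TCP only, and on Linux binding a local port below 1024, such as 623, normally requires root.

```python
import shlex


def tunnel_command(jump_host: str, bmc_host: str) -> list[str]:
    """Build an ssh command forwarding the BMC ports through a jump host."""
    forwards = [(5901, 80), (5900, 5900), (623, 623)]  # (local port, BMC port)
    cmd = ["ssh", "-N"]  # -N: forward ports only, run no remote command
    for local_port, remote_port in forwards:
        cmd += ["-L", f"{local_port}:{bmc_host}:{remote_port}"]
    cmd.append(jump_host)
    return cmd


print(shlex.join(tunnel_command("pi@rpi2.example.net", "mybmc.mydomain.ca")))
```

You can paste the printed command into a terminal, or pass the list to `subprocess.run` to open the tunnels, then launch the edited launch.jnlp.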

There is one catch with this method. Every time you go back into the BMC web interface and click 'Launch Console', the BMC appears to generate a new set of security keys. All this means is that if you access your BMC the normal way and then want to use tunnels again, you'll have to get a new .jnlp file and re-apply the edits above.