Backing up a VM with a passthrough PCIe device using Veeam

In a previous post I talked about installing a Quadro P620 into my ESXi host so I could attach it to my Plex VM. This worked out great, except that my Veeam backups started failing.

vSphere has a limitation: you can’t take a snapshot of a VM that has a PCIe device passed through to it, and Veeam’s host-based backups rely on VM snapshots.

One option is to install the Veeam Agent for the guest OS and use it to take guest-based backups. In my opinion this isn’t ideal, though; I would much rather keep host-based backups of the VM. Fortunately, there is an easy solution to this problem:

Shut off the VM before taking the Veeam backup and then power it back on after the backup is complete.

To get this working, you need to install the VMware PowerCLI PowerShell module on your Veeam server. To do this, perform the following steps:

  1. Right-click the PowerShell shortcut and choose ‘Run as Administrator’
  2. Run the following commands:
    Find-Module -Name VMware.PowerCLI
    Install-Module -Name VMware.PowerCLI -Scope AllUsers
    Get-Command -Module *VMWare*
    Set-PowerCLIConfiguration -Scope AllUsers -ParticipateInCeip $false -InvalidCertificateAction Ignore
  3. The Get-Command step should output a long list of VMware PowerShell commands, which means the module installed successfully

Next, make sure your Veeam services are running under a service account with the appropriate permissions in vCenter. This is generally considered a best practice, so chances are you’ve already done it. In my case I’d installed Veeam to run as a local service (I don’t know why), so to fix it I switched the following Windows services to run as my backup operator account, which already had Domain Admin and Backup Operator rights, local Administrator on the Veeam server, and Administrator permissions in vSphere.

The services were:

  • Veeam Backup Enterprise Manager
  • Veeam Backup Service
  • Veeam Broker Service
  • Veeam Cloud Connect Service
  • Veeam Guest Catalog Service
  • Veeam RESTful API Service

I then rebooted my Veeam server.

My vCenter server was already joined to my domain, but I ran into an issue where single sign-on wasn’t working properly. When I connected to vCenter via PowerShell using “Connect-VIServer <VCENTER SERVER FQDN>”, I was prompted for credentials, which shouldn’t happen since the account I’m logged in with is an Administrator in vCenter.

It turned out I needed to add the AD group that grants users administrative access to vCenter’s Global Permissions list:

  1. Login to vCenter as an administrator
  2. Click ‘Menu’ and then ‘Administration’
  3. Click ‘Global Permissions’
  4. Click ‘Add’
  5. Change the domain in the ‘User’ field to your domain, then search for the user or security group (I recommend security groups) and select it. Make sure the role is ‘Administrator’, check ‘Propagate to children’ and click ‘OK’

After doing this I could run “Connect-VIServer <VCENTER SERVER FQDN>” and not be prompted for credentials.

Now that all the prep work is done, we can reconfigure our backup job in Veeam.

First we’re going to need two scripts: one to shut down the VM and one to boot it back up. I’ve saved these scripts on my Veeam server in “C:\Scripts\<VM FQDN>\”.

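The shutdown script is “shutdown.bat”. Be sure to search and replace “VCENTER FQDN” and “VM FQDN” with your values (for “VM FQDN”, use the VM’s name as it appears in vCenter). The script asks the guest OS to shut down and then waits until the VM is no longer powered on: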

C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe -NoLogo -executionpolicy bypass -Command Connect-VIServer -Server "VCENTER FQDN"; Shutdown-VMGuest -VM "VM FQDN" -Server "VCENTER FQDN" -Confirm:0; do{Start-Sleep -Seconds 5; $vm=Get-VM -Name "VM FQDN"}while ($vm.PowerState -eq \"PoweredOn\")

The startup script is “startup.bat”, be sure to search and replace “VCENTER FQDN” and “VM FQDN” with your values:

C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe -NoLogo -executionpolicy bypass -Command Connect-VIServer -Server "VCENTER FQDN"; Start-VM -VM "VM FQDN" -Server "VCENTER FQDN"; Start-Sleep -s 90

Once you’ve created these, fire up the Veeam console and reconfigure the VM’s job:

  1. Launch Veeam
  2. Find the backup job for your VM, right click on it and choose ‘Edit’
  3. Go to ‘Storage’
  4. Click ‘Advanced’
  5. Go to ‘Scripts’
  6. Check ‘Run the following script before the job:’ and select your “shutdown.bat” script
  7. Check ‘Run the following script after the job:’ and select your “startup.bat” script
  8. Click ‘OK’
  9. Click ‘Finish’
  10. Perform a test run of the job; you can monitor the shutdown and startup in vCenter

That’s it. It’s a minor inconvenience, but it works. Hopefully vSphere 7 will allow snapshots of VMs with passthrough devices configured.


Adding a Quadro P620 to my Plex VM

I currently run Plex in a CentOS 7 VM (on top of vSphere 6.7) with 2 vCPUs and 2GB of vRAM.

Whenever I need to transcode video to sync it to a mobile device for a trip, the process takes a while and consumes a lot of CPU on the VM. I could just add more vCPUs to the VM, but I only have so much CPU to go around, and there are more efficient ways to transcode video.

I bought my Dell T340 specifically with a Xeon E-2176G CPU so I could take advantage of its on-board GPU for my transcoding work. After a lot of back and forth with VMware, Dell and Intel, it turned out that Dell didn’t build the T340 in a way that can actually use the CPU’s on-board GPU. Why they offer that CPU as an option I don’t know, but here we are.

My next option was to buy a video card to do the work. After some research I settled on the Quadro P620 (specifically the PNY version) as the most affordable card with the features I wanted, chiefly NVENC. As an added bonus, it supports HEVC (H.265), which should future-proof me for a while and eventually let me use this card to transcode my Blu-rays to H.265, but that’s another post.

The card arrived, and I installed it, enabled it for passthrough in vSphere, attached it to my Plex VM and booted the VM up.

I downloaded the latest NVIDIA driver to my VM and ran the installer (as root):

# chmod a+x NVIDIA-Linux-x86_64-430.50.run
# ./NVIDIA-Linux-x86_64-430.50.run

The installation was straightforward and took care of everything I needed. It automatically blacklisted the default (nouveau) video driver for me and asked me to reboot and re-run the installer, which I did, and almost everything worked.

After the driver was successfully installed, I ran nvidia-smi, the tool that ships with the driver, to verify things and was greeted with:

# nvidia-smi

Unable to determine the device handle for GPU 0000:03:00.0: Unknown Error

Fortunately, this issue is well documented online, and the quick fix was to shut down the VM and tweak its configuration. Since I have vCenter, I used the GUI to make the change instead of downloading the VMX file, editing it and re-uploading it:

  1. Login to vCenter
  2. Right-click the VM and choose ‘Edit Settings’
  3. Go to ‘VM Options’ and expand ‘Advanced’
  4. Click ‘Edit Configuration’
  5. Click ‘Add Configuration Params’
  6. Enter the following without quotes:
    Name: “hypervisor.cpuid.v0”
    Value: “FALSE”
  7. Click ‘OK’
  8. Click ‘OK’
  9. Boot up the VM

Once the VM came back up, nvidia-smi gave me the output I was expecting:

# nvidia-smi

Thu Oct 24 18:36:20 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P620         Off  | 00000000:13:00.0 Off |                  N/A |
| 40%   54C    P0    N/A /  N/A |     10MiB /  2000MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

The last thing to do before testing was to make sure Plex was configured to use hardware transcoding:

  1. Log in to the Plex web UI
  2. Under ‘Settings’ click ‘Transcoder’
  3. Check ‘Use hardware acceleration when available’
  4. Click ‘Save Changes’

I then gave things a quick test by syncing a TV show to my iPhone and re-running nvidia-smi:

# nvidia-smi

Thu Oct 24 18:38:59 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P620         Off  | 00000000:13:00.0 Off |                  N/A |
| 41%   57C    P0    N/A /  N/A |    177MiB /  2000MiB |     20%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     22510      C   /usr/lib/plexmediaserver/Plex Transcoder     167MiB |
+-----------------------------------------------------------------------------+

Bingo, that was it. So how much faster is the Quadro P620 than my Xeon E-2176G? Roughly 4.5x.
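The check above (looking for the Plex Transcoder in nvidia-smi’s process list) can also be scripted instead of read off the table. This is a minimal sketch, assuming a POSIX-ish shell that supports `local`; it uses nvidia-smi’s CSV query mode for compute processes. The function name `plex_on_gpu` and the `NVIDIA_SMI` override variable are hypothetical, the variable existing only so the real command can be swapped for a stub when testing.

```shell
#!/bin/sh
# Minimal sketch: report whether a Plex transcode is currently running on the GPU.
# plex_on_gpu and NVIDIA_SMI are hypothetical names; NVIDIA_SMI lets a stub
# replace the real nvidia-smi command during testing.
plex_on_gpu() {
    local smi apps
    smi="${NVIDIA_SMI:-nvidia-smi}"
    # One CSV line per compute process, e.g.
    # "22510, /usr/lib/plexmediaserver/Plex Transcoder, 167 MiB"
    apps=$("$smi" --query-compute-apps=pid,process_name,used_memory --format=csv,noheader) || return 2
    # Print any matching line(s) and succeed if the transcoder is present
    if printf '%s\n' "$apps" | grep 'Plex Transcoder'; then
        return 0
    fi
    echo "no Plex Transcoder process found on the GPU" >&2
    return 1
}
```

After sourcing the file, something like `plex_on_gpu && echo "transcoding on the P620"` run during a sync gives a quick yes/no; a return code of 2 means nvidia-smi itself failed.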

My Plex transcoding settings are:

  • Transcoder quality: Prefer higher quality encoding
  • Background transcoding x264 preset: Medium
  • Maximum simultaneous video transcode: 4

But wait, you might say: why set “Maximum simultaneous video transcode” to “4” when a Quadro P620 can only do 2?

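This is why. The nvidia-patch project patches the driver to remove NVIDIA’s limit on simultaneous NVENC sessions, and applying it only took a few seconds as root: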

# git clone https://github.com/keylase/nvidia-patch.git
# cd nvidia-patch/
# bash patch.sh

After vCenter 6.5 U1g newly cloned VMs report “HA not enabled”

Our environment is:

  • vCenter 6.5 U1g (8024368) running on a Windows Server 2008 R2 Standard VM (we’re moving to an appliance once we go 6.7)
  • vSphere 5.5 (build 7504623), 8-node cluster

Since upgrading to the latest vCenter, every new VM we clone into our production cluster reports “The virtual machine failed to become vSphere HA Protected and HA may not attempt to restart it after a failure”.

When I contacted VMware Support, they told me this is a known issue with the versions of vCenter and vSphere we’re running, and that the fix is to upgrade to vSphere 6.0 or 6.5.

Since vSphere 5.5 reaches end of life (EOL) in September, no patches are coming to resolve this.

It also turns out the issue is a GUI issue only. The VMs displaying this error are still protected via HA and will recover fine if a host goes down.

A workaround is to disable and then re-enable HA on the cluster, which clears the error.

Importing an OVF exported from vCloud into VMware Workstation fails

We’re backing out of a vCloud provider and trying to drag our VMs back into our local vSphere cluster.

I’ve used ovftool to export our VMs from vCloud as OVF templates. I then import each OVF into VMware Workstation 14 and from there drag and drop the VM into our vSphere cluster. There is likely a way to get ovftool to export in a format that works directly with vSphere, but since this works I’m just going with it.

This process worked fine on all of our VMs until I got to a group of them that had access to an extra network in vCloud. When trying to import these VMs into VMware Workstation I get the following error:

The source contains more than one network. This target supports at most one network.

I cracked open the OVF file in a text editor and found this near the very top:

    <ovf:NetworkSection>
        <ovf:Info>The list of logical networks</ovf:Info>
        <ovf:Network ovf:name="server-net">
            <ovf:Description/>
        </ovf:Network>
        <ovf:Network ovf:name="myorg-it-pa-protected">
            <ovf:Description/>
        </ovf:Network>
    </ovf:NetworkSection>
    <vcloud:NetworkConfigSection ovf:required="false">
        <ovf:Info>The configuration parameters for logical networks</ovf:Info>
        <vcloud:NetworkConfig networkName="server-net">
            <vcloud:Description/>
            <vcloud:Configuration>
                <vcloud:IpScopes>
                    <vcloud:IpScope>
                        <vcloud:IsInherited>true</vcloud:IsInherited>
                        <vcloud:Gateway>10.201.207.254</vcloud:Gateway>
                        <vcloud:Netmask>255.255.248.0</vcloud:Netmask>
                        <vcloud:IsEnabled>true</vcloud:IsEnabled>
                    </vcloud:IpScope>
                </vcloud:IpScopes>
                <vcloud:ParentNetwork href="" name="server-net"/>
                <vcloud:FenceMode>bridged</vcloud:FenceMode>
                <vcloud:RetainNetInfoAcrossDeployments>false</vcloud:RetainNetInfoAcrossDeployments>
            </vcloud:Configuration>
            <vcloud:IsDeployed>false</vcloud:IsDeployed>
        </vcloud:NetworkConfig>
        <vcloud:NetworkConfig networkName="myorg-it-pa-protected">
            <vcloud:Description/>
            <vcloud:Configuration>
                <vcloud:IpScopes>
                    <vcloud:IpScope>
                        <vcloud:IsInherited>true</vcloud:IsInherited>
                        <vcloud:Gateway>10.201.2.254</vcloud:Gateway>
                        <vcloud:Netmask>255.255.255.0</vcloud:Netmask>
                        <vcloud:IsEnabled>true</vcloud:IsEnabled>
                    </vcloud:IpScope>
                </vcloud:IpScopes>
                <vcloud:ParentNetwork href="" name="test-net"/>
                <vcloud:FenceMode>bridged</vcloud:FenceMode>
                <vcloud:RetainNetInfoAcrossDeployments>false</vcloud:RetainNetInfoAcrossDeployments>
            </vcloud:Configuration>
            <vcloud:IsDeployed>false</vcloud:IsDeployed>
        </vcloud:NetworkConfig>
    </vcloud:NetworkConfigSection>

Here you can see two networks: “server-net” on line 3 and “myorg-it-pa-protected” on line 6.

The networking configuration doesn’t really matter to me since it doesn’t match with our vSphere deployment all I want to do is get these VMs imported. I’ll edit their networking afterwards.

I ended up deleting “myorg-it-pa-protected” by removing lines 6-8 and lines 29-45. I then saved and closed the OVF file and ran it through a hashing app to get the file’s SHA256 value.

I then opened the .mf file that sits in the same directory as the OVF file and updated the SHA256 entry for the OVF file. I was then able to import my VMs into VMware Workstation.

On Mac/Linux you can use “sha256sum <filename>” to get the SHA256 value of the edited OVF file. On Windows I use tools like HashTab and HashCalc, or, if you have the Windows Subsystem for Linux installed on Windows 10, you can just use “sha256sum <filename>”.
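The hash-and-update step can also be done in one go on Linux or WSL. This is a minimal sketch, assuming the .mf manifest uses “SHA256(<file>)= <hex>” lines (the format ovftool writes) and that you have already removed the extra network by hand; `update_ovf_manifest` is a hypothetical helper name, not part of ovftool.

```shell
#!/bin/sh
# Minimal sketch: after hand-editing an OVF, recompute its SHA256 and rewrite
# the matching entry in the .mf manifest so the import's integrity check passes.
# Assumes "SHA256(<file>)= <hex>" manifest lines; update_ovf_manifest is hypothetical.
update_ovf_manifest() {
    local ovf mf name digest tmp
    ovf="$1"
    mf="$2"
    name=$(basename "$ovf")
    # sha256sum prints "<hex>  <file>"; keep only the digest
    digest=$(sha256sum "$ovf" | awk '{print $1}')
    [ -n "$digest" ] || return 1
    tmp=$(mktemp) || return 1
    # Rewrite only this OVF's line and leave the VMDK entries untouched;
    # awk exits 1 if the manifest has no entry for this file.
    if awk -v name="$name" -v digest="$digest" '
        index($0, "SHA256(" name ")=") == 1 { print "SHA256(" name ")= " digest; found = 1; next }
        { print }
        END { if (!found) exit 1 }
    ' "$mf" > "$tmp"; then
        cat "$tmp" > "$mf"   # copy back in place so the manifest keeps its permissions
        rm -f "$tmp"
    else
        rm -f "$tmp"
        echo "no SHA256 entry for $name in $mf" >&2
        return 1
    fi
}
```

For example, `update_ovf_manifest myvm.ovf myvm.mf` run in the export directory, followed by the normal import into Workstation.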

Datastores not listed after deploying VMware Replication Appliance

Just did a fresh deployment of the VRM 6.5.1 appliance into vCenter 6.5.1u1 which controls our vSphere 5.5 hosts.

Installation and configuration went smoothly, but when I went to set up a test replication for a VM I couldn’t complete the setup because none of my datastores were listed.

A reboot of vCenter did not help.

Restarting the VRM service via the appliance’s web UI fixed the problem. Rebooting the appliance would probably have worked too.

You can restart the service via: https://<APPLIANCE FQDN>:5480/

  1. Click ‘Configuration’ under the ‘VM’ page
  2. Click ‘Restart’ at the bottom

A pretty straightforward solution, but I didn’t find it in the first few pages of Google results. It might save someone else a bunch of troubleshooting.