NetApp SnapManager for Exchange failing with VSS_E_WRITERERROR_RETRYABLE

We ran into a problem with Exchange 2010, a FAS2240-4 and SnapManager for Exchange 7.1: backups that used to fail randomly every now and then started failing consistently.

Our DFM server would send us an e-mail when the failure occurred that looked like this:

CLIENT APP ERROR Backup: SME Version 7.1: (111) on dora02 at Sun Sep 04 22:09:31 PDT 2016

The backup failure would also knock the databases offline and require us to re-sync them the next day.

Digging into the SME logs we found the following:

[22:10:42.635]  *****BACKUP DETAIL SUMMARY*****

 [22:10:42.635]  Backup group set #1: 

 [22:10:42.635]  Backup SG/DB [FK] Error: SnapManager detected the following Exchange writer error. Please retry SnapManager operation.
 VSS_E_WRITERERROR_RETRYABLE: The writer failed due to an error that might not occur if another snapshot copy is created.


 [22:10:42.635]  Backup SG/DB [LQ] Error: SnapManager detected the following Exchange writer error. Please retry SnapManager operation.
 VSS_E_WRITERERROR_RETRYABLE: The writer failed due to an error that might not occur if another snapshot copy is created.


 [22:10:42.635]  Backup SG/DB [RZ] Error: SnapManager detected the following Exchange writer error. Please retry SnapManager operation.
 VSS_E_WRITERERROR_RETRYABLE: The writer failed due to an error that might not occur if another snapshot copy is created.


 [22:10:42.635]  Backup SG/DB [AE] Error: SnapManager detected the following Exchange writer error. Please retry SnapManager operation.
 VSS_E_WRITERERROR_RETRYABLE: The writer failed due to an error that might not occur if another snapshot copy is created.


 [22:10:42.635]  Backup SG/DB [Public Folders (<SERVER>)] Error: SnapManager detected the following Exchange writer error. Please retry SnapManager operation.
 VSS_E_WRITERERROR_RETRYABLE: The writer failed due to an error that might not occur if another snapshot copy is created.




 [22:10:42.635]  ***SNAPMANAGER BACKUP JOB ENDED AT: [09-04-2016_22.10.42]

 [22:10:42.635]  Failed to backup storage groups/databases.

NetApp provides this page for what they call “Common VSS errors”: https://kb.netapp.com/support/index?page=content&id=1010785&locale=en_US

None of the suggestions there helped us.

In the end I found this forum post for a different product: https://community.emc.com/thread/168678?tstart=0 and applied the registry edits they suggested here: https://community.emc.com/message/705346#705346

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\TcpWindowSize=256000
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\GlobalMaxTcpWindowSize=16777216
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\KeepAliveInterval=1000
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\KeepAliveTime=600000

and then rebooted our Exchange server running SME.

Since making the change roughly 20 days ago we haven’t had a single failed backup.

VeraCrypt CLI Benchmark Script

I was setting up VeraCrypt on a Raspberry Pi 2 the other day so I could use it as a backup target for my main server and was curious how fast, hahaha just kidding, I mean how slow VeraCrypt would be.

To my disappointment VeraCrypt does not provide a method for running the benchmark built into the GUI via the CLI.

This is the nice benchmark you can run from the GUI:

[Screenshot: VeraCrypt GUI benchmark]

So I took some time this weekend and wrote a simple BASH script you can use to benchmark the CLI version of VeraCrypt.

I only tested it with VeraCrypt 1.18a. Chances are that if you run it with an earlier version you’ll get some really fast times for the new encryption algorithms and hashes added in 1.18, because those tests won’t actually run.

The benchmark I wrote simply outputs how long it takes to create and encrypt a container of a specific size. It’s not quite as good as the GUI version, which outputs the actual speed, but it’s at least something. I think this will work on any Linux distribution. I tried to use only built-in system utilities, and since I wrote it on CentOS 6 I probably used some of the oldest GNU utilities still in common use.
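As a rough sketch of that approach (not the script itself), timing a single command in Bash could look like the following. The helper name `time_cmd` is hypothetical, and `sleep 1` is only a stand-in for the `veracrypt --create` call.

```shell
#!/bin/bash
# Hypothetical helper: run a command and print the elapsed wall-clock seconds.
# Resolution is one second because it relies on `date +%s`.
time_cmd() {
  local start end
  start=$(date +%s)
  "$@" >/dev/null 2>&1
  end=$(date +%s)
  echo $(( end - start ))
}

# Stand-in for the real container creation command
elapsed=$(time_cmd sleep 1)
echo "Elapsed: ${elapsed}s"
```

Whole-second resolution is coarse, which is one reason small containers tend to produce nearly identical numbers.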

During my testing I found that a container size that is too small results in all times being nearly the same, with the exception of ripemd160 and streebog. To get better results I recommend using at least a 1GB test file size on modern hardware. Even at 1GB, the sample from my main server shows the times for each encryption and hash type varying by only 2-45 seconds.
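If you pick your own, larger test size, note that the size presets in the script are multiples of 25MB (26214400 bytes), which is also the size of the files the fill test writes. A minimal sketch for checking a custom size before running the benchmark; `is_valid_size` is a hypothetical helper and not part of the script:

```shell
#!/bin/bash
# Assumption: sizes should be whole multiples of the 25MB fill-file size.
FILESIZE=26214400

# Hypothetical helper: succeed only if the size is a multiple of 25MB.
is_valid_size() {
  [ $(( $1 % FILESIZE )) -eq 0 ]
}

# 1048576000 is the script's largest preset; 1073741824 is a true 1GiB
for size in 1048576000 1073741824; do
  if is_valid_size "$size"; then
    echo "$size bytes: OK ($(( size / FILESIZE )) x 25MB)"
  else
    echo "$size bytes: not a multiple of 25MB"
  fi
done
```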

Here is a sample of what the script outputs:

I will admit I don’t fully understand these results. I would have expected much more variety in timing between the different types of encryption under a single hash type. Especially on the Raspberry Pi 2.

The script does a simple container creation benchmark in its default state. However, if you add <USERNAME RUNNING BENCHMARK> ALL=NOPASSWD: <PATH TO bin/veracrypt> to your sudoers file and change FILLCONTAINER=0 to FILLCONTAINER=1, it will also perform a file write speed benchmark.
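The write speed in that mode comes down to the bytes written divided by the elapsed seconds. A minimal sketch of that arithmetic, assuming whole-second timestamps from `date +%s`; the helper `fill_speed` and the clamp to one second are assumptions for illustration, not something the script does:

```shell
#!/bin/bash
# Hypothetical helper: MB/sec for COUNT files of 25MB written in SECONDS seconds.
fill_speed() {
  local count=$1 seconds=$2
  # Avoid dividing by zero when the fill finishes in under a second
  if [ "$seconds" -le 0 ]; then
    seconds=1
  fi
  awk -v c="$count" -v s="$seconds" 'BEGIN { printf "%.3f\n", c * 26214400 / s / 1048576 }'
}

fill_speed 39 10   # prints 97.500
```

On fast storage a small container can fill in under a second, so an unguarded division would fail there.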

and here is the script itself:

#!/bin/bash
# Simple Veracrypt CLI Benchmark Script
# Tested on Veracrypt: v1.18a
#
# Created by: Eric Schewe
# Created on: 2016-09-03
# Version: 1.1
# Last updated: 2016-06-05 16:38
# Source: http://www.pickysysadmin.ca/
#

#Benchmark size in bytes. Uncomment whichever size you'd like to use.
#If you want to add your own sizes make sure they are multiples of 26214400 (25MB)
#CONTAINERSIZE=104857600 #100MB
#CONTAINERSIZE=209715200 #200MB
#CONTAINERSIZE=524288000 #500MB
CONTAINERSIZE=1048576000 #1000MB (roughly 1GB)

#This determines if we are going to write data into the containers we create
#To enable this you must temporarily alter your sudoers file by adding this line:
#<USERNAME RUNNING BENCHMARK> ALL=NOPASSWD: <PATH TO bin/veracrypt>
#Be sure to remove this line from your sudoers file once you've completed benchmarking
#Set to 1 to also run the file write speed benchmark (default 0: container creation only)
FILLCONTAINER=0
FILLFILECOUNT=`expr $CONTAINERSIZE / 26214400 - 1`

#All the hashes currently supported by VeraCrypt
HASH=(sha256 sha512 whirlpool ripemd160 streebog)
#All the encryption methods currently supported by VeraCrypt
ENCRYPTION=(AES Twofish Camellia Kuznyechik Serpent Gost89 AES-Twofish AES-Twofish-Serpent Serpent-AES Serpent-Twofish-AES Twofish-Serpent)
#Get the cpu model of the system running the benchmark
CPUINFO=`cat /proc/cpuinfo |grep -oP "model name.*?:(.*)" | uniq |sed "s/model name.*: //"`
#Hostname of system running the benchmark
HOSTNAME=`hostname`
#Start time in a good format (https://xkcd.com/1179/)
STARTTIME=`date "+%Y-%m-%d - %k:%M:%S"`
#And a Unix Timestamp to calculate elapsed time easily
STARTTIMEUNIX=`date +%s`
#Calculate the megabyte size of the container being created
CONTAINERSIZEMB=`echo "$CONTAINERSIZE / 1024 / 1024" |bc`

#For output
LONGHEADER="-------------------------------------------------------------------------------------------------"
SHORTHEADER="-------------------------------------------------------------------"

#Output the benchmark header based on the type of benchmark we are doing
if [ $FILLCONTAINER -eq 1 ];
then
  echo $LONGHEADER
else
  echo $SHORTHEADER
fi
echo "- Veracrypt Benchmark"
echo "- Started: $STARTTIME"
echo "- Hostname: $HOSTNAME"
echo "- CPU: $CPUINFO"
echo "- Container Size: $CONTAINERSIZEMB megabytes"
if [ $FILLCONTAINER -eq 1 ];
then
  echo $LONGHEADER
  printf "%-10s | %-30s | %-15s | %-15s | %-15s \n" "HASH" "ENCRYPTION" "VOL CREATE TIME" "VOL FILL TIME" "SPEED (MB/sec)"
  echo $LONGHEADER
else
  echo $SHORTHEADER
  printf "%-10s | %-30s | %-1s \n" "HASH" "ENCRYPTION" "TIME"
  echo $SHORTHEADER
fi

#Loop through each HASH
for a in "${HASH[@]}"
do

  #Loop through each ENCRYPTION method for this HASH
  for b in "${ENCRYPTION[@]}"
  do
    #Grab a random 320 character string to use as the random source for this volume
    < /dev/urandom tr -dc _A-Z-a-z-0-9 | head -c 320 > randomString.txt 2>&1
    #Grab a random 64 character password (maximum length VeraCrypt supports)
    RANDOMPASSWORD=`< /dev/urandom tr -dc _A-Z-a-z-0-9 | head -c 64`
    #Generate a container name that shouldn't conflict with anything already in the directory
    CONTANIERNAME=`echo -n $RANDOMPASSWORD | md5sum -t |head -c 24`".vc"

    #Create a VeraCrypt volume and time it, little grep to get the only time we care about from the output
    #The password is passed as --password=... so a password starting with "-" is not mistaken for an option
    VOLUMECREATETIME=`(time veracrypt --create $CONTANIERNAME --size=$CONTAINERSIZE --password="$RANDOMPASSWORD" --encryption $b --hash $a --filesystem FAT --pim=2048 --random-source=randomString.txt --keyfiles= --volume-type=Normal) 2>&1 |grep -i "real" |sed "s/real//" |sed "s/ //g"`

    if [ $FILLCONTAINER -eq 1 ];
    then
      #Figure out the directory name we are going to mount the volume to and then create it
      DIRNAME=`echo -n $RANDOMPASSWORD | md5sum -t |head -c 24`
      mkdir $DIRNAME
      #Mount the container to the directory we just created
      veracrypt --mount $CONTANIERNAME --hash $a --password=$RANDOMPASSWORD $DIRNAME --pim=2048 --keyfiles= --protect-hidden=no >/dev/null 2>&1
      #Get the start time for filling up the container
      FILLSTARTTIME=`date +%s`
      #Fill the container with 25MB files
      VOLUMEFILLTIME=`(time (for (( i=1; i<=$FILLFILECOUNT; i++ )); do dd if=/dev/urandom of=$DIRNAME/$i".dat" bs=26214400 count=1 status=none conv=fdatasync; done)) 2>&1 |grep -i "real" |sed "s/real//" |sed "s/ //g"`
      #Get the end time for filling up the container
      FILLENDTIME=`date +%s`
      #Math to calculate elapsed seconds and then how many MB/sec we did
      FILLELAPSED=`echo "$FILLENDTIME - $FILLSTARTTIME" |bc`
      FILLSPEED=`echo "scale=3; $FILLFILECOUNT * 26214400 / $FILLELAPSED / 1024 / 1024" |bc -l`
      #Output the results
      printf "%-10s | %-30s | %-15s | %-15s | %-15s \n" "$a" "$b" "$VOLUMECREATETIME" "$VOLUMEFILLTIME" "$FILLSPEED"
      #Unmount the container and clean up before the next container is created
      veracrypt -d $CONTANIERNAME >/dev/null 2>&1
      rm -rf $DIRNAME
    else
      printf "%-10s | %-30s | %-1s \n" "$a" "$b" "$VOLUMECREATETIME"
    fi

    #Cleanup before next volume is created
    rm -f $CONTANIERNAME
    rm -f randomString.txt

  done

  #Output a footer to signify the completion of this HASH
  if [ $FILLCONTAINER -eq 1 ];
  then
    echo $LONGHEADER
  else
    echo $SHORTHEADER
  fi
done

#UNIX timestamp when we're all done. Then do some math and conversions
ENDTIMEUNIX=`date +%s`
ELAPSEDTIME=`expr ${ENDTIMEUNIX} - ${STARTTIMEUNIX}`
ELAPSEDTIME=`date -u -d @$ELAPSEDTIME +%H:%M:%S`

#Output summary footer
echo "- Completed on `date "+%Y-%m-%d - %k:%M:%S"`"
echo "- Elapsed Time (HH:MM:SS): $ELAPSEDTIME"

if [ $FILLCONTAINER -eq 1 ];
then
  echo $LONGHEADER
else
  echo $SHORTHEADER
fi

I’m not super confident in the output. Some of the numbers leave me scratching my head. It’s completely possible I’ve got something totally wrong with this script. Please feel free to post comments/revisions.
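One possible revision, offered as a sketch: the elapsed-time conversion at the end of the script relies on GNU `date`, whose `%H` wraps around after 24 hours. On slow hardware a full run could plausibly take longer than that, so plain shell arithmetic may be safer. The helper name `format_elapsed` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: format a number of seconds as HH:MM:SS using shell arithmetic.
# Unlike formatting with date, the hours do not wrap around after 24.
format_elapsed() {
  local total=$1
  printf '%02d:%02d:%02d\n' $(( total / 3600 )) $(( total % 3600 / 60 )) $(( total % 60 ))
}

format_elapsed 3725    # prints 01:02:05
format_elapsed 90000   # prints 25:00:00
```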