Symantec Backup Exec RALUS crashes on CentOS 7

We just deployed our first CentOS 7 machine and are trying to back it up using Backup Exec 2010 R3 and the RALUS agent.

After installing the compatibility libraries that RALUS needs:

yum install compat-libstdc++-33.i686 compat-libstdc++-33.x86_64

the agent installs and starts, but it crashes as soon as the Media Server connects to it.

Some log digging came up with this from /var/log/messages:

Aug  7 16:09:41 localhost kernel: beremote[10898]: segfault at fffffffffffffffc ip 00007f7543fbc8cc sp 00007f75420c29d8 error 5 in libc-2.17.so[7f7543f3c000+1b6000]
Aug  7 16:09:41 localhost abrt-hook-ccpp: Saved core dump of pid 10894 (/opt/VRTSralus/bin/beremote) to /var/tmp/abrt/ccpp-2014-08-07-16:09:41-10894 (48660480 bytes)
Aug  7 16:09:41 localhost abrt-server: Package 'VRTSralus' isn't signed with proper key
Aug  7 16:09:41 localhost abrt-server: 'post-create' on '/var/tmp/abrt/ccpp-2014-08-07-16:09:41-10894' exited with 1
Aug  7 16:09:41 localhost abrt-server: Deleting problem directory '/var/tmp/abrt/ccpp-2014-08-07-16:09:41-10894'
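
As a rough aid for reading the kernel line above: the offset of the faulting instruction inside libc is the instruction pointer minus the base address of the mapping. The sketch below only assumes the message layout seen in this log; the function name parse_segfault is my own and not part of any tool.

```python
import re

# Layout of the kernel "segfault at ..." message as it appears in the log above.
SEGFAULT_RE = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<lib>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def parse_segfault(line):
    """Return process, pid, library and in-library offset, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)      # address of the faulting instruction
    base = int(m.group("base"), 16)  # where the library is mapped
    return {
        "process": m.group("proc"),
        "pid": int(m.group("pid")),
        "library": m.group("lib"),
        "offset": hex(ip - base),
    }

line = ("Aug  7 16:09:41 localhost kernel: beremote[10898]: segfault at "
        "fffffffffffffffc ip 00007f7543fbc8cc sp 00007f75420c29d8 error 5 "
        "in libc-2.17.so[7f7543f3c000+1b6000]")
print(parse_segfault(line))
```

For the line above this gives an offset of 0x808cc into libc-2.17.so. With debug symbols installed, that offset could be looked up in the library with a tool such as gdb or addr2line, which would at least show which libc function the agent was in when it crashed.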

Running the agent in debug mode shows this:

[[email protected] bin]# ./beremote --log-console
f8a1b740 Thu Aug  7 16:25:38 2014 : Starting BE Remote Agent
f8a1b740 Thu Aug  7 16:25:38 2014 : Requested no generation of log file
f8a1b740 Thu Aug  7 16:25:38 2014 : No configuration file specified.  Using default.
f8a1b740 Thu Aug  7 16:25:38 2014 : Log to console: enabled
f8a1b740 Thu Aug  7 16:25:38 2014 : Successfully set the supplementary groups of the process
f8a1b740 Thu Aug  7 16:25:38 2014 : Initialized locks for SSL callbacks
f8a1b740 Thu Aug  7 16:25:38 2014 : Starting NDMP processor
f8a1b740 Thu Aug  7 16:25:38 2014 : NDMPDMainThreadFunc spawned: grpid=1, tid=-231061760
f23a4700 Thu Aug  7 16:25:38 2014 : FS_InitFileSys
f23a4700 Thu Aug  7 16:25:38 2014 :   libbedsnt5.so could not be loaded: 0x       2 (2)
f23a4700 Thu Aug  7 16:25:38 2014 :   libbedssql2.so could not be loaded: 0x       2 (2)
f23a4700 Thu Aug  7 16:25:38 2014 :   libbedsxchg.so could not be loaded: 0x       2 (2)
f23a4700 Thu Aug  7 16:25:38 2014 :   libbedsxese.so could not be loaded: 0x       2 (2)
f23a4700 Thu Aug  7 16:25:38 2014 :   libbedsmbox.so could not be loaded: 0x       2 (2)
f23a4700 Thu Aug  7 16:25:38 2014 :   libbedspush.so could not be loaded: 0x       2 (2)
f23a4700 Thu Aug  7 16:25:38 2014 :   libbedsnote.so could not be loaded: 0x       2 (2)
f23a4700 Thu Aug  7 16:25:38 2014 :   libbedsmdoc.so could not be loaded: 0x       2 (2)
f23a4700 Thu Aug  7 16:25:38 2014 :   libbedssps2.so could not be loaded: 0x       2 (2)
f23a4700 Thu Aug  7 16:25:38 2014 :   libbedssps3.so could not be loaded: 0x       2 (2)
f23a4700 Thu Aug  7 16:25:38 2014 :   libbedsupfs.so could not be loaded: 0x       2 (2)
f23a4700 Thu Aug  7 16:25:38 2014 :   libbedsshadow.so could not be loaded: 0x       2 (2)
f23a4700 Thu Aug  7 16:25:38 2014 :   libbedsoffhost.so could not be loaded: 0x       2 (2)
f23a4700 Thu Aug  7 16:25:38 2014 :   loaded libbedsvx.so
f23a4700 Thu Aug  7 16:25:38 2014 :   loaded libbedsrman.so
f23a4700 Thu Aug  7 16:25:38 2014 :   loaded libbedssms.so
f23a4700 Thu Aug  7 16:25:38 2014 :   loaded libbedssmsp.so
f23a4700 Thu Aug  7 16:25:38 2014 :   libbedsra.so could not be loaded: 0x       2 (2)
f23a4700 Thu Aug  7 16:25:38 2014 :   libbedsdb2.so could not be loaded: 0x       2 (2)
f23a4700 Thu Aug  7 16:25:38 2014 :   loaded libbedsedir.so
f23a4700 Thu Aug  7 16:25:38 2014 :   libbedsvmesx.so could not be loaded: 0x       2 (2)
f23a4700 Thu Aug  7 16:25:38 2014 : Initializing FSs
f23a4700 Thu Aug  7 16:25:38 2014 : FS 1 failed to initialize: 0xE000FE46
f23a4700 Thu Aug  7 16:25:38 2014 : Function called: RMAN_InitFileSys
f23a4700 Thu Aug  7 16:25:38 2014 : Using 'UTF-8' Encoding.
f23a4700 Thu Aug  7 16:25:38 2014 : Using vfm path /opt/VRTSralus/VRTSvxms from config.
f23a4700 Thu Aug  7 16:25:38 2014 : Sucessfully set VFM_PRIVATE_ROOT env to /opt/VRTSralus/VRTSvxms.
f23a4700 Thu Aug  7 16:25:38 2014 : VFM_PRIVATE_ROOT was set with value /opt/VRTSralus/VRTSvxms
f23a4700 Thu Aug  7 16:25:38 2014 :      VXMS Initialization OK.
f23a4700 Thu Aug  7 16:25:38 2014 : Detected Mounted Filesystem: type <rootfs> mounted at </>
f23a4700 Thu Aug  7 16:25:38 2014 : Detected Mounted Filesystem: type <proc> mounted at </proc>
f23a4700 Thu Aug  7 16:25:38 2014 : Detected Mounted Filesystem: type <sysfs> mounted at </sys>
f23a4700 Thu Aug  7 16:25:38 2014 : Detected Mounted Filesystem: type <devtmpfs> mounted at </dev>
f23a4700 Thu Aug  7 16:25:38 2014 : Detected Mounted Filesystem: type <securityfs> mounted at </sys/kernel/security>
f23a4700 Thu Aug  7 16:25:38 2014 : Detected Mounted Filesystem: type <tmpfs> mounted at </dev/shm>
f23a4700 Thu Aug  7 16:25:38 2014 : Detected Mounted Filesystem: type <devpts> mounted at </dev/pts>
f23a4700 Thu Aug  7 16:25:38 2014 : Detected Mounted Filesystem: type <tmpfs> mounted at </run>
f23a4700 Thu Aug  7 16:25:38 2014 : Detected Mounted Filesystem: type <tmpfs> mounted at </sys/fs/cgroup>
f23a4700 Thu Aug  7 16:25:38 2014 : Detected Mounted Filesystem: type <cgroup> mounted at </sys/fs/cgroup/systemd>
f23a4700 Thu Aug  7 16:25:38 2014 : Detected Mounted Filesystem: type <pstore> mounted at </sys/fs/pstore>
f23a4700 Thu Aug  7 16:25:38 2014 : Detected Mounted Filesystem: type <cgroup> mounted at </sys/fs/cgroup/cpuset>
f23a4700 Thu Aug  7 16:25:38 2014 : Detected Mounted Filesystem: type <cgroup> mounted at </sys/fs/cgroup/cpu,cpuacct>
f23a4700 Thu Aug  7 16:25:38 2014 : Detected Mounted Filesystem: type <cgroup> mounted at </sys/fs/cgroup/memory>
f23a4700 Thu Aug  7 16:25:38 2014 : Detected Mounted Filesystem: type <cgroup> mounted at </sys/fs/cgroup/devices>
f23a4700 Thu Aug  7 16:25:38 2014 : Detected Mounted Filesystem: type <cgroup> mounted at </sys/fs/cgroup/freezer>
f23a4700 Thu Aug  7 16:25:38 2014 : Detected Mounted Filesystem: type <cgroup> mounted at </sys/fs/cgroup/net_cls>
f23a4700 Thu Aug  7 16:25:38 2014 : Detected Mounted Filesystem: type <cgroup> mounted at </sys/fs/cgroup/blkio>
f23a4700 Thu Aug  7 16:25:38 2014 : Detected Mounted Filesystem: type <cgroup> mounted at </sys/fs/cgroup/perf_event>
f23a4700 Thu Aug  7 16:25:38 2014 : Detected Mounted Filesystem: type <cgroup> mounted at </sys/fs/cgroup/hugetlb>
f23a4700 Thu Aug  7 16:25:38 2014 : Detected Mounted Filesystem: type <configfs> mounted at </sys/kernel/config>
f23a4700 Thu Aug  7 16:25:38 2014 : Detected Mounted Filesystem: type <xfs> mounted at </>
f23a4700 Thu Aug  7 16:25:38 2014 : Detected Mounted Filesystem: type <autofs> mounted at </proc/sys/fs/binfmt_misc>
f23a4700 Thu Aug  7 16:25:38 2014 : Detected Mounted Filesystem: type <debugfs> mounted at </sys/kernel/debug>
f23a4700 Thu Aug  7 16:25:38 2014 : Detected Mounted Filesystem: type <mqueue> mounted at </dev/mqueue>
f23a4700 Thu Aug  7 16:25:38 2014 : Detected Mounted Filesystem: type <hugetlbfs> mounted at </dev/hugepages>
f23a4700 Thu Aug  7 16:25:38 2014 : Detected Mounted Filesystem: type <nfsd> mounted at </proc/fs/nfsd>
f23a4700 Thu Aug  7 16:25:38 2014 : Detected Mounted Filesystem: type <xfs> mounted at </boot>
f23a4700 Thu Aug  7 16:25:38 2014 : Detected Mounted Filesystem: type <xfs> mounted at </var>
f23a4700 Thu Aug  7 16:25:38 2014 : Detected Mounted Filesystem: type <rpc_pipefs> mounted at </var/lib/nfs/rpc_pipefs>
f23a4700 Thu Aug  7 16:25:38 2014 : Detected Mounted Filesystem: type <binfmt_misc> mounted at </proc/sys/fs/binfmt_misc>
f23a4700 Thu Aug  7 16:25:38 2014 : INFORMATIONAL: Zero value found for 'DisableRMAL' from ralus.cfg, allowing RMAL to initialize
f23a4700 Thu Aug  7 16:25:38 2014 : Successfully resolved the "ndmp" service to port: 10000 (host order)
f23a4700 Thu Aug  7 16:25:38 2014 : BETCPListener successfully installed a signal handler for SIGTERM
f23a4700 Thu Aug  7 16:25:38 2014 : BETCPListener::BETCPListener: This system appears to be a Dual IP system
f23a4700 Thu Aug  7 16:25:38 2014 : BETCPListener::BETCPListener: Successfully set the IPV6_V6ONLY option, this listener may behave as Dual Stack listener
f23a4700 Thu Aug  7 16:25:38 2014 : Started NDMP Listener on port 10000
f0dfd700 Thu Aug  7 16:25:48 2014 : NrdsAdvertiserThread: advertisement cycle started.
f0dfd700 Thu Aug  7 16:25:48 2014 : RMAN_EnumSelfDLE: AgentConfig GetOracleDBNames returned error. If Oracle Agent is installed, please run AgentConfig.
f0dfd700 Thu Aug  7 16:25:48 2014 : NrdsAdvertiserThread: EnumSelfDLE for file system 14 returned 0(0x0) and 0 DLEs
GetIfAddrs(LINUX): failed err = 11
GetAdaptersAddresses: error = 1, ret=-1
f0dfd700 Thu Aug  7 16:25:48 2014 : VX_RemoveDLE: DestroyDLE()
f0dfd700 Thu Aug  7 16:25:48 2014 : NrdsAdvertiserThread: EnumSelfDLE for file system 22 returned -1(0xFFFFFFFF) and 0 DLEs
f0dfd700 Thu Aug  7 16:25:48 2014 : NrdsAdvertiserThread: Security is enabled!!!
f0dfd700 Thu Aug  7 16:25:48 2014 : This instance of BETCPListener was not requested to install a signal handler and hence will not install one!
GetIfAddrs(LINUX): failed err = 11
GetAdaptersAddresses: error = 1, ret=-1
f0dfd700 Thu Aug  7 16:25:48 2014 : NrdsAdvertiserThread: connect to target=mediaserver.mydomain port=6101 failed
f0dfd700 Thu Aug  7 16:25:48 2014 : NrdsAdvertiserThread: Retrying in 60 seconds
e5ad8700 Thu Aug  7 16:25:53 2014 : NrdsAdvertiserThread: negative (purge) advertisement cycle started.
e5ad8700 Thu Aug  7 16:25:53 2014 : NrdsAdvertiserThread: no purge is pending.
e5ad8700 Thu Aug  7 16:25:53 2014 : NrdsAdvertiserThread: negative (purge) advertisement cycle complete.  Waiting 240 minutes before advertising again.
f0dfd700 Thu Aug  7 16:26:48 2014 : NrdsAdvertiserThread: Security is enabled!!!
f0dfd700 Thu Aug  7 16:26:48 2014 : This instance of BETCPListener was not requested to install a signal handler and hence will not install one!
GetIfAddrs(LINUX): failed err = 11
Segmentation fault (core dumped)
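
The capture above is long, so here is a minimal sketch for boiling it down: it lists which modules loaded, which did not, and which lines were printed without the agent's usual thread-id and timestamp prefix. The prefix pattern is inferred from this capture only, and the names PREFIX_RE and summarize are made up for the example.

```python
import re

# Lines written by the agent's logger start with a thread id and a timestamp,
# e.g. "f23a4700 Thu Aug  7 16:25:38 2014 : message". This layout is taken
# from the capture above, not from any Symantec documentation.
PREFIX_RE = re.compile(
    r"^[0-9a-f]{8} \w{3} \w{3} [ \d]\d \d\d:\d\d:\d\d \d{4} : ?(.*)$"
)

def summarize(log_text):
    """Return (loaded modules, modules that failed to load, un-prefixed lines)."""
    loaded, failed, raw = [], [], []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        m = PREFIX_RE.match(line)
        if m is None:
            raw.append(line.strip())  # printed outside the agent's logger
            continue
        msg = m.group(1).strip()
        if msg.startswith("loaded "):
            loaded.append(msg.split()[1])
        elif "could not be loaded" in msg:
            failed.append(msg.split()[0])
    return loaded, failed, raw

sample = """\
f23a4700 Thu Aug  7 16:25:38 2014 :   libbedsnt5.so could not be loaded: 0x       2 (2)
f23a4700 Thu Aug  7 16:25:38 2014 :   loaded libbedsvx.so
f0dfd700 Thu Aug  7 16:26:48 2014 : NrdsAdvertiserThread: Security is enabled!!!
GetIfAddrs(LINUX): failed err = 11
Segmentation fault (core dumped)
"""
loaded, failed, raw = summarize(sample)
print(loaded, failed, raw)
```

Run against the full capture, the un-prefixed lines are the GetIfAddrs and GetAdaptersAddresses failures plus the final segmentation fault, and the last GetIfAddrs failure comes immediately before the crash. That does not prove it is the cause, but it is where I would start looking.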

I opened a case with Symantec; their answer was that neither CentOS nor RHEL 7 is supported.

Anyone else running into this? Have you fixed it?

I found a blog post suggesting hex-editing the beremote binary, which I’d rather not do. Besides, our version of the agent is newer than the one described in that post: http://blog.redweb.at/2012/08/howto-backupexec-2012-linux-agent-and-kernel-3-0-debian/

Symantec Backup Exec NDMP backups fail the verify stage when backing up from a NetApp

We are using Symantec Backup Exec 2010 R3 to perform NDMP backups of our NetApp FAS2020 running Data OnTap 7.3.7.

Everything worked well until we upgraded from Data OnTap 7.3.6 to 7.3.7. Since then, Backup Exec 2010 R3 has reported every NDMP backup as failed, with the following error during the verify stage:

Job ended: Wednesday, September 12, 2012 at 3:52:55 AM
Completed status: Failed
Final error: 0xe000fe0d - A device-specific error occurred.
Final error category: Resource Errors

For additional information regarding this error refer to link V-79-57344-65037

When we contacted NetApp, they told us Symantec was at fault.

Symantec dug into the job logs and found this error in the NDMP job’s log file, captured with Backup Exec’s built-in logging tools:

BENGINE: [07/09/12 10:58:57] [8864] [ndmp\ndmpcomm] - ERROR: 7 Error: I/O error
BENGINE: [07/09/12 10:58:57] [8864] [loops] - NDMP Log Message: Storing of nlist entries failed.
BENGINE: [07/09/12 10:58:57] [8864] [loops] - NDMP Notify Data Halted: Aborted
BENGINE: [07/09/12 10:58:57] [8864] [loops] - NDMP Log Message: Aborted by client

After reviewing their ticketing system and seeing that other NetApp customers with this exact problem had been told to contact NetApp, Symantec recommended that we contact NetApp again.

When I got hold of NetApp again with the above information, they told me this is a known issue and that there is an internal bug report at NetApp for it. There is supposedly a known fix, but it is not yet available in any shipped version of Data OnTap. The internal bug report lists the following workarounds for NetBackup (which I assume will also work with Backup Exec):

  1. Restore the directory to another location and extract the file after the restore completes.
  2. To perform a single file restore without using DAR, set the value of the environment variable EXTRACT to e or E. However, the single file restore reads the whole backup stream on the tape and this restore operation might be slow.
  3. Set the NDMP version on the storage system to version 3 and then perform the restore.

I’ve tested option 3 by running the following commands on our FAS2020:

filer> ndmpd off
filer> ndmpd version 3
filer> ndmpd on

and backups are now failing with a different error:

Job ended: Wednesday, September 12, 2012 at 2:49:51 PM
Completed status: Failed
Final error: 0xe000feb9 - The NDMP subsystem reports that a request cannot be processed because it is in the wrong state to service the request.
Final error category: Resource Errors

For additional information regarding this error refer to link V-79-57344-65209
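
A side note on the two error reports so far: the link ID appears to be the final error code split into its upper and lower 16 bits, each written in decimal. Here is a small sketch of that arithmetic. It is a pattern inferred from these two examples only, the V-79 prefix is assumed to be constant, and link_id is a name I made up.

```python
def link_id(final_error, prefix="V-79"):
    """Build the link ID from a final error code string such as '0xe000fe0d'."""
    code = int(final_error, 16)
    high = code >> 16     # upper 16 bits, e.g. 0xe000 -> 57344
    low = code & 0xFFFF   # lower 16 bits, e.g. 0xfe0d -> 65037
    return f"{prefix}-{high}-{low}"

print(link_id("0xe000fe0d"))  # V-79-57344-65037
print(link_id("0xe000feb9"))  # V-79-57344-65209
```

This is handy when a log only shows the hex code and you want something to search the Symantec knowledge base with.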

I reverted the NDMP version back to 4 and will now wait for a conference call with NetApp and Symantec to get to the bottom of this.

Despite the reported failures, the backups still appear to be good.

Update – September 14th, 2012

After a conference call with Symantec and NetApp, the final conclusion is that this bug exists only in Data OnTap 7.3.7 and will be fixed in Data OnTap 7.3.7p1, which should be released in the near future. No exact dates were provided.

The public bug report for this on NetApp’s site is 613414. You can subscribe to that bug with your NetApp account, and when it’s resolved (with the release of 7.3.7p1) you will receive an e-mail. The NetApp rep wasn’t certain whether general e-mails go out to NetApp customers for ‘p’ releases of Data OnTap, so subscribing to the bug should guarantee notification when the new version is released.

That public bug report states that the problem only affects Data OnTap 7.3.8. That is incorrect; it should read 7.3.7.

In the meantime, the workarounds remain almost the same:

  1. Create CIFS shares for the volumes you back up via NDMP and change your backups to use the CIFS share instead of NDMP
  2. Disable backup verification in Backup Exec for your NDMP jobs
  3. Downgrade to Data OnTap 7.3.6
  4. Wait for Data OnTap 7.3.7p1

Update – October 4th, 2012

Sam in the comments got this update from NetApp:

This fix will be included in 7.3.7P1. We are expecting 7.3.7P1 currently has a target release date of Oct 29th.

Here’s hoping.

Update – October 30th, 2012

Data OnTap 7.3.7P1 is out! We have one confirmation that this patch has fixed the verify problem.

Release Notes and Download: http://support.netapp.com/NOW/download/software/ontap/7.3.7P1/

Update – November 27th, 2012

I can confirm that the 7.3.7P1 patch has corrected this problem for us.

Symantec Endpoint Protection 12.1 RTM and SYSFER.DLL

We ran into the following problem after upgrading our servers (Windows 2003 through 2008 R2) to Symantec Endpoint Protection 12.1 RTM from Symantec Endpoint Protection 11.

The Symantec 12.1 RTM package we deployed on our servers contained the following components:

  • Virus, Spyware, and Basic Download Protection
    • Advanced Download Protection
  • Proactive Threat Protection
    • SONAR Protection
    • Application and Device Control

When a user tries to connect to a server over Remote Desktop, or tries to launch an application after connecting, they get the following error:

This application has failed to start because \System32\SYSFER.DLL was not found. Re-installing the application may fix this problem.

Symantec SYSFER.DLL error

If the error comes up during an RDP session, you never get to the desktop. I’ve also had it come up when physically in front of a server. Fortunately, if you’re physically in front of the server you can click ‘OK’ on the error, then hit ‘CTRL+ALT+DEL’ and reboot the server. Typically you can get into the server right away after the reboot.

I also had luck trying different accounts when trying to get into servers. If my account didn’t work the local admin account sometimes would.

After a call to Symantec, I got two solutions out of them:

  1. Upgrade to Symantec Endpoint Protection 12.1 RU1
  2. Remove the ‘Application and Device Control’ component from Symantec Endpoint Protection on each of our servers.

On one of the affected servers, I removed the Application and Device Control (ADC) component from Symantec Endpoint Protection and rebooted. The error message has not reappeared after about 1.5 hours.

Update: Removing the ADC component from Symantec has resolved this issue for us.