Symantec Backup Exec NDMP backups fail the verify stage when backing up from a NetApp

We are using Symantec Backup Exec 2010 R3 to perform NDMP backups of our NetApp FAS2020 running Data OnTap 7.3.7.

All was working well until we upgraded from Data OnTap 7.3.6 to 7.3.7. Since then, Backup Exec 2010 R3 has reported every NDMP backup as a failure, with the error below appearing during the verify stage of the backup:

Job ended: Wednesday, September 12, 2012 at 3:52:55 AM
Completed status: Failed
Final error: 0xe000fe0d - A device-specific error occurred.
Final error category: Resource Errors

For additional information regarding this error refer to link V-79-57344-65037
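
As an aside, both error codes quoted in this post seem to line up with their V-79 link numbers: the upper and lower 16 bits of the final error, written in decimal, match the last two fields of the link. The sketch below only illustrates that arithmetic. The function name is made up for this example, and the assumption that the prefix is always "V-79" and that the pattern holds for other Backup Exec errors is not confirmed by anything in this post.

```python
def v79_link_id(final_error: int) -> str:
    """Format a 32-bit final error code as a V-79 style link number.

    Assumption: the link is "V-79-<high 16 bits>-<low 16 bits>" in decimal.
    This matches the two errors in this post; it is not a documented rule.
    """
    high = (final_error >> 16) & 0xFFFF  # 0xe000 -> 57344
    low = final_error & 0xFFFF           # 0xfe0d -> 65037
    return f"V-79-{high}-{low}"


print(v79_link_id(0xE000FE0D))  # V-79-57344-65037
```

This can be handy for searching a vendor knowledge base when only the hex code shows up in a job log.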

When we contacted NetApp, they told us Symantec was at fault.

Symantec did some digging into the job logs and came across this error in the NDMP job's log file, captured using Backup Exec's built-in logging tools:

BENGINE: [07/09/12 10:58:57] [8864] [ndmp\ndmpcomm] - ERROR: 7 Error: I/O error
BENGINE: [07/09/12 10:58:57] [8864] [loops] - NDMP Log Message: Storing of nlist entries failed.
BENGINE: [07/09/12 10:58:57] [8864] [loops] - NDMP Notify Data Halted: Aborted
BENGINE: [07/09/12 10:58:57] [8864] [loops] - NDMP Log Message: Aborted by client
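
Debug logs like this can be long, so it may help to filter them for the NDMP lines that matter. What follows is a minimal sketch that assumes every line has the same shape as the excerpt above (a `BENGINE:` prefix, then bracketed timestamp, thread ID, and module, then a message). The keyword list and function name are my own choices for illustration, not anything Backup Exec defines.

```python
import re

# Assumed line shape, taken from the excerpt above:
# BENGINE: [timestamp] [thread id] [module] - message
LINE_RE = re.compile(
    r"^BENGINE:\s+\[(?P<timestamp>[^\]]+)\]\s+\[(?P<thread>\d+)\]\s+"
    r"\[(?P<module>[^\]]+)\]\s+-\s+(?P<message>.*)$"
)

# Hypothetical keywords that flag a line as worth reading.
KEYWORDS = ("error", "failed", "aborted", "halted")


def find_ndmp_problems(lines):
    """Return (timestamp, module, message) for lines that look like failures."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and any(k in match.group("message").lower() for k in KEYWORDS):
            hits.append(
                (match.group("timestamp"), match.group("module"), match.group("message"))
            )
    return hits


# Raw strings keep the backslash in "ndmp\ndmpcomm" from becoming a newline.
sample = [
    r"BENGINE: [07/09/12 10:58:57] [8864] [ndmp\ndmpcomm] - ERROR: 7 Error: I/O error",
    r"BENGINE: [07/09/12 10:58:57] [8864] [loops] - NDMP Log Message: Storing of nlist entries failed.",
]
for timestamp, module, message in find_ndmp_problems(sample):
    print(timestamp, module, message)
```

To run it against a real log, pass an open file object instead of `sample`, since the function only needs an iterable of lines.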

Symantec then recommended contacting NetApp again after reviewing their ticketing system and seeing other NetApp customers had this exact same problem and were told to contact NetApp.

After getting hold of NetApp again with the above information, they have now told me this is a known issue and that there is an internal bug report at NetApp for it. There is supposedly a known fix, but it is not yet available in any shipped version of Data OnTap. The internal bug report lists the following workarounds for NetBackup (which I assume will also work with Backup Exec):

  1. Restore the directory to another location and extract the file after the restore completes.
  2. To perform a single file restore without using DAR, set the value of the environment variable EXTRACT to e or E. However, the single file restore reads the whole backup stream on the tape and this restore operation might be slow.
  3. Set the NDMP version on the storage system to version 3 and then perform the restore.

I’ve tested option 3 by running the following commands on our FAS2020:

filer> ndmpd off
filer> ndmpd version 3
filer> ndmpd on

and backups are now failing with a different error:

Job ended: Wednesday, September 12, 2012 at 2:49:51 PM
Completed status: Failed
Final error: 0xe000feb9 - The NDMP subsystem reports that a request cannot be processed because it is in the wrong state to service the request.
Final error category: Resource Errors

For additional information regarding this error refer to link V-79-57344-65209

I reverted the NDMP version back to 4 and will now wait for a conference call with NetApp and Symantec to get to the bottom of this.

Despite the reported failures the backups do appear to still be good.
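
One way to check a claim like this is to restore a folder to a scratch location and compare it against the live data. The sketch below does that with SHA-256 hashes, assuming both trees are reachable as ordinary paths from the same machine (for example over CIFS). The function names are hypothetical, and files that changed on the live volume after the backup ran will show up as differences, so the result needs a human read.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each file's path (relative to root) to its digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in root.rglob("*")
        if path.is_file()
    }


def compare_trees(live_root, restored_root):
    """Report files that are missing, extra, or different in the restore."""
    live = tree_digests(live_root)
    restored = tree_digests(restored_root)
    return {
        "missing_from_restore": sorted(live.keys() - restored.keys()),
        "extra_in_restore": sorted(restored.keys() - live.keys()),
        "content_differs": sorted(
            name for name in live.keys() & restored.keys()
            if live[name] != restored[name]
        ),
    }
```

Calling `compare_trees(live_path, restored_path)` with two directory paths returns three lists; all three being empty means every file matched byte for byte.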

Update – September 14th, 2012

After a conference call with Symantec and NetApp, the final conclusion is that this is a bug that only exists in Data OnTap 7.3.7. It will be fixed in Data OnTap 7.3.7p1, which should be released sometime in the near future. No exact dates were provided.

The public bug report for this on NetApp’s site is 613414. You can subscribe to that bug with your NetApp account, and when it’s resolved (with the release of 7.3.7p1) you will receive an e-mail. The NetApp rep wasn’t certain whether general e-mails go out to NetApp customers for ‘p’ releases of Data OnTap, so subscribing to the bug should guarantee notification when the new version is released.

That public bug report states that the problem only affects Data OnTap 7.3.8. That is incorrect; it should read 7.3.7.

In the meantime the workarounds remain almost the same:

  1. Create CIFS shares for the volumes you back up via NDMP and change your backup jobs to use the CIFS shares instead of NDMP
  2. Disable backup verification in Backup Exec for your NDMP jobs
  3. Downgrade to Data OnTap 7.3.6
  4. Wait for Data OnTap 7.3.7p1

 

Update – October 4th, 2012

Sam in the comments got this update from NetApp:

This fix will be included in 7.3.7P1. We are expecting 7.3.7P1 currently has a target release date of Oct 29th.

Here’s hoping.

 

Update – October 30th, 2012

Data OnTap 7.3.7P1 is out! We have one confirmation that this patch has fixed the verify problem.

Release Notes and Download: http://support.netapp.com/NOW/download/software/ontap/7.3.7P1/

 

Update – November 27th, 2012

I can confirm that the 7.3.7P1 patch has corrected this problem for us.

17 thoughts on “Symantec Backup Exec NDMP backups fail the verify stage when backing up from a NetApp”

  1. Holy Crap. I just upgraded our 2050 to 7.3.7 and now I’m having this exact problem. I’ve been going back and forth with NetApp and Symantec. Thank you for posting. I’m not going crazy… again. Were you able to get a fix?

    Thanks,

    Sam

    • Hey Sam,

      I wasn’t able to get it resolved beyond being told to wait for 7.3.7p1. I wasn’t given an ETA on that release, only that it was coming. The NetApp rep suggested putting a watch on this bug (http://support.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=613414) and said that when it was marked as resolved it probably meant 7.3.7p1 was coming soon.

      It seems that bug is now very blank with no details. Shame I didn’t screenshot it two weeks ago when I first saw it.

      Other than waiting for the new firmware, you can downgrade (if that’s possible for you; it isn’t for us) or run all your backups via CIFS shares instead of NDMP (lol yeah right).

      -Eric

  2. Hi Eric,

    I contacted NetApp for a status on the bug and there is not one. Their fix is to upgrade to 8.x. With a 2050 I can’t upgrade; 7.3.7 is the highest I can go. NetApp did say that with a 2040 you can go to 8.x. That might be your fix.

    I was wondering, have you tried Backup Exec 2012? I’m running 2010 R3, as in your post.

    Sam

    • We haven’t gone to BE2012 yet. We’re waiting on some new hardware before making that upgrade.

      Our FAS2040 can go to 8.x but our SnapMirror destination (FAS2020) cannot. So we’re stuck on 7.x until we upgrade our FAS2020 or get something new to replace our FAS2040.

      I was told that a 7.3.7p1 was coming to resolve this issue.

      I still have my old case number. Would you like me to try calling to see if I can get some information on a 7.x patch?

  3. Yeah, that would be awesome.

    I really don’t want to downgrade. We are kind of in your same situation, stuck in hardware and software compatibility problems.

  4. I’m also having this bug with FAS2020. But NetApp told me:
    – It was Symantec’s fault
    – BE2012 is not supported on FAS2020. And I should go with 2010 and OnTap 7.3.3
    – Upgrade to 8.0

  5. By the way, if you read the release notes for 7.3.7 it can be seen that NetApp messed up with the NDMP code, to support renames when doing a restore.
    So, to me it is quite easy to figure out who’s to blame. The changes made to the code.

    May I ask you why you can’t downgrade to 7.3.6P5? I was planning to downgrade mine, but now I wonder if there is any downside I’m not aware of.

    • We SnapMirror from a FAS2040 to a FAS2020. Downgrading requires me to do both filers.

      There is also a bug in 7.3.6 which will randomly enable inheritance when copying files to a CIFS share. We’re at the start of a migration of two file servers and I can’t have inheritance randomly enabling itself.

      We’re waiting for 7.3.8 or 7.3.7p1 since backups are still good and the failures are just annoying. We’re using BE2010R3.

  6. I’m currently snapmirroring between 7.3.6 and 7.3.7 without problems.
    Also, how do you know restores work fine?

    • You’re going from 7.3.6 -> 7.3.7 for your SnapMirrors right? I read you can’t go the other way (newer version to older version).

      I did a test restore of a folder and did a comparison between the live data and the restored data with BeyondCompare and it came up identical.

  7. Here’s a little update from a NetApp SE

    This bug is fixed in 7.3.7D2. That is a debug release that has not gone to full review, typically we don’t like customers running D releases unless it is a dire issue.

    This fix will be included in 7.3.7P1. We are expecting 7.3.7P1 currently has a target release date of Oct 29th.

