How to securely erase your data on a NetApp

When drives in a NetApp are being retired and replaced, you need to make sure you securely erase all the data that used to be on them. Unless, of course, you're just going to crush your disks.

In this example we've got 14 disks that need to be wiped and removed from our NetApp (13 in an aggregate, aggr0, plus one spare) so they can be replaced with new, much larger disks.

There are two methods you can use to wipe disks with your NetApp. The first is to simply delete the aggregate they are a member of, turning them into spares, and then run "disk zero spares" from the command line on your NetApp. This only does a single pass and only zeroes the disks. I've seen arguments that this is enough; I honestly don't know, and we have a requirement to do a 7-pass wipe in our enterprise. You could run the zero command 7 times, but I don't imagine that would be as effective as option number two. The second option is to run the 'disk sanitize' command, which lets you specify which disks you want to erase and how many passes to perform. This is what we're going to use.
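
For reference, option one boils down to something like the following, assuming the aggregate is named aggr0 as in the example below (keep in mind that 'disk zero spares' zeroes every non-zeroed spare on the system, not just the disks you freed up):

netapp> aggr offline aggr0
netapp> aggr destroy aggr0
netapp> disk zero spares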

The first thing you'll need to do is get a license for your NetApp to enable the 'disk sanitize' command. It's a free license (so I've been told) and you can contact your sales rep to get one. We got ours for free, and I've seen forum posts from other NetApp owners saying the same thing.

There is a downside to installing the disk sanitization license: once it's installed on a NetApp it cannot be removed. It also restricts the use of three commands:

  • dd (to copy blocks of data)
  • dumpblock (to print dumps of disk blocks)
  • setflag wafl_metadata_visible (to allow access to internal WAFL files)

There are also a few limitations regarding disk sanitization you should know about:

  • It is not supported in takeover mode for systems in an HA configuration. (If a storage system is disabled, it remains disabled during the disk sanitization process.)
  • It cannot be carried out on disks that were failed due to readability or writability problems.
  • It does not perform its formatting phase on ATA drives.
  • If you are using the random pattern, it cannot be performed on more than 100 disks at one time.
  • It is not supported on array LUNs.
  • It is not supported on SSDs.
  • If you sanitize both SES disks in the same ESH shelf at the same time, you see errors on the console about access to that shelf, and shelf warnings are not reported for the duration of the sanitization. However, data access to that shelf is not interrupted.

I've also read that you shouldn't sanitize more than 6 disks at once, and that you shouldn't sanitize disks across shelves at the same time. I'm going to sanitize our 14 disks in batches of 5, 5 and 4.

Licensing disk sanitization

Once you've got your license you'll need to install it. Log in to your NetApp via SSH and run the following:

netapp> license add <DISK SANITIZATION LICENSE>

You will not be able to remove this license, are you sure you
wish to continue? [no] yes
A disk_sanitization site license has been installed.
        Disk Sanitization enabled.

Thu Apr 19 10:00:28 PDT [rc:notice]: disk_sanitization licensed
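
If you want to double-check that it took, running 'license' with no arguments lists the licensed features, and disk_sanitization should now show up in that list:

netapp> license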

Sanitizing your disks

1. Identify what disks you want to sanitize

netapp> sysconfig -r

Aggregate aggr0 (online, raid_dp) (block checksums)
  Plex /aggr0/plex0 (online, normal, active)
    RAID group /aggr0/plex0/rg0 (normal)

      RAID Disk Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      --------- ------  ------------- ---- ---- ---- ----- --------------    --------------
      dparity   0a.16   0a    1   0   FC:A   -  ATA   7200 211377/432901760  211921/434014304
      parity    0a.17   0a    1   1   FC:A   -  ATA   7200 211377/432901760  211921/434014304
      data      0a.18   0a    1   2   FC:A   -  ATA   7200 211377/432901760  211921/434014304
      data      0a.19   0a    1   3   FC:A   -  ATA   7200 211377/432901760  211921/434014304
      data      0a.20   0a    1   4   FC:A   -  ATA   7200 211377/432901760  211921/434014304
      data      0a.21   0a    1   5   FC:A   -  ATA   7200 211377/432901760  211921/434014304
      data      0a.22   0a    1   6   FC:A   -  ATA   7200 211377/432901760  211921/434014304
      data      0a.23   0a    1   7   FC:A   -  ATA   7200 211377/432901760  211921/434014304
      data      0a.24   0a    1   8   FC:A   -  ATA   7200 211377/432901760  211921/434014304
      data      0a.25   0a    1   9   FC:A   -  ATA   7200 211377/432901760  211921/434014304
      data      0a.26   0a    1   10  FC:A   -  ATA   7200 211377/432901760  211921/434014304
      data      0a.29   0a    1   13  FC:A   -  ATA   7200 211377/432901760  211921/434014304
      data      0a.28   0a    1   12  FC:A   -  ATA   7200 211377/432901760  211921/434014304

Spare disks

RAID Disk       Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
---------       ------  ------------- ---- ---- ---- ----- --------------    --------------
Spare disks for block or zoned checksum traditional volumes or aggregates
spare           0a.27   0a    1   11  FC:A   -  ATA   7200 211377/432901760  211921/434014304 (not zeroed)

Here I’ve got 13 disks in aggr0 and the 14th acting as a spare. I need to delete aggr0 to free up the disks to be sanitized.

2. Delete the aggregate the disks are part of

netapp> aggr offline aggr0
Aggregate 'aggr0' is now offline.

netapp> aggr destroy aggr0
Are you sure you want to destroy this aggregate? yes
Aggregate 'aggr0' destroyed.

3. Verify all the disks you want to sanitize are now spares

netapp> sysconfig -r

Spare disks

RAID Disk       Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
---------       ------  ------------- ---- ---- ---- ----- --------------    --------------
Spare disks for block or zoned checksum traditional volumes or aggregates
spare           0a.16   0a    1   0   FC:A   -  ATA   7200 211377/432901760  211921/434014304 (not zeroed)
spare           0a.17   0a    1   1   FC:A   -  ATA   7200 211377/432901760  211921/434014304 (not zeroed)
spare           0a.18   0a    1   2   FC:A   -  ATA   7200 211377/432901760  211921/434014304 (not zeroed)
spare           0a.19   0a    1   3   FC:A   -  ATA   7200 211377/432901760  211921/434014304 (not zeroed)
spare           0a.20   0a    1   4   FC:A   -  ATA   7200 211377/432901760  211921/434014304 (not zeroed)
spare           0a.21   0a    1   5   FC:A   -  ATA   7200 211377/432901760  211921/434014304 (not zeroed)
spare           0a.22   0a    1   6   FC:A   -  ATA   7200 211377/432901760  211921/434014304 (not zeroed)
spare           0a.23   0a    1   7   FC:A   -  ATA   7200 211377/432901760  211921/434014304 (not zeroed)
spare           0a.24   0a    1   8   FC:A   -  ATA   7200 211377/432901760  211921/434014304 (not zeroed)
spare           0a.25   0a    1   9   FC:A   -  ATA   7200 211377/432901760  211921/434014304 (not zeroed)
spare           0a.26   0a    1   10  FC:A   -  ATA   7200 211377/432901760  211921/434014304 (not zeroed)
spare           0a.27   0a    1   11  FC:A   -  ATA   7200 211377/432901760  211921/434014304 (not zeroed)
spare           0a.28   0a    1   12  FC:A   -  ATA   7200 211377/432901760  211921/434014304 (not zeroed)
spare           0a.29   0a    1   13  FC:A   -  ATA   7200 211377/432901760  211921/434014304 (not zeroed)

4. Sanitize the first batch of disks (7 passes)

netapp> disk sanitize start -c 7 0a.16 0a.17 0a.18 0a.19 0a.20

WARNING:  The sanitization process may include a disk format.
If the system is power cycled or rebooted during a disk format
the disk may become unreadable. The process will attempt to
restart the format after 10 minutes.

The time required for the sanitization process may be quite long
depending on the size of the disk and the number of patterns and
cycles specified.
Do you want to continue (y/n)? y

The disk sanitization process has been initiated.  You will be notified via the system log when it is complete.
Thu Apr 19 11:10:41 PDT [disk.failmsg:error]: Disk 0a.20 (XXXXXXXX): message received.
Thu Apr 19 11:10:41 PDT [disk.failmsg:error]: Disk 0a.19 (XXXXXXXX): message received.
Thu Apr 19 11:10:41 PDT [disk.failmsg:error]: Disk 0a.18 (XXXXXXXX): message received.
Thu Apr 19 11:10:41 PDT [disk.failmsg:error]: Disk 0a.17 (XXXXXXXX): message received.
Thu Apr 19 11:10:41 PDT [disk.failmsg:error]: Disk 0a.16 (XXXXXXXX): message received.
Thu Apr 19 11:10:41 PDT [raid.disk.unload.done:info]: Unload of Disk 0a.20 Shelf 1 Bay 4 [NETAPP   X262_SGLXY250SSX AQNZ] S/N [XXXXXXXX] has completed successfully
Thu Apr 19 11:10:41 PDT [raid.disk.unload.done:info]: Unload of Disk 0a.19 Shelf 1 Bay 3 [NETAPP   X262_SGLXY250SSX AQNZ] S/N [XXXXXXXX] has completed successfully
Thu Apr 19 11:10:41 PDT [raid.disk.unload.done:info]: Unload of Disk 0a.18 Shelf 1 Bay 2 [NETAPP   X262_SGLXY250SSX AQNZ] S/N [XXXXXXXX] has completed successfully
Thu Apr 19 11:10:41 PDT [raid.disk.unload.done:info]: Unload of Disk 0a.17 Shelf 1 Bay 1 [NETAPP   X262_SGLXY250SSX AQNZ] S/N [XXXXXXXX] has completed successfully
Thu Apr 19 11:10:41 PDT [raid.disk.unload.done:info]: Unload of Disk 0a.16 Shelf 1 Bay 0 [NETAPP   X262_SGLXY250SSX AQNZ] S/N [XXXXXXXX] has completed successfully

You can periodically check the status of the sanitization by running:

netapp> disk sanitize status
sanitization for 0a.16 is 2 % complete
sanitization for 0a.18 is 2 % complete
sanitization for 0a.19 is 2 % complete
sanitization for 0a.17 is 2 % complete
sanitization for 0a.20 is 2 % complete

When the disks have been sanitized, if you want to re-use them instead of replacing them, run this command:

netapp> disk sanitize release disk_list

# Example
netapp> disk sanitize release 0a.16 0a.17 0a.18 0a.19 0a.20

This will add the sanitized disks to the spare pool.
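
Once the first batch finishes, repeat step 4 for the remaining disks. In this example the next two batches would be kicked off the same way, again with 7 passes:

netapp> disk sanitize start -c 7 0a.21 0a.22 0a.23 0a.24 0a.25
netapp> disk sanitize start -c 7 0a.26 0a.27 0a.28 0a.29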

There are a few options you can customize with the 'disk sanitize' command.

disk sanitize start [-p pattern1|-r [-p pattern2|-r [-p pattern3|-r]]] [-c cycle_count] disk_list

-p pattern1 -p pattern2 -p pattern3 specifies a cycle of one to three user-defined hex byte overwrite patterns that can be applied in succession to the disks being sanitized. The default pattern is three passes, using 0x55 for the first pass, 0xaa for the second pass, and 0x3c for the third pass.

-r replaces a patterned overwrite with a random overwrite for any or all of the passes.

-c cycle_count specifies the number of times the specified overwrite patterns will be applied. The default value is one cycle. The maximum value is seven cycles.

disk_list specifies a space-separated list of the IDs of the spare disks to be sanitized.
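
As an example of the syntax above, a single cycle made up of two fixed patterns followed by a random third pass over two of the spares from this example would look something like this (the disk IDs and patterns here are just illustrative):

netapp> disk sanitize start -p 0x55 -p 0xaa -r -c 1 0a.21 0a.22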


22 thoughts on “How to securely erase your data on a NetApp”

  1. Thanks! This is the only documentation I found that included both:

    – the requirement to make disks spares

    – the commands to make disks spares

  2. Great Post….

    Do you have supporting documentation going into the restrictions once it's enabled? Also, I've seen some comments stating the maximum number would be ((# of disks - # of shelves) - # disks in root volume), and that both SES (SCSI Enclosure Services) drives of a shelf cannot be sanitized at the same time. The command will allow one SES drive to be sanitized at a time; you have to make a second pass to get the second SES drive.

    Can you maybe share your experience on those comments above?

    Thanks,

    B

    • Unfortunately when I was looking into this I couldn’t find much about the restrictions once this was enabled.

      I haven’t worked with our NetApp in quite a while since I was re-assigned ~2 years ago so I can’t offer you much more information than what is in the blog post.

      If you’re concerned about the limitations give NetApp support a call. I’m sure they can tell you.

  3. When I try to take the aggregate offline it gives me the following error:

    aggr offline: Cannot offline aggregate 'aggr0' because it contains
    one or more flexible volumes.

    do you know a work around for this?

  4. Aggr has vol0, which contains the root:

    filer*> vol status
    Volume State Status Options
    vol0 online raid_dp, flex root, create_ucode=on,
    sis maxdirsize=28835

    Filer*> vol offline vol0
    vol offline: Offlining root volume ‘vol0’ is not allowed.

    Any suggestions would be greatly appreciated.

    • You can't offline vol0 since that's the root volume with the OS on it.

      You'll have to SnapMirror it to another aggregate or leave it alone.

      • Ended up wiping the filers (didn't care if we lost the data on them).

        Your doc worked once we did that.

        Thanks for posting the info.

      • Hello. We must delete all data, including the data on aggregate 0, which contains vol0. How can we do that? Best regards

        • I think aggr0 and vol0 need to be around for the wipe commands to work.

          You might just have to physically destroy those drives if it won’t let you wipe them.

          You could try creating a new aggr1 with a default install of the OS on a fresh vol0 and then from there wipe aggr0 and the old vol0. Just make sure when you build the new vol0 you don’t re-use any sensitive data like passwords and such since you won’t be able to securely wipe it.

  5. Hi, can you post how long it took for the process to complete for your disk types and the commands used? Was it hours, days, etc.? Thanks.

    • I've done SAS (250GB) and SATA (don't remember the size) on a FAS2020.

      Didn’t time it so I can’t say how long it took.

      Commands I used are documented in the article.

  6. What tool did you use to connect to the NetApp filer? I could see the output in different colors and it's nice looking.

    • That's actually just the syntax highlighter I use for this site. It's not the actual output of the NetApp's CLI. Sure would be awesome if it was though.

  7. I am a newbie to NetApp. I have a NetApp DS2246 NAJ-1001. Is there a way I can connect this enclosure to my PC and run KillDisk to sanitize the hard disks?

  8. Hello Eric,

    Excellent post. Are you able to source any details with regards to what happens to "failed disks" for data protection and compliance? Typically the failed drive will evacuate its data over to the parity drives, thus staying online, and then the disk gets replaced, but what actually happens to the failed drive itself? Is it possible to sanitize it? Although, for the obvious reason you rightly pointed out, this may not be possible because of whatever made the disk fail in the first place.

    So is the only really secure way to dispose of the data on the disks to retain the disks (RMA) and then dispose of them yourself or via a 3rd party? I wasn't sure if there was any documentation to provide reassurance that, given the way WAFL works across disks, and maybe some sort of background process that can be triggered (i.e. a sanitizer), this would meet some compliance needs.

    Many thanks in advance,

    Adrian

    • You can always run a magnet over the disk before returning it to NetApp.

      Once the disk has failed I don't think you can do much with it within the NetApp. They are just SAS/SATA disks. You could remove them from the caddy and DBAN them or something, return them to the caddy and then hope NetApp doesn't notice when you ship the disk back.

      I've done some storage RFPs as part of my job, and NetApp says they securely destroy the disks when they get them back from an RMA. Also, depending on your service contract, you can opt to keep the disks and then destroy them yourself.

  9. Hello.
    I have a problem with my AFF 8080 system. I need to prep it for decommissioning and to zero out all data on all drives. The problem is that I have thousands of LUNs on two vservers and thousands of volumes. How can I take them offline and delete them faster? Manually specifying each LUN path for "offline" or "destroy" is out of the question here. It could take days, weeks.

    • I suspect you can do “lun delete -vserver *” and then the same thing again with “vol offline -vserver *” and “vol destroy”.

      If not, you can certainly do "lun show" and "vol show", copy/paste that output into a text editor or Excel, add the commands you need around the names with a little magic, and then copy/paste it all back into the CLI. The sketch below shows the general idea.
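
      To give a rough idea, the generated list you paste back into the clustershell might end up looking something like this (the vserver, volume, and LUN names are just placeholders, you may also need the "vol unmount" step if the volume is still mounted in the namespace, and you'd want to sanity-check a few lines by hand before running the lot):

      lun offline -vserver vs1 -path /vol/vol_data001/lun0
      lun delete -vserver vs1 -path /vol/vol_data001/lun0
      vol unmount -vserver vs1 -volume vol_data001
      vol offline -vserver vs1 -volume vol_data001
      vol destroy -vserver vs1 -volume vol_data001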

