
I recently detected a failing hard drive in my ZFS raid-5 (raidz1) array. So I bought a new drive, shut the machine down and swapped it in for the failing one. I'm afraid I should have removed the failing drive from the pool first, because it is causing big trouble right now...

      pool: maxtorage
     state: DEGRADED
    status: One or more devices could not be used because the label is missing or
    invalid.  Sufficient replicas exist for the pool to continue
    functioning in a degraded state.
    action: Replace the device using 'zpool replace'.
       see: http://zfsonlinux.org/msg/ZFS-8000-4J
      scan: scrub repaired 0B in 7h10m with 0 errors on Sun Jul 11 07:34:21 2021
    config:
    
    NAME                     STATE     READ WRITE CKSUM
    maxtorage                DEGRADED     0     0     0
      raidz1-0               DEGRADED     0     0     0
        sdd                  ONLINE       0     0     0
        sdf                  ONLINE       0     0     0
        3022016455510322769  UNAVAIL      0     0     0  was /dev/sda1
        sde                  ONLINE       0     0     0
    cache
      sdg                    ONLINE       0     0     0
    
    errors: No known data errors

If I try to replace it:

    $ sudo zpool replace maxtorage 3022016455510322769 /dev/sdc
    invalid vdev specification
    use '-f' to override the following errors:
    /dev/sdc1 is part of active pool 'maxtorage'

Running zpool labelclear -f /dev/sdc1 doesn't change anything.

When I try to remove sdc (or sdc1) from the pool:

    $ sudo zpool remove maxtorage /dev/sdc1
    cannot remove /dev/sdc1: no such device in pool
    $ sudo zpool remove maxtorage /dev/sdc
    cannot remove /dev/sdc: no such device in pool

I'm stuck and not sure how to fix my pool. Does anyone have a tip for me?

The ZFS documentation (zfsonlinux.org/msg/ZFS-8000-4J) says:

If the device has been replaced by another disk in the same physical slot, then the device can be replaced using a single argument to the 'zpool replace' command:

zpool replace test c0t0d1

ZFS will begin migrating data to the new device as soon as the replace is issued. Once the resilvering completes, the original device (if different from the replacement) will be removed, and the pool will be restored to the ONLINE state.

    $ sudo zpool replace maxtorage /dev/sda
    cannot open '/dev/sda': No medium found
    internal error: No medium found
    Aborted (SIGABRT)

(The drive was not found and the command aborted.)
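The single-argument form only works if the old device path can still be opened, and here /dev/sda no longer can. The missing disk can instead be named by the numeric GUID that zpool status prints in its place (3022016455510322769 above). A minimal sketch for pulling that GUID out of zpool status output; the function name unavail_guids is hypothetical, and it assumes the usual column layout (device name, state, then the error counters):

```shell
# Hypothetical helper: print the GUID of each vdev that `zpool status` reports
# as UNAVAIL. A missing disk is listed by a numeric GUID instead of a device
# name, so field 1 must be all digits and field 2 must be the state UNAVAIL.
# Usage: zpool status maxtorage | unavail_guids
unavail_guids() {
    awk '$2 == "UNAVAIL" && $1 ~ /^[0-9]+$/ { print $1 }'
}
```

The printed GUID is what can then be given to zpool replace as the device being replaced.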

czechDude

1 Answer


So I figured it out eventually. The main issue was using drive letters (sda, sdb, ...) in my pool. These can change any time you plug in another drive. This answer helped me a lot: https://serverfault.com/a/953026
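A quick way to see which vdevs a pool still references by kernel names is to filter zpool status output. A minimal sketch; the helper name kernel_named_vdevs is hypothetical, and it only recognises sd-style names (sda, sdf1, ...), not nvme or other device names:

```shell
# Hypothetical check: list vdevs in `zpool status` output that are still
# referenced by kernel names such as sdd or sda1, which can change between boots.
# Usage: zpool status maxtorage | kernel_named_vdevs
kernel_named_vdevs() {
    # Only the device column (field 1) is matched against sd<letters>[<digits>].
    awk '$1 ~ /^sd[a-z]+[0-9]*$/ { print $1 }'
}
```

Run against the status in the question, this would list sdd, sdf, sde and the cache device sdg.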

    # zpool export maxtorage
    # zpool import -d /dev/disk/by-id maxtorage

I then edited /etc/default/zfs to use /dev/disk/by-id.
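The answer doesn't say which line of /etc/default/zfs was changed. In the Debian/Ubuntu packaging of ZFS on Linux, that file carries a ZPOOL_IMPORT_PATH variable (commented out by default) listing the directories searched when pools are imported at boot, so the edit was plausibly something like the fragment below. Treat the variable name and value as an assumption and check the comments in your own copy of the file:

```shell
# /etc/default/zfs (fragment): assumed setting, not copied from the answer.
# Directories searched for device nodes/links when importing pools.
ZPOOL_IMPORT_PATH="/dev/disk/by-id"
```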

Then I listed the stable device names:

    # ls -la /dev/disk/by-id/

and found the ID of the new drive: wwn-0x5000c500c8599b96
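ls -la shows each by-id name as a symlink to the disk's current kernel name (for example ../../sdc). The same lookup can be scripted; below is a sketch with a hypothetical helper, by_id_names, that prints every by-id name pointing at a given whole-disk device. Partition links such as wwn-...-part1 point at sdc1 and are therefore skipped:

```shell
# Hypothetical helper: print the /dev/disk/by-id names that point at a given
# kernel device (e.g. sdc), so the stable name can be used with zpool.
# Usage: by_id_names sdc              (searches /dev/disk/by-id)
#        by_id_names sdc /some/dir    (searches another directory of symlinks)
by_id_names() {
    dev=$1
    dir=${2:-/dev/disk/by-id}
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        # by-id entries are relative symlinks like ../../sdc; compare basenames.
        if [ "$(basename "$(readlink "$link")")" = "$dev" ]; then
            basename "$link"
        fi
    done
}
```

A SATA disk usually has several by-id names (ata-*, wwn-*); any of them stays stable across reboots, and the answer uses the wwn- one.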

    # zpool export maxtorage

I had to force (-f) the label clear, because it reported that the new drive might be part of the pool:

    # zpool labelclear -f wwn-0x5000c500c8599b96
    # zpool import maxtorage
    # zpool replace maxtorage 3022016455510322769 wwn-0x5000c500c8599b96
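Collected in one place, the steps above can be wrapped in a small dry-run script so the sequence can be reviewed before anything touches the pool. This is a sketch, not part of the original answer: run and replace_missing_disk are hypothetical helpers, and DRY_RUN (defaulting to 1) makes run print each command instead of executing it.

```shell
# Hypothetical wrapper: with DRY_RUN=1 (the default) print the command,
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# replace_missing_disk POOL OLD_GUID NEW_ID
# Steps, in order: re-import so members are recorded by /dev/disk/by-id names,
# export again and wipe the stale label on the new disk, then import and
# resilver onto the new disk in place of the missing vdev.
# The && chain stops at the first command that fails.
replace_missing_disk() {
    pool=$1 old_guid=$2 new_id=$3
    run zpool export "$pool" &&
    run zpool import -d /dev/disk/by-id "$pool" &&
    run zpool export "$pool" &&
    run zpool labelclear -f "$new_id" &&
    run zpool import "$pool" &&
    run zpool replace "$pool" "$old_guid" "$new_id"
}
```

With the default DRY_RUN, replace_missing_disk maxtorage 3022016455510322769 wwn-0x5000c500c8599b96 only prints the six commands, each prefixed with "+"; exporting DRY_RUN=0 first would run them for real.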

Result (notice that the data drives are no longer referenced as sd[a-z]):

      pool: maxtorage
     state: DEGRADED
    status: One or more devices is currently being resilvered.  The pool will
            continue to function, possibly in a degraded state.
    action: Wait for the resilver to complete.
      scan: resilver in progress since Tue Sep 14 20:22:45 2021
            5,78T scanned out of 6,91T at 116M/s, 2h49m to go
            1,45T resilvered, 83,75% done
    config:

    NAME                          STATE     READ WRITE CKSUM
    maxtorage                     DEGRADED     0     0     0
      raidz1-0                    DEGRADED     0     0     0
        wwn-0x5000c500a82110c8    ONLINE       0     0     0
        wwn-0x5000c500a81d9855    ONLINE       0     0     0
        replacing-2               DEGRADED     0     0     0
          3022016455510322769     UNAVAIL      0     0     0  was /dev/sda1
          wwn-0x5000c500c8599b96  ONLINE       0     0     0  (resilvering)
        wwn-0x5000c500a81da899    ONLINE       0     0     0
    cache
      sdg                         ONLINE       0     0     0

    errors: No known data errors
czechDude