ZFS arrays do not support what you may know from hardware RAID controllers as online capacity expansion (growing an existing array by adding individual disks to it).
There are two ways to increase the capacity of a ZFS pool: either add more disks to the pool (e.g. another vdev of 3 disks in RAIDZ1), or replace all of the existing disks with larger ones. The latter is the method discussed here today. These instructions assume that you have followed my installation guide for ZFS. If you have varied from that guide at all, you may need to adjust the instructions below. I am not responsible for any data loss caused by following these instructions!
NOTE: you can only increase the size of a mirror or raidz1/2/3 pool using this method.
I am replacing the 4 x 3TB disks in my storage array with 4 x 4TB disks. This is a time-consuming process and is risky if you are using a mirror or RAIDZ1 (as you have to degrade the array!). If you do not have full backups of the contents, proceed at your own risk. (If you’re using RAIDZ2 then you’re merely at reduced resilience and a little safer.)
First, we want to make sure that the autoexpand property is enabled. This can be set at any time with the following command:
zpool set autoexpand=on zroot
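If you want to confirm the property took effect, you can query it back (your SOURCE column may read local or default depending on when and how it was set):

zpool get autoexpand zroot
NAME PROPERTY VALUE SOURCE
zroot autoexpand on local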
Next, check the status of your ZFS pool to make sure it is healthy… here’s the command and the output from my array:
zpool status
pool: zroot
state: ONLINE
scan: scrub repaired 0 in 6h38m with 0 errors on Thu Nov 8 16:06:21 2012
config:

NAME STATE READ WRITE CKSUM
zroot ONLINE 0 0 0
  raidz1-0 ONLINE 0 0 0
    ada0p2 ONLINE 0 0 0
    ada1p2 ONLINE 0 0 0
    ada2p2 ONLINE 0 0 0
    ada3p2 ONLINE 0 0 0

errors: No known data errors
As you can see, my array consists of 4 members (ada0p2 through ada3p2) and is currently healthy. We’re good to proceed!
First we shut down the machine and replace one of the disks… I prefer to start with the last disk and work backwards, so I’m going to replace ada3… Once replaced, start the machine up again.
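(As an aside: if your controller and backplane support hot-swapping, you could skip the shutdowns by offlining the device first with the command below and swapping the disk live. I prefer the shutdown approach, so treat this as an untested alternative.)

zpool offline zroot ada3p2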
Now we can confirm that the disk is missing (and confirm which one) as follows… (command and output listed):
zpool status
pool: zroot
state: DEGRADED
status: One or more devices has been removed by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using ‘zpool online’ or replace the device with
‘zpool replace’.
scan: scrub repaired 0 in 6h38m with 0 errors on Thu Nov 8 16:06:21 2012
config:

NAME STATE READ WRITE CKSUM
zroot DEGRADED 0 0 0
  raidz1-0 DEGRADED 0 0 0
    ada0p2 ONLINE 0 0 0
    ada1p2 ONLINE 0 0 0
    ada2p2 ONLINE 0 0 0
    5075744959138230672 REMOVED 0 0 0 was /dev/ada3p2

errors: No known data errors
You can see that my ada3p2 device is now missing and the array is degraded (it will run slower while degraded, but there is no data loss unless another disk fails during this long process).
Now we need to partition the newly installed ada3 disk so that it is bootable and contains a large ZFS partition for us to use… commands as follows:
gpart create -s gpt ada3
gpart add -s 128 -t freebsd-boot ada3
gpart add -t freebsd-zfs -l disk3 ada3
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada3
The above creates a GPT partition table, adds a small boot loader partition, and assigns the remainder of the disk to ZFS. It then installs the boot loader into the small partition.
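It’s worth sanity-checking the new layout before continuing… the following should show a small freebsd-boot partition followed by a large freebsd-zfs partition covering the rest of the disk (your exact sector counts will differ):

gpart show ada3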
We are now ready to re-add the disk into the ZFS pool. This will trigger an automatic resilver (a rebuild of the new disk’s contents)…
zpool replace zroot ada3p2 /dev/ada3p2
This command takes a little while to process, so be patient. The resilver stage can take a long time (it depends on how much data you have in the pool, how many disks are in it, and how fast you can read from them!).
You can check on the status of the rebuild with the following command:
zpool status zroot
Here’s an example output so you know what to look for:
pool: zroot
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Wed Nov 14 18:34:57 2012
16.8G scanned out of 6.59T at 116M/s, 16h30m to go
4.19G resilvered, 0.25% done
config:

NAME STATE READ WRITE CKSUM
zroot DEGRADED 0 0 0
  raidz1-0 DEGRADED 0 0 0
    ada0p2 ONLINE 0 0 0
    ada1p2 ONLINE 0 0 0
    ada2p2 ONLINE 0 0 0
    replacing-3 REMOVED 0 0 0
      5075744959138230672 REMOVED 0 0 0 was /dev/ada3p2
      ada3p2 ONLINE 0 0 0 (resilvering)

errors: No known data errors
Once the disk has been fully reconstructed, the array will be healthy again (like at the start), and you can move onto the next disk. Repeat until all disks have been replaced and resilvered.
You will only see the new space once all the disks have finished resilvering.
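An easy way to keep an eye on this is ‘zpool list’. On reasonably recent versions of ZFS, the EXPANDSZ column shows space the pool could grow into, and you should see SIZE jump once the final resilver completes and autoexpand kicks in:

zpool list zroot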
I will note again that your array is vulnerable while in a mirror or raidz1 configuration during this process. If a second disk fails during the resilver of any of the disks in a mirror or raidz1 pool, you will LOSE your data.
What’s going to happen if we apply this method to a GELI encrypted pool?
You’d have to encrypt the new disk and then add it into the pool – quite a bit more complicated 🙁
This is a daft way to increase the size of the pool for a person who does indeed have a pool backup or chassis space for an additional 4x4TB pool.
For those with backups: export the 4x3TB pool, remove the 3TB disks, fit the 4TB disks, set up a new 4x4TB pool and restore your data from backup. If something goes wrong with the restore then you still have your 4x3TB pool, which can be imported.
If chassis space permits, then fit the 4x4TB disks, create your new pool and just copy straight from the 4x3TB pool. Even quicker and lower risk.
Matthew, sometimes you need to call a spade a spade, and you are a class A arrogant moron. The whole point of this blog post is that we don’t have magnetic tape backups and spare drives lying around for an entire new storage pool; if we did, practically all logistical problems would turn into ‘backup and restore’, as your brainless reply suggests.
Hey there,
Just a simple question…
What happens if you replace just one disk?
Does the ZFS pool increase or not?
My servers are a bit far away, so I’d like to replace my 3TB disks with 6TB disks one at a time over a while…
Thanks for the reply!
You don’t get any additional storage until you complete the entire process, so replacing a single drive won’t increase available space.
When you replace the final drive (and finish resilvering), the extra space will appear.
Similar to comment above, how can I follow this procedure for an encrypted zpool?
(In summary) For an encrypted zpool, you would need to prepare the replacement disk in the same way you did when creating the pool originally (using gpart to create all the partitions, and using geli to encrypt and attach the partition… take care NOT to create a new encryption.key file or you may lose all the data on all disks!). Then you would use zpool replace with the encrypted device name (ending .eli) to ensure that it remains encrypted.
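As a very rough sketch only (the key file location, cipher, key length and sector size below are placeholders… you MUST match whatever you used when the pool was first created, and add the passphrase-related flags if you used one):

geli init -b -e AES-XTS -l 256 -s 4096 -K /root/encryption.key /dev/ada3p2
geli attach -k /root/encryption.key /dev/ada3p2
zpool replace zroot ada3p2.eli /dev/ada3p2.eli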
As usual… I’m not responsible for any data loss… make sure you have full backups etc etc etc!
Hi Dan, appreciate this very much. One question though: why are you making each disk bootable?
If the first disk fails, the system will need to boot from the 2nd disk etc.
Making them all bootable means the BIOS may boot from any disk in the set.
what is the reason to put the GPT bootloader on each of the drives?
If you only have the bootloader on the first drive, you wouldn’t be able to boot if it failed…
By having it on multiple drives, you can boot from any of them (so long as you have sufficient pool members available of course!)
You don’t strictly need it on every drive, but it makes administration a lot easier if they are all identical 🙂
As an aside why are you booting off of your pool? You should be doing that off a USB stick or something similar.
Second question. I have done the above for all 16 drives of a pool and the end result is no increase in size. Is there something weird about FreeNAS that doesn’t allow this?
I trust my multiple SATA drives far more than any USB stick… so I boot from the pool.
(if you’re running FreeNAS then you can’t boot from any data drive, so generally people boot from USB stick)
I’ve not used FreeNAS in many years, certainly before they had ZFS available.
FreeNAS may not turn the autoexpand option on, which would prevent you from seeing an increase in size.
You might find a reboot will give the extra space, but often it doesn’t.
If not, you can try “zpool online -e POOL DEV1 DEV2…” where POOL is your zpool name, and DEV1 DEV2 etc are ALL of your devices (from a zpool status) – the “-e” flag tells it to expand into any remaining space.
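Using the device names from my pool above as an example, that would be (substitute your own pool and device names):

zpool online -e zroot ada0p2 ada1p2 ada2p2 ada3p2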
You don’t need to take the pool offline to use the online command.
Note: The above info is from the man pages, and I’ve never tried it personally. As always, ensure you have backups etc etc etc!
Can’t this be done without degrading the pool? If I’m not mistaken, you could simply connect another disk and then run the replace command with the old one still connected. ZFS will detach the old disk after the replacement is complete. Also, as data in this scenario can be copied from the drive you are replacing, I believe the resilvering process should go significantly faster as well. If you lack permanent space for a fifth disk in the box, a cheap PCIe SATA card could be used just during the replacement. You mention the risk with a degraded pool, and it is a risk I feel would be better avoided altogether. Watching that resilvering chug on for over 30 hours when you know a single further disk failure will eat your data is pretty scary.
A great suggestion, and yes – that would work fine!
Most people cram their devices full of disks to begin with, which is why the article suggests replacing each disk in the pool in place instead.
Certainly, if you have the ability to add further drives then go for it!
Of course, you could also use an array type (raidz2/3) that allows for multiple drive failures.
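For anyone wanting to try the non-degraded route: assuming your new disk appears as ada4 and you’ve partitioned it as described earlier (ada4 is a placeholder for wherever yours shows up), the replace looks like this… ZFS rebuilds onto the new disk while the old one stays active, then detaches the old one automatically:

zpool replace zroot ada3p2 /dev/ada4p2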
Hey Dan,
Great article! I have a pool of 6 mirrored pairs (RAID1-style, 12 physical drives).
If I replaced both disks of a single mirror pair with larger drives using “replace”, could I then get the added space for only that single upgraded mirror by running the command “zpool online -e POOL DEV1 DEV2”? The rest of the mirrors in my pool would still continue to be the original lower-capacity drives. I expect it would work, but was just curious if there was something I was missing. It sure would be nice to replace drives slowly as required at the best possible price point, rather than doubling my pool size every 4 years and replacing all the disks.
Thanks!
The pool is exactly that, a pool of resources… so long as you upgrade every member of one part of the pool (e.g. if you have a pool full of mirrors, then you just have to increase the capacity of one set of mirrors), you get the extra space in the pool. In a ‘zpool status’ output, you can see each part of the pool indented separately.
Assuming your pool is 12 disks arranged as 6 pairs then each time you upgrade a pair of disks that are listed as mirror you will get the increase in the pool.
(hope that makes sense!)
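As a hedged sketch for one mirror pair (the pool and device names below are placeholders for your own): replace each half in turn, let each resilver finish, then ask the pair to expand (the online -e step shouldn’t be needed if autoexpand=on, but it doesn’t hurt):

zpool replace tank da0 da12
zpool replace tank da1 da13
zpool online -e tank da12 da13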
Perfectly clear. I suspected as much but it’s good to know for certain as I’m broaching on 85% full and needing to grow a little, but don’t really feel like shelling out for all new drives yet. Thanks again for the super fast reply too!
If I have a pool of 24 disks that is a stripe over a few raidz2, will this work?
NAME STATE READ WRITE CKSUM
minus_tmp ONLINE 0 0 0
  raidz2-0 ONLINE 0 0 0
    sdb ONLINE 0 0 0
    sdc ONLINE 0 0 0
    sdd ONLINE 0 0 0
    sde ONLINE 0 0 0
    sdf ONLINE 0 0 0
    sdh ONLINE 0 0 0
  raidz2-1 ONLINE 0 0 0
    sda ONLINE 0 0 0
    sdg ONLINE 0 0 0
    sdi ONLINE 0 0 0
    sdj ONLINE 0 0 0
    sdk ONLINE 0 0 0
    sdl ONLINE 0 0 0
  raidz2-2 ONLINE 0 0 0
    sdm ONLINE 0 0 0
    sdn ONLINE 0 0 0
    sdo ONLINE 0 0 0
    sdp ONLINE 0 0 0
    sdq ONLINE 0 0 0
    sdr ONLINE 0 0 0
  raidz2-3 ONLINE 0 0 0
    sds ONLINE 0 0 0
    sdt ONLINE 0 0 0
    sdu ONLINE 0 0 0
    sdv ONLINE 0 0 0
    sdw ONLINE 0 0 0
    sdx ONLINE 0 0 0
I should have read all the comments. The answer is there! It should work.
Yea it should work… it’ll take a long time with all those drives to resilver though!
As each group of raidz2 disks is upgraded, you should see the increased free space.
Dan, I was wondering if you could help me. A friend of mine helped me build a server. He’s the brains behind it, and I just like building PCs. I’d never used Solaris before but he suggested building a raidz1 set of arrays using Solaris, and I usually get access to the server via napp-it on my PC. As I started running out of space on the drives he suggested that I replace the drives one-by-one in a single raidz1. I’ve done that: replaced, one at a time, each 2TB HDD with a 4TB HDD, resilvered it (took about 8 hrs), then moved on to the next drive. This particular raidz had 5x 2TB HDDs but now has 5x 4TB HDDs. The problem is, the free space has not appeared. My friend is now incommunicado so I haven’t been able to figure out the correct way for the array to see the increased size.
Any suggestions? Thanks in advance.
Hi Dan – I actually figured it out after re-reading your article. I used the “zpool set autoexpand=on zroot” command and voilà, the space appeared!
Thanks for the article.
I’m glad it worked for you. The autoexpand property is disabled by default 🙂
Hi Dan.
I do not have much experience with ZFS, so I wanted to ask one question.
Could there be any problems when I replace a failing drive in a raidz1 group with a bigger drive (1000GB) and leave the rest of the group at a lower capacity (500GB)?
I am not looking for a capacity increase yet; I just want the pool to stay safe.
Thank you very much.
There’s no issue with using a larger drive (other than you will not get the extra space in the array) – it will resilver onto it without a problem and your resilience will be restored.