Replica Rebuilding
When Longhorn detects a failed or deleted replica, it automatically initiates a rebuilding process. This document outlines the replica rebuilding workflow for the v1 data engine, covering the full, delta, and fast rebuilding methods and the limitations of each.
Rebuilding will not start in the following scenarios:
Replica rebuilding may be triggered in the following scenarios for the v1 data engine:
WO (Write-Only) mode.
If the replica is unrecoverable or has no existing data, Longhorn synchronizes all data from a healthy replica. It reconstructs the replica by transferring the full snapshot chain.
Full replica rebuilding consumes significant network bandwidth and results in heavy disk write operations on the target node. However, it is required when the target replica has no usable data.
Delta replica rebuilding is available only for the v1 data engine. It starts from a reusable failed replica and checks the data integrity of every snapshot block by block.
It applies only when a failed replica is reused and a snapshot file with the same name already exists in the failed replica's data directory.
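The block-by-block check described above can be illustrated with a minimal Python sketch. It is not Longhorn's implementation: the function name `blocks_to_transfer`, the 4096-byte block size, and the use of SHA-256 are assumptions chosen only to show why a reusable replica usually needs far less data transferred than an empty one.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for this example


def blocks_to_transfer(healthy: bytes, stale: bytes,
                       block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the indices of blocks that differ between two snapshot images."""
    differing = []
    longest = max(len(healthy), len(stale))
    total_blocks = (longest + block_size - 1) // block_size
    for index in range(total_blocks):
        start = index * block_size
        src = healthy[start:start + block_size]
        dst = stale[start:start + block_size]
        # Compare digests of each block; only mismatching blocks need copying.
        if hashlib.sha256(src).digest() != hashlib.sha256(dst).digest():
            differing.append(index)
    return differing


healthy = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
stale = b"A" * 4096 + b"X" * 4096 + b"C" * 4096
print(blocks_to_transfer(healthy, stale))  # only the middle block differs
```

Every block of both copies still has to be read and hashed, which is where the cost of delta rebuilding comes from even when few blocks are transferred.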
When a snapshot has no checksum, Longhorn performs delta replica rebuilding for this snapshot instead.
Pros:
Cons:
Fast rebuilding is enabled when:
The fast replica rebuilding setting is enabled:
fast-replica-rebuild-enabled: true
Snapshot checksum files are created (the snapshot checksums are precomputed) via one of the following methods:
snapshot-data-integrity is set to enabled: a scheduled job calculates checksums for all snapshots at a configured interval (default: 7 days), or
snapshot-data-integrity-immediate-check-after-snapshot-creation is set to true: the snapshot checksum is calculated immediately after snapshot creation.
Note: These checksum calculations consume storage and computing resources. The calculation time is unpredictable and may negatively impact storage performance.
For more details, see Snapshot Data Integrity.
Pros:
Cons:
For more details, see Fast Replica Rebuilding.
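The conditions for the three methods can be summarized in a short Python sketch. The `ReplicaState` type and `choose_rebuild_method` function are hypothetical names invented for illustration, not part of Longhorn; the decision is made per snapshot and follows only the rules stated in this document.

```python
from dataclasses import dataclass


@dataclass
class ReplicaState:
    has_usable_data: bool       # False for a new or unrecoverable replica
    snapshot_file_exists: bool  # same-named snapshot file in the failed replica's data directory
    checksum_available: bool    # a precomputed checksum exists for this snapshot


def choose_rebuild_method(state: ReplicaState, fast_rebuild_enabled: bool) -> str:
    """Pick the rebuild method for one snapshot (illustrative only)."""
    # No reusable data: everything must be copied from a healthy replica.
    if not state.has_usable_data or not state.snapshot_file_exists:
        return "full"
    # Precomputed checksums let the block-by-block integrity check be skipped.
    if fast_rebuild_enabled and state.checksum_available:
        return "fast"
    # Reusable data without a checksum falls back to delta rebuilding.
    return "delta"


print(choose_rebuild_method(ReplicaState(True, True, False), fast_rebuild_enabled=True))
```

The example call returns the delta method because the snapshot has no checksum, matching the fallback described above.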
Enable snapshot-data-integrity-immediate-check-after-snapshot-creation or snapshot-data-integrity, so checksums are precomputed.
Trade-off: Increases CPU, disk I/O, and storage usage during checksum computation.
Enable snapshot-data-integrity-immediate-check-after-snapshot-creation to ensure the checksums are generated after purging.
Limit the number of concurrent rebuilds per node with the concurrent-replica-rebuild-per-node-limit setting.
If the auto-cleanup-system-generated-snapshot setting is true and no user-created snapshots exist, then when two replicas fail before either has been rebuilt, Longhorn must perform at least one full data transfer to restore volume health.
Consider disabling auto-cleanup-system-generated-snapshot before performing maintenance.
When a worker node with replicas is rebooted as part of a planned upgrade:
If the node returns within the replica-replenishment-wait-interval, Longhorn initiates a rebuild using the reusable failed replica.
During the rebuilding process:
If a worker node is drained for short-term maintenance and then quickly restored:
If the node is restored before the replica-replenishment-wait-interval expires, Longhorn attempts to reuse the failed replica.
Setting | Default | Description |
---|---|---|
fast-replica-rebuild-enabled | true | Enables fast replica rebuilding. Relies on precomputed snapshot checksums. |
snapshot-data-integrity | fast-check | Hashes snapshot disk files only if they are unhashed or their modification time has changed. |
snapshot-data-integrity-cronjob | 0 0 */7 * * | Cron schedule to compute checksums for all snapshots (default: every 7 days). |
snapshot-data-integrity-immediate-check-after-snapshot-creation | false | If enabled, checksums are computed immediately after snapshot creation. |
replica-replenishment-wait-interval | 600 | Time in seconds to wait before creating a new replica, allowing reuse of failed replicas. |
concurrent-replica-rebuild-per-node-limit | 5 | Limits the number of concurrent replica rebuilds per node. |
offline-replica-rebuilding | false | Determines if degraded replicas are rebuilt while the volume is detached. |
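The default snapshot-data-integrity-cronjob value, 0 0 */7 * *, is described as "every 7 days", but in standard cron syntax a step in the day-of-month field counts from the start of the range 1-31 and restarts each month. The sketch below lists the matching days. It assumes standard cron step semantics apply to this setting, and `days_matching_step` is a helper name made up for this example.

```python
def days_matching_step(step: int, first_day: int = 1, last_day: int = 31) -> list[int]:
    """Days of the month matched by a '*/step' day-of-month cron field."""
    # '*/step' expands to first_day, first_day + step, ... within the field's range.
    return list(range(first_day, last_day + 1, step))


print(days_matching_step(7))  # [1, 8, 15, 22, 29]
```

Under that assumption the job runs at midnight on the 1st, 8th, 15th, 22nd, and 29th, so the gap between the last run of one month and the first run of the next is shorter than seven days.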
snapshot-data-integrity-cronjob (default: 0 0 */7 * *): when the snapshot-data-integrity setting is enabled, this schedule defines when snapshot checksums are recalculated. Snapshots created within this interval may lack precomputed checksums.
Delta rebuilding will be used if checksums are missing.
replica-replenishment-wait-interval: 600 seconds.
concurrent-replica-rebuild-per-node-limit: 5.
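The effect of replica-replenishment-wait-interval can be sketched as a small decision function. This is a simplified model under stated assumptions, not Longhorn's scheduler: `replenishment_action` and its return values are hypothetical, and the sketch does not model what happens when a failed replica reappears after a replacement has already been created.

```python
def replenishment_action(seconds_since_failure: float,
                         failed_replica_available: bool,
                         wait_interval: int = 600) -> str:
    """Decide how to restore a missing replica (illustrative only)."""
    # A reusable failed replica avoids a full data transfer, so prefer it.
    if failed_replica_available:
        return "reuse-failed-replica"
    # Still inside the wait interval: hold off on creating a new replica.
    if seconds_since_failure < wait_interval:
        return "wait"
    # The interval expired without the failed replica coming back.
    return "create-new-replica"


print(replenishment_action(120, failed_replica_available=False))  # wait
print(replenishment_action(120, failed_replica_available=True))   # reuse-failed-replica
print(replenishment_action(900, failed_replica_available=False))  # create-new-replica
```

A longer interval gives rebooted or drained nodes more time to return and be reused, at the cost of the volume staying degraded longer when the node does not come back.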
© 2019-2025 Longhorn Authors | Documentation Distributed under CC-BY-4.0