Durability and Availability

Exaba is designed for more than 12 nines (99.9999999999%) of object durability: in a given year, the probability of losing any one object is less than one in a trillion. This is in line with the durability targets common to hyperscale object storage. This page explains how that target is met at the level of the mechanisms involved, rather than as a sizing calculation.
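
The per-object figure can be translated into a whole-store figure. By linearity of expectation, the expected number of objects lost in a year is the sum of the per-object loss probabilities, whether or not those losses are independent. The object count N = 10^10 below is a hypothetical example, not an Exaba figure.

```latex
P_{\text{loss}} < 10^{-12} \text{ per object per year}
\quad\Longrightarrow\quad
\mathbb{E}[\text{objects lost per year}] = \sum_{i=1}^{N} P_{\text{loss},i} < N \cdot 10^{-12}
\qquad
N = 10^{10} \;\Rightarrow\; \mathbb{E} < 10^{-2}
```

In other words, a store of ten billion objects would on average lose fewer than one object per hundred years at the design target.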

How Exaba protects your data

Erasure coding (MARS). Exaba’s Multiple Area Reed-Solomon parity engine splits every object into data and parity shards and spreads them across drives and nodes, so the object survives the loss of multiple drives or nodes and can be reconstructed from the surviving shards. How many failures it tolerates is configurable (see Configurable resilience).
Fault-domain-aware placement. Shards of the same object are placed across independent fault domains (drive, node, rack), so that the failure of any single fault domain cannot remove more shards than the erasure scheme tolerates.
Online through failure. When a drive or node fails within the cluster’s configured resilience, the cluster remains available with no data loss: affected objects are reconstructed from their surviving shards on read. The failed drive or node can then be evicted, and full parity is rebuilt in the background. See Drive failure.
Durable before acknowledgement. A write is committed to durable, replicated staging before the client receives an acknowledgement.
End-to-end checksums. Data is checksummed end to end to detect and repair silent corruption (bit rot).
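
The combination of erasure coding, reconstruction on read, and checksums can be illustrated with a toy systematic Reed-Solomon code. This is a minimal sketch under stated assumptions, not the MARS engine: it works over the prime field GF(257) so that plain integer arithmetic suffices (production Reed-Solomon codes normally use GF(2^8)), it uses a SHA-256 digest per shard as a stand-in for whatever checksum Exaba uses, and the names `encode` and `decode` are hypothetical. Data shards are values of a polynomial of degree below k, parity shards are further values of the same polynomial, so any k intact shards determine the rest. A shard whose checksum fails is treated exactly like a lost one.

```python
import hashlib

P = 257  # prime field size; every byte value 0..255 is a valid symbol


def _checksum(symbols):
    """SHA-256 of a shard, two bytes per symbol (parity symbols can reach 256)."""
    return hashlib.sha256(b"".join(v.to_bytes(2, "big") for v in symbols)).hexdigest()


def _lagrange_at(xs, ys, x):
    """Evaluate the unique polynomial of degree < len(xs) through (xs, ys) at x, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse of den
    return total


def encode(data, k, m):
    """Split data into k data shards plus m parity shards, each stored with a checksum."""
    size = -(-len(data) // k)  # symbols per shard (ceiling division)
    padded = data.ljust(size * k, b"\0")
    data_shards = [list(padded[i * size:(i + 1) * size]) for i in range(k)]
    xs = list(range(1, k + 1))  # data shard i holds the polynomial's values at x = i + 1
    parity_shards = [
        [_lagrange_at(xs, [s[c] for s in data_shards], x) for c in range(size)]
        for x in range(k + 1, k + m + 1)  # parity shards extend the same polynomial
    ]
    return [(s, _checksum(s)) for s in data_shards + parity_shards]


def decode(stored, k, length):
    """Rebuild the original bytes from any k shards that are present and verify."""
    good = []
    for x, entry in enumerate(stored, start=1):
        if entry is None:
            continue  # shard lost along with its drive or node
        symbols, digest = entry
        if _checksum(symbols) != digest:
            continue  # silent corruption detected: treat the shard as missing
        good.append((x, symbols))
    if len(good) < k:
        raise ValueError("more shards lost or corrupt than the code tolerates")
    xs = [x for x, _ in good[:k]]
    size = len(good[0][1])
    out = bytearray()
    for target in range(1, k + 1):  # re-derive each data shard in order
        for c in range(size):
            out.append(_lagrange_at(xs, [s[c] for _, s in good[:k]], target))
    return bytes(out[:length])


# Usage: a 4+2 layout survives one lost shard plus one silently corrupted shard.
payload = b"an object spread across drives and nodes"
stored = encode(payload, k=4, m=2)
stored[1] = None  # the drive holding the second data shard fails
symbols, digest = stored[4]
stored[4] = ([symbols[0] ^ 1] + symbols[1:], digest)  # bit rot in a parity shard
assert decode(stored, k=4, length=len(payload)) == payload
```

The design point this shows is that checksums only detect corruption; the repair comes from the erasure code, which rebuilds the bad shard's contents from the good ones.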

Configurable resilience

Durability is not fixed. You set resilience targets for a cluster, and the parity engine derives the erasure-coding layout that meets them while maximising usable capacity. You choose:

Node-failure tolerance: how many whole nodes can fail at once without data loss.
Drive-failure tolerance: how many additional drives can fail beyond that.

Higher resilience means more parity and slightly less usable capacity; lower resilience favours capacity. Exaba computes the optimal shard layout from these targets at cluster creation, and the scheme applies cluster-wide. Per-bucket policy controls versioning, Object Lock, and compression, not the erasure scheme.
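
The trade-off between the two targets and usable capacity can be sketched with a simple rule. This is a hypothetical model, not Exaba's derivation: it assumes every stripe spans all nodes with the same number of shards on each node, each shard on a different drive. Under that assumption, losing a node costs up to `shards_per_node` shards and each further drive failure costs at most one, so the parity count must cover both. The names `Layout` and `layout_for` and the 8-node cluster are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    data: int    # k: data shards per stripe
    parity: int  # m: parity shards per stripe

    @property
    def usable_fraction(self) -> float:
        """Share of raw capacity that holds object data rather than parity."""
        return self.data / (self.data + self.parity)


def layout_for(nodes, shards_per_node, node_tolerance, drive_tolerance):
    """Hypothetical rule for deriving k + m from resilience targets.

    Assumes each stripe places `shards_per_node` shards on every node, each on a
    separate drive, so the worst-case loss is node_tolerance whole nodes plus
    drive_tolerance further drives on the surviving nodes.
    """
    width = nodes * shards_per_node
    parity = node_tolerance * shards_per_node + drive_tolerance
    data = width - parity
    if data < 1:
        raise ValueError("resilience targets leave no room for data shards")
    return Layout(data=data, parity=parity)


# Raising node tolerance from 1 to 2 adds parity and lowers the usable fraction.
for node_tol, drive_tol in [(1, 1), (2, 1)]:
    layout = layout_for(nodes=8, shards_per_node=2,
                        node_tolerance=node_tol, drive_tolerance=drive_tol)
    print(node_tol, drive_tol, layout, f"{layout.usable_fraction:.1%}")
```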

Durability and availability are different

Durability is whether your data is safe. Erasure coding, fault-domain placement, and background rebuilds together deliver Exaba’s design target of more than 12 nines of durability.
Availability is whether your data is reachable. Exaba clusters are active/active: every node serves reads and writes and there is no single point of failure, so losing a drive, node, or rack within the configured resilience reduces headroom rather than access.

Durability is not immutability

Durability and availability both concern hardware failure. Protecting data against ransomware, malicious deletion, or accidental overwrite is a separate property, immutability, provided by S3 Object Lock. See Features and Veeam Backup & Replication.

Nines are not the whole story

Durability figures expressed in “nines” assume random, independent drive failures. In practice, data loss is usually driven by correlated events: a firmware bug across a batch of drives, an enclosure or power-distribution failure, a rack outage, or human error. Erasure coding alone does not protect against these.
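
The gap between the two cases can be made concrete with a deliberately simplified model. It assumes a stripe of n shards on n drives, that p is the probability that a given drive fails before its shards are rebuilt, and that data is lost when more than m shards are gone. Under independence the loss probability is a binomial tail. The second function adds one correlated event, such as a firmware bug, with probability q, that fails every drive from a single batch at once. All parameter values are hypothetical, and the model ignores rebuild timing and other real-world detail.

```python
from math import comb


def loss_independent(n, m, p):
    """P(more than m of n shards lost) if each drive fails independently with prob p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(m + 1, n + 1))


def loss_with_batch_event(n, m, p, q, shards_in_batch):
    """Same model plus one correlated event, with probability q, that fails every
    drive from a single batch and so removes `shards_in_batch` shards at once."""
    if shards_in_batch > m:
        if_event = 1.0  # the event alone exhausts the parity budget
    else:
        # the remaining shards must absorb further failures with the parity left over
        if_event = loss_independent(n - shards_in_batch, m - shards_in_batch, p)
    return q * if_event + (1 - q) * loss_independent(n, m, p)


n, m, p, q = 12, 3, 1e-3, 1e-4  # hypothetical stripe and failure probabilities
print(f"independent only:       {loss_independent(n, m, p):.1e}")
print(f"all 12 shards in batch: {loss_with_batch_event(n, m, p, q, 12):.1e}")
print(f"2 shards in batch:      {loss_with_batch_event(n, m, p, q, 2):.1e}")
```

With these numbers the correlated event, not the independent failures, dominates the loss probability whenever one batch holds more shards than the parity can absorb; spreading the stripe so each batch holds only a few shards brings the result back close to the independent case.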

Exaba mitigates correlated failures with fault-domain-aware placement, so that a single enclosure, rack, or power feed cannot take out more shards than the scheme tolerates.
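
A fault-domain-aware placement rule can be sketched as round-robin selection: across racks first, then across nodes within each rack, with a check that no rack ends up holding more shards than the scheme tolerates. This is an illustrative sketch, not Exaba's placement algorithm; the topology layout, the names `interleave` and `place_shards`, and the 9+3 stripe are assumptions made for the example.

```python
from collections import Counter


def interleave(groups):
    """Round-robin over lists: [[a1, a2], [b1]] -> [a1, b1, a2]."""
    out, iters = [], [iter(g) for g in groups]
    while iters:
        alive = []
        for it in iters:
            item = next(it, None)
            if item is not None:
                out.append(item)
                alive.append(it)
        iters = alive
    return out


def place_shards(topology, width, max_per_rack):
    """Pick `width` drives for one stripe, spreading across racks, then nodes.

    topology: {rack: {node: [drive, ...]}}. Returns (rack, node, drive) tuples.
    Raises if any rack would hold more shards than the scheme tolerates.
    """
    per_rack = [
        interleave([[(rack, node, d) for d in drives] for node, drives in nodes.items()])
        for rack, nodes in topology.items()
    ]
    chosen = interleave(per_rack)[:width]
    if len(chosen) < width:
        raise ValueError("not enough drives for this stripe width")
    worst = max(Counter(rack for rack, _, _ in chosen).values(), default=0)
    if worst > max_per_rack:
        raise ValueError(f"one rack would hold {worst} shards; limit is {max_per_rack}")
    return chosen


# Hypothetical cluster: 4 racks x 2 nodes x 3 drives; a 9+3 stripe tolerates 3 lost shards.
topology = {f"rack{r}": {f"node{r}{n}": [f"drive{d}" for d in range(3)]
                         for n in range(2)} for r in range(4)}
stripe = place_shards(topology, width=12, max_per_rack=3)
print(Counter(rack for rack, _, _ in stripe))
```

Losing any one rack here removes at most three shards, which the 9+3 stripe absorbs; squeezing the same stripe into two racks would put six shards in each, and the check rejects it.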

Sizing the scheme

The resilience targets, and the erasure-coding scheme derived from them, are chosen for each cluster from your durability and capacity goals and set at cluster creation. See Hardware Sizing for the sizing process, or talk to Exaba.