Physical backup and recovery architecture
cloudnative-mysql physical backups are Percona XtraBackup streams stored in an
S3-compatible object store. Each Backup object describes one backup run. It
selects a source instance, creates a worker Job, uploads a single xbstream
archive plus metadata, and records enough status for a later Cluster recovery.
Once completed, a Backup is immutable.
Point-in-time recovery (PITR) builds on this base-backup mechanism. This page focuses on the physical backup path and on restoring to the backup's consistent point.
Object store configuration
Backups need an S3-compatible object store, configured either on the source Cluster or directly on the Backup object. On the Cluster it looks like this:
```yaml
spec:
  backup:
    objectStore:
      bucket: cloudnative-mysql-backups
      path: production
      endpoint: http://minio.minio.svc:9000
      region: us-east-1
      forcePathStyle: true
      credentials:
        accessKeyId:
          name: minio-creds
          key: accessKey
        secretAccessKey:
          name: minio-creds
          key: secretKey
```
When Backup.spec.objectStore is omitted, the Backup inherits
Cluster.spec.backup.objectStore. The objectStore configuration supports custom
endpoints, path-style addressing, S3 signature v2 or v4, TLS CA bundles,
insecure endpoints for testing, server-side encryption, storage classes, static
credentials, and IAM role inheritance.
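The inheritance rule can be sketched as a small resolution function. This is a minimal sketch under stated assumptions: specs are modeled as plain dictionaries mirroring the YAML, the Backup's objectStore replaces the Cluster's as a whole (no field-level merge), and resolve_object_store and MissingObjectStoreError are hypothetical names, not the operator's API.

```python
from typing import Any, Mapping, Optional


class MissingObjectStoreError(Exception):
    """Raised when neither the Backup nor its Cluster configures an object store."""


def resolve_object_store(
    backup_spec: Mapping[str, Any], cluster_spec: Mapping[str, Any]
) -> Mapping[str, Any]:
    """Return the object store a Backup should use (hypothetical helper).

    Backup.spec.objectStore wins when present; otherwise the Backup inherits
    Cluster.spec.backup.objectStore unchanged.
    """
    own: Optional[Mapping[str, Any]] = backup_spec.get("objectStore")
    if own:
        return own
    inherited = (cluster_spec.get("backup") or {}).get("objectStore")
    if inherited:
        return inherited
    raise MissingObjectStoreError("missing object-store configuration")


cluster = {"backup": {"objectStore": {"bucket": "cloudnative-mysql-backups", "path": "production"}}}
print(resolve_object_store({}, cluster)["bucket"])  # cloudnative-mysql-backups
```

Failing loudly when both are absent mirrors the "missing object-store configuration" failure surface listed later on this page.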
Creating a backup
A one-shot Backup references its Cluster. For a quick on-demand backup with default settings, use the kubectl plugin:
```sh
kubectl cloudnative-mysql backup cluster-sample
```
This creates a Backup object with method: xtrabackup, target: prefer-standby,
and online: true. The Backup reconciler then runs the XtraBackup-to-object-store
data path for it.
You can also define a Backup by hand when you need more control:
```yaml
apiVersion: mysql.cloudnative-mysql.io/v1alpha1
kind: Backup
metadata:
  name: backup-sample
spec:
  cluster:
    name: cluster-sample
  method: xtrabackup
  target: prefer-standby
  online: true
```
With target: prefer-standby, the backup is taken from a healthy replica when one
is available, falling back to the primary otherwise. With target: primary, the
backup is always taken from the current primary.
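The two target modes can be sketched as a selection function. This is a minimal sketch, assuming instances are reduced to a name, a primary flag, and a health flag; Instance, select_backup_source, and the alphabetical tie-break among replicas are hypothetical choices for illustration, since the real tie-break is not documented here.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Instance:
    name: str
    is_primary: bool
    healthy: bool


class NoHealthySourceError(Exception):
    """Raised when no instance satisfies the requested backup target."""


def select_backup_source(instances: Sequence[Instance], target: str) -> Instance:
    """Pick the backup source for a target (hypothetical sketch)."""
    if target == "prefer-standby":
        replicas = [i for i in instances if not i.is_primary and i.healthy]
        if replicas:
            # Deterministic pick for the sketch only.
            return sorted(replicas, key=lambda i: i.name)[0]
        # No healthy replica: fall through to the primary.
    elif target != "primary":
        raise ValueError(f"unknown backup target {target!r}")
    primaries = [i for i in instances if i.is_primary and i.healthy]
    if primaries:
        return primaries[0]
    raise NoHealthySourceError("no healthy source instance")


pods = [
    Instance("cluster-sample-1", is_primary=True, healthy=True),
    Instance("cluster-sample-2", is_primary=False, healthy=True),
    Instance("cluster-sample-3", is_primary=False, healthy=False),
]
print(select_backup_source(pods, "prefer-standby").name)  # cluster-sample-2
print(select_backup_source(pods, "primary").name)         # cluster-sample-1
```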
The reconciler creates a Kubernetes Job owned by the Backup. The Job uses the same cloudnative-mysql instance image as the Cluster so the XtraBackup version matches the server version. Object-store credentials are mounted into the short-lived Job, not into the long-running database Pods.
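One way to scope credentials to the Job is to reference the Secret only from the Job's container env. This is a minimal sketch, assuming the credentials block from the YAML above; the env var names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY and the helper credential_env are hypothetical, while the valueFrom.secretKeyRef.name/key layout is the standard Kubernetes EnvVar shape.

```python
from typing import Any, Dict, List, Optional

# Hypothetical env var names; the operator's actual wiring is not documented here.
ENV_NAMES = {
    "accessKeyId": "AWS_ACCESS_KEY_ID",
    "secretAccessKey": "AWS_SECRET_ACCESS_KEY",
}


def credential_env(credentials: Optional[Dict[str, Dict[str, str]]]) -> List[Dict[str, Any]]:
    """Render env entries intended for the backup Job's container only.

    Each entry points at a Secret key via valueFrom.secretKeyRef, so the
    value is resolved when the Job's Pod starts. The database Pods receive
    no such entries. With IAM role inheritance (no static credentials),
    nothing is rendered.
    """
    if not credentials:
        return []
    env = []
    for field, env_name in ENV_NAMES.items():
        ref = credentials[field]
        env.append({
            "name": env_name,
            "valueFrom": {"secretKeyRef": {"name": ref["name"], "key": ref["key"]}},
        })
    return env


creds = {
    "accessKeyId": {"name": "minio-creds", "key": "accessKey"},
    "secretAccessKey": {"name": "minio-creds", "key": "secretKey"},
}
job_env = credential_env(creds)
```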
Backup data path
The worker Job:
- resolves the selected source instance;
- connects to its instance-manager endpoint over mTLS;
- requests GET /cluster/backup;
- streams XtraBackup stdout directly to the object store;
- computes SHA256 while uploading;
- uploads metadata.json;
- exits and lets the controller mirror the Job outcome into the Backup status.
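Computing the checksum while uploading means the bytes are hashed in the same single pass that sends them. This is a minimal sketch, assuming the source is the XtraBackup stdout pipe and the sink is an object-store upload writer; here both are io.BytesIO stand-ins, and stream_with_sha256 and the 1 MiB chunk size are hypothetical.

```python
import hashlib
import io
from typing import BinaryIO, Tuple

CHUNK_SIZE = 1 << 20  # 1 MiB; illustrative value


def stream_with_sha256(source: BinaryIO, sink: BinaryIO) -> Tuple[str, int]:
    """Copy source to sink chunk by chunk, hashing the same bytes in one pass.

    Returns the hex SHA256 digest and the number of bytes copied. The payload
    is never logged and never held in memory as a whole.
    """
    digest = hashlib.sha256()
    total = 0
    while True:
        chunk = source.read(CHUNK_SIZE)
        if not chunk:
            break
        digest.update(chunk)
        sink.write(chunk)
        total += len(chunk)
    return digest.hexdigest(), total


xtrabackup_stdout = io.BytesIO(b"example payload")  # stand-in for the XtraBackup pipe
upload = io.BytesIO()                               # stand-in for an upload writer
sha256_hex, size = stream_with_sha256(xtrabackup_stdout, upload)
```

The resulting digest and size are what a metadata.json upload would then record.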
The backup stream itself is payload data and is never logged. The child process's stderr is captured as structured logs.
Object layout
cloudnative-mysql uses deterministic, inspectable object keys:
```text
<path>/<cluster>/<backup-name>/<backup-id>/backup.xbstream
<path>/<cluster>/<backup-name>/<backup-id>/metadata.json
```
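The layout is simple enough to derive from four values. This is a minimal sketch; backup_object_keys is a hypothetical helper, "example-id" stands in for a real backup ID whose format is not documented here, and rejecting empty or slash-containing segments is an assumption added so a bad name cannot shift the layout.

```python
import posixpath


def backup_object_keys(path: str, cluster: str, backup_name: str, backup_id: str) -> dict:
    """Build archive and metadata keys for the documented layout (hypothetical helper)."""
    for part in (cluster, backup_name, backup_id):
        if not part or "/" in part:
            raise ValueError(f"invalid key segment {part!r}")
    # Strip stray slashes from the configured path so keys never start with "/".
    prefix = posixpath.join(path.strip("/"), cluster, backup_name, backup_id)
    return {
        "archive": posixpath.join(prefix, "backup.xbstream"),
        "metadata": posixpath.join(prefix, "metadata.json"),
    }


keys = backup_object_keys("production", "cluster-sample", "backup-sample", "example-id")
print(keys["archive"])  # production/cluster-sample/backup-sample/example-id/backup.xbstream
```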
metadata.json includes cluster identity, Backup identity, source instance,
method, backup ID, object key, timing, size, checksum, and GTID/binlog metadata
when available.
The SHA256 recorded in cloudnative-mysql metadata is the integrity source of truth. The S3 ETag can be useful provider metadata, but it is not used as the backup checksum: for multipart uploads, the ETag is not a digest of the whole object.
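On the restore side, checking integrity amounts to recomputing SHA256 over the downloaded archive and comparing it with the recorded value. This is a minimal sketch; verify_archive and ChecksumMismatchError are hypothetical names, and the top-level "sha256" key in metadata.json is an assumed field name.

```python
import hashlib
import io
import json
from typing import BinaryIO


class ChecksumMismatchError(Exception):
    """Raised when a downloaded archive does not match metadata.json."""


def verify_archive(archive: BinaryIO, metadata_json: str, chunk_size: int = 1 << 20) -> str:
    """Recompute the archive's SHA256 and compare it with the recorded value.

    The "sha256" key is a hypothetical metadata.json field name.
    """
    expected = json.loads(metadata_json)["sha256"].lower()
    digest = hashlib.sha256()
    # Hash in bounded chunks so large archives never sit in memory at once.
    for chunk in iter(lambda: archive.read(chunk_size), b""):
        digest.update(chunk)
    actual = digest.hexdigest()
    if actual != expected:
        raise ChecksumMismatchError(f"checksum mismatch: expected {expected}, got {actual}")
    return actual


payload = b"example archive bytes"
meta = json.dumps({"sha256": hashlib.sha256(payload).hexdigest()})
verify_archive(io.BytesIO(payload), meta)  # returns the digest; raises on mismatch
```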
Backup status
The Backup status reports:
- phase: pending, running, completed, or failed;
- instanceName: the selected backup source;
- method;
- backupId;
- jobName;
- destinationPath;
- sha256;
- beginGTID and endGTID;
- beginBinlog and endBinlog;
- startedAt and stoppedAt;
- error;
- conditions.
A completed Backup is not rerun. To take another backup, create another Backup
object or use a ScheduledBackup.
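Mirroring the Job outcome into the phase field can be sketched as a pure function over the Job's status. This is a minimal sketch, assuming the Kubernetes Job status fields active and conditions (with the standard Complete and Failed condition types); next_phase is hypothetical, and treating failed as terminal is an assumption, while keeping completed terminal follows the rule above.

```python
from typing import Any, Mapping, Optional

# "completed" is terminal per this page; "failed" being terminal is an assumption.
TERMINAL_PHASES = {"completed", "failed"}


def next_phase(current: str, job_status: Optional[Mapping[str, Any]]) -> str:
    """Derive the Backup phase from the owned Job's status (hypothetical sketch)."""
    if current in TERMINAL_PHASES:
        return current  # a finished Backup is never rerun
    if job_status is None:
        return "pending"  # Job not created yet
    for cond in job_status.get("conditions") or []:
        if cond.get("status") != "True":
            continue
        if cond.get("type") == "Complete":
            return "completed"
        if cond.get("type") == "Failed":
            return "failed"
    if job_status.get("active", 0) > 0:
        return "running"
    return "pending"


print(next_phase("pending", {"active": 1}))  # running
```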
Restore from a backup
A new Cluster can recover from a completed Backup:
```yaml
apiVersion: mysql.cloudnative-mysql.io/v1alpha1
kind: Cluster
metadata:
  name: restored-cluster
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-mysql/cloudnative-mysql-instance:8.4
  storage:
    size: 10Gi
  bootstrap:
    recovery:
      backup:
        name: backup-sample
  backup:
    objectStore:
      bucket: cloudnative-mysql-backups
      path: production
      endpoint: http://minio.minio.svc:9000
      credentials:
        accessKeyId:
          name: minio-creds
          key: accessKey
        secretAccessKey:
          name: minio-creds
          key: secretKey
```
The recovery init container downloads backup.xbstream, verifies its checksum,
extracts it, runs XtraBackup prepare and copy-back into the data directory, and
starts the first instance as the recovered primary. Additional replicas then
clone from that recovered primary through the normal replica join path.
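The steps after the checksum check map onto standard Percona XtraBackup commands. This is a minimal sketch that only builds the argument lists and does not run them; the init container's exact invocation is not documented here, restore_steps and Step are hypothetical, the paths are illustrative, and compressed or encrypted archives would need extra flags not shown.

```python
import shlex
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    argv: List[str]
    stdin_path: Optional[str] = None  # file fed on stdin, if any


def restore_steps(archive: str, work_dir: str, data_dir: str) -> List[Step]:
    """Extract, prepare, and copy back a physical backup (illustrative only)."""
    return [
        # Unpack the xbstream archive (read from stdin) into work_dir.
        Step(["xbstream", "-x", "-C", work_dir], stdin_path=archive),
        # Apply the redo log so the files reach a consistent point.
        Step(["xtrabackup", "--prepare", f"--target-dir={work_dir}"]),
        # Copy the prepared files into the (empty) MySQL data directory.
        Step(["xtrabackup", "--copy-back", f"--target-dir={work_dir}", f"--datadir={data_dir}"]),
    ]


for step in restore_steps("/restore/backup.xbstream", "/restore/extract", "/var/lib/mysql"):
    redirect = f" < {shlex.quote(step.stdin_path)}" if step.stdin_path else ""
    print(shlex.join(step.argv) + redirect)
```

Printed, this yields `xbstream -x -C /restore/extract < /restore/backup.xbstream` followed by the prepare and copy-back commands.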
Without recoveryTarget, the Cluster restores to the backup's consistent point.
With recoveryTarget, the PITR path replays archived binlogs after the base
backup restore.
Restore from raw object store (no Backup CR)
A Cluster can also recover directly from an object-store bucket without any
Backup object existing in the API server. This is the disaster-recovery path:
the source cluster's API server (and its Backup CRs) may be gone, GC'd by
retention, or in another cluster entirely. Recovery is driven entirely by the
objects already in S3.
Point bootstrap.recovery.source at an externalClusters entry. The entry
carries its own objectStore (bucket, path, credentials), and its name is the
key prefix the backups were stored under: it must match the <cluster> segment
of the object layout, that is, the name of the Cluster that wrote the backups.
No source Cluster or Backup CR needs to exist anywhere.
```yaml
apiVersion: mysql.cloudnative-mysql.io/v1alpha1
kind: Cluster
metadata:
  name: recovered-cluster
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-mysql/cloudnative-mysql-instance:8.4
  storage:
    size: 10Gi
  bootstrap:
    recovery:
      source: prod-cluster     # externalClusters entry name = S3 key prefix
      backupID: ""             # empty = latest; set to pin a specific backup
      recoveryTarget:          # optional PITR, identical to the Backup path
        targetGTID: "uuid:1-99"
  externalClusters:
    - name: prod-cluster
      objectStore:
        bucket: cloudnative-mysql-backups
        path: production
        endpoint: http://minio.minio.svc:9000
        credentials:
          accessKeyId:
            name: minio-creds
            key: accessKey
          secretAccessKey:
            name: minio-creds
            key: secretKey
```
The operator lists the base backups under that prefix, selects the latest
completed one (or the one matching backupID, when set), derives the archive
and metadata keys, and restores exactly as the Backup-based path does. The
source and backup fields are mutually exclusive. PITR with recoveryTarget works
the same way: the binlog archive is resolved from the same object store under
the source name.
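Backup discovery and selection can be sketched over a flat list of object keys. This is a minimal sketch under several assumptions: a backup counts as completed when its directory holds both backup.xbstream and metadata.json (the worker uploads metadata last), "latest" is decided by a stoppedAt timestamp read from each metadata.json, backupID matches the final <backup-id> segment, and complete_backup_dirs and select_backup_dir are hypothetical helpers.

```python
import posixpath
from datetime import datetime
from typing import Dict, Iterable, List, Set

ARCHIVE, METADATA = "backup.xbstream", "metadata.json"


def complete_backup_dirs(keys: Iterable[str], prefix: str) -> List[str]:
    """Directories under prefix that contain both the archive and metadata.json."""
    files: Dict[str, Set[str]] = {}
    root = prefix.rstrip("/") + "/"  # avoid matching "prod-cluster-2" for "prod-cluster"
    for key in keys:
        if key.startswith(root):
            directory, name = posixpath.split(key)
            files.setdefault(directory, set()).add(name)
    return sorted(d for d, names in files.items() if {ARCHIVE, METADATA} <= names)


def select_backup_dir(dirs: List[str], stopped_at: Dict[str, datetime], backup_id: str = "") -> str:
    """Pick the pinned backup, or the latest completed one when backup_id is empty."""
    if not dirs:
        raise LookupError("no base backups found under the source prefix")
    if backup_id:
        matches = [d for d in dirs if posixpath.basename(d) == backup_id]
        if not matches:
            raise LookupError(f"backupID {backup_id!r} is not present in the object store")
        return matches[0]
    return max(dirs, key=lambda d: stopped_at[d])


keys = [
    "production/prod-cluster/nightly-1/id-1/backup.xbstream",
    "production/prod-cluster/nightly-1/id-1/metadata.json",
    "production/prod-cluster/nightly-2/id-2/backup.xbstream",
    "production/prod-cluster/nightly-2/id-2/metadata.json",
    "production/prod-cluster/nightly-3/id-3/backup.xbstream",  # no metadata yet
    "production/other-cluster/x/id-9/backup.xbstream",
]
dirs = complete_backup_dirs(keys, "production/prod-cluster")
stopped = {dirs[0]: datetime(2025, 1, 1), dirs[1]: datetime(2025, 1, 2)}
print(select_backup_dir(dirs, stopped))  # production/prod-cluster/nightly-2/id-2
```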
Failure surfaces
Common failure points are reported through Backup phase, Backup conditions, Job status, and Events:
- missing referenced Cluster;
- missing object-store configuration;
- unsupported backup method;
- no healthy source instance;
- mTLS connection failure to the source instance manager;
- XtraBackup failure;
- object-store upload or download failure;
- checksum mismatch;
- failed restore prepare or copy-back;
- raw-S3 recovery: source does not name an externalClusters entry;
- raw-S3 recovery: the referenced external cluster entry has no objectStore;
- raw-S3 recovery: no base backups found under the source prefix;
- raw-S3 recovery: the requested backupID is not present in the object store.
The controller-manager never handles backup payload bytes. Large data movement stays in Jobs and init containers so retries are isolated and observable through Kubernetes primitives.
Operational notes
- Keep completed Backup objects for as long as recovery clusters may reference them.
- Preserve both backup.xbstream and metadata.json; recovery needs the metadata as well as the archive bytes.
- For recovery, use a cloudnative-mysql instance image compatible with the same MySQL major version as the backup source.
- Prefer standby backups for large clusters when replicas can absorb the backup load.
- Object-store retention and immutability are external responsibilities until cloudnative-mysql retention GC is implemented.
- A physical backup alone restores only to the backup's consistent point. Enable continuous archiving when PITR is required.
Verification coverage
Unit tests cover object-store key construction, checksum helpers, backup target selection, Job rendering, status transitions, and restore command behavior. The Kind + MinIO e2e suite validates backup upload, metadata writing, restore into a new Cluster, and recovered data correctness.