Operations runbooks
cloudnative-mysql ships a kubectl plugin, kubectl-cloudnative-mysql, that wraps common day-two
operations. Install it once:
make install-plugin
Most commands accept an optional CLUSTER argument. When you omit it, the
plugin picks the only cluster in the current namespace and warns if there are
several.
Commands in this guide use cluster-sample as the Cluster name.
Inspect cluster state
kubectl cloudnative-mysql status
kubectl cloudnative-mysql status cluster-sample
Add -w or --watch to refresh every 2s, like watch(1):
kubectl cloudnative-mysql status -w
kubectl cloudnative-mysql status -w --watch-interval=5s
The status command shows instance topology, phase, conditions, and health. For
raw Kubernetes output, kubectl describe cluster and kubectl get events still
work and give you more detail when you need it.
Key status fields on the Cluster resource:
- status.readyInstances
- status.currentPrimary
- status.targetPrimary
- status.gtidExecutedByInstance
- status.divergedInstances
- status.continuousArchiving
- status.phase and status.phaseReason
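You can read these fields directly with kubectl's JSONPath output. The following is a minimal sketch, not part of the plugin. The helper names are hypothetical, and treating "currentPrimary equals targetPrimary" as "no role change in progress" is an assumption about how the operator uses the two fields.

```shell
# Print one field under .status of a Cluster, e.g. currentPrimary or phase.
cluster_status_field() {
  # $1 = cluster name, $2 = field name under .status
  kubectl get cluster "$1" -o jsonpath="{.status.$2}"
}

# Succeed when the primary role looks settled.
# Assumption: a role change is in progress while current != target.
primary_is_settled() {
  current=$(cluster_status_field "$1" currentPrimary) || return 2
  target=$(cluster_status_field "$1" targetPrimary) || return 2
  if [ -n "$current" ] && [ "$current" = "$target" ]; then
    return 0
  fi
  return 1
}
```

For example, `cluster_status_field cluster-sample phase` prints the current phase, and `primary_is_settled cluster-sample` can gate a script that should not run in the middle of a switchover.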
Stream logs
kubectl cloudnative-mysql logs cluster-sample # all instances, merged with a prefix
kubectl cloudnative-mysql logs cluster-sample cluster-sample-2 # single instance
Scale up
kubectl patch cluster cluster-sample --type merge -p '{"spec":{"instances":4}}'
kubectl wait --for=condition=Ready cluster/cluster-sample --timeout=15m
Scale-up is ordered. cloudnative-mysql creates one replica at a time and waits for it to be healthy before creating the next one.
Scale down
kubectl patch cluster cluster-sample --type merge -p '{"spec":{"instances":1}}'
Scale-down removes highest-ordinal replicas first. cloudnative-mysql deletes replica Pods but retains PVCs. It never scales below one instance and does not remove the current primary during normal scale-down.
List retained PVCs:
kubectl get pvc -l mysql.cloudnative-mysql.io/cluster=cluster-sample
Delete retained PVCs only after confirming the data is no longer needed.
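If you script scaling, a small guard can catch bad instance counts before they reach the API server. This is a sketch under stated assumptions: the function names are hypothetical, and the check simply mirrors the rule above that a cluster never goes below one instance.

```shell
# Build the merge patch for spec.instances.
# Rejects empty input, non-digits, zero, and leading zeros (invalid JSON numbers).
build_scale_patch() {
  case "$1" in
    ''|*[!0-9]*|0*)
      echo "instances must be an integer >= 1" >&2
      return 1
      ;;
  esac
  printf '{"spec":{"instances":%s}}' "$1"
}

# Patch the Cluster, then list the PVCs that still exist so you can review them.
scale_cluster() {
  # $1 = cluster name, $2 = desired instance count
  patch=$(build_scale_patch "$2") || return 1
  kubectl patch cluster "$1" --type merge -p "$patch" || return 1
  kubectl get pvc -l "mysql.cloudnative-mysql.io/cluster=$1"
}
```

`scale_cluster cluster-sample 1` applies the same patch as the command above and then prints the PVC list, including any retained after scale-down.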
Planned switchover
cloudnative-mysql follows the CloudNativePG (CNPG) style status transition model. A planned switchover promotes a named healthy replica. Use the plugin:
kubectl cloudnative-mysql promote cluster-sample cluster-sample-2
Watch progress:
kubectl cloudnative-mysql status -w
The operator validates the target, waits for GTID containment, bounds the
operation by spec.maxSwitchoverDelay, and lets the selected instance promote
itself. Role Services move after the database role is safe.
You can also trigger a switchover manually by patching the status subresource:
kubectl patch cluster cluster-sample --subresource=status --type merge \
  -p '{"status":{"targetPrimary":"cluster-sample-2"}}'
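In automation you usually want to block until the switchover has finished. The sketch below polls status.currentPrimary until it matches the requested target. The function name, the 2-second poll interval, and the client-side timeout are illustrative choices. The timeout is separate from spec.maxSwitchoverDelay, which the operator enforces on its side.

```shell
# Poll status.currentPrimary until it equals the expected instance or time runs out.
wait_for_primary() {
  # $1 = cluster, $2 = expected primary, $3 = timeout in seconds (default 300)
  deadline=$(( $(date +%s) + ${3:-300} ))
  while :; do
    current=$(kubectl get cluster "$1" -o jsonpath='{.status.currentPrimary}')
    if [ "$current" = "$2" ]; then
      return 0
    fi
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "timed out waiting for $2 to become primary of $1" >&2
      return 1
    fi
    sleep 2
  done
}
```

A typical use is `kubectl cloudnative-mysql promote cluster-sample cluster-sample-2 && wait_for_primary cluster-sample cluster-sample-2 120`.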
Fence an instance
Fencing takes an instance out of service without deleting it or its data. The Pod stays and the PVC stays, but the instance drops out of all routing Services and is held read-only:
kubectl cloudnative-mysql fence on cluster-sample cluster-sample-2
Unfence it to restore normal routing and role reconciliation:
kubectl cloudnative-mysql fence off cluster-sample cluster-sample-2
The operator tracks fenced instances in status.fencedInstances. A fenced
instance is skipped as a failover candidate. Fencing the primary stops writes
for the cluster because the rw Service has no endpoint. That is deliberate: use
fencing to freeze an instance for inspection or maintenance, not as a failover
trigger.
Automatic failover
Automatic failover is driven by primary health, Pod readiness, and GTID safety.
spec.failoverDelay controls how long cloudnative-mysql waits after detecting the
primary as failed. 0 means immediate failover.
spec:
  failoverDelay: 30
During failover cloudnative-mysql:
- chooses a ready replica with healthy replication SQL state;
- checks that candidate GTID sets are comparable;
- fences the old primary Pod while retaining its PVC;
- sets targetPrimary to the safe candidate;
- updates role labels and Services after promotion.
If GTID sets are divergent or no safe candidate exists, failover is blocked instead of risking data loss.
Former primary rejoin
A former primary that returns after failover starts read-only and follows the current primary if its GTID set is compatible.
If it contains errant transactions, cloudnative-mysql marks it diverged and keeps it out of service. Do not delete the retained PVC until you have decided whether manual recovery is required.
Check:
kubectl cloudnative-mysql status cluster-sample
Look for entries under divergedInstances.
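The same check can run in a script or alert hook. This sketch reads status.divergedInstances through JSONPath and fails when it is non-empty. The exact shape of that field (list or map) is not documented here, so the function treats every common "empty" rendering as "no diverged instances"; that handling and the function name are assumptions.

```shell
# Succeed only when status.divergedInstances is absent or empty.
no_diverged_instances() {
  # $1 = cluster name
  diverged=$(kubectl get cluster "$1" -o jsonpath='{.status.divergedInstances}') || return 2
  case "$diverged" in
    ''|'[]'|'{}'|'map[]')
      return 0
      ;;
  esac
  echo "diverged instances on $1: $diverged" >&2
  return 1
}
```

Exit status 0 means nothing is diverged, 1 means at least one entry was found, and 2 means the Cluster could not be read.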
Restart an instance
Restart all instances in a rolling fashion, or a single instance:
kubectl cloudnative-mysql restart cluster-sample # rolling restart
kubectl cloudnative-mysql restart cluster-sample cluster-sample-2 # single instance
The command prompts for confirmation. Skip the prompt with --yes or -y.
Every instance boots read-only. The in-pod role reconciler observes Cluster status and clears read-only mode only when the instance is the confirmed primary.
Destroy an instance
Delete a single instance Pod and its PVC:
kubectl cloudnative-mysql destroy cluster-sample cluster-sample-3
This command also prompts for confirmation. Use it to clean up a failed or diverged instance you have decided to discard. The remaining instances keep running unaffected.
Reload MySQL parameters
After you change spec.mysql.parameters, apply dynamic parameters without
restarting:
kubectl cloudnative-mysql reload cluster-sample
The plugin connects to each instance over mTLS and asks it to reload its running configuration. It reports any parameters that require a restart; apply those with a follow-up rolling restart.
Update parameters:
kubectl patch cluster cluster-sample --type merge -p \
'{"spec":{"mysql":{"parameters":{"require_secure_transport":"ON"}}}}'
cloudnative-mysql owns replication, backup, PITR, identity, and lifecycle-critical settings. User parameters that conflict with managed keys are rejected by the configuration layer.
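Patching a parameter and reloading are usually done together, so they are easy to wrap. The sketch below is one way to do it. The function names are hypothetical, and the character whitelist is a deliberately conservative assumption that keeps the hand-built JSON valid; it is not a rule the operator imposes.

```shell
# Build a merge patch that sets one key under spec.mysql.parameters.
# Only JSON-safe characters are accepted so no escaping is needed.
build_parameter_patch() {
  # $1 = parameter name, $2 = value
  case "$1" in
    ''|*[!a-z0-9_]*) echo "invalid parameter name: $1" >&2; return 1 ;;
  esac
  case "$2" in
    ''|*[!A-Za-z0-9_.-]*) echo "unsupported parameter value: $2" >&2; return 1 ;;
  esac
  printf '{"spec":{"mysql":{"parameters":{"%s":"%s"}}}}' "$1" "$2"
}

# Patch the Cluster, then ask the plugin to apply dynamic parameters.
set_parameter_and_reload() {
  # $1 = cluster, $2 = parameter name, $3 = value
  patch=$(build_parameter_patch "$2" "$3") || return 1
  kubectl patch cluster "$1" --type merge -p "$patch" || return 1
  kubectl cloudnative-mysql reload "$1"
}
```

`set_parameter_and_reload cluster-sample require_secure_transport ON` sends the same patch as the command above, followed by a reload.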
Take an on-demand backup
Instead of crafting a Backup YAML by hand, use the plugin:
kubectl cloudnative-mysql backup cluster-sample
This creates a Backup object with sensible defaults: xtrabackup method,
prefer-standby target, online mode. The Backup reconciler then runs the actual
XtraBackup job. Track it:
kubectl cloudnative-mysql status cluster-sample
kubectl get backup -l mysql.cloudnative-mysql.io/cluster=cluster-sample
For recurring backups, create a ScheduledBackup resource. See the Scheduled
Backups page for the schedule format and options.
Deleting the Backup Kubernetes object does not delete the remote object-store
artifacts today. Remote cleanup is a planned finalizer/retention feature.
User management
cloudnative-mysql manages MySQL users through the control-tier API, reached over mTLS port-forwarding inside the cluster:
kubectl cloudnative-mysql user create cluster-sample --name=app --password-stdin < secret.txt
kubectl cloudnative-mysql user alter cluster-sample --name=app # prompts for a new password
kubectl cloudnative-mysql user list cluster-sample
kubectl cloudnative-mysql user drop cluster-sample --name=old-user
Passwords are never accepted as flags. Use --password-stdin for piping from a
secret, or let the plugin prompt on the terminal with echo disabled.
Users can be created with optional grants (--superuser), TLS requirements
(--require-x509), and named privileges.
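When the password already lives in a Kubernetes Secret, you can pipe it straight into --password-stdin so it never appears in argv or shell history. This is a sketch: the function name, the Secret name you pass in, and the "password" data key are all assumptions about your own setup.

```shell
# Create a MySQL user whose password is read from a Kubernetes Secret.
create_user_from_secret() {
  # $1 = cluster, $2 = MySQL user name, $3 = Secret name (hypothetical)
  encoded=$(kubectl get secret "$3" -o jsonpath='{.data.password}') || return 1
  if [ -z "$encoded" ]; then
    echo "secret $3 has no 'password' key" >&2
    return 1
  fi
  # Secret data is base64-encoded; decode it and hand it over on stdin only.
  printf '%s' "$encoded" | base64 -d \
    | kubectl cloudnative-mysql user create "$1" --name="$2" --password-stdin
}
```

For example: `create_user_from_secret cluster-sample app app-credentials`.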
Database management
Manage MySQL databases the same way:
kubectl cloudnative-mysql database create cluster-sample --name=analytics
kubectl cloudnative-mysql database list cluster-sample
kubectl cloudnative-mysql database drop cluster-sample --name=analytics
You can specify character set and collation on create:
kubectl cloudnative-mysql database create cluster-sample --name=utf8db --charset=utf8mb4 --collation=utf8mb4_unicode_ci
Node maintenance window
Toggle the maintenance window before draining a node or performing Kubernetes node maintenance:
kubectl cloudnative-mysql maintenance set cluster-sample
kubectl cloudnative-mysql maintenance unset cluster-sample
Use --reuse-pvc to retain the existing PVC across node restarts. This is
useful when the underlying storage is durable and you want to avoid a full clone.
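A node drain can be bracketed with the maintenance toggle so the window covers the whole operation. The sketch below pairs the plugin commands above with standard kubectl drain and uncordon. The function names are hypothetical and the drain flags are a common choice, not something this guide prescribes.

```shell
# Open the maintenance window, then drain the node.
begin_node_maintenance() {
  # $1 = cluster, $2 = node name
  kubectl cloudnative-mysql maintenance set "$1" || return 1
  kubectl drain "$2" --ignore-daemonsets --delete-emptydir-data
}

# Bring the node back, then close the maintenance window.
end_node_maintenance() {
  # $1 = cluster, $2 = node name
  kubectl uncordon "$2" || return 1
  kubectl cloudnative-mysql maintenance unset "$1"
}
```

Run `begin_node_maintenance cluster-sample node-a`, do the node work, then run `end_node_maintenance cluster-sample node-a`. If the window cannot be opened, the drain is not attempted.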
Scrape Prometheus metrics
kubectl cloudnative-mysql metrics cluster-sample # primary
kubectl cloudnative-mysql metrics cluster-sample cluster-sample-2 # specific instance
kubectl cloudnative-mysql metrics -w --filter=mysql_global_status_threads # watch mode, filtered
Add -w for continuous refresh. Use --filter with a pattern to narrow the
output to matching metric names (grep-style substring match).
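Because --filter matches substrings, it can return several related series. To pull out the value of one exact metric name, you can post-process the output. This sketch assumes the command prints standard Prometheus text exposition format. The function name is hypothetical, and mysql_global_status_threads_connected is used only as an example name.

```shell
# Print the value(s) of one exactly-named metric from Prometheus text on stdin.
metric_value() {
  # $1 = exact metric name, without labels
  awk -v name="$1" '
    /^#/ { next }                                # skip HELP and TYPE comment lines
    NF >= 2 {
      metric = $1
      brace = index(metric, "{")                 # strip the label set, if any
      if (brace > 0) metric = substr(metric, 1, brace - 1)
      if (metric == name) print $NF              # last field is the value when no timestamp is emitted
    }'
}
```

For example: `kubectl cloudnative-mysql metrics cluster-sample | metric_value mysql_global_status_threads_connected`.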
Continuous archiving operations
When continuous archiving is enabled, inspect:
kubectl cloudnative-mysql status cluster-sample
Look for continuousArchiving in the output. Growing pending files or a
degraded condition usually means an object-store, credential, network, or
throughput issue.
Safe maintenance habits
- Prefer planned switchover before node or primary maintenance.
- Keep at least three instances for meaningful automatic failover.
- Use semi-sync when acknowledged-write durability matters.
- Keep object-store lifecycle rules aligned with backup and PITR retention.
- Treat retained PVCs and remote backups as recovery assets.