Scaling Down a DxSqlAg Availability Group
This guide walks you through scaling down a SQL Server Availability Group cluster managed by DxOperator.
When scaling down, DxOperator prioritizes predictability, safety, and data preservation. In some cases, manual intervention is required to complete a scale-down. This section explains the scale-down behavior and provides guidance for common scenarios.
Prerequisites
- DxOperator installed in the Kubernetes cluster.
- A DxSqlAg resource created in the cluster.
How to Scale Down
To remove replicas from a DxSqlAg Availability Group, reduce the synchronousReplicas, asynchronousReplicas, or configurationOnlyReplicas count in the DxSqlAg custom resource. For example, suppose we originally deployed the resource below with synchronousReplicas: 3. Edit the file to reduce it to 2:
```yaml
apiVersion: dh2i.com/v1
kind: DxSqlAg
metadata:
  name: dxsqlag
spec:
  sqlAgConfiguration:
    synchronousReplicas: 2
    asynchronousReplicas: 0
    availabilityGroupName: AG1
  # Other fields, such as statefulSetSpec, omitted for brevity
  statefulSetSpec:
    ...
```
Afterward, apply the new values using kubectl apply.
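For example, assuming the manifest above is saved as dxsqlag.yaml (a hypothetical filename) and the resource lives in the default namespace:

```shell
# Apply the reduced replica count; DxOperator reconciles the difference.
kubectl apply -f dxsqlag.yaml -n default
```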
DxOperator will then begin applying the changes to the Availability Group, including removing one of the replicas.
Watching Progress
You can watch its progress in a number of ways:
- Connect to the DxEnterprise cluster using DxAdmin
- Pod removal can be observed with kubectl:

```shell
kubectl get pod -n default -w
```
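The DxSqlAg custom resource itself can also be inspected with kubectl as a sanity check; this sketch assumes the resource is in the default namespace:

```shell
# Watch the custom resource while DxOperator reconciles the change
kubectl get dxsqlag -n default -w
```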
Storage Cleanup
DxOperator will not automatically delete any of the persistent volume claims that were used by a replica pod after scale-down. If you no longer need any of the data, consider manually deleting the persistent volume claims. Make sure to fill in the ordinal of the pod that was removed above.
```shell
kubectl delete pvc/mssql-dxsqlag-<pod-ordinal> pvc/dxe-dxsqlag-<pod-ordinal>
```
Persistent volume deletion is irreversible. Ensure that you no longer have any use of the data on the persistent volume before deleting the PVCs.
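Continuing the earlier example (scaling synchronousReplicas from 3 to 2, which removes dxsqlag-2), the cleanup would look like the following; the default namespace is assumed:

```shell
# Delete the SQL Server data and DxEnterprise PVCs left behind by dxsqlag-2
kubectl delete pvc/mssql-dxsqlag-2 pvc/dxe-dxsqlag-2 -n default
```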
Scale-Down Considerations
Pod Ordinals
The pods of a DxSqlAg Availability Group cluster are created in ascending order. For example, a DxSqlAg with synchronousReplicas: 3 would have three pods named dxsqlag-0, dxsqlag-1, and dxsqlag-2. When scaling down this cluster, DxOperator will remove pods starting at the highest number (ordinal). To scale down this cluster by one replica, the pod named dxsqlag-2 will be removed.
Primary Replica
DxOperator will never remove the primary replica when scaling down an Availability Group. If the highest ordinal pod, e.g. dxsqlag-2 in the previous section, is the current primary replica, DxOperator will suspend any actions that would cause it to be removed.
Prior to scale-down, ensure that none of the pods to be removed is the current primary replica. If one is, another replica must first be manually promoted to the primary role. Typically this is accomplished using:

```shell
kubectl exec -it -c dxe -n <namespace> pod/<pod-name> -- dxcli vhost-start-node <vhostname> <new-primary-pod-name>
```
See also: documentation for vhost-start-node.
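As a concrete sketch, assuming the vhost is named AG1 (hypothetical; use the vhost name from your DxEnterprise cluster) and dxsqlag-0 should become the new primary in the default namespace:

```shell
# dxcli can run from any healthy pod's dxe container; promote dxsqlag-0
kubectl exec -it -c dxe -n default pod/dxsqlag-0 -- dxcli vhost-start-node AG1 dxsqlag-0
```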
Health Checks
DxOperator will not scale down an Availability Group that has active health alerts. If a replica is disconnected or has unsynchronized databases, scale-down will be suspended until the conditions are resolved.
Prior to scale-down, ensure that all health alerts are resolved. To check health alerts:
```shell
kubectl exec -it -c dxe -n <namespace> pod/<pod-name> -- dxcli get-alerts
```
See also: documentation for get-alerts.
Additionally, DxOperator will not perform any scale-down operations while a pod is in the process of being configured. Typically this happens as part of an earlier creation or scale-up operation.
Availability Mode Changes
Scale-down can target a specific availability mode: it is possible to reduce synchronousReplicas or asynchronousReplicas independently. Scale-down also requires DxOperator to remove pods starting with the highest ordinal. In cases where the availability mode of the highest ordinal pod does not match the availability mode being scaled down, DxOperator will reassign the availability mode of one of the remaining pods.
As an example, suppose we have an Availability Group with two synchronous replicas and one asynchronous replica:
| Pod | Availability Mode |
|---|---|
| dxsqlag-0 | SYNCHRONOUS_COMMIT |
| dxsqlag-1 | SYNCHRONOUS_COMMIT |
| dxsqlag-2 | ASYNCHRONOUS_COMMIT |
Then we reduce the synchronousReplicas count from 2 to 1. In this case, DxOperator would reassign dxsqlag-1 to be ASYNCHRONOUS_COMMIT, and then remove dxsqlag-2 from the Availability Group.
Automatic availability mode switching is enabled by default. It can be disabled for a specific DxSqlAg by setting the spec.sqlAgConfiguration.disableModeSwitching option. With mode switching disabled, scale-down operations may be suspended until the availability mode of the highest ordinal pod matches the mode being scaled down.
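For reference, a minimal sketch of disabling automatic mode switching in the DxSqlAg spec (field value and placement assumed from the option path above):

```yaml
apiVersion: dh2i.com/v1
kind: DxSqlAg
metadata:
  name: dxsqlag
spec:
  sqlAgConfiguration:
    # Assumed boolean; true disables automatic availability mode switching
    disableModeSwitching: true
    synchronousReplicas: 2
    asynchronousReplicas: 0
    availabilityGroupName: AG1
```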
Configuration-Only Replicas
Configuration-only replicas are a special availability mode in which no database contents are replicated. They can be used to maintain Availability Group quorum, which can enable automatic failover in Availability Groups with only two synchronous replicas.
DxOperator creates pods for configuration-only replicas in a separate StatefulSet. When updating the configurationOnlyReplicas value in the DxSqlAg, only pods from this separate StatefulSet will be added or removed. Following the naming convention of the prior examples, the first configuration-only replica would be named dxsqlag-cfg-0.
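For example, a two-replica Availability Group with one configuration-only replica might be declared as follows (a sketch based on the fields shown earlier):

```yaml
apiVersion: dh2i.com/v1
kind: DxSqlAg
metadata:
  name: dxsqlag
spec:
  sqlAgConfiguration:
    synchronousReplicas: 2
    asynchronousReplicas: 0
    # Creates dxsqlag-cfg-0 in a separate StatefulSet
    configurationOnlyReplicas: 1
    availabilityGroupName: AG1
```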
Storage Re-Use
As mentioned in the Storage Cleanup section, DxOperator will not delete the persistent volume claims of pods removed by the scale-down process.
If an Availability Group is scaled down and later scaled back up, and the persistent volume claims of the removed pods were not deleted, the newly created pods will reattach the storage of the previous pods, including any data that had been replicated to them.
If this behavior is not desired, ensure that persistent volume claims are deleted on scale-down.