Scaling Down a DxSqlAg Availability Group
This guide walks you through scaling down a SQL Server Availability Group cluster managed by DxOperator.
When scaling down, DxOperator prioritizes predictability, safety, and data preservation. In some cases, manual intervention is required to complete a scale-down. This section explains the scale-down behavior and provides guidance for common scenarios.
Prerequisites
- DxOperator installed in the Kubernetes cluster.
- A DxSqlAg resource created in the cluster.
How to Scale Down
To remove replicas from a DxSqlAg Availability Group, reduce the synchronousReplicas, asynchronousReplicas, or configurationOnlyReplicas count in the DxSqlAg custom resource. For example, suppose we originally deployed the resource below with synchronousReplicas: 3. Edit the file to reduce it to 2:
```yaml
apiVersion: dh2i.com/v1
kind: DxSqlAg
metadata:
  name: dxsqlag
spec:
  sqlAgConfiguration:
    synchronousReplicas: 2
    asynchronousReplicas: 0
    availabilityGroupName: AG1
  # Other fields, such as statefulSetSpec, omitted for brevity
  statefulSetSpec:
    ...
```
Afterward, apply the new values using kubectl apply.
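For example, assuming the manifest above is saved as dxsqlag.yaml (a hypothetical filename) and the resource lives in the default namespace:

```shell
# Apply the reduced replica count; DxOperator reconciles the difference.
kubectl apply -f dxsqlag.yaml -n default
```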
DxOperator will then begin applying the changes to the Availability Group, including removing one of the replicas.
Watching Progress
You can watch its progress in a number of ways:
- Connect to the DxEnterprise cluster using DxAdmin
- Pod removal can be observed with kubectl:

```shell
kubectl get pod -n default -w
```
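The DxSqlAg custom resource itself can also be inspected with kubectl as a sanity check; this sketch assumes the resource is in the default namespace:

```shell
# Watch the custom resource while DxOperator reconciles the change
kubectl get dxsqlag -n default -w
```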
Storage Cleanup
DxOperator will not automatically delete any of the persistent volume claims that were used by a replica pod after scale-down. If you no longer need any of the data, consider manually deleting the persistent volume claims. Make sure to fill in the ordinal of the pod that was removed above.
```shell
kubectl delete pvc/mssql-dxsqlag-<pod-ordinal> pvc/dxe-dxsqlag-<pod-ordinal>
```
Persistent volume deletion is irreversible. Ensure that you no longer have any use of the data on the persistent volume before deleting the PVCs.
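Continuing the earlier example (scaling synchronousReplicas from 3 to 2, which removes dxsqlag-2), the cleanup would look like the following; the default namespace is assumed:

```shell
# Delete the SQL Server data and DxEnterprise PVCs left behind by dxsqlag-2
kubectl delete pvc/mssql-dxsqlag-2 pvc/dxe-dxsqlag-2 -n default
```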
Scale-Down Considerations
Pod Ordinals
The pods of a DxSqlAg Availability Group cluster are created in ascending order. For example, a DxSqlAg with synchronousReplicas: 3 would have three pods named dxsqlag-0, dxsqlag-1, and dxsqlag-2. When scaling down this cluster, DxOperator will remove pods starting at the highest number (ordinal). To scale down this cluster by one replica, the pod named dxsqlag-2 will be removed.
Primary Replica
DxOperator will never remove the primary replica when scaling down an Availability Group. If the highest ordinal pod, e.g. dxsqlag-2 in the previous section, is the current primary replica, DxOperator will suspend any actions that would cause it to be removed.
Prior to scale-down, ensure that none of the pods to be removed is the current primary replica. If one is, another replica must first be manually promoted to the primary role. Typically this is accomplished using:

```shell
kubectl exec -it -c dxe -n <namespace> pod/<pod-name> -- dxcli vhost-start-node <vhostname> <new-primary-pod-name>
```
See also: documentation for vhost-start-node.
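As a concrete sketch, assuming the vhost is named AG1 (hypothetical; use the vhost name from your DxEnterprise cluster) and dxsqlag-0 should become the new primary in the default namespace:

```shell
# dxcli can run from any healthy pod's dxe container; promote dxsqlag-0
kubectl exec -it -c dxe -n default pod/dxsqlag-0 -- dxcli vhost-start-node AG1 dxsqlag-0
```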
Health Checks
DxOperator will not scale down an Availability Group that has active health alerts. If a replica is disconnected or has unsynchronized databases, scale-down will be suspended until the conditions are resolved.
Prior to scale-down, ensure that all health alerts are resolved. To check health alerts:
```shell
kubectl exec -it -c dxe -n <namespace> pod/<pod-name> -- dxcli get-alerts
```
See also: documentation for get-alerts.
Additionally, DxOperator will not perform any scale-down operations while a pod is in the process of being configured. Typically this happens as part of an earlier creation or scale-up operation.
Availability Mode Changes
Scale-down can target a specific availability mode: it is possible to reduce synchronousReplicas or asynchronousReplicas independently. Scale-down also requires DxOperator to remove pods starting with the highest ordinal. In cases where the availability mode of the highest ordinal pod does not match the availability mode being scaled down, DxOperator will reassign the availability mode of one of the remaining pods.
As an example, suppose we have an Availability Group with two synchronous replicas and one asynchronous replica:
| Pod | Availability Mode |
|---|---|
| dxsqlag-0 | SYNCHRONOUS_COMMIT |
| dxsqlag-1 | SYNCHRONOUS_COMMIT |
| dxsqlag-2 | ASYNCHRONOUS_COMMIT |
Then we reduce the synchronousReplicas count from 2 to 1. In this case, DxOperator would reassign dxsqlag-1 to be ASYNCHRONOUS_COMMIT, and then remove dxsqlag-2 from the Availability Group.
Automatic availability mode switching is enabled by default. It can be disabled for a specific DxSqlAg by setting the spec.sqlAgConfiguration.disableModeSwitching option. With mode switching disabled, scale-down operations may be suspended until the availability mode of the highest ordinal pod matches the mode being scaled down.
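For reference, a minimal sketch of disabling automatic mode switching in the DxSqlAg spec (field value and placement assumed from the option path above):

```yaml
apiVersion: dh2i.com/v1
kind: DxSqlAg
metadata:
  name: dxsqlag
spec:
  sqlAgConfiguration:
    # Assumed boolean; true disables automatic availability mode switching
    disableModeSwitching: true
    synchronousReplicas: 2
    asynchronousReplicas: 0
    availabilityGroupName: AG1
```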
Configuration-Only Replicas
Configuration-only replicas are a special availability mode in which no database contents are replicated. They can be used to maintain Availability Group quorum, which can enable automatic failover in Availability Groups with only two synchronous replicas.
DxOperator creates pods for configuration-only replicas in a separate StatefulSet. When updating the configurationOnlyReplicas value in the DxSqlAg, only pods from this separate StatefulSet will be added or removed. Following the naming convention of the prior examples, the first configuration-only replica would be named dxsqlag-cfg-0.
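For example, a two-replica Availability Group with one configuration-only replica might be declared as follows (a sketch based on the fields shown earlier):

```yaml
apiVersion: dh2i.com/v1
kind: DxSqlAg
metadata:
  name: dxsqlag
spec:
  sqlAgConfiguration:
    synchronousReplicas: 2
    asynchronousReplicas: 0
    # Creates dxsqlag-cfg-0 in a separate StatefulSet
    configurationOnlyReplicas: 1
    availabilityGroupName: AG1
```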
Storage Re-Use
As mentioned in the Storage Cleanup section, DxOperator will not delete the persistent volume claims of pods removed by the scale-down process.
If an Availability Group is scaled down and later scaled back up, and the persistent volume claims of the removed pods were not deleted, the newly created pods will reattach the storage of the previous pods, including any data that had been replicated to them.
If this behavior is not desired, ensure that persistent volume claims are deleted on scale-down.