Non-compliant objects in cluster – VMware Cloud on AWS – What now?

Are you one of the customers who received an email with the above subject?
If so, let me explain in a little more detail what it means.

With VMware Cloud on AWS (VMC for short), VMware offers a powerful cloud solution jointly with AWS. The main priorities of this service are data availability and security.

To make sure your data remains securely available, even in the unlikely event of a service disruption, the design has to account for certain types of failures.
That may be a single node failing in a cluster, or even an entire Availability Zone becoming unavailable.

Since VMC is based on vSAN technology, vSAN Storage Policies are what allow a cluster to tolerate these types of failures.

With that being said: why do I get an email reading that I am not SLA compliant?

The answer is pretty straightforward, but it may have some implications for your current cluster design.
It comes down to the SLA document, which can be found here:
https://docs.vmware.com/en/VMware-Cloud-on-AWS/services/com.vmware.vmc-aws.getting-started/GUID-5BE76DD4-AB0E-4514-846D-9D3CFA6DC07D.html

Within the SLA document you will find the following passage:

It states that all objects within a cluster need to comply with the prescribed Storage Policies for the cluster to be eligible for so-called SLA credits.

To be very clear: this DOES NOT affect any kind of support, service, etc.

For the clusters in question you will still get the same support and the same time to resolve a failure (e.g. a failed node), among other things.

However, not being compliant with the SLA may have a financial impact: you are not entitled to a refund for the time the service was unavailable, as outlined in the SLA document (99.9-99.99% availability, depending on the cluster design).
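To put those percentages into perspective, here is a minimal sketch that converts an availability figure into the downtime it leaves room for. The function name and the 30-day period are my own assumptions for illustration; the SLA document defines how availability is actually measured.

```python
def allowed_downtime_minutes(availability_percent, days_in_period=30):
    """Minutes of downtime that still meet the given availability figure.

    Assumption: the period is a flat number of days (30 by default).
    The real measurement period is defined in the SLA document.
    """
    total_minutes = days_in_period * 24 * 60
    return total_minutes * (100 - availability_percent) / 100


if __name__ == "__main__":
    # The two ends of the availability range mentioned above.
    for target in (99.9, 99.99):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% availability -> {minutes:.1f} minutes per 30 days")
```

Over 30 days this works out to roughly 43 minutes at 99.9% and roughly 4 minutes at 99.99%.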

Now let’s see how you can avoid running into such a scenario.

Let’s assume you have a workload that does not need to be highly available, and you are therefore OK with not getting an SLA credit refund if it becomes unavailable.
If you decide so, that is absolutely fine.

Should you mix these workloads with business-critical Services within the same Cluster, for which you may claim a refund?
Absolutely NOT!
For this kind of scenario it is highly advisable to create separate clusters for such workloads.

An example of how to separate your workloads could look like the following:

_SDDC 1_
Stretched 10 Node Cluster for all SLA relevant workloads, SLA credit eligible.
All VMs in this Cluster need a Storage Policy with the following settings:
– Site Disaster Tolerance (PFTT) = Dual Site Mirroring
– (minimum) Secondary level of failures to tolerate (SFTT) = 1
(This protects your workload against a single Node failure, and also against an entire AZ becoming unavailable. Reminder: there must be sufficient capacity left on the cluster to start the VMs.)

_SDDC 2_
Cluster-1: 4 Node Cluster (e.g. internal apps, test/dev, etc.), SLA credit eligible:
To be eligible for SLA refund credits, all VMs need to have the following policy:
– (minimum) Number of Failures to Tolerate (FTT) = 1
(The FTT=1 policy allows for one Node to fail within the Cluster.)

Cluster-2: A secondary Cluster for which you do not wish to claim a refund based on the SLA, for example a 3 Node VDI-only Cluster:
– No specific Policy needed, (almost) any can be used
NO REFUND !!
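The eligibility rules from the example above can be written down as a small check. This is only a sketch in plain Python with made-up names (StoragePolicy, is_sla_eligible, non_compliant_vms, the VM names); it does not call any VMware API and it only encodes the minimums listed in this post, not the full SLA document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoragePolicy:
    """Hypothetical, simplified view of a vSAN Storage Policy."""
    name: str
    dual_site_mirroring: bool   # PFTT = Dual Site Mirroring (stretched clusters)
    failures_to_tolerate: int   # FTT, or SFTT on a stretched cluster


def is_sla_eligible(policy, stretched):
    """Apply the minimums from the example above to a single policy."""
    if stretched:
        # Stretched cluster: Dual Site Mirroring plus SFTT of at least 1.
        return policy.dual_site_mirroring and policy.failures_to_tolerate >= 1
    # Non-stretched cluster: FTT of at least 1.
    return policy.failures_to_tolerate >= 1


def non_compliant_vms(vm_policies, stretched):
    """Return the sorted names of VMs whose policy misses the minimums."""
    return sorted(
        vm for vm, policy in vm_policies.items()
        if not is_sla_eligible(policy, stretched)
    )


if __name__ == "__main__":
    ftt1 = StoragePolicy("ftt1", dual_site_mirroring=False, failures_to_tolerate=1)
    ftt0 = StoragePolicy("ftt0", dual_site_mirroring=False, failures_to_tolerate=0)
    # Invented VM names on a non-stretched cluster such as Cluster-1.
    cluster = {"app-01": ftt1, "test-07": ftt0}
    print(non_compliant_vms(cluster, stretched=False))
```

A single VM with a non-compliant policy is enough to show up in that list, which is why mixing "don't care" workloads into an SLA relevant cluster is a bad idea.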

If you are currently working with a single Cluster SDDC, I would strongly advise re-checking your Storage Policies and your architecture, to either meet the requirements for SLA compliance or set up new, different or additional Cluster(s).
Since stretched and non-stretched clusters cannot be mixed in a single SDDC, I’ve used two separate SDDCs in my example above. (See also this blog post from my colleague Gilles on this topic: https://www.gilles.cloud/2019/08/vmware-cloud-on-aws-sddc-design.html )

Storage Policy change?

As a general rule, please DO NOT change any storage policy while VMs are associated with it. Doing so may have a big impact on performance, as well as on storage use within the SDDC, because the change takes immediate effect on the underlying infrastructure!

Instead, you should either create a new policy (or duplicate an existing one) that satisfies the SLA requirements, or use an existing SLA compliant policy, and then move your VMs over to it separately.

If you wish to learn more about vSAN Storage Policies for VMware Cloud on AWS, please read on:

https://docs.vmware.com/en/VMware-Cloud-on-AWS/services/com.vmware.vsphere.vmc-aws-manage-data-center-vms.doc/GUID-EDBB551B-51B0-421B-9C44-6ECB66ED660B.html

Some more Info on managed Storage policies can be found here:
https://vmc.techzone.vmware.com/resource/managed-storage-policy-profiles

More Info about the Service Status:

https://status.vmware-services.io/

I hope this post will help you to better understand the benefits – and implications – of being compliant with your VMC storage profiles.

If you have any further questions, please reach out to your VMC specialist(s), your Customer Success Team – or get in touch with us on the VMTN Forums here:

https://communities.vmware.com/t5/Cloud-Activation-Architecture/SLA-compliance-and-SLA-refund-Credits-for-VMC-on-AWS/m-p/2834572#M89

Be well and stay safe,
Rick
