My customer has successfully rolled out VMware vSphere Virtual Volumes (or “vVols”) in their environment. They’re loving the simplicity of storage management in vSphere, but were a little stuck when they added a pair of newly installed ESXi hosts to the environment. The new hosts were not mounting the vVols datastore as expected, meaning they could not run VMs backed by vVols. All of the existing hosts were fine.
To start, they dug into the logs at /var/log/vvold.log on the problematic hosts and found the following error:
Endpoint, err=SSL Exception: Verification parameters:
--> PeerThumbprint: <removed to protect the innocent>
--> ExpectedThumbprint:
--> ExpectedPeerName: <IP address of VASA provider>
--> The remote host certificate has these problems:
-->
--> * self signed certificate in certificate chain, using default
2019-10-30T23:51:16.645Z info vvold[2160126] [Originator@6876 sub=Default] VasaSession::Initialize url is empty
2019-10-30T23:51:16.645Z warning vvold[2160126] [Originator@6876 sub=Default] VasaSession::DoSetContext: Empty VP URL for VP (<VASA provider hostname>)!
2019-10-30T23:51:16.645Z info vvold[2160126] [Originator@6876 sub=Default] Initialize: Failed to establish connection https://<VASA provider IP address>:8084/version.xml
2019-10-30T23:51:16.645Z error vvold[2160126] [Originator@6876 sub=Default] Initialize: Unable to init session to VP <VASA provider hostname> state: 0
To no one’s surprise, it was an SSL error. It looks like the ESXi host is not able to securely connect to the storage array’s VASA Provider, as indicated by SSL Exception: Verification parameters. The error appears in the vvold.log file of both problematic hosts but in none of the existing ones, so the symptom affects only the new hosts.
We checked the SSL certificate on the VASA Provider and validated that it was created correctly and had not expired (a given, really, as the other hosts were connected to it just fine). We also confirmed connectivity from the ESXi hosts to the VASA Provider on port 8084 was OK. That’s the basics out of the way.
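If you want to repeat those basic checks from the command line, something like this does the job (a rough sketch; <vasa-provider-ip> is a placeholder, and the s_client command can be run from any machine with a full OpenSSL build if your ESXi shell’s openssl doesn’t include it):

# Is port 8084 reachable from the ESXi host?
nc -z <vasa-provider-ip> 8084 && echo "port 8084 reachable"
# Which certificate is the VASA Provider presenting, and when does it expire?
openssl s_client -connect <vasa-provider-ip>:8084 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer -dates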
For almost every SSL validation issue I’ve ever encountered, the connecting machine did not have the target certificate in its trusted certificate store; usually it’s the trusted root CAs that get put in there. Knowing this, we checked the ESXi host’s trusted certificate store at /etc/vmware/ssl/castore.pem. To inspect the certificates in this store, run openssl x509 -in /etc/vmware/ssl/castore.pem -text (bear in mind that openssl x509 only decodes the first certificate in a PEM bundle).
Take a look at the output on a working host and, depending on your environment, you should see a number of CAs: the VMCA certificate, maybe some enterprise root CA certificates, and so on. The contents of this file are managed by the vCenter Server managing the host; vCenter pushes a list of trusted certificates to the host from its own certificate store (the VECS). For us, the contents of the castore.pem file did not match the contents of the VECS, or the contents of another host’s castore.pem in the same environment. It looked like the vCenter Server was failing to push trusted certificates, at least for these two new hosts.
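Because openssl x509 stops at the first certificate, a quick way to compare stores between hosts is to split the bundle and summarise each certificate. A minimal sketch (the /tmp/castore-*.pem file names are just illustrative):

# Split castore.pem into one file per certificate, then print a summary of each.
awk '/BEGIN CERTIFICATE/ {n++} n {print > ("/tmp/castore-" n ".pem")}' /etc/vmware/ssl/castore.pem
for f in /tmp/castore-*.pem; do
  echo "== $f =="
  openssl x509 -in "$f" -noout -subject -issuer -enddate
done

Run that on a working host and a problematic host and diff the output; any missing CAs should stand out immediately.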
Why is this important? One of the trusted certificates that vCenter pushes down to the host is the Root CA certificate that issued the SSL certificate presented by the VASA Provider (cue dramatic music). Without that Root CA certificate, ESXi cannot trust the certificate presented by the VASA Provider and consequently fails to connect.
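You can test that theory directly from the affected host (again a sketch, with <vasa-provider-ip> as a placeholder): ask openssl to validate the chain the VASA Provider presents against the host’s own trust store.

# "Verify return code: 0 (ok)" means the host trusts the chain the VASA Provider
# presents; a non-zero code mirrors the trust failure vvold logged above.
openssl s_client -connect <vasa-provider-ip>:8084 -CAfile /etc/vmware/ssl/castore.pem </dev/null 2>/dev/null | grep "Verify return code"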
To test this, we backed up the affected host’s castore.pem file (mv /etc/vmware/ssl/castore.pem /etc/vmware/ssl/castore.pem.bak) and copied across a castore.pem file from a working host. After the copy, we restarted the host. ESXi connected to the VASA Provider successfully and the vVols datastore mounted without a problem!
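If you want to reproduce the workaround, it boils down to the following (a sketch; esxi-working-01 is a placeholder for one of your healthy hosts, and it assumes SSH is enabled on both hosts):

# On the affected host: back up the incomplete trust store...
mv /etc/vmware/ssl/castore.pem /etc/vmware/ssl/castore.pem.bak
# ...copy the store across from a known-good host...
scp root@esxi-working-01:/etc/vmware/ssl/castore.pem /etc/vmware/ssl/castore.pem
# ...and reboot so vvold picks up the new trust store.
reboot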
We’re still working together to identify the root cause of vCenter not refreshing the host CA store. Once we’ve figured it out, I’ll do another post specific to that issue.
Update! I’ve written a new post addressing the faulty certificate in the VECS. See Unable to push CA certificates and CRLs to host
This workaround is great if you have hosts with the right certificates in their castore.pem file. If not, you can actually create your own castore.pem file consisting of all the trusted certificates you need in Base64 (PEM) text. I might do a post just on this topic; let me know if you’re interested.
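If you want to try it in the meantime, the gist is simply to concatenate the PEM-encoded CA certificates into one file. A minimal sketch, with purely illustrative input file names:

# Build a castore.pem from the root (and any intermediate) CA certificates the host
# needs to trust, including the CA that issued the VASA Provider's certificate.
cat vmca-root.pem enterprise-root-ca.pem array-vasa-root-ca.pem > /etc/vmware/ssl/castore.pem
# Reboot so the new trust store is picked up, as we did above.
reboot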
Thanks for reading! I hope this was helpful.