vSAN Skyline Health: A possible storage capacity limitation with vSAN OSA versions 8.0U2 and 8.0U2b
Another day, another warning in vSAN Skyline Health. 🙂
Frequent Deletion of VMs….. Hmm….. I’ve been using vSAN since vSphere 6.7 U3 in my home lab and I’m pretty sure I deleted VMs more frequently, especially in my rookie days, and never had this kind of warning in vSAN Skyline Health. Anyway, let’s try to fix it! 😉
What I really like about vSphere 8 is the Troubleshoot option for each warning. You get some hints in advance and maybe even the solution to the issue. So let’s click on Troubleshoot right away:
You get a “why” and a “how to solve the problem”. Normally I would search the VMware KB for the warning/error message first, but in this case the solution is already displayed.
I searched for the matching KB article anyway, and it tells me nothing more than vSAN Skyline Health already does: https://knowledge.broadcom.com/external/article/312822/vsan-osa-clusters-running-on-80-u2-or-80.html
But I think the symptoms listed in the KB article are important, especially for this warning:
- VMs fail to deploy/create with the error “There is no more space”, yet there is plenty of space on the vSAN datastore
- vSAN Health “physdiskcapacity” reports out of logical space
- Examples of errors seen in the Logs, not specific to any Host(s):
2024-03-06T20:41:02.679Z Wa(180) vmkwarning: cpu64:2561511 opID=ac4b68ff)WARNING:
2024-03-06T22:57:53.736Z Wa(180) vmkwarning: cpu124:2099922)WARNING: LSOM: LSOMOutOfLogical:12361: Throttled: cp 4194304 cu 0
2024-03-06T23:07:05.161Z Wa(180) vmkwarning: cpu71:2099290)WARNING: VSAN: VsanSparseWriteDone:4590: Throttled: Write error on '3a1d9e51-eba6e865-06d4-c916-727e-00620bc23b50': token status 'No space left on device',SCSI status 8 (OK:BUSY)
Symptoms one and two are not present in my environment. I also looked at vmkwarning.log and couldn’t find any of these warnings there either:
[root@esx01:/var/log] cat vmkwarning.log | grep VsanSparseWriteDone
[root@esx01:/var/log] cat vmkwarning.log | grep LSOMOutOfLogical
[root@esx01:/var/log] cat vmkwarning.log | grep Warning
[root@esx01:/var/log] cat vmkwarning.log | grep busy
2024-06-08T11:21:12.311Z Wa(180) vmkwarning: cpu3:2148007)WARNING: UserFile: 2298: etcd: Directory changing too often to perform readdir operation (11 retries), returning busy
[root@esx01:/var/log]
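The separate greps above can also be collapsed into a single pass. One thing to keep in mind: grep is case-sensitive by default, so `grep Warning` cannot match the uppercase `WARNING` in these log lines. Below is a minimal sketch, assuming the signatures quoted from the KB article are the ones worth searching for. The function name `check_vsan_log` is my own and not part of ESXi.

```shell
# Count the lines in a log file that match the signatures quoted from the KB article.
# check_vsan_log is a hypothetical helper name, not an ESXi command.
# The log path defaults to the file searched above; pass another path to scan a copy.
check_vsan_log() {
    log="${1:-/var/log/vmkwarning.log}"
    # -c prints the number of matching lines, -E enables the | alternation
    grep -c -E 'LSOMOutOfLogical|VsanSparseWriteDone|No space left on device' "$log"
}

# On an ESXi host:
#   check_vsan_log
#   check_vsan_log /tmp/vmkwarning-copy.log
```

A result of 0 means none of the KB signatures are in the file, which is what I would expect in my environment.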
Since I have no support contract with VMware and cannot open a ticket, I will follow the solution from the KB article and vSAN Skyline Health: apply the advanced configuration below on all hosts in the cluster (see the impacted host in screenshot 2) and run auto-backup.sh to save the host configuration right away:
esxcfg-advcfg -s 0 /LSOM/lsomPlogZeropV2;/sbin/auto-backup.sh
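To confirm that the setting took effect on a host, the value can be read back with `esxcfg-advcfg -g`. The small helper below is only a sketch: the name `zerop_disabled` is my own, and it assumes the read-back line has the form "Value of lsomPlogZeropV2 is 0", so adjust the pattern if your build prints something different.

```shell
# Succeeds (exit 0) when the read-back line piped in on stdin reports the value 0.
# zerop_disabled is a hypothetical helper name, not an ESXi command.
# Assumption: esxcfg-advcfg -g prints "Value of <option> is <n>".
zerop_disabled() {
    grep -q 'lsomPlogZeropV2 is 0$'
}

# On an ESXi host:
#   esxcfg-advcfg -g /LSOM/lsomPlogZeropV2 | zerop_disabled && echo "workaround active"
```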
As soon as another online retest is performed in vSAN Skyline Health, the warning disappears and another issue is resolved. 🙂
If you have any questions, please use the comment field below.