August 21, 2023

Lost connectivity to the device naa.61xxxxxx backing the boot filesystem 

By H. Cemre Günay

Today I noticed that my host ESXi 04 was showing the following message:

Lost connectivity to the device naa.61xxxxxx backing the boot filesystem /vmfs/devices/disks/naa.61xxxxxx. As a result, host configuration changes will not be saved to persistent storage.

So let us open an SSH session and have a look at the device in question, using the following command:

esxcli storage core device stats get

As we can see, there are some failed operations. To get more information, we can use another command:

esxcli storage core device list

The device is not connected, and if we try to read its S.M.A.R.T. information, we get a “Cannot open device” message:

esxcli storage core device smart get -d naa.61xxxxxx
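To avoid scrolling through every device, the checks above can be limited to the one device named in the alarm. The following is only a sketch: the function name check_boot_device is made up for this example, naa.61xxxxxx is the placeholder used in this post, and the field names in the grep filter are what I would expect in the esxcli output on recent ESXi versions, so adjust them if your output looks different.

```shell
# Hypothetical helper, not part of ESXi: show the state of a single device.
check_boot_device() {
    device="$1"

    # Details for this device only; keep the lines that describe its state.
    # (Field names are assumed from typical esxcli output.)
    esxcli storage core device list -d "$device" \
        | grep -E "^ +(Display Name|Status|Is Offline|Is Boot Device):"

    # I/O counters for this device, including failed operations.
    esxcli storage core device stats get -d "$device"
}
```

In the SSH session you would then run check_boot_device naa.61xxxxxx, with your own device ID in place of the placeholder.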

All right, all right, so something is going wrong. I got help from this Knowledge Base article: https://kb.vmware.com/s/article/50441

According to VMware, there is no risk or impact: the whole ESXi OS is loaded into memory, so there is no outage for the VMs. Once connectivity is restored, the host can access the storage again. The alarm stays visible only because the error does not clear automatically.

To clear this alarm, use one of these options:

Put the host into maintenance mode to move all VMs to other hosts, then reboot the host

or

Restart the Management Agents on the ESXi host by running this command:

/etc/init.d/hostd restart 
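Here are the two options above written as commands for the SSH session. Take this as a sketch under a few assumptions: the function names and the reboot reason text are made up for this example, the esxcli way of entering maintenance mode assumes the VMs have already been migrated away or powered off (otherwise entering maintenance mode will not complete), and restarting vpxa in addition to hostd is my addition for hosts managed by vCenter, not part of the command shown above.

```shell
# Option 1: enter maintenance mode, confirm it, then reboot.
# Hypothetical helper; assumes no running VMs are left on the host.
enter_maintenance_and_reboot() {
    esxcli system maintenanceMode set --enable true \
        && esxcli system maintenanceMode get \
        && esxcli system shutdown reboot --reason "Clear boot device alarm"
}

# Option 2: restart the management agents without a reboot.
restart_management_agents() {
    /etc/init.d/hostd restart
    # Assumption: the host is managed by vCenter, so vpxa is restarted too.
    /etc/init.d/vpxa restart
}
```

The steps in option 1 are chained with &&, so the reboot is only requested if the host really entered maintenance mode.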

I went with maintenance mode and a reboot. As we Germans like to say, “a reboot does you good” 😉

During POST, my Dell T430 server reports that there are offline or missing virtual drives. So I will shut down the server completely and reseat the boot device.

Well, in my case iDRAC showed me very quickly that my boot device is defective… I love it… 😉

One or more physical disks included in the virtual disk have failed. If the virtual disk is non-redundant (does not use mirrored or parity data), then the failure of a single physical disk can cause the virtual disk to fail. If the virtual disk is redundant, then more physical disks have failed than you can rebuild using mirrored or parity information.

So I have to install the ESXi OS again, on a new boot device. Shit happens 🙂

Normally, your boot device should work properly again after the restart. However, if the hardware is defective, you will unfortunately have to reinstall the ESXi OS. Ideally, you have newer server systems, e.g. a Dell R740 with dedicated BOSS boot devices. That is it for this blog post, I hope it helps you. If you have any questions, please leave them in the comments. 🙂