2. Next, I connected to vCenter and checked the Events. I exported all events from around the time the server rebooted and compared them; the two key messages are excerpted below:
*yourservername on youresxihost in cluster yourclustername in yourdatacenter reset by HA. Reason: VMware Tools heartbeat failure. A screenshot is saved at /vmfs/volumes/93a8a5f4-e161a15a/yourservername-screenshot-0.png
*Alarm 'Virtual machine high availability error' on yourservername changed from Gray to Gray
Alarm 'Virtual machine high availability error' on yourservername triggered an action
Alarm 'Virtual machine high availability error': an SNMP trap for entity yourservername was sent
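These messages offer a clue: the VMware Tools heartbeat appears to be the problem. Looking through the official documentation, I found the following two links for reference: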
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1007899
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1027734
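The root cause lies in the Guest OS, where we always install VMware Tools. If VM Monitoring is enabled in the Cluster settings in vCenter, the host agent on the ESXi host exchanges heartbeats with VMware Tools in the Guest OS according to the conditions you configured. Once the ESXi host stops receiving heartbeats from the Guest OS, it decides that the Guest OS has failed and resets it. Here is my original VM Monitoring setting: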
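As you can see, Monitoring sensitivity was originally set to High. Our SI (system integrator) later advised me to set it to Low, or to turn VM Monitoring off altogether, so that ESXi would not misjudge a perfectly healthy Guest OS as dead and cause another unexpected reboot.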
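To make the difference between High and Low more concrete, here is a rough Python sketch of the decision. It is only a simplified model, not VMware's actual implementation: the names MonitoringPreset, PRESETS, and should_reset are made up for illustration, and the preset numbers are the values I recall from the vSphere Availability documentation (failure interval, minimum uptime, maximum resets, reset window), so please verify them against your own vSphere version. The sketch also ignores any additional checks the real HA agent performs before resetting a VM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitoringPreset:
    """Hypothetical container for one VM Monitoring sensitivity preset."""
    failure_interval_s: int  # seconds without a heartbeat before the VM counts as failed
    min_uptime_s: int        # grace period after power-on before monitoring applies
    max_resets: int          # maximum resets allowed inside the reset window
    reset_window_s: int      # length of the reset window in seconds


# Assumed preset values; check the documentation for your vSphere version.
PRESETS = {
    "high": MonitoringPreset(30, 120, 3, 60 * 60),
    "medium": MonitoringPreset(60, 240, 3, 24 * 60 * 60),
    "low": MonitoringPreset(120, 480, 3, 7 * 24 * 60 * 60),
}


def should_reset(preset, uptime_s, seconds_since_heartbeat, resets_in_window):
    """Return True if this simplified model would reset the VM."""
    if uptime_s < preset.min_uptime_s:
        return False  # VM has not been up long enough to be monitored
    if resets_in_window >= preset.max_resets:
        return False  # reset budget for the current window is used up
    # The VM is considered failed once the heartbeat gap reaches the interval.
    return seconds_since_heartbeat >= preset.failure_interval_s


if __name__ == "__main__":
    # A guest whose VMware Tools heartbeat stalls for 45 seconds.
    for name, preset in PRESETS.items():
        decision = should_reset(preset, uptime_s=3600,
                                seconds_since_heartbeat=45,
                                resets_in_window=0)
        print(f"{name}: reset={decision}")
```

In this model, a 45-second heartbeat stall on a busy guest is enough to trigger a reset under High, while Medium and Low tolerate it. That is the reasoning behind lowering the sensitivity.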