I am running ESXi 5.0 on a server that hosts multiple VMs, with no High Availability or vMotion. Earlier today the ESXi OS essentially crashed, as best I can tell. When I went into vCenter, the server was listed but had a big red exclamation point on it, and neither the management IP nor any of the IPs of my VMs was reachable. The physical server was still running, so I went to the physical server and rebooted it. Everything came back up and appears to be working fine right now. As I look through the event log, it appears that at around 2:10 AM this morning there were warnings as follows:
Device naa.6d4ae520af74ae0017eb715f0f69149a
performance has deteriorated. I/O latency
increased from average value of 1012
microseconds to 31367 microseconds.
warning
11/19/2013 2:03:59 AM
10.19.1.6
Device naa.6d4ae520af74ae0017eb715f0f69149a
performance has deteriorated. I/O latency
increased from average value of 1013
microseconds to 31799 microseconds.
warning
11/19/2013 2:10:43 AM
10.19.1.6
Device naa.6d4ae520af74ae0017eb715f0f69149a
performance has deteriorated. I/O latency
increased from average value of 1013
microseconds to 63939 microseconds.
warning
11/19/2013 2:10:43 AM
10.19.1.6
Then I had some CPU usage alarms on 1 of the 3 VMs. I got several of those, and then at 11:16 AM (the time the server crashed) I got this:
Host is not responding
error
11/19/2013 11:16:59 AM
10.19.1.6
So my question is: do I possibly have disk errors? Or could the CPU usage on that one particular VM have caused the crash? The datastore is local storage on a hardware RAID 10. I am concerned going forward, as this is a very important app server in my network.
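
To put the latency warnings in perspective, here is a rough Python sketch that pulls the "before" and "after" values out of the three warning messages and works out how big each jump was. The message text is copied from the event log with the wrapped lines joined; the `EVENTS` list, the regular expression, and the `latency_jumps` helper are my own illustration and not part of any VMware tool.

```python
import re

# Warning text copied from the event log, with wrapped lines joined.
EVENTS = [
    "Device naa.6d4ae520af74ae0017eb715f0f69149a performance has deteriorated. "
    "I/O latency increased from average value of 1012 microseconds to 31367 microseconds.",
    "Device naa.6d4ae520af74ae0017eb715f0f69149a performance has deteriorated. "
    "I/O latency increased from average value of 1013 microseconds to 31799 microseconds.",
    "Device naa.6d4ae520af74ae0017eb715f0f69149a performance has deteriorated. "
    "I/O latency increased from average value of 1013 microseconds to 63939 microseconds.",
]

# Hypothetical pattern written to match the wording of the messages above.
PATTERN = re.compile(
    r"Device (\S+) performance has deteriorated\. "
    r"I/O latency increased from average value of (\d+) microseconds "
    r"to (\d+) microseconds\."
)


def latency_jumps(events):
    """Return (device, before_us, after_us, factor) for each matching warning."""
    jumps = []
    for text in events:
        match = PATTERN.search(text)
        if match is None:
            continue  # skip anything that is not a latency warning
        device = match.group(1)
        before_us = int(match.group(2))
        after_us = int(match.group(3))
        jumps.append((device, before_us, after_us, after_us / before_us))
    return jumps


if __name__ == "__main__":
    for device, before_us, after_us, factor in latency_jumps(EVENTS):
        print(f"{device}: {before_us / 1000:.1f} ms -> "
              f"{after_us / 1000:.1f} ms (about {factor:.0f}x)")
```

For the three warnings above this works out to a baseline of about 1 ms rising to roughly 31 to 64 ms, so a jump of around 31x to 63x on the same device.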