Hello,
I'm trying to fix an issue where my root ramdisk (/scratch/log) fills up after a few hours of uptime and ESXi can no longer write to its log files.
Unfortunately the only datastore I have is the vSAN datastore, and redirecting system logs to that datastore is not supported. I do have vSphere Log Insight set up, but the local log path is still /scratch/log.
I have 3 servers in the vSAN cluster, but one of them fills up the root ramdisk much faster than the others. It is a Lenovo server, whereas the others are HPE servers.
I've spent a fair bit of time looking into this, for example checking whether the Lenovo has a higher logging level than the others, but I can't see why the Lenovo's logs grow so fast.
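To put a number on "grow so fast", one option is to sample a log file's size twice and compare. This is only a minimal sketch: `log_growth` is a hypothetical helper name, and it assumes the ESXi shell provides `wc` and `sleep` as a normal busybox shell does.

```shell
#!/bin/sh
# Hypothetical helper: report how many bytes a file grows over an interval.
# $1 = file to watch, $2 = sampling interval in seconds (defaults to 60).
log_growth() {
    file="$1"
    interval="${2:-60}"
    before=$(wc -c < "$file")    # size in bytes at the start
    sleep "$interval"
    after=$(wc -c < "$file")     # size in bytes at the end
    echo "$file grew $((after - before)) bytes in ${interval}s"
}
```

Running something like `log_growth /scratch/log/vsanmgmt.log 60` on each host at the same time would give directly comparable figures for the Lenovo and the HPE servers.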
Here's what the logs currently look like:
[root@MOX-ESXi1:/scratch/log] ls -lah -lS
total 29880
-rw------- 1 root root 10.0M Mar 15 14:54 vsanmgmt.log
-rw------- 1 root root 7.6M Mar 15 18:28 hostd.log
-rw------- 1 root root 3.1M Mar 15 18:29 vsanvpd.log
-rw------- 1 root root 1.9M Mar 15 18:30 vpxa.log
-rw------- 1 root root 1.6M Mar 15 18:34 vsansystem.log
-rw------- 1 root root 943.9K Mar 15 18:29 syslog.log
-rw------- 1 root root 875.4K Mar 15 18:31 osfsd.log
-rw------- 1 root root 744.3K Mar 15 18:28 rhttpproxy.log
-rw------- 1 root root 651.6K Mar 15 17:58 cmmdsTimeMachineDump.log
-rw------- 1 root root 650.4K Mar 15 18:32 vmkernel.log
-rw------- 1 root root 312.0K Mar 15 18:17 fdm.log
-rw------- 1 root root 211.5K Mar 15 18:35 hostd-probe.log
-rw------- 1 root root 172.0K Mar 15 18:40 clomd.log
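The listing reports `total 29880`, so roughly 29 MB of logs are sitting in a 32M root ramdisk. To track that figure per host over time, the size column can be summed with a small filter. This is a sketch only: `sum_ls_sizes_kib` is a hypothetical name, and it assumes `awk` is available and that sizes appear in column 5 with a K, M or G suffix as in the listing above.

```shell
#!/bin/sh
# Hypothetical helper: sum the size column of `ls -lah` output, printing KiB.
# Only regular-file lines (permissions starting with "-") are counted.
sum_ls_sizes_kib() {
    awk '
        NF >= 9 && $1 ~ /^-/ {
            size  = $5
            unit  = substr(size, length(size), 1)
            value = size + 0                       # numeric prefix, e.g. 10 from "10.0M"
            if (unit == "G")      value *= 1024 * 1024
            else if (unit == "M") value *= 1024
            else if (unit == "K") value *= 1
            else                  value /= 1024    # no suffix: plain byte count
            total += value
        }
        END { printf "%.1f\n", total }
    '
}

# Example using three of the entries from the listing above.
sum_ls_sizes_kib <<'EOF'
-rw------- 1 root root 10.0M Mar 15 14:54 vsanmgmt.log
-rw------- 1 root root 943.9K Mar 15 18:29 syslog.log
-rw------- 1 root root 172.0K Mar 15 18:40 clomd.log
EOF
```

On a live host the same function could be fed with `ls -lah /scratch/log | sum_ls_sizes_kib`.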
And here are the ramdisks:
Ramdisk      Size   Used   Available   Use%   Mounted on
root          32M    32M          0B   100%   --
etc           28M   396K         27M     1%   --
opt           32M     0B         32M     0%   --
var           48M   816K         47M     1%   --
tmp          256M    72K        255M     0%   --
iofilters     32M     0B         32M     0%   --
hostdstats   803M     4M        798M     0%   --
vsantraces   300M   153M        146M    51%   --
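To catch this before the ramdisk is completely full, the table can be filtered for entries above a usage threshold. A rough sketch, assuming `awk` is available and that the input keeps the six-column layout shown above (I believe `vdf -h` prints this section, but treat that as an assumption); `full_ramdisks` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print ramdisks whose Use% is at or above a threshold.
# $1 = threshold in percent (defaults to 90). Reads the table on stdin.
full_ramdisks() {
    threshold="${1:-90}"
    awk -v limit="$threshold" '
        $5 ~ /^[0-9]+%$/ {              # skips the header, whose 5th field is "Use%"
            pct = $5
            sub(/%/, "", pct)
            if (pct + 0 >= limit + 0) print $1, $5
        }
    '
}

# Example with rows taken from the table above, using a 50% threshold.
full_ramdisks 50 <<'EOF'
Ramdisk Size Used Available Use% Mounted on
root 32M 32M 0B 100% --
etc 28M 396K 27M 1% --
vsantraces 300M 153M 146M 51% --
EOF
```

With the rows above and a threshold of 50, only `root` and `vsantraces` are reported.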
The largest log file is vsanmgmt.log.
[root@MOX-ESXi1:/scratch/log] tail -f vsanmgmt.log
2018-03-15T14:53:37Z VSANMGMTSVC: WARNING vsanperfsvc[Thread-2] [VsanHealthUtil::log] Profiler:
2018-03-15T14:53:37Z VSANMGMTSVC: WARNING vsanperfsvc[Thread-2] [VsanHealthUtil::log] VsanHealthHelpers.IsWitnessNode: 0.00s
2018-03-15T14:53:37Z VSANMGMTSVC: WARNING vsanperfsvc[Thread-2] [VsanHealthUtil::log] LsomHealth.getHealthStats(): 0.05s
2018-03-15T14:53:37Z VSANMGMTSVC: WARNING vsanperfsvc[Thread-2] [VsanHealthUtil::log] plog devices: 0.00s, 0.00s
2018-03-15T14:53:37Z VSANMGMTSVC: WARNING vsanperfsvc[Thread-2] [VsanHealthUtil::log] get disks: 0.03s
2018-03-15T14:53:37Z VSANMGMTSVC: WARNING vsanperfsvc[Thread-2] [VsanHealthUtil::log] QueryPhysicalHealth.loopfor allVsanDisks: 0.09s
2018-03-15T14:53:37Z VSANMGMTSVC: WARNING vsanperfsvc[Thread-2] [VsanHealthUtil::log] impl.QueryDiskRebalanceStatus: 0.00s, 0.00s, 0.00s, 0.00s, 0.00s, 0.00s, 0.00s
2018-03-15T14:53:37Z VSANMGMTSVC: WARNING vsanperfsvc[Thread-2] [VsanHealthUtil::log] impl._QueryPhysicalDiskHealthSummary: 0.14s
2018-03-15T14:53:37Z VSANMGMTSVC: WARNING vsanperfsvc[Thread-2] [VsanHealthUtil::log] maxComps: 0.00s
2018-03-15T14:53:37Z VSANMGMTSVC: WARNING vsanperfsvc[Thread-2] [VsanHealthUtil::log] QueryPhysicalHealth.GetLocalVsanSystem: 0.01s
If I compare this to one of the HPE servers, whose logs are tiny in comparison, the output is basically the same:
[root@MOX-ESXi2:/tmp/scratch/log] tail vsanmgmt.log
2018-03-15T18:54:19Z VSANMGMTSVC: WARNING vsanperfsvc[43fc0038-2882-11e8] [VsanHealthUtil::log] cls.ManipulateControllersWithStressOptions: 0.00s
2018-03-15T18:54:19Z VSANMGMTSVC: WARNING vsanperfsvc[43fc0038-2882-11e8] [VsanHealthUtil::log] cls._LookupDriverVersion: 0.00s, 0.00s, 0.00s, 0.00s
2018-03-15T18:54:19Z VSANMGMTSVC: WARNING vsanperfsvc[43fc0038-2882-11e8] [VsanHealthUtil::log] cls._LookupControllerQueueDepth: 0.00s, 0.00s, 0.00s, 0.00s
2018-03-15T18:54:19Z VSANMGMTSVC: WARNING vsanperfsvc[43fc0038-2882-11e8] [VsanHealthUtil::log] Get storage-core-path, core-adapter, hardware-pci, system-version: 0.03s
2018-03-15T18:54:19Z VSANMGMTSVC: WARNING vsanperfsvc[43fc0038-2882-11e8] [VsanHealthUtil::log] cls.checkDiskMode: 0.00s, 0.00s, 0.00s, 0.00s
2018-03-15T18:54:19Z VSANMGMTSVC: WARNING vsanperfsvc[43fc0038-2882-11e8] [VsanHealthUtil::log] Profiler:
2018-03-15T18:54:19Z VSANMGMTSVC: WARNING vsanperfsvc[43fc0038-2882-11e8] [VsanHealthUtil::log] GetHclInfo.ConnectToLocalHosted: 0.01s
2018-03-15T18:54:19Z VSANMGMTSVC: WARNING vsanperfsvc[43fc0038-2882-11e8] [VsanHealthUtil::log] impl.GetHclInfo: 1.19s
2018-03-15T18:54:19Z VSANMGMTSVC: WARNING vsanperfsvc[43fc0038-2882-11e8] [VsanHealthUtil::log] invoke-method:ServiceInstance:RetrieveContent: 0.01s
2018-03-15T18:54:19Z VSANMGMTSVC: INFO vsanperfsvc[Thread-12] [PyVmomiServer::log_message] ('127.0.0.1', 45342) - - "POST /vsan HTTP/1.1" 200 -
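Since the content looks the same on both hosts, the difference may be in how often these lines are written rather than in what they say. One way to check is to count lines per minute from the timestamp at the start of each line. A minimal sketch, assuming `awk` and `sort` are available on the host; `lines_per_minute` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: count log lines per minute.
# The first 16 characters of each line are the timestamp down to the minute,
# e.g. "2018-03-15T14:53".
lines_per_minute() {
    awk '
        { count[substr($0, 1, 16)]++ }
        END { for (minute in count) print minute, count[minute] }
    ' | sort
}

# Example with three lines from the Lenovo output above.
lines_per_minute <<'EOF'
2018-03-15T14:53:37Z VSANMGMTSVC: WARNING vsanperfsvc[Thread-2] [VsanHealthUtil::log] Profiler:
2018-03-15T14:53:37Z VSANMGMTSVC: WARNING vsanperfsvc[Thread-2] [VsanHealthUtil::log] VsanHealthHelpers.IsWitnessNode: 0.00s
2018-03-15T14:53:37Z VSANMGMTSVC: WARNING vsanperfsvc[Thread-2] [VsanHealthUtil::log] LsomHealth.getHealthStats(): 0.05s
EOF
```

Running `lines_per_minute < vsanmgmt.log` on both hosts and comparing the counts should show whether the Lenovo simply logs these profiler entries more often.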
Any ideas?