My lab has 3x HP GEN 10 Microservers, each containing 1x SSD and 3x HDDs.
I was previously running vSAN on v6.5 without problems until I decided to upgrade to v6.7. The vCenter upgrade went fine, but the host upgrades kept failing, so I decided to wipe the hosts and the storage and start fresh. Each host was freshly installed with v6.7, with the disks wiped, the vmkernel ports configured, etc. When I created the disk groups, hosts 2 and 3 completed successfully, but host 1 hung and eventually failed. When I rebooted host 1, the boot would hang until I removed the SSD, after which it booted successfully. I then had to wipe the SSD before the host would boot with it installed again. I've tried swapping the disks between the hosts and rebuilding the disk groups, but the problem only affects host 1 and the SSD.
I found the following in the logs, but I'm confused about why it affects only one host on v6.7:
vmkwarning.log
2018-12-29T23:54:01.484Z cpu1:2097640)WARNING: vmw_ahci[00000100]: IssueCommand:ERROR: Tag 1 SActive already set: SACT:6 CI:7 reissue_flag:0
2018-12-29T23:54:01.484Z cpu0:2097736)WARNING: NMP: nmpCompleteRetryForPath:357: Retry cmd 0x28 (0x459a40cba2c0) to dev "t10.ATA_____V42DCT064V4SSD2__________________________200105984___________" failed on path "vmhba1:C0:T0:L0" H:0x1 D:0x2 P:0x$
2018-12-29T23:54:01.484Z cpu0:2097736)WARNING: NMP: nmpCompleteRetryForPath:387: Logical device "t10.ATA_____V42DCT064V4SSD2__________________________200105984___________": awaiting fast path state update before retrying failed command again...
2018-12-29T23:54:02.486Z cpu2:2097640)WARNING: NMP: nmpDeviceAttemptFailover:640: Retry world failover device "t10.ATA_____V42DCT064V4SSD2__________________________200105984___________" - issuing command 0x459a40cba2c0
vmkernel.log
2018-12-30T00:03:55.487Z cpu0:2097640)0x451a0a71b9d0:[0x41801a5f1a03]ahciRequestIo@(vmw_ahci)#<None>+0x4ac stack: 0x4306c3654a88, 0x4306c3654990, 0x451a0a71bb68, 0x0, 0x0
2018-12-30T00:03:55.487Z cpu0:2097640)0x451a0a71ba50:[0x41801a5f7f38]scsiExecReadWriteCommand@(vmw_ahci)#<None>+0x51 stack: 0x4306c3657de8, 0x41801a5f816e, 0x4306c36549c8, 0x41801a5f82fa, 0x451a0a71bb58
2018-12-30T00:03:55.487Z cpu0:2097640)0x451a0a71ba70:[0x41801a5f816d]ataIssueCommand@(vmw_ahci)#<None>+0x46 stack: 0x451a0a71bb58, 0x10005893c29a88f, 0x0, 0x459a40d2cb00, 0x4302fb4dbd40
2018-12-30T00:03:55.487Z cpu0:2097640)0x451a0a71ba80:[0x41801a5f82f9]scsiQueueCommand@(vmw_ahci)#<None>+0xd2 stack: 0x0, 0x459a40d2cb00, 0x4302fb4dbd40, 0x0, 0x459a40d2c8c0
2018-12-30T00:03:55.487Z cpu0:2097640)0x451a0a71bac0:[0x418019f78ceb]SCSIIssueCommandDirect@vmkernel#nover+0xf8 stack: 0x15, 0x41801a5f8228, 0x418019f78cd5, 0x451a0a71bf80, 0x451a0a71bb68
2018-12-30T00:03:55.487Z cpu0:2097640)0x451a0a71bb30:[0x418019f7a01f]SCSIStartAdapterCommands@vmkernel#nover+0x384 stack: 0x4302fb4dbec0, 0x40000108, 0x4302fb4dc2b0, 0x4302fb4dc238, 0x451a00000001
2018-12-30T00:03:55.487Z cpu0:2097640)0x451a0a71bbc0:[0x418019f8924b]SCSIStartPathCommands@vmkernel#nover+0x4d8 stack: 0x451a075a3000, 0x418019f01124, 0x0, 0x58900000000, 0x451a0a71bcf0
2018-12-30T00:03:55.487Z cpu0:2097640)0x451a0a71bd70:[0x418019f8fc0f]SCSIIssueAsyncPathCommandDirect@vmkernel#nover+0x240 stack: 0x3436305443443234, 0x5f5f324453533456, 0x5f5f5f5f5f5f5f5f, 0x5f5f5f5f5f5f5f5f, 0x41801a8115c4
2018-12-30T00:03:55.487Z cpu0:2097640)0x451a0a71be30:[0x418019f9153b]vmk_ScsiIssueAsyncPathCommandDirect@vmkernel#nover+0x1c stack: 0x4308b09d3ad0, 0x41801a7d7726, 0x1, 0xd120000000030, 0x451a0a71bec8
2018-12-30T00:03:55.487Z cpu0:2097640)0x451a0a71be50:[0x41801a7d7725]nmp_SelectPathAndIssueCommand@com.vmware.vmkapi#v2_5_0_0+0xea stack: 0x451a0a71bec8, 0x451a0a71be70, 0x0, 0x4308b09d3ad0, 0x459a40d2c440
2018-12-30T00:03:55.487Z cpu0:2097640)0x451a0a71beb0:[0x41801a7d2c81]nmpAttemptFailover@com.vmware.vmkapi#v2_5_0_0+0x102 stack: 0x459a40d2c140, 0x4308b09aa098, 0xffffffff, 0x4308b09d4590, 0x43007e6c5070
2018-12-30T00:03:55.487Z cpu0:2097640)0x451a0a71bf30:[0x418019cea442]HelperQueueFunc@vmkernel#nover+0x30f stack: 0x4308b09b37a8, 0x4308b09b3798, 0x4308b09b37d0, 0x451a0a723000, 0x4308b09b37a8
2018-12-30T00:03:55.487Z cpu0:2097640)0x451a0a71bfe0:[0x418019f09112]CpuSched_StartWorld@vmkernel#nover+0x77 stack: 0x0, 0x0, 0x0, 0x0, 0x0
2018-12-30T00:03:55.487Z cpu0:2097175)WARNING: NMP: nmpCompleteRetryForPath:357: Retry cmd 0x28 (0x459a40d2c140) to dev "t10.ATA_____V42DCT064V4SSD2__________________________200105984___________" failed on path "vmhba1:C0:T0:L0" H:0x1 D:0x2 P:0x$
2018-12-30T00:03:55.487Z cpu0:2097175)WARNING: NMP: nmpCompleteRetryForPath:387: Logical device "t10.ATA_____V42DCT064V4SSD2__________________________200105984___________": awaiting fast path state update before retrying failed command again...
2018-12-30T00:03:56.486Z cpu1:2097640)WARNING: NMP: nmpDeviceAttemptFailover:640: Retry world failover device "t10.ATA_____V42DCT064V4SSD2__________________________200105984___________" - issuing command 0x459a40d2c140
2018-12-30T00:03:56.486Z cpu1:2097640)WARNING: vmw_ahci[00000100]: IssueCommand:ERROR: Tag 1 SActive already set: SACT:6 CI:7 reissue_flag:0
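In case it helps anyone reading the vmw_ahci warning: in the AHCI specification, SACT (PxSACT) and CI (PxCI) are per-port bitmasks with one bit per command slot/tag. Below is a minimal Python sketch that decodes those two values from the warning line. The function names are my own, and I am assuming the driver prints both registers in hexadecimal (for the values 6 and 7 it makes no difference). It is only a reading aid, not a diagnosis.

```python
import re

# Matches the vmw_ahci warning text as it appears in the logs above.
AHCI_RE = re.compile(
    r"Tag (\d+) SActive already set: SACT:([0-9a-fA-F]+) CI:([0-9a-fA-F]+)"
)


def set_bits(mask):
    """Return the positions of the set bits in mask, lowest first."""
    return [i for i in range(32) if (mask >> i) & 1]


def decode_ahci_warning(line):
    """Decode the tag, SACT and CI fields of a vmw_ahci warning line.

    Returns None if the line does not contain the warning.
    """
    m = AHCI_RE.search(line)
    if m is None:
        return None
    tag = int(m.group(1))
    # Assumption: both register values are printed in hexadecimal.
    sact = int(m.group(2), 16)
    ci = int(m.group(3), 16)
    sact_tags = set_bits(sact)
    return {
        "tag": tag,
        "sact_tags": sact_tags,      # tags with an NCQ command outstanding
        "ci_tags": set_bits(ci),     # command slots currently issued
        "tag_busy": tag in sact_tags,
    }


line = ("WARNING: vmw_ahci[00000100]: IssueCommand:ERROR: "
        "Tag 1 SActive already set: SACT:6 CI:7 reissue_flag:0")
print(decode_ahci_warning(line))
```

For the line above, SACT:6 decodes to tags 1 and 2 and CI:7 to slots 0, 1 and 2, so the driver appears to be trying to issue a command on tag 1 while that tag is still marked active. That would be consistent with an earlier command to the SSD never completing, but I can't confirm the cause from the logs alone.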