Hi everyone,
I'm currently in the process of setting up a virtualisation server at work. The goal is to have one high-powered tower serve two users simultaneously (each with their own monitor, mouse and keyboard) with GPU passthrough.
The specs I have are the following:
Component | Selection |
---|---|
CPU | AMD - Threadripper 2950X 3.5 GHz 16-Core Processor |
CPU Cooler | Deepcool - Castle 240 RGB 69.34 CFM Liquid CPU Cooler |
Motherboard | Asus - PRIME X399-A EATX TR4 Motherboard |
Memory | Corsair - Vengeance LPX 64 GB (4 x 16 GB) DDR4-2133 Memory |
Storage | Kingston - A400 120 GB 2.5" Solid State Drive |
Storage | Crucial - MX500 500 GB 2.5" Solid State Drive |
Storage | Crucial - MX500 500 GB 2.5" Solid State Drive |
GPU | AMD - Radeon Pro WX 5100 8 GB Video Card |
GPU | AMD - Radeon Pro WX 5100 8 GB Video Card |
GPU | MSI - GeForce GT 710 2 GB Video Card* |
PSU | Thermaltake - Toughpower Grand 850W |
*only used for the console
I'm currently using VMware ESXi as the hypervisor, running from the 120 GB SSD. Passthrough works flawlessly for my first VM with one of the Radeon Pro GPUs. However, when I try to do the same for my second VM, I get an error saying the device was not found.
Inspecting the config file, I see the PCI device listed with the wrong identifier. Even after I change the device identifier, the VM refuses to boot, saying the device isn't available.
Using the shell, I get the following:
[root@localhost:~] lspci -v
0000:00:00.0 Host bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Root Complex Class 0600: 1022:1450
0000:00:00.2 IOMMU Generic system peripheral: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) I/O Memory Management Unit Class 0806: 1022:1451
0000:00:01.0 Host bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge Class 0600: 1022:1452
0000:00:01.1 PCI bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [PCIe RP[0000:00:01.1]] Class 0604: 1022:1453
0000:00:01.3 PCI bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [PCIe RP[0000:00:01.3]] Class 0604: 1022:1453
0000:00:02.0 Host bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge Class 0600: 1022:1452
0000:00:03.0 Host bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge Class 0600: 1022:1452
0000:00:04.0 Host bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge Class 0600: 1022:1452
0000:00:07.0 Host bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge Class 0600: 1022:1452
0000:00:07.1 PCI bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [PCIe RP[0000:00:07.1]] Class 0604: 1022:1454
0000:00:08.0 Host bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge Class 0600: 1022:1452
0000:00:08.1 PCI bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [PCIe RP[0000:00:08.1]] Class 0604: 1022:1454
0000:00:14.0 SMBus Serial bus controller: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller Class 0c05: 1022:790b
0000:00:14.3 ISA bridge Bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge Class 0601: 1022:790e
0000:00:18.0 Host bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0 Class 0600: 1022:1460
0000:00:18.1 Host bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1 Class 0600: 1022:1461
0000:00:18.2 Host bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2 Class 0600: 1022:1462
0000:00:18.3 Host bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3 Class 0600: 1022:1463
0000:00:18.4 Host bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4 Class 0600: 1022:1464
0000:00:18.5 Host bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5 Class 0600: 1022:1465
0000:00:18.6 Host bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6 Class 0600: 1022:1466
0000:00:18.7 Host bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7 Class 0600: 1022:1467
0000:00:19.0 Host bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0 Class 0600: 1022:1460
0000:00:19.1 Host bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1 Class 0600: 1022:1461
0000:00:19.2 Host bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2 Class 0600: 1022:1462
0000:00:19.3 Host bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3 Class 0600: 1022:1463
0000:00:19.4 Host bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4 Class 0600: 1022:1464
0000:00:19.5 Host bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5 Class 0600: 1022:1465
0000:00:19.6 Host bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6 Class 0600: 1022:1466
0000:00:19.7 Host bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7 Class 0600: 1022:1467
0000:01:00.0 USB controller Serial bus controller: Advanced Micro Devices, Inc. [AMD] X399 Series Chipset USB 3.1 xHCI Controller [vmhba32] Class 0c03: 1022:43ba
0000:01:00.1 SATA controller Mass storage controller: Advanced Micro Devices, Inc. [AMD] X399 Series Chipset SATA Controller [vmhba2] Class 0106: 1022:43b6
0000:01:00.2 PCI bridge Bridge: Advanced Micro Devices, Inc. [AMD] X399 Series Chipset PCIe Bridge Class 0604: 1022:43b1
0000:02:00.0 PCI bridge Bridge: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port Class 0604: 1022:43b4
0000:02:01.0 PCI bridge Bridge: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port Class 0604: 1022:43b4
0000:02:02.0 PCI bridge Bridge: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port Class 0604: 1022:43b4
0000:02:03.0 PCI bridge Bridge: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port Class 0604: 1022:43b4
0000:02:04.0 PCI bridge Bridge: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port Class 0604: 1022:43b4
0000:02:09.0 PCI bridge Bridge: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port Class 0604: 1022:43b4
0000:05:00.0 Ethernet controller Network controller: Intel Corporation I211 Gigabit Network Connection [vmnic0] Class 0200: 8086:1539
0000:07:00.0 Network controller Network controller: Realtek Semiconductor Co., Ltd. RTL8192EE PCIe Wireless Network Adapter Class 0280: 10ec:818b
0000:08:00.0 USB controller Serial bus controller: Class 0c03: 1b21:2142
0000:09:00.0 VGA compatible controller Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon Pro WX 5100] Class 0300: 1002:67c7
0000:09:00.1 Audio device Multimedia controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 580] Class 0403: 1002:aaf0
0000:0a:00.0 : Class 1300: 1022:145a
0000:0a:00.2 Encryption controller Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor Class 1080: 1022:1456
0000:0a:00.3 USB controller Serial bus controller: Advanced Micro Devices, Inc. [AMD] USB 3.0 Host controller Class 0c03: 1022:145f
0000:0b:00.0 : Class 1300: 1022:1455
0000:0b:00.2 SATA controller Mass storage controller: Advanced Micro Devices Inc AMD FCH SATA Controller [AHCI Mode] [vmhba0] Class 0106: 1022:7901
0000:0b:00.3 Audio device Multimedia controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) HD Audio Controller Class 0403: 1022:1457
0000:40:00.0 Host bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Root Complex Class 0600: 1022:1450
0000:40:00.2 IOMMU Generic system peripheral: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) I/O Memory Management Unit Class 0806: 1022:1451
0000:40:01.0 Host bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge Class 0600: 1022:1452
0000:40:01.3 PCI bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [PCIe RP[0000:40:01.3]] Class 0604: 1022:1453
0000:40:02.0 Host bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge Class 0600: 1022:1452
0000:40:03.0 Host bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge Class 0600: 1022:1452
0000:40:03.1 PCI bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [PCIe RP[0000:40:03.1]] Class 0604: 1022:1453
0000:40:04.0 Host bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge Class 0600: 1022:1452
0000:40:07.0 Host bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge Class 0600: 1022:1452
0000:40:07.1 PCI bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [PCIe RP[0000:40:07.1]] Class 0604: 1022:1454
0000:40:08.0 Host bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge Class 0600: 1022:1452
0000:40:08.1 PCI bridge Bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [PCIe RP[0000:40:08.1]] Class 0604: 1022:1454
0000:41:00.0 VGA compatible controller Display controller: NVIDIA Corporation GK208B [GeForce GT 710] Class 0300: 10de:128b
0000:41:00.1 Audio device Multimedia controller: NVIDIA Corporation GK208 HDMI/DP Audio Controller Class 0403: 10de:0e0f
0000:42:00.0 VGA compatible controller Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon Pro WX 5100] Class 0300: 1002:67c7
0000:42:00.1 Audio device Multimedia controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 580] Class 0403: 1002:aaf0
0000:43:00.0 : Class 1300: 1022:145a
0000:43:00.2 Encryption controller Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor Class 1080: 1022:1456
0000:43:00.3 USB controller Serial bus controller: Advanced Micro Devices, Inc. [AMD] USB 3.0 Host controller Class 0c03: 1022:145f
0000:44:00.0 : Class 1300: 1022:1455
0000:44:00.2 SATA controller Mass storage controller: Advanced Micro Devices Inc AMD FCH SATA Controller [AHCI Mode] [vmhba1] Class 0106: 1022:7901
The error I get when starting the second VM is that device 0000:65:00. is not found. This is obviously an error, since the second GPU is 0000:41:00.0. Changing the device number in the config file then tells me that device 0000:41:00.0 is not found.
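One detail that may be relevant: 65 in decimal is 41 in hexadecimal, so the 0000:65:00. in the error could be the same bus as 0000:41:00.0 written in decimal rather than a different device. Whether ESXi actually writes the bus number in decimal in this message or in the config file is an assumption here, not something confirmed by the output above. A minimal Python sketch of the conversion (the helper name `bus_to_hex` is made up for illustration):

```python
def bus_to_hex(address: str) -> str:
    """Rewrite the bus field of a PCI address from decimal to hex.

    Assumes the form segment:bus:device[.function] with the bus field
    written in decimal. That is a guess about the format, not a fact
    taken from VMware documentation.
    """
    segment, bus, rest = address.split(":", 2)
    return f"{segment}:{int(bus, 10):02x}:{rest}"


if __name__ == "__main__":
    # The identifier exactly as it appears in the error message.
    print(bus_to_hex("0000:65:00."))
```

By the same conversion, decimal 66 corresponds to bus 42, which is where the lspci listing places the second Radeon Pro WX 5100 (0000:41:00.0 in that listing is the GT 710).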
I have then run the following:
[root@localhost:~] dmesg | grep IOMMU
TSC: 266805 cpu0:1)BootConfig: 324: noIOMMU = FALSE
0:00:00:05.258 cpu0:2097152)AMDIOMMU: 855: AMD IOMMU driver version 1.26, built on: Apr 3 2018
0:00:00:05.279 cpu0:2097152)AMDIOMMU: 804: iommuDevArray entry not present for devId 00:00.1
0:00:00:05.279 cpu0:2097152)WARNING: AMDIOMMU: 829: IOMMU unit 00:00.2: IOAPIC id 128 reported twice, ignoring the duplicate entry
0:00:00:05.279 cpu0:2097152)AMDIOMMU: 804: iommuDevArray entry not present for devId 00:00.1
0:00:00:05.279 cpu0:2097152)WARNING: AMDIOMMU: 829: IOMMU unit 00:00.2: IOAPIC id 129 reported twice, ignoring the duplicate entry
0:00:00:05.279 cpu0:2097152)AMDIOMMU: 894: AMD IOMMU driver has been loaded successfully.
0:00:00:05.279 cpu0:2097152)AMDIOMMU: 157: Max IOPN supported by the IOMMU hardware = 0xfffffffffffffffe
0:00:00:05.452 cpu0:2097152)AMDIOMMU: 802: Registering AMDIommu with VMkernel
0:00:00:05.452 cpu0:2097152)AMDIOMMU: 402: Created domain 0
0:00:00:05.452 cpu0:2097152)AMDIOMMU: 306: Domain 0: bypass = Yes, identity-mapped = Yes, top page table = 0x88c827000
0:00:00:05.452 cpu0:2097152)AMDIOMMU: 369: assign device 0x2 (alias=0x2) to domain 0
0:00:00:05.453 cpu0:2097152)AMDIOMMU: 369: assign device 0x100 (alias=0x100) to domain 0
0:00:00:05.454 cpu0:2097152)AMDIOMMU: 369: assign device 0x101 (alias=0x101) to domain 0
0:00:00:05.455 cpu0:2097152)AMDIOMMU: 369: assign device 0x500 (alias=0x500) to domain 0
0:00:00:05.456 cpu0:2097152)AMDIOMMU: 369: assign device 0x700 (alias=0x700) to domain 0
0:00:00:05.457 cpu0:2097152)AMDIOMMU: 369: assign device 0x800 (alias=0x800) to domain 0
0:00:00:05.458 cpu0:2097152)AMDIOMMU: 369: assign device 0x900 (alias=0x900) to domain 0
0:00:00:05.459 cpu0:2097152)AMDIOMMU: 369: assign device 0x901 (alias=0x901) to domain 0
0:00:00:05.461 cpu0:2097152)AMDIOMMU: 369: assign device 0xa00 (alias=0xa00) to domain 0
0:00:00:05.462 cpu0:2097152)AMDIOMMU: 369: assign device 0xa02 (alias=0xa02) to domain 0
0:00:00:05.463 cpu0:2097152)AMDIOMMU: 369: assign device 0xa03 (alias=0xa03) to domain 0
0:00:00:05.464 cpu0:2097152)AMDIOMMU: 369: assign device 0xb00 (alias=0xb00) to domain 0
0:00:00:05.465 cpu0:2097152)AMDIOMMU: 369: assign device 0xb02 (alias=0xb02) to domain 0
0:00:00:05.466 cpu0:2097152)AMDIOMMU: 369: assign device 0xb03 (alias=0xb03) to domain 0
0:00:00:05.467 cpu0:2097152)AMDIOMMU: 369: assign device 0xa0 (alias=0xa0) to domain 0
0:00:00:05.468 cpu0:2097152)IOMMU: 1865: 0000:40:00.2: Device not supported by IOMMU hardware.
0:00:00:05.468 cpu0:2097152)IOMMU: 1865: 0000:41:00.0: Device not supported by IOMMU hardware.
0:00:00:05.468 cpu0:2097152)IOMMU: 1865: 0000:41:00.1: Device not supported by IOMMU hardware.
0:00:00:05.468 cpu0:2097152)IOMMU: 1865: 0000:42:00.0: Device not supported by IOMMU hardware.
0:00:00:05.468 cpu0:2097152)IOMMU: 1865: 0000:42:00.1: Device not supported by IOMMU hardware.
0:00:00:05.468 cpu0:2097152)IOMMU: 1865: 0000:43:00.0: Device not supported by IOMMU hardware.
0:00:00:05.468 cpu0:2097152)IOMMU: 1865: 0000:43:00.2: Device not supported by IOMMU hardware.
0:00:00:05.468 cpu0:2097152)IOMMU: 1865: 0000:43:00.3: Device not supported by IOMMU hardware.
0:00:00:05.468 cpu0:2097152)IOMMU: 1865: 0000:44:00.0: Device not supported by IOMMU hardware.
0:00:00:05.468 cpu0:2097152)IOMMU: 1865: 0000:44:00.2: Device not supported by IOMMU hardware.
0:00:00:05.468 cpu0:2097152)AMDIOMMU: 1470: enable IOMMU 00:00.2
2018-11-21T15:56:46.167Z cpu4:2097732)DMA: 166: 'DMAIOMMU' DMA mapper is now available..
2018-11-21T16:01:50.428Z cpu30:2100335)PCIPassthru: 3525: Device 0000:42:00.0 not supported by IOMMU hardware.
2018-11-21T16:14:34.430Z cpu2:2101511)AMDIOMMU: 402: Created domain 1
2018-11-21T16:14:34.430Z cpu2:2101511)AMDIOMMU: 306: Domain 1: bypass = No, identity-mapped = No, top page table = 0xc6b122000
2018-11-21T16:14:38.433Z cpu1:2101511)AMDIOMMU: 369: assign device 0x900 (alias=0x900) to domain 1
2018-11-21T16:14:38.433Z cpu1:2101511)IOMMU: 2451: Device 0000:09:00.0 placed in new domain 0x43052f28bd60.
2018-11-21T16:14:42.437Z cpu1:2101511)AMDIOMMU: 369: assign device 0x901 (alias=0x901) to domain 1
2018-11-21T16:14:42.437Z cpu1:2101511)IOMMU: 2451: Device 0000:09:00.1 placed in new domain 0x43052f28bd60.
2018-11-21T16:15:03.048Z cpu9:2101511)AMDIOMMU: 429: Removing device 09:00.1 (alias=09:00.1) from domain 1
2018-11-21T16:15:03.048Z cpu9:2101511)AMDIOMMU: 369: assign device 0x901 (alias=0x901) to domain 0
2018-11-21T16:15:07.051Z cpu9:2101511)AMDIOMMU: 429: Removing device 09:00.0 (alias=09:00.0) from domain 1
2018-11-21T16:15:07.051Z cpu9:2101511)AMDIOMMU: 369: assign device 0x900 (alias=0x900) to domain 0
2018-11-21T16:15:07.397Z cpu9:2097254)AMDIOMMU: 330: Freeing domain 1
2018-11-21T16:15:45.320Z cpu16:2101776)AMDIOMMU: 402: Created domain 1
2018-11-21T16:15:45.320Z cpu16:2101776)AMDIOMMU: 306: Domain 1: bypass = No, identity-mapped = No, top page table = 0x11874c000
2018-11-21T16:15:49.324Z cpu16:2101776)AMDIOMMU: 369: assign device 0x900 (alias=0x900) to domain 1
2018-11-21T16:15:49.324Z cpu16:2101776)IOMMU: 2451: Device 0000:09:00.0 placed in new domain 0x43052f28bd60.
2018-11-21T16:15:53.329Z cpu16:2101776)AMDIOMMU: 369: assign device 0x901 (alias=0x901) to domain 1
2018-11-21T16:15:53.329Z cpu16:2101776)IOMMU: 2451: Device 0000:09:00.1 placed in new domain 0x43052f28bd60.
2018-11-21T16:26:44.318Z cpu9:2101915)AMDIOMMU: 429: Removing device 09:00.1 (alias=09:00.1) from domain 1
2018-11-21T16:26:44.318Z cpu9:2101915)AMDIOMMU: 369: assign device 0x901 (alias=0x901) to domain 0
2018-11-21T16:26:48.322Z cpu9:2101915)AMDIOMMU: 369: assign device 0x901 (alias=0x901) to domain 1
2018-11-21T16:26:48.322Z cpu9:2101915)IOMMU: 2451: Device 0000:09:00.1 placed in new domain 0x43052f28bd60.
2018-11-21T16:26:52.325Z cpu0:2101915)AMDIOMMU: 429: Removing device 09:00.0 (alias=09:00.0) from domain 1
2018-11-21T16:26:52.325Z cpu0:2101915)AMDIOMMU: 369: assign device 0x900 (alias=0x900) to domain 0
2018-11-21T16:26:56.329Z cpu9:2101915)AMDIOMMU: 369: assign device 0x900 (alias=0x900) to domain 1
2018-11-21T16:26:56.329Z cpu9:2101915)IOMMU: 2451: Device 0000:09:00.0 placed in new domain 0x43052f28bd60.
2018-11-21T16:28:37.459Z cpu8:2101915)AMDIOMMU: 429: Removing device 09:00.1 (alias=09:00.1) from domain 1
2018-11-21T16:28:37.459Z cpu8:2101915)AMDIOMMU: 369: assign device 0x901 (alias=0x901) to domain 0
2018-11-21T16:28:41.462Z cpu0:2101915)AMDIOMMU: 369: assign device 0x901 (alias=0x901) to domain 1
2018-11-21T16:28:41.462Z cpu0:2101915)IOMMU: 2451: Device 0000:09:00.1 placed in new domain 0x43052f28bd60.
2018-11-21T16:28:45.466Z cpu0:2101915)AMDIOMMU: 429: Removing device 09:00.0 (alias=09:00.0) from domain 1
2018-11-21T16:28:45.466Z cpu0:2101915)AMDIOMMU: 369: assign device 0x900 (alias=0x900) to domain 0
2018-11-21T16:28:49.469Z cpu0:2101915)AMDIOMMU: 369: assign device 0x900 (alias=0x900) to domain 1
2018-11-21T16:28:49.469Z cpu0:2101915)IOMMU: 2451: Device 0000:09:00.0 placed in new domain 0x43052f28bd60.
2018-11-21T17:03:42.739Z cpu10:2101915)AMDIOMMU: 429: Removing device 09:00.1 (alias=09:00.1) from domain 1
2018-11-21T17:03:42.739Z cpu10:2101915)AMDIOMMU: 369: assign device 0x901 (alias=0x901) to domain 0
2018-11-21T17:03:46.744Z cpu10:2101915)AMDIOMMU: 369: assign device 0x901 (alias=0x901) to domain 1
2018-11-21T17:03:46.744Z cpu10:2101915)IOMMU: 2451: Device 0000:09:00.1 placed in new domain 0x43052f28bd60.
2018-11-21T17:03:50.749Z cpu0:2101915)AMDIOMMU: 429: Removing device 09:00.0 (alias=09:00.0) from domain 1
2018-11-21T17:03:50.749Z cpu0:2101915)AMDIOMMU: 369: assign device 0x900 (alias=0x900) to domain 0
2018-11-21T17:03:54.749Z cpu0:2101915)AMDIOMMU: 369: assign device 0x900 (alias=0x900) to domain 1
2018-11-21T17:03:54.749Z cpu0:2101915)IOMMU: 2451: Device 0000:09:00.0 placed in new domain 0x43052f28bd60.
This shows that ESXi reports the device as not supported by the IOMMU hardware. I have tried switching the cards around and running with only one GPU plugged into the system at a time, and each card works with the first VM. The issue arises only for the second GPU added, irrespective of which card it is.
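To make the pattern in the log easier to see, the flagged addresses can be grouped by bus. The following is only a sketch in Python, meant to be run on a workstation against a saved copy of the log rather than on the host; the function names are invented for illustration, and `SAMPLE` holds just a few lines copied from the dmesg output. In the full log, every flagged device sits on buses 40 to 44, which is where lspci lists the second IOMMU (0000:40:00.2), and the only "enable IOMMU" line is for 00:00.2.

```python
import re
from collections import defaultdict

# A few lines copied from the dmesg output; not the full log.
SAMPLE = """\
0:00:00:05.468 cpu0:2097152)IOMMU: 1865: 0000:40:00.2: Device not supported by IOMMU hardware.
0:00:00:05.468 cpu0:2097152)IOMMU: 1865: 0000:41:00.0: Device not supported by IOMMU hardware.
0:00:00:05.468 cpu0:2097152)IOMMU: 1865: 0000:42:00.0: Device not supported by IOMMU hardware.
0:00:00:05.468 cpu0:2097152)AMDIOMMU: 1470: enable IOMMU 00:00.2
2018-11-21T16:01:50.428Z cpu30:2100335)PCIPassthru: 3525: Device 0000:42:00.0 not supported by IOMMU hardware.
"""

# Full PCI address: segment:bus:device.function, all fields in hex.
PCI_ADDRESS = re.compile(
    r"\b[0-9a-f]{4}:(?P<bus>[0-9a-f]{2}):[0-9a-f]{2}\.[0-7]"
)
# IOMMU units the driver reports as enabled (short bus:device.function form).
ENABLED_UNIT = re.compile(r"enable IOMMU ([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])")


def unsupported_by_bus(log: str) -> dict:
    """Map each bus number to the set of addresses flagged as unsupported."""
    buses = defaultdict(set)
    for line in log.splitlines():
        if "not supported by IOMMU hardware" not in line:
            continue
        match = PCI_ADDRESS.search(line)
        if match:
            buses[match.group("bus")].add(match.group(0))
    return dict(buses)


def enabled_units(log: str) -> list:
    """List the IOMMU units that have an 'enable IOMMU' line in the log."""
    return ENABLED_UNIT.findall(log)


if __name__ == "__main__":
    for bus, devices in sorted(unsupported_by_bus(SAMPLE).items()):
        print(f"bus {bus}: {', '.join(sorted(devices))}")
    print("enabled IOMMU units:", enabled_units(SAMPLE))
```

If the grouping holds on the full log, it points at the second IOMMU unit not being brought up rather than at a fault in either GPU, which would fit both cards working when they sit on bus 09. That is an interpretation of the log, not a confirmed diagnosis.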
I'm at my wits' end trying to figure this out, and I would really appreciate any help in resolving it.
Many thanks!