提交 · 96788c7a7f1e7206519d4d736f89a2072dcfe0fc · openeuler / Kernel

13 3月, 2020 1 次提交

iommu/vt-d: dmar_parse_one_rmrr: replace WARN_TAINT with pr_warn + add_taint · 96788c7a

由 Hans de Goede 提交于 3月 09, 2020

Quoting from the comment describing the WARN functions in
include/asm-generic/bug.h:

 * WARN(), WARN_ON(), WARN_ON_ONCE, and so on can be used to report
 * significant kernel issues that need prompt attention if they should ever
 * appear at runtime.
 *
 * Do not use these macros when checking for invalid external inputs

The (buggy) firmware tables which the dmar code was calling WARN_TAINT
for really are invalid external inputs. They are not under the kernel's
control and the issues in them cannot be fixed by a kernel update.
So logging a backtrace, which invites bug reports to be filed about this,
is not helpful.

Some distros, e.g. Fedora, have tools watching for the kernel backtraces
logged by the WARN macros and offer the user an option to file a bug for
this when these are encountered. The WARN_TAINT in dmar_parse_one_rmrr
+ another iommu WARN_TAINT, addressed in another patch, have lead to over
a 100 bugs being filed this way.

This commit replaces the WARN_TAINT("...") call, with a
pr_warn(FW_BUG "...") + add_taint(TAINT_FIRMWARE_WORKAROUND, ...) call
avoiding the backtrace and thus also avoiding bug-reports being filed
about this against the kernel.

Fixes: f5a68bb0 ("iommu/vt-d: Mark firmware tainted if RMRR fails sanity check")
Signed-off-by: NHans de Goede <hdegoede@redhat.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>
Acked-by: NLu Baolu <baolu.lu@linux.intel.com>
Cc: stable@vger.kernel.org
Cc: Barret Rhoden <brho@google.com>
Link: https://lore.kernel.org/r/20200309140138.3753-3-hdegoede@redhat.com
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1808874

96788c7a

10 3月, 2020 1 次提交

iommu/vt-d: Fix RCU-list bugs in intel_iommu_init() · 2d48ea0e

由 Qian Cai 提交于 3月 05, 2020

There are several places traverse RCU-list without holding any lock in
intel_iommu_init(). Fix them by acquiring dmar_global_lock.

 WARNING: suspicious RCU usage
 -----------------------------
 drivers/iommu/intel-iommu.c:5216 RCU-list traversed in non-reader section!!

 other info that might help us debug this:

 rcu_scheduler_active = 2, debug_locks = 1
 no locks held by swapper/0/1.

 Call Trace:
  dump_stack+0xa0/0xea
  lockdep_rcu_suspicious+0x102/0x10b
  intel_iommu_init+0x947/0xb13
  pci_iommu_init+0x26/0x62
  do_one_initcall+0xfe/0x500
  kernel_init_freeable+0x45a/0x4f8
  kernel_init+0x11/0x139
  ret_from_fork+0x3a/0x50
 DMAR: Intel(R) Virtualization Technology for Directed I/O

Fixes: d8190dc6 ("iommu/vt-d: Enable DMA remapping after rmrr mapped")
Signed-off-by: NQian Cai <cai@lca.pw>
Acked-by: NLu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

2d48ea0e

03 3月, 2020 1 次提交

iommu/vt-d: Fix a bug in intel_iommu_iova_to_phys() for huge page · 77a1bce8

由 Yonghyun Hwang 提交于 2月 26, 2020

intel_iommu_iova_to_phys() has a bug when it translates an IOVA for a huge
page onto its corresponding physical address. This commit fixes the bug by
accomodating the level of page entry for the IOVA and adds IOVA's lower
address to the physical address.

Cc: <stable@vger.kernel.org>
Acked-by: NLu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: NMoritz Fischer <mdf@kernel.org>
Signed-off-by: NYonghyun Hwang <yonghyun@google.com>
Fixes: 38717946 ("VT-d: Changes to support KVM")
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

77a1bce8

19 2月, 2020 5 次提交

iommu/vt-d: Simplify check in identity_mapping() · 1ddb32da

由 Joerg Roedel 提交于 2月 17, 2020

The function only has one call-site and there it is never called with
dummy or deferred devices. Simplify the check in the function to
account for that.

Fixes: 1ee0186b ("iommu/vt-d: Refactor find_domain() helper")
Cc: stable@vger.kernel.org # v5.5
Reviewed-by: NJerry Snitselaar <jsnitsel@redhat.com>
Acked-by: NLu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

1ddb32da

iommu/vt-d: Remove deferred_attach_domain() · 96d170f3

由 Joerg Roedel 提交于 2月 17, 2020

The function is now only a wrapper around find_domain(). Remove the
function and call find_domain() directly at the call-sites.

Fixes: 1ee0186b ("iommu/vt-d: Refactor find_domain() helper")
Cc: stable@vger.kernel.org # v5.5
Reviewed-by: NJerry Snitselaar <jsnitsel@redhat.com>
Acked-by: NLu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

96d170f3

iommu/vt-d: Do deferred attachment in iommu_need_mapping() · a11bfde9

由 Joerg Roedel 提交于 2月 17, 2020

The attachment of deferred devices needs to happen before the check
whether the device is identity mapped or not. Otherwise the check will
return wrong results, cause warnings boot failures in kdump kernels, like

	WARNING: CPU: 0 PID: 318 at ../drivers/iommu/intel-iommu.c:592 domain_get_iommu+0x61/0x70

	[...]

	 Call Trace:
	  __intel_map_single+0x55/0x190
	  intel_alloc_coherent+0xac/0x110
	  dmam_alloc_attrs+0x50/0xa0
	  ahci_port_start+0xfb/0x1f0 [libahci]
	  ata_host_start.part.39+0x104/0x1e0 [libata]

With the earlier check the kdump boot succeeds and a crashdump is written.

Fixes: 1ee0186b ("iommu/vt-d: Refactor find_domain() helper")
Cc: stable@vger.kernel.org # v5.5
Reviewed-by: NJerry Snitselaar <jsnitsel@redhat.com>
Acked-by: NLu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

a11bfde9

iommu/vt-d: Move deferred device attachment into helper function · 034d98cc

由 Joerg Roedel 提交于 2月 17, 2020

Move the code that does the deferred device attachment into a separate
helper function.

Fixes: 1ee0186b ("iommu/vt-d: Refactor find_domain() helper")
Cc: stable@vger.kernel.org # v5.5
Reviewed-by: NJerry Snitselaar <jsnitsel@redhat.com>
Acked-by: NLu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

034d98cc

iommu/vt-d: Add attach_deferred() helper · 1d461597

由 Joerg Roedel 提交于 2月 17, 2020

Implement a helper function to check whether a device's attach process
is deferred.

Fixes: 1ee0186b ("iommu/vt-d: Refactor find_domain() helper")
Cc: stable@vger.kernel.org # v5.5
Reviewed-by: NJerry Snitselaar <jsnitsel@redhat.com>
Acked-by: NLu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

1d461597

25 1月, 2020 2 次提交

iommu/vt-d: Remove VMD child device sanity check · e3560ee4

由 Jon Derrick 提交于 1月 21, 2020

Remove the sanity check required for VMD child devices. The new
pci_real_dma_dev() DMA alias mechanism places them in the same IOMMU group
as the VMD endpoint. Assignment of the group would require assigning the
VMD endpoint, where unbinding the VMD endpoint removes the child device
domain from the hierarchy.

Link: https://lore.kernel.org/r/1579613871-301529-6-git-send-email-jonathan.derrick@intel.comSigned-off-by: NJon Derrick <jonathan.derrick@intel.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Acked-by: NLu Baolu <baolu.lu@linux.intel.com>

e3560ee4

iommu/vt-d: Use pci_real_dma_dev() for mapping · 2b0140c6

由 Jon Derrick 提交于 1月 21, 2020

The PCI device may have a DMA requester on another bus, such as VMD
subdevices needing to use the VMD endpoint. This case requires the real
DMA device for the IOMMU mapping, so use pci_real_dma_dev() to find that
device.

Link: https://lore.kernel.org/r/1579613871-301529-5-git-send-email-jonathan.derrick@intel.comSigned-off-by: NJon Derrick <jonathan.derrick@intel.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Acked-by: NLu Baolu <baolu.lu@linux.intel.com>

2b0140c6

24 1月, 2020 5 次提交

iommu/vt-d: Unnecessary to handle default identity domain · b89b6605

由 Lu Baolu 提交于 1月 15, 2020

The iommu default domain framework has been designed to take
care of setting identity default domain type. It's unnecessary
to handle this again in the VT-d driver. Hence, remove it.
Signed-off-by: NLu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

b89b6605

iommu/vt-d: Allow devices with RMRRs to use identity domain · 9235cb13

由 Lu Baolu 提交于 1月 15, 2020

Since commit ea2447f7 ("intel-iommu: Prevent devices with
RMRRs from being placed into SI Domain"), the Intel IOMMU driver
doesn't allow any devices with RMRR locked to use the identity
domain. This was added to to fix the issue where the RMRR info
for devices being placed in and out of the identity domain gets
lost. This identity maps all RMRRs when setting up the identity
domain, so that devices with RMRRs could also use it.
Signed-off-by: NLu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

9235cb13

iommu/vt-d: Add RMRR base and end addresses sanity check · ce4cc52b

由 Barret Rhoden 提交于 1月 15, 2020

The VT-d spec specifies requirements for the RMRR entries base and
end (called 'Limit' in the docs) addresses.

This commit will cause the DMAR processing to mark the firmware as
tainted if any RMRR entries that do not meet these requirements.
Signed-off-by: NBarret Rhoden <brho@google.com>
Signed-off-by: NLu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

ce4cc52b

iommu/vt-d: Mark firmware tainted if RMRR fails sanity check · f5a68bb0

由 Barret Rhoden 提交于 1月 15, 2020

RMRR entries describe memory regions that are DMA targets for devices
outside the kernel's control.

RMRR entries that fail the sanity check are pointing to regions of
memory that the firmware did not tell the kernel are reserved or
otherwise should not be used.

Instead of aborting DMAR processing, this commit marks the firmware
as tainted. These RMRRs will still be identity mapped, otherwise,
some devices, e.x. graphic devices, will not work during boot.
Signed-off-by: NBarret Rhoden <brho@google.com>
Signed-off-by: NLu Baolu <baolu.lu@linux.intel.com>
Fixes: f036c7fa ("iommu/vt-d: Check VT-d RMRR region in BIOS is reported as reserved")
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

f5a68bb0

iommu/vt-d: Call __dmar_remove_one_dev_info with valid pointer · bf708cfb

由 Jerry Snitselaar 提交于 1月 21, 2020

It is possible for archdata.iommu to be set to
DEFER_DEVICE_DOMAIN_INFO or DUMMY_DEVICE_DOMAIN_INFO so check for
those values before calling __dmar_remove_one_dev_info. Without a
check it can result in a null pointer dereference. This has been seen
while booting a kdump kernel on an HP dl380 gen9.

Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: stable@vger.kernel.org # 5.3+
Cc: linux-kernel@vger.kernel.org
Fixes: ae23bfb6 ("iommu/vt-d: Detach domain before using a private one")
Signed-off-by: NJerry Snitselaar <jsnitsel@redhat.com>
Acked-by: NLu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

bf708cfb

08 1月, 2020 2 次提交

iommu/vt-d: Unlink device if failed to add to group · f78947c4

由 Jon Derrick 提交于 12月 31, 2019

If the device fails to be added to the group, make sure to unlink the
reference before returning.
Signed-off-by: NJon Derrick <jonathan.derrick@intel.com>
Fixes: 39ab9555 ("iommu: Add sysfs bindings for struct iommu_device")
Acked-by: NLu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

f78947c4

iommu/vt-d: Fix adding non-PCI devices to Intel IOMMU · 4a350a0e

由 Patrick Steinhardt 提交于 12月 27, 2019

Starting with commit fa212a97 ("iommu/vt-d: Probe DMA-capable ACPI
name space devices"), we now probe DMA-capable ACPI name
space devices. On Dell XPS 13 9343, which has an Intel LPSS platform
device INTL9C60 enumerated via ACPI, this change leads to the following
warning:

    ------------[ cut here ]------------
    WARNING: CPU: 1 PID: 1 at pci_device_group+0x11a/0x130
    CPU: 1 PID: 1 Comm: swapper/0 Tainted: G                T 5.5.0-rc3+ #22
    Hardware name: Dell Inc. XPS 13 9343/0310JH, BIOS A20 06/06/2019
    RIP: 0010:pci_device_group+0x11a/0x130
    Code: f0 ff ff 48 85 c0 49 89 c4 75 c4 48 8d 74 24 10 48 89 ef e8 48 ef ff ff 48 85 c0 49 89 c4 75 af e8 db f7 ff ff 49 89 c4 eb a5 <0f> 0b 49 c7 c4 ea ff ff ff eb 9a e8 96 1e c7 ff 66 0f 1f 44 00 00
    RSP: 0000:ffffc0d6c0043cb0 EFLAGS: 00010202
    RAX: 0000000000000000 RBX: ffffa3d1d43dd810 RCX: 0000000000000000
    RDX: ffffa3d1d4fecf80 RSI: ffffa3d12943dcc0 RDI: ffffa3d1d43dd810
    RBP: ffffa3d1d43dd810 R08: 0000000000000000 R09: ffffa3d1d4c04a80
    R10: ffffa3d1d4c00880 R11: ffffa3d1d44ba000 R12: 0000000000000000
    R13: ffffa3d1d4383b80 R14: ffffa3d1d4c090d0 R15: ffffa3d1d4324530
    FS:  0000000000000000(0000) GS:ffffa3d1d6700000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 000000000460a001 CR4: 00000000003606e0
    Call Trace:
     ? iommu_group_get_for_dev+0x81/0x1f0
     ? intel_iommu_add_device+0x61/0x170
     ? iommu_probe_device+0x43/0xd0
     ? intel_iommu_init+0x1fa2/0x2235
     ? pci_iommu_init+0x52/0xe7
     ? e820__memblock_setup+0x15c/0x15c
     ? do_one_initcall+0xcc/0x27e
     ? kernel_init_freeable+0x169/0x259
     ? rest_init+0x95/0x95
     ? kernel_init+0x5/0xeb
     ? ret_from_fork+0x35/0x40
    ---[ end trace 28473e7abc25b92c ]---
    DMAR: ACPI name space devices didn't probe correctly

The bug results from the fact that while we now enumerate ACPI devices,
we aren't able to handle any non-PCI device when generating the device
group. Fix the issue by implementing an Intel-specific callback that
returns `pci_device_group` only if the device is a PCI device.
Otherwise, it will return a generic device group.

Fixes: fa212a97 ("iommu/vt-d: Probe DMA-capable ACPI name space devices")
Signed-off-by: NPatrick Steinhardt <ps@pks.im>
Cc: stable@vger.kernel.org # v5.3+
Acked-by: NLu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

4a350a0e

07 1月, 2020 14 次提交

iommu/vt-d: debugfs: Add support to show page table internals · e2726dae

由 Lu Baolu 提交于 1月 02, 2020

Export page table internals of the domain attached to each device.
Example of such dump on a Skylake machine:

$ sudo cat /sys/kernel/debug/iommu/intel/domain_translation_struct
[ ... ]
Device 0000:00:14.0 with pasid 0 @0x15f3d9000
IOVA_PFN                PML5E                   PML4E
0x000000008ced0 |       0x0000000000000000      0x000000015f3da003
0x000000008ced1 |       0x0000000000000000      0x000000015f3da003
0x000000008ced2 |       0x0000000000000000      0x000000015f3da003
0x000000008ced3 |       0x0000000000000000      0x000000015f3da003
0x000000008ced4 |       0x0000000000000000      0x000000015f3da003
0x000000008ced5 |       0x0000000000000000      0x000000015f3da003
0x000000008ced6 |       0x0000000000000000      0x000000015f3da003
0x000000008ced7 |       0x0000000000000000      0x000000015f3da003
0x000000008ced8 |       0x0000000000000000      0x000000015f3da003
0x000000008ced9 |       0x0000000000000000      0x000000015f3da003

PDPE                    PDE                     PTE
0x000000015f3db003      0x000000015f3dc003      0x000000008ced0003
0x000000015f3db003      0x000000015f3dc003      0x000000008ced1003
0x000000015f3db003      0x000000015f3dc003      0x000000008ced2003
0x000000015f3db003      0x000000015f3dc003      0x000000008ced3003
0x000000015f3db003      0x000000015f3dc003      0x000000008ced4003
0x000000015f3db003      0x000000015f3dc003      0x000000008ced5003
0x000000015f3db003      0x000000015f3dc003      0x000000008ced6003
0x000000015f3db003      0x000000015f3dc003      0x000000008ced7003
0x000000015f3db003      0x000000015f3dc003      0x000000008ced8003
0x000000015f3db003      0x000000015f3dc003      0x000000008ced9003
[ ... ]
Signed-off-by: NLu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

e2726dae

iommu/vt-d: Use iova over first level · b802d070

由 Lu Baolu 提交于 1月 02, 2020

After we make all map/unmap paths support first level page table.
Let's turn it on if hardware supports scalable mode.
Signed-off-by: NLu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

b802d070

iommu/vt-d: Update first level super page capability · 64229e8f

由 Lu Baolu 提交于 1月 02, 2020

First-level translation may map input addresses to 4-KByte pages,
2-MByte pages, or 1-GByte pages. Support for 4-KByte pages and
2-Mbyte pages are mandatory for first-level translation. Hardware
support for 1-GByte page is reported through the FL1GP field in
the Capability Register.
Signed-off-by: NLu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

64229e8f

iommu/vt-d: Make first level IOVA canonical · cb8b892d

由 Lu Baolu 提交于 1月 02, 2020

First-level translation restricts the input-address to a canonical
address (i.e., address bits 63:N have the same value as address
bit [N-1], where N is 48-bits with 4-level paging and 57-bits with
5-level paging). (section 3.6 in the spec)

This makes first level IOVA canonical by using IOVA with bit [N-1]
always cleared.
Signed-off-by: NLu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

cb8b892d

iommu/vt-d: Flush PASID-based iotlb for iova over first level · 33cd6e64

由 Lu Baolu 提交于 1月 02, 2020

When software has changed first-level tables, it should invalidate
the affected IOTLB and the paging-structure-caches using the PASID-
based-IOTLB Invalidate Descriptor defined in spec 6.5.2.4.
Signed-off-by: NLu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

33cd6e64

iommu/vt-d: Setup pasid entries for iova over first level · ddf09b6d

由 Lu Baolu 提交于 1月 02, 2020

Intel VT-d in scalable mode supports two types of page tables for
IOVA translation: first level and second level. The IOMMU driver
can choose one from both for IOVA translation according to the use
case. This sets up the pasid entry if a domain is selected to use
the first-level page table for iova translation.
Signed-off-by: NLu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

ddf09b6d

iommu/vt-d: Add set domain DOMAIN_ATTR_NESTING attr · 2cd1311a

由 Lu Baolu 提交于 1月 02, 2020

This adds the Intel VT-d specific callback of setting
DOMAIN_ATTR_NESTING domain attribution. It is necessary
to let the VT-d driver know that the domain represents
a virtual machine which requires the IOMMU hardware to
support nested translation mode. Return success if the
IOMMU hardware suports nested mode, otherwise failure.
Signed-off-by: NYi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: NLu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

2cd1311a

iommu/vt-d: Identify domains using first level page table · a1948f2e

由 Lu Baolu 提交于 1月 02, 2020

This checks whether a domain should use the first level page
table for map/unmap and marks it in the domain structure.
Signed-off-by: NLu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

a1948f2e

iommu/vt-d: Loose requirement for flush queue initializaton · 8e3391cf

由 Lu Baolu 提交于 1月 02, 2020

Currently if flush queue initialization fails, we return error
or enforce the system-wide strict mode. These are unnecessary
because we always check the existence of a flush queue before
queuing any iova's for lazy flushing. Printing a informational
message is enough.
Signed-off-by: NLu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

8e3391cf

iommu/vt-d: Avoid iova flush queue in strict mode · 10f8008f

由 Lu Baolu 提交于 1月 02, 2020

If Intel IOMMU strict mode is enabled by users, it's unnecessary
to create the iova flush queue.
Signed-off-by: NLu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

10f8008f

iommu/vt-d: trace: Extend map_sg trace event · 984d03ad

由 Lu Baolu 提交于 1月 02, 2020

Current map_sg stores trace message in a coarse manner. This
extends it so that more detailed messages could be traced.

The map_sg trace message looks like:

map_sg: dev=0000:00:17.0 [1/9] dev_addr=0xf8f90000 phys_addr=0x158051000 size=4096
map_sg: dev=0000:00:17.0 [2/9] dev_addr=0xf8f91000 phys_addr=0x15a858000 size=4096
map_sg: dev=0000:00:17.0 [3/9] dev_addr=0xf8f92000 phys_addr=0x15aa13000 size=4096
map_sg: dev=0000:00:17.0 [4/9] dev_addr=0xf8f93000 phys_addr=0x1570f1000 size=8192
map_sg: dev=0000:00:17.0 [5/9] dev_addr=0xf8f95000 phys_addr=0x15c6d0000 size=4096
map_sg: dev=0000:00:17.0 [6/9] dev_addr=0xf8f96000 phys_addr=0x157194000 size=4096
map_sg: dev=0000:00:17.0 [7/9] dev_addr=0xf8f97000 phys_addr=0x169552000 size=4096
map_sg: dev=0000:00:17.0 [8/9] dev_addr=0xf8f98000 phys_addr=0x169dde000 size=4096
map_sg: dev=0000:00:17.0 [9/9] dev_addr=0xf8f99000 phys_addr=0x148351000 size=4096
Signed-off-by: NLu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

984d03ad

iommu/vt-d: Replace Intel specific PASID allocator with IOASID · 59a62337

由 Jacob Pan 提交于 1月 02, 2020

Make use of generic IOASID code to manage PASID allocation,
free, and lookup. Replace Intel specific code.
Signed-off-by: NJacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: NEric Auger <eric.auger@redhat.com>
Signed-off-by: NLu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

59a62337

iommu/vt-d: Fix CPU and IOMMU SVM feature matching checks · ff3dc652

由 Jacob Pan 提交于 1月 02, 2020

Shared Virtual Memory(SVM) is based on a collective set of hardware
features detected at runtime. There are requirements for matching CPU
and IOMMU capabilities.

The current code checks CPU and IOMMU feature set for SVM support but
the result is never stored nor used. Therefore, SVM can still be used
even when these checks failed. The consequences can be:
1. CPU uses 5-level paging mode for virtual address of 57 bits, but
IOMMU can only support 4-level paging mode with 48 bits address for DMA.
2. 1GB page size is used by CPU but IOMMU does not support it. VT-d
unrecoverable faults may be generated.

The best solution to fix these problems is to prevent them in the first
place.

This patch consolidates code for checking PASID, CPU vs. IOMMU paging
mode compatibility, as well as provides specific error messages for
each failed checks. On sane hardware configurations, these error message
shall never appear in kernel log.
Signed-off-by: NJacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: NEric Auger <eric.auger@redhat.com>
Signed-off-by: NLu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

ff3dc652

iommu/vt-d: Add Kconfig option to enable/disable scalable mode · 04618252

由 Lu Baolu 提交于 1月 02, 2020

This adds Kconfig option INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON
to make it easier for distributions to enable or disable the
Intel IOMMU scalable mode by default during kernel build.
Signed-off-by: NLu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

04618252

23 12月, 2019 2 次提交

iommu: intel: Use generic_iommu_put_resv_regions() · 0ecdebb7

由 Thierry Reding 提交于 12月 18, 2019

Use the new standard function instead of open-coding it.

Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: NThierry Reding <treding@nvidia.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

0ecdebb7

iommu/iova: Silence warnings under memory pressure · 944c9175

由 Qian Cai 提交于 11月 22, 2019

When running heavy memory pressure workloads, this 5+ old system is
throwing endless warnings below because disk IO is too slow to recover
from swapping. Since the volume from alloc_iova_fast() could be large,
once it calls printk(), it will trigger disk IO (writing to the log
files) and pending softirqs which could cause an infinite loop and make
no progress for days by the ongoimng memory reclaim. This is the counter
part for Intel where the AMD part has already been merged. See the
commit 3d708895 ("iommu/amd: Silence warnings under memory
pressure"). Since the allocation failure will be reported in
intel_alloc_iova(), so just call dev_err_once() there because even the
"ratelimited" is too much, and silence the one in alloc_iova_mem() to
avoid the expensive warn_alloc().

 hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
 hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
 hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
 hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
 hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
 hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
 hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
 hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
 slab_out_of_memory: 66 callbacks suppressed
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
   cache: iommu_iova, object size: 40, buffer size: 448, default order:
0, min order: 0
   node 0: slabs: 1822, objs: 16398, free: 0
   node 1: slabs: 2051, objs: 18459, free: 31
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
   cache: iommu_iova, object size: 40, buffer size: 448, default order:
0, min order: 0
   node 0: slabs: 1822, objs: 16398, free: 0
   node 1: slabs: 2051, objs: 18459, free: 31
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
   cache: iommu_iova, object size: 40, buffer size: 448, default order:
0, min order: 0
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
   cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
   cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
   cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
   cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
   node 0: slabs: 697, objs: 4182, free: 0
   node 0: slabs: 697, objs: 4182, free: 0
   node 0: slabs: 697, objs: 4182, free: 0
   node 0: slabs: 697, objs: 4182, free: 0
   node 1: slabs: 381, objs: 2286, free: 27
   node 1: slabs: 381, objs: 2286, free: 27
   node 1: slabs: 381, objs: 2286, free: 27
   node 1: slabs: 381, objs: 2286, free: 27
   node 0: slabs: 1822, objs: 16398, free: 0
   cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
   node 1: slabs: 2051, objs: 18459, free: 31
   node 0: slabs: 697, objs: 4182, free: 0
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
   node 1: slabs: 381, objs: 2286, free: 27
   cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
   node 0: slabs: 697, objs: 4182, free: 0
   node 1: slabs: 381, objs: 2286, free: 27
 hpsa 0000:03:00.0: DMAR: Allocating 1-page iova failed
 warn_alloc: 96 callbacks suppressed
 kworker/11:1H: page allocation failure: order:0,
mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0-1
 CPU: 11 PID: 1642 Comm: kworker/11:1H Tainted: G    B
 Hardware name: HP ProLiant XL420 Gen9/ProLiant XL420 Gen9, BIOS U19
12/27/2015
 Workqueue: kblockd blk_mq_run_work_fn
 Call Trace:
  dump_stack+0xa0/0xea
  warn_alloc.cold.94+0x8a/0x12d
  __alloc_pages_slowpath+0x1750/0x1870
  __alloc_pages_nodemask+0x58a/0x710
  alloc_pages_current+0x9c/0x110
  alloc_slab_page+0xc9/0x760
  allocate_slab+0x48f/0x5d0
  new_slab+0x46/0x70
  ___slab_alloc+0x4ab/0x7b0
  __slab_alloc+0x43/0x70
  kmem_cache_alloc+0x2dd/0x450
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
  alloc_iova+0x33/0x210
   cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
   node 0: slabs: 697, objs: 4182, free: 0
  alloc_iova_fast+0x62/0x3d1
   node 1: slabs: 381, objs: 2286, free: 27
  intel_alloc_iova+0xce/0xe0
  intel_map_sg+0xed/0x410
  scsi_dma_map+0xd7/0x160
  scsi_queue_rq+0xbf7/0x1310
  blk_mq_dispatch_rq_list+0x4d9/0xbc0
  blk_mq_sched_dispatch_requests+0x24a/0x300
  __blk_mq_run_hw_queue+0x156/0x230
  blk_mq_run_work_fn+0x3b/0x40
  process_one_work+0x579/0xb90
  worker_thread+0x63/0x5b0
  kthread+0x1e6/0x210
  ret_from_fork+0x3a/0x50
 Mem-Info:
 active_anon:2422723 inactive_anon:361971 isolated_anon:34403
  active_file:2285 inactive_file:1838 isolated_file:0
  unevictable:0 dirty:1 writeback:5 unstable:0
  slab_reclaimable:13972 slab_unreclaimable:453879
  mapped:2380 shmem:154 pagetables:6948 bounce:0
  free:19133 free_pcp:7363 free_cma:0
Signed-off-by: NQian Cai <cai@lca.pw>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

944c9175

17 12月, 2019 3 次提交

iommu/vt-d: Allocate reserved region for ISA with correct permission · cde9319e

由 Jerry Snitselaar 提交于 12月 12, 2019

Currently the reserved region for ISA is allocated with no
permissions. If a dma domain is being used, mapping this region will
fail. Set the permissions to DMA_PTE_READ|DMA_PTE_WRITE.

Cc: Joerg Roedel <jroedel@suse.de>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: stable@vger.kernel.org # v5.3+
Fixes: d850c2ee ("iommu/vt-d: Expose ISA direct mapping region via iommu_get_resv_regions")
Signed-off-by: NJerry Snitselaar <jsnitsel@redhat.com>
Acked-by: NLu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

cde9319e

iommu/vt-d: Fix dmar pte read access not set error · 75d18385

由 Lu Baolu 提交于 12月 11, 2019

If the default DMA domain of a group doesn't fit a device, it
will still sit in the group but use a private identity domain.
When map/unmap/iova_to_phys come through iommu API, the driver
should still serve them, otherwise, other devices in the same
group will be impacted. Since identity domain has been mapped
with the whole available memory space and RMRRs, we don't need
to worry about the impact on it.

Link: https://www.spinics.net/lists/iommu/msg40416.html
Cc: Jerry Snitselaar <jsnitsel@redhat.com>
Reported-by: NJerry Snitselaar <jsnitsel@redhat.com>
Fixes: 942067f1 ("iommu/vt-d: Identify default domains replaced with private")
Cc: stable@vger.kernel.org # v5.3+
Signed-off-by: NLu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: NJerry Snitselaar <jsnitsel@redhat.com>
Tested-by: NJerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

75d18385

iommu/vt-d: Set ISA bridge reserved region as relaxable · d8018a0e

由 Alex Williamson 提交于 12月 11, 2019

Commit d850c2ee ("iommu/vt-d: Expose ISA direct mapping region via
iommu_get_resv_regions") created a direct-mapped reserved memory region
in order to replace the static identity mapping of the ISA address
space, where the latter was then removed in commit df4f3c60
("iommu/vt-d: Remove static identity map code").  According to the
history of this code and the Kconfig option surrounding it, this direct
mapping exists for the benefit of legacy ISA drivers that are not
compatible with the DMA API.

In conjuntion with commit 9b77e5c7 ("vfio/type1: check dma map
request is within a valid iova range") this change introduced a
regression where the vfio IOMMU backend enforces reserved memory regions
per IOMMU group, preventing userspace from creating IOMMU mappings
conflicting with prescribed reserved regions.  A necessary prerequisite
for the vfio change was the introduction of "relaxable" direct mappings
introduced by commit adfd3738 ("iommu: Introduce
IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions").  These relaxable
direct mappings provide the same identity mapping support in the default
domain, but also indicate that the reservation is software imposed and
may be relaxed under some conditions, such as device assignment.

Convert the ISA bridge direct-mapped reserved region to relaxable to
reflect that the restriction is self imposed and need not be enforced
by drivers such as vfio.

Fixes: 1c5c59fb ("iommu/vt-d: Differentiate relaxable and non relaxable RMRRs")
Cc: stable@vger.kernel.org # v5.3+
Link: https://lore.kernel.org/linux-iommu/20191211082304.2d4fab45@x1.homeReported-by: Ncprt <cprt@protonmail.com>
Tested-by: Ncprt <cprt@protonmail.com>
Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
Acked-by: NLu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: NEric Auger <eric.auger@redhat.com>
Tested-by: NJerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: NJerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

d8018a0e

11 11月, 2019 2 次提交

iommu/vt-d: Turn off translations at shutdown · 6c3a44ed

由 Deepa Dinamani 提交于 11月 10, 2019

The intel-iommu driver assumes that the iommu state is
cleaned up at the start of the new kernel.
But, when we try to kexec boot something other than the
Linux kernel, the cleanup cannot be relied upon.
Hence, cleanup before we go down for reboot.

Keeping the cleanup at initialization also, in case BIOS
leaves the IOMMU enabled.

I considered turning off iommu only during kexec reboot, but a clean
shutdown seems always a good idea. But if someone wants to make it
conditional, such as VMM live update, we can do that.  There doesn't
seem to be such a condition at this time.

Tested that before, the info message
'DMAR: Translation was enabled for <iommu> but we are not in kdump mode'
would be reported for each iommu. The message will not appear when the
DMA-remapping is not enabled on entry to the kernel.
Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
Acked-by: NLu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

6c3a44ed

iommu/vt-d: Check VT-d RMRR region in BIOS is reported as reserved · f036c7fa

由 Yian Chen 提交于 10月 17, 2019

VT-d RMRR (Reserved Memory Region Reporting) regions are reserved
for device use only and should not be part of allocable memory pool of OS.

BIOS e820_table reports complete memory map to OS, including OS usable
memory ranges and BIOS reserved memory ranges etc.

x86 BIOS may not be trusted to include RMRR regions as reserved type
of memory in its e820 memory map, hence validate every RMRR entry
with the e820 memory map to make sure the RMRR regions will not be
used by OS for any other purposes.

ia64 EFI is working fine so implement RMRR validation as a dummy function
Reviewed-by: NLu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: NSohil Mehta <sohil.mehta@intel.com>
Signed-off-by: NYian Chen <yian.chen@intel.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

f036c7fa

30 10月, 2019 1 次提交

iommu/vt-d: Fix panic after kexec -p for kdump · 160c63f9

由 John Donnelly 提交于 10月 21, 2019

This cures a panic on restart after a kexec operation on 5.3 and 5.4
kernels.

The underlying state of the iommu registers (iommu->flags &
VTD_FLAG_TRANS_PRE_ENABLED) on a restart results in a domain being marked as
"DEFER_DEVICE_DOMAIN_INFO" that produces an Oops in identity_mapping().

[   43.654737] BUG: kernel NULL pointer dereference, address:
0000000000000056
[   43.655720] #PF: supervisor read access in kernel mode
[   43.655720] #PF: error_code(0x0000) - not-present page
[   43.655720] PGD 0 P4D 0
[   43.655720] Oops: 0000 [#1] SMP PTI
[   43.655720] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
5.3.2-1940.el8uek.x86_64 #1
[   43.655720] Hardware name: Oracle Corporation ORACLE SERVER
X5-2/ASM,MOTHERBOARD,1U, BIOS 30140300 09/20/2018
[   43.655720] RIP: 0010:iommu_need_mapping+0x29/0xd0
[   43.655720] Code: 00 0f 1f 44 00 00 48 8b 97 70 02 00 00 48 83 fa ff
74 53 48 8d 4a ff b8 01 00 00 00 48 83 f9 fd 76 01 c3 48 8b 35 7f 58 e0
01 <48> 39 72 58 75 f2 55 48 89 e5 41 54 53 48 8b 87 28 02 00 00 4c 8b
[   43.655720] RSP: 0018:ffffc9000001b9b0 EFLAGS: 00010246
[   43.655720] RAX: 0000000000000001 RBX: 0000000000001000 RCX:
fffffffffffffffd
[   43.655720] RDX: fffffffffffffffe RSI: ffff8880719b8000 RDI:
ffff8880477460b0
[   43.655720] RBP: ffffc9000001b9e8 R08: 0000000000000000 R09:
ffff888047c01700
[   43.655720] R10: 00002194036fc692 R11: 0000000000000000 R12:
0000000000000000
[   43.655720] R13: ffff8880477460b0 R14: 0000000000000cc0 R15:
ffff888072d2b558
[   43.655720] FS:  0000000000000000(0000) GS:ffff888071c00000(0000)
knlGS:0000000000000000
[   43.655720] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   43.655720] CR2: 0000000000000056 CR3: 000000007440a002 CR4:
00000000001606b0
[   43.655720] Call Trace:
[   43.655720]  ? intel_alloc_coherent+0x2a/0x180
[   43.655720]  ? __schedule+0x2c2/0x650
[   43.655720]  dma_alloc_attrs+0x8c/0xd0
[   43.655720]  dma_pool_alloc+0xdf/0x200
[   43.655720]  ehci_qh_alloc+0x58/0x130
[   43.655720]  ehci_setup+0x287/0x7ba
[   43.655720]  ? _dev_info+0x6c/0x83
[   43.655720]  ehci_pci_setup+0x91/0x436
[   43.655720]  usb_add_hcd.cold.48+0x1d4/0x754
[   43.655720]  usb_hcd_pci_probe+0x2bc/0x3f0
[   43.655720]  ehci_pci_probe+0x39/0x40
[   43.655720]  local_pci_probe+0x47/0x80
[   43.655720]  pci_device_probe+0xff/0x1b0
[   43.655720]  really_probe+0xf5/0x3a0
[   43.655720]  driver_probe_device+0xbb/0x100
[   43.655720]  device_driver_attach+0x58/0x60
[   43.655720]  __driver_attach+0x8f/0x150
[   43.655720]  ? device_driver_attach+0x60/0x60
[   43.655720]  bus_for_each_dev+0x74/0xb0
[   43.655720]  driver_attach+0x1e/0x20
[   43.655720]  bus_add_driver+0x151/0x1f0
[   43.655720]  ? ehci_hcd_init+0xb2/0xb2
[   43.655720]  ? do_early_param+0x95/0x95
[   43.655720]  driver_register+0x70/0xc0
[   43.655720]  ? ehci_hcd_init+0xb2/0xb2
[   43.655720]  __pci_register_driver+0x57/0x60
[   43.655720]  ehci_pci_init+0x6a/0x6c
[   43.655720]  do_one_initcall+0x4a/0x1fa
[   43.655720]  ? do_early_param+0x95/0x95
[   43.655720]  kernel_init_freeable+0x1bd/0x262
[   43.655720]  ? rest_init+0xb0/0xb0
[   43.655720]  kernel_init+0xe/0x110
[   43.655720]  ret_from_fork+0x24/0x50

Fixes: 8af46c78 ("iommu/vt-d: Implement is_attach_deferred iommu ops entry")
Cc: stable@vger.kernel.org # v5.3+
Signed-off-by: NJohn Donnelly <john.p.donnelly@oracle.com>
Reviewed-by: NLu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

160c63f9

18 10月, 2019 1 次提交

iommu/vt-d: Return the correct dma mask when we are bypassing the IOMMU · 9c24eaf8

由 Arvind Sankar 提交于 10月 08, 2019

We must return a mask covering the full physical RAM when bypassing the
IOMMU mapping. Also, in iommu_need_mapping, we need to check using
dma_direct_get_required_mask to ensure that the device's dma_mask can
cover physical RAM before deciding to bypass IOMMU mapping.

Based on an earlier patch from Christoph Hellwig.

Fixes: 249baa54 ("dma-mapping: provide a better default ->get_required_mask")
Signed-off-by: NArvind Sankar <nivedita@alum.mit.edu>
Reviewed-by: NLu Baolu <baolu.lu@linux.intel.com>
Acked-by: NJoerg Roedel <jroedel@suse.de>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

9c24eaf8

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功