提交 · 3906b91844d603153c094f636205ec9aa5454b2f · openanolis / cloud-kernel

08 12月, 2016 1 次提交

PCI: vmd: Use SRCU as a local RCU to prevent delaying global RCU · 3906b918

由 Jon Derrick 提交于 11月 11, 2016

SRCU lets synchronize_srcu() depend on VMD-local RCU primitives, preventing
long delays from locking up RCU in other systems.  VMD performs a
synchronize when removing a device, but will hit all IRQ lists if the
device uses all VMD vectors.  This patch will not help VMD's RCU
synchronization, but will isolate the read side delays to the VMD
subsystem.  Additionally, the use of SRCU in VMD's ISR will keep it
isolated from any other RCU waiters in the rest of the system.

Tested using concurrent FIO and NVMe resets:

  [global]
  rw=read
  bs=4k
  direct=1
  ioengine=libaio
  iodepth=32
  norandommap
  timeout=300
  runtime=1000000000

  [nvme0]
  cpus_allowed=0-63
  numjobs=8
  filename=/dev/nvme0n1

  [nvme1]
  cpus_allowed=0-63
  numjobs=8
  filename=/dev/nvme1n1

  while (true) do
    for i in /sys/class/nvme/nvme*; do
      echo "Resetting ${i##*/}"
      echo 1 > $i/reset_controller;
      sleep 5
    done;
  done
Signed-off-by: NJon Derrick <jonathan.derrick@intel.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Reviewed-by: NKeith Busch <keith.busch@intel.com>

3906b918

12 11月, 2016 1 次提交

PCI: vmd: Remove unnecessary pci_set_drvdata() · 5b23e8fa

由 Wei Yongjun 提交于 10月 17, 2016

The driver core clears the driver data to NULL after device_release or on
probe failure.  Thus, it is not needed to manually clear the device driver
data to NULL.
Signed-off-by: NWei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Reviewed-by: NKeith Busch <keith.busch@intel.com>

5b23e8fa

05 10月, 2016 1 次提交

x86/PCI: VMD: Move VMD driver to drivers/pci/host · 181ffd19

由 Keith Busch 提交于 10月 04, 2016

Move the driver source and Kconfig to the PCI host bridge drivers directory
and move the config option to a more appropriate sub-menu instead of
occupying the top-level location.

Update the Kconfig option with the X86_64 dependency that was implicitly
included from the previous location, and add information about the module
name when built as a loadable module.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
CC: Jon Derrick <jonathan.derrick@intel.com>

181ffd19

23 9月, 2016 1 次提交

x86/PCI: VMD: Request userspace control of PCIe hotplug indicators · 3161832d

由 Keith Busch 提交于 9月 13, 2016

Add set_dev_domain_options() to set PCI domain-specific options as devices
are added.  The first usage is to request exclusive userspace control of
PCIe hotplug indicators in VMD domains.

Devices in a VMD domain use PCIe hotplug Attention and Power Indicators in
a non-standard way; tell pciehp to ignore the indicators so userspace can
control them via the sysfs "attention" file.

To determine whether a bus is within a VMD domain, add a bool to the
pci_sysdata structure that the VMD driver sets during initialization.

[bhelgaas: changelog]
Requested-by: NKapil Karkra <kapil.karkra@intel.com>
Tested-by: NArtur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>

3161832d

20 9月, 2016 5 次提交

x86/PCI: VMD: Synchronize with RCU freeing MSI IRQ descs · ee6ee49f

由 Keith Busch 提交于 8月 04, 2016

Fix a potential race when disabling MSI/MSI-X on a VMD domain device. If
the VMD interrupt service is running, it may see a disabled IRQ. We can
synchronize RCU just before freeing the MSI descriptor. This is safe since
the irq_desc lock isn't held, and the descriptor is valid even though it is
disabled. After vmd_msi_free(), though, the handler is reinitialized to
handle_bad_irq(), so we can't let the VMD ISR's list iteration see the
disabled IRQ after this.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Acked-by Jon Derrick: <jonathan.derrick@intel.com>

ee6ee49f

x86/PCI: VMD: Eliminate index member from IRQ list · b3182227

由 Jon Derrick 提交于 9月 02, 2016

Use math to discover the IRQ list index number relative to the IRQ list
head.
Signed-off-by: NJon Derrick <jonathan.derrick@intel.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Acked-by: NKeith Busch <keith.busch@intel.com>

b3182227

x86/PCI: VMD: Eliminate vmd_vector member from list type · 53db86ad

由 Jon Derrick 提交于 9月 02, 2016

Eliminate unused vmd and vector members from vmd_irq_list and discover the
vector using pci_irq_vector().
Signed-off-by: NJon Derrick <jonathan.derrick@intel.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Acked-by: NKeith Busch <keith.busch@intel.com>

53db86ad

x86/PCI: VMD: Convert to use pci_alloc_irq_vectors() API · 75de9b4c

由 Jon Derrick 提交于 8月 29, 2016

Convert to use the pci_alloc_irq_vectors() API.
Signed-off-by: NJon Derrick <jonathan.derrick@intel.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Reviewed-by: NKeith Busch <keith.busch@intel.com>

75de9b4c

x86/PCI: VMD: Allocate IRQ lists with correct MSI-X count · c68db515

由 Jon Derrick 提交于 8月 29, 2016

To reduce the amount of memory required for IRQ lists, only allocate their
space after calling pci_msix_enable_range() which may reduce the number of
MSI-X vectors allocated.
Signed-off-by: NJon Derrick <jonathan.derrick@intel.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Reviewed-by: NKeith Busch <keith.busch@intel.com>

c68db515

24 8月, 2016 1 次提交

x86/PCI: VMD: Fix infinite loop executing irq's · 21c80c9f

由 Keith Busch 提交于 8月 23, 2016

We can't initialize the list head on deletion as this causes the node to
point to itself, which causes an infinite loop if vmd_irq() happens to be
servicing that node.

The list initialization was trying to fix a bug from multiple calls to
disable the same IRQ.  Fix this instead by having the VMD driver track if
the interrupt is enabled.

[bhelgaas: changelog, add "Fixes"]
Fixes: 97e92306 ("x86/PCI: VMD: Initialize list item in IRQ disable")
Reported-by: NGrzegorz Koczot <grzegorz.koczot@intel.com>
Tested-by: NMiroslaw Drost <miroslaw.drost@intel.com>
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Acked-by Jon Derrick: <jonathan.derrick@intel.com>

21c80c9f

04 8月, 2016 1 次提交

dma-mapping: use unsigned long for dma_attrs · 00085f1e

由 Krzysztof Kozlowski 提交于 8月 03, 2016

The dma-mapping core and the implementations do not change the DMA
attributes passed by pointer.  Thus the pointer can point to const data.
However the attributes do not have to be a bitfield.  Instead unsigned
long will do fine:

1. This is just simpler.  Both in terms of reading the code and setting
   attributes.  Instead of initializing local attributes on the stack
   and passing pointer to it to dma_set_attr(), just set the bits.

2. It brings safeness and checking for const correctness because the
   attributes are passed by value.

Semantic patches for this change (at least most of them):

    virtual patch
    virtual context

    @r@
    identifier f, attrs;

    @@
    f(...,
    - struct dma_attrs *attrs
    + unsigned long attrs
    , ...)
    {
    ...
    }

    @@
    identifier r.f;
    @@
    f(...,
    - NULL
    + 0
     )

and

    // Options: --all-includes
    virtual patch
    virtual context

    @r@
    identifier f, attrs;
    type t;

    @@
    t f(..., struct dma_attrs *attrs);

    @@
    identifier r.f;
    @@
    f(...,
    - NULL
    + 0
     )

Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.comSigned-off-by: NKrzysztof Kozlowski <k.kozlowski@samsung.com>
Acked-by: NVineet Gupta <vgupta@synopsys.com>
Acked-by: NRobin Murphy <robin.murphy@arm.com>
Acked-by: NHans-Christian Noren Egtvedt <egtvedt@samfundet.no>
Acked-by: Mark Salter <msalter@redhat.com> [c6x]
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris]
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm]
Reviewed-by: NBart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp]
Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core]
Acked-by: David Vrabel <david.vrabel@citrix.com> [xen]
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb]
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon]
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
Acked-by: NBjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32]
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc]
Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

00085f1e

21 6月, 2016 3 次提交

x86/PCI: VMD: Separate MSI and MSI-X vector sharing · 9c205304

由 Keith Busch 提交于 6月 20, 2016

Child devices in a VMD domain that want to use MSI are slowing down MSI-X
using devices sharing the same vectors.  Move all MSI usage to a single VMD
vector, and MSI-X devices can share the rest.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Acked-by: NJon Derrick <jonathan.derrick@intel.com>

9c205304

x86/PCI: VMD: Use x86_vector_domain as parent domain · e382dffc

由 Keith Busch 提交于 6月 20, 2016

Otherwise APIC code assumes VMD's IRQ domain can be managed by the APIC,
resulting in an invalid cast of irq_data during irq_force_complete_move().
Signed-off-by: NJon Derrick <jonathan.derrick@intel.com>
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>

e382dffc

x86/PCI: VMD: Use lock save/restore in interrupt enable path · 3f57ff4f

由 Jon Derrick 提交于 6月 20, 2016

Enabling interrupts may result in an interrupt raised and serviced while
VMD holds a lock, resulting in contention with the spin lock held while
enabling interrupts.

The solution is to disable preemption and save/restore the state during
interrupt enable and disable.

Fixes lockdep:

  ======================================================
  [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
  4.6.0-2016-06-16-lockdep+ #47 Tainted: G            E
  ------------------------------------------------------
  kworker/0:1/447 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
   (list_lock){+.+...}, at: [<ffffffffa04eb8fc>] vmd_irq_enable+0x3c/0x70 [vmd]

  and this task is already holding:
   (&irq_desc_lock_class){-.-...}, at: [<ffffffff810e1ff6>] __setup_irq+0xa6/0x610
  which would create a new lock dependency:
   (&irq_desc_lock_class){-.-...} -> (list_lock){+.+...}

  but this new dependency connects a HARDIRQ-irq-safe lock:
   (&irq_desc_lock_class){-.-...}
  ... which became HARDIRQ-irq-safe at:
    [<ffffffff810c9f21>] __lock_acquire+0x981/0xe00
    [<ffffffff810cb039>] lock_acquire+0x119/0x220
    [<ffffffff8167294d>] _raw_spin_lock+0x3d/0x80
    [<ffffffff810e36d4>] handle_level_irq+0x24/0x110
    [<ffffffff8101f20a>] handle_irq+0x1a/0x30
    [<ffffffff81675fc1>] do_IRQ+0x61/0x120
    [<ffffffff8167404c>] ret_from_intr+0x0/0x20
    [<ffffffff81672e30>] _raw_spin_unlock_irqrestore+0x40/0x60
    [<ffffffff810e21ee>] __setup_irq+0x29e/0x610
    [<ffffffff810e25a1>] setup_irq+0x41/0x90
    [<ffffffff81f5777f>] setup_default_timer_irq+0x1e/0x20
    [<ffffffff81f57798>] hpet_time_init+0x17/0x19
    [<ffffffff81f5775a>] x86_late_time_init+0xa/0x11
    [<ffffffff81f51e9b>] start_kernel+0x382/0x436
    [<ffffffff81f51308>] x86_64_start_reservations+0x2a/0x2c
    [<ffffffff81f51445>] x86_64_start_kernel+0x13b/0x14a

  to a HARDIRQ-irq-unsafe lock:
   (list_lock){+.+...}
  ... which became HARDIRQ-irq-unsafe at:
  ...  [<ffffffff810c9d8e>] __lock_acquire+0x7ee/0xe00
    [<ffffffff810cb039>] lock_acquire+0x119/0x220
    [<ffffffff8167294d>] _raw_spin_lock+0x3d/0x80
    [<ffffffffa04eba42>] vmd_msi_init+0x72/0x150 [vmd]
    [<ffffffff810e8597>] msi_domain_alloc+0xb7/0x140
    [<ffffffff810e6b10>] irq_domain_alloc_irqs_recursive+0x40/0xa0
    [<ffffffff810e6cea>] __irq_domain_alloc_irqs+0x14a/0x330
    [<ffffffff810e8a8c>] msi_domain_alloc_irqs+0x8c/0x1d0
    [<ffffffff813ca4e3>] pci_msi_setup_msi_irqs+0x43/0x70
    [<ffffffff813cada1>] pci_enable_msi_range+0x131/0x280
    [<ffffffff813bf5e0>] pcie_port_device_register+0x320/0x4e0
    [<ffffffff813bf9a4>] pcie_portdrv_probe+0x34/0x60
    [<ffffffff813b0e85>] local_pci_probe+0x45/0xa0
    [<ffffffff813b226b>] pci_device_probe+0xdb/0x130
    [<ffffffff8149e3cc>] driver_probe_device+0x22c/0x440
    [<ffffffff8149e774>] __device_attach_driver+0x94/0x110
    [<ffffffff8149bfad>] bus_for_each_drv+0x5d/0x90
    [<ffffffff8149e030>] __device_attach+0xc0/0x140
    [<ffffffff8149e0c0>] device_attach+0x10/0x20
    [<ffffffff813a77f7>] pci_bus_add_device+0x47/0x90
    [<ffffffff813a7879>] pci_bus_add_devices+0x39/0x70
    [<ffffffff813aaba7>] pci_rescan_bus+0x27/0x30
    [<ffffffffa04ec1af>] vmd_probe+0x68f/0x76c [vmd]
    [<ffffffff813b0e85>] local_pci_probe+0x45/0xa0
    [<ffffffff81088064>] work_for_cpu_fn+0x14/0x20
    [<ffffffff8108c244>] process_one_work+0x1f4/0x740
    [<ffffffff8108c9c6>] worker_thread+0x236/0x4f0
    [<ffffffff810935c2>] kthread+0xf2/0x110
    [<ffffffff816738f2>] ret_from_fork+0x22/0x50

  other info that might help us debug this:

   Possible interrupt unsafe locking scenario:

	 CPU0                    CPU1
	 ----                    ----
    lock(list_lock);
				 local_irq_disable();
				 lock(&irq_desc_lock_class);
				 lock(list_lock);
    <Interrupt>
      lock(&irq_desc_lock_class);

   *** DEADLOCK ***
Signed-off-by: NJon Derrick <jonathan.derrick@intel.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Acked-by: NKeith Busch <keith.busch@intel.com>

3f57ff4f

18 6月, 2016 1 次提交

x86/PCI/VMD: Use untracked irq handler · 30ce0350

由 Keith Busch 提交于 6月 17, 2016

There is no way to know which device in a VMD triggered an interrupt
without invoking every registered driver's actions. This uses the
untracked irq handler so that a less used device does not trigger
spurious interrupt.

We have been previously recommending users to enable "noirqdebug", but do
not want to force a system setting just to keep this domain functional.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Acked-by: NBjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Cc: Jon Derrick <jonathan.derrick@intel.com>
Link: http://lkml.kernel.org/r/1466200821-29159-2-git-send-email-keith.busch@intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

30ce0350

14 6月, 2016 2 次提交

x86/PCI: VMD: Initialize list item in IRQ disable · 97e92306

由 Keith Busch 提交于 5月 17, 2016

Multiple calls to disable an IRQ would have caused the driver to
dereference a poisoned list item.  This re-initializes the list to allow
multiple requests to disable the IRQ.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Acked-by Jon Derrick: <jonathan.derrick@intel.com>

97e92306

x86/PCI: VMD: Select device dma ops to override · ca8a8fab

由 Keith Busch 提交于 5月 17, 2016

VMD device doesn't usually have device archdata specific dma_ops, so we
need to override the default ops for VMD devices.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Acked-by Jon Derrick: <jonathan.derrick@intel.com>

ca8a8fab

11 3月, 2016 3 次提交

x86/PCI: VMD: Attach VMD resources to parent domain's resource tree · 2c2c5c5c

由 Jon Derrick 提交于 2月 24, 2016

Attach the new VMD domain's resources to the VMD device's resources.  This
allows /proc/iomem to display a more complete picture.

Before:
  c0000000-c1ffffff : 0000:5d:05.5
  c2000000-c3ffffff : 0000:5d:05.5
    c2010000-c2013fff : nvme
  c4000000-c40fffff : 0000:5d:05.5

After:
  c0000000-c1ffffff : 0000:5d:05.5
  c2000000-c3ffffff : 0000:5d:05.5
    c2000000-c3ffffff : VMD MEMBAR1
      c2000000-c22fffff : PCI Bus 10000:01
        c2000000-c200ffff : 10000:01:00.0
        c2010000-c2013fff : 10000:01:00.0
          c2010000-c2013fff : nvme
      c2300000-c24fffff : PCI Bus 10000:01
  c4000000-c40fffff : 0000:5d:05.5
    c4002000-c40fffff : VMD MEMBAR2
Signed-off-by: NJon Derrick <jonathan.derrick@intel.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Reviewed-by: NKeith Busch <keith.busch@intel.com>

2c2c5c5c

x86/PCI: VMD: Set bus resource start to 0 · d068c350

由 Keith Busch 提交于 3月 02, 2016

The bus always starts at 0.  Due to alignment and down-casting, this
happened to work before, but looked alarmingly incorrect in kernel logs.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>

d068c350

x86/PCI: VMD: Document code for maintainability · 83cc54a6

由 Keith Busch 提交于 3月 02, 2016

Comment the less obvious portion of the code for setting up memory windows,
and the platform dependency for initializing the h/w with appropriate
resources.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>

83cc54a6

16 1月, 2016 1 次提交

x86/PCI: Add driver for Intel Volume Management Device (VMD) · 185a383a

由 Keith Busch 提交于 1月 12, 2016

The Intel Volume Management Device (VMD) is a Root Complex Integrated
Endpoint that acts as a host bridge to a secondary PCIe domain. BIOS can
reassign one or more Root Ports to appear within a VMD domain instead of
the primary domain. The immediate benefit is that additional PCIe domains
allow more than 256 buses in a system by letting bus numbers be reused
across different domains.

VMD domains do not define ACPI _SEG, so to avoid domain clashing with host
bridges defining this segment, VMD domains start at 0x10000, which is
greater than the highest possible 16-bit ACPI defined _SEG.

This driver enumerates and enables the domain using the root bus
configuration interface provided by the PCI subsystem. The driver provides
configuration space accessor functions (pci_ops), bus and memory resources,
an MSI IRQ domain with irq_chip implementation, and DMA operations
necessary to use devices through the VMD endpoint's interface.

VMD routes I/O as follows:

1) Configuration Space: BAR 0 ("CFGBAR") of VMD provides the base
address and size for configuration space register access to VMD-owned
root ports. It works similarly to MMCONFIG for extended configuration
space. Bus numbering is independent and does not conflict with the
primary domain.

2) MMIO Space: BARs 2 and 4 ("MEMBAR1" and "MEMBAR2") of VMD provide the
base address, size, and type for MMIO register access. These addresses
are not translated by VMD hardware; they are simply reservations to be
distributed to root ports' memory base/limit registers and subdivided
among devices downstream.

3) DMA: To interact appropriately with an IOMMU, the source ID DMA read
and write requests are translated to the bus-device-function of the VMD
endpoint. Otherwise, DMA operates normally without VMD-specific address
translation.

4) Interrupts: Part of VMD's BAR 4 is reserved for VMD's MSI-X Table and
PBA. MSIs from VMD domain devices and ports are remapped to appear as
if they were issued using one of VMD's MSI-X table entries. Each MSI
and MSI-X address of VMD-owned devices and ports has a special format
where the address refers to specific entries in the VMD's MSI-X table.
As with DMA, the interrupt source ID is translated to VMD's
bus-device-function.

The driver provides its own MSI and MSI-X configuration functions
specific to how MSI messages are used within the VMD domain, and
provides an irq_chip for independent IRQ allocation to relay interrupts
from VMD's interrupt handler to the appropriate device driver's handler.

5) Errors: PCIe error message are intercepted by the root ports normally
(e.g., AER), except with VMD, system errors (i.e., firmware first) are
disabled by default. AER and hotplug interrupts are translated in the
same way as endpoint interrupts.

6) VMD does not support INTx interrupts or IO ports. Devices or drivers
requiring these features should either not be placed below VMD-owned
root ports, or VMD should be disabled by BIOS for such endpoints.

[bhelgaas: add VMD BAR #defines, factor out vmd_cfg_addr(), rework VMD
resource setup, whitespace, changelog]
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de> (IRQ-related parts)

185a383a

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功