1. 08 Dec, 2016 1 commit
    • J
      PCI: vmd: Use SRCU as a local RCU to prevent delaying global RCU · 3906b918
      Jon Derrick committed
      SRCU lets synchronize_srcu() depend on VMD-local RCU primitives, preventing
      long delays from locking up RCU in other systems.  VMD performs a
      synchronize when removing a device, but will hit all IRQ lists if the
      device uses all VMD vectors.  This patch will not help VMD's RCU
      synchronization, but will isolate the read side delays to the VMD
      subsystem.  Additionally, the use of SRCU in VMD's ISR will keep it
      isolated from any other RCU waiters in the rest of the system.
      
      Tested using concurrent FIO and NVMe resets:
      
        [global]
        rw=read
        bs=4k
        direct=1
        ioengine=libaio
        iodepth=32
        norandommap
        timeout=300
        runtime=1000000000
      
        [nvme0]
        cpus_allowed=0-63
        numjobs=8
        filename=/dev/nvme0n1
      
        [nvme1]
        cpus_allowed=0-63
        numjobs=8
        filename=/dev/nvme1n1
      
        while (true) do
          for i in /sys/class/nvme/nvme*; do
            echo "Resetting ${i##*/}"
            echo 1 > $i/reset_controller;
            sleep 5
          done;
        done
      Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
  2. 12 Nov, 2016 1 commit
  3. 05 Oct, 2016 1 commit
  4. 23 Sep, 2016 1 commit
  5. 20 Sep, 2016 5 commits
  6. 24 Aug, 2016 1 commit
  7. 04 Aug, 2016 1 commit
    • K
      dma-mapping: use unsigned long for dma_attrs · 00085f1e
      Krzysztof Kozlowski committed
      The dma-mapping core and the implementations do not change the DMA
      attributes passed by pointer.  Thus the pointer can point to const data.
      However the attributes do not have to be a bitfield.  Instead unsigned
      long will do fine:
      
      1. This is just simpler.  Both in terms of reading the code and setting
         attributes.  Instead of initializing local attributes on the stack
         and passing pointer to it to dma_set_attr(), just set the bits.
      
      2. It brings safety and const-correctness checking because the
         attributes are passed by value.
      
      Semantic patches for this change (at least most of them):
      
          virtual patch
          virtual context
      
          @r@
          identifier f, attrs;
      
          @@
          f(...,
          - struct dma_attrs *attrs
          + unsigned long attrs
          , ...)
          {
          ...
          }
      
          @@
          identifier r.f;
          @@
          f(...,
          - NULL
          + 0
           )
      
      and
      
          // Options: --all-includes
          virtual patch
          virtual context
      
          @r@
          identifier f, attrs;
          type t;
      
          @@
          t f(..., struct dma_attrs *attrs);
      
          @@
          identifier r.f;
          @@
          f(...,
          - NULL
          + 0
           )
      
      Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com
      Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Acked-by: Vineet Gupta <vgupta@synopsys.com>
      Acked-by: Robin Murphy <robin.murphy@arm.com>
      Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
      Acked-by: Mark Salter <msalter@redhat.com> [c6x]
      Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris]
      Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm]
      Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
      Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp]
      Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core]
      Acked-by: David Vrabel <david.vrabel@citrix.com> [xen]
      Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb]
      Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
      Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon]
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
      Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
      Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32]
      Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc]
      Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
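As a rough illustration of the change described in the commit above, the sketch below passes attributes by value as an `unsigned long` bitmask rather than through a `struct dma_attrs *`. The `EX_ATTR_*` bits and the `ex_*` functions are hypothetical names invented for this example; they are not the kernel's definitions, and the code is plain userspace C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical attribute bits for illustration; not the kernel's real values. */
#define EX_ATTR_WRITE_COMBINE  (1UL << 0)
#define EX_ATTR_NO_KERNEL_MAP  (1UL << 1)
#define EX_ATTR_SKIP_CPU_SYNC  (1UL << 2)

/* Attributes arrive by value, so the callee cannot modify the caller's copy. */
static bool ex_wants_cpu_sync(unsigned long attrs)
{
	return !(attrs & EX_ATTR_SKIP_CPU_SYNC);
}

/*
 * Stand-in for a mapping routine that takes attributes as a plain bitmask.
 * Returns 1 if a CPU sync would be performed, 0 otherwise.
 */
static int ex_map_buffer(void *buf, size_t len, unsigned long attrs)
{
	assert(buf != NULL && len > 0);
	return ex_wants_cpu_sync(attrs) ? 1 : 0;
}
```

A caller with no attributes passes `0` where it previously passed `NULL`, and combines flags with `|`, for example `ex_map_buffer(buf, len, EX_ATTR_SKIP_CPU_SYNC | EX_ATTR_WRITE_COMBINE)`.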
  8. 21 Jun, 2016 3 commits
    • K
      x86/PCI: VMD: Separate MSI and MSI-X vector sharing · 9c205304
      Keith Busch committed
      Child devices in a VMD domain that want to use MSI are slowing down MSI-X
      using devices sharing the same vectors.  Move all MSI usage to a single VMD
      vector, and MSI-X devices can share the rest.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Acked-by: Jon Derrick <jonathan.derrick@intel.com>
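The vector split described in the commit above could be sketched roughly as follows. This is a userspace model, not the driver's code: `struct ex_vmd`, `ex_pick_vector()`, and the least-loaded selection policy for MSI-X children are assumptions made for illustration. The commit message itself only states that MSI moves to a single VMD vector and MSI-X devices share the rest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EX_MAX_VECTORS 64

/* Hypothetical bookkeeping: how many child interrupts use each VMD vector. */
struct ex_vmd {
	size_t vector_count;                 /* VMD MSI-X vectors available */
	unsigned int users[EX_MAX_VECTORS];  /* child interrupts per vector */
};

/* Pick the VMD vector that will relay a child device's interrupt. */
static size_t ex_pick_vector(struct ex_vmd *vmd, bool child_is_msix)
{
	size_t i, best = 1;

	assert(vmd->vector_count >= 1 && vmd->vector_count <= EX_MAX_VECTORS);

	/* MSI children, or a VMD with only one vector, all land on vector 0. */
	if (!child_is_msix || vmd->vector_count == 1) {
		vmd->users[0]++;
		return 0;
	}

	/* MSI-X children share vectors 1..n-1; assumed policy: least loaded. */
	for (i = 2; i < vmd->vector_count; i++)
		if (vmd->users[i] < vmd->users[best])
			best = i;

	vmd->users[best]++;
	return best;
}
```

Keeping vector 0 out of the MSI-X search is what prevents slow MSI users from sharing a handler list with MSI-X devices.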
    • K
      x86/PCI: VMD: Use x86_vector_domain as parent domain · e382dffc
      Keith Busch committed
      Otherwise APIC code assumes VMD's IRQ domain can be managed by the APIC,
      resulting in an invalid cast of irq_data during irq_force_complete_move().
      Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    • J
      x86/PCI: VMD: Use lock save/restore in interrupt enable path · 3f57ff4f
      Jon Derrick committed
      Enabling interrupts may result in an interrupt raised and serviced while
      VMD holds a lock, resulting in contention with the spin lock held while
      enabling interrupts.
      
      The solution is to disable preemption and save/restore the state during
      interrupt enable and disable.
      
      Fixes lockdep:
      
        ======================================================
        [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
        4.6.0-2016-06-16-lockdep+ #47 Tainted: G            E
        ------------------------------------------------------
        kworker/0:1/447 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
         (list_lock){+.+...}, at: [<ffffffffa04eb8fc>] vmd_irq_enable+0x3c/0x70 [vmd]
      
        and this task is already holding:
         (&irq_desc_lock_class){-.-...}, at: [<ffffffff810e1ff6>] __setup_irq+0xa6/0x610
        which would create a new lock dependency:
         (&irq_desc_lock_class){-.-...} -> (list_lock){+.+...}
      
        but this new dependency connects a HARDIRQ-irq-safe lock:
         (&irq_desc_lock_class){-.-...}
        ... which became HARDIRQ-irq-safe at:
          [<ffffffff810c9f21>] __lock_acquire+0x981/0xe00
          [<ffffffff810cb039>] lock_acquire+0x119/0x220
          [<ffffffff8167294d>] _raw_spin_lock+0x3d/0x80
          [<ffffffff810e36d4>] handle_level_irq+0x24/0x110
          [<ffffffff8101f20a>] handle_irq+0x1a/0x30
          [<ffffffff81675fc1>] do_IRQ+0x61/0x120
          [<ffffffff8167404c>] ret_from_intr+0x0/0x20
          [<ffffffff81672e30>] _raw_spin_unlock_irqrestore+0x40/0x60
          [<ffffffff810e21ee>] __setup_irq+0x29e/0x610
          [<ffffffff810e25a1>] setup_irq+0x41/0x90
          [<ffffffff81f5777f>] setup_default_timer_irq+0x1e/0x20
          [<ffffffff81f57798>] hpet_time_init+0x17/0x19
          [<ffffffff81f5775a>] x86_late_time_init+0xa/0x11
          [<ffffffff81f51e9b>] start_kernel+0x382/0x436
          [<ffffffff81f51308>] x86_64_start_reservations+0x2a/0x2c
          [<ffffffff81f51445>] x86_64_start_kernel+0x13b/0x14a
      
        to a HARDIRQ-irq-unsafe lock:
         (list_lock){+.+...}
        ... which became HARDIRQ-irq-unsafe at:
        ...  [<ffffffff810c9d8e>] __lock_acquire+0x7ee/0xe00
          [<ffffffff810cb039>] lock_acquire+0x119/0x220
          [<ffffffff8167294d>] _raw_spin_lock+0x3d/0x80
          [<ffffffffa04eba42>] vmd_msi_init+0x72/0x150 [vmd]
          [<ffffffff810e8597>] msi_domain_alloc+0xb7/0x140
          [<ffffffff810e6b10>] irq_domain_alloc_irqs_recursive+0x40/0xa0
          [<ffffffff810e6cea>] __irq_domain_alloc_irqs+0x14a/0x330
          [<ffffffff810e8a8c>] msi_domain_alloc_irqs+0x8c/0x1d0
          [<ffffffff813ca4e3>] pci_msi_setup_msi_irqs+0x43/0x70
          [<ffffffff813cada1>] pci_enable_msi_range+0x131/0x280
          [<ffffffff813bf5e0>] pcie_port_device_register+0x320/0x4e0
          [<ffffffff813bf9a4>] pcie_portdrv_probe+0x34/0x60
          [<ffffffff813b0e85>] local_pci_probe+0x45/0xa0
          [<ffffffff813b226b>] pci_device_probe+0xdb/0x130
          [<ffffffff8149e3cc>] driver_probe_device+0x22c/0x440
          [<ffffffff8149e774>] __device_attach_driver+0x94/0x110
          [<ffffffff8149bfad>] bus_for_each_drv+0x5d/0x90
          [<ffffffff8149e030>] __device_attach+0xc0/0x140
          [<ffffffff8149e0c0>] device_attach+0x10/0x20
          [<ffffffff813a77f7>] pci_bus_add_device+0x47/0x90
          [<ffffffff813a7879>] pci_bus_add_devices+0x39/0x70
          [<ffffffff813aaba7>] pci_rescan_bus+0x27/0x30
          [<ffffffffa04ec1af>] vmd_probe+0x68f/0x76c [vmd]
          [<ffffffff813b0e85>] local_pci_probe+0x45/0xa0
          [<ffffffff81088064>] work_for_cpu_fn+0x14/0x20
          [<ffffffff8108c244>] process_one_work+0x1f4/0x740
          [<ffffffff8108c9c6>] worker_thread+0x236/0x4f0
          [<ffffffff810935c2>] kthread+0xf2/0x110
          [<ffffffff816738f2>] ret_from_fork+0x22/0x50
      
        other info that might help us debug this:
      
         Possible interrupt unsafe locking scenario:
      
      	 CPU0                    CPU1
      	 ----                    ----
          lock(list_lock);
      				 local_irq_disable();
      				 lock(&irq_desc_lock_class);
      				 lock(list_lock);
          <Interrupt>
            lock(&irq_desc_lock_class);
      
         *** DEADLOCK ***
      Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Acked-by: Keith Busch <keith.busch@intel.com>
  9. 18 Jun, 2016 1 commit
  10. 14 Jun, 2016 2 commits
  11. 11 Mar, 2016 3 commits
  12. 16 Jan, 2016 1 commit
    • K
      x86/PCI: Add driver for Intel Volume Management Device (VMD) · 185a383a
      Keith Busch committed
      The Intel Volume Management Device (VMD) is a Root Complex Integrated
      Endpoint that acts as a host bridge to a secondary PCIe domain.  BIOS can
      reassign one or more Root Ports to appear within a VMD domain instead of
      the primary domain.  The immediate benefit is that additional PCIe domains
      allow more than 256 buses in a system by letting bus numbers be reused
      across different domains.
      
      VMD domains do not define ACPI _SEG, so to avoid domain clashing with host
      bridges defining this segment, VMD domains start at 0x10000, which is
      greater than the highest possible 16-bit ACPI defined _SEG.
      
      This driver enumerates and enables the domain using the root bus
      configuration interface provided by the PCI subsystem.  The driver provides
      configuration space accessor functions (pci_ops), bus and memory resources,
      an MSI IRQ domain with irq_chip implementation, and DMA operations
      necessary to use devices through the VMD endpoint's interface.
      
      VMD routes I/O as follows:
      
         1) Configuration Space: BAR 0 ("CFGBAR") of VMD provides the base
         address and size for configuration space register access to VMD-owned
         root ports.  It works similarly to MMCONFIG for extended configuration
         space.  Bus numbering is independent and does not conflict with the
         primary domain.
      
         2) MMIO Space: BARs 2 and 4 ("MEMBAR1" and "MEMBAR2") of VMD provide the
         base address, size, and type for MMIO register access.  These addresses
         are not translated by VMD hardware; they are simply reservations to be
         distributed to root ports' memory base/limit registers and subdivided
         among devices downstream.
      
         3) DMA: To interact appropriately with an IOMMU, the source ID of DMA read
         and write requests is translated to the bus-device-function of the VMD
         endpoint.  Otherwise, DMA operates normally without VMD-specific address
         translation.
      
         4) Interrupts: Part of VMD's BAR 4 is reserved for VMD's MSI-X Table and
         PBA.  MSIs from VMD domain devices and ports are remapped to appear as
         if they were issued using one of VMD's MSI-X table entries.  Each MSI
         and MSI-X address of VMD-owned devices and ports has a special format
         where the address refers to specific entries in the VMD's MSI-X table.
         As with DMA, the interrupt source ID is translated to VMD's
         bus-device-function.
      
         The driver provides its own MSI and MSI-X configuration functions
         specific to how MSI messages are used within the VMD domain, and
         provides an irq_chip for independent IRQ allocation to relay interrupts
         from VMD's interrupt handler to the appropriate device driver's handler.
      
         5) Errors: PCIe error messages are intercepted by the root ports normally
         (e.g., AER), except with VMD, system errors (i.e., firmware first) are
         disabled by default.  AER and hotplug interrupts are translated in the
         same way as endpoint interrupts.
      
         6) VMD does not support INTx interrupts or IO ports.  Devices or drivers
         requiring these features should either not be placed below VMD-owned
         root ports, or VMD should be disabled by BIOS for such endpoints.
      
      [bhelgaas: add VMD BAR #defines, factor out vmd_cfg_addr(), rework VMD
      resource setup, whitespace, changelog]
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de> (IRQ-related parts)
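Two points from the changelog above lend themselves to a small sketch: VMD domain numbers start at 0x10000, just past the 16-bit ACPI _SEG range, and CFGBAR is described as working similarly to MMCONFIG. The code below assumes the standard MMCONFIG (ECAM) layout of 1 MiB per bus and 4 KiB per function. That layout is an assumption here, since the changelog does not spell out the offsets, and the `ex_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* First domain number above the 16-bit ACPI _SEG range, per the changelog. */
#define EX_VMD_DOMAIN_BASE 0x10000u

/*
 * Byte offset of a config register within CFGBAR, assuming the standard
 * ECAM layout: bus in bits 27:20, device in 19:15, function in 14:12.
 */
static uint64_t ex_cfg_offset(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t reg)
{
	assert(dev < 32 && fn < 8 && reg < 4096);
	return ((uint64_t)bus << 20) | ((uint64_t)dev << 15) |
	       ((uint64_t)fn << 12) | reg;
}

/* A domain number at or above the base cannot collide with any ACPI _SEG. */
static int ex_domain_is_vmd(uint32_t domain)
{
	return domain >= EX_VMD_DOMAIN_BASE;
}
```

Because the bus number indexes only into the VMD's own CFGBAR, the same bus number can be reused in the primary domain without conflict.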