1. 17 2月, 2016 1 次提交
  2. 15 2月, 2016 1 次提交
  3. 11 1月, 2016 1 次提交
  4. 17 12月, 2015 1 次提交
    • A
      powerpc/powernv: Add support for Nvlink NPUs · 5d2aa710
      Alistair Popple 提交于
      NVLink is a high speed interconnect that is used in conjunction with a
      PCI-E connection to create an interface between CPU and GPU that
      provides very high data bandwidth. A PCI-E connection to a GPU is used
      as the control path to initiate and report status of large data
      transfers sent via the NVLink.
      
      On IBM Power systems the NVLink processing unit (NPU) is similar to
      the existing PHB3. This patch adds support for a new NPU PHB type. DMA
      operations on the NPU are not supported as this patch sets the TCE
      translation tables to be the same as the related GPU PCIe device for
      each NVLink. Therefore all DMA operations are setup and controlled via
      the PCIe device.
      
      EEH is not presently supported for the NPU devices, although it may be
      added in future.
      Signed-off-by: NAlistair Popple <alistair@popple.id.au>
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      5d2aa710
  5. 10 9月, 2015 1 次提交
    • P
      powerpc/MSI: Fix race condition in tearing down MSI interrupts · e297c939
      Paul Mackerras 提交于
      This fixes a race which can result in the same virtual IRQ number
      being assigned to two different MSI interrupts.  The most visible
      consequence of that is usually a warning and stack trace from the
      sysfs code about an attempt to create a duplicate entry in sysfs.
      
      The race happens when one CPU (say CPU 0) is disposing of an MSI
      while another CPU (say CPU 1) is setting up an MSI.  CPU 0 calls
      (for example) pnv_teardown_msi_irqs(), which calls
      msi_bitmap_free_hwirqs() to indicate that the MSI (i.e. its
      hardware IRQ number) is no longer in use.  Then, before CPU 0 gets
      to calling irq_dispose_mapping() to free up the virtal IRQ number,
      CPU 1 comes in and calls msi_bitmap_alloc_hwirqs() to allocate an
      MSI, and gets the same hardware IRQ number that CPU 0 just freed.
      CPU 1 then calls irq_create_mapping() to get a virtual IRQ number,
      which sees that there is currently a mapping for that hardware IRQ
      number and returns the corresponding virtual IRQ number (which is
      the same virtual IRQ number that CPU 0 was using).  CPU 0 then
      calls irq_dispose_mapping() and frees that virtual IRQ number.
      Now, if another CPU comes along and calls irq_create_mapping(), it
      is likely to get the virtual IRQ number that was just freed,
      resulting in the same virtual IRQ number apparently being used for
      two different hardware interrupts.
      
      To fix this race, we just move the call to msi_bitmap_free_hwirqs()
      to after the call to irq_dispose_mapping().  Since virq_to_hw()
      doesn't work for the virtual IRQ number after irq_dispose_mapping()
      has been called, we need to call it before irq_dispose_mapping() and
      remember the result for the msi_bitmap_free_hwirqs() call.
      
      The pattern of calling msi_bitmap_free_hwirqs() before
      irq_dispose_mapping() appears in 5 places under arch/powerpc, and
      appears to have originated in commit 05af7bd2 ("[POWERPC] MPIC
      U3/U4 MSI backend") from 2007.
      
      Fixes: 05af7bd2 ("[POWERPC] MPIC U3/U4 MSI backend")
      Cc: stable@vger.kernel.org # v2.6.22+
      Reported-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      e297c939
  6. 18 8月, 2015 1 次提交
    • A
      powerpc/powernv: move dma_get_required_mask from pnv_phb to pci_controller_ops · 53522982
      Andrew Donnellan 提交于
      Simplify the dma_get_required_mask call chain by moving it from pnv_phb to
      pci_controller_ops, similar to commit 763d2d8d ("powerpc/powernv:
      Move dma_set_mask from pnv_phb to pci_controller_ops").
      
      Previous call chain:
      
        0) call dma_get_required_mask() (kernel/dma.c)
        1) call ppc_md.dma_get_required_mask, if it exists. On powernv, that
           points to pnv_dma_get_required_mask() (platforms/powernv/setup.c)
        2) device is PCI, therefore call pnv_pci_dma_get_required_mask()
           (platforms/powernv/pci.c)
        3) call phb->dma_get_required_mask if it exists
        4) it only exists in the ioda case, where it points to
             pnv_pci_ioda_dma_get_required_mask() (platforms/powernv/pci-ioda.c)
      
      New call chain:
      
        0) call dma_get_required_mask() (kernel/dma.c)
        1) device is PCI, therefore call pci_controller_ops.dma_get_required_mask
           if it exists
        2) in the ioda case, that points to pnv_pci_ioda_dma_get_required_mask()
           (platforms/powernv/pci-ioda.c)
      
      In the p5ioc2 case, the call chain remains the same -
      dma_get_required_mask() does not find either a ppc_md call or
      pci_controller_ops call, so it calls __dma_get_required_mask().
      Signed-off-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Reviewed-by: NDaniel Axtens <dja@axtens.net>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      53522982
  7. 23 7月, 2015 1 次提交
    • J
      powerpc/PCI: Use for_pci_msi_entry() to access MSI device list · 2921d179
      Jiang Liu 提交于
      Use accessor for_each_pci_msi_entry() to access MSI device list, so we
      could easily move msi_list from struct pci_dev into struct device
      later.
      Signed-off-by: NJiang Liu <jiang.liu@linux.intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Stuart Yoder <stuart.yoder@freescale.com>
      Cc: Yijing Wang <wangyijing@huawei.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Olof Johansson <olof@lixom.net>
      Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: Daniel Axtens <dja@axtens.net>
      Cc: Wei Yang <weiyang@linux.vnet.ibm.com>
      Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Cc: Alexander Gordeev <agordeev@redhat.com>
      Cc: Scott Wood <scottwood@freescale.com>
      Cc: Laurentiu Tudor <Laurentiu.Tudor@freescale.com>
      Cc: Tudor Laurentiu <b10716@freescale.com>
      Cc: Hongtao Jia <hongtao.jia@freescale.com>
      Link: http://lkml.kernel.org/r/1436428847-8886-4-git-send-email-jiang.liu@linux.intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      2921d179
  8. 11 6月, 2015 7 次提交
    • A
      powerpc/powernv: Implement multilevel TCE tables · bbb845c4
      Alexey Kardashevskiy 提交于
      TCE tables might get too big in case of 4K IOMMU pages and DDW enabled
      on huge guests (hundreds of GB of RAM) so the kernel might be unable to
      allocate contiguous chunk of physical memory to store the TCE table.
      
      To address this, POWER8 CPU (actually, IODA2) supports multi-level
      TCE tables, up to 5 levels which splits the table into a tree of
      smaller subtables.
      
      This adds multi-level TCE tables support to
      pnv_pci_ioda2_table_alloc_pages() and pnv_pci_ioda2_table_free_pages()
      helpers.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      bbb845c4
    • A
      powerpc/iommu/powernv: Release replaced TCE · 05c6cfb9
      Alexey Kardashevskiy 提交于
      At the moment writing new TCE value to the IOMMU table fails with EBUSY
      if there is a valid entry already. However PAPR specification allows
      the guest to write new TCE value without clearing it first.
      
      Another problem this patch is addressing is the use of pool locks for
      external IOMMU users such as VFIO. The pool locks are to protect
      DMA page allocator rather than entries and since the host kernel does
      not control what pages are in use, there is no point in pool locks and
      exchange()+put_page(oldtce) is sufficient to avoid possible races.
      
      This adds an exchange() callback to iommu_table_ops which does the same
      thing as set() plus it returns replaced TCE and DMA direction so
      the caller can release the pages afterwards. The exchange() receives
      a physical address unlike set() which receives linear mapping address;
      and returns a physical address as the clear() does.
      
      This implements exchange() for P5IOC2/IODA/IODA2. This adds a requirement
      for a platform to have exchange() implemented in order to support VFIO.
      
      This replaces iommu_tce_build() and iommu_clear_tce() with
      a single iommu_tce_xchg().
      
      This makes sure that TCE permission bits are not set in TCE passed to
      IOMMU API as those are to be calculated by platform code from
      DMA direction.
      
      This moves SetPageDirty() to the IOMMU code to make it work for both
      VFIO ioctl interface in in-kernel TCE acceleration (when it becomes
      available later).
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      [aw: for the vfio related changes]
      Acked-by: NAlex Williamson <alex.williamson@redhat.com>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      05c6cfb9
    • A
      powerpc/powernv: Implement accessor to TCE entry · c5bb44ed
      Alexey Kardashevskiy 提交于
      This replaces direct accesses to TCE table with a helper which
      returns an TCE entry address. This does not make difference now but will
      when multi-level TCE tables get introduces.
      
      No change in behavior is expected.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      c5bb44ed
    • A
      powerpc/spapr: vfio: Switch from iommu_table to new iommu_table_group · 0eaf4def
      Alexey Kardashevskiy 提交于
      So far one TCE table could only be used by one IOMMU group. However
      IODA2 hardware allows programming the same TCE table address to
      multiple PE allowing sharing tables.
      
      This replaces a single pointer to a group in a iommu_table struct
      with a linked list of groups which provides the way of invalidating
      TCE cache for every PE when an actual TCE table is updated. This adds
      pnv_pci_link_table_and_group() and pnv_pci_unlink_table_and_group()
      helpers to manage the list. However without VFIO, it is still going
      to be a single IOMMU group per iommu_table.
      
      This changes iommu_add_device() to add a device to a first group
      from the group list of a table as it is only called from the platform
      init code or PCI bus notifier and at these moments there is only
      one group per table.
      
      This does not change TCE invalidation code to loop through all
      attached groups in order to simplify this patch and because
      it is not really needed in most cases. IODA2 is fixed in a later
      patch.
      
      This should cause no behavioural change.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      [aw: for the vfio related changes]
      Acked-by: NAlex Williamson <alex.williamson@redhat.com>
      Reviewed-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      0eaf4def
    • A
      powerpc/powernv/ioda/ioda2: Rework TCE invalidation in tce_build()/tce_free() · decbda25
      Alexey Kardashevskiy 提交于
      The pnv_pci_ioda_tce_invalidate() helper invalidates TCE cache. It is
      supposed to be called on IODA1/2 and not called on p5ioc2. It receives
      start and end host addresses of TCE table.
      
      IODA2 actually needs PCI addresses to invalidate the cache. Those
      can be calculated from host addresses but since we are going
      to implement multi-level TCE tables, calculating PCI address from
      a host address might get either tricky or ugly as TCE table remains flat
      on PCI bus but not in RAM.
      
      This moves pnv_pci_ioda_tce_invalidate() from generic pnv_tce_build/
      pnt_tce_free and defines IODA1/2-specific callbacks which call generic
      ones and do PHB-model-specific TCE cache invalidation. P5IOC2 keeps
      using generic callbacks as before.
      
      This changes pnv_pci_ioda2_tce_invalidate() to receives TCE index and
      number of pages which are PCI addresses shifted by IOMMU page shift.
      
      No change in behaviour is expected.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      decbda25
    • A
      powerpc/iommu: Move tce_xxx callbacks from ppc_md to iommu_table · da004c36
      Alexey Kardashevskiy 提交于
      This adds a iommu_table_ops struct and puts pointer to it into
      the iommu_table struct. This moves tce_build/tce_free/tce_get/tce_flush
      callbacks from ppc_md to the new struct where they really belong to.
      
      This adds the requirement for @it_ops to be initialized before calling
      iommu_init_table() to make sure that we do not leave any IOMMU table
      with iommu_table_ops uninitialized. This is not a parameter of
      iommu_init_table() though as there will be cases when iommu_init_table()
      will not be called on TCE tables, for example - VFIO.
      
      This does s/tce_build/set/, s/tce_free/clear/ and removes "tce_"
      redundant prefixes.
      
      This removes tce_xxx_rm handlers from ppc_md but does not add
      them to iommu_table_ops as this will be done later if we decide to
      support TCE hypercalls in real mode. This removes _vm callbacks as
      only virtual mode is supported by now so this also removes @rm parameter.
      
      For pSeries, this always uses tce_buildmulti_pSeriesLP/
      tce_buildmulti_pSeriesLP. This changes multi callback to fall back to
      tce_build_pSeriesLP/tce_free_pSeriesLP if FW_FEATURE_MULTITCE is not
      present. The reason for this is we still have to support "multitce=off"
      boot parameter in disable_multitce() and we do not want to walk through
      all IOMMU tables in the system and replace "multi" callbacks with single
      ones.
      
      For powernv, this defines _ops per PHB type which are P5IOC2/IODA1/IODA2.
      This makes the callbacks for them public. Later patches will extend
      callbacks for IODA1/2.
      
      No change in behaviour is expected.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      da004c36
    • A
      powerpc/powernv: Do not set "read" flag if direction==DMA_NONE · 10b35b2b
      Alexey Kardashevskiy 提交于
      Normally a bitmap from the iommu_table is used to track what TCE entry
      is in use. Since we are going to use iommu_table without its locks and
      do xchg() instead, it becomes essential not to put bits which are not
      implied in the direction flag as the old TCE value (more precisely -
      the permission bits) will be used to decide whether to put the page or not.
      
      This adds iommu_direction_to_tce_perm() (its counterpart is there already)
      and uses it for powernv's pnv_tce_build().
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      10b35b2b
  9. 03 6月, 2015 1 次提交
  10. 02 6月, 2015 2 次提交
    • D
      powerpc/powernv: Move dma_set_mask() from pnv_phb to pci_controller_ops · 763d2d8d
      Daniel Axtens 提交于
      Previously, dma_set_mask() on powernv was convoluted:
       0) Call dma_set_mask() (a/p/kernel/dma.c)
       1) In dma_set_mask(), ppc_md.dma_set_mask() exists, so call it.
       2) On powernv, that function pointer is pnv_dma_set_mask().
          In pnv_dma_set_mask(), the device is pci, so call pnv_pci_dma_set_mask().
       3) In pnv_pci_dma_set_mask(), call pnv_phb->set_dma_mask() if it exists.
       4) It only exists in the ioda case, where it points to
          pnv_pci_ioda_dma_set_mask(), which is the final function.
      
      So the call chain is:
       dma_set_mask() ->
        pnv_dma_set_mask() ->
         pnv_pci_dma_set_mask() ->
          pnv_pci_ioda_dma_set_mask()
      
      Both ppc_md and pnv_phb function pointers are used.
      
      Rip out the ppc_md call, pnv_dma_set_mask() and pnv_pci_dma_set_mask().
      
      Instead:
       0) Call dma_set_mask() (a/p/kernel/dma.c)
       1) In dma_set_mask(), the device is pci, and pci_controller_ops.dma_set_mask()
          exists, so call pci_controller_ops.dma_set_mask()
       2) In the ioda case, that points to pnv_pci_ioda_dma_set_mask().
      
      The new call chain is
       dma_set_mask() ->
        pnv_pci_ioda_dma_set_mask()
      
      Now only the pci_controller_ops function pointer is used.
      
      The fallback paths for p5ioc2 are the same.
      
      Previously, pnv_pci_dma_set_mask() would find no pnv_phb->set_dma_mask()
      function, to it would call __set_dma_mask().
      
      Now, dma_set_mask() finds no ppc_md call or pci_controller_ops call,
      so it calls __set_dma_mask().
      Signed-off-by: NDaniel Axtens <dja@axtens.net>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      763d2d8d
    • D
      powerpc/powernv: Specialise pci_controller_ops for each controller type · 92ae0353
      Daniel Axtens 提交于
      Remove powernv generic PCI controller operations. Replace it with
      controller ops for each of the two supported PHBs.
      
      As an added bonus, make the two new structs const, which will help
      guard against bugs such as the one introduced in 65ebf4b6
      ("powerpc/powernv: Move controller ops from ppc_md to controller_ops")
      Signed-off-by: NDaniel Axtens <dja@axtens.net>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      92ae0353
  11. 22 5月, 2015 1 次提交
  12. 11 4月, 2015 1 次提交
  13. 07 4月, 2015 1 次提交
  14. 31 3月, 2015 1 次提交
    • W
      powerpc/powernv: Shift VF resource with an offset · 781a868f
      Wei Yang 提交于
      On PowerNV platform, resource position in M64 BAR implies the PE# the
      resource belongs to. In some cases, adjustment of a resource is necessary
      to locate it to a correct position in M64 BAR .
      
      This patch adds pnv_pci_vf_resource_shift() to shift the 'real' PF IOV BAR
      address according to an offset.
      
      Note:
      
          After doing so, there would be a "hole" in the /proc/iomem when offset
          is a positive value. It looks like the device return some mmio back to
          the system, which actually no one could use it.
      
      [bhelgaas: rework loops, rework overlap check, index resource[]
      conventionally, remove pci_regs.h include, squashed with next patch]
      Signed-off-by: NWei Yang <weiyang@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      781a868f
  15. 24 3月, 2015 1 次提交
  16. 04 3月, 2015 1 次提交
  17. 23 1月, 2015 1 次提交
    • G
      powerpc/powernv: Remove pnv_pci_probe_mode() · a113de37
      Gavin Shan 提交于
      The callback (ppc_md.pci_probe_mode()) is used to determine if the
      child PCI devices of the indicated PCI bus should be probed from
      device-tree or hardware. On PowerNV platform, we always expect
      probing PCI devices from hardware, which is PowerPC PCI core's
      default behaviour. Also, the callback had some delay implemented
      based on PHB's device node property "reset-clear-timestamp", which
      wasn't exported from skiboot. So we don't need this function and
      it's safe to remove it.
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a113de37
  18. 24 11月, 2014 1 次提交
  19. 23 11月, 2014 1 次提交
  20. 10 11月, 2014 1 次提交
  21. 15 10月, 2014 1 次提交
  22. 02 10月, 2014 1 次提交
  23. 30 9月, 2014 1 次提交
    • G
      powerpc/powernv: Override dma_get_required_mask() · fe7e85c6
      Gavin Shan 提交于
      The dma_get_required_mask() function is used by some drivers to
      query the platform about what DMA mask is needed to cover all of
      memory. This is a bit of a strange semantic when we have to choose
      between IOMMU translation or bypass, but essentially what it means
      is "what DMA mask will give best performances".
      
      Currently, our IOMMU backend always returns a 32-bit mask here, we
      don't do anything special to it when we have bypass available. This
      causes some drivers to choose a 32-bit mask, thus losing the ability
      to use the bypass window, thinking this is more efficient. The problem
      was reported from the driver of following device:
      
      0004:03:00.0 0107: 1000:0087 (rev 05)
      0004:03:00.0 Serial Attached SCSI controller: LSI Logic / Symbios \
                   Logic SAS2308 PCI-Express Fusion-MPT SAS-2 (rev 05)
      
      This patch adds an override of that function in order to, instead,
      return a 64-bit mask whenever a bypass window is available in order
      for drivers to prefer this configuration.
      Reported-by: NMurali N. Iyer <mniyer@us.ibm.com>
      Suggested-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      fe7e85c6
  24. 05 8月, 2014 2 次提交
  25. 28 7月, 2014 1 次提交
  26. 11 7月, 2014 2 次提交
  27. 11 6月, 2014 1 次提交
  28. 28 4月, 2014 4 次提交
    • G
      powerpc/eeh: No hotplug on permanently removed dev · d2b0f6f7
      Gavin Shan 提交于
      The issue was detected in a bit complicated test case where
      we have multiple hierarchical PEs shown as following figure:
      
                      +-----------------+
                      | PE#3     p2p#0  |
                      |          p2p#1  |
                      +-----------------+
                              |
                      +-----------------+
                      | PE#4     pdev#0 |
                      |          pdev#1 |
                      +-----------------+
      
      PE#4 (have 2 PCI devices) is the child of PE#3, which has 2 p2p
      bridges. We accidentally had less-known scenario: PE#4 was removed
      permanently from the system because of permanent failure (e.g.
      exceeding the max allowd failure times in last hour), then we detects
      EEH errors on PE#3 and tried to recover it. However, eeh_dev instances
      for pdev#0/1 were not detached from PE#4, which was still connected to
      PE#3. All of that was because of the fact that we rely on count-based
      pcibios_release_device(), which isn't reliable enough. When doing
      recovery for PE#3, we still apply hotplug on PE#4 and pdev#0/1, which
      are not valid any more. Eventually, we run into kernel crash.
      
      The patch fixes above issue from two aspects. For unplug, we simply
      skip those permanently removed PE, whose state is (EEH_PE_STATE_ISOLATED
      && !EEH_PE_STATE_RECOVERING) and its frozen count should be greater
      than EEH_MAX_ALLOWED_FREEZES. For plug, we marked all permanently
      removed EEH devices with EEH_DEV_REMOVED and return 0xFF's on read
      its PCI config so that PCI core will omit them.
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      d2b0f6f7
    • G
      powerpc/eeh: Block PCI-CFG access during PE reset · d0914f50
      Gavin Shan 提交于
      We've observed multiple PE reset failures because of PCI-CFG
      access during that period. Potentially, some device drivers
      can't support EEH very well and they can't put the device to
      motionless state before PE reset. So those device drivers might
      produce PCI-CFG accesses during PE reset. Also, we could have
      PCI-CFG access from user space (e.g. "lspci"). Since access to
      frozen PE should return 0xFF's, we can block PCI-CFG access
      during the period of PE reset so that we won't get recrusive EEH
      errors.
      
      The patch adds flag EEH_PE_RESET, which is kept during PE reset.
      The PowerNV/pSeries PCI-CFG accessors reuse the flag to block
      PCI-CFG accordingly.
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      d0914f50
    • G
      powerpc/powernv: Remove fields in PHB diag-data dump · b34497d1
      Gavin Shan 提交于
      For some fields (e.g. LEM, MMIO, DMA) in PHB diag-data dump, it's
      meaningless to print them if they have non-zero value in the
      corresponding mask registers because we always have non-zero values
      in the mask registers. The patch only prints those fieds if we
      have non-zero values in the primary registers (e.g. LEM, MMIO, DMA
      status) so that we can save couple of lines. The patch also removes
      unnecessary spare line before "brdgCtl:" and two leading spaces as
      prefix in each line as Ben suggested.
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      b34497d1
    • G
      powerpc/powernv: Move PNV_EEH_STATE_ENABLED around · f5bc6b70
      Gavin Shan 提交于
      The flag PNV_EEH_STATE_ENABLED is put into pnv_phb::eeh_state,
      which is protected by CONFIG_EEH. We needn't that. Instead, we
      can have pnv_phb::flags and maintain all flags there, which is
      the purpose of the patch. The patch also renames PNV_EEH_STATE_ENABLED
      to PNV_PHB_FLAG_EEH.
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      f5bc6b70