1. 15 1月, 2014 11 次提交
    • S
      powerpc: Add debug checks to catch invalid cpu-to-node mappings · 68fb18aa
      Srivatsa S. Bhat 提交于
      There have been some weird bugs in the past where the kernel tried to associate
      threads of the same core to different NUMA nodes, and things went haywire after
      that point (as expected).
      
      But unfortunately, root-causing such issues have been quite challenging, due to
      the lack of appropriate debug checks in the kernel. These bugs usually lead to
      some odd soft-lockups in the scheduler's build-sched-domain code in the CPU
      hotplug path, which makes it very hard to trace it back to the incorrect
      cpu-to-node mappings.
      
      So add appropriate debug checks to catch such invalid cpu-to-node mappings
      as early as possible.
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      68fb18aa
    • S
      powerpc: Fix the setup of CPU-to-Node mappings during CPU online · d4edc5b6
      Srivatsa S. Bhat 提交于
      On POWER platforms, the hypervisor can notify the guest kernel about dynamic
      changes in the cpu-numa associativity (VPHN topology update). Hence the
      cpu-to-node mappings that we got from the firmware during boot, may no longer
      be valid after such updates. This is handled using the arch_update_cpu_topology()
      hook in the scheduler, and the sched-domains are rebuilt according to the new
      mappings.
      
      But unfortunately, at the moment, CPU hotplug ignores these updated mappings
      and instead queries the firmware for the cpu-to-numa relationships and uses
      them during CPU online. So the kernel can end up assigning wrong NUMA nodes
      to CPUs during subsequent CPU hotplug online operations (after booting).
      
      Further, a particularly problematic scenario can result from this bug:
      On POWER platforms, the SMT mode can be switched between 1, 2, 4 (and even 8)
      threads per core. The switch to Single-Threaded (ST) mode is performed by
      offlining all except the first CPU thread in each core. Switching back to
      SMT mode involves onlining those other threads back, in each core.
      
      Now consider this scenario:
      
      1. During boot, the kernel gets the cpu-to-node mappings from the firmware
         and assigns the CPUs to NUMA nodes appropriately, during CPU online.
      
      2. Later on, the hypervisor updates the cpu-to-node mappings dynamically and
         communicates this update to the kernel. The kernel in turn updates its
         cpu-to-node associations and rebuilds its sched domains. Everything is
         fine so far.
      
      3. Now, the user switches the machine from SMT to ST mode (say, by running
         ppc64_cpu --smt=1). This involves offlining all except 1 thread in each
         core.
      
      4. The user then tries to switch back from ST to SMT mode (say, by running
         ppc64_cpu --smt=4), and this involves onlining those threads back. Since
         CPU hotplug ignores the new mappings, it queries the firmware and tries to
         associate the newly onlined sibling threads to the old NUMA nodes. This
         results in sibling threads within the same core getting associated with
         different NUMA nodes, which is incorrect.
      
         The scheduler's build-sched-domains code gets thoroughly confused with this
         and enters an infinite loop and causes soft-lockups, as explained in detail
         in commit 3be7db6a (powerpc: VPHN topology change updates all siblings).
      
      So to fix this, use the numa_cpu_lookup_table to remember the updated
      cpu-to-node mappings, and use them during CPU hotplug online operations.
      Further, we also need to ensure that all threads in a core are assigned to a
      common NUMA node, irrespective of whether all those threads were online during
      the topology update. To achieve this, we take care not to use cpu_sibling_mask()
      since it is not hotplug invariant. Instead, we use cpu_first_sibling_thread()
      and set up the mappings manually using the 'threads_per_core' value for that
      particular platform. This helps us ensure that we don't hit this bug with any
      combination of CPU hotplug and SMT mode switching.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      d4edc5b6
    • G
      powerpc/iommu: Don't detach device without IOMMU group · 0c4b9e27
      Gavin Shan 提交于
      Some devices, for example PCI root port, don't have IOMMU table and
      group. We needn't detach them from their IOMMU group. Otherwise, it
      potentially incurs kernel crash because of referring NULL IOMMU group
      as following backtrace indicates:
      
        .iommu_group_remove_device+0x74/0x1b0
        .iommu_bus_notifier+0x94/0xb4
        .notifier_call_chain+0x78/0xe8
        .__blocking_notifier_call_chain+0x7c/0xbc
        .blocking_notifier_call_chain+0x38/0x48
        .device_del+0x50/0x234
        .pci_remove_bus_device+0x88/0x138
        .pci_stop_and_remove_bus_device+0x2c/0x40
        .pcibios_remove_pci_devices+0xcc/0xfc
        .pcibios_remove_pci_devices+0x3c/0xfc
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Reviewed-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      0c4b9e27
    • G
      powerpc/eeh: Hotplug improvement · f26c7a03
      Gavin Shan 提交于
      When EEH error comes to one specific PCI device before its driver
      is loaded, we will apply hotplug to recover the error. During the
      plug time, the PCI device will be probed and its driver is loaded.
      Then we wrongly calls to the error handlers if the driver supports
      EEH explicitly.
      
      The patch intends to fix by introducing flag EEH_DEV_NO_HANDLER and
      set it before we remove the PCI device. In turn, we can avoid wrongly
      calls the error handlers of the PCI device after its driver loaded.
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      f26c7a03
    • G
      powerpc/eeh: Call opal_pci_reinit() on powernv for restoring config space · 9be3becc
      Gavin Shan 提交于
      The patch implements the EEH operation backend restore_config()
      for PowerNV platform. That relies on OPAL API opal_pci_reinit()
      where we reinitialize the error reporting properly after PE or
      PHB reset.
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      9be3becc
    • G
      powerpc/eeh: Add restore_config operation · 1d350544
      Gavin Shan 提交于
      After reset on the specific PE or PHB, we never configure AER
      correctly on PowerNV platform. We needn't care it on pSeries
      platform. The patch introduces additional EEH operation eeh_ops::
      restore_config() so that we have chance to configure AER correctly
      for PowerNV platform.
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      1d350544
    • G
      powerpc/powernv: Remove unnecessary assignment · 8184616f
      Gavin Shan 提交于
      We don't have IO ports on PHB3 and the assignment of variable
      "iomap_off" on PHB3 is meaningless. The patch just removes the
      unnecessary assignment to the variable. The code change should
      have been part of commit c35d2a8c ("powerpc/powernv: Needn't IO
      segment map for PHB3").
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      8184616f
    • N
      Revert "pseries/iommu: Remove DDW on kexec" · 97e7dc52
      Nishanth Aravamudan 提交于
      After reverting 25ebc45b
      ("powerpc/pseries/iommu: remove default window before attempting DDW
      manipulation"), we no longer remove the base window in enable_ddw.
      Therefore, we no longer need to reset the DMA window state in
      find_existing_ddw_windows(). We can instead go back to what was done
      before, which simply reuses the previous configuration, if any. Further,
      this removes the final caller of the reset-pe-dma-windows call, so
      remove those functions.
      
      This fixes an EEH on kdump with the ipr driver. The EEH occurs, because
      the initcall removes the DDW configuration (64-bit DMA window), but
      doesn't ensure the ops are via the IOMMU -- a DMA operation occurs
      during probe (still investigating this) and we EEH.
      
      This reverts commit 14b6f00f.
      Signed-off-by: NNishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      97e7dc52
    • N
      Revert "powerpc/pseries/iommu: remove default window before attempting DDW manipulation" · ae69e1ed
      Nishanth Aravamudan 提交于
      Ben rightfully pointed out that there is a race in the "newer" DDW code.
      Presuming we are running on recent enough firmware that supports the
      "reset" DDW manipulation call, we currently always remove the base
      32-bit DMA window in order to maximize the resources for Phyp when
      creating the 64-bit window. However, this can be problematic for the
      case where multiple functions are in the same PE (partitionable
      endpoint), where some funtions might be 32-bit DMA only. All of a
      sudden, the only functional DMA window for such functions is gone. We
      will have serious errors in such situations. The best solution is simply
      to revert the extension to the DDW code where we ever remove the base
      DMA window.
      
      This reverts commit 25ebc45b.
      Signed-off-by: NNishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      ae69e1ed
    • P
      powerpc: Delete non-required instances of include <linux/init.h> · c141611f
      Paul Gortmaker 提交于
      None of these files are actually using any __init type directives
      and hence don't need to include <linux/init.h>.  Most are just a
      left over from __devinit and __cpuinit removal, or simply due to
      code getting copied from one driver to the next.
      
      The one instance where we add an include for init.h covers off
      a case where that file was implicitly getting it from another
      header which itself didn't need it.
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      c141611f
    • A
      powerpc: Add vr save/restore functions · 8fe9c93e
      Andreas Schwab 提交于
      GCC 4.8 now generates out-of-line vr save/restore functions when
      optimizing for size.  They are needed for the raid6 altivec support.
      Signed-off-by: NAndreas Schwab <schwab@linux-m68k.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      8fe9c93e
  2. 30 12月, 2013 13 次提交
  3. 21 12月, 2013 1 次提交
  4. 19 12月, 2013 1 次提交
    • G
      powerpc/512x: dts: remove misplaced IRQ spec from 'soc' node (5125) · bbca4d39
      Gerhard Sittig 提交于
      the 'soc' node in the MPC5125 "tower" board .dts has an '#interrupt-cells'
      property although this node is not an interrupt controller
      
      remove this erroneously placed property because starting with v3.13-rc1
      lookup and resolution of 'interrupts' specs for peripherals gets misled
      (tries to use the 'soc' as the interrupt parent which fails), emits
      'no irq domain found' WARN() messages and breaks the boot process
      
      [ best viewed with 'git diff -U5' to have DT node names in the context ]
      
      Cc: Anatolij Gustschin <agust@denx.de>
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: devicetree@vger.kernel.org
      Signed-off-by: NGerhard Sittig <gsi@denx.de>
      Signed-off-by: NAnatolij Gustschin <agust@denx.de>
      bbca4d39
  5. 13 12月, 2013 10 次提交
  6. 10 12月, 2013 4 次提交