1. 02 5月, 2017 2 次提交
    • V
      cxl: Route eeh events to all drivers in cxl_pci_error_detected() · 4f58f0bf
      Vaibhav Jain 提交于
      Fix a boundary condition where in some cases an eeh event that results
      in card reset isn't passed on to a driver attached to the virtual PCI
      device associated with a slice. This will happen in case when a slice
      attached device driver returns a value other than
      PCI_ERS_RESULT_NEED_RESET from the eeh error_detected() callback. This
      would result in an early return from cxl_pci_error_detected() and
      other drivers attached to other AFUs on the card wont be notified.
      
      The patch fixes this by making sure that all slice attached
      device-drivers are notified and the return values from
      error_detected() callback are aggregated in a scheme where request for
      'disconnect' trumps all and 'none' trumps 'need_reset'.
      
      Fixes: 9e8df8a2 ("cxl: EEH support")
      Cc: stable@vger.kernel.org # v4.3+
      Signed-off-by: NVaibhav Jain <vaibhav@linux.vnet.ibm.com>
      Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Acked-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      4f58f0bf
    • V
      cxl: Force context lock during EEH flow · ea9a26d1
      Vaibhav Jain 提交于
      During an eeh event when the cxl card is fenced and card sysfs attr
      perst_reloads_same_image is set following warning message is seen in the
      kernel logs:
      
        Adapter context unlocked with 0 active contexts
        ------------[ cut here ]------------
        WARNING: CPU: 12 PID: 627 at
        ../drivers/misc/cxl/main.c:325 cxl_adapter_context_unlock+0x60/0x80 [cxl]
      
      Even though this warning is harmless, it clutters the kernel log
      during an eeh event. This warning is triggered as the EEH callback
      cxl_pci_error_detected doesn't obtain a context-lock before forcibly
      detaching all active context and when context-lock is released during
      call to cxl_configure_adapter from cxl_pci_slot_reset, a warning in
      cxl_adapter_context_unlock is triggered.
      
      To fix this warning, we acquire the adapter context-lock via
      cxl_adapter_context_lock() in the eeh callback
      cxl_pci_error_detected() once all the virtual AFU PHBs are notified
      and their contexts detached. The context-lock is released in
      cxl_pci_slot_reset() after the adapter is successfully reconfigured
      and before the we call the slot_reset callback on slice attached
      device-drivers.
      
      Fixes: 70b565bb ("cxl: Prevent adapter reset if an active context exists")
      Cc: stable@vger.kernel.org # v4.9+
      Reported-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: NVaibhav Jain <vaibhav@linux.vnet.ibm.com>
      Acked-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com>
      Reviewed-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Tested-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      ea9a26d1
  2. 01 5月, 2017 2 次提交
  3. 28 4月, 2017 19 次提交
  4. 27 4月, 2017 8 次提交
  5. 26 4月, 2017 3 次提交
    • M
      powerpc/powernv: Fix oops on P9 DD1 in cause_ipi() · 45b21cfe
      Michael Ellerman 提交于
      Recently we merged the native xive support for Power9, and then separately some
      reworks for doorbell IPI support. In isolation both series were OK, but the
      merged result had a bug in one case.
      
      On P9 DD1 we use pnv_p9_dd1_cause_ipi() which tries to use doorbells, and then
      falls back to the interrupt controller. However the fallback is implemented by
      calling icp_ops->cause_ipi. But now that xive support is merged we might be
      using xive, in which case icp_ops is not initialised, it's a xics specific
      structure. This leads to an oops such as:
      
        Unable to handle kernel paging request for data at address 0x00000028
        Oops: Kernel access of bad area, sig: 11 [#1]
        NIP pnv_p9_dd1_cause_ipi+0x74/0xe0
        LR smp_muxed_ipi_message_pass+0x54/0x70
      
      To fix it, rather than using icp_ops which might be NULL, have both xics and
      xive set smp_ops->cause_ipi, and then in the powernv code we save that as
      ic_cause_ipi before overriding smp_ops->cause_ipi. For paranoia add a WARN_ON()
      to check if somehow smp_ops->cause_ipi is NULL.
      
      Fixes: b866cc21 ("powerpc: Change the doorbell IPI calling convention")
      Tested-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      45b21cfe
    • M
      powerpc/powernv: Fix missing attr initialisation in opal_export_attrs() · 83c49190
      Michael Ellerman 提交于
      In opal_export_attrs() we dynamically allocate some bin_attributes. They're
      allocated with kmalloc() and although we initialise most of the fields, we don't
      initialise write() or mmap(), and in particular we don't initialise the lockdep
      related fields in the embedded struct attribute.
      
      This leads to a lockdep warning at boot:
      
        BUG: key c0000000f11906d8 not in .data!
        WARNING: CPU: 0 PID: 1 at ../kernel/locking/lockdep.c:3136 lockdep_init_map+0x28c/0x2a0
        ...
        Call Trace:
          lockdep_init_map+0x288/0x2a0 (unreliable)
          __kernfs_create_file+0x8c/0x170
          sysfs_add_file_mode_ns+0xc8/0x240
          __machine_initcall_powernv_opal_init+0x60c/0x684
          do_one_initcall+0x60/0x1c0
          kernel_init_freeable+0x2f4/0x3d4
          kernel_init+0x24/0x160
          ret_from_kernel_thread+0x5c/0xb0
      
      Fix it by kzalloc'ing the attr, which fixes the uninitialised write() and
      mmap(), and calling sysfs_bin_attr_init() on it to initialise the lockdep
      fields.
      
      Fixes: 11fe909d ("powerpc/powernv: Add OPAL exports attributes to sysfs")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      83c49190
    • M
      powerpc/mm: Fix possible out-of-bounds shift in arch_mmap_rnd() · b409946b
      Michael Ellerman 提交于
      The recent patch to add runtime configuration of the ASLR limits added a bug in
      arch_mmap_rnd() where we may shift an integer (32-bits) by up to 33 bits,
      leading to undefined behaviour.
      
      In practice it exhibits as every process seg faulting instantly, presumably
      because the rnd value hasn't been restricited by the modulus at all. We didn't
      notice because it only happens under certain kernel configurations and if the
      number of bits is actually set to a large value.
      
      Fix it by switching to unsigned long.
      
      Fixes: 9fea59bd ("powerpc/mm: Add support for runtime configuration of ASLR limits")
      Reported-by: NBalbir Singh <bsingharora@gmail.com>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      b409946b
  6. 24 4月, 2017 6 次提交
    • D
      powerpc/mm: Ensure IRQs are off in switch_mm() · 9765ad13
      David Gibson 提交于
      powerpc expects IRQs to already be (soft) disabled when switch_mm() is
      called, as made clear in the commit message of 9c1e1052 ("powerpc: Allow
      perf_counters to access user memory at interrupt time").
      
      Aside from any race conditions that might exist between switch_mm() and an IRQ,
      there is also an unconditional hard_irq_disable() in switch_slb(). If that isn't
      followed at some point by an IRQ enable then interrupts will remain disabled
      until we return to userspace.
      
      It is true that when switch_mm() is called from the scheduler IRQs are off, but
      not when it's called by use_mm(). Looking closer we see that last year in commit
      f98db601 ("sched/core: Add switch_mm_irqs_off() and use it in the scheduler")
      this was made more explicit by the addition of switch_mm_irqs_off() which is now
      called by the scheduler, vs switch_mm() which is used by use_mm().
      
      Arguably it is a bug in use_mm() to call switch_mm() in a different context than
      it expects, but fixing that will take time.
      
      This was discovered recently when vhost started throwing warnings such as:
      
        BUG: sleeping function called from invalid context at kernel/mutex.c:578
        in_atomic(): 0, irqs_disabled(): 1, pid: 10768, name: vhost-10760
        no locks held by vhost-10760/10768.
        irq event stamp: 10
        hardirqs last  enabled at (9):  _raw_spin_unlock_irq+0x40/0x80
        hardirqs last disabled at (10): switch_slb+0x2e4/0x490
        softirqs last  enabled at (0):  copy_process+0x5e8/0x1260
        softirqs last disabled at (0):  (null)
        Call Trace:
          show_stack+0x88/0x390 (unreliable)
          dump_stack+0x30/0x44
          __might_sleep+0x1c4/0x2d0
          mutex_lock_nested+0x74/0x5c0
          cgroup_attach_task_all+0x5c/0x180
          vhost_attach_cgroups_work+0x58/0x80 [vhost]
          vhost_worker+0x24c/0x3d0 [vhost]
          kthread+0xec/0x100
          ret_from_kernel_thread+0x5c/0xd4
      
      Prior to commit 04b96e55 ("vhost: lockless enqueuing") (Aug 2016) the
      vhost_worker() would do a spin_unlock_irq() not long after calling use_mm(),
      which had the effect of reenabling IRQs. Since that commit removed the locking
      in vhost_worker() the body of the vhost_worker() loop now runs with interrupts
      off causing the warnings.
      
      This patch addresses the problem by making the powerpc code mirror the x86 code,
      ie. we disable interrupts in switch_mm(), and optimise the scheduler case by
      defining switch_mm_irqs_off().
      
      Cc: stable@vger.kernel.org # v4.7+
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      [mpe: Flesh out/rewrite change log, add stable]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      9765ad13
    • T
      powerpc/sysfs: Fix reference leak of cpu device_nodes present at boot · e76ca277
      Tyrel Datwyler 提交于
      For CPUs present at boot each logical CPU acquires a reference to the
      associated device node of the core. This happens in register_cpu() which
      is called by topology_init(). The result of this is that we end up with
      a reference held by each thread of the core. However, these references
      are never freed if the CPU core is DLPAR removed.
      
      This patch fixes the reference leaks by acquiring and releasing the references
      in the CPU hotplug callbacks un/register_cpu_online(). With this patch symmetric
      reference counting is observed with both CPUs present at boot, and those DLPAR
      added after boot.
      
      Fixes: f86e4718 ("driver/core: cpu: initialize of_node in cpu's device struture")
      Cc: stable@vger.kernel.org # v3.12+
      Signed-off-by: NTyrel Datwyler <tyreld@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      e76ca277
    • T
      powerpc/pseries: Fix of_node_put() underflow during DLPAR remove · 68baf692
      Tyrel Datwyler 提交于
      Historically struct device_node references were tracked using a kref embedded as
      a struct field. Commit 75b57ecf ("of: Make device nodes kobjects so they
      show up in sysfs") (Mar 2014) refactored device_nodes to be kobjects such that
      the device tree could by more simply exposed to userspace using sysfs.
      
      Commit 0829f6d1 ("of: device_node kobject lifecycle fixes") (Mar 2014)
      followed up these changes to better control the kobject lifecycle and in
      particular the referecne counting via of_node_get(), of_node_put(), and
      of_node_init().
      
      A result of this second commit was that it introduced an of_node_put() call when
      a dynamic node is detached, in of_node_remove(), that removes the initial kobj
      reference created by of_node_init().
      
      Traditionally as the original dynamic device node user the pseries code had
      assumed responsibilty for releasing this final reference in its platform
      specific DLPAR detach code.
      
      This patch fixes a refcount underflow introduced by commit 0829f6d1, and
      recently exposed by the upstreaming of the recount API.
      
      Messages like the following are no longer seen in the kernel log with this
      patch following DLPAR remove operations of cpus and pci devices.
      
        rpadlpar_io: slot PHB 72 removed
        refcount_t: underflow; use-after-free.
        ------------[ cut here ]------------
        WARNING: CPU: 5 PID: 3335 at lib/refcount.c:128 refcount_sub_and_test+0xf4/0x110
      
      Fixes: 0829f6d1 ("of: device_node kobject lifecycle fixes")
      Cc: stable@vger.kernel.org # v3.15+
      Signed-off-by: NTyrel Datwyler <tyreld@linux.vnet.ibm.com>
      [mpe: Make change log commit references more verbose]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      68baf692
    • M
      powerpc/xmon: Deindent the SLB dumping logic · 85673646
      Michael Ellerman 提交于
      Currently the code that dumps SLB entries uses a double-nested if. This
      means the actual dumping logic is a bit squashed. Deindent it by using
      continue.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: NRashmica Gupta <rashmica.g@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      85673646
    • M
      Merge branch 'topic/kprobes' into next · 9fc84914
      Michael Ellerman 提交于
      Although most of these kprobes patches are powerpc specific, there's a couple
      that touch generic code (with Acks). At the moment there's one conflict with
      acme's tree, but it's not too bad. Still just in case some other conflicts show
      up, we've put these in a topic branch so another tree could merge some or all of
      it if necessary.
      9fc84914
    • N
      powerpc/kprobes: Prefer ftrace when probing function entry · 24bd909e
      Naveen N. Rao 提交于
      KPROBES_ON_FTRACE avoids much of the overhead of regular kprobes as it
      eliminates the need for a trap, as well as the need to emulate or single-step
      instructions.
      
      Though OPTPROBES provides us with similar performance, we have limited
      optprobes trampoline slots. As such, when asked to probe at a function
      entry, default to using the ftrace infrastructure.
      
      With:
        # cd /sys/kernel/debug/tracing
        # echo 'p _do_fork' > kprobe_events
      
      before patch:
        # cat ../kprobes/list
        c0000000000daf08  k  _do_fork+0x8    [DISABLED]
        c000000000044fc0  k  kretprobe_trampoline+0x0    [OPTIMIZED]
      
      and after patch:
        # cat ../kprobes/list
        c0000000000d074c  k  _do_fork+0xc    [DISABLED][FTRACE]
        c0000000000412b0  k  kretprobe_trampoline+0x0    [OPTIMIZED]
      Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      24bd909e