1. 28 4月, 2017 5 次提交
  2. 27 4月, 2017 1 次提交
  3. 24 4月, 2017 3 次提交
    • D
      powerpc/mm: Ensure IRQs are off in switch_mm() · 9765ad13
      David Gibson 提交于
      powerpc expects IRQs to already be (soft) disabled when switch_mm() is
      called, as made clear in the commit message of 9c1e1052 ("powerpc: Allow
      perf_counters to access user memory at interrupt time").
      
      Aside from any race conditions that might exist between switch_mm() and an IRQ,
      there is also an unconditional hard_irq_disable() in switch_slb(). If that isn't
      followed at some point by an IRQ enable then interrupts will remain disabled
      until we return to userspace.
      
      It is true that when switch_mm() is called from the scheduler IRQs are off, but
      not when it's called by use_mm(). Looking closer we see that last year in commit
      f98db601 ("sched/core: Add switch_mm_irqs_off() and use it in the scheduler")
      this was made more explicit by the addition of switch_mm_irqs_off() which is now
      called by the scheduler, vs switch_mm() which is used by use_mm().
      
      Arguably it is a bug in use_mm() to call switch_mm() in a different context than
      it expects, but fixing that will take time.
      
      This was discovered recently when vhost started throwing warnings such as:
      
        BUG: sleeping function called from invalid context at kernel/mutex.c:578
        in_atomic(): 0, irqs_disabled(): 1, pid: 10768, name: vhost-10760
        no locks held by vhost-10760/10768.
        irq event stamp: 10
        hardirqs last  enabled at (9):  _raw_spin_unlock_irq+0x40/0x80
        hardirqs last disabled at (10): switch_slb+0x2e4/0x490
        softirqs last  enabled at (0):  copy_process+0x5e8/0x1260
        softirqs last disabled at (0):  (null)
        Call Trace:
          show_stack+0x88/0x390 (unreliable)
          dump_stack+0x30/0x44
          __might_sleep+0x1c4/0x2d0
          mutex_lock_nested+0x74/0x5c0
          cgroup_attach_task_all+0x5c/0x180
          vhost_attach_cgroups_work+0x58/0x80 [vhost]
          vhost_worker+0x24c/0x3d0 [vhost]
          kthread+0xec/0x100
          ret_from_kernel_thread+0x5c/0xd4
      
      Prior to commit 04b96e55 ("vhost: lockless enqueuing") (Aug 2016) the
      vhost_worker() would do a spin_unlock_irq() not long after calling use_mm(),
      which had the effect of reenabling IRQs. Since that commit removed the locking
      in vhost_worker() the body of the vhost_worker() loop now runs with interrupts
      off causing the warnings.
      
      This patch addresses the problem by making the powerpc code mirror the x86 code,
      ie. we disable interrupts in switch_mm(), and optimise the scheduler case by
      defining switch_mm_irqs_off().
      
      Cc: stable@vger.kernel.org # v4.7+
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      [mpe: Flesh out/rewrite change log, add stable]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      9765ad13
    • N
      powerpc: Introduce a new helper to obtain function entry points · 1b32cd17
      Naveen N. Rao 提交于
      kprobe_lookup_name() is specific to the kprobe subsystem and may not always
      return the function entry point (in a subsequent patch for KPROBES_ON_FTRACE).
      For looking up function entry points, introduce a separate helper and use it
      in optprobes.c
      Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      1b32cd17
    • N
      powerpc/kprobes: Add support for KPROBES_ON_FTRACE · ead514d5
      Naveen N. Rao 提交于
      Allow kprobes to be placed on ftrace _mcount() call sites. This optimization
      avoids the use of a trap, by riding on ftrace infrastructure.
      
      This depends on HAVE_DYNAMIC_FTRACE_WITH_REGS which depends on MPROFILE_KERNEL,
      which is only currently enabled on powerpc64le with newer toolchains.
      
      Based on the x86 code by Masami.
      Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      ead514d5
  4. 23 4月, 2017 8 次提交
  5. 21 4月, 2017 1 次提交
  6. 20 4月, 2017 2 次提交
  7. 19 4月, 2017 4 次提交
  8. 13 4月, 2017 6 次提交
  9. 12 4月, 2017 2 次提交
    • B
      powerpc/tracing: Allow tracing of mmap syscalls · 9c355917
      Balbir Singh 提交于
      Currently sys_mmap() and sys_mmap2() (32-bit only), are not visible to the
      syscall tracing machinery. This means users are not able to see the execution of
      mmap() syscalls using the syscall tracer.
      
      Fix that by using SYSCALL_DEFINE6 for sys_mmap() and sys_mmap2() so that the
      meta-data associated with these syscalls is visible to the syscall tracer.
      
      A side-effect of this change is that the return type has changed from unsigned
      long to long. However this should have no effect, the only code in the kernel
      which uses the result of these syscalls is in the syscall return path, which is
      written in asm and treats the result as unsigned regardless.
      
      Example output:
        cat-3399  [001] ....   196.542410: sys_mmap(addr: 7fff922a0000, len: 20000, prot: 3, flags: 812, fd: 3, offset: 1b0000)
        cat-3399  [001] ....   196.542443: sys_mmap -> 0x7fff922a0000
        cat-3399  [001] ....   196.542668: sys_munmap(addr: 7fff922c0000, len: 6d2c)
        cat-3399  [001] ....   196.542677: sys_munmap -> 0x0
      Signed-off-by: NBalbir Singh <bsingharora@gmail.com>
      [mpe: Massage change log, add detail on return type change]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      9c355917
    • M
      powerpc/mm: Fix swapper_pg_dir size on 64-bit hash w/64K pages · 03dfee6d
      Michael Ellerman 提交于
      Recently in commit f6eedbba ("powerpc/mm/hash: Increase VA range to 128TB"),
      we increased H_PGD_INDEX_SIZE to 15 when we're building with 64K pages. This
      makes it larger than RADIX_PGD_INDEX_SIZE (13), which means the logic to
      calculate MAX_PGD_INDEX_SIZE in book3s/64/pgtable.h is wrong.
      
      The end result is that the PGD (Page Global Directory, ie top level page table)
      of the kernel (aka. swapper_pg_dir), is too small.
      
      This generally doesn't lead to a crash, as we don't use the full range in normal
      operation. However if we try to dump the kernel pagetables we can trigger a
      crash because we walk off the end of the pgd into other memory and eventually
      try to dereference something bogus:
      
        $ cat /sys/kernel/debug/kernel_pagetables
        Unable to handle kernel paging request for data at address 0xe8fece0000000000
        Faulting instruction address: 0xc000000000072314
        cpu 0xc: Vector: 380 (Data SLB Access) at [c0000000daa13890]
            pc: c000000000072314: ptdump_show+0x164/0x430
            lr: c000000000072550: ptdump_show+0x3a0/0x430
           dar: e802cf0000000000
        seq_read+0xf8/0x560
        full_proxy_read+0x84/0xc0
        __vfs_read+0x6c/0x1d0
        vfs_read+0xbc/0x1b0
        SyS_read+0x6c/0x110
        system_call+0x38/0xfc
      
      The root cause is that MAX_PGD_INDEX_SIZE isn't actually computed to be
      the max of H_PGD_INDEX_SIZE or RADIX_PGD_INDEX_SIZE. To fix that move
      the calculation into asm-offsets.c where we can do it easily using
      max().
      
      Fixes: f6eedbba ("powerpc/mm/hash: Increase VA range to 128TB")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      03dfee6d
  10. 11 4月, 2017 3 次提交
    • G
      powerpc/powernv: Recover correct PACA on wakeup from a stop on P9 DD1 · 17ed4c8f
      Gautham R. Shenoy 提交于
      POWER9 DD1.0 hardware has a bug where the SPRs of a thread waking up
      from stop 0,1,2 with ESL=1 can endup being misplaced in the core. Thus
      the HSPRG0 of a thread waking up from can contain the paca pointer of
      its sibling.
      
      This patch implements a context recovery framework within threads of a
      core, by provisioning space in paca_struct for saving every sibling
      threads's paca pointers. Basically, we should be able to arrive at the
      right paca pointer from any of the thread's existing paca pointer.
      
      At bootup, during powernv idle-init, we save the paca address of every
      CPU in each one its siblings paca_struct in the slot corresponding to
      this CPU's index in the core.
      
      On wakeup from a stop, the thread will determine its index in the core
      from the TIR register and recover its PACA pointer by indexing into
      the correct slot in the provisioned space in the current PACA.
      
      Furthermore, ensure that the NVGPRs are restored from the stack on the
      way out by setting the NAPSTATELOST in paca.
      
      [Changelog written with inputs from svaidy@linux.vnet.ibm.com]
      Signed-off-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Reviewed-by: NNicholas Piggin <npiggin@gmail.com>
      [mpe: Call it a bug]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      17ed4c8f
    • G
      powerpc/powernv: Move CPU-Offline idle state invocation from smp.c to idle.c · a7cd88da
      Gautham R. Shenoy 提交于
      Move the piece of code in powernv/smp.c::pnv_smp_cpu_kill_self() which
      transitions the CPU to the deepest available platform idle state to a
      new function named pnv_cpu_offline() in powernv/idle.c. The rationale
      behind this code movement is that the data required to determine the
      deepest available platform state resides in powernv/idle.c.
      Reviewed-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a7cd88da
    • M
      powerpc: Create asm/debugfs.h and move powerpc_debugfs_root there · 7644d581
      Michael Ellerman 提交于
      powerpc_debugfs_root is the dentry representing the root of the
      "powerpc" directory tree in debugfs.
      
      Currently it sits in asm/debug.h, a long with some other things that
      have "debug" in the name, but are otherwise unrelated.
      
      Pull it out into a separate header, which also includes linux/debugfs.h,
      and convert all the users to include debugfs.h instead of debug.h.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      7644d581
  11. 10 4月, 2017 5 次提交