1. 03 7月, 2017 6 次提交
  2. 02 7月, 2017 1 次提交
  3. 28 6月, 2017 7 次提交
    • A
      powerpc/powernv/idle: Clear r12 on wakeup from stop lite · 4d0d7c02
      Akshay Adiga 提交于
      pnv_wakeup_noloss() expects r12 to contain SRR1 value to determine if the wakeup
      reason is an HMI in CHECK_HMI_INTERRUPT.
      
      When we wakeup with ESL=0, SRR1 will not contain the wakeup reason, so there is
      no point setting r12 to SRR1.
      
      However, we don't set r12 at all so r12 contains garbage (likely a kernel
      pointer), and is still used to check HMI assuming that it contained SRR1. This
      causes the OPAL msglog to be filled with the following print:
      
        HMI: Received HMI interrupt: HMER = 0x0040000000000000
      
      This patch clears r12 after waking up from stop with ESL=EC=0, so that we don't
      accidentally enter the HMI handler in pnv_wakeup_noloss() if the value of
      r12[42:45] corresponds to HMI as wakeup reason.
      
      Prior to commit 9d292501 ("powerpc/64s/idle: Avoid SRR usage in idle
      sleep/wake paths") this bug existed, in that we would incorrectly look at SRR1
      to check for a HMI when SRR1 didn't contain a wakeup reason. However the SRR1
      value would just happen to never have bits 42:45 set.
      
      Fixes: 9d292501 ("powerpc/64s/idle: Avoid SRR usage in idle sleep/wake paths")
      Signed-off-by: NAkshay Adiga <akshay.adiga@linux.vnet.ibm.com>
      Reviewed-by: NNicholas Piggin <npiggin@gmail.com>
      [mpe: Change log and comment massaging]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      4d0d7c02
    • S
      powerpc/smp: Convert NR_CPUS to nr_cpu_ids · c642af9c
      Santosh Sivaraj 提交于
      nr_cpu_ids can be limited by nr_cpus boot parameter, whereas NR_CPUS is a
      compile time constant, which shouldn't be compared against during cpu kick.
      Signed-off-by: NSantosh Sivaraj <santosh@fossix.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      c642af9c
    • S
      powerpc/smp: Do not BUG_ON if invalid CPU during kick · f8d0d5dc
      Santosh Sivaraj 提交于
      During secondary start, we do not need to BUG_ON if an invalid CPU number
      is passed. We already print an error if secondary cannot be started, so
      just return an error instead.
      Signed-off-by: NSantosh Sivaraj <santosh@fossix.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      f8d0d5dc
    • H
      powerpc/fadump: add reschedule point while releasing memory · 68fa6478
      Hari Bathini 提交于
      Around 95% of memory is reserved by fadump/capture kernel. All this
      memory is freed, one page at a time, on writing '1' to the node
      /sys/kernel/fadump_release_mem. On systems with large memory, this
      can take a long time to complete, leading to soft lockup warning
      messages. To avoid this, add reschedule points at regular intervals.
      
      Also, while memblock_reserve() implicitly takes care of holes in the
      given memory range while reserving memory, those holes need to be
      taken care of while releasing memory as memory is freed one page at
      a time. Add support to skip holes while releasing memory.
      Suggested-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NHari Bathini <hbathini@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      68fa6478
    • H
      powerpc/fadump: provide a helpful error message · a5a05b91
      Hari Bathini 提交于
      fadump fails to register when there are holes in boot memory area.
      Provide a helpful error message to the user in such case.
      Signed-off-by: NHari Bathini <hbathini@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a5a05b91
    • H
      powerpc/fadump: avoid holes in boot memory area when fadump is registered · eae0dfcc
      Hari Bathini 提交于
      To register fadump, boot memory area - the size of low memory chunk that
      is required for a kernel to boot successfully when booted with restricted
      memory, is assumed to have no holes. But this memory area is currently
      not protected from hot-remove operations. So, fadump could fail to
      re-register after a memory hot-remove operation, if memory is removed
      from boot memory area. To avoid this, ensure that memory from boot
      memory area is not hot-removed when fadump is registered.
      Signed-off-by: NHari Bathini <hbathini@linux.vnet.ibm.com>
      Reviewed-by: NMahesh J Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      eae0dfcc
    • H
      powerpc/fadump: avoid duplicates in crash memory ranges · a77af552
      Hari Bathini 提交于
      fadump sets up crash memory ranges to be used for creating PT_LOAD
      program headers in elfcore header. Memory chunk RMA_START through
      boot memory area size is added as the first memory range because
      firmware, at the time of crash, moves this memory chunk to different
      location specified during fadump registration making it necessary to
      create a separate program header for it with the correct offset.
      This memory chunk is skipped while setting up the remaining memory
      ranges. But currently, there is possibility that some of this memory
      may have duplicate entries like when it is hot-removed and added
      again. Ensure that no two memory ranges represent the same memory.
      
      When 5 lmbs are hot-removed and then hot-plugged before registering
      fadump, here is how the program headers in /proc/vmcore exported by
      fadump look like
      
      without this change:
      
        Program Headers:
          Type           Offset             VirtAddr           PhysAddr
                         FileSiz            MemSiz              Flags  Align
          NOTE           0x0000000000010000 0x0000000000000000 0x0000000000000000
                         0x0000000000001894 0x0000000000001894         0
          LOAD           0x0000000000021020 0xc000000000000000 0x0000000000000000
                         0x0000000040000000 0x0000000040000000  RWE    0
          LOAD           0x0000000040031020 0xc000000000000000 0x0000000000000000
                         0x0000000010000000 0x0000000010000000  RWE    0
          LOAD           0x0000000050040000 0xc000000010000000 0x0000000010000000
                         0x0000000050000000 0x0000000050000000  RWE    0
          LOAD           0x00000000a0040000 0xc000000060000000 0x0000000060000000
                         0x000000019ffe0000 0x000000019ffe0000  RWE    0
      
      and with this change:
      
        Program Headers:
          Type           Offset             VirtAddr           PhysAddr
                         FileSiz            MemSiz              Flags  Align
          NOTE           0x0000000000010000 0x0000000000000000 0x0000000000000000
                         0x0000000000001894 0x0000000000001894         0
          LOAD           0x0000000000021020 0xc000000000000000 0x0000000000000000
                         0x0000000040000000 0x0000000040000000  RWE    0
          LOAD           0x0000000040030000 0xc000000040000000 0x0000000040000000
                         0x0000000020000000 0x0000000020000000  RWE    0
          LOAD           0x0000000060030000 0xc000000060000000 0x0000000060000000
                         0x000000019ffe0000 0x000000019ffe0000  RWE    0
      Signed-off-by: NHari Bathini <hbathini@linux.vnet.ibm.com>
      Reviewed-by: NMahesh J Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a77af552
  4. 27 6月, 2017 4 次提交
  5. 23 6月, 2017 1 次提交
    • N
      powerpc/64: Initialise thread_info for emergency stacks · 34f19ff1
      Nicholas Piggin 提交于
      Emergency stacks have their thread_info mostly uninitialised, which in
      particular means garbage preempt_count values.
      
      Emergency stack code runs with interrupts disabled entirely, and is
      used very rarely, so this has been unnoticed so far. It was found by a
      proposed new powerpc watchdog that takes a soft-NMI directly from the
      masked_interrupt handler and using the emergency stack. That crashed
      at BUG_ON(in_nmi()) in nmi_enter(). preempt_count()s were found to be
      garbage.
      
      To fix this, zero the entire THREAD_SIZE allocation, and initialize
      the thread_info.
      
      Cc: stable@vger.kernel.org
      Reported-by: NAbdul Haleem <abdhalee@linux.vnet.ibm.com>
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      [mpe: Move it all into setup_64.c, use a function not a macro. Fix
            crashes on Cell by setting preempt_count to 0 not HARDIRQ_OFFSET]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      34f19ff1
  6. 22 6月, 2017 1 次提交
    • P
      powerpc: Convert VDSO update function to use new update_vsyscall interface · d4cfb113
      Paul Mackerras 提交于
      This converts the powerpc VDSO time update function to use the new
      interface introduced in commit 576094b7 ("time: Introduce new
      GENERIC_TIME_VSYSCALL", 2012-09-11).  Where the old interface gave
      us the time as of the last update in seconds and whole nanoseconds,
      with the new interface we get the nanoseconds part effectively in
      a binary fixed-point format with tk->tkr_mono.shift bits to the
      right of the binary point.
      
      With the old interface, the fractional nanoseconds got truncated,
      meaning that the value returned by the VDSO clock_gettime function
      would have about 1ns of jitter in it compared to the value computed
      by the generic timekeeping code in the kernel.
      
      The powerpc VDSO time functions (clock_gettime and gettimeofday)
      already work in units of 2^-32 seconds, or 0.23283 ns, because that
      makes it simple to split the result into seconds and fractional
      seconds, and represent the fractional seconds in either microseconds
      or nanoseconds.  This is good enough accuracy for now, so this patch
      avoids changing how the VDSO works or the interface in the VDSO data
      page.
      
      This patch converts the powerpc update_vsyscall_old to be called
      update_vsyscall and use the new interface.  We convert the fractional
      second to units of 2^-32 seconds without truncating to whole nanoseconds.
      (There is still a conversion to whole nanoseconds for any legacy users
      of the vdso_data/systemcfg stamp_xtime field.)
      
      In addition, this improves the accuracy of the computation of tb_to_xs
      for those systems with high-frequency timebase clocks (>= 268.5 MHz)
      by doing the right shift in two parts, one before the multiplication and
      one after, rather than doing the right shift before the multiplication.
      (We can't do all of the right shift after the multiplication unless we
      use 128-bit arithmetic.)
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Acked-by: NJohn Stultz <john.stultz@linaro.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      d4cfb113
  7. 21 6月, 2017 4 次提交
  8. 20 6月, 2017 3 次提交
  9. 19 6月, 2017 7 次提交
  10. 16 6月, 2017 4 次提交
    • N
      powerpc/64s: Handle data breakpoints in Radix mode · d89ba535
      Naveen N. Rao 提交于
      On Power9, trying to use data breakpoints throws the splat shown
      below. This is because the check for a data breakpoint in DSISR is in
      do_hash_page(), which is not called when in Radix mode.
      
        Unable to handle kernel paging request for data at address 0xc000000000e19218
        Faulting instruction address: 0xc0000000001155e8
        cpu 0x0: Vector: 300 (Data Access) at [c0000000ef1e7b20]
        pc: c0000000001155e8: find_pid_ns+0x48/0xe0
        lr: c000000000116ac4: find_task_by_vpid+0x44/0x90
        sp: c0000000ef1e7da0
        msr: 9000000000009033
        dar: c000000000e19218
        dsisr: 400000
      
      Move the check to handle_page_fault() so as to catch data breakpoints
      in both Hash and Radix MMU modes.
      
      We have to change the check in do_hash_page() against 0xa410 to use
      0xa450, so as to include the value of (DSISR_DABRMATCH << 16).
      
      There are two sites that call handle_page_fault() when in Radix, both
      already pass DSISR in r4.
      
      Fixes: caca285e ("powerpc/mm/radix: Use STD_MMU_64 to properly isolate hash related code")
      Cc: stable@vger.kernel.org # v4.7+
      Reported-by: NShriya R. Kulkarni <shriykul@in.ibm.com>
      Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      [mpe: Fix the fall-through case on hash, we need to reload DSISR]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      d89ba535
    • N
      powerpc/kprobes: Skip livepatch_handler() for jprobes · c05b8c44
      Naveen N. Rao 提交于
      ftrace_caller() depends on a modified regs->nip to detect if a certain
      function has been livepatched. However, with KPROBES_ON_FTRACE, it is
      possible for regs->nip to have been modified by the kprobes pre_handler
      (jprobes, for instance). In this case, we do not want to invoke the
      livepatch_handler so as not to consume the livepatch stack.
      
      To distinguish between the two (kprobes and livepatch), we check if
      there is an active kprobe on the current function. If there is, then we
      know for sure that it must have modified the NIP as we don't support
      livepatching a kprobe'd function. In this case, we simply skip the
      livepatch_handler and branch to the new NIP. Otherwise, the
      livepatch_handler is invoked.
      
      Fixes: ead514d5 ("powerpc/kprobes: Add support for KPROBES_ON_FTRACE")
      Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Reviewed-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      c05b8c44
    • N
      powerpc/ftrace: Pass the correct stack pointer for DYNAMIC_FTRACE_WITH_REGS · a4979a7e
      Naveen N. Rao 提交于
      For DYNAMIC_FTRACE_WITH_REGS, we should be passing-in the original set
      of registers in pt_regs, to capture the state _before_ ftrace_caller.
      However, we are instead passing the stack pointer *after* allocating a
      stack frame in ftrace_caller. Fix this by saving the proper value of r1
      in pt_regs. Also, use SAVE_10GPRS() to simplify the code.
      
      Fixes: 15308664 ("powerpc/ftrace: Add support for -mprofile-kernel ftrace ABI")
      Cc: stable@vger.kernel.org # v4.6+
      Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a4979a7e
    • N
      powerpc/kprobes: Pause function_graph tracing during jprobes handling · a9f8553e
      Naveen N. Rao 提交于
      This fixes a crash when function_graph and jprobes are used together.
      This is essentially commit 237d28db ("ftrace/jprobes/x86: Fix
      conflict between jprobes and function graph tracing"), but for powerpc.
      
      Jprobes breaks function_graph tracing since the jprobe hook needs to use
      jprobe_return(), which never returns back to the hook, but instead to
      the original jprobe'd function. The solution is to momentarily pause
      function_graph tracing before invoking the jprobe hook and re-enable it
      when returning back to the original jprobe'd function.
      
      Fixes: 6794c782 ("powerpc64: port of the function graph tracer")
      Cc: stable@vger.kernel.org # v2.6.30+
      Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Acked-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Acked-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a9f8553e
  11. 15 6月, 2017 2 次提交
    • N
      powerpc/64s: Avoid cpabort in context switch when possible · 07d2a628
      Nicholas Piggin 提交于
      The ISA v3.0B copy-paste facility only requires cpabort when switching
      to a process that has foreign real addresses mapped (direct access to
      accelerators), to clear a potential copy buffer filled by a previous
      thread. There is no accelerator driver implemented yet, so cpabort can
      be removed. It can be be re-added when a driver is implemented.
      
      POWER9 DD1 requires the copy buffer to always be cleared on context
      switch, but if accelerators are not in use, then an unpaired copy from
      a dummy region is sufficient to clear data out of the copy buffer.
      
      This increases context switch performance by about 5% on POWER9.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      07d2a628
    • N
      powerpc/64: Drop explicit hwsync in context switch · 9145effd
      Nicholas Piggin 提交于
      The sync (aka. hwsync, aka. heavyweight sync) in the context switch
      code to prevent MMIO access being reordered from the point of view of
      a single process if it gets migrated to a different CPU is not
      required because there is an hwsync performed earlier in the context
      switch path.
      
      Comment this so it's clear enough if anything changes on the scheduler
      or the powerpc sides. Remove the hwsync from _switch.
      
      This improves context switch performance by 2-3% on POWER8.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      9145effd