1. 23 4月, 2017 2 次提交
  2. 19 4月, 2017 2 次提交
    • N
      powerpc/64s: Minor fix for MCE TLB flush for radix · 95dbdf4f
      Nicholas Piggin 提交于
      The TLB flush for radix first flushes TLB for radix configuration,
      then flushes for hash configuration. The second flush is unnecessary
      but does not affect correctness.
      
      Fixes: 1a472c9d ("powerpc/mm/radix: Add tlbflush routines")
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      95dbdf4f
    • N
      powerpc/64s: Revert setting of LPCR[LPES] on POWER9 · 8d1b48ef
      Nicholas Piggin 提交于
      The XIVE enablement patches included a change to set the LPES (Logical
      Partitioning Environment Selector) bit (bit # 3) in LPCR (Logical Partitioning
      Control Register) on POWER9 hosts. This bit sets external interrupts to guest
      delivery mode, which uses SRR0/1. The host's EE interrupt handler is written to
      expect HSRR0/1 (for earlier CPUs). This should be fine because XIVE is
      configured not to deliver EEs to the host (Hypervisor Virtulization Interrupt is
      used instead) so the EE handler should never be executed.
      
      However a bug in interrupt controller code, hardware, or odd configuration of a
      simulator could result in the host getting an EE incorrectly. Keeping the EE
      delivery mode matching the host EE handler prevents strange crashes due to using
      the wrong exception registers.
      
      KVM will configure the LPCR to set LPES prior to running a guest so that EEs are
      delivered to the guest using SRR0/1.
      
      Fixes: 08a1e650 ("powerpc: Fixup LPCR:PECE and HEIC setting on POWER9")
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      [mpe: Massage change log to avoid referring to LPES0 which is now renamed LPES]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      8d1b48ef
  3. 13 4月, 2017 4 次提交
  4. 12 4月, 2017 2 次提交
    • B
      powerpc/tracing: Allow tracing of mmap syscalls · 9c355917
      Balbir Singh 提交于
      Currently sys_mmap() and sys_mmap2() (32-bit only), are not visible to the
      syscall tracing machinery. This means users are not able to see the execution of
      mmap() syscalls using the syscall tracer.
      
      Fix that by using SYSCALL_DEFINE6 for sys_mmap() and sys_mmap2() so that the
      meta-data associated with these syscalls is visible to the syscall tracer.
      
      A side-effect of this change is that the return type has changed from unsigned
      long to long. However this should have no effect, the only code in the kernel
      which uses the result of these syscalls is in the syscall return path, which is
      written in asm and treats the result as unsigned regardless.
      
      Example output:
        cat-3399  [001] ....   196.542410: sys_mmap(addr: 7fff922a0000, len: 20000, prot: 3, flags: 812, fd: 3, offset: 1b0000)
        cat-3399  [001] ....   196.542443: sys_mmap -> 0x7fff922a0000
        cat-3399  [001] ....   196.542668: sys_munmap(addr: 7fff922c0000, len: 6d2c)
        cat-3399  [001] ....   196.542677: sys_munmap -> 0x0
      Signed-off-by: NBalbir Singh <bsingharora@gmail.com>
      [mpe: Massage change log, add detail on return type change]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      9c355917
    • M
      powerpc/mm: Fix swapper_pg_dir size on 64-bit hash w/64K pages · 03dfee6d
      Michael Ellerman 提交于
      Recently in commit f6eedbba ("powerpc/mm/hash: Increase VA range to 128TB"),
      we increased H_PGD_INDEX_SIZE to 15 when we're building with 64K pages. This
      makes it larger than RADIX_PGD_INDEX_SIZE (13), which means the logic to
      calculate MAX_PGD_INDEX_SIZE in book3s/64/pgtable.h is wrong.
      
      The end result is that the PGD (Page Global Directory, ie top level page table)
      of the kernel (aka. swapper_pg_dir), is too small.
      
      This generally doesn't lead to a crash, as we don't use the full range in normal
      operation. However if we try to dump the kernel pagetables we can trigger a
      crash because we walk off the end of the pgd into other memory and eventually
      try to dereference something bogus:
      
        $ cat /sys/kernel/debug/kernel_pagetables
        Unable to handle kernel paging request for data at address 0xe8fece0000000000
        Faulting instruction address: 0xc000000000072314
        cpu 0xc: Vector: 380 (Data SLB Access) at [c0000000daa13890]
            pc: c000000000072314: ptdump_show+0x164/0x430
            lr: c000000000072550: ptdump_show+0x3a0/0x430
           dar: e802cf0000000000
        seq_read+0xf8/0x560
        full_proxy_read+0x84/0xc0
        __vfs_read+0x6c/0x1d0
        vfs_read+0xbc/0x1b0
        SyS_read+0x6c/0x110
        system_call+0x38/0xfc
      
      The root cause is that MAX_PGD_INDEX_SIZE isn't actually computed to be
      the max of H_PGD_INDEX_SIZE or RADIX_PGD_INDEX_SIZE. To fix that move
      the calculation into asm-offsets.c where we can do it easily using
      max().
      
      Fixes: f6eedbba ("powerpc/mm/hash: Increase VA range to 128TB")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      03dfee6d
  5. 11 4月, 2017 3 次提交
    • G
      powerpc/powernv: Recover correct PACA on wakeup from a stop on P9 DD1 · 17ed4c8f
      Gautham R. Shenoy 提交于
      POWER9 DD1.0 hardware has a bug where the SPRs of a thread waking up
      from stop 0,1,2 with ESL=1 can endup being misplaced in the core. Thus
      the HSPRG0 of a thread waking up from can contain the paca pointer of
      its sibling.
      
      This patch implements a context recovery framework within threads of a
      core, by provisioning space in paca_struct for saving every sibling
      threads's paca pointers. Basically, we should be able to arrive at the
      right paca pointer from any of the thread's existing paca pointer.
      
      At bootup, during powernv idle-init, we save the paca address of every
      CPU in each one its siblings paca_struct in the slot corresponding to
      this CPU's index in the core.
      
      On wakeup from a stop, the thread will determine its index in the core
      from the TIR register and recover its PACA pointer by indexing into
      the correct slot in the provisioned space in the current PACA.
      
      Furthermore, ensure that the NVGPRs are restored from the stack on the
      way out by setting the NAPSTATELOST in paca.
      
      [Changelog written with inputs from svaidy@linux.vnet.ibm.com]
      Signed-off-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Reviewed-by: NNicholas Piggin <npiggin@gmail.com>
      [mpe: Call it a bug]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      17ed4c8f
    • M
      powerpc: Remove unnecessary includes of asm/debug.h · 3ae05fb3
      Michael Ellerman 提交于
      These files don't seem to have any need for asm/debug.h, now that all it
      includes are the debugger hooks and breakpoint definitions.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      3ae05fb3
    • M
      powerpc: Create asm/debugfs.h and move powerpc_debugfs_root there · 7644d581
      Michael Ellerman 提交于
      powerpc_debugfs_root is the dentry representing the root of the
      "powerpc" directory tree in debugfs.
      
      Currently it sits in asm/debug.h, a long with some other things that
      have "debug" in the name, but are otherwise unrelated.
      
      Pull it out into a separate header, which also includes linux/debugfs.h,
      and convert all the users to include debugfs.h instead of debug.h.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      7644d581
  6. 10 4月, 2017 1 次提交
  7. 07 4月, 2017 1 次提交
    • B
      powerpc/smp: Remove migrate_irq() custom implementation · a978e139
      Benjamin Herrenschmidt 提交于
      Some powerpc platforms use this to move IRQs away from a CPU being
      unplugged. This function has several bugs such as not taking the right
      locks or failing to NULL check pointers.
      
      There's a new generic function doing exactly the same thing without all
      the bugs, so let's use it instead.
      
      mpe: The obvious place for the select of GENERIC_IRQ_MIGRATION is on
      HOTPLUG_CPU, but that doesn't work. On some configs PM_SLEEP_SMP will
      select HOTPLUG_CPU even though its dependencies are not met, which means
      the select of GENERIC_IRQ_MIGRATION doesn't happen. That leads to the
      build breaking. Fix it by moving the select of GENERIC_IRQ_MIGRATION to
      SMP.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a978e139
  8. 06 4月, 2017 1 次提交
  9. 04 4月, 2017 1 次提交
  10. 03 4月, 2017 2 次提交
    • M
      powerpc/book3s: Print task info if we take a machine check in user mode · 63f44d65
      Michael Ellerman 提交于
      For an MCE (Machine Check Exception) that hits while in user mode
      MSR(PR=1), print the task info to the console MCE error log. This may
      help to identify an application that triggered the MCE.
      
      After this patch the MCE console looks like:
      
        Severe Machine check interrupt [Recovered]
          NIP: [0000000010039778] PID: 762 Comm: ebizzy
          Initiator: CPU
          Error type: SLB [Multihit]
            Effective address: 0000000010039778
      
        Severe Machine check interrupt [Not recovered]
          NIP: [0000000010039778] PID: 763 Comm: ebizzy
          Initiator: CPU
          Error type: UE [Page table walk ifetch]
            Effective address: 0000000010039778
        ebizzy[763]: unhandled signal 7 at 0000000010039778 nip 0000000010039778 lr 0000000010001b44 code 30004
      Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      63f44d65
    • M
      powerpc/book3s: Print the kernel function name in machine check · 5b1d6fc2
      Mahesh Salgaonkar 提交于
      For D-side errors we print the load/store address that caused the
      machine check as 'Effective address'. But the instruction that may have
      caused the machine check can also be helpful, so in addition to printing
      the NIP, also print the kernel function name as well.
      
      After this patch the MCE console log would look like:
      
        Severe Machine check interrupt [Recovered]
          NIP [d00000001bc70194]: init_module+0x194/0x2b0 [bork_kernel]
          Initiator: CPU
          Error type: SLB [Parity]
            Effective address: d000000026de0000
      Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      5b1d6fc2
  11. 01 4月, 2017 3 次提交
    • A
      powerpc/mm: Enable mappings above 128TB · f4ea6dcb
      Aneesh Kumar K.V 提交于
      Not all user space application is ready to handle wide addresses. It's
      known that at least some JIT compilers use higher bits in pointers to
      encode their information. It collides with valid pointers with 512TB
      addresses and leads to crashes.
      
      To mitigate this, we are not going to allocate virtual address space
      above 128TB by default.
      
      But userspace can ask for allocation from full address space by
      specifying hint address (with or without MAP_FIXED) above 128TB.
      
      If hint address set above 128TB, but MAP_FIXED is not specified, we try
      to look for unmapped area by specified address. If it's already
      occupied, we look for unmapped area in *full* address space, rather than
      from 128TB window.
      
      This approach helps to easily make application's memory allocator aware
      about large address space without manually tracking allocated virtual
      address space.
      
      This is going to be a per mmap decision. ie, we can have some mmaps with
      larger addresses and other that do not.
      
      A sample memory layout looks like:
      
        10000000-10010000 r-xp 00000000 fc:00 9057045          /home/max_addr_512TB
        10010000-10020000 r--p 00000000 fc:00 9057045          /home/max_addr_512TB
        10020000-10030000 rw-p 00010000 fc:00 9057045          /home/max_addr_512TB
        10029630000-10029660000 rw-p 00000000 00:00 0          [heap]
        7fff834a0000-7fff834b0000 rw-p 00000000 00:00 0
        7fff834b0000-7fff83670000 r-xp 00000000 fc:00 9177190  /lib/powerpc64le-linux-gnu/libc-2.23.so
        7fff83670000-7fff83680000 r--p 001b0000 fc:00 9177190  /lib/powerpc64le-linux-gnu/libc-2.23.so
        7fff83680000-7fff83690000 rw-p 001c0000 fc:00 9177190  /lib/powerpc64le-linux-gnu/libc-2.23.so
        7fff83690000-7fff836a0000 rw-p 00000000 00:00 0
        7fff836a0000-7fff836c0000 r-xp 00000000 00:00 0        [vdso]
        7fff836c0000-7fff83700000 r-xp 00000000 fc:00 9177193  /lib/powerpc64le-linux-gnu/ld-2.23.so
        7fff83700000-7fff83710000 r--p 00030000 fc:00 9177193  /lib/powerpc64le-linux-gnu/ld-2.23.so
        7fff83710000-7fff83720000 rw-p 00040000 fc:00 9177193  /lib/powerpc64le-linux-gnu/ld-2.23.so
        7fffdccf0000-7fffdcd20000 rw-p 00000000 00:00 0        [stack]
        1000000000000-1000000010000 rw-p 00000000 00:00 0
        1ffff83710000-1ffff83720000 rw-p 00000000 00:00 0
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      f4ea6dcb
    • A
      powerpc/mm/hash: Store addr_limit in PACA · bb183221
      Aneesh Kumar K.V 提交于
      We optmize the slice page size array copy to paca by copying only the
      range based on addr_limit. This will require us to not look at page size
      array beyond addr_limit in PACA on slb fault. To enable that copy task
      size to paca which will be used during slb fault.
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      [mpe: Rename from task_size to addr_limit, consolidate #ifdefs]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      bb183221
    • A
      powerpc/mm: Add addr_limit to mm_context and use it to derive max slice index · 957b778a
      Aneesh Kumar K.V 提交于
      In the followup patch, we will increase the slice array size to handle
      512TB range, but will limit the max addr to 128TB. Avoid doing
      unnecessary computation and avoid doing slice mask related operation
      above address limit.
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      957b778a
  12. 31 3月, 2017 2 次提交
  13. 28 3月, 2017 2 次提交
  14. 21 3月, 2017 7 次提交
  15. 20 3月, 2017 3 次提交
  16. 10 3月, 2017 2 次提交
  17. 06 3月, 2017 2 次提交
    • A
      powerpc/64: Avoid panic during boot due to divide by zero in init_cache_info() · 6ba422c7
      Anton Blanchard 提交于
      I see a panic in early boot when building with a recent gcc toolchain.
      The issue is a divide by zero, which is undefined. Older toolchains
      let us get away with it:
      
      int foo(int a) { return a / 0; }
      
      foo:
      	li 9,0
      	divw 3,3,9
      	extsw 3,3
      	blr
      
      But newer ones catch it:
      
      foo:
      	trap
      
      Add a check to avoid the divide by zero.
      
      Fixes: e2827fe5 ("powerpc/64: Clean up ppc64_caches using a struct per cache")
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      6ba422c7
    • S
      powerpc: Update to new option-vector-5 format for CAS · 014d02cb
      Suraj Jitindar Singh 提交于
      On POWER9 the ibm,client-architecture-support (CAS) negotiation process
      has been updated to change how the host to guest negotiation is done for
      the new hash/radix mmu as well as the nest mmu, process tables and guest
      translation shootdown (GTSE).
      
      This is documented in the unreleased PAPR ACR "CAS option vector
      additions for P9".
      
      The host tells the guest which options it supports in
      ibm,arch-vec-5-platform-support. The guest then chooses a subset of these
      to request in the CAS call and these are agreed to in the
      ibm,architecture-vec-5 property of the chosen node.
      
      Thus we read ibm,arch-vec-5-platform-support and make our selection before
      calling CAS. We then parse the ibm,architecture-vec-5 property of the
      chosen node to check whether we should run as hash or radix.
      
      ibm,arch-vec-5-platform-support format:
      
      index value pairs: <index, val> ... <index, val>
      
      index: Option vector 5 byte number
      val:   Some representation of supported values
      Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Acked-by: NPaul Mackerras <paulus@ozlabs.org>
      [mpe: Don't print about unknown options, be consistent with OV5_FEAT]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      014d02cb