1. 02 March 2016 (6 commits)
    • powerpc: Add the ability to save Altivec without giving it up · 6f515d84
      Authored by Cyril Bur
      This patch adds the ability to save the VEC registers to the thread
      struct without giving them up, i.e. without disabling the facility for
      the next time the process returns to userspace.
      
      This patch builds on a previous optimisation for the FPU registers in the
      thread copy path to avoid a possibly pointless reload of VEC state.
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Add the ability to save FPU without giving it up · 8792468d
      Authored by Cyril Bur
      This patch adds the ability to save the FPU registers to the thread
      struct without giving them up, i.e. without disabling the facility for
      the next time the process returns to userspace.
      
      This patch optimises the thread copy path (as a result of a fork() or
      clone()) so that the parent thread can return to userspace with hot
      registers avoiding a possibly pointless reload of FPU register state.
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
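
      A minimal sketch of the save-without-giveup pattern this and the previous
      commit describe, assuming the msr_check_and_set()/msr_check_and_clear()
      helpers and the asm-level save_fpu() routine from this series; the wrapper
      name and body here are illustrative, not the exact upstream code:

      /* Sketch: copy the live FPU state into the thread struct but leave
       * MSR_FP set in the user regs, so no fp_unavailable exception is
       * taken on the next return to userspace. */
      void save_fpu_state_sketch(struct task_struct *tsk)
      {
              if (!tsk->thread.regs || !(tsk->thread.regs->msr & MSR_FP))
                      return;

              msr_check_and_set(MSR_FP);      /* enable FP for the kernel */
              save_fpu(tsk);                  /* asm: store FPRs/FPSCR to the thread struct */
              msr_check_and_clear(MSR_FP);    /* restore the kernel MSR */

              /* tsk->thread.regs->msr still has MSR_FP: the registers stay hot. */
      }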
    • powerpc: Prepare for splitting giveup_{fpu, altivec, vsx} in two · de2a20aa
      Authored by Cyril Bur
      This prepares for the decoupling of saving {fpu,altivec,vsx} registers and
      marking {fpu,altivec,vsx} as being unused by a thread.
      
      Currently giveup_{fpu,altivec,vsx}() does both; however, optimisations to
      task switching can be made if these two operations are decoupled.
      save_all() will permit saving the registers to the thread struct while
      leaving the thread's MSR bits enabled, as sketched below.
      
      This patch introduces no functional change.
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
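
      A rough sketch of the save_all() idea mentioned above, built on the
      per-facility save helpers from the previous two commits; this is an
      illustration of the decoupling, not the exact upstream function (VSX
      handling is omitted for brevity):

      /* Sketch: save every in-use facility's registers to the thread struct
       * while leaving the MSR bits in the user regs enabled. */
      static void save_all_sketch(struct task_struct *tsk)
      {
              unsigned long usermsr;

              if (!tsk->thread.regs)
                      return;

              usermsr = tsk->thread.regs->msr;
              if (!(usermsr & (MSR_FP | MSR_VEC)))
                      return;

              msr_check_and_set(MSR_FP | MSR_VEC);

              if (usermsr & MSR_FP)
                      save_fpu(tsk);          /* save, don't give up */
              if (usermsr & MSR_VEC)
                      save_altivec(tsk);      /* likewise for VMX */

              msr_check_and_clear(MSR_FP | MSR_VEC);
              /* usermsr is untouched: the facilities remain enabled. */
      }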
    • powerpc: Restore FPU/VEC/VSX if previously used · 70fe3d98
      Authored by Cyril Bur
      Currently the FPU, VEC and VSX facilities are lazily loaded. This is not
      a problem unless a process is using these facilities.
      
      Modern versions of GCC are very good at automatically vectorising code,
      new and modernised workloads make use of floating point and vector
      facilities, and even the kernel makes use of a vectorised memcpy.
      
      All of this combined greatly increases the cost of a syscall, since the
      kernel sometimes uses these facilities even in the syscall fast path,
      making it increasingly common for a thread to take an *_unavailable
      exception soon after a syscall, and potentially all three.
      
      The obvious overcompensation for this problem is to simply always load
      all the facilities on every exit to userspace. But loading all the FPU,
      VEC and VSX registers every time can be expensive, and if a workload does
      avoid using them it should not be forced to incur that penalty.
      
      An 8-bit counter is used to detect if the registers have been used in
      the past, and the registers are always loaded until the value wraps back
      to zero.
      
      Several versions of the assembly in entry_64.S were tested:
      
        1. Always calling C.
        2. Performing a common case check and then calling C.
        3. A complex check in asm.
      
      After some benchmarking it was determined that avoiding C in the common
      case is a performance benefit (option 2). The full check in asm (option
      3) greatly complicated that codepath for a negligible performance gain
      and the trade-off was deemed not worth it.
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      [mpe: Move load_vec in the struct to fill an existing hole, reword change log]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      
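      A condensed sketch of the counter scheme ("option 2" above: a cheap
      common-case check in entry_64.S, then a call into C). The field and
      function names follow the series, but the bodies are simplified for
      illustration:

      /* Sketch: thread_struct grows a u8 counter per facility, e.g. load_fp.
       * While it is non-zero the facility is treated as recently used and is
       * eagerly reloaded on exit to userspace; once it wraps back to zero we
       * fall back to lazy loading via the *_unavailable exception. */
      static int restore_fp_sketch(struct task_struct *tsk)
      {
              if (!tsk->thread.load_fp)
                      return 0;

              load_fp_state(&tsk->thread.fp_state);   /* reload FPRs/FPSCR */
              tsk->thread.load_fp++;                  /* wraps to 0 after 256 uses */
              return 1;
      }

      /* Called on the exit-to-user path when some math MSR bits are clear. */
      void restore_math_sketch(struct pt_regs *regs)
      {
              unsigned long msr = regs->msr;

              if (restore_fp_sketch(current))
                      msr |= MSR_FP;
              /* ...similarly for VEC and VSX... */

              regs->msr = msr;        /* facility enabled with no exception taken */
      }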
    • powerpc: Explicitly disable math features when copying thread · d272f667
      Authored by Cyril Bur
      Currently when threads get scheduled off they always give up the FPU,
      Altivec (VMX) and VSX units if they were using them. When they are
      scheduled back on, a fault is then taken to enable each facility and load
      its registers. As a result, explicitly disabling FPU/VMX/VSX has not been
      necessary.
      
      Future changes and optimisations remove this mandatory giveup and fault,
      which could cause calls such as clone() and fork() to copy threads and run
      them later with FPU/VMX/VSX enabled but no registers loaded.
      
      This patch starts the process of having MSR_{FP,VEC,VSX} set mean that a
      thread's registers are hot, while MSR_{FP,VEC,VSX} clear means that the
      registers must be loaded. This allows for a smarter return to userspace.
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
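
      The change itself is essentially one line; a hedged sketch of the sort of
      thing added to copy_thread() in arch/powerpc/kernel/process.c, with the
      surrounding context condensed:

      /* Sketch: the child is created with no FP/VMX/VSX register state loaded,
       * so clear the MSR bits that would claim otherwise.  The child's first
       * use of a facility then takes the usual *_unavailable exception, which
       * loads the registers. */
      childregs->msr &= ~(MSR_FP | MSR_VEC | MSR_VSX);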
    • powerpc/mm: Split hash page table sizing heuristic into a helper · 5c3c7ede
      Authored by David Gibson
      htab_get_table_size() either retrieves the size of the hash page table
      (HPT) from the device tree, if the HPT size is determined by firmware, or
      uses a heuristic to determine a good size based on RAM size if the kernel
      is responsible for allocating the HPT.
      
      To support a PAPR extension allowing resizing of the HPT, we're going to
      want the memory size -> HPT size logic elsewhere, so split it out into a
      helper function.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
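
      A sketch of the memory-size to HPT-size heuristic being split out; the
      helper name and signature are illustrative, but the shape of the
      calculation (round RAM up to a power of two, roughly one 128-byte PTE
      group per two pages, with a floor) follows the long-standing
      htab_get_table_size() logic:

      static unsigned long hpt_size_for_mem_sketch(unsigned long mem_size,
                                                   int page_shift)
      {
              unsigned long rnd_mem_size, pteg_count;

              /* Round the memory size up to the next power of two. */
              rnd_mem_size = 1UL << __ilog2(mem_size);
              if (rnd_mem_size < mem_size)
                      rnd_mem_size <<= 1;

              /* Roughly one PTE group per two pages, with a sane minimum. */
              pteg_count = max(rnd_mem_size >> (page_shift + 1), 1UL << 11);

              /* Each PTE group (PTEG) is 128 bytes. */
              return pteg_count << 7;
      }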
  2. 01 March 2016 (8 commits)
  3. 29 February 2016 (7 commits)
  4. 27 February 2016 (2 commits)
  5. 24 February 2016 (5 commits)
  6. 22 February 2016 (6 commits)
  7. 17 February 2016 (5 commits)
    • powerpc: atomic: Implement acquire/release/relaxed variants for cmpxchg · 56c08e6d
      Authored by Boqun Feng
      Implement cmpxchg{,64}_relaxed and atomic{,64}_cmpxchg_relaxed, based on
      which _release variants can be built.
      
      To avoid superfluous barriers in _acquire variants, we implement these
      operations with assembly code rather than using __atomic_op_acquire() to
      build them automatically.
      
      For the same reason, we keep the assembly implementation of fully
      ordered cmpxchg operations.
      
      However, we don't do the same for _release, because that would require
      putting barriers in the middle of ll/sc loops, which is probably a bad
      idea.
      
      Note cmpxchg{,64}_relaxed and atomic{,64}_cmpxchg_relaxed are not
      compiler barriers.
      Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
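
      The 32-bit relaxed variant gives the flavour. A sketch close to what
      arch/powerpc/include/asm/cmpxchg.h gained: the ll/sc loop with no sync
      before it and no isync after it, which is exactly what makes it
      "relaxed" (workaround macros for older cores are dropped here):

      static __always_inline unsigned long
      __cmpxchg_u32_relaxed(u32 *p, unsigned long old, unsigned long new)
      {
              unsigned long prev;

              __asm__ __volatile__ (
      "1:     lwarx   %0,0,%2         # load-reserve the current value\n"
      "       cmpw    0,%0,%3\n"
      "       bne-    2f              # mismatch: bail out, return prev\n"
      "       stwcx.  %4,0,%2\n"
      "       bne-    1b              # lost the reservation: retry\n"
      "2:"
              : "=&r" (prev), "+m" (*p)
              : "r" (p), "r" (old), "r" (new)
              : "cc");

              return prev;
      }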
    • powerpc: atomic: Implement acquire/release/relaxed variants for xchg · 26760fc1
      Authored by Boqun Feng
      Implement xchg{,64}_relaxed and atomic{,64}_xchg_relaxed; based on these
      _relaxed variants, release/acquire variants and fully ordered versions
      can be built.
      
      Note that xchg{,64}_relaxed and atomic{,64}_xchg_relaxed are not
      compiler barriers.
      Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
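
      The xchg case is the same ll/sc loop minus the compare, again with no
      barriers anywhere around it. A sketch of the 32-bit variant:

      static __always_inline unsigned long
      __xchg_u32_relaxed(u32 *p, unsigned long val)
      {
              unsigned long prev;

              __asm__ __volatile__ (
      "1:     lwarx   %0,0,%2\n"
      "       stwcx.  %3,0,%2\n"
      "       bne-    1b"
              : "=&r" (prev), "+m" (*p)
              : "r" (p), "r" (val)
              : "cc");

              return prev;
      }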
    • powerpc: atomic: Implement atomic{, 64}_*_return_* variants · dc53617c
      Authored by Boqun Feng
      On powerpc, acquire and release semantics can be achieved with
      lightweight barriers ("lwsync" and "ctrl+isync"), which can be used to
      implement __atomic_op_{acquire,release}.
      
      For release semantics, since we only need to ensure that all memory
      accesses issued before the atomic take effect before its -store- part,
      "lwsync" is all we need. On platforms without "lwsync", "sync" should be
      used. Therefore in __atomic_op_release() we use PPC_RELEASE_BARRIER.
      
      For acquire semantics, "lwsync" is likewise all we need. However, on
      platforms without "lwsync", we can use "isync" rather than "sync" as an
      acquire barrier. Therefore in __atomic_op_acquire() we use
      PPC_ACQUIRE_BARRIER, which is barrier() on UP, "lwsync" if available and
      "isync" otherwise.
      
      Implement atomic{,64}_{add,sub,inc,dec}_return_relaxed, and build other
      variants with these helpers.
      Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
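
      A sketch of the two wrappers described above, assuming
      PPC_ACQUIRE_BARRIER / PPC_RELEASE_BARRIER expand to the barriers just
      discussed (and to nothing meaningful on UP):

      /* Build an acquire op from the relaxed op: do the op, then fence. */
      #define __atomic_op_acquire(op, args...)                                \
      ({                                                                      \
              typeof(op##_relaxed(args)) __ret = op##_relaxed(args);          \
              __asm__ __volatile__(PPC_ACQUIRE_BARRIER "" : : : "memory");    \
              __ret;                                                          \
      })

      /* Build a release op from the relaxed op: fence, then do the op. */
      #define __atomic_op_release(op, args...)                                \
      ({                                                                      \
              __asm__ __volatile__(PPC_RELEASE_BARRIER "" : : : "memory");    \
              op##_relaxed(args);                                             \
      })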
    • powerpc: Fix kgdb on little endian ppc64le · 94e3d923
      Authored by Balbir Singh
      I spent some time trying to use kgdb and debugged my inability to
      resume from kgdb_handle_breakpoint(). NIP is not incremented past the
      breakpoint, which leads to a loop in the debugger.
      
      I've tested this lightly on a virtual instance with KDB enabled.
      After the patch, I am able to get the "go" command to work as
      expected.
      Signed-off-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
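
      A sketch of the resume logic in question, reconstructed from the
      description above. BREAK_INSTR stands in for however the real code
      represents the trap opcode (getting that comparison endian-clean is the
      substance of the fix), so treat the names as illustrative:

      static int kgdb_handle_breakpoint_sketch(struct pt_regs *regs)
      {
              if (user_mode(regs))
                      return 0;

              if (kgdb_handle_exception(1, SIGTRAP, 0, regs) != 0)
                      return 0;

              /* Compare the instruction at NIP as a 32-bit value so the check
               * works on both endians, then step past the breakpoint so "go"
               * resumes after it instead of re-trapping forever. */
              if (*(u32 *)regs->nip == BREAK_INSTR)
                      regs->nip += BREAK_INSTR_SIZE;

              return 1;
      }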
    • powerpc/ioda: Set "read" permission when "write" is set · 6ecad912
      Authored by Alexey Kardashevskiy
      Quite often drivers set only "write" permission, assuming that this
      includes "read" permission as well, and this works on plenty of
      platforms. However, IODA2 is strict about this and produces an EEH when
      "read" permission is not set and a read happens.
      
      This adds a workaround in the IODA code to always add the "read" bit
      when the "write" bit is set.
      
      Fixes: 10b35b2b ("powerpc/powernv: Do not set "read" flag if direction==DMA_NONE")
      Cc: stable@vger.kernel.org # 4.2+
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Tested-by: Douglas Miller <dougmill@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
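
      A sketch of the core of the workaround, using the TCE permission bits
      from the powernv code; the helper is condensed and its name is
      illustrative (upstream folds this into the TCE-building path):

      static u64 tce_perms_sketch(enum dma_data_direction dir)
      {
              u64 proto_tce = 0;

              if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
                      proto_tce |= TCE_PCI_READ;      /* device reads memory */
              if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
                      proto_tce |= TCE_PCI_WRITE;     /* device writes memory */

              /* The workaround: on IODA, "write" must imply "read". */
              if (proto_tce & TCE_PCI_WRITE)
                      proto_tce |= TCE_PCI_READ;

              return proto_tce;
      }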
  8. 15 February 2016 (1 commit)
    • powerpc/mm: Fix Multi hit ERAT cause by recent THP update · c777e2a8
      Authored by Aneesh Kumar K.V
      With ppc64 we use the deposited pgtable_t to store the hash pte slot
      information. We should not withdraw the deposited pgtable_t without
      marking the pmd none. This ensures that low level hash fault handling
      will skip this huge pte and that we will handle it at upper levels.
      
      A recent change to pmd splitting changed the above in order to handle
      the race between pmd split and exit_mmap.
      
      Consider the following race:
      
      		CPU0				CPU1
      shrink_page_list()
        add_to_swap()
          split_huge_page_to_list()
            __split_huge_pmd_locked()
              pmdp_huge_clear_flush_notify()
      	// pmd_none() == true
      					exit_mmap()
      					  unmap_vmas()
      					    zap_pmd_range()
      					      // no action on pmd since pmd_none() == true
      	pmd_populate()
      
      As a result the THP will not be freed. The leak is detected by check_mm():
      
      	BUG: Bad rss-counter state mm:ffff880058d2e580 idx:1 val:512
      
      The above required us to not mark pmd none during a pmd split.
      
      The fix for ppc is to clear _PAGE_USER in the huge pte, so that the low
      level fault handling code skips this pte. At a higher level we do take
      the ptl lock. That should serialize us against the pmd split. Once the
      lock is acquired we check the pmd again using pmd_same. That should
      always return false for us and hence we should retry the access. We do
      the pmd_same check in all cases after taking the ptl with
      THP (do_huge_pmd_wp_page, do_huge_pmd_numa_page and
      huge_pmd_set_accessed).
      
      Also make sure we wait for the irq-disabled sections on other cpus to
      finish before replacing a huge pte entry with a regular pmd entry. Code
      paths like find_linux_pte_or_hugepte depend on irqs being disabled to get
      a stable pte_t pointer. A parallel thp split needs to make sure we don't
      convert a huge pmd entry to a regular pmd entry without waiting for the
      irq-disabled sections to finish.
      
      Fixes: eef1b3ba ("thp: implement split_huge_pmd()")
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
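
      The retry-under-ptl pattern that the fix relies on, sketched in the style
      of the THP fault handlers named above (condensed, not the full upstream
      code):

      /* Sketch: a THP fault handler re-validates the pmd after taking the
       * page table lock.  If a parallel split (which on ppc64 now clears
       * _PAGE_USER rather than marking the pmd none) changed it, retry the
       * access instead of operating on a half-split entry. */
      spinlock_t *ptl = pmd_lock(vma->vm_mm, pmd);
      if (unlikely(!pmd_same(*pmd, orig_pmd))) {
              spin_unlock(ptl);
              return 0;       /* pmd changed under us: let the fault retry */
      }
      /* ... safe to operate on the huge pmd here ... */
      spin_unlock(ptl);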