1. 31 1月, 2017 1 次提交
    • P
      KVM: PPC: Book3S HV: Add userspace interfaces for POWER9 MMU · c9270132
      Paul Mackerras 提交于
      This adds two capabilities and two ioctls to allow userspace to
      find out about and configure the POWER9 MMU in a guest.  The two
      capabilities tell userspace whether KVM can support a guest using
      the radix MMU, or using the hashed page table (HPT) MMU with a
      process table and segment tables.  (Note that the MMUs in the
      POWER9 processor cores do not use the process and segment tables
      when in HPT mode, but the nest MMU does).
      
      The KVM_PPC_CONFIGURE_V3_MMU ioctl allows userspace to specify
      whether a guest will use the radix MMU or the HPT MMU, and to
      specify the size and location (in guest space) of the process
      table.
      
      The KVM_PPC_GET_RMMU_INFO ioctl gives userspace information about
      the radix MMU.  It returns a list of supported radix tree geometries
      (base page size and number of bits indexed at each level of the
      radix tree) and the encoding used to specify the various page
      sizes for the TLB invalidate entry instruction.
      
      Initially, both capabilities return 0 and the ioctls return -EINVAL,
      until the necessary infrastructure for them to operate correctly
      is added.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      c9270132
  2. 30 11月, 2016 2 次提交
  3. 24 11月, 2016 1 次提交
    • P
      KVM: PPC: Book3S HV: Add new POWER9 guest-accessible SPRs · e9cf1e08
      Paul Mackerras 提交于
      This adds code to handle two new guest-accessible special-purpose
      registers on POWER9: TIDR (thread ID register) and PSSCR (processor
      stop status and control register).  They are context-switched
      between host and guest, and the guest values can be read and set
      via the one_reg interface.
      
      The PSSCR contains some fields which are guest-accessible and some
      which are only accessible in hypervisor mode.  We only allow the
      guest-accessible fields to be read or set by userspace.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      e9cf1e08
  4. 21 11月, 2016 1 次提交
    • P
      KVM: PPC: Book3S HV: Save/restore XER in checkpointed register state · 0d808df0
      Paul Mackerras 提交于
      When switching from/to a guest that has a transaction in progress,
      we need to save/restore the checkpointed register state.  Although
      XER is part of the CPU state that gets checkpointed, the code that
      does this saving and restoring doesn't save/restore XER.
      
      This fixes it by saving and restoring the XER.  To allow userspace
      to read/write the checkpointed XER value, we also add a new ONE_REG
      specifier.
      
      The visible effect of this bug is that the guest may see its XER
      value being corrupted when it uses transactions.
      
      Fixes: e4e38121 ("KVM: PPC: Book3S HV: Add transactional memory support")
      Fixes: 0a8eccef ("KVM: PPC: Book3S HV: Add missing code for transaction reclaim on guest exit")
      Cc: stable@vger.kernel.org # v3.15+
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Reviewed-by: NThomas Huth <thuth@redhat.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      0d808df0
  5. 01 8月, 2016 5 次提交
  6. 27 4月, 2016 1 次提交
  7. 21 4月, 2016 1 次提交
  8. 18 4月, 2016 1 次提交
    • A
      powerpc: scan_features() updates incorrect bits for REAL_LE · 6997e57d
      Anton Blanchard 提交于
      The REAL_LE feature entry in the ibm_pa_feature struct is missing an MMU
      feature value, meaning all the remaining elements initialise the wrong
      values.
      
      This means instead of checking for byte 5, bit 0, we check for byte 0,
      bit 0, and then we incorrectly set the CPU feature bit as well as MMU
      feature bit 1 and CPU user feature bits 0 and 2 (5).
      
      Checking byte 0 bit 0 (IBM numbering), means we're looking at the
      "Memory Management Unit (MMU)" feature - ie. does the CPU have an MMU.
      In practice that bit is set on all platforms which have the property.
      
      This means we set CPU_FTR_REAL_LE always. In practice that seems not to
      matter because all the modern cpus which have this property also
      implement REAL_LE, and we've never needed to disable it.
      
      We're also incorrectly setting MMU feature bit 1, which is:
      
        #define MMU_FTR_TYPE_8xx		0x00000002
      
      Luckily the only place that looks for MMU_FTR_TYPE_8xx is in Book3E
      code, which can't run on the same cpus as scan_features(). So this also
      doesn't matter in practice.
      
      Finally in the CPU user feature mask, we're setting bits 0 and 2. Bit 2
      is not currently used, and bit 0 is:
      
        #define PPC_FEATURE_PPC_LE		0x00000001
      
      Which says the CPU supports the old style "PPC Little Endian" mode.
      Again this should be harmless in practice as no 64-bit CPUs implement
      that mode.
      
      Fix the code by adding the missing initialisation of the MMU feature.
      
      Also add a comment marking CPU user feature bit 2 (0x4) as reserved. It
      would be unsafe to start using it as old kernels incorrectly set it.
      
      Fixes: 44ae3ab3 ("powerpc: Free up some CPU feature bits by moving out MMU-related features")
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Cc: stable@vger.kernel.org
      [mpe: Flesh out changelog, add comment reserving 0x4]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      6997e57d
  9. 02 3月, 2016 1 次提交
    • A
      KVM: PPC: Add support for 64bit TCE windows · 58ded420
      Alexey Kardashevskiy 提交于
      The existing KVM_CREATE_SPAPR_TCE only supports 32bit windows which is not
      enough for directly mapped windows as the guest can get more than 4GB.
      
      This adds KVM_CREATE_SPAPR_TCE_64 ioctl and advertises it
      via KVM_CAP_SPAPR_TCE_64 capability. The table size is checked against
      the locked memory limit.
      
      Since 64bit windows are to support Dynamic DMA windows (DDW), let's add
      @bus_offset and @page_shift which are also required by DDW.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      58ded420
  10. 01 3月, 2016 1 次提交
  11. 26 2月, 2016 1 次提交
    • T
      net: Facility to report route quality of connected sockets · a87cb3e4
      Tom Herbert 提交于
      This patch add the SO_CNX_ADVICE socket option (setsockopt only). The
      purpose is to allow an application to give feedback to the kernel about
      the quality of the network path for a connected socket. The value
      argument indicates the type of quality report. For this initial patch
      the only supported advice is a value of 1 which indicates "bad path,
      please reroute"-- the action taken by the kernel is to call
      dst_negative_advice which will attempt to choose a different ECMP route,
      reset the TX hash for flow label and UDP source port in encapsulation,
      etc.
      
      This facility should be useful for connected UDP sockets where only the
      application can provide any feedback about path quality. It could also
      be useful for TCP applications that have additional knowledge about the
      path outside of the normal TCP control loop.
      Signed-off-by: NTom Herbert <tom@herbertland.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a87cb3e4
  12. 21 1月, 2016 1 次提交
  13. 13 1月, 2016 1 次提交
    • U
      powerpc/module: Handle R_PPC64_ENTRY relocations · a61674bd
      Ulrich Weigand 提交于
      GCC 6 will include changes to generated code with -mcmodel=large,
      which is used to build kernel modules on powerpc64le.  This was
      necessary because the large model is supposed to allow arbitrary
      sizes and locations of the code and data sections, but the ELFv2
      global entry point prolog still made the unconditional assumption
      that the TOC associated with any particular function can be found
      within 2 GB of the function entry point:
      
      func:
      	addis r2,r12,(.TOC.-func)@ha
      	addi  r2,r2,(.TOC.-func)@l
      	.localentry func, .-func
      
      To remove this assumption, GCC will now generate instead this global
      entry point prolog sequence when using -mcmodel=large:
      
      	.quad .TOC.-func
      func:
      	.reloc ., R_PPC64_ENTRY
      	ld    r2, -8(r12)
      	add   r2, r2, r12
      	.localentry func, .-func
      
      The new .reloc triggers an optimization in the linker that will
      replace this new prolog with the original code (see above) if the
      linker determines that the distance between .TOC. and func is in
      range after all.
      
      Since this new relocation is now present in module object files,
      the kernel module loader is required to handle them too.  This
      patch adds support for the new relocation and implements the
      same optimization done by the GNU linker.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NUlrich Weigand <ulrich.weigand@de.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a61674bd
  14. 11 1月, 2016 1 次提交
  15. 05 1月, 2016 1 次提交
  16. 16 12月, 2015 1 次提交
    • M
      Partial revert of "powerpc: Individual System V IPC system calls" · 2475c362
      Michael Ellerman 提交于
      This partially reverts commit a3423615.
      
      While reviewing the glibc patch to exploit the individual IPC calls,
      Arnd & Andreas noticed that we were still requiring userspace to pass
      IPC_64 in order to get the new style IPC API.
      
      With a bit of cleanup in the kernel we can drop that requirement, and
      instead only provide the new style API, which will simplify things for
      userspace.
      
      Rather than try and sneak that patch into 4.4, instead we will drop the
      individual IPC calls for powerpc, and merge them again in 4.5 once the
      cleanup patch has gone in.
      
      Because we've already added sys_mlock2() as syscall #378, we don't do a
      full revert of the IPC calls. Instead we drop the __NR #defines, and
      send those now undefined syscall numbers to sys_ni_syscall(). This
      leaves a gap in the syscall numbers, but we'll reuse them when we merge
      the individual IPC calls.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      2475c362
  17. 16 11月, 2015 1 次提交
  18. 06 11月, 2015 1 次提交
    • E
      mm: mlock: add mlock flags to enable VM_LOCKONFAULT usage · b0f205c2
      Eric B Munson 提交于
      The previous patch introduced a flag that specified pages in a VMA should
      be placed on the unevictable LRU, but they should not be made present when
      the area is created.  This patch adds the ability to set this state via
      the new mlock system calls.
      
      We add MLOCK_ONFAULT for mlock2 and MCL_ONFAULT for mlockall.
      MLOCK_ONFAULT will set the VM_LOCKONFAULT modifier for VM_LOCKED.
      MCL_ONFAULT should be used as a modifier to the two other mlockall flags.
      When used with MCL_CURRENT, all current mappings will be marked with
      VM_LOCKED | VM_LOCKONFAULT.  When used with MCL_FUTURE, the mm->def_flags
      will be marked with VM_LOCKED | VM_LOCKONFAULT.  When used with both
      MCL_CURRENT and MCL_FUTURE, all current mappings and mm->def_flags will be
      marked with VM_LOCKED | VM_LOCKONFAULT.
      
      Prior to this patch, mlockall() will unconditionally clear the
      mm->def_flags any time it is called without MCL_FUTURE.  This behavior is
      maintained after adding MCL_ONFAULT.  If a call to mlockall(MCL_FUTURE) is
      followed by mlockall(MCL_CURRENT), the mm->def_flags will be cleared and
      new VMAs will be unlocked.  This remains true with or without MCL_ONFAULT
      in either mlockall() invocation.
      
      munlock() will unconditionally clear both vma flags.  munlockall()
      unconditionally clears for VMA flags on all VMAs and in the mm->def_flags
      field.
      Signed-off-by: NEric B Munson <emunson@akamai.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Shuah Khan <shuahkh@osg.samsung.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b0f205c2
  19. 15 10月, 2015 1 次提交
  20. 21 9月, 2015 1 次提交
  21. 09 9月, 2015 1 次提交
  22. 18 8月, 2015 1 次提交
  23. 29 7月, 2015 1 次提交
    • M
      powerpc/kernel: Switch to using MAX_ERRNO · c3525940
      Michael Ellerman 提交于
      Currently on powerpc we have our own #define for the highest (negative)
      errno value, called _LAST_ERRNO. This is defined to be 516, for reasons
      which are not clear.
      
      The generic code, and x86, use MAX_ERRNO, which is defined to be 4095.
      
      In particular seccomp uses MAX_ERRNO to restrict the value that a
      seccomp filter can return.
      
      Currently with the mismatch between _LAST_ERRNO and MAX_ERRNO, a seccomp
      tracer wanting to return 600, expecting it to be seen as an error, would
      instead find on powerpc that userspace sees a successful syscall with a
      return value of 600.
      
      To avoid this inconsistency, switch powerpc to use MAX_ERRNO.
      
      We are somewhat confident that generic syscalls that can return a
      non-error value above negative MAX_ERRNO have already been updated to
      use force_successful_syscall_return().
      
      I have also checked all the powerpc specific syscalls, and believe that
      none of them expect to return a non-error value between -MAX_ERRNO and
      -516. So this change should be safe ...
      Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      c3525940
  24. 16 7月, 2015 1 次提交
  25. 19 6月, 2015 1 次提交
    • S
      powerpc/tm: Abort syscalls in active transactions · b4b56f9e
      Sam bobroff 提交于
      This patch changes the syscall handler to doom (tabort) active
      transactions when a syscall is made and return very early without
      performing the syscall and keeping side effects to a minimum (no CPU
      accounting or system call tracing is performed). Also included is a
      new HWCAP2 bit, PPC_FEATURE2_HTM_NOSC, to indicate this
      behaviour to userspace.
      
      Currently, the system call instruction automatically suspends an
      active transaction which causes side effects to persist when an active
      transaction fails.
      
      This does change the kernel's behaviour, but in a way that was
      documented as unsupported.  It doesn't reduce functionality as
      syscalls will still be performed after tsuspend; it just requires that
      the transaction be explicitly suspended.  It also provides a
      consistent interface and makes the behaviour of user code
      substantially the same across powerpc and platforms that do not
      support suspended transactions (e.g. x86 and s390).
      
      Performance measurements using
      http://ozlabs.org/~anton/junkcode/null_syscall.c indicate the cost of
      a normal (non-aborted) system call increases by about 0.25%.
      Signed-off-by: NSam Bobroff <sam.bobroff@au1.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      b4b56f9e
  26. 18 6月, 2015 1 次提交
  27. 05 6月, 2015 1 次提交
  28. 12 5月, 2015 2 次提交
  29. 30 4月, 2015 1 次提交
    • M
      Revert "powerpc/tm: Abort syscalls in active transactions" · 68fc378c
      Michael Ellerman 提交于
      This reverts commit feba4036.
      
      Although the principle of this change is good, the implementation has a
      few issues.
      
      Firstly we can sometimes fail to abort a syscall because r12 may have
      been clobbered by C code if we went down the virtual CPU accounting
      path, or if syscall tracing was enabled.
      
      Secondly we have decided that it is safer to abort the syscall even
      earlier in the syscall entry path, so that we avoid the syscall tracing
      path when we are transactional.
      
      So that we have time to thoroughly test those changes we have decided to
      revert this for this merge window and will merge the fixed version in
      the next window.
      
      NB. Rather than reverting the selftest we just drop tm-syscall from
      TEST_PROGS so that it's not run by default.
      
      Fixes: feba4036 ("powerpc/tm: Abort syscalls in active transactions")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      68fc378c
  30. 17 4月, 2015 1 次提交
  31. 11 4月, 2015 1 次提交
    • S
      powerpc/tm: Abort syscalls in active transactions · feba4036
      Sam bobroff 提交于
      This patch changes the syscall handler to doom (tabort) active
      transactions when a syscall is made and return immediately without
      performing the syscall.
      
      Currently, the system call instruction automatically suspends an
      active transaction which causes side effects to persist when an active
      transaction fails.
      
      This does change the kernel's behaviour, but in a way that was
      documented as unsupported. It doesn't reduce functionality because
      syscalls will still be performed after tsuspend. It also provides a
      consistent interface and makes the behaviour of user code
      substantially the same across powerpc and platforms that do not
      support suspended transactions (e.g. x86 and s390).
      
      Performance measurements using
      http://ozlabs.org/~anton/junkcode/null_syscall.c
      indicate the cost of a system call increases by about 0.5%.
      Signed-off-by: NSam Bobroff <sam.bobroff@au1.ibm.com>
      Acked-By: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      feba4036
  32. 28 3月, 2015 1 次提交
    • M
      powerpc: Add a proper syscall for switching endianness · 529d235a
      Michael Ellerman 提交于
      We currently have a "special" syscall for switching endianness. This is
      syscall number 0x1ebe, which is handled explicitly in the 64-bit syscall
      exception entry.
      
      That has a few problems, firstly the syscall number is outside of the
      usual range, which confuses various tools. For example strace doesn't
      recognise the syscall at all.
      
      Secondly it's handled explicitly as a special case in the syscall
      exception entry, which is complicated enough without it.
      
      As a first step toward removing the special syscall, we need to add a
      regular syscall that implements the same functionality.
      
      The logic is simple, it simply toggles the MSR_LE bit in the userspace
      MSR. This is the same as the special syscall, with the caveat that the
      special syscall clobbers fewer registers.
      
      This version clobbers r9-r12, XER, CTR, and CR0-1,5-7.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      529d235a
  33. 16 3月, 2015 1 次提交
  34. 29 12月, 2014 1 次提交
    • P
      powerpc: Wire up sys_execveat() syscall · 1e5d0fdb
      Pranith Kumar 提交于
      Wire up sys_execveat(). This passes the selftests for the system call.
      
      Check success of execveat(3, '../execveat', 0)... [OK]
      Check success of execveat(5, 'execveat', 0)... [OK]
      Check success of execveat(6, 'execveat', 0)... [OK]
      Check success of execveat(-100, '/home/pranith/linux/...ftests/exec/execveat', 0)... [OK]
      Check success of execveat(99, '/home/pranith/linux/...ftests/exec/execveat', 0)... [OK]
      Check success of execveat(8, '', 4096)... [OK]
      Check success of execveat(17, '', 4096)... [OK]
      Check success of execveat(9, '', 4096)... [OK]
      Check success of execveat(14, '', 4096)... [OK]
      Check success of execveat(14, '', 4096)... [OK]
      Check success of execveat(15, '', 4096)... [OK]
      Check failure of execveat(8, '', 0) with ENOENT... [OK]
      Check failure of execveat(8, '(null)', 4096) with EFAULT... [OK]
      Check success of execveat(5, 'execveat.symlink', 0)... [OK]
      Check success of execveat(6, 'execveat.symlink', 0)... [OK]
      Check success of execveat(-100, '/home/pranith/linux/...xec/execveat.symlink', 0)... [OK]
      Check success of execveat(10, '', 4096)... [OK]
      Check success of execveat(10, '', 4352)... [OK]
      Check failure of execveat(5, 'execveat.symlink', 256) with ELOOP... [OK]
      Check failure of execveat(6, 'execveat.symlink', 256) with ELOOP... [OK]
      Check failure of execveat(-100, '/home/pranith/linux/tools/testing/selftests/exec/execveat.symlink', 256) with ELOOP... [OK]
      Check success of execveat(3, '../script', 0)... [OK]
      Check success of execveat(5, 'script', 0)... [OK]
      Check success of execveat(6, 'script', 0)... [OK]
      Check success of execveat(-100, '/home/pranith/linux/...elftests/exec/script', 0)... [OK]
      Check success of execveat(13, '', 4096)... [OK]
      Check success of execveat(13, '', 4352)... [OK]
      Check failure of execveat(18, '', 4096) with ENOENT... [OK]
      Check failure of execveat(7, 'script', 0) with ENOENT... [OK]
      Check success of execveat(16, '', 4096)... [OK]
      Check success of execveat(16, '', 4096)... [OK]
      Check success of execveat(4, '../script', 0)... [OK]
      Check success of execveat(4, 'script', 0)... [OK]
      Check success of execveat(4, '../script', 0)... [OK]
      Check failure of execveat(4, 'script', 0) with ENOENT... [OK]
      Check failure of execveat(5, 'execveat', 65535) with EINVAL... [OK]
      Check failure of execveat(5, 'no-such-file', 0) with ENOENT... [OK]
      Check failure of execveat(6, 'no-such-file', 0) with ENOENT... [OK]
      Check failure of execveat(-100, 'no-such-file', 0) with ENOENT... [OK]
      Check failure of execveat(5, '', 4096) with EACCES... [OK]
      Check failure of execveat(5, 'Makefile', 0) with EACCES... [OK]
      Check failure of execveat(11, '', 4096) with EACCES... [OK]
      Check failure of execveat(12, '', 4096) with EACCES... [OK]
      Check failure of execveat(99, '', 4096) with EBADF... [OK]
      Check failure of execveat(99, 'execveat', 0) with EBADF... [OK]
      Check failure of execveat(8, 'execveat', 0) with ENOTDIR... [OK]
      Invoke copy of 'execveat' via filename of length 4093:
      Check success of execveat(19, '', 4096)... [OK]
      Check success of execveat(5, 'xxxxxxxxxxxxxxxxxxxx...yyyyyyyyyyyyyyyyyyyy', 0)... [OK]
      Invoke copy of 'script' via filename of length 4093:
      Check success of execveat(20, '', 4096)... [OK]
      /bin/sh: 0: Can't open /dev/fd/5/xxxxxxx(... a long line of x's and y's, 0)... [OK]
      Check success of execveat(5, 'xxxxxxxxxxxxxxxxxxxx...yyyyyyyyyyyyyyyyyyyy', 0)... [OK]
      
      Tested on a 32-bit powerpc system.
      Signed-off-by: NPranith Kumar <bobby.prani@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      1e5d0fdb