1. 06 8月, 2015 7 次提交
  2. 30 7月, 2015 3 次提交
    • M
      selftests/seccomp: Add powerpc support · 5d83c2b3
      Michael Ellerman 提交于
      Wire up the syscall number and regs so the tests work on powerpc.
      
      With the powerpc kernel support just merged, all tests pass on ppc64,
      ppc64 (compat), ppc64le, ppc, ppc64e and ppc64e (compat).
      Acked-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      5d83c2b3
    • M
      selftests/seccomp: Make seccomp tests work on big endian · c385d0db
      Michael Ellerman 提交于
      The seccomp_bpf test uses BPF_LD|BPF_W|BPF_ABS to load 32-bit values
      from seccomp_data->args. On big endian machines this will load the high
      word of the argument, which is not what the test wants.
      
      Borrow a hack from samples/seccomp/bpf-helper.h which changes the offset
      on big endian to account for this.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Acked-by: NKees Cook <keescook@chromium.org>
      c385d0db
    • M
      powerpc/kernel: Enable seccomp filter · 2449acc5
      Michael Ellerman 提交于
      This commit enables seccomp filter on powerpc, now that we have all the
      necessary pieces in place.
      
      To support seccomp's desire to modify the syscall return value under
      some circumstances, we use a different ABI to the ptrace ABI. That is we
      use r3 as the syscall return value, and orig_gpr3 is the first syscall
      parameter.
      
      This means the seccomp code, or a ptracer via SECCOMP_RET_TRACE, will
      see -ENOSYS preloaded in r3. This is identical to the behaviour on x86,
      and allows seccomp or the ptracer to either leave the -ENOSYS or change
      it to something else, as well as rejecting or not the syscall by
      modifying r0.
      
      If seccomp does not reject the syscall, we restore the register state to
      match what ptrace and audit expect, ie. r3 is the first syscall
      parameter again. We do this restore using orig_gpr3, which may have been
      modified by seccomp, which allows seccomp to modify the first syscall
      paramater and allow the syscall to proceed.
      
      We need to #ifdef the the additional handling of r3 for seccomp, so move
      it all out of line.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      2449acc5
  3. 29 7月, 2015 8 次提交
    • M
      powerpc/kernel: Add SIG_SYS support for compat tasks · 1b60bab0
      Michael Ellerman 提交于
      SIG_SYS was added in commit a0727e8c "signal, x86: add SIGSYS info
      and make it synchronous."
      
      Because we use the asm-generic struct siginfo, we got support for
      SIG_SYS for free as part of that commit.
      
      However there was no compat handling added for powerpc. That means we've
      been advertising the existence of signfo._sifields._sigsys to compat
      tasks, but not actually filling in the fields correctly.
      
      Luckily it looks like no one has noticed, presumably because the only
      user of SIGSYS in the kernel is seccomp filter, which we don't support
      yet.
      
      So before we enable seccomp filter, add compat handling for SIGSYS.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      1b60bab0
    • M
      powerpc: Change syscall_get_nr() to return int · e9fbe686
      Michael Ellerman 提交于
      The documentation for syscall_get_nr() in asm-generic says:
      
       Note this returns int even on 64-bit machines. Only 32 bits of
       system call number can be meaningful. If the actual arch value
       is 64 bits, this truncates to 32 bits so 0xffffffff means -1.
      
      However our implementation was never updated to reflect this.
      
      Generally it's not important, but there is once case where it matters.
      
      For seccomp filter with SECCOMP_RET_TRACE, the tracer will set
      regs->gpr[0] to -1 to reject the syscall. When the task is a compat
      task, this means we end up with 0xffffffff in r0 because ptrace will
      zero extend the 32-bit value.
      
      If syscall_get_nr() returns an unsigned long, then a 64-bit kernel will
      see a positive value in r0 and will incorrectly allow the syscall
      through seccomp.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      e9fbe686
    • M
      powerpc: Use orig_gpr3 in syscall_get_arguments() · 1cb9839b
      Michael Ellerman 提交于
      Currently syscall_get_arguments() is used by syscall tracepoints, and
      collect_syscall() which is used in some debugging as well as
      /proc/pid/syscall.
      
      The current implementation just copies regs->gpr[3 .. 5] out, which is
      fine for all the current use cases.
      
      When we enable seccomp filter, that will also start using
      syscall_get_arguments(). However for seccomp filter we want to use r3
      as the return value of the syscall, and orig_gpr3 as the first
      parameter. This will allow seccomp to modify the return value in r3.
      
      To support this we need to modify syscall_get_arguments() to return
      orig_gpr3 instead of r3. This is safe for all uses because orig_gpr3
      always contains the r3 value that was passed to the syscall. We store it
      in the syscall entry path and never modify it.
      
      Update syscall_set_arguments() while we're here, even though it's never
      used.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      1cb9839b
    • M
      powerpc: Rework syscall_get_arguments() so there is only one loop · a7657844
      Michael Ellerman 提交于
      Currently syscall_get_arguments() has two loops, one for compat and one
      for regular tasks. In prepartion for the next patch, which changes which
      registers we use, switch it to only have one loop, so we only have one
      place to update.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      a7657844
    • M
      powerpc: Don't negate error in syscall_set_return_value() · 1b1a3702
      Michael Ellerman 提交于
      Currently the only caller of syscall_set_return_value() is seccomp
      filter, which is not enabled on powerpc.
      
      This means we have not noticed that our implementation of
      syscall_set_return_value() negates error, even though the value passed
      in is already negative.
      
      So remove the negation in syscall_set_return_value(), and expect the
      caller to do it like all other implementations do.
      
      Also add a comment about the ccr handling.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      1b1a3702
    • M
      powerpc: Drop unused syscall_get_error() · 2923e6d5
      Michael Ellerman 提交于
      syscall_get_error() is unused, and never has been.
      
      It's also probably wrong, as it negates r3 before returning it, but that
      depends on what the caller is expecting.
      
      It also doesn't deal with compat, and doesn't deal with TIF_NOERROR.
      
      Although we could fix those, until it has a caller and it's clear what
      semantics the caller wants it's just untested code. So drop it.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      2923e6d5
    • M
      powerpc/kernel: Change the do_syscall_trace_enter() API · d3837414
      Michael Ellerman 提交于
      The API for calling do_syscall_trace_enter() is currently sensible
      enough, it just returns the (modified) syscall number.
      
      However once we enable seccomp filter it will get more complicated. When
      seccomp filter runs, the seccomp kernel code (via SECCOMP_RET_ERRNO), or
      a ptracer (via SECCOMP_RET_TRACE), may reject the syscall and *may* or may
      *not* set a return value in r3.
      
      That means the assembler that calls do_syscall_trace_enter() can not
      blindly return ENOSYS, it needs to only return ENOSYS if a return value
      has not already been set.
      
      There is no way to implement that logic with the current API. So change
      the do_syscall_trace_enter() API to make it deal with the return code
      juggling, and the assembler can then just return whatever return code it
      is given.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      d3837414
    • M
      powerpc/kernel: Switch to using MAX_ERRNO · c3525940
      Michael Ellerman 提交于
      Currently on powerpc we have our own #define for the highest (negative)
      errno value, called _LAST_ERRNO. This is defined to be 516, for reasons
      which are not clear.
      
      The generic code, and x86, use MAX_ERRNO, which is defined to be 4095.
      
      In particular seccomp uses MAX_ERRNO to restrict the value that a
      seccomp filter can return.
      
      Currently with the mismatch between _LAST_ERRNO and MAX_ERRNO, a seccomp
      tracer wanting to return 600, expecting it to be seen as an error, would
      instead find on powerpc that userspace sees a successful syscall with a
      return value of 600.
      
      To avoid this inconsistency, switch powerpc to use MAX_ERRNO.
      
      We are somewhat confident that generic syscalls that can return a
      non-error value above negative MAX_ERRNO have already been updated to
      use force_successful_syscall_return().
      
      I have also checked all the powerpc specific syscalls, and believe that
      none of them expect to return a non-error value between -MAX_ERRNO and
      -516. So this change should be safe ...
      Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      c3525940
  4. 27 7月, 2015 1 次提交
  5. 25 7月, 2015 2 次提交
  6. 23 7月, 2015 3 次提交
    • P
      powerpc: Use hardware RNG for arch_get_random_seed_* not arch_get_random_* · 01c9348c
      Paul Mackerras 提交于
      The hardware RNG on POWER8 and POWER7+ can be relatively slow, since
      it can only supply one 64-bit value per microsecond.  Currently we
      read it in arch_get_random_long(), but that slows down reading from
      /dev/urandom since the code in random.c calls arch_get_random_long()
      for every longword read from /dev/urandom.
      
      Since the hardware RNG supplies high-quality entropy on every read, it
      matches the semantics of arch_get_random_seed_long() better than those
      of arch_get_random_long().  Therefore this commit makes the code use
      the POWER8/7+ hardware RNG only for arch_get_random_seed_{long,int}
      and not for arch_get_random_{long,int}.
      
      This won't affect any other PowerPC-based platforms because none of
      them currently support a hardware RNG.  To make it clear that the
      ppc_md function pointer is used for arch_get_random_seed_*, we rename
      it from get_random_long to get_random_seed.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      01c9348c
    • T
      powerpc/rtas: Introduce rtas_get_sensor_fast() for IRQ handlers · 1c2cb594
      Thomas Huth 提交于
      The EPOW interrupt handler uses rtas_get_sensor(), which in turn
      uses rtas_busy_delay() to wait for RTAS becoming ready in case it
      is necessary. But rtas_busy_delay() is annotated with might_sleep()
      and thus may not be used by interrupts handlers like the EPOW handler!
      This leads to the following BUG when CONFIG_DEBUG_ATOMIC_SLEEP is
      enabled:
      
       BUG: sleeping function called from invalid context at arch/powerpc/kernel/rtas.c:496
       in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
       CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc2-thuth #6
       Call Trace:
       [c00000007ffe7b90] [c000000000807670] dump_stack+0xa0/0xdc (unreliable)
       [c00000007ffe7bc0] [c0000000000e1f14] ___might_sleep+0x134/0x180
       [c00000007ffe7c20] [c00000000002aec0] rtas_busy_delay+0x30/0xd0
       [c00000007ffe7c50] [c00000000002bde4] rtas_get_sensor+0x74/0xe0
       [c00000007ffe7ce0] [c000000000083264] ras_epow_interrupt+0x44/0x450
       [c00000007ffe7d90] [c000000000120260] handle_irq_event_percpu+0xa0/0x300
       [c00000007ffe7e70] [c000000000120524] handle_irq_event+0x64/0xc0
       [c00000007ffe7eb0] [c000000000124dbc] handle_fasteoi_irq+0xec/0x260
       [c00000007ffe7ef0] [c00000000011f4f0] generic_handle_irq+0x50/0x80
       [c00000007ffe7f20] [c000000000010f3c] __do_irq+0x8c/0x200
       [c00000007ffe7f90] [c0000000000236cc] call_do_irq+0x14/0x24
       [c00000007e6f39e0] [c000000000011144] do_IRQ+0x94/0x110
       [c00000007e6f3a30] [c000000000002594] hardware_interrupt_common+0x114/0x180
      
      Fix this issue by introducing a new rtas_get_sensor_fast() function
      that does not use rtas_busy_delay() - and thus can only be used for
      sensors that do not cause a BUSY condition - known as "fast" sensors.
      
      The EPOW sensor is defined to be "fast" in sPAPR - mpe.
      
      Fixes: 587f83e8 ("powerpc/pseries: Use rtas_get_sensor in RAS code")
      Signed-off-by: NThomas Huth <thuth@redhat.com>
      Reviewed-by: NNathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      1c2cb594
    • T
      powerpc/rtas: Replace magic values with defines · 9ef03193
      Thomas Huth 提交于
      rtas.h already has some nice #defines for RTAS return status
      codes - let's use them instead of hard-coded "magic" values!
      Signed-off-by: NThomas Huth <thuth@redhat.com>
      Reviewed-by: NTyrel Datwyler <tyreld@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      9ef03193
  7. 21 7月, 2015 3 次提交
  8. 16 7月, 2015 5 次提交
  9. 13 7月, 2015 8 次提交