1. 14 Apr, 2012 15 commits
    • W
      Documentation: prctl/seccomp_filter · 8ac270d1
      Committed by Will Drewry
      Documents how system call filtering using Berkeley Packet
      Filter programs works and how it may be used.
      Includes an example for x86 and a semi-generic
      example using a macro-based code generator.
      Acked-by: Eric Paris <eparis@redhat.com>
      Signed-off-by: Will Drewry <wad@chromium.org>
      Acked-by: Kees Cook <keescook@chromium.org>
      
      v18: - added acked by
           - update no new privs numbers
      v17: - remove @compat note and add Pitfalls section for arch checking
             (keescook@chromium.org)
      v16: -
      v15: -
      v14: - rebase/nochanges
      v13: - rebase on to 88ebdda6
      v12: - comment on the ptrace_event use
           - update arch support comment
           - note the behavior of SECCOMP_RET_DATA when there are multiple filters
             (keescook@chromium.org)
           - lots of samples/ clean up incl 64-bit bpf-direct support
             (markus@chromium.org)
           - rebase to linux-next
      v11: - overhaul return value language, updates (keescook@chromium.org)
           - comment on do_exit(SIGSYS)
      v10: - update for SIGSYS
           - update for new seccomp_data layout
           - update for ptrace option use
      v9: - updated bpf-direct.c for SIGILL
      v8: - add PR_SET_NO_NEW_PRIVS to the samples.
      v7: - updated for all the new stuff in v7: TRAP, TRACE
          - only talk about PR_SET_SECCOMP now
          - fixed bad JLE32 check (coreyb@linux.vnet.ibm.com)
          - adds dropper.c: a simple system call disabler
      v6: - tweak the language to note the requirement of
            PR_SET_NO_NEW_PRIVS being called prior to use. (luto@mit.edu)
      v5: - update sample to use system call arguments
          - adds a "fancy" example using a macro-based generator
          - cleaned up bpf in the sample
          - update docs to mention arguments
          - fix prctl value (eparis@redhat.com)
          - language cleanup (rdunlap@xenotime.net)
      v4: - update for no_new_privs use
          - minor tweaks
      v3: - call out BPF <-> Berkeley Packet Filter (rdunlap@xenotime.net)
          - document use of tentative always-unprivileged
          - guard sample compilation for i386 and x86_64
      v2: - move code to samples (corbet@lwn.net)
      Signed-off-by: James Morris <james.l.morris@oracle.com>
      8ac270d1
    • W
      x86: Enable HAVE_ARCH_SECCOMP_FILTER · c6cfbeb4
      Committed by Will Drewry
      Enable support for seccomp filter on x86:
      - syscall_get_arch()
      - syscall_get_arguments()
      - syscall_rollback()
      - syscall_set_return_value()
      - SIGSYS siginfo_t support
      - secure_computing is called from a ptrace_event()-safe context
      - secure_computing return value is checked (see below).
      
      SECCOMP_RET_TRACE and SECCOMP_RET_TRAP may result in seccomp needing to
      skip a system call without killing the process.  This is done by
      returning a non-zero (-1) value from secure_computing.  This change
      makes x86 respect that return value.
      
      To ensure that minimal kernel code is exposed, a non-zero return value
      results in an immediate return to user space (with an invalid syscall
      number).
      Signed-off-by: Will Drewry <wad@chromium.org>
      Reviewed-by: H. Peter Anvin <hpa@zytor.com>
      Acked-by: Eric Paris <eparis@redhat.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      
      v18: rebase and tweaked change description, acked-by
      v17: added reviewed by and rebased
      v..: all rebases since original introduction.
      Signed-off-by: James Morris <james.l.morris@oracle.com>
      c6cfbeb4
    • W
      ptrace,seccomp: Add PTRACE_SECCOMP support · fb0fadf9
      Committed by Will Drewry
      This change adds support for a new ptrace option, PTRACE_O_TRACESECCOMP,
      and a new return value for seccomp BPF programs, SECCOMP_RET_TRACE.
      
      When a tracer specifies the PTRACE_O_TRACESECCOMP ptrace option, the
      tracer will be notified, via PTRACE_EVENT_SECCOMP, for any syscall that
      results in a BPF program returning SECCOMP_RET_TRACE.  The 16-bit
      SECCOMP_RET_DATA mask of the BPF program return value will be passed as
      the ptrace_message and may be retrieved using PTRACE_GETEVENTMSG.
      
      If the subordinate process is not using seccomp filter, then no
      system call notifications will occur even if the option is specified.
      
      If there is no tracer with PTRACE_O_TRACESECCOMP when SECCOMP_RET_TRACE
      is returned, the system call will not be executed and an -ENOSYS errno
      will be returned to userspace.
      
      This change adds a dependency on the system call slow path.  Any future
      efforts to use the system call fast path for seccomp filter will need to
      address this restriction.
      Signed-off-by: Will Drewry <wad@chromium.org>
      Acked-by: Eric Paris <eparis@redhat.com>
      
      v18: - rebase
           - comment fatal_signal check
           - acked-by
           - drop secure_computing_int comment
      v17: - ...
      v16: - update PT_TRACE_MASK to 0xbf4 so that STOP isn't clear on SETOPTIONS call (indan@nul.nu)
             [note PT_TRACE_MASK disappears in linux-next]
      v15: - add audit support for non-zero return codes
           - clean up style (indan@nul.nu)
      v14: - rebase/nochanges
      v13: - rebase on to 88ebdda6
             (Brings back a change to ptrace.c and the masks.)
      v12: - rebase to linux-next
           - use ptrace_event and update arch/Kconfig to mention slow-path dependency
           - drop all tracehook changes and inclusion (oleg@redhat.com)
      v11: - invert the logic to just make it a PTRACE_SYSCALL accelerator
             (indan@nul.nu)
      v10: - moved to PTRACE_O_SECCOMP / PT_TRACE_SECCOMP
      v9:  - n/a
      v8:  - guarded PTRACE_SECCOMP use with an ifdef
      v7:  - introduced
      Signed-off-by: James Morris <james.l.morris@oracle.com>
      fb0fadf9
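The tracer-side check described in the PTRACE_SECCOMP commit above can be sketched as follows. This is an illustration only: it works on a raw wait status as plain integers and does not attach to any process. The helper names (`is_seccomp_stop`, `event_data`, `fake_status`) are made up for this example; the constant 7 is the value of PTRACE_EVENT_SECCOMP in linux/ptrace.h.

```python
import signal

# Value of PTRACE_EVENT_SECCOMP in <linux/ptrace.h>.
PTRACE_EVENT_SECCOMP = 7
# The filter return value keeps its data in the low 16 bits (SECCOMP_RET_DATA).
SECCOMP_RET_DATA = 0x0000FFFF


def is_seccomp_stop(status: int) -> bool:
    """Return True if a raw waitpid() status is a PTRACE_EVENT_SECCOMP stop."""
    stopped = (status & 0xFF) == 0x7F  # low byte 0x7f marks a stopped child
    expected = signal.SIGTRAP | (PTRACE_EVENT_SECCOMP << 8)
    return stopped and (status >> 8) == expected


def event_data(eventmsg: int) -> int:
    """Mask a PTRACE_GETEVENTMSG result down to the 16-bit filter data."""
    return eventmsg & SECCOMP_RET_DATA


def fake_status(sig: int, event: int = 0) -> int:
    """Hypothetical helper: build a stop status so the check can be tried offline."""
    return ((sig | (event << 8)) << 8) | 0x7F


if __name__ == "__main__":
    status = fake_status(signal.SIGTRAP, PTRACE_EVENT_SECCOMP)
    print(is_seccomp_stop(status))
```

In a real tracer the status would come from `waitpid()` on the traced task, and the message from a PTRACE_GETEVENTMSG request; neither call is made here.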
    • W
      seccomp: Add SECCOMP_RET_TRAP · bb6ea430
      Committed by Will Drewry
      Adds a new return value to seccomp filters that triggers a SIGSYS to be
      delivered with the new SYS_SECCOMP si_code.
      
      This allows in-process system call emulation, including just specifying
      an errno or cleanly dumping core, rather than just dying.
      Suggested-by: Markus Gutschke <markus@chromium.org>
      Suggested-by: Julien Tinnes <jln@chromium.org>
      Signed-off-by: Will Drewry <wad@chromium.org>
      Acked-by: Eric Paris <eparis@redhat.com>
      
      v18: - acked-by, rebase
           - don't mention secure_computing_int() anymore
      v15: - use audit_seccomp/skip
           - pad out error spacing; clean up switch (indan@nul.nu)
      v14: - n/a
      v13: - rebase on to 88ebdda6
      v12: - rebase on to linux-next
      v11: - clarify the comment (indan@nul.nu)
           - s/sigtrap/sigsys
      v10: - use SIGSYS, syscall_get_arch, updates arch/Kconfig
             note suggested-by (though original suggestion had other behaviors)
      v9:  - changes to SIGILL
      v8:  - clean up based on changes to dependent patches
      v7:  - introduction
      Signed-off-by: James Morris <james.l.morris@oracle.com>
      bb6ea430
    • W
      signal, x86: add SIGSYS info and make it synchronous. · a0727e8c
      Committed by Will Drewry
      This change enables SIGSYS, defines _sigfields._sigsys, and adds
      x86 (compat) arch support.  _sigsys defines fields which allow
      a signal handler to receive the triggering system call number,
      the relevant AUDIT_ARCH_* value for that number, and the address
      of the callsite.
      
      SIGSYS is added to the SYNCHRONOUS_MASK because it is desirable for it
      to have setup_frame() called for it. The goal is to ensure that
      ucontext_t reflects the machine state from the time-of-syscall and not
      from another signal handler.
      
      The first consumer of SIGSYS would be seccomp filter.  In particular,
      a filter program could specify a new return value, SECCOMP_RET_TRAP,
      which would result in the system call being denied and the calling
      thread signaled.  This also means that implementing arch-specific
      support can be dependent upon HAVE_ARCH_SECCOMP_FILTER.
      Suggested-by: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Will Drewry <wad@chromium.org>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Reviewed-by: H. Peter Anvin <hpa@zytor.com>
      Acked-by: Eric Paris <eparis@redhat.com>
      
      v18: - added acked by, rebase
      v17: - rebase and reviewed-by addition
      v14: - rebase/nochanges
      v13: - rebase on to 88ebdda6
      v12: - reworded changelog (oleg@redhat.com)
      v11: - fix dropped words in the change description
           - added fallback copy_siginfo support.
           - added __ARCH_SIGSYS define to allow stepped arch support.
      v10: - first version based on suggestion
      Signed-off-by: James Morris <james.l.morris@oracle.com>
      a0727e8c
    • W
      seccomp: add SECCOMP_RET_ERRNO · acf3b2c7
      Committed by Will Drewry
      This change adds the SECCOMP_RET_ERRNO as a valid return value from a
      seccomp filter.  Additionally, it makes the first use of the lower
      16 bits for storing a filter-supplied errno.  16 bits is more than
      enough for the errno-base.h values.
      
      Returning errors instead of immediately terminating processes that
      violate seccomp policy allows for broader use of this functionality
      for kernel attack surface reduction.  For example, a Linux container
      could maintain a whitelist of pre-existing system calls but drop
      all new ones with errnos.  This would keep a logically static attack
      surface while providing errnos that may allow for graceful failure
      without the downside of do_exit() on a bad call.
      
      This change also changes the signature of __secure_computing.  It
      appears the only direct caller is the arm entry code and it clobbers
      any possible return value (register) immediately.
      Signed-off-by: Will Drewry <wad@chromium.org>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Acked-by: Eric Paris <eparis@redhat.com>
      
      v18: - fix up comments and rebase
           - fix bad var name which was fixed in later revs
           - remove _int() and just change the __secure_computing signature
      v16-v17: ...
      v15: - use audit_seccomp and add a skip label. (eparis@redhat.com)
           - clean up and pad out return codes (indan@nul.nu)
      v14: - no change/rebase
      v13: - rebase on to 88ebdda6
      v12: - move to WARN_ON if filter is NULL
             (oleg@redhat.com, luto@mit.edu, keescook@chromium.org)
           - return immediately for filter==NULL (keescook@chromium.org)
           - change evaluation to only compare the ACTION so that layered
             errnos don't result in the lowest one being returned.
             (keescook@chromium.org)
      v11: - check for NULL filter (keescook@chromium.org)
      v10: - change loaders to fn
       v9: - n/a
       v8: - update Kconfig to note new need for syscall_set_return_value.
           - reordered such that TRAP behavior follows on later.
           - made the for loop a little less indent-y
       v7: - introduced
      Signed-off-by: James Morris <james.l.morris@oracle.com>
      acf3b2c7
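The return-value layout discussed in the SECCOMP_RET_ERRNO commit above (an action in the upper bits, a filter-supplied errno in the lower 16 bits) can be sketched with plain integer arithmetic. The constants mirror linux/seccomp.h as I understand it; the function names are made up for this example. `combine` is only a model of the "compare the ACTION only" rule mentioned in the v12 note, and it assumes the results are listed in the order in which the filters are visited.

```python
import errno

# Action values and masks, mirroring <linux/seccomp.h>.
SECCOMP_RET_KILL = 0x00000000
SECCOMP_RET_TRAP = 0x00030000
SECCOMP_RET_ERRNO = 0x00050000
SECCOMP_RET_TRACE = 0x7FF00000
SECCOMP_RET_ALLOW = 0x7FFF0000
SECCOMP_RET_ACTION = 0x7FFF0000  # upper bits: what to do
SECCOMP_RET_DATA = 0x0000FFFF    # lower 16 bits: errno or tracer data


def ret_errno(err: int) -> int:
    """Build a filter return value that fails the call with errno `err`."""
    return SECCOMP_RET_ERRNO | (err & SECCOMP_RET_DATA)


def split(value: int) -> tuple:
    """Split a filter return value into (action, data)."""
    return value & SECCOMP_RET_ACTION, value & SECCOMP_RET_DATA


def combine(results) -> int:
    """Pick one result from several filters by comparing the action bits only.

    The numerically lowest action wins; on a tie the first result seen is
    kept, so a smaller errno in the data bits never displaces it.
    """
    chosen = SECCOMP_RET_ALLOW
    for value in results:
        if (value & SECCOMP_RET_ACTION) < (chosen & SECCOMP_RET_ACTION):
            chosen = value
    return chosen


if __name__ == "__main__":
    print(hex(ret_errno(errno.EPERM)))
```

Comparing whole values instead of masked actions would let `SECCOMP_RET_ERRNO | 1` beat `SECCOMP_RET_ERRNO | 13` purely because 1 is smaller than 13, which appears to be the behavior the v12 change was meant to avoid.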
    • K
      seccomp: remove duplicated failure logging · 3dc1c1b2
      Committed by Kees Cook
      This consolidates the seccomp filter error logging path and adds more
      details to the audit log.
      Signed-off-by: Will Drewry <wad@chromium.org>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Acked-by: Eric Paris <eparis@redhat.com>
      
      v18: make compat= permanent in the record
      v15: added a return code to the audit_seccomp path by wad@chromium.org
           (suggested by eparis@redhat.com)
      v*: original by keescook@chromium.org
      Signed-off-by: James Morris <james.l.morris@oracle.com>
      3dc1c1b2
    • W
      seccomp: add system call filtering using BPF · e2cfabdf
      Committed by Will Drewry
      [This patch depends on luto@mit.edu's no_new_privs patch:
         https://lkml.org/lkml/2012/1/30/264
       The whole series including Andrew's patches can be found here:
         https://github.com/redpig/linux/tree/seccomp
       Complete diff here:
         https://github.com/redpig/linux/compare/1dc65fed...seccomp
      ]
      
      This patch adds support for seccomp mode 2.  Mode 2 introduces the
      ability for unprivileged processes to install system call filtering
      policy expressed in terms of a Berkeley Packet Filter (BPF) program.
      This program will be evaluated in the kernel for each system call
      the task makes and computes a result based on data in the format
      of struct seccomp_data.
      
      A filter program may be installed by calling:
        struct sock_fprog fprog = { ... };
        ...
        prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &fprog);
      
      The return value of the filter program determines if the system call is
      allowed to proceed or denied.  If the first filter program installed
      allows prctl(2) calls, then the above call may be made repeatedly
      by a task to further reduce its access to the kernel.  All attached
      programs must be evaluated before a system call will be allowed to
      proceed.
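To make the evaluation model in the paragraphs above concrete, here is a toy sketch, not the kernel's implementation: a three-opcode classic BPF interpreter run against a packed `struct seccomp_data`. The layout (`nr`, `arch`, `instruction_pointer`, `args[6]`) is assumed to be little-endian, the syscall number 39 is purely illustrative, and `run_filter` and `pack_seccomp_data` are names invented for this example. A real filter should also check `arch` before trusting `nr`.

```python
import struct

# Classic BPF opcodes used below (values as in the kernel's BPF headers).
BPF_LD_W_ABS = 0x20  # BPF_LD | BPF_W | BPF_ABS: load 32-bit word at offset k
BPF_JEQ_K = 0x15     # BPF_JMP | BPF_JEQ | BPF_K: branch on A == k
BPF_RET_K = 0x06     # BPF_RET | BPF_K: return the constant k

SECCOMP_RET_ERRNO = 0x00050000
SECCOMP_RET_ALLOW = 0x7FFF0000

# struct seccomp_data { int nr; __u32 arch; __u64 instruction_pointer;
#                       __u64 args[6]; }  (64 bytes, little-endian assumed)
SECCOMP_DATA = struct.Struct("<iIQ6Q")
OFFSET_NR = 0


def pack_seccomp_data(nr, arch, ip, args):
    """Pack the fields a filter would see for one system call."""
    return SECCOMP_DATA.pack(nr, arch, ip, *args)


def run_filter(prog, data):
    """Evaluate a list of (code, jt, jf, k) instructions over `data`."""
    acc = 0  # the BPF accumulator register A
    pc = 0
    while True:
        code, jt, jf, k = prog[pc]
        pc += 1
        if code == BPF_LD_W_ABS:
            (acc,) = struct.unpack_from("<I", data, k)
        elif code == BPF_JEQ_K:
            pc += jt if acc == k else jf  # offsets are relative to the next insn
        elif code == BPF_RET_K:
            return k
        else:
            raise ValueError("opcode not handled by this toy interpreter")


ILLUSTRATIVE_NR = 39  # hypothetical syscall number to deny
PROGRAM = [
    (BPF_LD_W_ABS, 0, 0, OFFSET_NR),           # A = seccomp_data.nr
    (BPF_JEQ_K, 0, 1, ILLUSTRATIVE_NR),        # if A != nr, skip the next insn
    (BPF_RET_K, 0, 0, SECCOMP_RET_ERRNO | 1),  # deny with errno 1 (EPERM)
    (BPF_RET_K, 0, 0, SECCOMP_RET_ALLOW),      # everything else is allowed
]

if __name__ == "__main__":
    call = pack_seccomp_data(ILLUSTRATIVE_NR, 0xC000003E, 0, [0] * 6)
    print(hex(run_filter(PROGRAM, call)))
```

The same four instructions, expressed as `struct sock_filter` entries inside a `struct sock_fprog`, are the kind of program that the `prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &fprog)` call shown above would install.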
      
      Filter programs will be inherited across fork/clone and execve.
      However, if the task attaching the filter is unprivileged
      (!CAP_SYS_ADMIN) the no_new_privs bit will be set on the task.  This
      ensures that unprivileged tasks cannot attach filters that affect
      privileged tasks (e.g., setuid binary).
      
      There are a number of benefits to this approach, a few of which are
      as follows:
      - BPF has been exposed to userland for a long time
      - BPF optimization (and JIT'ing) are well understood
      - Userland already knows its ABI: system call numbers and desired
        arguments
      - No time-of-check-time-of-use vulnerable data accesses are possible.
      - system call arguments are loaded on access only to minimize copying
        required for system call policy decisions.
      
      Mode 2 support is restricted to architectures that enable
      HAVE_ARCH_SECCOMP_FILTER.  In this patch, the primary dependency is on
      syscall_get_arguments().  The full desired scope of this feature will
      add a few minor additional requirements expressed later in this series.
      Based on discussion, SECCOMP_RET_ERRNO and SECCOMP_RET_TRACE seem to be
      the desired additional functionality.
      
      No architectures are enabled in this patch.
      Signed-off-by: Will Drewry <wad@chromium.org>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Reviewed-by: Indan Zupancic <indan@nul.nu>
      Acked-by: Eric Paris <eparis@redhat.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      
      v18: - rebase to v3.4-rc2
           - s/chk/check/ (akpm@linux-foundation.org,jmorris@namei.org)
           - allocate with GFP_KERNEL|__GFP_NOWARN (indan@nul.nu)
           - add a comment for get_u32 regarding endianness (akpm@)
           - fix other typos, style mistakes (akpm@)
           - added acked-by
      v17: - properly guard seccomp filter needed headers (leann@ubuntu.com)
           - tighten return mask to 0x7fff0000
      v16: - no change
      v15: - add a 4 instr penalty when counting a path to account for seccomp_filter
             size (indan@nul.nu)
           - drop the max insns to 256KB (indan@nul.nu)
           - return ENOMEM if the max insns limit has been hit (indan@nul.nu)
           - move IP checks after args (indan@nul.nu)
           - drop !user_filter check (indan@nul.nu)
           - only allow explicit bpf codes (indan@nul.nu)
           - exit_code -> exit_sig
      v14: - put/get_seccomp_filter takes struct task_struct
             (indan@nul.nu,keescook@chromium.org)
           - adds seccomp_chk_filter and drops general bpf_run/chk_filter user
           - add seccomp_bpf_load for use by net/core/filter.c
           - lower max per-process/per-hierarchy: 1MB
           - moved nnp/capability check prior to allocation
             (all of the above: indan@nul.nu)
      v13: - rebase on to 88ebdda6
      v12: - added a maximum instruction count per path (indan@nul.nu,oleg@redhat.com)
           - removed copy_seccomp (keescook@chromium.org,indan@nul.nu)
           - reworded the prctl_set_seccomp comment (indan@nul.nu)
      v11: - reorder struct seccomp_data to allow future args expansion (hpa@zytor.com)
           - style clean up, @compat dropped, compat_sock_fprog32 (indan@nul.nu)
           - do_exit(SIGSYS) (keescook@chromium.org, luto@mit.edu)
           - pare down Kconfig doc reference.
           - extra comment clean up
      v10: - seccomp_data has changed again to be more aesthetically pleasing
             (hpa@zytor.com)
           - calling convention is noted in a new u32 field using syscall_get_arch.
             This allows for cross-calling convention tasks to use seccomp filters.
             (hpa@zytor.com)
           - lots of clean up (thanks, Indan!)
       v9: - n/a
       v8: - use bpf_chk_filter, bpf_run_filter. update load_fns
           - Lots of fixes courtesy of indan@nul.nu:
           -- fix up load behavior, compat fixups, and merge alloc code,
           -- renamed pc and dropped __packed, use bool compat.
           -- Added a hidden CONFIG_SECCOMP_FILTER to synthesize non-arch
              dependencies
       v7:  (massive overhaul thanks to Indan, others)
           - added CONFIG_HAVE_ARCH_SECCOMP_FILTER
           - merged into seccomp.c
           - minimal seccomp_filter.h
           - no config option (part of seccomp)
           - no new prctl
           - doesn't break seccomp on systems without asm/syscall.h
             (works but arg access always fails)
           - dropped seccomp_init_task, extra free functions, ...
           - dropped the no-asm/syscall.h code paths
           - merges with network sk_run_filter and sk_chk_filter
       v6: - fix memory leak on attach compat check failure
           - require no_new_privs || CAP_SYS_ADMIN prior to filter
             installation. (luto@mit.edu)
           - s/seccomp_struct_/seccomp_/ for macros/functions (amwang@redhat.com)
           - cleaned up Kconfig (amwang@redhat.com)
           - on block, note if the call was compat (so the # means something)
       v5: - uses syscall_get_arguments
             (indan@nul.nu,oleg@redhat.com, mcgrathr@chromium.org)
            - uses union-based arg storage with hi/lo struct to
              handle endianness.  Compromises between the two alternate
              proposals to minimize extra arg shuffling and account for
              endianness assuming userspace uses offsetof().
              (mcgrathr@chromium.org, indan@nul.nu)
            - update Kconfig description
            - add include/seccomp_filter.h and add its installation
            - (naive) on-demand syscall argument loading
            - drop seccomp_t (eparis@redhat.com)
       v4:  - adjusted prctl to make room for PR_[SG]ET_NO_NEW_PRIVS
            - now uses current->no_new_privs
              (luto@mit.edu,torvalds@linux-foundation.org)
            - assign names to seccomp modes (rdunlap@xenotime.net)
            - fix style issues (rdunlap@xenotime.net)
            - reworded Kconfig entry (rdunlap@xenotime.net)
       v3:  - macros to inline (oleg@redhat.com)
            - init_task behavior fixed (oleg@redhat.com)
            - drop creator entry and extra NULL check (oleg@redhat.com)
            - alloc returns -EINVAL on bad sizing (serge.hallyn@canonical.com)
            - adds tentative use of "always_unprivileged" as per
              torvalds@linux-foundation.org and luto@mit.edu
       v2:  - (patch 2 only)
      Signed-off-by: James Morris <james.l.morris@oracle.com>
      e2cfabdf
    • W
      arch/x86: add syscall_get_arch to syscall.h · b7456536
      Committed by Will Drewry
      Add syscall_get_arch() to export the current AUDIT_ARCH_* based on system call
      entry path.
      Signed-off-by: Will Drewry <wad@chromium.org>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Reviewed-by: H. Peter Anvin <hpa@zytor.com>
      Acked-by: Eric Paris <eparis@redhat.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      
      v18: - update comment about x32 tasks
           - rebase to v3.4-rc2
      v17: rebase and reviewed-by
      v14: rebase/nochanges
      v13: rebase on to 88ebdda6
      Signed-off-by: James Morris <james.l.morris@oracle.com>
      b7456536
    • W
      asm/syscall.h: add syscall_get_arch · 07bd18d0
      Committed by Will Drewry
      Adds a stub for a function that will return the AUDIT_ARCH_* value
      appropriate to the supplied task based on the system call convention.
      
      For audit's use, the value can generally be hard-coded at the
      audit-site.  However, for other functionality not inlined into syscall
      entry/exit, this makes that information available.  seccomp_filter is
      the first planned consumer and, as such, the comment indicates a tie to
      CONFIG_HAVE_ARCH_SECCOMP_FILTER.
      Suggested-by: Roland McGrath <mcgrathr@chromium.org>
      Signed-off-by: Will Drewry <wad@chromium.org>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Acked-by: Eric Paris <eparis@redhat.com>
      
      v18: comment and change reword and rebase.
      v14: rebase/nochanges
      v13: rebase on to 88ebdda6
      v12: rebase on to linux-next
      v11: fixed improper return type
      v10: introduced
      Signed-off-by: James Morris <james.l.morris@oracle.com>
      07bd18d0
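As a rough illustration of why a consumer wants the AUDIT_ARCH_* value next to the syscall number, the sketch below composes the two x86 values from their ELF machine numbers and flag bits, then shows that the same number names different calls under each convention. The flag and machine constants follow linux/audit.h and the ELF headers as I understand them; the tiny lookup table and the `syscall_name` helper are invented for this example and cover one entry per architecture only.

```python
# ELF machine numbers and audit flag bits.
EM_386 = 3
EM_X86_64 = 62
AUDIT_ARCH_64BIT_FLAG = 0x80000000  # __AUDIT_ARCH_64BIT
AUDIT_ARCH_LE_FLAG = 0x40000000     # __AUDIT_ARCH_LE

AUDIT_ARCH_I386 = EM_386 | AUDIT_ARCH_LE_FLAG
AUDIT_ARCH_X86_64 = EM_X86_64 | AUDIT_ARCH_64BIT_FLAG | AUDIT_ARCH_LE_FLAG

# Hypothetical one-entry-per-arch table: syscall number 1 is exit on i386
# but write on x86_64.
SYSCALL_NAMES = {
    (AUDIT_ARCH_I386, 1): "exit",
    (AUDIT_ARCH_X86_64, 1): "write",
}


def syscall_name(arch: int, nr: int) -> str:
    """Resolve a syscall number only in the context of its calling convention."""
    return SYSCALL_NAMES.get((arch, nr), "unknown")


if __name__ == "__main__":
    print(hex(AUDIT_ARCH_I386), syscall_name(AUDIT_ARCH_I386, 1))
    print(hex(AUDIT_ARCH_X86_64), syscall_name(AUDIT_ARCH_X86_64, 1))
```

A policy that matched on the number alone could therefore allow or deny the wrong call for a task entering through the other convention.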
    • W
      seccomp: kill the seccomp_t typedef · 932ecebb
      Committed by Will Drewry
      Replaces the seccomp_t typedef with struct seccomp to match modern
      kernel style.
      Signed-off-by: Will Drewry <wad@chromium.org>
      Reviewed-by: James Morris <jmorris@namei.org>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Acked-by: Eric Paris <eparis@redhat.com>
      
      v18: rebase
      ...
      v14: rebase/nochanges
      v13: rebase on to 88ebdda6
      v12: rebase on to linux-next
      v8-v11: no changes
      v7: struct seccomp_struct -> struct seccomp
      v6: original inclusion in this series.
      Signed-off-by: James Morris <james.l.morris@oracle.com>
      932ecebb
    • W
      net/compat.c,linux/filter.h: share compat_sock_fprog · 0c5fe1b4
      Committed by Will Drewry
      Any other users of bpf_*_filter that take a struct sock_fprog from
      userspace will need to be able to also accept a compat_sock_fprog
      if the arch supports compat calls.  This change allows the existing
      compat_sock_fprog to be shared.
      Signed-off-by: Will Drewry <wad@chromium.org>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: Eric Paris <eparis@redhat.com>
      
      v18: tasered by the apostrophe police
      v14: rebase/nochanges
      v13: rebase on to 88ebdda6
      v12: rebase on to linux-next
      v11: introduction
      Signed-off-by: James Morris <james.l.morris@oracle.com>
      0c5fe1b4
    • W
      sk_run_filter: add BPF_S_ANC_SECCOMP_LD_W · 46b325c7
      Committed by Will Drewry
      Introduces a new BPF ancillary instruction that all LD calls will be
      mapped through when sk_run_filter() is being used for seccomp BPF.  The
      rewriting will be done using a secondary chk_filter function that is run
      after sk_chk_filter.
      
      The code change is guarded by CONFIG_SECCOMP_FILTER which is added,
      along with the seccomp_bpf_load() function later in this series.
      
      This is based on http://lkml.org/lkml/2012/3/2/141
      Suggested-by: Indan Zupancic <indan@nul.nu>
      Signed-off-by: Will Drewry <wad@chromium.org>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: Eric Paris <eparis@redhat.com>
      
      v18: rebase
      ...
      v15: include seccomp.h explicitly for when seccomp_bpf_load exists.
      v14: First cut using a single additional instruction
      ... v13: made bpf functions generic.
      Signed-off-by: James Morris <james.l.morris@oracle.com>
      46b325c7
    • J
      Fix execve behavior apparmor for PR_{GET,SET}_NO_NEW_PRIVS · c29bceb3
      Committed by John Johansen
      Add support for AppArmor to explicitly fail requested domain transitions
      if NO_NEW_PRIVS is set and the task is not unconfined.
      
      Transitions from unconfined are still allowed because this always results
      in a reduction of privileges.
      Acked-by: Eric Paris <eparis@redhat.com>
      Signed-off-by: Will Drewry <wad@chromium.org>
      Signed-off-by: John Johansen <john.johansen@canonical.com>
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      
      v18: new acked-by, new description
      Signed-off-by: James Morris <james.l.morris@oracle.com>
      c29bceb3
    • A
      Add PR_{GET,SET}_NO_NEW_PRIVS to prevent execve from granting privs · 259e5e6c
      Committed by Andy Lutomirski
      With this change, calling
        prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)
      disables privilege granting operations at execve-time.  For example, a
      process will not be able to execute a setuid binary to change its uid
      or gid if this bit is set.  The same is true for file capabilities.
      
      Additionally, LSM_UNSAFE_NO_NEW_PRIVS is defined to ensure that
      LSMs respect the requested behavior.
      
      To determine if the NO_NEW_PRIVS bit is set, a task may call
        prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
      It returns 1 if set and 0 if it is not set. If any of the arguments are
      non-zero, it will return -1 and set errno to EINVAL.
      (PR_SET_NO_NEW_PRIVS behaves similarly.)
      
      This functionality is desired for the proposed seccomp filter patch
      series.  By using PR_SET_NO_NEW_PRIVS, it allows a task to modify the
      system call behavior for itself and its child tasks without being
      able to impact the behavior of a more privileged task.
      
      Another potential use is making certain privileged operations
      unprivileged.  For example, chroot may be considered "safe" if it cannot
      affect privileged tasks.
      
      Note, this patch causes execve to fail when PR_SET_NO_NEW_PRIVS is
      set and AppArmor is in use.  It is fixed in a subsequent patch.
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Will Drewry <wad@chromium.org>
      Acked-by: Eric Paris <eparis@redhat.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      
      v18: updated change desc
      v17: using new define values as per 3.4
      Signed-off-by: James Morris <james.l.morris@oracle.com>
      259e5e6c
  2. 09 Apr, 2012 3 commits
  3. 08 Apr, 2012 5 commits
  4. 07 Apr, 2012 17 commits
    • L
      Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux · f21fec96
      Committed by Linus Torvalds
      Pull ACPI & Power Management patches from Len Brown:
       "Two fixes for cpuidle merge-window changes, plus a URL fix in
        MAINTAINERS"
      
      * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
        MAINTAINERS: Update git url for ACPI
        cpuidle: Fix panic in CPU off-lining with no idle driver
        ACPI processor: Use safe_halt() rather than halt() in acpi_idle_play_dead()
      f21fec96
    • L
      Merge branch '3.4-rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending · a0421da4
      Committed by Linus Torvalds
      Pull target fixes from Nicholas Bellinger:
       "Pull two tcm_fc fabric related fixes for -rc2:
      
        Note that both have been CC'ed to stable, and patch #1 is the
        important one that addresses a memory corruption bug related to FC
        exchange timeouts + command abort.
      
        Thanks again to MDR for tracking down this issue!"
      
      * '3.4-rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
        tcm_fc: Do not free tpg structure during wq allocation failure
        tcm_fc: Add abort flag for gracefully handling exchange timeout
      a0421da4
    • M
      tcm_fc: Do not free tpg structure during wq allocation failure · 06383f10
      Committed by Mark Rustad
      Avoid freeing a registered tpg structure if an alloc_workqueue call
      fails.  This fixes a bug where the failure was leaking memory associated
      with se_portal_group setup during the original core_tpg_register() call.
      Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
      Acked-by: Kiran Patil <Kiran.patil@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
      06383f10
    • M
      tcm_fc: Add abort flag for gracefully handling exchange timeout · e1c40382
      Committed by Mark Rustad
      Add abort flag and use it to terminate processing when an exchange
      is timed out or is reset. The abort flag is used in place of the
      transport_generic_free_cmd function call in the reset and timeout
      cases, because calling that function in that context would free
      memory that was in use. The aborted flag allows the lifetime to
      be managed in a more normal way, while truncating the processing.
      
      This change eliminates a source of memory corruption which
      manifested in a variety of ugly ways.
      
      (nab: Drop unused struct fc_exch *ep in ft_recv_seq)
      Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
      Acked-by: Kiran Patil <Kiran.patil@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
      e1c40382
    • L
      Merge branches 'idle-fix' and 'misc' into release · eeaab2d8
      Committed by Len Brown
      eeaab2d8
    • I
      MAINTAINERS: Update git url for ACPI · aaef292a
      Committed by Igor Murzov
      Signed-off-by: Igor Murzov <e-mail@date.by>
      Signed-off-by: Len Brown <len.brown@intel.com>
      aaef292a
    • L
      Merge branch 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile · 4157368e
      Committed by Linus Torvalds
      Pull arch/tile bug fixes from Chris Metcalf:
       "This includes Paul Gortmaker's change to fix the <asm/system.h>
        disintegration issues on tile, a fix to unbreak the tilepro ethernet
        driver, and a backlog of bugfix-only changes from internal Tilera
        development over the last few months.
      
        They have all been to LKML and on linux-next for the last few days.
        The EDAC change to MAINTAINERS is an oddity but discussion on the
        linux-edac list suggested I ask you to pull that change through my
        tree since they don't have a tree to pull edac changes from at the
        moment."
      
      * 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: (39 commits)
        drivers/net/ethernet/tile: fix netdev_alloc_skb() bombing
        MAINTAINERS: update EDAC information
        tilepro ethernet driver: fix a few minor issues
        tile-srom.c driver: minor code cleanup
        edac: say "TILEGx" not "TILEPro" for the tilegx edac driver
        arch/tile: avoid accidentally unmasking NMI-type interrupt accidentally
        arch/tile: remove bogus performance optimization
        arch/tile: return SIGBUS for addresses that are unaligned AND invalid
        arch/tile: fix finv_buffer_remote() for tilegx
        arch/tile: use atomic exchange in arch_write_unlock()
        arch/tile: stop mentioning the "kvm" subdirectory
        arch/tile: export the page_home() function.
        arch/tile: fix pointer cast in cacheflush.c
        arch/tile: fix single-stepping over swint1 instructions on tilegx
        arch/tile: implement panic_smp_self_stop()
        arch/tile: add "nop" after "nap" to help GX idle power draw
        arch/tile: use proper memparse() for "maxmem" options
        arch/tile: fix up locking in pgtable.c slightly
        arch/tile: don't leak kernel memory when we unload modules
        arch/tile: fix bug in delay_backoff()
        ...
    • L
      Merge tag 'stable/for-linus-3.4-rc1-tag' of... · 9479f0f8
      Linus Torvalds authored
      Merge tag 'stable/for-linus-3.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
      
      Pull xen fixes from Konrad Rzeszutek Wilk:
       "Two fixes for regressions:
         * one is a workaround that will be removed in v3.5 with proper fix in
           the tip/x86 tree,
         * the other is to fix drivers to load on PV (a previous patch made
           them only load in PVonHVM mode).
      
        The rest are just minor fixes in the various drivers and some cleanup
        in the core code."
      
      * tag 'stable/for-linus-3.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
        xen/pcifront: avoid pci_frontend_enable_msix() falsely returning success
        xen/pciback: fix XEN_PCI_OP_enable_msix result
        xen/smp: Remove unnecessary call to smp_processor_id()
        xen/x86: Workaround 'x86/ioapic: Add register level checks to detect bogus io-apic entries'
        xen: only check xen_platform_pci_unplug if hvm
    • L
      Merge tag 'mmc-fixes-for-3.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc · 1ddca057
      Linus Torvalds authored
      Pull MMC fixes from Chris Ball:
       - Disable use of MSI in sdhci-pci, which caused multiple chipsets to
         stop working in 3.4-rc1.  I'll wait to turn this on again until we
         have a chipset whitelist for it.
       - Fix a libertas SDIO powered-resume regression introduced in 3.3;
         thanks to Neil Brown and Rafael Wysocki for this fix.
       - Fix module reloading on omap_hsmmc.
       - Stop trusting the spec/card's specified maximum data timeout length,
         and use three seconds instead.  Previously we used 300ms.
      
      Also cleanups and fixes for s3c, atmel, sh_mmcif and omap_hsmmc.
      
      * tag 'mmc-fixes-for-3.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc: (28 commits)
        mmc: use really long write timeout to deal with crappy cards
        mmc: sdhci-dove: Fix compile error by including module.h
        mmc: Prevent 1.8V switch for SD hosts that don't support UHS modes.
        Revert "mmc: sdhci-pci: Add MSI support"
        Revert "mmc: sdhci-pci: add quirks for broken MSI on O2Micro controllers"
        mmc: core: fix power class selection
        mmc: omap_hsmmc: fix module re-insertion
        mmc: omap_hsmmc: convert to module_platform_driver
        mmc: omap_hsmmc: make it behave well as a module
        mmc: omap_hsmmc: trivial cleanups
        mmc: omap_hsmmc: context save after enabling runtime pm
        mmc: omap_hsmmc: use runtime put sync in probe error patch
        mmc: sdio: Use empty system suspend/resume callbacks at the bus level
        mmc: bus: print bus speed mode of UHS-I card
        mmc: sdhci-pci: add quirks for broken MSI on O2Micro controllers
        mmc: sh_mmcif: Simplify calculation of mmc->f_min
        mmc: sh_mmcif: mmc->f_max should be half of the bus clock
        mmc: sh_mmcif: double clock speed
        mmc: block: Remove use of mmc_blk_set_blksize
        mmc: atmel-mci: add support for odd clock dividers
        ...
    • L
      Make the "word-at-a-time" helper functions more commonly usable · f68e556e
      Linus Torvalds authored
      I have a new optimized x86 "strncpy_from_user()" that will use these
      same helper functions for all the same reasons the name lookup code uses
      them.  This is preparation for that.
      
      This moves them into an architecture-specific header file.  It's
      architecture-specific for two reasons:
      
       - some of the functions are likely to want architecture-specific
         implementations.  Even if the current code happens to be "generic" in
         the sense that it should work on any little-endian machine, it's
         likely that the "multiply by a big constant and shift" implementation
         is less than optimal for an architecture that has a guaranteed fast
         bit count instruction, for example.
      
       - I expect that if architectures like sparc want to start playing
         around with this, we'll need to abstract out a few more details (in
         particular the actual unaligned accesses).  So we're likely to have
         more architecture-specific stuff if non-x86 architectures start using
         this.
      
         (and if it turns out that non-x86 architectures don't start using
         this, then having it in an architecture-specific header is still the
         right thing to do, of course)
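
      The "multiply by a big constant and shift" idea mentioned above can be
      sketched in portable C roughly as follows. This is only an illustration
      for 64-bit words in which byte i occupies bits 8*i..8*i+7; the names
      has_zero() and first_zero_byte() are chosen for this sketch and are not
      claimed to match the helpers in the actual header.

```c
#include <stdint.h>

#define ONES  0x0101010101010101ULL   /* 0x01 in every byte */
#define HIGHS 0x8080808080808080ULL   /* 0x80 in every byte */

/*
 * Nonzero iff at least one byte of w is zero.  The lowest set bit of the
 * result marks the first (least significant) zero byte; higher bits may
 * contain false positives caused by borrows and must not be trusted.
 */
static inline uint64_t has_zero(uint64_t w)
{
    return (w - ONES) & ~w & HIGHS;
}

/*
 * Index (0..7) of the least significant zero byte of w.
 * Precondition: has_zero(w) != 0.
 */
static inline unsigned first_zero_byte(uint64_t w)
{
    uint64_t mask = has_zero(w);

    mask = (mask - 1) & ~mask;  /* all bits below the lowest set bit */
    mask >>= 7;                 /* 0xff in each byte before the zero byte */

    /* Multiply by a big constant and shift: sums the 0x01 bytes. */
    return (unsigned)(((mask & ONES) * ONES) >> 56);
}
```

      An architecture with a fast bit count instruction could replace the
      multiply in first_zero_byte() with a count-trailing-zeros operation,
      which is the kind of per-architecture variation the message anticipates.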
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • T
      cpuidle: Fix panic in CPU off-lining with no idle driver · ee01e663
      Toshi Kani authored
      Fix a NULL pointer dereference panic in cpuidle_play_dead() during
      CPU off-lining when no cpuidle driver is registered.  A cpuidle
      driver may be registered at boot-time based on CPU type.  This patch
      allows an off-lined CPU to enter HLT-based idle in this condition.
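
      The shape of such a fix can be illustrated with a small user-space
      mock. Everything here is hypothetical (struct idle_driver,
      registered_driver, play_dead()) and only mirrors the pattern described
      above: check for a missing driver and report an error, so that the
      caller can fall back to HLT-based idle instead of dereferencing NULL.

```c
#include <stddef.h>
#include <errno.h>

/* Hypothetical stand-in for a registered idle driver. */
struct idle_driver {
    int state_count;    /* number of idle states the driver offers */
};

/* Stays NULL when no driver was registered at boot. */
static struct idle_driver *registered_driver;

/*
 * Returns the index of the deepest idle state to enter, or -ENODEV when
 * no driver is registered.  On -ENODEV the caller is expected to use its
 * default (HLT-based) idle path.
 */
static int play_dead(void)
{
    struct idle_driver *drv = registered_driver;

    if (!drv)
        return -ENODEV;     /* the missing check that led to the panic */

    return drv->state_count - 1;
}
```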
      Signed-off-by: Toshi Kani <toshi.kani@hp.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@amd.com>
      Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Tested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Len Brown <len.brown@intel.com>
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 23f347ef
      Linus Torvalds authored
      Pull networking updates from David Miller:
      
       1) Fix inaccuracies in network driver interface documentation, from Ben
          Hutchings.
      
       2) Fix handling of negative offsets in BPF JITs, from Jan Seiffert.
      
       3) Compile warning, locking, and refcounting fixes in netfilter's
          xt_CT, from Pablo Neira Ayuso.
      
       4) phonet sendmsg needs to validate user length just like any other
          datagram protocol, fix from Sasha Levin.
      
       5) Ipv6 multicast code uses wrong loop index, from RongQing Li.
      
       6) Link handling and firmware fixes in bnx2x driver from Yaniv Rosner
          and Yuval Mintz.
      
       7) mlx4 erroneously allocates 4 pages at a time, regardless of page
          size, fix from Thadeu Lima de Souza Cascardo.
      
       8) SCTP socket option wasn't extended in a backwards compatible way,
          fix from Thomas Graf.
      
       9) Add missing address change event emissions to bonding, from Shlomo
          Pongratz.
      
      10) /proc/net/dev regressed because it uses a private offset to track
          where we are in the hash table, but this doesn't track the offset
          pullback that the seq_file code does resulting in some entries being
          missed in large dumps.
      
          Fix from Eric Dumazet.
      
      11) do_tcp_sendpage() unloads the send queue way too fast, because it
          invokes tcp_push() when it shouldn't.  Let the natural sequence
          generated by the splice paths, and the associated MSG_MORE
          settings, guide the tcp_push() calls.
      
          Otherwise what goes out of TCP is spaghetti and doesn't batch
          effectively into GSO/TSO clusters.
      
          From Eric Dumazet.
      
      12) Once we put a SKB into either the netlink receiver's queue or a
          socket error queue, it can be consumed and freed up, therefore we
          cannot touch it after queueing it like that.
      
          Fixes from Eric Dumazet.
      
      13) PPP has this annoying behavior in that for every transmit call it
          immediately stops the TX queue, then calls down into the next layer
          to transmit the PPP frame.
      
          But if that next layer can take it immediately, it just un-stops the
          TX queue right before returning from the transmit method.
      
          Besides being useless work, it makes several facilities unusable, in
          particular things like the equalizers.  Well behaved devices should
          only stop the TX queue when they really are full, and in PPP's case
          when it gets backlogged to the downstream device.
      
          David Woodhouse therefore fixed PPP to not stop the TX queue until
          its downstream can't take data any more.
      
      14) IFF_UNICAST_FLT got accidentally lost in some recent stmmac driver
          changes, re-add.  From Marc Kleine-Budde.
      
      15) Fix link flaps in ixgbe, from Eric W. Multanen.
      
      16) Descriptor writeback fixes in e1000e from Matthew Vick.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits)
        net: fix a race in sock_queue_err_skb()
        netlink: fix races after skb queueing
        doc, net: Update ndo_start_xmit return type and values
        doc, net: Remove instruction to set net_device::trans_start
        doc, net: Update netdev operation names
        doc, net: Update documentation of synchronisation for TX multiqueue
        doc, net: Remove obsolete reference to dev->poll
        ethtool: Remove exception to the requirement of holding RTNL lock
        MAINTAINERS: update for Marvell Ethernet drivers
        bonding: properly unset current_arp_slave on slave link up
        phonet: Check input from user before allocating
        tcp: tcp_sendpages() should call tcp_push() once
        ipv6: fix array index in ip6_mc_add_src()
        mlx4: allocate just enough pages instead of always 4 pages
        stmmac: re-add IFF_UNICAST_FLT for dwmac1000
        bnx2x: Clear MDC/MDIO warning message
        bnx2x: Fix BCM57711+BCM84823 link issue
        bnx2x: Clear BCM84833 LED after fan failure
        bnx2x: Fix BCM84833 PHY FW version presentation
        bnx2x: Fix link issue for BCM8727 boards.
        ...
    • J
      xen/pcifront: avoid pci_frontend_enable_msix() falsely returning success · f09d8432
      Jan Beulich authored
      The original XenoLinux code has always had things this way, and for
      compatibility reasons (in particular with a subsequent pciback
      adjustment) upstream Linux should behave the same way (allowing for two
      distinct error indications to be returned by the backend).
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • J
      xen/pciback: fix XEN_PCI_OP_enable_msix result · 0ee46eca
      Jan Beulich authored
      Prior to 2.6.19 and as of 2.6.31, pci_enable_msix() can return a
      positive value to indicate the number of vectors (less than the amount
      requested) that can be set up for a given device. Returning this as an
      operation value (secondary result) is fine, but (primary) operation
      results are expected to be negative (error) or zero (success) according
      to the protocol. With the frontend fixed to match the XenoLinux
      behavior, the backend can now validly return zero (success) here,
      passing the upper limit on the number of vectors in op->value.
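
      The result convention described above can be sketched as a small
      stand-alone function. The struct and function names are hypothetical
      and only model the two result fields; they are not taken from the
      pciback source.

```c
#include <stdint.h>

/* Hypothetical model of the two result fields of a PCI operation. */
struct op_result {
    int32_t err;    /* primary result: negative (error) or zero (success) */
    int32_t value;  /* secondary result, e.g. a vector count */
};

/*
 * Translate a pci_enable_msix()-style return value, which may be
 * negative (error), zero (success) or positive (number of vectors that
 * could be set up, fewer than requested), into the protocol's form.
 */
static struct op_result translate_enable_msix(int ret)
{
    struct op_result r = { 0, 0 };

    if (ret < 0)
        r.err = ret;        /* real failure: report it as the primary result */
    else
        r.value = ret;      /* success; a positive count goes in value only */

    return r;
}
```

      With this mapping a positive return never reaches the primary result,
      so a frontend that treats any nonzero primary result as an error is
      not confused by it.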
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • S
      xen/smp: Remove unnecessary call to smp_processor_id() · e8c9e788
      Srivatsa S. Bhat authored
      There is an extra and unnecessary call to smp_processor_id()
      in cpu_bringup(). Remove it.
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • K
      xen/x86: Workaround 'x86/ioapic: Add register level checks to detect bogus io-apic entries' · 2531d64b
      Konrad Rzeszutek Wilk authored
      The above mentioned patch checks the IOAPIC and if it contains
      -1, then it unmaps said IOAPIC. But under Xen we get this:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
      IP: [<ffffffff8134e51f>] xen_irq_init+0x1f/0xb0
      PGD 0
      Oops: 0002 [#1] SMP
      CPU 0
      Modules linked in:
      
      Pid: 1, comm: swapper/0 Not tainted 3.2.10-3.fc16.x86_64 #1 Dell Inc. Inspiron
      1525                  /0U990C
      RIP: e030:[<ffffffff8134e51f>]  [<ffffffff8134e51f>] xen_irq_init+0x1f/0xb0
      RSP: e02b: ffff8800d42cbb70  EFLAGS: 00010202
      RAX: 0000000000000000 RBX: 00000000ffffffef RCX: 0000000000000001
      RDX: 0000000000000040 RSI: 00000000ffffffef RDI: 0000000000000001
      RBP: ffff8800d42cbb80 R08: ffff8800d6400000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffffef
      R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000010
      FS:  0000000000000000(0000) GS:ffff8800df5fe000(0000) knlGS:0000000000000000
      CS:  e033 DS: 0000 ES: 0000 CR0:000000008005003b
      CR2: 0000000000000040 CR3: 0000000001a05000 CR4: 0000000000002660
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process swapper/0 (pid: 1, threadinfo ffff8800d42ca000, task ffff8800d42d0000)
      Stack:
       00000000ffffffef 0000000000000010 ffff8800d42cbbe0 ffffffff8134f157
       ffffffff8100a9b2 ffffffff8182ffd1 00000000000000a0 00000000829e7384
       0000000000000002 0000000000000010 00000000ffffffff 0000000000000000
      Call Trace:
       [<ffffffff8134f157>] xen_bind_pirq_gsi_to_irq+0x87/0x230
       [<ffffffff8100a9b2>] ? check_events+0x12+0x20
       [<ffffffff814bab42>] xen_register_pirq+0x82/0xe0
       [<ffffffff814bac1a>] xen_register_gsi.part.2+0x4a/0xd0
       [<ffffffff814bacc0>] acpi_register_gsi_xen+0x20/0x30
       [<ffffffff8103036f>] acpi_register_gsi+0xf/0x20
       [<ffffffff8131abdb>] acpi_pci_irq_enable+0x12e/0x202
       [<ffffffff814bc849>] pcibios_enable_device+0x39/0x40
       [<ffffffff812dc7ab>] do_pci_enable_device+0x4b/0x70
       [<ffffffff812dc878>] __pci_enable_device_flags+0xa8/0xf0
       [<ffffffff812dc8d3>] pci_enable_device+0x13/0x20
      
      We die because acpi_get_override_irq() is called to return the polarity
      and trigger for the IRQs. That function calls mp_find_ioapics to get the
      'struct ioapic' structure, which along with mp_irq[x] is used to figure
      out the default values and the polarity/trigger overrides. Since
      mp_find_ioapics now returns -1 [because the IOAPIC is filled with
      0xffffffff], acpi_get_override_irq() stops looking up the proper
      INT_SRV_OVR in mp_irq[x] and we can't install the SCI interrupt.
      
      The proper fix for this is going in v3.5 and adds an x86_io_apic_ops
      struct so that platforms can override it. But for v3.4 let's carry this
      work-around. This patch does that by providing a slightly different variant
      of the fake IOAPIC entries.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • I
      xen: only check xen_platform_pci_unplug if hvm · e95ae5a4
      Igor Mammedov authored
      commit b9136d207f08
        xen: initialize platform-pci even if xen_emul_unplug=never
      
      breaks blkfront/netfront: they are not loaded because
      xen_platform_pci_unplug is 0, as it is never set for a PV guest.
      Signed-off-by: Andrew Jones <drjones@redhat.com>
      Signed-off-by: Igor Mammedov <imammedo@redhat.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>