1. 17 4月, 2012 1 次提交
  2. 14 4月, 2012 5 次提交
    • W
      ptrace,seccomp: Add PTRACE_SECCOMP support · fb0fadf9
      Will Drewry 提交于
      This change adds support for a new ptrace option, PTRACE_O_TRACESECCOMP,
      and a new return value for seccomp BPF programs, SECCOMP_RET_TRACE.
      
      When a tracer specifies the PTRACE_O_TRACESECCOMP ptrace option, the
      tracer will be notified, via PTRACE_EVENT_SECCOMP, for any syscall that
      results in a BPF program returning SECCOMP_RET_TRACE.  The 16-bit
      SECCOMP_RET_DATA mask of the BPF program return value will be passed as
      the ptrace_message and may be retrieved using PTRACE_GETEVENTMSG.
      
      If the subordinate process is not using seccomp filter, then no
      system call notifications will occur even if the option is specified.
      
      If there is no tracer with PTRACE_O_TRACESECCOMP when SECCOMP_RET_TRACE
      is returned, the system call will not be executed and an -ENOSYS errno
      will be returned to userspace.
      
      This change adds a dependency on the system call slow path.  Any future
      efforts to use the system call fast path for seccomp filter will need to
      address this restriction.
      Signed-off-by: NWill Drewry <wad@chromium.org>
      Acked-by: NEric Paris <eparis@redhat.com>
      
      v18: - rebase
           - comment fatal_signal check
           - acked-by
           - drop secure_computing_int comment
      v17: - ...
      v16: - update PT_TRACE_MASK to 0xbf4 so that STOP isn't clear on SETOPTIONS call (indan@nul.nu)
             [note PT_TRACE_MASK disappears in linux-next]
      v15: - add audit support for non-zero return codes
           - clean up style (indan@nul.nu)
      v14: - rebase/nochanges
      v13: - rebase on to 88ebdda6
             (Brings back a change to ptrace.c and the masks.)
      v12: - rebase to linux-next
           - use ptrace_event and update arch/Kconfig to mention slow-path dependency
           - drop all tracehook changes and inclusion (oleg@redhat.com)
      v11: - invert the logic to just make it a PTRACE_SYSCALL accelerator
             (indan@nul.nu)
      v10: - moved to PTRACE_O_SECCOMP / PT_TRACE_SECCOMP
      v9:  - n/a
      v8:  - guarded PTRACE_SECCOMP use with an ifdef
      v7:  - introduced
      Signed-off-by: NJames Morris <james.l.morris@oracle.com>
      fb0fadf9
    • W
      seccomp: Add SECCOMP_RET_TRAP · bb6ea430
      Will Drewry 提交于
      Adds a new return value to seccomp filters that triggers a SIGSYS to be
      delivered with the new SYS_SECCOMP si_code.
      
      This allows in-process system call emulation, including just specifying
      an errno or cleanly dumping core, rather than just dying.
      Suggested-by: NMarkus Gutschke <markus@chromium.org>
      Suggested-by: NJulien Tinnes <jln@chromium.org>
      Signed-off-by: NWill Drewry <wad@chromium.org>
      Acked-by: NEric Paris <eparis@redhat.com>
      
      v18: - acked-by, rebase
           - don't mention secure_computing_int() anymore
      v15: - use audit_seccomp/skip
           - pad out error spacing; clean up switch (indan@nul.nu)
      v14: - n/a
      v13: - rebase on to 88ebdda6
      v12: - rebase on to linux-next
      v11: - clarify the comment (indan@nul.nu)
           - s/sigtrap/sigsys
      v10: - use SIGSYS, syscall_get_arch, updates arch/Kconfig
             note suggested-by (though original suggestion had other behaviors)
      v9:  - changes to SIGILL
      v8:  - clean up based on changes to dependent patches
      v7:  - introduction
      Signed-off-by: NJames Morris <james.l.morris@oracle.com>
      bb6ea430
    • W
      seccomp: add SECCOMP_RET_ERRNO · acf3b2c7
      Will Drewry 提交于
      This change adds the SECCOMP_RET_ERRNO as a valid return value from a
      seccomp filter.  Additionally, it makes the first use of the lower
      16-bits for storing a filter-supplied errno.  16-bits is more than
      enough for the errno-base.h calls.
      
      Returning errors instead of immediately terminating processes that
      violate seccomp policy allow for broader use of this functionality
      for kernel attack surface reduction.  For example, a linux container
      could maintain a whitelist of pre-existing system calls but drop
      all new ones with errnos.  This would keep a logically static attack
      surface while providing errnos that may allow for graceful failure
      without the downside of do_exit() on a bad call.
      
      This change also changes the signature of __secure_computing.  It
      appears the only direct caller is the arm entry code and it clobbers
      any possible return value (register) immediately.
      Signed-off-by: NWill Drewry <wad@chromium.org>
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      Acked-by: NEric Paris <eparis@redhat.com>
      
      v18: - fix up comments and rebase
           - fix bad var name which was fixed in later revs
           - remove _int() and just change the __secure_computing signature
      v16-v17: ...
      v15: - use audit_seccomp and add a skip label. (eparis@redhat.com)
           - clean up and pad out return codes (indan@nul.nu)
      v14: - no change/rebase
      v13: - rebase on to 88ebdda6
      v12: - move to WARN_ON if filter is NULL
             (oleg@redhat.com, luto@mit.edu, keescook@chromium.org)
           - return immediately for filter==NULL (keescook@chromium.org)
           - change evaluation to only compare the ACTION so that layered
             errnos don't result in the lowest one being returned.
             (keeschook@chromium.org)
      v11: - check for NULL filter (keescook@chromium.org)
      v10: - change loaders to fn
       v9: - n/a
       v8: - update Kconfig to note new need for syscall_set_return_value.
           - reordered such that TRAP behavior follows on later.
           - made the for loop a little less indent-y
       v7: - introduced
      Signed-off-by: NJames Morris <james.l.morris@oracle.com>
      acf3b2c7
    • W
      seccomp: add system call filtering using BPF · e2cfabdf
      Will Drewry 提交于
      [This patch depends on luto@mit.edu's no_new_privs patch:
         https://lkml.org/lkml/2012/1/30/264
       The whole series including Andrew's patches can be found here:
         https://github.com/redpig/linux/tree/seccomp
       Complete diff here:
         https://github.com/redpig/linux/compare/1dc65fed...seccomp
      ]
      
      This patch adds support for seccomp mode 2.  Mode 2 introduces the
      ability for unprivileged processes to install system call filtering
      policy expressed in terms of a Berkeley Packet Filter (BPF) program.
      This program will be evaluated in the kernel for each system call
      the task makes and computes a result based on data in the format
      of struct seccomp_data.
      
      A filter program may be installed by calling:
        struct sock_fprog fprog = { ... };
        ...
        prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &fprog);
      
      The return value of the filter program determines if the system call is
      allowed to proceed or denied.  If the first filter program installed
      allows prctl(2) calls, then the above call may be made repeatedly
      by a task to further reduce its access to the kernel.  All attached
      programs must be evaluated before a system call will be allowed to
      proceed.
      
      Filter programs will be inherited across fork/clone and execve.
      However, if the task attaching the filter is unprivileged
      (!CAP_SYS_ADMIN) the no_new_privs bit will be set on the task.  This
      ensures that unprivileged tasks cannot attach filters that affect
      privileged tasks (e.g., setuid binary).
      
      There are a number of benefits to this approach. A few of which are
      as follows:
      - BPF has been exposed to userland for a long time
      - BPF optimization (and JIT'ing) are well understood
      - Userland already knows its ABI: system call numbers and desired
        arguments
      - No time-of-check-time-of-use vulnerable data accesses are possible.
      - system call arguments are loaded on access only to minimize copying
        required for system call policy decisions.
      
      Mode 2 support is restricted to architectures that enable
      HAVE_ARCH_SECCOMP_FILTER.  In this patch, the primary dependency is on
      syscall_get_arguments().  The full desired scope of this feature will
      add a few minor additional requirements expressed later in this series.
      Based on discussion, SECCOMP_RET_ERRNO and SECCOMP_RET_TRACE seem to be
      the desired additional functionality.
      
      No architectures are enabled in this patch.
      Signed-off-by: NWill Drewry <wad@chromium.org>
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Reviewed-by: NIndan Zupancic <indan@nul.nu>
      Acked-by: NEric Paris <eparis@redhat.com>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      
      v18: - rebase to v3.4-rc2
           - s/chk/check/ (akpm@linux-foundation.org,jmorris@namei.org)
           - allocate with GFP_KERNEL|__GFP_NOWARN (indan@nul.nu)
           - add a comment for get_u32 regarding endianness (akpm@)
           - fix other typos, style mistakes (akpm@)
           - added acked-by
      v17: - properly guard seccomp filter needed headers (leann@ubuntu.com)
           - tighten return mask to 0x7fff0000
      v16: - no change
      v15: - add a 4 instr penalty when counting a path to account for seccomp_filter
             size (indan@nul.nu)
           - drop the max insns to 256KB (indan@nul.nu)
           - return ENOMEM if the max insns limit has been hit (indan@nul.nu)
           - move IP checks after args (indan@nul.nu)
           - drop !user_filter check (indan@nul.nu)
           - only allow explicit bpf codes (indan@nul.nu)
           - exit_code -> exit_sig
      v14: - put/get_seccomp_filter takes struct task_struct
             (indan@nul.nu,keescook@chromium.org)
           - adds seccomp_chk_filter and drops general bpf_run/chk_filter user
           - add seccomp_bpf_load for use by net/core/filter.c
           - lower max per-process/per-hierarchy: 1MB
           - moved nnp/capability check prior to allocation
             (all of the above: indan@nul.nu)
      v13: - rebase on to 88ebdda6
      v12: - added a maximum instruction count per path (indan@nul.nu,oleg@redhat.com)
           - removed copy_seccomp (keescook@chromium.org,indan@nul.nu)
           - reworded the prctl_set_seccomp comment (indan@nul.nu)
      v11: - reorder struct seccomp_data to allow future args expansion (hpa@zytor.com)
           - style clean up, @compat dropped, compat_sock_fprog32 (indan@nul.nu)
           - do_exit(SIGSYS) (keescook@chromium.org, luto@mit.edu)
           - pare down Kconfig doc reference.
           - extra comment clean up
      v10: - seccomp_data has changed again to be more aesthetically pleasing
             (hpa@zytor.com)
           - calling convention is noted in a new u32 field using syscall_get_arch.
             This allows for cross-calling convention tasks to use seccomp filters.
             (hpa@zytor.com)
           - lots of clean up (thanks, Indan!)
       v9: - n/a
       v8: - use bpf_chk_filter, bpf_run_filter. update load_fns
           - Lots of fixes courtesy of indan@nul.nu:
           -- fix up load behavior, compat fixups, and merge alloc code,
           -- renamed pc and dropped __packed, use bool compat.
           -- Added a hidden CONFIG_SECCOMP_FILTER to synthesize non-arch
              dependencies
       v7:  (massive overhaul thanks to Indan, others)
           - added CONFIG_HAVE_ARCH_SECCOMP_FILTER
           - merged into seccomp.c
           - minimal seccomp_filter.h
           - no config option (part of seccomp)
           - no new prctl
           - doesn't break seccomp on systems without asm/syscall.h
             (works but arg access always fails)
           - dropped seccomp_init_task, extra free functions, ...
           - dropped the no-asm/syscall.h code paths
           - merges with network sk_run_filter and sk_chk_filter
       v6: - fix memory leak on attach compat check failure
           - require no_new_privs || CAP_SYS_ADMIN prior to filter
             installation. (luto@mit.edu)
           - s/seccomp_struct_/seccomp_/ for macros/functions (amwang@redhat.com)
           - cleaned up Kconfig (amwang@redhat.com)
           - on block, note if the call was compat (so the # means something)
       v5: - uses syscall_get_arguments
             (indan@nul.nu,oleg@redhat.com, mcgrathr@chromium.org)
            - uses union-based arg storage with hi/lo struct to
              handle endianness.  Compromises between the two alternate
              proposals to minimize extra arg shuffling and account for
              endianness assuming userspace uses offsetof().
              (mcgrathr@chromium.org, indan@nul.nu)
            - update Kconfig description
            - add include/seccomp_filter.h and add its installation
            - (naive) on-demand syscall argument loading
            - drop seccomp_t (eparis@redhat.com)
       v4:  - adjusted prctl to make room for PR_[SG]ET_NO_NEW_PRIVS
            - now uses current->no_new_privs
              (luto@mit.edu,torvalds@linux-foundation.com)
            - assign names to seccomp modes (rdunlap@xenotime.net)
            - fix style issues (rdunlap@xenotime.net)
            - reworded Kconfig entry (rdunlap@xenotime.net)
       v3:  - macros to inline (oleg@redhat.com)
            - init_task behavior fixed (oleg@redhat.com)
            - drop creator entry and extra NULL check (oleg@redhat.com)
            - alloc returns -EINVAL on bad sizing (serge.hallyn@canonical.com)
            - adds tentative use of "always_unprivileged" as per
              torvalds@linux-foundation.org and luto@mit.edu
       v2:  - (patch 2 only)
      Signed-off-by: NJames Morris <james.l.morris@oracle.com>
      e2cfabdf
    • W
      seccomp: kill the seccomp_t typedef · 932ecebb
      Will Drewry 提交于
      Replaces the seccomp_t typedef with struct seccomp to match modern
      kernel style.
      Signed-off-by: NWill Drewry <wad@chromium.org>
      Reviewed-by: NJames Morris <jmorris@namei.org>
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Acked-by: NEric Paris <eparis@redhat.com>
      
      v18: rebase
      ...
      v14: rebase/nochanges
      v13: rebase on to 88ebdda6
      v12: rebase on to linux-next
      v8-v11: no changes
      v7: struct seccomp_struct -> struct seccomp
      v6: original inclusion in this series.
      Signed-off-by: NJames Morris <james.l.morris@oracle.com>
      932ecebb
  3. 07 6月, 2011 1 次提交
    • A
      x86-64: Emulate legacy vsyscalls · 5cec93c2
      Andy Lutomirski 提交于
      There's a fair amount of code in the vsyscall page.  It contains
      a syscall instruction (in the gettimeofday fallback) and who
      knows what will happen if an exploit jumps into the middle of
      some other code.
      
      Reduce the risk by replacing the vsyscalls with short magic
      incantations that cause the kernel to emulate the real
      vsyscalls. These incantations are useless if entered in the
      middle.
      
      This causes vsyscalls to be a little more expensive than real
      syscalls.  Fortunately sensible programs don't use them.
      The only exception is time() which is still called by glibc
      through the vsyscall - but calling time() millions of times
      per second is not sensible. glibc has this fixed in the
      development tree.
      
      This patch is not perfect: the vread_tsc and vread_hpet
      functions are still at a fixed address.  Fixing that might
      involve making alternative patching work in the vDSO.
      Signed-off-by: NAndy Lutomirski <luto@mit.edu>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Jesper Juhl <jj@chaosbits.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Jan Beulich <JBeulich@novell.com>
      Cc: richard -rw- weinberger <richard.weinberger@gmail.com>
      Cc: Mikael Pettersson <mikpe@it.uu.se>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Louis Rilling <Louis.Rilling@kerlabs.com>
      Cc: Valdis.Kletnieks@vt.edu
      Cc: pageexec@freemail.hu
      Link: http://lkml.kernel.org/r/e64e1b3c64858820d12c48fa739efbd1485e79d5.1307292171.git.luto@mit.edu
      [ Removed the CONFIG option - it's simpler to just do it unconditionally. Tidied up the code as well. ]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      5cec93c2
  4. 20 4月, 2009 1 次提交
  5. 17 7月, 2007 2 次提交
  6. 26 4月, 2006 1 次提交
  7. 09 1月, 2006 1 次提交
  8. 28 6月, 2005 1 次提交
    • A
      [PATCH] seccomp: tsc disable · ffaa8bd6
      Andrea Arcangeli 提交于
      I believe at least for seccomp it's worth to turn off the tsc, not just for
      HT but for the L2 cache too.  So it's up to you, either you turn it off
      completely (which isn't very nice IMHO) or I recommend to apply this below
      patch.
      
      This has been tested successfully on x86-64 against current cogito
      repository (i686 compiles so I didn't bother testing ;).  People selling
      the cpu through cpushare may appreciate this bit for a peace of mind.
      
      There's no way to get any timing info anymore with this applied
      (gettimeofday is forbidden of course).  The seccomp environment is
      completely deterministic so it can't be allowed to get timing info, it has
      to be deterministic so in the future I can enable a computing mode that
      does a parallel computing for each task with server side transparent
      checkpointing and verification that the output is the same from all the 2/3
      seller computers for each task, without the buyer even noticing (for now
      the verification is left to the buyer client side and there's no
      checkpointing, since that would require more kernel changes to track the
      dirty bits but it'll be easy to extend once the basic mode is finished).
      
      Eliminating a cold-cache read of the cr4 global variable will save one
      cacheline during the tlb flush while making the code per-cpu-safe at the
      same time.  Thanks to Mikael Pettersson for noticing the tlb flush wasn't
      per-cpu-safe.
      
      The global tlb flush can run from irq (IPI calling do_flush_tlb_all) but
      it'll be transparent to the switch_to code since the IPI won't make any
      change to the cr4 contents from the point of view of the interrupted code
      and since it's now all per-cpu stuff, it will not race.  So no need to
      disable irqs in switch_to slow path.
      Signed-off-by: NAndrea Arcangeli <andrea@cpushare.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ffaa8bd6
  9. 17 4月, 2005 1 次提交
    • L
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds 提交于
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4