1. 11 5月, 2012 3 次提交
    • D
      KEYS: Add invalidation support · fd75815f
      David Howells 提交于
      Add support for invalidating a key - which renders it immediately invisible to
      further searches and causes the garbage collector to immediately wake up,
      remove it from keyrings and then destroy it when it's no longer referenced.
      
      It's better not to do this with keyctl_revoke() as that marks the key to start
      returning -EKEYREVOKED to searches when what is actually desired is to have the
      key refetched.
      
      To invalidate a key the caller must be granted SEARCH permission by the key.
      This may be too strict.  It may be better to also permit invalidation if the
      caller has any of READ, WRITE or SETATTR permission.
      
      The primary use for this is to evict keys that are cached in special keyrings,
      such as the DNS resolver or an ID mapper.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      fd75815f
    • D
      KEYS: Do LRU discard in full keyrings · 31d5a79d
      David Howells 提交于
      Do an LRU discard in keyrings that are full rather than returning ENFILE.  To
      perform this, a time_t is added to the key struct and updated by the creation
      of a link to a key and by a key being found as the result of a search.  At the
      completion of a successful search, the keyrings in the path between the root of
      the search and the first found link to it also have their last-used times
      updated.
      
      Note that discarding a link to a key from a keyring does not necessarily
      destroy the key as there may be references held by other places.
      
      An alternate discard method that might suffice is to perform FIFO discard from
      the keyring, using the spare 2-byte hole in the keylist header as the index of
      the next link to be discarded.
      
      This is useful when using a keyring as a cache for DNS results or foreign
      filesystem IDs.
      
      
      This can be tested by the following.  As root do:
      
      	echo 1000 >/proc/sys/kernel/keys/root_maxkeys
      
      	kr=`keyctl newring foo @s`
      	for ((i=0; i<2000; i++)); do keyctl add user a$i a $kr; done
      
      Without this patch ENFILE should be reported when the keyring fills up.  With
      this patch, the keyring discards keys in an LRU fashion.  Note that the stored
      LRU time has a granularity of 1s.
      
      After doing this, /proc/key-users can be observed and should show that most of
      the 2000 keys have been discarded:
      
      	[root@andromeda ~]# cat /proc/key-users
      	    0:   517 516/516 513/1000 5249/20000
      
      The "513/1000" here is the number of quota-accounted keys present for this user
      out of the maximum permitted.
      
      In /proc/keys, the keyring shows the number of keys it has and the number of
      slots it has allocated:
      
      	[root@andromeda ~]# grep foo /proc/keys
      	200c64c4 I--Q--     1 perm 3b3f0000     0     0 keyring   foo: 509/509
      
      The maximum is (PAGE_SIZE - header) / key pointer size.  That's typically 509
      on a 64-bit system and 1020 on a 32-bit system.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      31d5a79d
    • D
      KEYS: Perform RCU synchronisation on keys prior to key destruction · 65d87fe6
      David Howells 提交于
      Make the keys garbage collector invoke synchronize_rcu() prior to destroying
      keys with a zero usage count.  This means that a key can be examined under the
      RCU read lock in the safe knowledge that it won't get deallocated until after
      the lock is released - even if its usage count becomes zero whilst we're
      looking at it.
      
      This is useful in keyring search vs key link.  Consider a keyring containing a
      link to a key.  That link can be replaced in-place in the keyring without
      requiring an RCU copy-and-replace on the keyring contents without breaking a
      search underway on that keyring when the displaced key is released, provided
      the key is actually destroyed only after the RCU read lock held by the search
      algorithm is released.
      
      This permits __key_link() to replace a key without having to reallocate the key
      payload.  A key gets replaced if a new key being linked into a keyring has the
      same type and description.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NJeff Layton <jlayton@redhat.com>
      65d87fe6
  2. 30 4月, 2012 1 次提交
    • L
      pipes: add a "packetized pipe" mode for writing · 9883035a
      Linus Torvalds 提交于
      The actual internal pipe implementation is already really about
      individual packets (called "pipe buffers"), and this simply exposes that
      as a special packetized mode.
      
      When we are in the packetized mode (marked by O_DIRECT as suggested by
      Alan Cox), a write() on a pipe will not merge the new data with previous
      writes, so each write will get a pipe buffer of its own.  The pipe
      buffer is then marked with the PIPE_BUF_FLAG_PACKET flag, which in turn
      will tell the reader side to break the read at that boundary (and throw
      away any partial packet contents that do not fit in the read buffer).
      
      End result: as long as you do writes less than PIPE_BUF in size (so that
      the pipe doesn't have to split them up), you can now treat the pipe as a
      packet interface, where each read() system call will read one packet at
      a time.  You can just use a sufficiently big read buffer (PIPE_BUF is
      sufficient, since bigger than that doesn't guarantee atomicity anyway),
      and the return value of the read() will naturally give you the size of
      the packet.
      
      NOTE! We do not support zero-sized packets, and zero-sized reads and
      writes to a pipe continue to be no-ops.  Also note that big packets will
      currently be split at write time, but that the size at which that
      happens is not really specified (except that it's bigger than PIPE_BUF).
      Currently that limit is the system page size, but we might want to
      explicitly support bigger packets some day.
      
      The main user for this is going to be the autofs packet interface,
      allowing us to stop having to care so deeply about exact packet sizes
      (which have had bugs with 32/64-bit compatibility modes).  But user
      space can create packetized pipes with "pipe2(fd, O_DIRECT)", which will
      fail with an EINVAL on kernels that do not support this interface.
      Tested-by: NMichael Tokarev <mjt@tls.msk.ru>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: David Miller <davem@davemloft.net>
      Cc: Ian Kent <raven@themaw.net>
      Cc: Thomas Meyer <thomas@m3y3r.de>
      Cc: stable@kernel.org  # needed for systemd/autofs interaction fix
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9883035a
  3. 28 4月, 2012 1 次提交
  4. 27 4月, 2012 1 次提交
  5. 26 4月, 2012 1 次提交
  6. 25 4月, 2012 1 次提交
    • A
      USB: EHCI: fix crash during suspend on ASUS computers · 151b6128
      Alan Stern 提交于
      This patch (as1545) fixes a problem affecting several ASUS computers:
      The machine crashes or corrupts memory when going into suspend if the
      ehci-hcd driver is bound to any controllers.  Users have been forced
      to unbind or unload ehci-hcd before putting their systems to sleep.
      
      After extensive testing, it was determined that the machines don't
      like going into suspend when any EHCI controllers are in the PCI D3
      power state.  Presumably this is a firmware bug, but there's nothing
      we can do about it except to avoid putting the controllers in D3
      during system sleep.
      
      The patch adds a new flag to indicate whether the problem is present,
      and avoids changing the controller's power state if the flag is set.
      Runtime suspend is unaffected; this matters only for system suspend.
      However as a side effect, the controller will not respond to remote
      wakeup requests while the system is asleep.  Hence USB wakeup is not
      functional -- but of course, this is already true in the current state
      of affairs.
      
      This fixes Bugzilla #42728.
      Signed-off-by: NAlan Stern <stern@rowland.harvard.edu>
      Tested-by: NSteven Rostedt <rostedt@goodmis.org>
      Tested-by: NAndrey Rahmatullin <wrar@wrar.name>
      Tested-by: NOleksij Rempel (fishor) <bug-track@fisher-privat.net>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      151b6128
  7. 23 4月, 2012 3 次提交
  8. 21 4月, 2012 6 次提交
  9. 18 4月, 2012 1 次提交
  10. 17 4月, 2012 3 次提交
  11. 16 4月, 2012 2 次提交
  12. 14 4月, 2012 11 次提交
    • L
      do not export kernel's NULL #define to userspace · 2084c24a
      Lubos Lunak 提交于
      GCC's NULL is actually __null, which allows detecting some questionable
      NULL usage and warn about it.  Moreover each platform/compiler should
      have its own stddef.h anyway (which is different from linux/stddef.h).
      
      So there's no good reason to leak kernel's NULL to userspace and
      override what the compiler provides.
      Signed-off-by: NLuboš Luňák <l.lunak@suse.cz>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2084c24a
    • W
      ptrace,seccomp: Add PTRACE_SECCOMP support · fb0fadf9
      Will Drewry 提交于
      This change adds support for a new ptrace option, PTRACE_O_TRACESECCOMP,
      and a new return value for seccomp BPF programs, SECCOMP_RET_TRACE.
      
      When a tracer specifies the PTRACE_O_TRACESECCOMP ptrace option, the
      tracer will be notified, via PTRACE_EVENT_SECCOMP, for any syscall that
      results in a BPF program returning SECCOMP_RET_TRACE.  The 16-bit
      SECCOMP_RET_DATA mask of the BPF program return value will be passed as
      the ptrace_message and may be retrieved using PTRACE_GETEVENTMSG.
      
      If the subordinate process is not using seccomp filter, then no
      system call notifications will occur even if the option is specified.
      
      If there is no tracer with PTRACE_O_TRACESECCOMP when SECCOMP_RET_TRACE
      is returned, the system call will not be executed and an -ENOSYS errno
      will be returned to userspace.
      
      This change adds a dependency on the system call slow path.  Any future
      efforts to use the system call fast path for seccomp filter will need to
      address this restriction.
      Signed-off-by: NWill Drewry <wad@chromium.org>
      Acked-by: NEric Paris <eparis@redhat.com>
      
      v18: - rebase
           - comment fatal_signal check
           - acked-by
           - drop secure_computing_int comment
      v17: - ...
      v16: - update PT_TRACE_MASK to 0xbf4 so that STOP isn't clear on SETOPTIONS call (indan@nul.nu)
             [note PT_TRACE_MASK disappears in linux-next]
      v15: - add audit support for non-zero return codes
           - clean up style (indan@nul.nu)
      v14: - rebase/nochanges
      v13: - rebase on to 88ebdda6
             (Brings back a change to ptrace.c and the masks.)
      v12: - rebase to linux-next
           - use ptrace_event and update arch/Kconfig to mention slow-path dependency
           - drop all tracehook changes and inclusion (oleg@redhat.com)
      v11: - invert the logic to just make it a PTRACE_SYSCALL accelerator
             (indan@nul.nu)
      v10: - moved to PTRACE_O_SECCOMP / PT_TRACE_SECCOMP
      v9:  - n/a
      v8:  - guarded PTRACE_SECCOMP use with an ifdef
      v7:  - introduced
      Signed-off-by: NJames Morris <james.l.morris@oracle.com>
      fb0fadf9
    • W
      seccomp: Add SECCOMP_RET_TRAP · bb6ea430
      Will Drewry 提交于
      Adds a new return value to seccomp filters that triggers a SIGSYS to be
      delivered with the new SYS_SECCOMP si_code.
      
      This allows in-process system call emulation, including just specifying
      an errno or cleanly dumping core, rather than just dying.
      Suggested-by: NMarkus Gutschke <markus@chromium.org>
      Suggested-by: NJulien Tinnes <jln@chromium.org>
      Signed-off-by: NWill Drewry <wad@chromium.org>
      Acked-by: NEric Paris <eparis@redhat.com>
      
      v18: - acked-by, rebase
           - don't mention secure_computing_int() anymore
      v15: - use audit_seccomp/skip
           - pad out error spacing; clean up switch (indan@nul.nu)
      v14: - n/a
      v13: - rebase on to 88ebdda6
      v12: - rebase on to linux-next
      v11: - clarify the comment (indan@nul.nu)
           - s/sigtrap/sigsys
      v10: - use SIGSYS, syscall_get_arch, updates arch/Kconfig
             note suggested-by (though original suggestion had other behaviors)
      v9:  - changes to SIGILL
      v8:  - clean up based on changes to dependent patches
      v7:  - introduction
      Signed-off-by: NJames Morris <james.l.morris@oracle.com>
      bb6ea430
    • W
      seccomp: add SECCOMP_RET_ERRNO · acf3b2c7
      Will Drewry 提交于
      This change adds the SECCOMP_RET_ERRNO as a valid return value from a
      seccomp filter.  Additionally, it makes the first use of the lower
      16-bits for storing a filter-supplied errno.  16-bits is more than
      enough for the errno-base.h calls.
      
      Returning errors instead of immediately terminating processes that
      violate seccomp policy allow for broader use of this functionality
      for kernel attack surface reduction.  For example, a linux container
      could maintain a whitelist of pre-existing system calls but drop
      all new ones with errnos.  This would keep a logically static attack
      surface while providing errnos that may allow for graceful failure
      without the downside of do_exit() on a bad call.
      
      This change also changes the signature of __secure_computing.  It
      appears the only direct caller is the arm entry code and it clobbers
      any possible return value (register) immediately.
      Signed-off-by: NWill Drewry <wad@chromium.org>
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      Acked-by: NEric Paris <eparis@redhat.com>
      
      v18: - fix up comments and rebase
           - fix bad var name which was fixed in later revs
           - remove _int() and just change the __secure_computing signature
      v16-v17: ...
      v15: - use audit_seccomp and add a skip label. (eparis@redhat.com)
           - clean up and pad out return codes (indan@nul.nu)
      v14: - no change/rebase
      v13: - rebase on to 88ebdda6
      v12: - move to WARN_ON if filter is NULL
             (oleg@redhat.com, luto@mit.edu, keescook@chromium.org)
           - return immediately for filter==NULL (keescook@chromium.org)
           - change evaluation to only compare the ACTION so that layered
             errnos don't result in the lowest one being returned.
             (keeschook@chromium.org)
      v11: - check for NULL filter (keescook@chromium.org)
      v10: - change loaders to fn
       v9: - n/a
       v8: - update Kconfig to note new need for syscall_set_return_value.
           - reordered such that TRAP behavior follows on later.
           - made the for loop a little less indent-y
       v7: - introduced
      Signed-off-by: NJames Morris <james.l.morris@oracle.com>
      acf3b2c7
    • K
      seccomp: remove duplicated failure logging · 3dc1c1b2
      Kees Cook 提交于
      This consolidates the seccomp filter error logging path and adds more
      details to the audit log.
      Signed-off-by: NWill Drewry <wad@chromium.org>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Acked-by: NEric Paris <eparis@redhat.com>
      
      v18: make compat= permanent in the record
      v15: added a return code to the audit_seccomp path by wad@chromium.org
           (suggested by eparis@redhat.com)
      v*: original by keescook@chromium.org
      Signed-off-by: NJames Morris <james.l.morris@oracle.com>
      3dc1c1b2
    • W
      seccomp: add system call filtering using BPF · e2cfabdf
      Will Drewry 提交于
      [This patch depends on luto@mit.edu's no_new_privs patch:
         https://lkml.org/lkml/2012/1/30/264
       The whole series including Andrew's patches can be found here:
         https://github.com/redpig/linux/tree/seccomp
       Complete diff here:
         https://github.com/redpig/linux/compare/1dc65fed...seccomp
      ]
      
      This patch adds support for seccomp mode 2.  Mode 2 introduces the
      ability for unprivileged processes to install system call filtering
      policy expressed in terms of a Berkeley Packet Filter (BPF) program.
      This program will be evaluated in the kernel for each system call
      the task makes and computes a result based on data in the format
      of struct seccomp_data.
      
      A filter program may be installed by calling:
        struct sock_fprog fprog = { ... };
        ...
        prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &fprog);
      
      The return value of the filter program determines if the system call is
      allowed to proceed or denied.  If the first filter program installed
      allows prctl(2) calls, then the above call may be made repeatedly
      by a task to further reduce its access to the kernel.  All attached
      programs must be evaluated before a system call will be allowed to
      proceed.
      
      Filter programs will be inherited across fork/clone and execve.
      However, if the task attaching the filter is unprivileged
      (!CAP_SYS_ADMIN) the no_new_privs bit will be set on the task.  This
      ensures that unprivileged tasks cannot attach filters that affect
      privileged tasks (e.g., setuid binary).
      
      There are a number of benefits to this approach. A few of which are
      as follows:
      - BPF has been exposed to userland for a long time
      - BPF optimization (and JIT'ing) are well understood
      - Userland already knows its ABI: system call numbers and desired
        arguments
      - No time-of-check-time-of-use vulnerable data accesses are possible.
      - system call arguments are loaded on access only to minimize copying
        required for system call policy decisions.
      
      Mode 2 support is restricted to architectures that enable
      HAVE_ARCH_SECCOMP_FILTER.  In this patch, the primary dependency is on
      syscall_get_arguments().  The full desired scope of this feature will
      add a few minor additional requirements expressed later in this series.
      Based on discussion, SECCOMP_RET_ERRNO and SECCOMP_RET_TRACE seem to be
      the desired additional functionality.
      
      No architectures are enabled in this patch.
      Signed-off-by: NWill Drewry <wad@chromium.org>
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Reviewed-by: NIndan Zupancic <indan@nul.nu>
      Acked-by: NEric Paris <eparis@redhat.com>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      
      v18: - rebase to v3.4-rc2
           - s/chk/check/ (akpm@linux-foundation.org,jmorris@namei.org)
           - allocate with GFP_KERNEL|__GFP_NOWARN (indan@nul.nu)
           - add a comment for get_u32 regarding endianness (akpm@)
           - fix other typos, style mistakes (akpm@)
           - added acked-by
      v17: - properly guard seccomp filter needed headers (leann@ubuntu.com)
           - tighten return mask to 0x7fff0000
      v16: - no change
      v15: - add a 4 instr penalty when counting a path to account for seccomp_filter
             size (indan@nul.nu)
           - drop the max insns to 256KB (indan@nul.nu)
           - return ENOMEM if the max insns limit has been hit (indan@nul.nu)
           - move IP checks after args (indan@nul.nu)
           - drop !user_filter check (indan@nul.nu)
           - only allow explicit bpf codes (indan@nul.nu)
           - exit_code -> exit_sig
      v14: - put/get_seccomp_filter takes struct task_struct
             (indan@nul.nu,keescook@chromium.org)
           - adds seccomp_chk_filter and drops general bpf_run/chk_filter user
           - add seccomp_bpf_load for use by net/core/filter.c
           - lower max per-process/per-hierarchy: 1MB
           - moved nnp/capability check prior to allocation
             (all of the above: indan@nul.nu)
      v13: - rebase on to 88ebdda6
      v12: - added a maximum instruction count per path (indan@nul.nu,oleg@redhat.com)
           - removed copy_seccomp (keescook@chromium.org,indan@nul.nu)
           - reworded the prctl_set_seccomp comment (indan@nul.nu)
      v11: - reorder struct seccomp_data to allow future args expansion (hpa@zytor.com)
           - style clean up, @compat dropped, compat_sock_fprog32 (indan@nul.nu)
           - do_exit(SIGSYS) (keescook@chromium.org, luto@mit.edu)
           - pare down Kconfig doc reference.
           - extra comment clean up
      v10: - seccomp_data has changed again to be more aesthetically pleasing
             (hpa@zytor.com)
           - calling convention is noted in a new u32 field using syscall_get_arch.
             This allows for cross-calling convention tasks to use seccomp filters.
             (hpa@zytor.com)
           - lots of clean up (thanks, Indan!)
       v9: - n/a
       v8: - use bpf_chk_filter, bpf_run_filter. update load_fns
           - Lots of fixes courtesy of indan@nul.nu:
           -- fix up load behavior, compat fixups, and merge alloc code,
           -- renamed pc and dropped __packed, use bool compat.
           -- Added a hidden CONFIG_SECCOMP_FILTER to synthesize non-arch
              dependencies
       v7:  (massive overhaul thanks to Indan, others)
           - added CONFIG_HAVE_ARCH_SECCOMP_FILTER
           - merged into seccomp.c
           - minimal seccomp_filter.h
           - no config option (part of seccomp)
           - no new prctl
           - doesn't break seccomp on systems without asm/syscall.h
             (works but arg access always fails)
           - dropped seccomp_init_task, extra free functions, ...
           - dropped the no-asm/syscall.h code paths
           - merges with network sk_run_filter and sk_chk_filter
       v6: - fix memory leak on attach compat check failure
           - require no_new_privs || CAP_SYS_ADMIN prior to filter
             installation. (luto@mit.edu)
           - s/seccomp_struct_/seccomp_/ for macros/functions (amwang@redhat.com)
           - cleaned up Kconfig (amwang@redhat.com)
           - on block, note if the call was compat (so the # means something)
       v5: - uses syscall_get_arguments
             (indan@nul.nu,oleg@redhat.com, mcgrathr@chromium.org)
            - uses union-based arg storage with hi/lo struct to
              handle endianness.  Compromises between the two alternate
              proposals to minimize extra arg shuffling and account for
              endianness assuming userspace uses offsetof().
              (mcgrathr@chromium.org, indan@nul.nu)
            - update Kconfig description
            - add include/seccomp_filter.h and add its installation
            - (naive) on-demand syscall argument loading
            - drop seccomp_t (eparis@redhat.com)
       v4:  - adjusted prctl to make room for PR_[SG]ET_NO_NEW_PRIVS
            - now uses current->no_new_privs
              (luto@mit.edu,torvalds@linux-foundation.com)
            - assign names to seccomp modes (rdunlap@xenotime.net)
            - fix style issues (rdunlap@xenotime.net)
            - reworded Kconfig entry (rdunlap@xenotime.net)
       v3:  - macros to inline (oleg@redhat.com)
            - init_task behavior fixed (oleg@redhat.com)
            - drop creator entry and extra NULL check (oleg@redhat.com)
            - alloc returns -EINVAL on bad sizing (serge.hallyn@canonical.com)
            - adds tentative use of "always_unprivileged" as per
              torvalds@linux-foundation.org and luto@mit.edu
       v2:  - (patch 2 only)
      Signed-off-by: NJames Morris <james.l.morris@oracle.com>
      e2cfabdf
    • W
      seccomp: kill the seccomp_t typedef · 932ecebb
      Will Drewry 提交于
      Replaces the seccomp_t typedef with struct seccomp to match modern
      kernel style.
      Signed-off-by: NWill Drewry <wad@chromium.org>
      Reviewed-by: NJames Morris <jmorris@namei.org>
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Acked-by: NEric Paris <eparis@redhat.com>
      
      v18: rebase
      ...
      v14: rebase/nochanges
      v13: rebase on to 88ebdda6
      v12: rebase on to linux-next
      v8-v11: no changes
      v7: struct seccomp_struct -> struct seccomp
      v6: original inclusion in this series.
      Signed-off-by: NJames Morris <james.l.morris@oracle.com>
      932ecebb
    • W
      net/compat.c,linux/filter.h: share compat_sock_fprog · 0c5fe1b4
      Will Drewry 提交于
      Any other users of bpf_*_filter that take a struct sock_fprog from
      userspace will need to be able to also accept a compat_sock_fprog
      if the arch supports compat calls.  This change allows the existing
      compat_sock_fprog be shared.
      Signed-off-by: NWill Drewry <wad@chromium.org>
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      Acked-by: NEric Paris <eparis@redhat.com>
      
      v18: tasered by the apostrophe police
      v14: rebase/nochanges
      v13: rebase on to 88ebdda6
      v12: rebase on to linux-next
      v11: introduction
      Signed-off-by: NJames Morris <james.l.morris@oracle.com>
      0c5fe1b4
    • W
      sk_run_filter: add BPF_S_ANC_SECCOMP_LD_W · 46b325c7
      Will Drewry 提交于
      Introduces a new BPF ancillary instruction that all LD calls will be
      mapped through when skb_run_filter() is being used for seccomp BPF.  The
      rewriting will be done using a secondary chk_filter function that is run
      after skb_chk_filter.
      
      The code change is guarded by CONFIG_SECCOMP_FILTER which is added,
      along with the seccomp_bpf_load() function later in this series.
      
      This is based on http://lkml.org/lkml/2012/3/2/141Suggested-by: NIndan Zupancic <indan@nul.nu>
      Signed-off-by: NWill Drewry <wad@chromium.org>
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      Acked-by: NEric Paris <eparis@redhat.com>
      
      v18: rebase
      ...
      v15: include seccomp.h explicitly for when seccomp_bpf_load exists.
      v14: First cut using a single additional instruction
      ... v13: made bpf functions generic.
      Signed-off-by: NJames Morris <james.l.morris@oracle.com>
      46b325c7
    • A
      Add PR_{GET,SET}_NO_NEW_PRIVS to prevent execve from granting privs · 259e5e6c
      Andy Lutomirski 提交于
      With this change, calling
        prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)
      disables privilege granting operations at execve-time.  For example, a
      process will not be able to execute a setuid binary to change their uid
      or gid if this bit is set.  The same is true for file capabilities.
      
      Additionally, LSM_UNSAFE_NO_NEW_PRIVS is defined to ensure that
      LSMs respect the requested behavior.
      
      To determine if the NO_NEW_PRIVS bit is set, a task may call
        prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
      It returns 1 if set and 0 if it is not set. If any of the arguments are
      non-zero, it will return -1 and set errno to -EINVAL.
      (PR_SET_NO_NEW_PRIVS behaves similarly.)
      
      This functionality is desired for the proposed seccomp filter patch
      series.  By using PR_SET_NO_NEW_PRIVS, it allows a task to modify the
      system call behavior for itself and its child tasks without being
      able to impact the behavior of a more privileged task.
      
      Another potential use is making certain privileged operations
      unprivileged.  For example, chroot may be considered "safe" if it cannot
      affect privileged tasks.
      
      Note, this patch causes execve to fail when PR_SET_NO_NEW_PRIVS is
      set and AppArmor is in use.  It is fixed in a subsequent patch.
      Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: NWill Drewry <wad@chromium.org>
      Acked-by: NEric Paris <eparis@redhat.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      
      v18: updated change desc
      v17: using new define values as per 3.4
      Signed-off-by: NJames Morris <james.l.morris@oracle.com>
      259e5e6c
    • M
      skbuff: struct ubuf_info callback type safety · ca8f4fb2
      Michael S. Tsirkin 提交于
      The skb struct ubuf_info callback gets passed struct ubuf_info
      itself, not the arg value as the field name and the function signature
      seem to imply. Rename the arg field to ctx to match usage,
      add documentation and change the callback argument type
      to make usage clear and to have compiler check correctness.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ca8f4fb2
  13. 13 4月, 2012 2 次提交
    • M
      ARM: 7366/3: amba: Remove AMBA level regulator support · 1e45860f
      Mark Brown 提交于
      The AMBA bus regulator support is being used to model on/off switches
      for power domains which isn't terribly idiomatic for modern kernels with
      the generic power domain code and creates integration problems on platforms
      which don't use regulators for their power domains as it's hard to tell
      the difference between a regulator that is needed but failed to be provided
      and one that isn't supposed to be there (though DT does make that easier).
      
      Platforms that wish to use the regulator API to manage their power domains
      can indirect via the power domain interface.
      
      This feature is only used with the vape supply of the db8500 PRCMU
      driver which supplies the UARTs and MMC controllers, none of which have
      support for managing vcore at runtime in mainline (only pl022 SPI
      controller does).  Update that supply to have an always_on constraint
      until the power domain support for the system is updated so that it is
      enabled for these users, this is likely to have no impact on practical
      systems as probably at least one of these devices will be active and
      cause AMBA to hold the supply on anyway.
      Signed-off-by: NMark Brown <broonie@opensource.wolfsonmicro.com>
      Acked-by: NLinus Walleij <linus.walleij@linaro.org>
      Tested-by: NShawn Guo <shawn.guo@linaro.org>
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      1e45860f
    • P
      kconfig: fix IS_ENABLED to not require all options to be defined · 69349c2d
      Paul Gortmaker 提交于
      Using IS_ENABLED() within C (vs.  within CPP #if statements) in its
      current form requires us to actually define every possible bool/tristate
      Kconfig option twice (__enabled_* and __enabled_*_MODULE variants).
      
      This results in a huge autoconf.h file, on the order of 16k lines for a
      x86_64 defconfig.
      
      Fixing IS_ENABLED to be able to work on the smaller subset of just
      things that we really have defined is step one to fixing this.  Which
      means it has to not choke when fed non-enabled options, such as:
      
        include/linux/netdevice.h:964:1: warning: "__enabled_CONFIG_FCOE_MODULE" is not defined [-Wundef]
      
      The original prototype of how to implement a C and preprocessor
      compatible way of doing this came from the Google+ user "comex ." in
      response to Linus' crowdsourcing challenge for a possible improvement on
      his earlier C specific solution:
      
      	#define config_enabled(x)       (__stringify(x)[0] == '1')
      
      In this implementation, I've chosen variable names that hopefully make
      how it works more understandable.
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      69349c2d
  14. 12 4月, 2012 3 次提交
    • M
      fuse: use flexible array in fuse.h · c628ee67
      Miklos Szeredi 提交于
      Use the ISO C standard compliant form instead of the gcc extension in the
      interface definition.
      Reported-by: NShachar Sharon <ssnail@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      c628ee67
    • G
      irq_domain: Move irq_virq_count into NOMAP revmap · 6fa6c8e2
      Grant Likely 提交于
      This patch replaces the old global setting of irq_virq_count that is only
      used by the NOMAP mapping and instead uses a revmap_data property so that
      the maximum NOMAP allocation can be set per NOMAP irq_domain.
      
      There is exactly one user of irq_virq_count in-tree right now: PS3.
      Also, irq_virq_count is only useful for the NOMAP mapping.  So,
      instead of having a single global irq_virq_count values, this change
      drops it entirely and added a max_irq argument to irq_domain_add_nomap().
      That makes it a property of an individual nomap irq domain instead of
      a global system settting.
      Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>
      Tested-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Milton Miller <miltonm@bga.com>
      6fa6c8e2
    • A
      KVM: unmap pages from the iommu when slots are removed · 32f6daad
      Alex Williamson 提交于
      We've been adding new mappings, but not destroying old mappings.
      This can lead to a page leak as pages are pinned using
      get_user_pages, but only unpinned with put_page if they still
      exist in the memslots list on vm shutdown.  A memslot that is
      destroyed while an iommu domain is enabled for the guest will
      therefore result in an elevated page reference count that is
      never cleared.
      
      Additionally, without this fix, the iommu is only programmed
      with the first translation for a gpa.  This can result in
      peer-to-peer errors if a mapping is destroyed and replaced by a
      new mapping at the same gpa as the iommu will still be pointing
      to the original, pinned memory address.
      Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      32f6daad
  15. 11 4月, 2012 1 次提交
    • E
      tcp: avoid order-1 allocations on wifi and tx path · a21d4572
      Eric Dumazet 提交于
      Marc Merlin reported many order-1 allocations failures in TX path on its
      wireless setup, that dont make any sense with MTU=1500 network, and non
      SG capable hardware.
      
      After investigation, it turns out TCP uses sk_stream_alloc_skb() and
      used as a convention skb_tailroom(skb) to know how many bytes of data
      payload could be put in this skb (for non SG capable devices)
      
      Note : these skb used kmalloc-4096 (MTU=1500 + MAX_HEADER +
      sizeof(struct skb_shared_info) being above 2048)
      
      Later, mac80211 layer need to add some bytes at the tail of skb
      (IEEE80211_ENCRYPT_TAILROOM = 18 bytes) and since no more tailroom is
      available has to call pskb_expand_head() and request order-1
      allocations.
      
      This patch changes sk_stream_alloc_skb() so that only
      sk->sk_prot->max_header bytes of headroom are reserved, and use a new
      skb field, avail_size to hold the data payload limit.
      
      This way, order-0 allocations done by TCP stack can leave more than 2 KB
      of tailroom and no more allocation is performed in mac80211 layer (or
      any layer needing some tailroom)
      
      avail_size is unioned with mark/dropcount, since mark will be set later
      in IP stack for output packets. Therefore, skb size is unchanged.
      Reported-by: NMarc MERLIN <marc@merlins.org>
      Tested-by: NMarc MERLIN <marc@merlins.org>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a21d4572