1. 01 7月, 2017 1 次提交
    • K
      randstruct: Mark various structs for randomization · 3859a271
      Kees Cook 提交于
      This marks many critical kernel structures for randomization. These are
      structures that have been targeted in the past in security exploits, or
      contain functions pointers, pointers to function pointer tables, lists,
      workqueues, ref-counters, credentials, permissions, or are otherwise
      sensitive. This initial list was extracted from Brad Spengler/PaX Team's
      code in the last public patch of grsecurity/PaX based on my understanding
      of the code. Changes or omissions from the original code are mine and
      don't reflect the original grsecurity/PaX code.
      
      Left out of this list is task_struct, which requires special handling
      and will be covered in a subsequent patch.
      Signed-off-by: NKees Cook <keescook@chromium.org>
      3859a271
  2. 05 9月, 2016 1 次提交
  3. 08 2月, 2016 1 次提交
  4. 24 11月, 2015 1 次提交
    • R
      unix: avoid use-after-free in ep_remove_wait_queue · 7d267278
      Rainer Weikusat 提交于
      Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
      An AF_UNIX datagram socket being the client in an n:1 association with
      some server socket is only allowed to send messages to the server if the
      receive queue of this socket contains at most sk_max_ack_backlog
      datagrams. This implies that prospective writers might be forced to go
      to sleep despite none of the message presently enqueued on the server
      receive queue were sent by them. In order to ensure that these will be
      woken up once space becomes again available, the present unix_dgram_poll
      routine does a second sock_poll_wait call with the peer_wait wait queue
      of the server socket as queue argument (unix_dgram_recvmsg does a wake
      up on this queue after a datagram was received). This is inherently
      problematic because the server socket is only guaranteed to remain alive
      for as long as the client still holds a reference to it. In case the
      connection is dissolved via connect or by the dead peer detection logic
      in unix_dgram_sendmsg, the server socket may be freed despite "the
      polling mechanism" (in particular, epoll) still has a pointer to the
      corresponding peer_wait queue. There's no way to forcibly deregister a
      wait queue with epoll.
      
      Based on an idea by Jason Baron, the patch below changes the code such
      that a wait_queue_t belonging to the client socket is enqueued on the
      peer_wait queue of the server whenever the peer receive queue full
      condition is detected by either a sendmsg or a poll. A wake up on the
      peer queue is then relayed to the ordinary wait queue of the client
      socket via wake function. The connection to the peer wait queue is again
      dissolved if either a wake up is about to be relayed or the client
      socket reconnects or a dead peer is detected or the client socket is
      itself closed. This enables removing the second sock_poll_wait from
      unix_dgram_poll, thus avoiding the use-after-free, while still ensuring
      that no blocked writer sleeps forever.
      Signed-off-by: NRainer Weikusat <rweikusat@mobileactivedefense.com>
      Fixes: ec0d215f ("af_unix: fix 'poll for write'/connected DGRAM sockets")
      Reviewed-by: NJason Baron <jbaron@akamai.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7d267278
  5. 08 10月, 2015 1 次提交
  6. 30 9月, 2015 1 次提交
  7. 11 6月, 2015 1 次提交
  8. 10 8月, 2013 1 次提交
    • E
      af_unix: improve STREAM behavior with fragmented memory · e370a723
      Eric Dumazet 提交于
      unix_stream_sendmsg() currently uses order-2 allocations,
      and we had numerous reports this can fail.
      
      The __GFP_REPEAT flag present in sock_alloc_send_pskb() is
      not helping.
      
      This patch extends the work done in commit eb6a2481
      ("af_unix: reduce high order page allocations) for
      datagram sockets.
      
      This opens the possibility of zero copy IO (splice() and
      friends)
      
      The trick is to not use skb_pull() anymore in recvmsg() path,
      and instead add a @consumed field in UNIXCB() to track amount
      of already read payload in the skb.
      
      There is a performance regression for large sends
      because of extra page allocations that will be addressed
      in a follow-up patch, allowing sock_alloc_send_pskb()
      to attempt high order page allocations.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e370a723
  9. 01 8月, 2013 1 次提交
  10. 02 5月, 2013 1 次提交
    • E
      af_unix: fix a fatal race with bit fields · 60bc851a
      Eric Dumazet 提交于
      Using bit fields is dangerous on ppc64/sparc64, as the compiler [1]
      uses 64bit instructions to manipulate them.
      If the 64bit word includes any atomic_t or spinlock_t, we can lose
      critical concurrent changes.
      
      This is happening in af_unix, where unix_sk(sk)->gc_candidate/
      gc_maybe_cycle/lock share the same 64bit word.
      
      This leads to fatal deadlock, as one/several cpus spin forever
      on a spinlock that will never be available again.
      
      A safer way would be to use a long to store flags.
      This way we are sure compiler/arch wont do bad things.
      
      As we own unix_gc_lock spinlock when clearing or setting bits,
      we can use the non atomic __set_bit()/__clear_bit().
      
      recursion_level can share the same 64bit location with the spinlock,
      as it is set only with this spinlock held.
      
      [1] bug fixed in gcc-4.8.0 :
      http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080Reported-by: NAmbrose Feinstein <ambrose@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      60bc851a
  11. 08 4月, 2013 1 次提交
  12. 22 10月, 2012 1 次提交
  13. 09 6月, 2012 1 次提交
    • E
      af_unix: speedup /proc/net/unix · 7123aaa3
      Eric Dumazet 提交于
      /proc/net/unix has quadratic behavior, and can hold unix_table_lock for
      a while if high number of unix sockets are alive. (90 ms for 200k
      sockets...)
      
      We already have a hash table, so its quite easy to use it.
      
      Problem is unbound sockets are still hashed in a single hash slot
      (unix_socket_table[UNIX_HASH_TABLE])
      
      This patch also spreads unbound sockets to 256 hash slots, to speedup
      both /proc/net/unix and unix_diag.
      
      Time to read /proc/net/unix with 200k unix sockets :
      (time dd if=/proc/net/unix of=/dev/null bs=4k)
      
      before : 520 secs
      after : 2 secs
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7123aaa3
  14. 16 4月, 2012 1 次提交
  15. 21 3月, 2012 1 次提交
  16. 31 12月, 2011 1 次提交
  17. 17 12月, 2011 1 次提交
  18. 25 4月, 2011 1 次提交
  19. 30 11月, 2010 1 次提交
  20. 17 6月, 2010 1 次提交
  21. 02 5月, 2010 1 次提交
    • E
      net: sock_def_readable() and friends RCU conversion · 43815482
      Eric Dumazet 提交于
      sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we
      need two atomic operations (and associated dirtying) per incoming
      packet.
      
      RCU conversion is pretty much needed :
      
      1) Add a new structure, called "struct socket_wq" to hold all fields
      that will need rcu_read_lock() protection (currently: a
      wait_queue_head_t and a struct fasync_struct pointer).
      
      [Future patch will add a list anchor for wakeup coalescing]
      
      2) Attach one of such structure to each "struct socket" created in
      sock_alloc_inode().
      
      3) Respect RCU grace period when freeing a "struct socket_wq"
      
      4) Change sk_sleep pointer in "struct sock" by sk_wq, pointer to "struct
      socket_wq"
      
      5) Change sk_sleep() function to use new sk->sk_wq instead of
      sk->sk_sleep
      
      6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside
      a rcu_read_lock() section.
      
      7) Change all sk_has_sleeper() callers to :
        - Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock)
        - Use wq_has_sleeper() to eventually wakeup tasks.
        - Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock)
      
      8) sock_wake_async() is modified to use rcu protection as well.
      
      9) Exceptions :
        macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq"
      instead of dynamically allocated ones. They dont need rcu freeing.
      
      Some cleanups or followups are probably needed, (possible
      sk_callback_lock conversion to a spinlock for example...).
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      43815482
  22. 27 11月, 2008 1 次提交
  23. 10 11月, 2008 1 次提交
    • M
      net: unix: fix inflight counting bug in garbage collector · 6209344f
      Miklos Szeredi 提交于
      Previously I assumed that the receive queues of candidates don't
      change during the GC.  This is only half true, nothing can be received
      from the queues (see comment in unix_gc()), but buffers could be added
      through the other half of the socket pair, which may still have file
      descriptors referring to it.
      
      This can result in inc_inflight_move_tail() erronously increasing the
      "inflight" counter for a unix socket for which dec_inflight() wasn't
      previously called.  This in turn can trigger the "BUG_ON(total_refs <
      inflight_refs)" in a later garbage collection run.
      
      Fix this by only manipulating the "inflight" counter for sockets which
      are candidates themselves.  Duplicating the file references in
      unix_attach_fds() is also needed to prevent a socket becoming a
      candidate for GC while the skb that contains it is not yet queued.
      Reported-by: NAndrea Bittau <a.bittau@cs.ucl.ac.uk>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      CC: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6209344f
  24. 27 7月, 2008 1 次提交
  25. 29 1月, 2008 2 次提交
  26. 11 11月, 2007 1 次提交
  27. 31 7月, 2007 1 次提交
  28. 12 7月, 2007 1 次提交
    • M
      [AF_UNIX]: Rewrite garbage collector, fixes race. · 1fd05ba5
      Miklos Szeredi 提交于
      Throw out the old mark & sweep garbage collector and put in a
      refcounting cycle detecting one.
      
      The old one had a race with recvmsg, that resulted in false positives
      and hence data loss.  The old algorithm operated on all unix sockets
      in the system, so any additional locking would have meant performance
      problems for all users of these.
      
      The new algorithm instead only operates on "in flight" sockets, which
      are very rare, and the additional locking for these doesn't negatively
      impact the vast majority of users.
      
      In fact it's probable, that there weren't *any* heavy senders of
      sockets over sockets, otherwise the above race would have been
      discovered long ago.
      
      The patch works OK with the app that exposed the race with the old
      code.  The garbage collection has also been verified to work in a few
      simple cases.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1fd05ba5
  29. 04 6月, 2007 1 次提交
  30. 03 8月, 2006 1 次提交
    • C
      [AF_UNIX]: Kernel memory leak fix for af_unix datagram getpeersec patch · dc49c1f9
      Catherine Zhang 提交于
      From: Catherine Zhang <cxzhang@watson.ibm.com>
      
      This patch implements a cleaner fix for the memory leak problem of the
      original unix datagram getpeersec patch.  Instead of creating a
      security context each time a unix datagram is sent, we only create the
      security context when the receiver requests it.
      
      This new design requires modification of the current
      unix_getsecpeer_dgram LSM hook and addition of two new hooks, namely,
      secid_to_secctx and release_secctx.  The former retrieves the security
      context and the latter releases it.  A hook is required for releasing
      the security context because it is up to the security module to decide
      how that's done.  In the case of Selinux, it's a simple kfree
      operation.
      Acked-by: NStephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dc49c1f9
  31. 04 7月, 2006 1 次提交
  32. 30 6月, 2006 1 次提交
    • C
      [AF_UNIX]: Datagram getpeersec · 877ce7c1
      Catherine Zhang 提交于
      This patch implements an API whereby an application can determine the
      label of its peer's Unix datagram sockets via the auxiliary data mechanism of
      recvmsg.
      
      Patch purpose:
      
      This patch enables a security-aware application to retrieve the
      security context of the peer of a Unix datagram socket.  The application
      can then use this security context to determine the security context for
      processing on behalf of the peer who sent the packet.
      
      Patch design and implementation:
      
      The design and implementation is very similar to the UDP case for INET
      sockets.  Basically we build upon the existing Unix domain socket API for
      retrieving user credentials.  Linux offers the API for obtaining user
      credentials via ancillary messages (i.e., out of band/control messages
      that are bundled together with a normal message).  To retrieve the security
      context, the application first indicates to the kernel such desire by
      setting the SO_PASSSEC option via getsockopt.  Then the application
      retrieves the security context using the auxiliary data mechanism.
      
      An example server application for Unix datagram socket should look like this:
      
      toggle = 1;
      toggle_len = sizeof(toggle);
      
      setsockopt(sockfd, SOL_SOCKET, SO_PASSSEC, &toggle, &toggle_len);
      recvmsg(sockfd, &msg_hdr, 0);
      if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) {
          cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr);
          if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) &&
              cmsg_hdr->cmsg_level == SOL_SOCKET &&
              cmsg_hdr->cmsg_type == SCM_SECURITY) {
              memcpy(&scontext, CMSG_DATA(cmsg_hdr), sizeof(scontext));
          }
      }
      
      sock_setsockopt is enhanced with a new socket option SOCK_PASSSEC to allow
      a server socket to receive security context of the peer.
      
      Testing:
      
      We have tested the patch by setting up Unix datagram client and server
      applications.  We verified that the server can retrieve the security context
      using the auxiliary data mechanism of recvmsg.
      Signed-off-by: NCatherine Zhang <cxzhang@watson.ibm.com>
      Acked-by: NAcked-by: James Morris <jmorris@namei.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      877ce7c1
  33. 26 4月, 2006 1 次提交
  34. 21 3月, 2006 1 次提交
  35. 04 1月, 2006 2 次提交
  36. 30 8月, 2005 1 次提交
  37. 17 4月, 2005 1 次提交
    • L
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds 提交于
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4