1. 31 7月, 2007 1 次提交
  2. 12 7月, 2007 1 次提交
    • M
      [AF_UNIX]: Rewrite garbage collector, fixes race. · 1fd05ba5
      Miklos Szeredi 提交于
      Throw out the old mark & sweep garbage collector and put in a
      refcounting cycle detecting one.
      
      The old one had a race with recvmsg, that resulted in false positives
      and hence data loss.  The old algorithm operated on all unix sockets
      in the system, so any additional locking would have meant performance
      problems for all users of these.
      
      The new algorithm instead only operates on "in flight" sockets, which
      are very rare, and the additional locking for these doesn't negatively
      impact the vast majority of users.
      
      In fact it's probable, that there weren't *any* heavy senders of
      sockets over sockets, otherwise the above race would have been
      discovered long ago.
      
      The patch works OK with the app that exposed the race with the old
      code.  The garbage collection has also been verified to work in a few
      simple cases.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1fd05ba5
  3. 11 7月, 2007 1 次提交
  4. 08 6月, 2007 1 次提交
    • M
      [AF_UNIX]: Fix stream recvmsg() race. · 3c0d2f37
      Miklos Szeredi 提交于
      A recv() on an AF_UNIX, SOCK_STREAM socket can race with a
      send()+close() on the peer, causing recv() to return zero, even though
      the sent data should be received.
      
      This happens if the send() and the close() is performed between
      skb_dequeue() and checking sk->sk_shutdown in unix_stream_recvmsg():
      
      process A  skb_dequeue() returns NULL, there's no data in the socket queue
      process B  new data is inserted onto the queue by unix_stream_sendmsg()
      process B  sk->sk_shutdown is set to SHUTDOWN_MASK by unix_release_sock()
      process A  sk->sk_shutdown is checked, unix_release_sock() returns zero
      
      I'm surprised nobody noticed this, it's not hard to trigger.  Maybe
      it's just (un)luck with the timing.
      
      It's possible to work around this bug in userspace, by retrying the
      recv() once in case of a zero return value.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3c0d2f37
  5. 04 6月, 2007 2 次提交
    • D
      [AF_UNIX]: Fix datagram connect race causing an OOPS. · 278a3de5
      David S. Miller 提交于
      Based upon an excellent bug report and initial patch by
      Frederik Deweerdt.
      
      The UNIX datagram connect code blindly dereferences other->sk_socket
      via the call down to the security_unix_may_send() function.
      
      Without locking 'other' that pointer can go NULL via unix_release_sock()
      which does sock_orphan() which also marks the socket SOCK_DEAD.
      
      So we have to lock both 'sk' and 'other' yet avoid all kinds of
      potential deadlocks (connect to self is OK for datagram sockets and it
      is possible for two datagram sockets to perform a simultaneous connect
      to each other).  So what we do is have a "double lock" function similar
      to how we handle this situation in other areas of the kernel.  We take
      the lock of the socket pointer with the smallest address first in
      order to avoid ABBA style deadlocks.
      
      Once we have them both locked, we check to see if SOCK_DEAD is set
      for 'other' and if so, drop everything and retry the lookup.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      278a3de5
    • D
      [AF_UNIX]: Make socket locking much less confusing. · 1c92b4e5
      David S. Miller 提交于
      The unix_state_*() locking macros imply that there is some
      rwlock kind of thing going on, but the implementation is
      actually a spinlock which makes the code more confusing than
      it needs to be.
      
      So use plain unix_state_lock and unix_state_unlock.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1c92b4e5
  6. 09 5月, 2007 1 次提交
  7. 26 4月, 2007 1 次提交
  8. 07 3月, 2007 1 次提交
  9. 03 3月, 2007 1 次提交
  10. 13 2月, 2007 1 次提交
  11. 11 2月, 2007 1 次提交
  12. 03 12月, 2006 1 次提交
  13. 23 9月, 2006 2 次提交
  14. 03 8月, 2006 1 次提交
    • C
      [AF_UNIX]: Kernel memory leak fix for af_unix datagram getpeersec patch · dc49c1f9
      Catherine Zhang 提交于
      From: Catherine Zhang <cxzhang@watson.ibm.com>
      
      This patch implements a cleaner fix for the memory leak problem of the
      original unix datagram getpeersec patch.  Instead of creating a
      security context each time a unix datagram is sent, we only create the
      security context when the receiver requests it.
      
      This new design requires modification of the current
      unix_getsecpeer_dgram LSM hook and addition of two new hooks, namely,
      secid_to_secctx and release_secctx.  The former retrieves the security
      context and the latter releases it.  A hook is required for releasing
      the security context because it is up to the security module to decide
      how that's done.  In the case of Selinux, it's a simple kfree
      operation.
      Acked-by: NStephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dc49c1f9
  15. 22 7月, 2006 1 次提交
  16. 04 7月, 2006 2 次提交
  17. 01 7月, 2006 1 次提交
  18. 30 6月, 2006 1 次提交
    • C
      [AF_UNIX]: Datagram getpeersec · 877ce7c1
      Catherine Zhang 提交于
      This patch implements an API whereby an application can determine the
      label of its peer's Unix datagram sockets via the auxiliary data mechanism of
      recvmsg.
      
      Patch purpose:
      
      This patch enables a security-aware application to retrieve the
      security context of the peer of a Unix datagram socket.  The application
      can then use this security context to determine the security context for
      processing on behalf of the peer who sent the packet.
      
      Patch design and implementation:
      
      The design and implementation is very similar to the UDP case for INET
      sockets.  Basically we build upon the existing Unix domain socket API for
      retrieving user credentials.  Linux offers the API for obtaining user
      credentials via ancillary messages (i.e., out of band/control messages
      that are bundled together with a normal message).  To retrieve the security
      context, the application first indicates to the kernel such desire by
      setting the SO_PASSSEC option via getsockopt.  Then the application
      retrieves the security context using the auxiliary data mechanism.
      
      An example server application for Unix datagram socket should look like this:
      
      toggle = 1;
      toggle_len = sizeof(toggle);
      
      setsockopt(sockfd, SOL_SOCKET, SO_PASSSEC, &toggle, &toggle_len);
      recvmsg(sockfd, &msg_hdr, 0);
      if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) {
          cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr);
          if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) &&
              cmsg_hdr->cmsg_level == SOL_SOCKET &&
              cmsg_hdr->cmsg_type == SCM_SECURITY) {
              memcpy(&scontext, CMSG_DATA(cmsg_hdr), sizeof(scontext));
          }
      }
      
      sock_setsockopt is enhanced with a new socket option SOCK_PASSSEC to allow
      a server socket to receive security context of the peer.
      
      Testing:
      
      We have tested the patch by setting up Unix datagram client and server
      applications.  We verified that the server can retrieve the security context
      using the auxiliary data mechanism of recvmsg.
      Signed-off-by: NCatherine Zhang <cxzhang@watson.ibm.com>
      Acked-by: NAcked-by: James Morris <jmorris@namei.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      877ce7c1
  19. 26 3月, 2006 1 次提交
    • D
      [PATCH] POLLRDHUP/EPOLLRDHUP handling for half-closed devices notifications · f348d70a
      Davide Libenzi 提交于
      Implement the half-closed devices notifiation, by adding a new POLLRDHUP
      (and its alias EPOLLRDHUP) bit to the existing poll/select sets.  Since the
      existing POLLHUP handling, that does not report correctly half-closed
      devices, was feared to be changed, this implementation leaves the current
      POLLHUP reporting unchanged and simply add a new bit that is set in the few
      places where it makes sense.  The same thing was discussed and conceptually
      agreed quite some time ago:
      
      http://lkml.org/lkml/2003/7/12/116
      
      Since this new event bit is added to the existing Linux poll infrastruture,
      even the existing poll/select system calls will be able to use it.  As far
      as the existing POLLHUP handling, the patch leaves it as is.  The
      pollrdhup-2.6.16.rc5-0.10.diff defines the POLLRDHUP for all the existing
      archs and sets the bit in the six relevant files.  The other attached diff
      is the simple change required to sys/epoll.h to add the EPOLLRDHUP
      definition.
      
      There is "a stupid program" to test POLLRDHUP delivery here:
      
       http://www.xmailserver.org/pollrdhup-test.c
      
      It tests poll(2), but since the delivery is same epoll(2) will work equally.
      Signed-off-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      f348d70a
  20. 21 3月, 2006 2 次提交
  21. 09 3月, 2006 1 次提交
    • D
      [PATCH] fix file counting · 529bf6be
      Dipankar Sarma 提交于
      I have benchmarked this on an x86_64 NUMA system and see no significant
      performance difference on kernbench.  Tested on both x86_64 and powerpc.
      
      The way we do file struct accounting is not very suitable for batched
      freeing.  For scalability reasons, file accounting was
      constructor/destructor based.  This meant that nr_files was decremented
      only when the object was removed from the slab cache.  This is susceptible
      to slab fragmentation.  With RCU based file structure, consequent batched
      freeing and a test program like Serge's, we just speed this up and end up
      with a very fragmented slab -
      
      llm22:~ # cat /proc/sys/fs/file-nr
      587730  0       758844
      
      At the same time, I see only a 2000+ objects in filp cache.  The following
      patch I fixes this problem.
      
      This patch changes the file counting by removing the filp_count_lock.
      Instead we use a separate percpu counter, nr_files, for now and all
      accesses to it are through get_nr_files() api.  In the sysctl handler for
      nr_files, we populate files_stat.nr_files before returning to user.
      
      Counting files as an when they are created and destroyed (as opposed to
      inside slab) allows us to correctly count open files with RCU.
      Signed-off-by: NDipankar Sarma <dipankar@in.ibm.com>
      Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      529bf6be
  22. 10 1月, 2006 1 次提交
  23. 04 1月, 2006 5 次提交
  24. 09 11月, 2005 1 次提交
  25. 30 8月, 2005 2 次提交
  26. 09 7月, 2005 1 次提交
  27. 20 5月, 2005 1 次提交
  28. 26 4月, 2005 2 次提交
    • A
      [NET]: kill gratitious includes of major.h · 5523662c
      Al Viro 提交于
      	A lot of places in there are including major.h for no reason
      whatsoever.  Removed.  And yes, it still builds.
      
      	The history of that stuff is often amusing.  E.g. for net/core/sock.c
      the story looks so, as far as I've been able to reconstruct it: we used to
      need major.h in net/socket.c circa 1.1.early.  In 1.1.13 that need had
      disappeared, along with register_chrdev(SOCKET_MAJOR, "socket", &net_fops)
      in sock_init().  Include had not.  When 1.2 -> 1.3 reorg of net/* had moved
      a lot of stuff from net/socket.c to net/core/sock.c, this crap had followed...
      Signed-off-by: NAl Viro <viro@parcelfarce.linux.theplanet.co.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5523662c
    • A
      [PATCH] kill gratitious includes of major.h under net/* · b453257f
      Al Viro 提交于
      A lot of places in there are including major.h for no reason whatsoever.
      Removed.  And yes, it still builds. 
      
      The history of that stuff is often amusing.  E.g.  for net/core/sock.c
      the story looks so, as far as I've been able to reconstruct it: we used
      to need major.h in net/socket.c circa 1.1.early.  In 1.1.13 that need
      had disappeared, along with register_chrdev(SOCKET_MAJOR, "socket",
      &net_fops) in sock_init().  Include had not.  When 1.2 -> 1.3 reorg of
      net/* had moved a lot of stuff from net/socket.c to net/core/sock.c,
      this crap had followed... 
      Signed-off-by: NAl Viro <viro@parcelfarce.linux.theplanet.co.uk>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      b453257f
  29. 17 4月, 2005 1 次提交
    • L
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds 提交于
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4