1. 29 9月, 2011 1 次提交
    • E
      af_unix: dont send SCM_CREDENTIALS by default · 16e57262
      Eric Dumazet 提交于
      Since commit 7361c36c (af_unix: Allow credentials to work across
      user and pid namespaces) af_unix performance dropped a lot.
      
      This is because we now take a reference on pid and cred in each write(),
      and release them in read(), usually done from another process,
      eventually from another cpu. This triggers false sharing.
      
      # Events: 154K cycles
      #
      # Overhead  Command       Shared Object        Symbol
      # ........  .......  ..................  .........................
      #
          10.40%  hackbench  [kernel.kallsyms]   [k] put_pid
           8.60%  hackbench  [kernel.kallsyms]   [k] unix_stream_recvmsg
           7.87%  hackbench  [kernel.kallsyms]   [k] unix_stream_sendmsg
           6.11%  hackbench  [kernel.kallsyms]   [k] do_raw_spin_lock
           4.95%  hackbench  [kernel.kallsyms]   [k] unix_scm_to_skb
           4.87%  hackbench  [kernel.kallsyms]   [k] pid_nr_ns
           4.34%  hackbench  [kernel.kallsyms]   [k] cred_to_ucred
           2.39%  hackbench  [kernel.kallsyms]   [k] unix_destruct_scm
           2.24%  hackbench  [kernel.kallsyms]   [k] sub_preempt_count
           1.75%  hackbench  [kernel.kallsyms]   [k] fget_light
           1.51%  hackbench  [kernel.kallsyms]   [k]
      __mutex_lock_interruptible_slowpath
           1.42%  hackbench  [kernel.kallsyms]   [k] sock_alloc_send_pskb
      
      This patch includes SCM_CREDENTIALS information in a af_unix message/skb
      only if requested by the sender, [man 7 unix for details how to include
      ancillary data using sendmsg() system call]
      
      Note: This might break buggy applications that expected SCM_CREDENTIAL
      from an unaware write() system call, and receiver not using SO_PASSCRED
      socket option.
      
      If SOCK_PASSCRED is set on source or destination socket, we still
      include credentials for mere write() syscalls.
      
      Performance boost in hackbench : more than 50% gain on a 16 thread
      machine (2 quad-core cpus, 2 threads per core)
      
      hackbench 20 thread 2000
      
      4.228 sec instead of 9.102 sec
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Acked-by: NTim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      16e57262
  2. 17 9月, 2011 1 次提交
  3. 25 8月, 2011 1 次提交
    • T
      Scm: Remove unnecessary pid & credential references in Unix socket's send and receive path · 0856a304
      Tim Chen 提交于
      Patch series 109f6e39..7361c36c back in 2.6.36 added functionality to
      allow credentials to work across pid namespaces for packets sent via
      UNIX sockets.  However, the atomic reference counts on pid and
      credentials caused plenty of cache bouncing when there are numerous
      threads of the same pid sharing a UNIX socket.  This patch mitigates the
      problem by eliminating extraneous reference counts on pid and
      credentials on both send and receive path of UNIX sockets. I found a 2x
      improvement in hackbench's threaded case.
      
      On the receive path in unix_dgram_recvmsg, currently there is an
      increment of reference count on pid and credentials in scm_set_cred.
      Then there are two decrement of the reference counts.  Once in scm_recv
      and once when skb_free_datagram call skb->destructor function
      unix_destruct_scm.  One pair of increment and decrement of ref count on
      pid and credentials can be eliminated from the receive path.  Until we
      destroy the skb, we already set a reference when we created the skb on
      the send side.
      
      On the send path, there are two increments of ref count on pid and
      credentials, once in scm_send and once in unix_scm_to_skb.  Then there
      is a decrement of the reference counts in scm_destroy's call to
      scm_destroy_cred at the end of unix_dgram_sendmsg functions.   One pair
      of increment and decrement of the reference counts can be removed so we
      only need to increment the ref counts once.
      
      By incorporating these changes, for hackbench running on a 4 socket
      NHM-EX machine with 40 cores, the execution of hackbench on
      50 groups of 20 threads sped up by factor of 2.
      
      Hackbench command used for testing:
      ./hackbench 50 thread 2000
      Signed-off-by: NTim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0856a304
  4. 25 11月, 2010 1 次提交
  5. 17 6月, 2010 2 次提交
  6. 04 11月, 2009 1 次提交
  7. 06 7月, 2009 1 次提交
  8. 14 11月, 2008 1 次提交
  9. 07 11月, 2008 2 次提交
    • D
      net: Fix recursive descent in __scm_destroy(). · 3b53fbf4
      David S. Miller 提交于
      __scm_destroy() walks the list of file descriptors in the scm_fp_list
      pointed to by the scm_cookie argument.
      
      Those, in turn, can close sockets and invoke __scm_destroy() again.
      
      There is nothing which limits how deeply this can occur.
      
      The idea for how to fix this is from Linus.  Basically, we do all of
      the fput()s at the top level by collecting all of the scm_fp_list
      objects hit by an fput().  Inside of the initial __scm_destroy() we
      keep running the list until it is empty.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3b53fbf4
    • D
      net: Fix recursive descent in __scm_destroy(). · f8d570a4
      David Miller 提交于
      __scm_destroy() walks the list of file descriptors in the scm_fp_list
      pointed to by the scm_cookie argument.
      
      Those, in turn, can close sockets and invoke __scm_destroy() again.
      
      There is nothing which limits how deeply this can occur.
      
      The idea for how to fix this is from Linus.  Basically, we do all of
      the fput()s at the top level by collecting all of the scm_fp_list
      objects hit by an fput().  Inside of the initial __scm_destroy() we
      keep running the list until it is empty.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f8d570a4
  10. 20 10月, 2007 1 次提交
    • P
      pid namespaces: changes to show virtual ids to user · b488893a
      Pavel Emelyanov 提交于
      This is the largest patch in the set. Make all (I hope) the places where
      the pid is shown to or get from user operate on the virtual pids.
      
      The idea is:
       - all in-kernel data structures must store either struct pid itself
         or the pid's global nr, obtained with pid_nr() call;
       - when seeking the task from kernel code with the stored id one
         should use find_task_by_pid() call that works with global pids;
       - when showing pid's numerical value to the user the virtual one
         should be used, but however when one shows task's pid outside this
         task's namespace the global one is to be used;
       - when getting the pid from userspace one need to consider this as
         the virtual one and use appropriate task/pid-searching functions.
      
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: nuther build fix]
      [akpm@linux-foundation.org: yet nuther build fix]
      [akpm@linux-foundation.org: remove unneeded casts]
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Signed-off-by: NAlexey Dobriyan <adobriyan@openvz.org>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Paul Menage <menage@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b488893a
  11. 18 7月, 2007 1 次提交
  12. 03 8月, 2006 1 次提交
    • C
      [AF_UNIX]: Kernel memory leak fix for af_unix datagram getpeersec patch · dc49c1f9
      Catherine Zhang 提交于
      From: Catherine Zhang <cxzhang@watson.ibm.com>
      
      This patch implements a cleaner fix for the memory leak problem of the
      original unix datagram getpeersec patch.  Instead of creating a
      security context each time a unix datagram is sent, we only create the
      security context when the receiver requests it.
      
      This new design requires modification of the current
      unix_getsecpeer_dgram LSM hook and addition of two new hooks, namely,
      secid_to_secctx and release_secctx.  The former retrieves the security
      context and the latter releases it.  A hook is required for releasing
      the security context because it is up to the security module to decide
      how that's done.  In the case of Selinux, it's a simple kfree
      operation.
      Acked-by: NStephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dc49c1f9
  13. 30 6月, 2006 1 次提交
    • C
      [AF_UNIX]: Datagram getpeersec · 877ce7c1
      Catherine Zhang 提交于
      This patch implements an API whereby an application can determine the
      label of its peer's Unix datagram sockets via the auxiliary data mechanism of
      recvmsg.
      
      Patch purpose:
      
      This patch enables a security-aware application to retrieve the
      security context of the peer of a Unix datagram socket.  The application
      can then use this security context to determine the security context for
      processing on behalf of the peer who sent the packet.
      
      Patch design and implementation:
      
      The design and implementation is very similar to the UDP case for INET
      sockets.  Basically we build upon the existing Unix domain socket API for
      retrieving user credentials.  Linux offers the API for obtaining user
      credentials via ancillary messages (i.e., out of band/control messages
      that are bundled together with a normal message).  To retrieve the security
      context, the application first indicates to the kernel such desire by
      setting the SO_PASSSEC option via getsockopt.  Then the application
      retrieves the security context using the auxiliary data mechanism.
      
      An example server application for Unix datagram socket should look like this:
      
      toggle = 1;
      toggle_len = sizeof(toggle);
      
      setsockopt(sockfd, SOL_SOCKET, SO_PASSSEC, &toggle, &toggle_len);
      recvmsg(sockfd, &msg_hdr, 0);
      if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) {
          cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr);
          if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) &&
              cmsg_hdr->cmsg_level == SOL_SOCKET &&
              cmsg_hdr->cmsg_type == SCM_SECURITY) {
              memcpy(&scontext, CMSG_DATA(cmsg_hdr), sizeof(scontext));
          }
      }
      
      sock_setsockopt is enhanced with a new socket option SOCK_PASSSEC to allow
      a server socket to receive security context of the peer.
      
      Testing:
      
      We have tested the patch by setting up Unix datagram client and server
      applications.  We verified that the server can retrieve the security context
      using the auxiliary data mechanism of recvmsg.
      Signed-off-by: NCatherine Zhang <cxzhang@watson.ibm.com>
      Acked-by: NAcked-by: James Morris <jmorris@namei.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      877ce7c1
  14. 21 3月, 2006 1 次提交
    • B
      [AF_UNIX]: scm: better initialization · 1d541ddd
      Benjamin LaHaise 提交于
      Instead of doing a memset then initialization of the fields of the scm
      structure, just initialize all the members explicitly.  Prevent reloading
      of current on x86 and x86-64 by storing the value in a local variable for
      subsequent dereferences.  This is worth a ~7KB/s increase in af_unix
      bandwidth.  Note that we avoid the issues surrounding potentially
      uninitialized members of the ucred structure by constructing a struct
      ucred instead of assigning the members individually, which forces the
      compiler to zero any padding.
      
      [ I modified the patch not to use the aggregate assignment since
        gcc-3.4.x and earlier cannot optimize that properly at all even
        though gcc-4.0.x and later can -DaveM ]
      Signed-off-by: NBenjamin LaHaise <benjamin.c.lahaise@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1d541ddd
  15. 17 4月, 2005 1 次提交
    • L
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds 提交于
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4