1. 01 May 2013 · 4 commits
    • ipc,sem: do not hold ipc lock more than necessary · 16df3674
      Committed by Davidlohr Bueso
      Instead of holding the ipc lock for permissions and security checks, among
      others, only acquire it when necessary.
      
      Some numbers:
      
      1) With Rik's semop-multi.c microbenchmark we can see the following
         results:
      
      Baseline (3.9-rc1):
      cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
      total operations: 151452270, ops/sec 5048409
      
      +  59.40%            a.out  [kernel.kallsyms]  [k] _raw_spin_lock
      +   6.14%            a.out  [kernel.kallsyms]  [k] sys_semtimedop
      +   3.84%            a.out  [kernel.kallsyms]  [k] avc_has_perm_flags
      +   3.64%            a.out  [kernel.kallsyms]  [k] __audit_syscall_exit
      +   2.06%            a.out  [kernel.kallsyms]  [k] copy_user_enhanced_fast_string
      +   1.86%            a.out  [kernel.kallsyms]  [k] ipc_lock
      
      With this patchset:
      cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
      total operations: 273156400, ops/sec 9105213
      
      +  18.54%            a.out  [kernel.kallsyms]  [k] _raw_spin_lock
      +  11.72%            a.out  [kernel.kallsyms]  [k] sys_semtimedop
      +   7.70%            a.out  [kernel.kallsyms]  [k] ipc_has_perm.isra.21
      +   6.58%            a.out  [kernel.kallsyms]  [k] avc_has_perm_flags
      +   6.54%            a.out  [kernel.kallsyms]  [k] __audit_syscall_exit
      +   4.71%            a.out  [kernel.kallsyms]  [k] ipc_obtain_object_check
      
      2) On an Oracle swingbench DSS (data mining) workload the improvements
         are less dramatic than with Rik's benchmark, but still positive.  On
         an 8-socket machine, the percentage of %sys time spent in the ipc
         lock is:
      
      Baseline (3.9-rc1):
      100 swingbench users: 8.74%
      400 swingbench users: 21.86%
      800 swingbench users: 84.35%
      
      With this patchset:
      100 swingbench users: 8.11%
      400 swingbench users: 19.93%
      800 swingbench users: 77.69%
      
      [riel@redhat.com: fix two locking bugs]
      [sasha.levin@oracle.com: prevent releasing RCU read lock twice in semctl_main]
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Chegu Vinod <chegu_vinod@hp.com>
      Acked-by: Michel Lespinasse <walken@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Emmanuel Benisty <benisty.e@gmail.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
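The pattern this commit applies can be illustrated with a userspace analogy: do the permission check without the object lock, and hold the lock only around the state change. This is a minimal sketch, not the kernel code: a pthread mutex stands in for the ipc spinlock, the permission data is assumed to be immutable after creation so it can be read unlocked, and `struct sem_obj`, `obj_has_perm()` and `obj_adjust()` are hypothetical names. It does not model RCU lookup or concurrent deletion, which a real implementation has to handle separately.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical object (sketch, not kernel code): owner_uid is set once at
 * creation and never changes, so it may be read without the lock;
 * value is protected by lock. */
struct sem_obj {
	pthread_mutex_t lock;
	unsigned int owner_uid;
	int value;
};

/* Permission check performed without holding obj->lock. */
static bool obj_has_perm(const struct sem_obj *obj, unsigned int uid)
{
	return uid == 0 || uid == obj->owner_uid;
}

/* Apply delta to the value.  The permission check runs unlocked; the lock
 * covers only the read-modify-write of value. */
static int obj_adjust(struct sem_obj *obj, unsigned int uid, int delta)
{
	assert(obj != NULL);
	if (!obj_has_perm(obj, uid))
		return -EACCES;  /* rejected without touching the lock */

	pthread_mutex_lock(&obj->lock);
	if (obj->value + delta < 0) {
		pthread_mutex_unlock(&obj->lock);
		return -EAGAIN;  /* a real semop could sleep here instead */
	}
	obj->value += delta;
	pthread_mutex_unlock(&obj->lock);
	return 0;
}
```

With this split, a caller that fails the permission check never touches the lock, and successful callers hold it only for the update itself.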
    • ipc: introduce lockless pre_down ipcctl · 444d0f62
      Committed by Davidlohr Bueso
      Various forms of ipc use ipcctl_pre_down() to retrieve an ipc object and
      check permissions, mostly for IPC_RMID and IPC_SET commands.
      
      Introduce ipcctl_pre_down_nolock(), a lockless version of this function.
      The locking version is kept, but modified to call the nolock version
      without changing its semantics, so the change is transparent to all ipc
      callers.
      Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Chegu Vinod <chegu_vinod@hp.com>
      Cc: Emmanuel Benisty <benisty.e@gmail.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
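The refactoring follows a common pattern: the lookup and permission logic moves into a lockless helper, and the original locking entry point becomes a thin wrapper that calls the helper and then takes the lock, so existing callers keep the same semantics. A minimal sketch under stated assumptions: a pthread mutex replaces the ipc lock, errors are returned through an `err` out-parameter instead of the kernel's ERR_PTR convention, and `pre_down_nolock()`, `pre_down()` and `struct obj` are hypothetical stand-ins for ipcctl_pre_down_nolock(), ipcctl_pre_down() and the ipc object.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical ipc object (sketch, not kernel code). */
struct obj {
	pthread_mutex_t lock;
	int in_use;
	unsigned int owner_uid;
};

/* Lockless variant: look up the object and check permission.  Returns NULL
 * and sets *err on failure.  The caller must keep the object alive (the
 * kernel uses RCU for that; here the table is simply never freed). */
static struct obj *pre_down_nolock(struct obj *tab, size_t n, size_t id,
				   unsigned int uid, int *err)
{
	assert(tab != NULL && err != NULL);
	if (id >= n || !tab[id].in_use) {
		*err = EINVAL;
		return NULL;
	}
	if (uid != 0 && uid != tab[id].owner_uid) {
		*err = EPERM;
		return NULL;
	}
	return &tab[id];
}

/* Locking variant kept for existing callers: same checks, then the object
 * is returned with its lock held. */
static struct obj *pre_down(struct obj *tab, size_t n, size_t id,
			    unsigned int uid, int *err)
{
	struct obj *o = pre_down_nolock(tab, n, id, uid, err);

	if (o)
		pthread_mutex_lock(&o->lock);
	return o;
}
```

Callers that only need the checks can use the nolock variant; everyone else keeps calling the locking wrapper unchanged.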
    • ipc: introduce obtaining a lockless ipc object · 4d2bff5e
      Committed by Davidlohr Bueso
      ipc_lock(), and therefore ipc_lock_check(), currently returns the ipc
      object locked.  Not every caller needs the lock, so this can cause
      unnecessary ipc lock contention.
      
      Introduce analogous ipc_obtain_object() and ipc_obtain_object_check()
      functions that only look up and return the ipc object.
      
      Both these functions must be called within an RCU read-side critical
      section.
      
      [akpm@linux-foundation.org: propagate the ipc_obtain_object() errno from ipc_lock()]
      Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Chegu Vinod <chegu_vinod@hp.com>
      Acked-by: Michel Lespinasse <walken@google.com>
      Cc: Emmanuel Benisty <benisty.e@gmail.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
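The split between a plain lookup and a checked lookup can be sketched as follows. This is a simplified model, not the kernel implementation: a fixed array replaces the idr, the RCU read-side critical section is only noted in a comment, and the id layout `id = seq * SEQ_MULT + index` with SEQ_MULT = 32768 is an assumption modeled on ipc/util.h of that period. The error codes are assumed to follow the kernel's convention (a failed lookup yields -EINVAL, a sequence mismatch yields -EIDRM); `struct perm`, `struct ids` and both function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SEQ_MULT 32768  /* assumed id layout: id = seq * SEQ_MULT + index */
#define NSLOTS   8

/* Hypothetical object header: seq changes each time a slot is reused. */
struct perm {
	int seq;
};

/* Hypothetical id table standing in for the idr. */
struct ids {
	struct perm *slot[NSLOTS];
};

/* Plain lookup.  The caller is assumed to be inside an RCU read-side
 * critical section; no lock is taken and nothing is locked on return. */
static int obtain_object(struct ids *ids, int id, struct perm **out)
{
	int idx = id % SEQ_MULT;

	assert(ids != NULL && out != NULL);
	if (idx < 0 || idx >= NSLOTS || ids->slot[idx] == NULL)
		return -EINVAL;
	*out = ids->slot[idx];
	return 0;
}

/* Checked lookup: additionally reject an id whose sequence part no longer
 * matches the object currently occupying that slot. */
static int obtain_object_check(struct ids *ids, int id, struct perm **out)
{
	struct perm *p;
	int err = obtain_object(ids, id, &p);

	if (err)
		return err;
	if (id / SEQ_MULT != p->seq)
		return -EIDRM;
	*out = p;
	return 0;
}
```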
    • ipc: remove bogus lock comment for ipc_checkid · 7bb4deff
      Committed by Davidlohr Bueso
      This series makes the sysv semaphore code more scalable, by reducing the
      time the semaphore lock is held, and making the locking more scalable for
      semaphore arrays with multiple semaphores.
      
      The first four patches were written by Davidlohr Bueso, and reduce the
      hold time of the semaphore lock.
      
      The last three patches make the sysv semaphore locking more
      fine-grained, providing a performance boost when multiple semaphores in
      a semaphore array are being manipulated simultaneously.
      
      On a 24-CPU system, the semop-multi test with N threads and N
      semaphores gives these numbers:
      
      threads   vanilla   Davidlohr's   Davidlohr's +    Davidlohr's +
                          patches       rwlock patches   v3 patches
      10        610652    726325        1783589          2142206
      20        341570    365699        1520453          1977878
      30        288102    307037        1498167          2037995
      40        290714    305955        1612665          2256484
      50        288620    312890        1733453          2650292
      60        289987    306043        1649360          2388008
      70        291298    306347        1723167          2717486
      80        290948    305662        1729545          2763582
      90        290996    306680        1736021          2757524
      100       292243    306700        1773700          3059159
      
      This patch:
      
      There is no reason to hold the ipc lock while reading ipcp->seq, so
      remove the misleading comment.
      
      Also simplify the function's return value.
      Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Cc: Chegu Vinod <chegu_vinod@hp.com>
      Cc: Emmanuel Benisty <benisty.e@gmail.com>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
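A worked example of the id arithmetic behind ipc_checkid(), assuming the layout of that era in which an id is built as `seq * SEQ_MULTIPLIER + index` with SEQ_MULTIPLIER equal to 32768 (IPCMNI). The check reduces to a single integer comparison against the object's current seq, returning nonzero on a mismatch. The helpers below are hypothetical and only mirror that arithmetic.

```c
#include <assert.h>

#define SEQ_MULTIPLIER 32768  /* assumed value (IPCMNI at the time) */

/* Hypothetical helper: compose an id from a slot index and a sequence. */
static int build_id(int index, int seq)
{
	assert(index >= 0 && index < SEQ_MULTIPLIER);
	return SEQ_MULTIPLIER * seq + index;
}

/* Hypothetical helper: recover the slot index from an id. */
static int id_to_index(int id)
{
	return id % SEQ_MULTIPLIER;
}

/* Nonzero if 'id' was issued for an earlier user of the slot, i.e. its
 * sequence part no longer matches the object's current seq. */
static int checkid(int obj_seq, int id)
{
	return id / SEQ_MULTIPLIER != obj_seq;
}
```

For example, index 5 with seq 3 gives 32768 * 3 + 5 = 98309. If the slot is later freed and reused with seq 4, the stale id 98309 still maps to index 5 but fails the check.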
  2. 05 Jan 2013 · 2 commits
    • ipc: introduce message queue copy feature · 4a674f34
      Committed by Stanislav Kinsbursky
      This patch is required for checkpoint/restore in userspace.
      
      C/R needs a way to read all pending IPC messages without removing them
      from the queue: a checkpoint can fail, in which case the tasks are
      resumed, so the queue must remain intact.
      
      To achieve this, a new operation flag, MSG_COPY, is introduced for the
      sys_msgrcv() system call.  If this flag is specified, mtype is
      interpreted as the number of the message to copy.
      
      If MSG_COPY is set, the kernel allocates a dummy message of the passed
      size and then uses the new copy_msg() helper function to copy the
      desired message (instead of unlinking it from the queue).
      
      Notes:
      
      1) Return -ENOSYS if MSG_COPY is specified, but
         CONFIG_CHECKPOINT_RESTORE is not set.
      Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Cc: Serge Hallyn <serge.hallyn@canonical.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
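From user space, the flag is used through msgrcv(2). The sketch below peeks at the message at a given position without removing it. Per the msgrcv(2) man page, positions are numbered from 0, MSG_COPY must be combined with IPC_NOWAIT (so a missing position fails immediately with ENOMSG), and the kernel must be built with CONFIG_CHECKPOINT_RESTORE. The fallback `#define` uses the kernel's value 040000 in case the C library headers do not expose MSG_COPY; `struct text_msg` and `peek_message()` are illustrative names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>

#ifndef MSG_COPY
#define MSG_COPY 040000  /* value from the kernel's include/uapi/linux/msg.h */
#endif

/* Illustrative message layout: mtype followed by a small text payload. */
struct text_msg {
	long mtype;
	char mtext[64];
};

/* Copy the message at position 'pos' (0-based) into *out without
 * dequeuing it.  Returns the payload size, or -1 with errno set
 * (ENOMSG: no such position, ENOSYS: kernel lacks C/R support). */
static ssize_t peek_message(int qid, long pos, struct text_msg *out)
{
	assert(out != NULL);
	return msgrcv(qid, out, sizeof(out->mtext), pos, IPC_NOWAIT | MSG_COPY);
}
```

Because the copied message stays in the queue, a checkpointer can walk positions 0, 1, 2, ... until it gets ENOMSG, leaving the queue untouched if the checkpoint is later abandoned.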
    • ipc: add sysctl to specify desired next object id · 03f59566
      Committed by Stanislav Kinsbursky
      Add three new variables, one "next_id" each for messages, semaphores
      and shared memory, plus sysctls to tune them.  Each variable can be used
      to set the desired id of the next allocated IPC object.  By default it
      is -1, which preserves the old behaviour.  If it is non-negative, the
      desired IDR index is extracted from it and used as the starting value
      when searching for a free IDR slot.
      
      Notes:
      
      1) This patch doesn't guarantee that the new object will get the
         desired id, so it is up to user space to handle a new object with
         the wrong id.
      
      2) After a successful id allocation attempt, "next_id" is set back
         to -1 (if it was non-negative).
      
      [akpm@linux-foundation.org: checkpatch fixes]
      Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Cc: Serge Hallyn <serge.hallyn@canonical.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
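The allocation rule described above can be modeled in a few lines. This is a simplified userspace model, not the kernel's ipc_addid(): a small array replaces the IDR, ids are assumed to use the layout `seq * SEQ_MULT + index` with SEQ_MULT = 32768, and the free-slot search scans upward from the requested index. `struct id_space` and `alloc_id()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define SEQ_MULT 32768  /* assumed id layout: id = seq * SEQ_MULT + index */
#define NSLOTS   16

/* Hypothetical allocator state (model only). */
struct id_space {
	int used[NSLOTS];
	int seq;      /* sequence used when next_id is not set */
	int next_id;  /* -1: default behaviour; >= 0: requested id */
};

/* Allocate an id; returns the new id, or -1 if no slot is free. */
static int alloc_id(struct id_space *s)
{
	int start = 0, seq;

	assert(s != NULL);
	seq = s->seq;
	if (s->next_id >= 0) {
		start = s->next_id % SEQ_MULT;  /* requested index */
		seq = s->next_id / SEQ_MULT;    /* requested sequence part */
	}
	for (int idx = start; idx < NSLOTS; idx++) {
		if (!s->used[idx]) {
			s->used[idx] = 1;
			s->next_id = -1;  /* one-shot: reset after success */
			return seq * SEQ_MULT + idx;
		}
	}
	return -1;
}
```

If the requested index is already taken, the model hands out the next free one instead (requesting 5 while 5 is in use yields 6), which is the situation note 1 leaves to user space.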
  3. 07 Sep 2012 · 1 commit
  4. 31 Jul 2012 · 1 commit
  5. 24 Mar 2011 · 1 commit
  6. 22 Jun 2009 · 1 commit
  7. 19 Jun 2009 · 2 commits
  8. 07 Apr 2009 · 2 commits
    • namespaces: ipc namespaces: implement support for posix msqueues · 7eafd7c7
      Committed by Serge E. Hallyn
      Implement multiple mounts of the mqueue file system, and link it to usage
      of CLONE_NEWIPC.
      
      Each ipc ns has a corresponding mqueuefs superblock.  When a user does
      clone(CLONE_NEWIPC) or unshare(CLONE_NEWIPC), this causes an internal
      mount of a new mqueuefs sb linked to the new ipc ns.
      
      When a user does 'mount -t mqueue mqueue /dev/mqueue', he mounts the
      mqueuefs superblock.
      
      Posix message queues can be used both through the mq_* system calls
      (see mq_overview(7)) and through the VFS via the mqueue mount.  Any
      usage of mq_open() and friends works with the acting task's ipc
      namespace.  Any action through the VFS works with the mqueuefs in
      which the file was created.  So if a user doesn't remount mqueuefs after
      unshare(CLONE_NEWIPC), mq_open("/ab") will not be reflected in "ls
      /dev/mqueue".
      
      If task a mounts mqueue for ipc_ns:1, then clones task b with a new ipcns,
      ipcns:2, and then task a is the last task in ipc_ns:1 to exit, then (1)
      ipc_ns:1 will be freed, (2) its superblock will live on until task b
      umounts the corresponding mqueuefs, and vfs actions will continue to
      succeed, but (3) sb->s_fs_info will be NULL for the sb corresponding to
      the deceased ipc_ns:1.
      
      To make this happen, we must protect the ipc reference count when
      
      a) a task exits and drops its ipcns->count, since it might be dropping
         it to 0 and freeing the ipcns
      
      b) a task accesses the ipcns through its mqueuefs interface, since it
         bumps the ipcns refcount and might race with the last task in the ipcns
         exiting.
      
      So the kref is changed to an atomic_t so we can use
      atomic_dec_and_lock(&ns->count, &mq_lock), and every access to the ipcns
      through ns = mqueuefs_sb->s_fs_info is protected by the same lock.
      Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
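The primitive the commit relies on, atomic_dec_and_lock(), decrements a counter and returns true, with the lock held, only when the counter reaches zero; non-final decrements never take the lock. Here the counter is the ipc namespace's count and the lock is mq_lock, which every access through s_fs_info also takes, so such an access cannot race with the final put. Below is a userspace sketch of those semantics with C11 atomics and a pthread mutex; `dec_and_lock()` is a hypothetical name, not the kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical dec_and_lock(): decrement *count.  If it drops to zero,
 * return true with 'lock' held so the caller can tear the object down
 * before anyone else can reach it under the same lock.  Otherwise return
 * false without leaving the lock held. */
static bool dec_and_lock(atomic_int *count, pthread_mutex_t *lock)
{
	int old = atomic_load(count);

	assert(old > 0);
	/* Fast path: not the last reference, decrement without the lock.
	 * On CAS failure 'old' is reloaded and the loop re-checks it. */
	while (old > 1) {
		if (atomic_compare_exchange_weak(count, &old, old - 1))
			return false;
	}

	/* Slow path: we may hold the last reference, so decide under the lock. */
	pthread_mutex_lock(lock);
	if (atomic_fetch_sub(count, 1) == 1)
		return true;  /* reached zero, lock still held */
	pthread_mutex_unlock(lock);
	return false;
}
```

The design choice is that the expensive case (taking the lock) is paid only by the task that might be dropping the last reference.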
    • namespaces: mqueue ns: move mqueue_mnt into struct ipc_namespace · 614b84cf
      Committed by Serge E. Hallyn
      Move mqueue vfsmount plus a few tunables into the ipc_namespace struct.
      The CONFIG_IPC_NS boolean and the ipc_namespace struct will serve both the
      posix message queue namespaces and the SYSV ipc namespaces.
      
      The sysctl code will be fixed separately in patch 3.  After just this
      patch, making a change to posix mqueue tunables always changes the values
      in the initial ipc namespace.
      Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 26 Jul 2008 · 1 commit
  10. 29 Apr 2008 · 4 commits
  11. 09 Feb 2008 · 3 commits
    • IPC: make struct ipc_ids static in ipc_namespace · ed2ddbf8
      Committed by Pierre Peiffer
      Each ipc_namespace contains a table of three pointers to struct ipc_ids
      (one each for msg, sem and shm; this structure stores all the ipcs).
      These 'struct ipc_ids' are dynamically allocated for each ipc_namespace,
      like the ipc_namespace itself (for the init namespace, they are
      initialized with pointers to static variables instead).
      
      This is for historical reasons: before idr was used to store the ipcs,
      they were stored in variable-length tables sized by the maximum number
      of ipcs allowed.  Now these 'struct ipc_ids' have a fixed size.  Since
      they are allocated for every new ipc_namespace anyway, allocating them
      separately from the struct ipc_namespace saves no memory.
      
      This patch makes this table static in the struct ipc_namespace, so
      everything can be allocated at once and all the code that allocates and
      frees these ipc_ids separately can be removed.
      Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net>
      Acked-by: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Nadia Derbey <Nadia.Derbey@bull.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
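The layout change can be illustrated with a reduced model: with the three tables embedded by value, one allocation covers the namespace and all of its ipc_ids, and teardown is a single free. The structs below are hypothetical simplifications; only the `ids[3]` array (msg, sem, shm) reflects what the commit describes.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reduction of struct ipc_ids. */
struct ipc_ids_model {
	int in_use;  /* number of objects */
	int seq;     /* next sequence number */
};

/* Before: three separately allocated tables reached through pointers. */
struct ns_before {
	struct ipc_ids_model *ids[3];
};

/* After: the tables live inside the namespace itself. */
struct ns_after {
	struct ipc_ids_model ids[3];  /* msg, sem, shm */
};

static struct ns_before *create_before(void)
{
	struct ns_before *ns = calloc(1, sizeof(*ns));

	if (!ns)
		return NULL;
	for (int i = 0; i < 3; i++) {
		ns->ids[i] = calloc(1, sizeof(*ns->ids[i]));
		if (!ns->ids[i]) {  /* unwind the partial allocation */
			while (i--)
				free(ns->ids[i]);
			free(ns);
			return NULL;
		}
	}
	return ns;
}

static void destroy_before(struct ns_before *ns)
{
	assert(ns != NULL);
	for (int i = 0; i < 3; i++)
		free(ns->ids[i]);
	free(ns);
}

/* One allocation and one free; calloc zero-initializes all three tables. */
static struct ns_after *create_after(void)
{
	return calloc(1, sizeof(struct ns_after));
}

static void destroy_after(struct ns_after *ns)
{
	free(ns);
}
```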
    • ipc: uninline some code from util.h · b2d75cdd
      Committed by Pavel Emelyanov
      ipc_lock_check_down(), ipc_lock_check() and ipcget() seem too large to be
      inline.  Besides, inlining them gives no speedup, as they make function
      calls anyway.
      
      Moving them into ipc/util.c saves about 500 bytes of vmlinux and shortens
      the internal IPC API.
      
      $ ./scripts/bloat-o-meter vmlinux-orig vmlinux
      add/remove: 3/2 grow/shrink: 0/10 up/down: 490/-989 (-499)
      function                                     old     new   delta
      ipcget                                         -     392    +392
      ipc_lock_check_down                            -      49     +49
      ipc_lock_check                                 -      49     +49
      sys_semget                                   119     105     -14
      sys_shmget                                   108      86     -22
      sys_msgget                                   100      78     -22
      do_msgsnd                                    665     631     -34
      do_msgrcv                                    680     644     -36
      do_shmat                                     771     733     -38
      sys_msgctl                                  1302    1229     -73
      ipcget_new                                    80       -     -80
      sys_semtimedop                              1534    1452     -82
      sys_semctl                                  2034    1922    -112
      sys_shmctl                                  1919    1765    -154
      ipcget_public                                322       -    -322
      
      The ipcget() growth is the result of gcc inlining the currently static
      ipcget_new/_public.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Cc: Nadia Derbey <Nadia.Derbey@bull.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • namespaces: move the IPC namespace under IPC_NS option · ae5e1b22
      Committed by Pavel Emelyanov
      Currently the IPC namespace management code is spread over the ipc/*.c files.
      I moved this code into the ipc/namespace.c file, which is compiled out when
      not needed.
      
      The linux/ipc_namespace.h file holds the prototypes of the functions in
      namespace.c and the stubs for the NAMESPACES=n case.  This is because the
      stub for copy_ipc_namespace needs the CLONE_NEWIPC flag, which is defined
      in sched.h.  But linux/ipc.h itself is included into many .c files via
      the sys.h->sem.h sequence, so adding sched.h to it would make all those
      .c files depend on sched.h, which is undesirable.  The namespace
      definitions, on the other hand, are needed in only 4 .c files.
      
      This patch also compiles out some auxiliary functions in ipc/sem.c,
      msg.c and shm.c.  Moving these functions into namespace.c turned out to
      be hard because they use many other calls and macros from their original
      files, which would make this patch complicated.  However, all these
      functions can be consolidated, so I will send a separate patch for that
      a bit later.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Kirill Korotaev <dev@sw.ru>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
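The header arrangement described above is the usual kernel stub pattern: with the option enabled the header declares the real function, and with it disabled it provides a static inline stub, which is why the header needs CLONE_NEWIPC. Below is a self-contained userspace sketch, assuming `CONFIG_IPC_NS` is left undefined so the stub is compiled; `copy_ipc_ns()` and `struct ipc_ns_model` are hypothetical, and the stub's behaviour (refuse CLONE_NEWIPC, otherwise keep sharing the caller's namespace) is modeled on, not copied from, the kernel's stub.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>   /* CLONE_NEWIPC: the dependency the commit discusses */
#include <stddef.h>

#ifndef CLONE_NEWIPC
#define CLONE_NEWIPC 0x08000000  /* fallback if the libc hides it */
#endif

/* Hypothetical stand-in for struct ipc_namespace. */
struct ipc_ns_model {
	int refcount;
};

#ifdef CONFIG_IPC_NS
/* Namespaces enabled: only the prototype; the body lives in namespace.c. */
struct ipc_ns_model *copy_ipc_ns(unsigned long flags,
				 struct ipc_ns_model *ns, int *err);
#else
/* Namespaces compiled out: a new ipc namespace cannot be created, so
 * CLONE_NEWIPC is refused and callers keep sharing the current one. */
static inline struct ipc_ns_model *copy_ipc_ns(unsigned long flags,
					       struct ipc_ns_model *ns,
					       int *err)
{
	assert(err != NULL);
	if (flags & CLONE_NEWIPC) {
		*err = EINVAL;
		return NULL;
	}
	return ns;
}
#endif
```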
  12. 20 Oct 2007 · 10 commits
  13. 17 Jul 2007 · 1 commit
  14. 04 Nov 2006 · 1 commit
    • [PATCH] Fix ipc entries removal · c7e12b83
      Committed by Pavel Emelianov
      Fix two issues related to freeing ipc_ids->entries.
      
      1. When freeing an ipc namespace, we need to free the entries
         allocated by ipc_init_ids().
      
      2. When grow_ary() removes old entries, ipc_rcu_putref() may be
         called on entries that ipc_init_ids() set to &ids->nullentry
         earlier.
         This is almost impossible without namespaces, but becomes
         possible with them.
      
      Found during OpenVZ testing after obvious leaks in beancounters.
      Signed-off-by: Pavel Emelianov <xemul@openvz.org>
      Cc: Kirill Korotaev <dev@openvz.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
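The second issue is a shared-sentinel problem: a freshly initialized table points at a static placeholder, and code that releases the "old" table must skip the placeholder rather than drop a reference on it. Below is a minimal sketch of that guard, in the spirit of the fix; `struct entries`, `nullentry`, `grow()` and `free_entries()` are hypothetical simplifications of ipc_ids->entries, &ids->nullentry and grow_ary(), and they use malloc/free instead of ipc_rcu_putref().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for ipc_ids->entries: a size plus object slots. */
struct entries {
	int size;
	void *p[];
};

/* Shared, empty placeholder that every freshly initialized table points at
 * (stand-in for &ids->nullentry).  It is static and must never be freed. */
static struct entries nullentry = { 0 };

/* Hypothetical stand-in for grow_ary(): copy into a larger table and
 * release the old one, skipping the shared placeholder. */
static struct entries *grow(struct entries *old, int newsize)
{
	struct entries *bigger;

	assert(old != NULL);
	if (newsize <= old->size)
		return old;
	bigger = malloc(sizeof(*bigger) + (size_t)newsize * sizeof(void *));
	if (!bigger)
		return old;
	bigger->size = newsize;
	memcpy(bigger->p, old->p, (size_t)old->size * sizeof(void *));
	memset(bigger->p + old->size, 0,
	       (size_t)(newsize - old->size) * sizeof(void *));
	if (old != &nullentry)  /* the placeholder is not ours to release */
		free(old);
	return bigger;
}

/* Namespace teardown (issue 1): free a table that was really allocated,
 * but never the placeholder. */
static void free_entries(struct entries *e)
{
	if (e != &nullentry)
		free(e);
}
```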
  15. 02 Oct 2006 · 1 commit
  16. 27 Mar 2006 · 1 commit
  17. 15 Jan 2006 · 1 commit
  18. 08 Sep 2005 · 1 commit
  19. 13 Jul 2005 · 1 commit
  20. 17 Apr 2005 · 1 commit
    • Linux-2.6.12-rc2 · 1da177e4
      Committed by Linus Torvalds
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!