1. 01 6月, 2012 1 次提交
    • D
      ipc/mqueue: cleanup definition names and locations · 93e6f119
      Doug Ledford 提交于
      Since commit b231cca4 ("message queues: increase range limits") on
      Oct 18, 2008, calls to mq_open() that did not pass in an attribute
      struct and expected to get default values for the size of the queue and
      the max message size now get the system wide maximums instead of
      hardwired defaults like they used to get.
      
      This was uncovered when one of the earlier patches in this patch set
      increased the default system wide maximums at the same time it increased
      the hard ceiling on the system wide maximums (a customer specifically
      needed the hard ceiling brought back up, the new ceiling that commit
      b231cca4 introduced was too low for their production systems).  By
      increasing the default maximums and not realising they were tied to any
      attempt to create a message queue without an attribute struct, I had
      inadvertently made it such that all message queue creation attempts
      without an attribute struct were failing because the new default
      maximums would create a queue that exceeded the default rlimit for
      message queue bytes.
      
      As a result, the system wide defaults were brought back down to their
      previous levels, and the system wide ceilings on the maximums were
      raised to meet the customer's needs.  However, the fact that the no
      attribute struct behavior of mq_open() could be broken by changing the
      system wide maximums for message queues was seen as fundamentally broken
      itself.  So we hardwired the no attribute case back like it used to be.
      But, then we realized that on the very off chance that some piece of
      software in the wild depended on that behavior, we could work around
      that issue by adding two new knobs to /proc that allowed setting the
      defaults for message queues created without an attr struct separately
      from the system wide maximums.
      
      What is not an option IMO is to leave the current behavior in place.  No
      piece of software should ever rely on setting the system wide maximums
      in order to get a desired message queue.  Such a reliance would be so
      fundamentally multitasking OS unfriendly as to not really be tolerable.
      Fortunately, we don't know of any software in the wild that uses this
      except for a regression test program that caught the issue in the first
      place.  If there is though, we have made accommodations with the two new
      /proc knobs (and that's all the accommodations such fundamentally broken
      software can be allowed)..
      
      This patch:
      
      The various defines for minimums and maximums of the sysctl controllable
      mqueue values are scattered amongst different files and named
      inconsistently.  Move them all into ipc_namespace.h and make them have
      consistent names.  Additionally, make the number of queues per namespace
      also have a minimum and maximum and use the same sysctl function as the
      other two settable variables.
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      Acked-by: NSerge E. Hallyn <serue@us.ibm.com>
      Cc: Amerigo Wang <amwang@redhat.com>
      Cc: Joe Korty <joe.korty@ccur.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      93e6f119
  2. 06 5月, 2012 1 次提交
  3. 03 5月, 2012 1 次提交
  4. 08 4月, 2012 2 次提交
  5. 22 3月, 2012 1 次提交
  6. 21 3月, 2012 1 次提交
  7. 16 3月, 2012 1 次提交
    • C
      [PATCH v3] ipc: provide generic compat versions of IPC syscalls · 48b25c43
      Chris Metcalf 提交于
      When using the "compat" APIs, architectures will generally want to
      be able to make direct syscalls to msgsnd(), shmctl(), etc., and
      in the kernel we would want them to be handled directly by
      compat_sys_xxx() functions, as is true for other compat syscalls.
      
      However, for historical reasons, several of the existing compat IPC
      syscalls do not do this.  semctl() expects a pointer to the fourth
      argument, instead of the fourth argument itself.  msgsnd(), msgrcv()
      and shmat() expect arguments in different order.
      
      This change adds an ARCH_WANT_OLD_COMPAT_IPC config option that can be
      set to preserve this behavior for ports that use it (x86, sparc, powerpc,
      s390, and mips).  No actual semantics are changed for those architectures,
      and there is only a minimal amount of code refactoring in ipc/compat.c.
      
      Newer architectures like tile (and perhaps future architectures such
      as arm64 and unicore64) should not select this option, and thus can
      avoid having any IPC-specific code at all in their architecture-specific
      compat layer.  In the same vein, if this option is not selected, IPC_64
      mode is assumed, since that's what the <asm-generic> headers expect.
      
      The workaround code in "tile" for msgsnd() and msgrcv() is removed
      with this change; it also fixes the bug that shmat() and semctl() were
      not being properly handled.
      Reviewed-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NChris Metcalf <cmetcalf@tilera.com>
      48b25c43
  8. 14 2月, 2012 1 次提交
  9. 24 1月, 2012 3 次提交
    • H
      SHM_UNLOCK: fix Unevictable pages stranded after swap · 24513264
      Hugh Dickins 提交于
      Commit cc39c6a9 ("mm: account skipped entries to avoid looping in
      find_get_pages") correctly fixed an infinite loop; but left a problem
      that find_get_pages() on shmem would return 0 (appearing to callers to
      mean end of tree) when it meets a run of nr_pages swap entries.
      
      The only uses of find_get_pages() on shmem are via pagevec_lookup(),
      called from invalidate_mapping_pages(), and from shmctl SHM_UNLOCK's
      scan_mapping_unevictable_pages().  The first is already commented, and
      not worth worrying about; but the second can leave pages on the
      Unevictable list after an unusual sequence of swapping and locking.
      
      Fix that by using shmem_find_get_pages_and_swap() (then ignoring the
      swap) instead of pagevec_lookup().
      
      But I don't want to contaminate vmscan.c with shmem internals, nor
      shmem.c with LRU locking.  So move scan_mapping_unevictable_pages() into
      shmem.c, renaming it shmem_unlock_mapping(); and rename
      check_move_unevictable_page() to check_move_unevictable_pages(), looping
      down an array of pages, oftentimes under the same lock.
      
      Leave out the "rotate unevictable list" block: that's a leftover from
      when this was used for /proc/sys/vm/scan_unevictable_pages, whose flawed
      handling involved looking at pages at tail of LRU.
      
      Was there significance to the sequence first ClearPageUnevictable, then
      test page_evictable, then SetPageUnevictable here? I think not, we're
      under LRU lock, and have no barriers between those.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Shaohua Li <shaohua.li@intel.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: <stable@vger.kernel.org> [back to 3.1 but will need respins]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      24513264
    • H
      SHM_UNLOCK: fix long unpreemptible section · 85046579
      Hugh Dickins 提交于
      scan_mapping_unevictable_pages() is used to make SysV SHM_LOCKed pages
      evictable again once the shared memory is unlocked.  It does this with
      pagevec_lookup()s across the whole object (which might occupy most of
      memory), and takes 300ms to unlock 7GB here.  A cond_resched() every
      PAGEVEC_SIZE pages would be good.
      
      However, KOSAKI-san points out that this is called under shmem.c's
      info->lock, and it's also under shm.c's shm_lock(), both spinlocks.
      There is no strong reason for that: we need to take these pages off the
      unevictable list soonish, but those locks are not required for it.
      
      So move the call to scan_mapping_unevictable_pages() from shmem.c's
      unlock handling up to shm.c's unlock handling.  Remove the recently
      added barrier, not needed now we have spin_unlock() before the scan.
      
      Use get_file(), with subsequent fput(), to make sure we have a reference
      to mapping throughout scan_mapping_unevictable_pages(): that's something
      that was previously guaranteed by the shm_lock().
      
      Remove shmctl's lru_add_drain_all(): we don't fault in pages at SHM_LOCK
      time, and we lazily discover them to be Unevictable later, so it serves
      no purpose for SHM_LOCK; and serves no purpose for SHM_UNLOCK, since
      pages still on pagevec are not marked Unevictable.
      
      The original code avoided redundant rescans by checking VM_LOCKED flag
      at its level: now avoid them by checking shp's SHM_LOCKED.
      
      The original code called scan_mapping_unevictable_pages() on a locked
      area at shm_destroy() time: perhaps we once had accounting cross-checks
      which required that, but not now, so skip the overhead and just let
      inode eviction deal with them.
      
      Put check_move_unevictable_page() and scan_mapping_unevictable_pages()
      under CONFIG_SHMEM (with stub for the TINY case when ramfs is used),
      more as comment than to save space; comment them used for SHM_UNLOCK.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Shaohua Li <shaohua.li@intel.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michel Lespinasse <walken@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      85046579
    • D
      ipc/mqueue: simplify reading msgqueue limit · 2a4e64b8
      Davidlohr Bueso 提交于
      Because the current task is being used to get the limit, we can simply
      use rlimit() instead of task_rlimit().
      Signed-off-by: NDavidlohr Bueso <dave@gnu.org>
      Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2a4e64b8
  10. 11 1月, 2012 1 次提交
    • S
      user namespace: make signal.c respect user namespaces · 6b550f94
      Serge E. Hallyn 提交于
      ipc/mqueue.c: for __SI_MESQ, convert the uid being sent to recipient's
      user namespace. (new, thanks Oleg)
      
      __send_signal: convert current's uid to the recipient's user namespace
      for any siginfo which is not SI_FROMKERNEL (patch from Oleg, thanks
      again :)
      
      do_notify_parent and do_notify_parent_cldstop: map task's uid to parent's
      user namespace
      
      ptrace_signal maps parent's uid into current's user namespace before
      including in signal to current.  IIUC Oleg has argued that this shouldn't
      matter as the debugger will play with it, but it seems like not converting
      the value currently being set is misleading.
      
      Changelog:
      Sep 20: Inspired by Oleg's suggestion, define map_cred_ns() helper to
      	simplify callers and help make clear what we are translating
              (which uid into which namespace).  Passing the target task would
      	make callers even easier to read, but we pass in user_ns because
      	current_user_ns() != task_cred_xxx(current, user_ns).
      Sep 20: As recommended by Oleg, also put task_pid_vnr() under rcu_read_lock
      	in ptrace_signal().
      Sep 23: In send_signal(), detect when (user) signal is coming from an
      	ancestor or unrelated user namespace.  Pass that on to __send_signal,
      	which sets si_uid to 0 or overflowuid if needed.
      Oct 12: Base on Oleg's fixup_uid() patch.  On top of that, handle all
      	SI_FROMKERNEL cases at callers, because we can't assume sender is
      	current in those cases.
      Nov 10: (mhelsley) rename fixup_uid to more meaningful usern_fixup_signal_uid
      Nov 10: (akpm) make the !CONFIG_USER_NS case clearer
      Signed-off-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      From: Serge Hallyn <serge.hallyn@canonical.com>
      Subject: __send_signal: pass q->info, not info, to userns_fixup_signal_uid (v2)
      
      Eric Biederman pointed out that passing info is a bug and could lead to a
      NULL pointer deref to boot.
      
      A collection of signal, securebits, filecaps, cap_bounds, and a few other
      ltp tests passed with this kernel.
      
      Changelog:
          Nov 18: previous patch missed a leading '&'
      Signed-off-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      From: Dan Carpenter <dan.carpenter@oracle.com>
      Subject: ipc/mqueue: lock() => unlock() typo
      
      There was a double lock typo introduced in b085f4bd6b21 "user namespace:
      make signal.c respect user namespaces"
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: NSerge Hallyn <serge@hallyn.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6b550f94
  11. 04 1月, 2012 4 次提交
  12. 09 12月, 2011 1 次提交
  13. 03 11月, 2011 3 次提交
  14. 01 11月, 2011 1 次提交
  15. 05 8月, 2011 1 次提交
  16. 04 8月, 2011 2 次提交
  17. 31 7月, 2011 2 次提交
  18. 27 7月, 2011 3 次提交
    • V
      ipc: introduce shm_rmid_forced sysctl · b34a6b1d
      Vasiliy Kulikov 提交于
      Add support for the shm_rmid_forced sysctl.  If set to 1, all shared
      memory objects in current ipc namespace will be automatically forced to
      use IPC_RMID.
      
      The POSIX way of handling shmem allows one to create shm objects and
      call shmdt(), leaving shm object associated with no process, thus
      consuming memory not counted via rlimits.
      
      With shm_rmid_forced=1 the shared memory object is counted at least for
      one process, so OOM killer may effectively kill the fat process holding
      the shared memory.
      
      It obviously breaks POSIX - some programs relying on the feature would
      stop working.  So set shm_rmid_forced=1 only if you're sure nobody uses
      "orphaned" memory.  Use shm_rmid_forced=0 by default for compatability
      reasons.
      
      The feature was previously impemented in -ow as a configure option.
      
      [akpm@linux-foundation.org: fix documentation, per Randy]
      [akpm@linux-foundation.org: fix warning]
      [akpm@linux-foundation.org: readability/conventionality tweaks]
      [akpm@linux-foundation.org: fix shm_rmid_forced/shm_forced_rmid confusion, use standard comment layout]
      Signed-off-by: NVasiliy Kulikov <segoon@openwall.com>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: "Serge E. Hallyn" <serge.hallyn@canonical.com>
      Cc: Daniel Lezcano <daniel.lezcano@free.fr>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Solar Designer <solar@openwall.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b34a6b1d
    • J
      ipc/mqueue.c: fix mq_open() return value · d40dcdb0
      Jiri Slaby 提交于
      We return ENOMEM from mqueue_get_inode even when we have enough memory.
      Namely in case the system rlimit of mqueue was reached.  This error
      propagates to mq_queue and user sees the error unexpectedly.  So fix
      this up to properly return EMFILE as described in the manpage:
      
      	EMFILE The process already has the maximum number of files and
      	       message queues open.
      
      instead of:
      
      	ENOMEM Insufficient memory.
      
      With the previous patch we just switch to ERR_PTR/PTR_ERR/IS_ERR error
      handling here.
      Signed-off-by: NJiri Slaby <jslaby@suse.cz>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d40dcdb0
    • J
      ipc/mqueue.c: refactor failure handling · 04715206
      Jiri Slaby 提交于
      If new_inode fails to allocate an inode we need only to return with
      NULL.  But now we test the opposite and have all the work in a nested
      block.  So do the opposite to save one indentation level (and remove
      unnecessary line breaks).
      
      This is only a preparation/cleanup for the next patch where we fix up
      return values from mqueue_get_inode.
      Signed-off-by: NJiri Slaby <jslaby@suse.cz>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      04715206
  19. 26 7月, 2011 1 次提交
  20. 21 7月, 2011 3 次提交
  21. 27 5月, 2011 1 次提交
  22. 11 5月, 2011 1 次提交
  23. 31 3月, 2011 1 次提交
  24. 28 3月, 2011 1 次提交
  25. 26 3月, 2011 1 次提交
  26. 24 3月, 2011 1 次提交