1. 07 5月, 2009 1 次提交
  2. 03 4月, 2009 1 次提交
    • B
      memcg: show memcg information during OOM · e222432b
      Balbir Singh 提交于
      Add RSS and swap to OOM output from memcg
      
      Display memcg values like failcnt, usage and limit when an OOM occurs due
      to memcg.
      
      Thanks to Johannes Weiner, Li Zefan, David Rientjes, Kamezawa Hiroyuki,
      Daisuke Nishimura and KOSAKI Motohiro for review.
      
      Sample output
      -------------
      
      Task in /a/x killed as a result of limit of /a
      memory: usage 1048576kB, limit 1048576kB, failcnt 4183
      memory+swap: usage 1400964akB, limit 9007199254740991kB, failcnt 0
      
      [akpm@linux-foundation.org: compilation fix]
      [akpm@linux-foundation.org: fix kerneldoc and whitespace]
      [akpm@linux-foundation.org: add printk facility level]
      Signed-off-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e222432b
  3. 01 4月, 2009 1 次提交
  4. 09 1月, 2009 2 次提交
  5. 07 1月, 2009 3 次提交
  6. 14 11月, 2008 2 次提交
  7. 11 11月, 2008 1 次提交
  8. 07 11月, 2008 2 次提交
  9. 14 8月, 2008 1 次提交
    • D
      security: Fix setting of PF_SUPERPRIV by __capable() · 5cd9c58f
      David Howells 提交于
      Fix the setting of PF_SUPERPRIV by __capable() as it could corrupt the flags
      the target process if that is not the current process and it is trying to
      change its own flags in a different way at the same time.
      
      __capable() is using neither atomic ops nor locking to protect t->flags.  This
      patch removes __capable() and introduces has_capability() that doesn't set
      PF_SUPERPRIV on the process being queried.
      
      This patch further splits security_ptrace() in two:
      
       (1) security_ptrace_may_access().  This passes judgement on whether one
           process may access another only (PTRACE_MODE_ATTACH for ptrace() and
           PTRACE_MODE_READ for /proc), and takes a pointer to the child process.
           current is the parent.
      
       (2) security_ptrace_traceme().  This passes judgement on PTRACE_TRACEME only,
           and takes only a pointer to the parent process.  current is the child.
      
           In Smack and commoncap, this uses has_capability() to determine whether
           the parent will be permitted to use PTRACE_ATTACH if normal checks fail.
           This does not set PF_SUPERPRIV.
      
      Two of the instances of __capable() actually only act on current, and so have
      been changed to calls to capable().
      
      Of the places that were using __capable():
      
       (1) The OOM killer calls __capable() thrice when weighing the killability of a
           process.  All of these now use has_capability().
      
       (2) cap_ptrace() and smack_ptrace() were using __capable() to check to see
           whether the parent was allowed to trace any process.  As mentioned above,
           these have been split.  For PTRACE_ATTACH and /proc, capable() is now
           used, and for PTRACE_TRACEME, has_capability() is used.
      
       (3) cap_safe_nice() only ever saw current, so now uses capable().
      
       (4) smack_setprocattr() rejected accesses to tasks other than current just
           after calling __capable(), so the order of these two tests have been
           switched and capable() is used instead.
      
       (5) In smack_file_send_sigiotask(), we need to allow privileged processes to
           receive SIGIO on files they're manipulating.
      
       (6) In smack_task_wait(), we let a process wait for a privileged process,
           whether or not the process doing the waiting is privileged.
      
      I've tested this with the LTP SELinux and syscalls testscripts.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NSerge Hallyn <serue@us.ibm.com>
      Acked-by: NCasey Schaufler <casey@schaufler-ca.com>
      Acked-by: NAndrew G. Morgan <morgan@kernel.org>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      5cd9c58f
  10. 28 4月, 2008 3 次提交
  11. 16 4月, 2008 1 次提交
    • L
      memcg: fix oops in oom handling · e115f2d8
      Li Zefan 提交于
      When I used a test program to fork mass processes and immediately move them to
      a cgroup where the memory limit is low enough to trigger oom kill, I got oops:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000808
      IP: [<ffffffff8045c47f>] _spin_lock_irqsave+0x8/0x18
      PGD 4c95f067 PUD 4406c067 PMD 0
      Oops: 0002 [1] SMP
      CPU 2
      Modules linked in:
      
      Pid: 11973, comm: a.out Not tainted 2.6.25-rc7 #5
      RIP: 0010:[<ffffffff8045c47f>]  [<ffffffff8045c47f>] _spin_lock_irqsave+0x8/0x18
      RSP: 0018:ffff8100448c7c30  EFLAGS: 00010002
      RAX: 0000000000000202 RBX: 0000000000000009 RCX: 000000000001c9f3
      RDX: 0000000000000100 RSI: 0000000000000001 RDI: 0000000000000808
      RBP: ffff81007e444080 R08: 0000000000000000 R09: ffff8100448c7900
      R10: ffff81000105f480 R11: 00000100ffffffff R12: ffff810067c84140
      R13: 0000000000000001 R14: ffff8100441d0018 R15: ffff81007da56200
      FS:  00007f70eb1856f0(0000) GS:ffff81007fbad3c0(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000000808 CR3: 000000004498a000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process a.out (pid: 11973, threadinfo ffff8100448c6000, task ffff81007da533e0)
      Stack:  ffffffff8023ef5a 00000000000000d0 ffffffff80548dc0 00000000000000d0
       ffff810067c84140 ffff81007e444080 ffffffff8026cef9 00000000000000d0
       ffff8100441d0000 00000000000000d0 ffff8100441d0000 ffff8100505445c0
      Call Trace:
       [<ffffffff8023ef5a>] ? force_sig_info+0x25/0xb9
       [<ffffffff8026cef9>] ? oom_kill_task+0x77/0xe2
       [<ffffffff8026d696>] ? mem_cgroup_out_of_memory+0x55/0x67
       [<ffffffff802910ad>] ? mem_cgroup_charge_common+0xec/0x202
       [<ffffffff8027997b>] ? handle_mm_fault+0x24e/0x77f
       [<ffffffff8022c4af>] ? default_wake_function+0x0/0xe
       [<ffffffff8027a17a>] ? get_user_pages+0x2ce/0x3af
       [<ffffffff80290fee>] ? mem_cgroup_charge_common+0x2d/0x202
       [<ffffffff8027a441>] ? make_pages_present+0x8e/0xa4
       [<ffffffff8027d1ab>] ? mmap_region+0x373/0x429
       [<ffffffff8027d7eb>] ? do_mmap_pgoff+0x2ff/0x364
       [<ffffffff80210471>] ? sys_mmap+0xe5/0x111
       [<ffffffff8020bfc9>] ? tracesys+0xdc/0xe1
      
      Code: 00 00 01 48 8b 3c 24 e9 46 d4 dd ff f0 ff 07 48 8b 3c 24 e9 3a d4 dd ff fe 07 48 8b 3c 24 e9 2f d4 dd ff 9c 58 fa ba 00 01 00 00 <f0> 66 0f c1 17 38 f2 74 06 f3 90 8a 17 eb f6 c3 fa b8 00 01 00
      RIP  [<ffffffff8045c47f>] _spin_lock_irqsave+0x8/0x18
       RSP <ffff8100448c7c30>
      CR2: 0000000000000808
      ---[ end trace c3702fa668021ea4 ]---
      
      It's reproducable in a x86_64 box, but doesn't happen in x86_32.
      
      This is because tsk->sighand is not guarded by RCU, so we have to
      hold tasklist_lock, just as what out_of_memory() does.
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelianov <xemul@openvz.org>
      Cc: Paul Menage <menage@google.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: David Rientjes <rientjes@cs.washington.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e115f2d8
  12. 20 3月, 2008 1 次提交
  13. 05 3月, 2008 1 次提交
  14. 08 2月, 2008 3 次提交
    • D
      oom: add sysctl to enable task memory dump · fef1bdd6
      David Rientjes 提交于
      Adds a new sysctl, 'oom_dump_tasks', that enables the kernel to produce a
      dump of all system tasks (excluding kernel threads) when performing an
      OOM-killing.  Information includes pid, uid, tgid, vm size, rss, cpu,
      oom_adj score, and name.
      
      This is helpful for determining why there was an OOM condition and which
      rogue task caused it.
      
      It is configurable so that large systems, such as those with several
      thousand tasks, do not incur a performance penalty associated with dumping
      data they may not desire.
      
      If an OOM was triggered as a result of a memory controller, the tasklist
      shall be filtered to exclude tasks that are not a member of the same
      cgroup.
      
      Cc: Andrea Arcangeli <andrea@suse.de>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fef1bdd6
    • D
      memcontrol: move oom task exclusion to tasklist scan · 4c4a2214
      David Rientjes 提交于
      Creates a helper function to return non-zero if a task is a member of a
      memory controller:
      
      	int task_in_mem_cgroup(const struct task_struct *task,
      			       const struct mem_cgroup *mem);
      
      When the OOM killer is constrained by the memory controller, the exclusion
      of tasks that are not a member of that controller was previously misplaced
      and appeared in the badness scoring function.  It should be excluded
      during the tasklist scan in select_bad_process() instead.
      
      [akpm@linux-foundation.org: build fix]
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4c4a2214
    • P
      Memory controller: OOM handling · c7ba5c9e
      Pavel Emelianov 提交于
      Out of memory handling for cgroups over their limit. A task from the
      cgroup over limit is chosen using the existing OOM logic and killed.
      
      TODO:
      1. As discussed in the OLS BOF session, consider implementing a user
      space policy for OOM handling.
      
      [akpm@linux-foundation.org: fix build due to oom-killer changes]
      Signed-off-by: NPavel Emelianov <xemul@openvz.org>
      Signed-off-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Kirill Korotaev <dev@sw.ru>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c7ba5c9e
  15. 06 2月, 2008 2 次提交
    • S
      oom_kill: remove uid==0 checks · 97829955
      Serge E. Hallyn 提交于
      Root processes are considered more important when out of memory and killing
      proceses.  The check for CAP_SYS_ADMIN was augmented with a check for
      uid==0 or euid==0.
      
      There are several possible ways to look at this:
      
      	1. uid comparisons are unnecessary, trust CAP_SYS_ADMIN
      	   alone.  However CAP_SYS_RESOURCE is the one that really
      	   means "give me extra resources" so allow for that as
      	   well.
      	2. Any privileged code should be protected, but uid is not
      	   an indication of privilege.  So we should check whether
      	   any capabilities are raised.
      	3. uid==0 makes processes on the host as well as in containers
      	   more important, so we should keep the existing checks.
      	4. uid==0 makes processes only on the host more important,
      	   even without any capabilities.  So we should be keeping
      	   the (uid==0||euid==0) check but only when
      	   userns==&init_user_ns.
      
      I'm following number 1 here.
      Signed-off-by: NSerge Hallyn <serue@us.ibm.com>
      Cc: Andrew Morgan <morgan@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      97829955
    • A
      Add 64-bit capability support to the kernel · e338d263
      Andrew Morgan 提交于
      The patch supports legacy (32-bit) capability userspace, and where possible
      translates 32-bit capabilities to/from userspace and the VFS to 64-bit
      kernel space capabilities.  If a capability set cannot be compressed into
      32-bits for consumption by user space, the system call fails, with -ERANGE.
      
      FWIW libcap-2.00 supports this change (and earlier capability formats)
      
       http://www.kernel.org/pub/linux/libs/security/linux-privs/kernel-2.6/
      
      [akpm@linux-foundation.org: coding-syle fixes]
      [akpm@linux-foundation.org: use get_task_comm()]
      [ezk@cs.sunysb.edu: build fix]
      [akpm@linux-foundation.org: do not initialise statics to 0 or NULL]
      [akpm@linux-foundation.org: unused var]
      [serue@us.ibm.com: export __cap_ symbols]
      Signed-off-by: NAndrew G. Morgan <morgan@kernel.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Acked-by: NSerge Hallyn <serue@us.ibm.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: James Morris <jmorris@namei.org>
      Cc: Casey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: NErez Zadok <ezk@cs.sunysb.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e338d263
  16. 26 1月, 2008 1 次提交
  17. 21 10月, 2007 1 次提交
  18. 20 10月, 2007 4 次提交
  19. 17 10月, 2007 8 次提交
  20. 01 8月, 2007 1 次提交