提交 · 6eaf806a223e61dc5f2de4ab591f11beb97a8f3b · openanolis / cloud-kernel

01 8月, 2007 1 次提交

oom: print points as unsigned long · a5e58a61

由 David Rientjes 提交于 7月 31, 2007

In badness(), the automatic variable 'points' is unsigned long.  Print it
as such.
Signed-off-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a5e58a61

30 7月, 2007 1 次提交

Remove fs.h from mm.h · 4e950f6f

由 Alexey Dobriyan 提交于 7月 30, 2007

Remove fs.h from mm.h. For this,
1) Uninline vma_wants_writenotify(). It's pretty huge anyway.
2) Add back fs.h or less bloated headers (err.h) to files that need it.

As result, on x86_64 allyesconfig, fs.h dependencies cut down from 3929 files
rebuilt down to 3444 (-12.3%).

Cross-compile tested without regressions on my two usual configs and (sigh):

alpha arm-mx1ads mips-bigsur powerpc-ebony
alpha-allnoconfig arm-neponset mips-capcella powerpc-g5
alpha-defconfig arm-netwinder mips-cobalt powerpc-holly
alpha-up arm-netx mips-db1000 powerpc-iseries
arm arm-ns9xxx mips-db1100 powerpc-linkstation
arm-assabet arm-omap_h2_1610 mips-db1200 powerpc-lite5200
arm-at91rm9200dk arm-onearm mips-db1500 powerpc-maple
arm-at91rm9200ek arm-picotux200 mips-db1550 powerpc-mpc7448_hpc2
arm-at91sam9260ek arm-pleb mips-ddb5477 powerpc-mpc8272_ads
arm-at91sam9261ek arm-pnx4008 mips-decstation powerpc-mpc8313_rdb
arm-at91sam9263ek arm-pxa255-idp mips-e55 powerpc-mpc832x_mds
arm-at91sam9rlek arm-realview mips-emma2rh powerpc-mpc832x_rdb
arm-ateb9200 arm-realview-smp mips-excite powerpc-mpc834x_itx
arm-badge4 arm-rpc mips-fulong powerpc-mpc834x_itxgp
arm-carmeva arm-s3c2410 mips-ip22 powerpc-mpc834x_mds
arm-cerfcube arm-shannon mips-ip27 powerpc-mpc836x_mds
arm-clps7500 arm-shark mips-ip32 powerpc-mpc8540_ads
arm-collie arm-simpad mips-jazz powerpc-mpc8544_ds
arm-corgi arm-spitz mips-jmr3927 powerpc-mpc8560_ads
arm-csb337 arm-trizeps4 mips-malta powerpc-mpc8568mds
arm-csb637 arm-versatile mips-mipssim powerpc-mpc85xx_cds
arm-ebsa110 i386 mips-mpc30x powerpc-mpc8641_hpcn
arm-edb7211 i386-allnoconfig mips-msp71xx powerpc-mpc866_ads
arm-em_x270 i386-defconfig mips-ocelot powerpc-mpc885_ads
arm-ep93xx i386-up mips-pb1100 powerpc-pasemi
arm-footbridge ia64 mips-pb1500 powerpc-pmac32
arm-fortunet ia64-allnoconfig mips-pb1550 powerpc-ppc64
arm-h3600 ia64-bigsur mips-pnx8550-jbs powerpc-prpmc2800
arm-h7201 ia64-defconfig mips-pnx8550-stb810 powerpc-ps3
arm-h7202 ia64-gensparse mips-qemu powerpc-pseries
arm-hackkit ia64-sim mips-rbhma4200 powerpc-up
arm-integrator ia64-sn2 mips-rbhma4500 s390
arm-iop13xx ia64-tiger mips-rm200 s390-allnoconfig
arm-iop32x ia64-up mips-sb1250-swarm s390-defconfig
arm-iop33x ia64-zx1 mips-sead s390-up
arm-ixp2000 m68k mips-tb0219 sparc
arm-ixp23xx m68k-amiga mips-tb0226 sparc-allnoconfig
arm-ixp4xx m68k-apollo mips-tb0287 sparc-defconfig
arm-jornada720 m68k-atari mips-workpad sparc-up
arm-kafa m68k-bvme6000 mips-wrppmc sparc64
arm-kb9202 m68k-hp300 mips-yosemite sparc64-allnoconfig
arm-ks8695 m68k-mac parisc sparc64-defconfig
arm-lart m68k-mvme147 parisc-allnoconfig sparc64-up
arm-lpd270 m68k-mvme16x parisc-defconfig um-x86_64
arm-lpd7a400 m68k-q40 parisc-up x86_64
arm-lpd7a404 m68k-sun3 powerpc x86_64-allnoconfig
arm-lubbock m68k-sun3x powerpc-cell x86_64-defconfig
arm-lusl7200 mips powerpc-celleb x86_64-up
arm-mainstone mips-atlas powerpc-chrp32
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4e950f6f

08 5月, 2007 3 次提交

oom: fix constraint deadlock · 2b45ab33

由 David Rientjes 提交于 5月 06, 2007

Fixes a deadlock in the OOM killer for allocations that are not
__GFP_HARDWALL.

Before the OOM killer checks for the allocation constraint, it takes
callback_mutex.

constrained_alloc() iterates through each zone in the allocation zonelist
and calls cpuset_zone_allowed_softwall() to determine whether an allocation
for gfp_mask is possible.  If a zone's node is not in the OOM-triggering
task's mems_allowed, it is not exiting, and we did not fail on a
__GFP_HARDWALL allocation, cpuset_zone_allowed_softwall() attempts to take
callback_mutex to check the nearest exclusive ancestor of current's cpuset.
 This results in deadlock.

We now take callback_mutex after iterating through the zonelist since we
don't need it yet.

Cc: Andi Kleen <ak@suse.de>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: Martin J. Bligh <mbligh@mbligh.org>
Signed-off-by: NDavid Rientjes <rientjes@google.com>
Cc: <stable@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2b45ab33

mm: fix handling of panic_on_oom when cpusets are in use · 2b744c01

由 Yasunori Goto 提交于 5月 06, 2007

The current panic_on_oom may not work if there is a process using
cpusets/mempolicy, because other nodes' memory may remain.  But some people
want failover by panic ASAP even if they are used.  This patch makes new
setting for its request.

This is tested on my ia64 box which has 3 nodes.
Signed-off-by: NYasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Ethan Solomita <solo@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2b744c01

allow oom_adj of saintly processes · 9a82782f

由 Joshua N Pritikin 提交于 5月 06, 2007

If the badness of a process is zero then oom_adj>0 has no effect.  This
patch makes sure that the oom_adj shift actually increases badness points
appropriately.
Signed-off-by: NJoshua N. Pritikin <jpritikin@pobox.com>
Cc: Andrea Arcangeli <andrea@novell.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9a82782f

24 4月, 2007 2 次提交

fix OOM killing processes wrongly thought MPOL_BIND · 3d124cbb

由 Hugh Dickins 提交于 4月 23, 2007

I only have CONFIG_NUMA=y for build testing: surprised when trying a memhog
to see lots of other processes killed with "No available memory
(MPOL_BIND)".  memhog is killed correctly once we initialize nodemask in
constrained_alloc().
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Acked-by: NChristoph Lameter <clameter@sgi.com>
Acked-by: NWilliam Irwin <bill.irwin@oracle.com>
Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: <stable@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3d124cbb

oom: kill all threads that share mm with killed task · 650a7c97

由 David Rientjes 提交于 4月 23, 2007

oom_kill_task() calls __oom_kill_task() to OOM kill a selected task.
When finding other threads that share an mm with that task, we need to
kill those individual threads and not the same one.

(Bug introduced by f2a2a710)
Acked-by: NWilliam Irwin <bill.irwin@oracle.com>
Acked-by: NChristoph Lameter <clameter@engr.sgi.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

650a7c97

17 3月, 2007 1 次提交

[PATCH] oom fix: prevent oom from killing a process with children/sibling unkillable · 35ae834f

由 Ankita Garg 提交于 3月 16, 2007

Looking at oom_kill.c, found that the intention to not kill the selected
process if any of its children/siblings has OOM_DISABLE set, is not being
met.
Signed-off-by: NAnkita Garg <ankita@in.ibm.com>
Acked-by: NNick Piggin <npiggin@suse.de>
Acked-by: NWilliam Irwin <wli@holomorphy.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

35ae834f

06 1月, 2007 1 次提交

[PATCH] fix OOM killing of swapoff · 7ba34859

由 Hugh Dickins 提交于 1月 05, 2007

These days, if you swapoff when there isn't enough memory, OOM killer gives
"BUG: scheduling while atomic" and the machine hangs: badness() needs to do
its PF_SWAPOFF return after the task_unlock (tasklist_lock is also held
here, so p isn't going to be freed: PF_SWAPOFF might get turned off at any
moment, but that doesn't really matter).
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Cc: <stable@kernel.org>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

7ba34859

31 12月, 2006 1 次提交

[PATCH] fix oom killer kills current every time if there is memory-less-node take2 · 96ac5913

由 KAMEZAWA Hiroyuki 提交于 12月 29, 2006

constrained_alloc(), which is called to detect where oom is from, checks
passed zone_list().  If zone_list doesn't include all nodes, it thinks oom
is from mempolicy.

But there is memory-less-node.  memory-less-node's zones are never included
in zonelist[].

contstrained_alloc() should get memory_less_node into count.  Otherwise, it
always thinks 'oom is from mempolicy'.  This means that current process
dies at any time.  This patch fix it.
Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

96ac5913

14 12月, 2006 1 次提交

[PATCH] cpuset: rework cpuset_zone_allowed api · 02a0e53d

由 Paul Jackson 提交于 12月 13, 2006

Elaborate the API for calling cpuset_zone_allowed(), so that users have to
explicitly choose between the two variants:

  cpuset_zone_allowed_hardwall()
  cpuset_zone_allowed_softwall()

Until now, whether or not you got the hardwall flavor depended solely on
whether or not you or'd in the __GFP_HARDWALL gfp flag to the gfp_mask
argument.

If you didn't specify __GFP_HARDWALL, you implicitly got the softwall
version.

Unfortunately, this meant that users would end up with the softwall version
without thinking about it.  Since only the softwall version might sleep,
this led to bugs with possible sleeping in interrupt context on more than
one occassion.

The hardwall version requires that the current tasks mems_allowed allows
the node of the specified zone (or that you're in interrupt or that
__GFP_THISNODE is set or that you're on a one cpuset system.)

The softwall version, depending on the gfp_mask, might allow a node if it
was allowed in the nearest enclusing cpuset marked mem_exclusive (which
requires taking the cpuset lock 'callback_mutex' to evaluate.)

This patch removes the cpuset_zone_allowed() call, and forces the caller to
explicitly choose between the hardwall and the softwall case.

If the caller wants the gfp_mask to determine this choice, they should (1)
be sure they can sleep or that __GFP_HARDWALL is set, and (2) invoke the
cpuset_zone_allowed_softwall() routine.

This adds another 100 or 200 bytes to the kernel text space, due to the few
lines of nearly duplicate code at the top of both cpuset_zone_allowed_*
routines.  It should save a few instructions executed for the calls that
turned into calls of cpuset_zone_allowed_hardwall, thanks to not having to
set (before the call) then check (within the call) the __GFP_HARDWALL flag.

For the most critical call, from get_page_from_freelist(), the same
instructions are executed as before -- the old cpuset_zone_allowed()
routine it used to call is the same code as the
cpuset_zone_allowed_softwall() routine that it calls now.

Not a perfect win, but seems worth it, to reduce this chance of hitting a
sleeping with irq off complaint again.
Signed-off-by: NPaul Jackson <pj@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

02a0e53d

08 12月, 2006 3 次提交

[PATCH] oom: less memdie · f2a2a710

由 Nick Piggin 提交于 12月 06, 2006

Don't cause all threads in all other thread groups to gain TIF_MEMDIE
otherwise we'll get a thundering herd eating our memory reserve.  This may not
be the optimal scheme, but it fits our policy of allowing just one TIF_MEMDIE
in the system at once.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

f2a2a710

[PATCH] oom: cleanup messages · f3af38d3

由 Nick Piggin 提交于 12月 06, 2006

Clean up the OOM killer messages to be more consistent.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

f3af38d3

[PATCH] oom: don't kill unkillable children or siblings · c33e0fca

由 Nick Piggin 提交于 12月 06, 2006

Abort the kill if any of our threads have OOM_DISABLE set.  Having this
test here also prevents any OOM_DISABLE child of the "selected" process
from being killed.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

c33e0fca

21 10月, 2006 1 次提交

[PATCH] OOM killer meets userspace headers · 8ac773b4

由 Alexey Dobriyan 提交于 10月 19, 2006

Despite mm.h is not being exported header, it does contain one thing
which is part of userspace ABI -- value disabling OOM killer for given
process. So,
a) create and export include/linux/oom.h
b) move OOM_DISABLE define there.
c) turn bounding values of /proc/$PID/oom_adj into defines and export
   them too.

Note: mass __KERNEL__ removal will be done later.
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

8ac773b4

30 9月, 2006 7 次提交

[PATCH] oom: don't kill current when another OOM in progress · b78483a4

由 Nick Piggin 提交于 9月 29, 2006

A previous patch to allow an exiting task to OOM kill itself (and thereby
avoid a little deadlock) introduced a problem.  We don't want the
PF_EXITING task, even if it is 'current', to access mem reserves if there
is already a TIF_MEMDIE process in the system sucking up reserves.

Also make the commenting a little bit clearer, and note that our current
scheme of effectively single threading the OOM killer is not itself
perfect.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

b78483a4

[PATCH] oom_kill_task(): cleanup ->mm checks · 01017a22

由 Oleg Nesterov 提交于 9月 29, 2006

- It is not possible to have task->mm == &init_mm.

- task_lock() buys nothing for 'if (!p->mm)' check.
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

01017a22

[PATCH] select_bad_process(): cleanup 'releasing' check · 972c4ea5

由 Oleg Nesterov 提交于 9月 29, 2006

No logic changes, but imho easier to read.
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Acked-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

972c4ea5

[PATCH] select_bad_process(): kill a bogus PF_DEAD/TASK_DEAD check · 28324d1d

由 Oleg Nesterov 提交于 9月 29, 2006

The only one usage of TASK_DEAD outside of last schedule path,

select_bad_process:

	for_each_task(p) {

		if (!p->mm)
			continue;
		...
			if (p->state == TASK_DEAD)
				continue;
		...

TASK_DEAD state is set at the end of do_exit(), this means that p->mm
was already set == NULL by exit_mm(), so this task was already rejected
by 'if (!p->mm)' above.

Note also that the caller holds tasklist_lock, this means that p can't
pass exit_notify() and then set TASK_DEAD when p->mm != NULL.

Also, remove open-coded is_init().
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

28324d1d

[PATCH] introduce TASK_DEAD state · c394cc9f

由 Oleg Nesterov 提交于 9月 29, 2006

I am not sure about this patch, I am asking Ingo to take a decision.

task_struct->state == EXIT_DEAD is a very special case, to avoid a confusion
it makes sense to introduce a new state, TASK_DEAD, while EXIT_DEAD should
live only in ->exit_state as documented in sched.h.

Note that this state is not visible to user-space, get_task_state() masks off
unsuitable states.
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

c394cc9f

[PATCH] kill PF_DEAD flag · 55a101f8

由 Oleg Nesterov 提交于 9月 29, 2006

After the previous change (->flags & PF_DEAD) <=> (->state == EXIT_DEAD), we
don't need PF_DEAD any longer.
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

55a101f8

[PATCH] pidspace: is_init() · f400e198

由 Sukadev Bhattiprolu 提交于 9月 29, 2006

This is an updated version of Eric Biederman's is_init() patch.
(http://lkml.org/lkml/2006/2/6/280).  It applies cleanly to 2.6.18-rc3 and
replaces a few more instances of ->pid == 1 with is_init().

Further, is_init() checks pid and thus removes dependency on Eric's other
patches for now.

Eric's original description:

	There are a lot of places in the kernel where we test for init
	because we give it special properties.  Most  significantly init
	must not die.  This results in code all over the kernel test
	->pid == 1.

	Introduce is_init to capture this case.

	With multiple pid spaces for all of the cases affected we are
	looking for only the first process on the system, not some other
	process that has pid == 1.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NSukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: <lxc-devel@lists.sourceforge.net>
Acked-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

f400e198

26 9月, 2006 9 次提交

[PATCH] NUMA: Add zone_to_nid function · 89fa3024

由 Christoph Lameter 提交于 9月 25, 2006

There are many places where we need to determine the node of a zone.
Currently we use a difficult to read sequence of pointer dereferencing.
Put that into an inline function and use throughout VM.  Maybe we can find
a way to optimize the lookup in the future.
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

89fa3024

[PATCH] oom-kill: update comments to reflect current code · 5a291b98

由 Ram Gupta 提交于 9月 25, 2006

Update the comments for __oom_kill_task() to reflect the code changes.
Signed-off-by: NRam Gupta <r.gupta@astronautics.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

5a291b98

[PATCH] oom: more printk · b72f1604

由 Nick Piggin 提交于 9月 25, 2006

Print the name of the task invoking the OOM killer.  Could make debugging
easier.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

b72f1604

[PATCH] oom: kthread infinite loop fix · 5081dde3

由 Nick Piggin 提交于 9月 25, 2006

Skip kernel threads, rather than having them return 0 from badness.
Theoretically, badness might truncate all results to 0, thus a kernel thread
might be picked first, causing an infinite loop.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

5081dde3

[PATCH] oom: swapoff tasks tweak · af5b9124

由 Nick Piggin 提交于 9月 25, 2006

PF_SWAPOFF processes currently cause select_bad_process to return straight
away.  Instead, give them high priority, so we will kill them first, however
we also first ensure no parallel OOM kills are happening at the same time.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

af5b9124

[PATCH] oom: handle oom_disable exiting · 4a3ede10

由 Nick Piggin 提交于 9月 25, 2006

Having the oomkilladj == OOM_DISABLE check before the releasing check means
that oomkilladj == OOM_DISABLE tasks exiting will not stop the OOM killer.

Moving the test down will give the desired behaviour. Also: it will allow
them to "OOM-kill" themselves if they are exiting. As per the previous patch,
this is required to prevent OOM killer deadlocks (and they don't actually get
killed, because they're already exiting -- they're simply allowed access to
memory reserves).
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

4a3ede10

[PATCH] oom: handle current exiting · 50ec3bbf

由 Nick Piggin 提交于 9月 25, 2006

If current *is* exiting, it should actually be allowed to access reserved
memory rather than OOM kill something else.  Can't do this via a straight
check in page_alloc.c because that would allow multiple tasks to use up
reserves.  Instead cause current to OOM-kill itself which will mark it as
TIF_MEMDIE.

The current procedure of simply aborting the OOM-kill if a task is exiting can
lead to OOM deadlocks.

In the case of killing a PF_EXITING task, don't make a lot of noise about it.
This becomes more important in future patches, where we can "kill" OOM_DISABLE
tasks.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

50ec3bbf

[PATCH] oom: cpuset hint · 7887a3da

由 Nick Piggin 提交于 9月 25, 2006

cpuset_excl_nodes_overlap does not always indicate that killing a task will
not free any memory we for us.  For example, we may be asking for an
allocation from _anywhere_ in the machine, or the task in question may be
pinning memory that is outside its cpuset.  Fix this by just causing
cpuset_excl_nodes_overlap to reduce the badness rather than disallow it.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Acked-by: NPaul Jackson <pj@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

7887a3da

[PATCH] out of memory notifier · 8bc719d3

由 Martin Schwidefsky 提交于 9月 25, 2006

Add a notifer chain to the out of memory killer.  If one of the registered
callbacks could release some memory, do not kill the process but return and
retry the allocation that forced the oom killer to run.

The purpose of the notifier is to add a safety net in the presence of
memory ballooners.  If the resource manager inflated the balloon to a size
where memory allocations can not be satisfied anymore, it is better to
deflate the balloon a bit instead of killing processes.

The implementation for the s390 ballooner is included.

[akpm@osdl.org: cleanups]
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

8bc719d3

04 7月, 2006 1 次提交

[PATCH] sched: cleanup, remove task_t, convert to struct task_struct · 36c8b586

由 Ingo Molnar 提交于 7月 03, 2006

cleanup: remove task_t and convert all the uses to struct task_struct. I
introduced it for the scheduler anno and it was a mistake.

Conversion was mostly scripted, the result was reviewed and all
secondary whitespace and style impact (if any) was fixed up by hand.
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

36c8b586

23 6月, 2006 2 次提交

[PATCH] mm: fix typos in comments in mm/oom_kill.c · 6937a25c

由 Dave Peterson 提交于 6月 23, 2006

This fixes a few typos in the comments in mm/oom_kill.c.
Signed-off-by: NDavid S. Peterson <dsp@llnl.gov>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

6937a25c

[PATCH] support for panic at OOM · fadd8fbd

由 KAMEZAWA Hiroyuki 提交于 6月 23, 2006

This patch adds panic_on_oom sysctl under sys.vm.

When sysctl vm.panic_on_oom = 1, the kernel panics intead of killing rogue
processes.  And if vm.panic_on_oom is 0 the kernel will do oom_kill() in
the same way as it does today.  Of course, the default value is 0 and only
root can modifies it.

In general, oom_killer works well and kill rogue processes.  So the whole
system can survive.  But there are environments where panic is preferable
rather than kill some processes.
Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

fadd8fbd

20 4月, 2006 2 次提交

[PATCH] mm: fix mm_struct reference counting bugs in mm/oom_kill.c · 01315922

由 Dave Peterson 提交于 4月 18, 2006

Fix oom_kill_task() so it doesn't call mmput() (which may sleep) while
holding tasklist_lock.
Signed-off-by: NDavid S. Peterson <dsp@llnl.gov>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

01315922

[PATCH] oom-kill: mm locking fix · 97c2c9b8

由 Andrew Morton 提交于 4月 18, 2006

Dave Peterson <dsp@llnl.gov> points out that badness() is playing with
mm_structs without taking a reference on them.

mmput() can sleep, so taking a reference here (inside tasklist_lock) is
hard.  Fix it up via task_lock() instead.
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

97c2c9b8

03 3月, 2006 1 次提交

[PATCH] out_of_memory() locking fix · 140ffcec

由 Andrew Morton 提交于 3月 02, 2006

I seem to have lost this read_unlock().

While we're there, let's turn that interruptible sleep unto uninterruptible,
so we don't get a busywait if signal_pending().  (Again.  We seem to have a
habit of doing this).
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

140ffcec

01 3月, 2006 1 次提交

[PATCH] out_of_memory(): use of uninitialised · d6713e04

由 Andrew Morton 提交于 2月 28, 2006

Under some circumstances `points' can get printed before it's initialised.
Spotted by Carlos Martin <carlos@cmartin.tk>.
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

d6713e04

21 2月, 2006 2 次提交

[PATCH] Terminate process that fails on a constrained allocation · 9b0f8b04

由 Christoph Lameter 提交于 2月 20, 2006

Some allocations are restricted to a limited set of nodes (due to memory
policies or cpuset constraints).  If the page allocator is not able to find
enough memory then that does not mean that overall system memory is low.

In particular going postal and more or less randomly shooting at processes
is not likely going to help the situation but may just lead to suicide (the
whole system coming down).

It is better to signal to the process that no memory exists given the
constraints that the process (or the configuration of the process) has
placed on the allocation behavior.  The process may be killed but then the
sysadmin or developer can investigate the situation.  The solution is
similar to what we do when running out of hugepages.

This patch adds a check before we kill processes.  At that point
performance considerations do not matter much so we just scan the zonelist
and reconstruct a list of nodes.  If the list of nodes does not contain all
online nodes then this is a constrained allocation and we should kill the
current process.
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

9b0f8b04

[PATCH] OOM kill: children accounting · 9827b781

由 Kurt Garloff 提交于 2月 20, 2006

In the badness() calculation, there's currently this piece of code:

        /*
         * Processes which fork a lot of child processes are likely
         * a good choice. We add the vmsize of the children if they
         * have an own mm. This prevents forking servers to flood the
         * machine with an endless amount of children
         */
        list_for_each(tsk, &p->children) {
                struct task_struct *chld;
                chld = list_entry(tsk, struct task_struct, sibling);
                if (chld->mm = p->mm && chld->mm)
                        points += chld->mm->total_vm;
        }

The intention is clear: If some server (apache) keeps spawning new children
and we run OOM, we want to kill the father rather than picking a child.

This -- to some degree -- also helps a bit with getting fork bombs under
control, though I'd consider this a desirable side-effect rather than a
feature.

There's one problem with this: No matter how many or few children there are,
if just one of them misbehaves, and all others (including the father) do
everything right, we still always kill the whole family.  This hits in real
life; whether it's javascript in konqueror resulting in kdeinit (and thus the
whole KDE session) being hit or just a classical server that spawns children.

Sidenote: The killer does kill all direct children as well, not only the
selected father, see oom_kill_process().

The idea in attached patch is that we do want to account the memory
consumption of the (direct) children to the father -- however not fully.
This maintains the property that fathers with too many children will still
very likely be picked, whereas a single misbehaving child has the chance to
be picked by the OOM killer.

In the patch I account only half (rounded up) of the children's vm_size to
the parent.  This means that if one child eats more mem than the rest of
the family, it will be picked, otherwise it's still the father and thus the
whole family that gets selected.

This is heuristics -- we could debate whether accounting for a fourth would
be better than for half of it.  Or -- if people would consider it worth the
trouble -- make it a sysctl.  For now I sticked to accounting for half,
which should IMHO be a significant improvement.

The patch does one more thing: As users tend to be irritated by the choice
of killed processes (mainly because the children are killed first, despite
some of them having a very low OOM score), I added some more output: The
selected (father) process will be reported first and it's oom_score printed
to syslog.

Description:

Only account for half of children's vm size in oom score calculation

This should still give the parent enough point in case of fork bombs.  If
any child however has more than 50% of the vm size of all children
together, it'll get a higher score and be elected.

This patch also makes the kernel display the oom_score.
Signed-off-by: NKurt Garloff <garloff@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

9827b781

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功