1. 13 Apr 2010: 4 commits
    • anonvma: when setting up page->mapping, we need to pick the _oldest_ anonvma · ea90002b
      Committed by Linus Torvalds
      Otherwise we might be mapping in a page in a new mapping, but that page
      (through the swapcache) would later be mapped into an old mapping too.
      The page->mapping must be the case that works for everybody, not just
      the mapping that happened to page it in first.
      
      Here's the scenario:
      
       - page gets allocated/mapped by process A. Let's call the anon_vma we
         associate the page with 'A' to keep it easy to track.
      
       - Process A forks, creating process B. The anon_vma in B is 'B', and has
         a chain that looks like 'B' -> 'A'. Everything is fine.
      
       - Swapping happens. The page (with mapping pointing to 'A') gets swapped
         out (perhaps not to disk - it's enough to assume that it's just not
         mapped any more, and lives entirely in the swap-cache)
      
       - Process B pages it in, which goes like this:
      
              do_swap_page ->
                page = lookup_swap_cache(entry);
               ...
                set_pte_at(mm, address, page_table, pte);
                page_add_anon_rmap(page, vma, address);
      
         And think about what happens here!
      
         In particular, what happens is that this will now be the "first"
         mapping of that page, so page_add_anon_rmap() used to do
      
              if (first)
                      __page_set_anon_rmap(page, vma, address);
      
         and notice what anon_vma it will use? It will use the anon_vma for
         process B!
      
         What happens then? Trivial: process 'A' also pages it in (nothing
         happens, it's not the first mapping), and then process 'B' execve's
         or exits or unmaps, making anon_vma B go away.
      
         End result: process A has a page that points to anon_vma B, but
         anon_vma B does not exist any more.  This can go on forever.  Forget
         about RCU grace periods, forget about locking, forget anything like
         that.  The bug is simply that page->mapping points to an anon_vma
         that was correct at one point, but was _not_ the one that was shared
         by all users of that possible mapping.
      
      Changing it to always use the deepest anon_vma in the anonvma chain gets
      us to the safest model.
      
      This can be improved in certain cases: if we know the page is private to
      just this particular mapping (for example, it's a new page, or it is the
      only swapcache entry), we could pick the top (most specific) anon_vma.
      
      But that's a future optimization. Make it _work_ reliably first.
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Tested-by: Borislav Petkov <bp@alien8.de> [ "What do you know, I think you fixed it!" ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ea90002b
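
      The scenario in the commit above comes down to which entry of a vma's
      anon_vma chain ends up in page->mapping. A minimal user-space sketch,
      assuming the chain is a singly linked list ordered from newest ('B') to
      oldest ('A'). `struct avc`, `newest_anon_vma()` and `oldest_anon_vma()`
      are hypothetical names for illustration, not the kernel's data structures:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one link of a vma's anon_vma chain. */
struct avc {
	char anon_vma;		/* label of the anon_vma, e.g. 'A' or 'B' */
	struct avc *next;	/* next older entry, NULL at the oldest one */
};

/* Behaviour described as buggy: use the faulting vma's own anon_vma. */
char newest_anon_vma(const struct avc *chain)
{
	assert(chain != NULL);
	return chain->anon_vma;
}

/* Behaviour after the fix: walk to the end and use the oldest anon_vma. */
char oldest_anon_vma(const struct avc *chain)
{
	assert(chain != NULL);
	while (chain->next)
		chain = chain->next;
	return chain->anon_vma;
}
```

      With process A's chain being just 'A' and process B's chain being
      'B' -> 'A', `oldest_anon_vma()` yields 'A' for both, so the choice stays
      valid after B goes away, while `newest_anon_vma()` yields 'B' for B's chain.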
    • anon_vma: clone the anon_vma chain in the right order · 646d87b4
      Committed by Linus Torvalds
      We want to walk the chain in reverse order when cloning it, so that the
      order of the result chain will be the same as the order in the source
      chain.  When we add entries to the chain, they go at the head of the
      chain, so we want to add the source head last.
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Tested-by: Borislav Petkov <bp@alien8.de> [ "No, it still oopses" ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      646d87b4
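
      The ordering argument in the commit above can be checked with a small
      user-space sketch. It assumes a toy chain stored in an array where index 0
      is the head and new entries are always added at the head. `struct chain`,
      `chain_add_head()`, `clone_forward()` and `clone_reverse()` are
      hypothetical names, not kernel functions:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ENTRIES 16

/* Hypothetical chain: entries[0] is the head; new entries go at the head. */
struct chain {
	int entries[MAX_ENTRIES];
	size_t len;
};

/* Insert at the head, shifting existing entries towards the tail. */
void chain_add_head(struct chain *c, int value)
{
	assert(c->len < MAX_ENTRIES);
	for (size_t i = c->len; i > 0; i--)
		c->entries[i] = c->entries[i - 1];
	c->entries[0] = value;
	c->len++;
}

/* Walk the source head-to-tail: the copy comes out reversed. */
void clone_forward(struct chain *dst, const struct chain *src)
{
	dst->len = 0;
	for (size_t i = 0; i < src->len; i++)
		chain_add_head(dst, src->entries[i]);
}

/* Walk the source tail-to-head: the source head is added last. */
void clone_reverse(struct chain *dst, const struct chain *src)
{
	dst->len = 0;
	for (size_t i = src->len; i > 0; i--)
		chain_add_head(dst, src->entries[i - 1]);
}
```

      For a source chain 1, 2, 3, `clone_forward()` produces 3, 2, 1, whereas
      `clone_reverse()` produces 1, 2, 3, the same order as the source.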
    • vma_adjust: fix the copying of anon_vma chains · 287d97ac
      Committed by Linus Torvalds
      When we move the boundaries between two vma's due to things like
      mprotect, we need to make sure that the anon_vma of the pages that got
      moved from one vma to another gets properly copied around.  And that was
      not always the case, in this rather hard-to-follow code sequence.
      
      Clarify the code, and fix it so that it copies the anon_vma from the
      right source.
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Tested-by: Borislav Petkov <bp@alien8.de> [ "Yeah, not so much this one either" ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      287d97ac
    • Simplify and comment on anon_vma re-use for anon_vma_prepare() · d0e9fe17
      Committed by Linus Torvalds
      This changes the anon_vma reuse case to require that we only reuse
      simple anon_vma's - ie the case when the vma only has a single anon_vma
      associated with it.
      
      This means that a reuse of an anon_vma from an adjacent vma will always
      guarantee that both vma's are associated not only with the same
      anon_vma, they will also have the same anon_vma chain (of just a single
      entry in this case).
      
      And since anon_vma re-use was the only case where the same anon_vma
      might be associated with different chains of anon_vma's, we now have the
      case that every vma that shares the same anon_vma will always also have
      the same chain.  That makes it much easier to think about merging vma's
      that share the same anon_vma's: you can always just drop the other
      anon_vma chain in anon_vma_merge() since you know that they are always
      identical.
      
      This also splits up the function to validate the anon_vma re-use, and
      adds a lot of commentary about the possible races.
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Tested-by: Borislav Petkov <bp@alien8.de> [ "That didn't fix it" ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d0e9fe17
  2. 10 Apr 2010: 2 commits
  3. 07 Apr 2010: 5 commits
  4. 06 Apr 2010: 1 commit
  5. 02 Apr 2010: 3 commits
    • backing-dev: Handle class_create() failure · 14421453
      Committed by Anton Blanchard
      I hit this when we had a bug in IDR for a few days. Basically sysfs would
      fail to create new inodes since it uses an IDR and therefore class_create would
      fail.
      
      While we are unlikely to see this fail we may as well handle it instead of
      oopsing.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      14421453
    • bootmem, x86: Fix 32bit numa system without RAM on node 0 · aa235fc7
      Committed by Yinghai Lu
      When 32-bit NUMA is used, free_all_bootmem() will still only go over
      node id 0.

      If node 0 doesn't have RAM installed, the lowest populated node
      becomes low RAM.

      This one fixes the BOOTMEM path by iterating over the bdata_list.

      -v3: add more comments, and fix bootmem path too.
      -v4: separate from one big patch
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4BB416D7.6090203@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      aa235fc7
    • nobootmem, x86: Fix 32bit numa system without RAM on node 0 · 33799858
      Committed by Yinghai Lu
      On a system without RAM on node 0, the following boot dump was seen with
      a 32-bit NUMA kernel:
      
      early_node_map[4] active PFN ranges
          1: 0x00000010 -> 0x00000099
          1: 0x00000100 -> 0x0007da00
          1: 0x0007e800 -> 0x0007ffa0
          1: 0x0007ffae -> 0x0007ffb0
      ...
      Subtract (29 early reservations)
        #000 [0000001000 - 0000002000]
        #001 [0000089000 - 000008f000]
        #002 [0000091000 - 0000093500]
      ...
        #027 [007cbfef40 - 007e800000]
        #028 [007e9ca000 - 007ff95000]
      (0 free memory ranges)
      Initializing HighMem for node 0 (00000000:00000000)
      Initializing HighMem for node 1 (00000000:00000000)
      Memory: 0k/2096832k available (6662k kernel code, 2096300k reserved, 4829k data, 484k init, 0k highmem)
      ...
      Checking if this processor honours the WP bit even in supervisor mode...Ok.
      swapper: page allocation failure. order:0, mode:0x0
      Pid: 0, comm: swapper Not tainted 2.6.34-rc3-tip-03818-g4b1ea6c-dirty #35
      Call Trace:
       [<4087a5dc>] ? printk+0xf/0x11
       [<40286728>] __alloc_pages_nodemask+0x417/0x487
       [<402a9ce1>] new_slab+0xe2/0x1fe
       [<402aa5b2>] kmem_cache_open+0x185/0x358
       [<402abbc0>] T.954+0x1c/0x60
       [<40d52a29>] kmem_cache_init+0x24/0x113
       [<40d39738>] start_kernel+0x166/0x2e4
       [<40d3940e>] ? unknown_bootoption+0x0/0x18e
       [<40d390ce>] i386_start_kernel+0xce/0xd5
      Mem-Info:
      Node 1 DMA per-cpu:
      CPU    0: hi:    0, btch:   1 usd:   0
      Node 1 Normal per-cpu:
      CPU    0: hi:    0, btch:   1 usd:   0
      active_anon:0 inactive_anon:0 isolated_anon:0
       active_file:0 inactive_file:0 isolated_file:0
       unevictable:0 dirty:0 writeback:0 unstable:0
       free:0 slab_reclaimable:0 slab_unreclaimable:0
       mapped:0 shmem:0 pagetables:0 bounce:0
      
      When 32-bit NUMA is used, free_all_bootmem() will still only go over
      node id 0.

      If node 0 doesn't have RAM installed, we need to go with node 1,
      because early_node_map still uses 1 for all ranges, and RAM from node 1
      becomes low RAM.

      Use MAX_NUMNODES like 64-bit NUMA does.

      Note: the BOOTMEM path has the same problem.
            This bug existed before we had NO_BOOTMEM support.

      -v3: add more comments, and fix bootmem path too.
      -v4: separate bootmem path fix
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4BB41689.9090502@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      33799858
  6. 30 Mar 2010: 3 commits
    • percpu: don't implicitly include slab.h from percpu.h · de380b55
      Committed by Tejun Heo
      percpu.h has always been including slab.h to get k[mz]alloc/free() for
      UP inline implementation.  With percpu.h being used by very low level
      headers including module.h and sched.h, this meant that a lot of files
      unintentionally got slab.h inclusion.
      
      Lee Schermerhorn was trying to make topology.h use percpu.h and got
      bitten by this implicit inclusion.  The right thing to do is break
      this ultimately unnecessary dependency.  The previous patch added
      explicit inclusion of either gfp.h or slab.h to the source files using
      them.  This patch updates percpu.h such that slab.h is no longer
      included from percpu.h.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      de380b55
    • kmemcheck: Fix build errors due to missing slab.h · ea5a9f0c
      Committed by Randy Dunlap
      mm/kmemcheck.c:69: error: dereferencing pointer to incomplete type
      mm/kmemcheck.c:69: error: 'SLAB_NOTRACK' undeclared (first use in this function)
      mm/kmemcheck.c:82: error: dereferencing pointer to incomplete type
      mm/kmemcheck.c:94: error: dereferencing pointer to incomplete type
      mm/kmemcheck.c:94: error: dereferencing pointer to incomplete type
      mm/kmemcheck.c:94: error: 'SLAB_DESTROY_BY_RCU' undeclared (first use in this function)
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      ea5a9f0c
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h · 5a0e3ad6
      Committed by Tejun Heo
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch a large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have a fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  7. 29 Mar 2010: 1 commit
  8. 26 Mar 2010: 2 commits
  9. 25 Mar 2010: 11 commits
  10. 18 Mar 2010: 1 commit
  11. 13 Mar 2010: 7 commits
    • memcg: fix oom kill behavior · 867578cb
      Committed by KAMEZAWA Hiroyuki
      In current page-fault code,
      
      	handle_mm_fault()
      		-> ...
      		-> mem_cgroup_charge()
      		-> map page or handle error.
      	-> check return code.
      
      If page fault's return code is VM_FAULT_OOM, page_fault_out_of_memory() is
      called.  But if it's caused by memcg, OOM should have been already
      invoked.
      
      Then, I added a patch: a636b327.  That
      patch records last_oom_jiffies for memcg's sub-hierarchy and prevents
      page_fault_out_of_memory from being invoked in near future.
      
      But Nishimura-san reported that the check by jiffies is not enough when the
      system is under very heavy load.

      This patch changes memcg's oom logic as follows:
       * If memcg causes OOM-kill, continue to retry.
       * Remove the jiffies check which is used now.
       * Add memcg-oom-lock which works like the per-zone oom lock.
       * If current is killed (as a process), bypass charge.

      Something more sophisticated can be added, but this patch does the
      fundamental things.
      TODO:
       - add oom notifier
       - add per-memcg disable-oom-kill flag and freezer at oom.
       - more chances to wake up oom waiters (when changing memory limit etc.)
      Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Tested-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      867578cb
    • cgroups: remove events before destroying subsystem state objects · a0a4db54
      Committed by Kirill A. Shutemov
      Events should be removed after rmdir of cgroup directory, but before
      destroying subsystem state objects.  Let's take reference to cgroup
      directory dentry to do that.
      Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hioryu@jp.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Dan Malek <dan@embeddedalley.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a0a4db54
    • memcg: handle panic_on_oom=always case · daaf1e68
      Committed by KAMEZAWA Hiroyuki
      Presently, if panic_on_oom=2, the whole system panics even if the oom
      happened in some special situation (such as cpuset, mempolicy, ...).  That is,
      panic_on_oom=2 means panic_on_oom_always.

      Currently, memcg doesn't check the panic_on_oom flag. This patch adds a check.

      BTW, how is it useful?

      kdump+panic_on_oom=2 is the last tool to investigate what happens in an
      oom-ed system.  When a task is killed, the system recovers and there will
      be few hints as to what happened.  In a mission-critical system, oom should
      never happen.  So panic_on_oom=2+kdump is useful for avoiding the next OOM by
      providing precise information via a snapshot.

      TODO:
       - Since memcg is for isolating the system's memory usage, oom-notifier and
         freeze_at_oom (or rest_at_oom) should be implemented. Then, a management
         daemon can do similar jobs (as kdump) or take a snapshot per cgroup.
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      daaf1e68
    • memcg : share event counter rather than duplicate · d2265e6f
      Committed by KAMEZAWA Hiroyuki
      Memcg has 2 event counters which count "the same" event.  Only their usages
      differ.  This patch tries to reduce them to one event counter.

      The logic now uses an "only increment, no reset" counter and a mask for each
      check.  The softlimit check was done per 1000 events, so a similar check
      can be done by !(new_counter & 0x3ff).  The threshold check was done per 100
      events, so a similar check can be done by !(new_counter & 0x7f).

      All event checks are done right after the EVENT percpu counter is updated.
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d2265e6f
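
      The mask-based checks described in the commit above can be sketched in
      user space as follows. The masks 0x3ff and 0x7f come from the commit
      message and fire every 1024 and every 128 events respectively, the
      nearest powers of two to 1000 and 100. The macro and function names are
      hypothetical, not memcg's. Note that the negation has to apply to the
      masked value: `!counter & mask` negates the counter first and is 0 for
      every non-zero counter.

```c
#include <assert.h>
#include <stdbool.h>

#define SOFTLIMIT_MASK	0x3ffUL	/* low 10 bits: fires every 1024 events */
#define THRESHOLD_MASK	0x7fUL	/* low 7 bits: fires every 128 events */

/* True when the low bits selected by mask are all zero. */
bool event_due(unsigned long counter, unsigned long mask)
{
	assert(mask != 0);
	return !(counter & mask);
}

/* Count how often a check fires while the counter runs from 1 to n. */
unsigned long count_fired(unsigned long n, unsigned long mask)
{
	unsigned long fired = 0;

	for (unsigned long counter = 1; counter <= n; counter++)
		if (event_due(counter, mask))
			fired++;
	return fired;
}
```

      Over 4096 events the softlimit-style check fires 4 times and the
      threshold-style check 32 times, without ever resetting the counter.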
    • memcg: update threshold and softlimit at commit · 430e4863
      Committed by KAMEZAWA Hiroyuki
      Presently, move_task does "batched" precharge.  Because updating res_counter
      or css's refcnt does not scale well for memcg, try_charge_()..  tends to be
      done in a batched manner if allowed.

      Now, softlimit and threshold check their event counter in try_charge, but
      the charge is not a per-page event.  And the event counter is not updated at
      charge().  Moreover, precharge doesn't pass "page" to try_charge() and
      the softlimit tree will never be updated until uncharge() causes an event.

      So the best place to check the event counter is commit_charge().  This is
      a per-page event by its nature.  This patch moves the checks there.
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      430e4863
    • memcg: use generic percpu instead of private implementation · c62b1a3b
      Committed by KAMEZAWA Hiroyuki
      When the per-cpu counter for memcg was implemented, the dynamic percpu
      allocator was not very good.  But now, we have a good one and useful macros.
      This patch replaces memcg's private percpu counter implementation with the
      generic dynamic percpu allocator.

      The benefits are
      	- We can remove private implementation.
      	- The counters will be NUMA-aware. (Current one is not...)
      	- This patch makes sizeof struct mem_cgroup smaller. Then,
      	  struct mem_cgroup may fit in page size on small config.
      	- About basic performance aspects, see below.
      
       [Before]
       # size mm/memcontrol.o
         text    data     bss     dec     hex filename
        24373    2528    4132   31033    7939 mm/memcontrol.o
      
       [page-fault-throughput test on 8cpu/SMP in root cgroup]
       # /root/bin/perf stat -a -e page-faults,cache-misses --repeat 5 ./multi-fault-fork 8
      
       Performance counter stats for './multi-fault-fork 8' (5 runs):
      
             45878618  page-faults                ( +-   0.110% )
            602635826  cache-misses               ( +-   0.105% )
      
         61.005373262  seconds time elapsed   ( +-   0.004% )
      
       Then cache-miss/page fault = 13.14
      
       [After]
       # size mm/memcontrol.o
         text    data     bss     dec     hex filename
        23913    2528    4132   30573    776d mm/memcontrol.o
       # /root/bin/perf stat -a -e page-faults,cache-misses --repeat 5 ./multi-fault-fork 8
      
       Performance counter stats for './multi-fault-fork 8' (5 runs):
      
             48179400  page-faults                ( +-   0.271% )
            588628407  cache-misses               ( +-   0.136% )
      
         61.004615021  seconds time elapsed   ( +-   0.004% )
      
        Then cache-miss/page fault = 12.22
      
       Text size is reduced.
       This performance improvement is not big and will be invisible in real world
       applications. But this result shows this patch has some good effect even
       on (small) SMP.
      
      Here is the test program I used.

       1. fork() a process on each cpu.
       2. do page faults repeatedly in each process.
       3. after 60 secs, kill all children and exit.

      (3 is necessary for getting stable data; this is an improvement over the previous one.)
      
      #define _GNU_SOURCE
      #include <stdio.h>
      #include <sched.h>
      #include <sys/mman.h>
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <sys/wait.h>
      #include <fcntl.h>
      #include <signal.h>
      #include <stdlib.h>
      #include <unistd.h>
      
      /*
       * For avoiding contention in page table lock, FAULT area is
       * sparse. If FAULT_LENGTH is too large for your cpus, decrease it.
       */
      #define FAULT_LENGTH	(2 * 1024 * 1024)
      #define PAGE_SIZE	4096
      #define MAXNUM		(128)
      
      /* Empty handler: SIGALRM only needs to interrupt pause() in the workers. */
      void alarm_handler(int sig)
      {
      }
      
      /* Pin to one cpu, then fault in and drop an anonymous mapping forever. */
      void *worker(int cpu, int ppid)
      {
      	char *start, *end;
      	char *c;
      	cpu_set_t set;
      
      	CPU_ZERO(&set);
      	CPU_SET(cpu, &set);
      	sched_setaffinity(0, sizeof(set), &set);
      
      	/* Anonymous mappings take fd -1 for portability. */
      	start = mmap(NULL, FAULT_LENGTH, PROT_READ|PROT_WRITE,
      			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      	if (start == MAP_FAILED) {
      		perror("mmap");
      		exit(1);
      	}
      	end = start + FAULT_LENGTH;
      
      	/* Wait for the parent's SIGALRM so that all workers start together. */
      	pause();
      	while (1) {
      		/* Touch one byte per page to take a page fault on each. */
      		for (c = start; c < end; c += PAGE_SIZE)
      			*c = 0;
      		/* Drop the pages so that the next pass faults again. */
      		madvise(start, FAULT_LENGTH, MADV_DONTNEED);
      	}
      	return NULL;
      }
      
      int main(int argc, char *argv[])
      {
      	int num, i, ret, pid, status;
      	int pids[MAXNUM];
      
      	if (argc < 2)
      		return 0;
      
      	setpgid(0, 0);
      	signal(SIGALRM, alarm_handler);
      	num = atoi(argv[1]);
      	/* pids[] holds at most MAXNUM children. */
      	if (num < 1 || num > MAXNUM)
      		return 1;
      	pid = getpid();
      
      	for (i = 0; i < num; ++i) {
      		ret = fork();
      		if (!ret) {
      			worker(i, pid);
      			exit(0);
      		}
      		pids[i] = ret;
      	}
      	sleep(1);
      	/* Wake every worker in the process group out of pause(). */
      	kill(-pid, SIGALRM);
      	sleep(60);
      	for (i = 0; i < num; i++)
      		kill(pids[i], SIGKILL);
      	for (i = 0; i < num; i++)
      		waitpid(pids[i], &status, 0);
      	return 0;
      }
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c62b1a3b
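
      The cache-miss per page-fault figures quoted in the commit above (13.14
      before, 12.22 after) follow directly from the perf counts. A small sketch
      of that arithmetic, using integer math so that the rounding is explicit.
      `ratio_x100()` is a hypothetical helper, not part of the original test
      program:

```c
#include <assert.h>

/* Ratio a/b in hundredths, rounded to nearest (e.g. 1314 means 13.14). */
unsigned long long ratio_x100(unsigned long long a, unsigned long long b)
{
	assert(b != 0);
	return (a * 100 + b / 2) / b;
}
```

      Called with the counts from the two runs, `ratio_x100(602635826, 45878618)`
      gives 1314 and `ratio_x100(588628407, 48179400)` gives 1222. The same
      helper shows the page-fault count itself rising by about 5%
      (`ratio_x100(48179400, 45878618)` gives 105) over the same 61 seconds.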
    • memcg: typo in comment to mem_cgroup_print_oom_info() · 6a6135b6
      Committed by Kirill A. Shutemov
      s/mem_cgroup_print_mem_info/mem_cgroup_print_oom_info/
      Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6a6135b6