1. 29 10月, 2009 10 次提交
  2. 28 10月, 2009 1 次提交
    • J
      percpu: allow pcpu_alloc() to be called with IRQs off · 403a91b1
      Jiri Kosina 提交于
      pcpu_alloc() and pcpu_extend_area_map() perform a series of
      spin_lock_irq()/spin_unlock_irq() calls, which make them unsafe
      with respect to being called from contexts which have IRQs off.
      
      This patch converts the code to perform save/restore of flags instead,
      making pcpu_alloc() (or __alloc_percpu() respectively) to be called
      from early kernel startup stage, where IRQs are off.
      
      This is needed for proper initialization of per-cpu rq_weight data from
      sched_init().
      
      tj: added comment explaining why irqsave/restore is used in alloc path.
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      403a91b1
  3. 27 10月, 2009 1 次提交
  4. 19 10月, 2009 4 次提交
  5. 12 10月, 2009 2 次提交
  6. 10 10月, 2009 2 次提交
  7. 09 10月, 2009 2 次提交
  8. 08 10月, 2009 3 次提交
  9. 02 10月, 2009 5 次提交
    • K
      memcg: reduce check for softlimit excess · ef8745c1
      KAMEZAWA Hiroyuki 提交于
      In charge/uncharge/reclaim path, usage_in_excess is calculated repeatedly
      and it takes res_counter's spin_lock every time.
      
      This patch removes unnecessary calls for res_count_soft_limit_excess.
      Reviewed-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ef8745c1
    • K
      memcg: some modification to softlimit under hierarchical memory reclaim. · 4e649152
      KAMEZAWA Hiroyuki 提交于
      This patch clean up/fixes for memcg's uncharge soft limit path.
      
      Problems:
        Now, res_counter_charge()/uncharge() handles softlimit information at
        charge/uncharge and softlimit-check is done when event counter per memcg
        goes over limit. Now, event counter per memcg is updated only when
        memory usage is over soft limit. Here, considering hierarchical memcg
        management, ancesotors should be taken care of.
      
        Now, ancerstors(hierarchy) are handled in charge() but not in uncharge().
        This is not good.
      
        Prolems:
        1. memcg's event counter incremented only when softlimit hits. That's bad.
           It makes event counter hard to be reused for other purpose.
      
        2. At uncharge, only the lowest level rescounter is handled. This is bug.
           Because ancesotor's event counter is not incremented, children should
           take care of them.
      
        3. res_counter_uncharge()'s 3rd argument is NULL in most case.
           ops under res_counter->lock should be small. No "if" sentense is better.
      
      Fixes:
        * Removed soft_limit_xx poitner and checks in charge and uncharge.
          Do-check-only-when-necessary scheme works enough well without them.
      
        * make event-counter of memcg incremented at every charge/uncharge.
          (per-cpu area will be accessed soon anyway)
      
        * All ancestors are checked at soft-limit-check. This is necessary because
          ancesotor's event counter may never be modified. Then, they should be
          checked at the same time.
      Reviewed-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4e649152
    • K
      memcg: fix refcnt going negative · 26251eaf
      KAMEZAWA Hiroyuki 提交于
      __mem_cgroup_largest_soft_limit_node() returns a mem_cgroup_per_zone "mz"
      with incremnted mz->mem->css's refcnt.  Then, the caller of this function
      has to call css_put(mz->mem->css).
      
      But, mz can be !NULL even if "not found" i.e.  without css_get().  By
      this, css->refcnt will go down to minus.
      
      This may cause various things...one of results will be
      initite-loop in css_tryget()  as this.
      
      INFO: RCU detected CPU 0 stall (t=10000 jiffies)
      sending NMI to all CPUs:
      NMI backtrace for cpu 0
      CPU 0:
      <snip>
      
       <<EOE>>  <IRQ>  [<ffffffff810884bd>] trace_hardirqs_off+0xd/0x10
        [<ffffffff8102a940>] flat_send_IPI_mask+0x90/0xb0
        [<ffffffff8102a9c9>] flat_send_IPI_all+0x69/0x70
        [<ffffffff81027372>] arch_trigger_all_cpu_backtrace+0x62/0xa0
        [<ffffffff810bff8e>] __rcu_pending+0x7e/0x370
        [<ffffffff810c02c7>] rcu_check_callbacks+0x47/0x130
        [<ffffffff81063a26>] update_process_times+0x46/0x70
        [<ffffffff81085930>] tick_sched_timer+0x60/0x160
        [<ffffffff810858d0>] ? tick_sched_timer+0x0/0x160
        [<ffffffff8107a03a>] __run_hrtimer+0xba/0x150
        [<ffffffff8107a325>] hrtimer_interrupt+0xd5/0x1b0
        [<ffffffff81426dfe>] ? trace_hardirqs_off_thunk+0x3a/0x3c
        [<ffffffff8142cacd>] smp_apic_timer_interrupt+0x6d/0x9b
        [<ffffffff8100cb33>] apic_timer_interrupt+0x13/0x20
        <EOI>  [<ffffffff811317b6>] ? mem_cgroup_walk_tree+0x156/0x180
        [<ffffffff811316d3>] ? mem_cgroup_walk_tree+0x73/0x180
        [<ffffffff81131692>] ? mem_cgroup_walk_tree+0x32/0x180
        [<ffffffff81131a00>] ? mem_cgroup_get_local_stat+0x0/0x110
        [<ffffffff81131d5b>] ? mem_control_stat_show+0x14b/0x330
        [<ffffffff810a57fd>] ? cgroup_seqfile_show+0x3d/0x60
      
      Above shows CPU0 caught in css_tryget()'s inifinite loop because
      of bad refcnt.
      
      This is a fix to set mz=NULL at the top of retry path.
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: NPaul Menage <menage@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      26251eaf
    • H
      mm/rmap.c: fix comment · bf89c8c8
      Huang Shijie 提交于
      The page_address_in_vma() is not only used in unuse_vma().
      Signed-off-by: NHuang Shijie <shijie8@gmail.com>
      Acked-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bf89c8c8
    • S
      swapfile: avoid NULL pointer dereference in swapon when s_bdev is NULL · 3bd0f0c7
      Suresh Jayaraman 提交于
      While testing Swap over NFS patchset, I noticed an oops that was triggered
      during swapon. Investigating further, the NULL pointer deference is due to the
      SSD device check/optimization in the swapon code that assumes s_bdev could never
      be NULL.
      
      inode->i_sb->s_bdev could be NULL in a few cases. For e.g. one such case is
      loopback NFS mount, there could be others as well. Fix this by ensuring s_bdev
      is not NULL before we try to deference s_bdev.
      Signed-off-by: NSuresh Jayaraman <sjayaraman@suse.de>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      3bd0f0c7
  10. 29 9月, 2009 5 次提交
    • T
      percpu: make allocation failures more verbose · f2badb0c
      Tejun Heo 提交于
      Warn and dump stack when percpu allocation fails.  percpu allocator is
      still young and unchecked NULL percpu pointer usage can result in
      random memory corruption when combined with the pointer shifting in
      access macros.  Allocation failures should be rare and the warning
      message will be disabled after certain times.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      f2badb0c
    • T
      percpu: make pcpu_setup_first_chunk() failures more verbose · 635b75fc
      Tejun Heo 提交于
      The parameters to pcpu_setup_first_chunk() come from different sources
      depending on architecture and can be quite complex.  The function runs
      various sanity checks on the parameters and triggers BUG() if
      something isn't right.  However, this is very early during the boot
      and not reporting exactly what the problem is makes debugging even
      harder.
      
      Add PCPU_SETUP_BUG() macro which prints out enough information about
      the parameters.  As the macro still puts separate BUG() for each
      check, it won't lose any information even on the situations where only
      the program counter can be retrieved.
      
      While at it, also bump pcpu_dump_alloc_info() message to KERN_INFO so
      that it's visible on the console if boot fails to complete.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      635b75fc
    • T
      percpu: make embedding first chunk allocator check vmalloc space size · 6ea529a2
      Tejun Heo 提交于
      Embedding first chunk allocator maintains the distances between units
      in the vmalloc area and thus needs vmalloc space to be larger than the
      maximum distances between units; otherwise, it wouldn't be able to
      create any dynamic chunks.  This patch makes the embedding first chunk
      allocator check vmalloc space size and if the maximum distance between
      units is larger than 75% of it, print warning and, if page mapping
      allocator is available, fail initialization so that the system falls
      back onto it.
      
      This should work around percpu allocation failure problems on certain
      sparc64 configurations where distances between NUMA nodes are larger
      than the vmalloc area and makes percpu allocator more robust for
      future configurations.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      6ea529a2
    • T
      percpu: make pcpu_build_alloc_info() clear static buffers · fb59e72e
      Tejun Heo 提交于
      pcpu_build_alloc_info() may be called multiple times when percpu is
      falling back to different first chunk allocator.  Make it clear static
      buffers so that they don't contain values from previous runs.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      fb59e72e
    • T
      percpu: fix unit_map[] verification in pcpu_setup_first_chunk() · ffe0d5a5
      Tejun Heo 提交于
      pcpu_setup_first_chunk() incorrectly used NR_CPUS as the impossible
      unit number while unit number can equal and go over NR_CPUS with
      sparse unit map.  This triggers BUG_ON() spuriously on machines which
      have non-power-of-two number of cpus.  Use UINT_MAX instead.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-and-tested-by: NTony Vroon <tony@linx.net>
      ffe0d5a5
  11. 28 9月, 2009 1 次提交
  12. 27 9月, 2009 1 次提交
    • L
      x86: Fix hwpoison code related build failure on 32-bit NUMAQ · d949f36f
      Linus Torvalds 提交于
      This build failure triggers:
      
       In file included from include/linux/suspend.h:8,
                       from arch/x86/kernel/asm-offsets_32.c:11,
                       from arch/x86/kernel/asm-offsets.c:2:
       include/linux/mm.h:503:2: error: #error SECTIONS_WIDTH+NODES_WIDTH+ZONES_WIDTH > BITS_PER_LONG - NR_PAGEFLAGS
      
      Because due to the hwpoison page flag we ran out of page
      flags on 32-bit.
      
      Dont turn on hwpoison on 32-bit NUMA (it's rare in any
      case).
      
      Also clean up the Kconfig dependencies in the generic MM
      code by introducing ARCH_SUPPORTS_MEMORY_FAILURE.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d949f36f
  13. 26 9月, 2009 3 次提交