1. 13 11月, 2013 34 次提交
  2. 02 11月, 2013 1 次提交
    • G
      memcg: remove incorrect underflow check · 6920a1bd
      Greg Thelen 提交于
      When a memcg is deleted mem_cgroup_reparent_charges() moves charged
      memory to the parent memcg.  As of v3.11-9444-g3ea67d06 "memcg: add per
      cgroup writeback pages accounting" there's bad pointer read.  The goal
      was to check for counter underflow.  The counter is a per cpu counter
      and there are two problems with the code:
      
       (1) per cpu access function isn't used, instead a naked pointer is used
           which easily causes oops.
       (2) the check doesn't sum all cpus
      
      Test:
        $ cd /sys/fs/cgroup/memory
        $ mkdir x
        $ echo 3 > /proc/sys/vm/drop_caches
        $ (echo $BASHPID >> x/tasks && exec cat) &
        [1] 7154
        $ grep ^mapped x/memory.stat
        mapped_file 53248
        $ echo 7154 > tasks
        $ rmdir x
        <OOPS>
      
      The fix is to remove the check.  It's currently dangerous and isn't
      worth fixing it to use something expensive, such as
      percpu_counter_sum(), for each reparented page.  __this_cpu_read() isn't
      enough to fix this because there's no guarantees of the current cpus
      count.  The only guarantees is that the sum of all per-cpu counter is >=
      nr_pages.
      
      Fixes: 3ea67d06 ("memcg: add per cgroup writeback pages accounting")
      Reported-and-tested-by: NFlavio Leitner <fbl@redhat.com>
      Signed-off-by: NGreg Thelen <gthelen@google.com>
      Reviewed-by: NSha Zhengju <handai.szj@taobao.com>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6920a1bd
  3. 01 11月, 2013 3 次提交
  4. 31 10月, 2013 2 次提交
    • G
      memcg: use __this_cpu_sub() to dec stats to avoid incorrect subtrahend casting · 5e8cfc3c
      Greg Thelen 提交于
      As of commit 3ea67d06 ("memcg: add per cgroup writeback pages
      accounting") memcg counter errors are possible when moving charged
      memory to a different memcg.  Charge movement occurs when processing
      writes to memory.force_empty, moving tasks to a memcg with
      memcg.move_charge_at_immigrate=1, or memcg deletion.
      
      An example showing error after memory.force_empty:
      
        $ cd /sys/fs/cgroup/memory
        $ mkdir x
        $ rm /data/tmp/file
        $ (echo $BASHPID >> x/tasks && exec mmap_writer /data/tmp/file 1M) &
        [1] 13600
        $ grep ^mapped x/memory.stat
        mapped_file 1048576
        $ echo 13600 > tasks
        $ echo 1 > x/memory.force_empty
        $ grep ^mapped x/memory.stat
        mapped_file 4503599627370496
      
      mapped_file should end with 0.
        4503599627370496 == 0x10,0000,0000,0000 == 0x100,0000,0000 pages
        1048576          == 0x10,0000           == 0x100 pages
      
      This issue only affects the source memcg on 64 bit machines; the
      destination memcg counters are correct.  So the rmdir case is not too
      important because such counters are soon disappearing with the entire
      memcg.  But the memcg.force_empty and memory.move_charge_at_immigrate=1
      cases are larger problems as the bogus counters are visible for the
      (possibly long) remaining life of the source memcg.
      
      The problem is due to memcg use of __this_cpu_from(.., -nr_pages), which
      is subtly wrong because it subtracts the unsigned int nr_pages (either
      -1 or -512 for THP) from a signed long percpu counter.  When
      nr_pages=-1, -nr_pages=0xffffffff.  On 64 bit machines stat->count[idx]
      is signed 64 bit.  So memcg's attempt to simply decrement a count (e.g.
      from 1 to 0) boils down to:
      
        long count = 1
        unsigned int nr_pages = 1
        count += -nr_pages  /* -nr_pages == 0xffff,ffff */
        count is now 0x1,0000,0000 instead of 0
      
      The fix is to subtract the unsigned page count rather than adding its
      negation.  This only works once "percpu: fix this_cpu_sub() subtrahend
      casting for unsigneds" is applied to fix this_cpu_sub().
      Signed-off-by: NGreg Thelen <gthelen@google.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5e8cfc3c
    • C
      mm/pagewalk.c: fix walk_page_range() access of wrong PTEs · 3017f079
      Chen LinX 提交于
      When walk_page_range walk a memory map's page tables, it'll skip
      VM_PFNMAP area, then variable 'next' will to assign to vma->vm_end, it
      maybe larger than 'end'.  In next loop, 'addr' will be larger than
      'next'.  Then in /proc/XXXX/pagemap file reading procedure, the 'addr'
      will growing forever in pagemap_pte_range, pte_to_pagemap_entry will
      access the wrong pte.
      
        BUG: Bad page map in process procrank  pte:8437526f pmd:785de067
        addr:9108d000 vm_flags:00200073 anon_vma:f0d99020 mapping:  (null) index:9108d
        CPU: 1 PID: 4974 Comm: procrank Tainted: G    B   W  O 3.10.1+ #1
        Call Trace:
          dump_stack+0x16/0x18
          print_bad_pte+0x114/0x1b0
          vm_normal_page+0x56/0x60
          pagemap_pte_range+0x17a/0x1d0
          walk_page_range+0x19e/0x2c0
          pagemap_read+0x16e/0x200
          vfs_read+0x84/0x150
          SyS_read+0x4a/0x80
          syscall_call+0x7/0xb
      Signed-off-by: NLiu ShuoX <shuox.liu@intel.com>
      Signed-off-by: NChen LinX <linx.z.chen@intel.com>
      Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: <stable@vger.kernel.org>	[3.10.x+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3017f079