1. 31 5月, 2012 1 次提交
  2. 30 5月, 2012 1 次提交
  3. 26 5月, 2012 2 次提交
    • C
      mm: add new arch_make_huge_pte() method for tile support · d9ed9faa
      Chris Metcalf 提交于
      The tile support for multiple-size huge pages requires tagging
      the hugetlb PTE with a "super" bit for PTEs that are multiples of
      the basic size of a pagetable span.  To set that bit properly
      we need to tweak the PTe in make_huge_pte() based on the vma.
      
      This change provides the API for a subsequent tile-specific
      change to use.
      Reviewed-by: NHillf Danton <dhillf@gmail.com>
      Signed-off-by: NChris Metcalf <cmetcalf@tilera.com>
      d9ed9faa
    • C
      arch/tile: allow building Linux with transparent huge pages enabled · 73636b1a
      Chris Metcalf 提交于
      The change adds some infrastructure for managing tile pmd's more generally,
      using pte_pmd() and pmd_pte() methods to translate pmd values to and
      from ptes, since on TILEPro a pmd is really just a nested structure
      holding a pgd (aka pte).  Several existing pmd methods are moved into
      this framework, and a whole raft of additional pmd accessors are defined
      that are used by the transparent hugepage framework.
      
      The tile PTE now has a "client2" bit.  The bit is used to indicate a
      transparent huge page is in the process of being split into subpages.
      
      This change also fixes a generic bug where the return value of the
      generic pmdp_splitting_flush() was incorrect.
      Signed-off-by: NChris Metcalf <cmetcalf@tilera.com>
      73636b1a
  4. 24 5月, 2012 2 次提交
    • T
      mm: add a low limit to alloc_large_system_hash · 31fe62b9
      Tim Bird 提交于
      UDP stack needs a minimum hash size value for proper operation and also
      uses alloc_large_system_hash() for proper NUMA distribution of its hash
      tables and automatic sizing depending on available system memory.
      
      On some low memory situations, udp_table_init() must ignore the
      alloc_large_system_hash() result and reallocs a bigger memory area.
      
      As we cannot easily free old hash table, we leak it and kmemleak can
      issue a warning.
      
      This patch adds a low limit parameter to alloc_large_system_hash() to
      solve this problem.
      
      We then specify UDP_HTABLE_SIZE_MIN for UDP/UDPLite hash table
      allocation.
      Reported-by: NMark Asselstine <mark.asselstine@windriver.com>
      Reported-by: NTim Bird <tim.bird@am.sony.com>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      31fe62b9
    • M
      mm: mempolicy: Let vma_merge and vma_split handle vma->vm_policy linkages · 05f144a0
      Mel Gorman 提交于
      Dave Jones' system call fuzz testing tool "trinity" triggered the
      following bug error with slab debugging enabled
      
          =============================================================================
          BUG numa_policy (Not tainted): Poison overwritten
          -----------------------------------------------------------------------------
      
          INFO: 0xffff880146498250-0xffff880146498250. First byte 0x6a instead of 0x6b
          INFO: Allocated in mpol_new+0xa3/0x140 age=46310 cpu=6 pid=32154
           __slab_alloc+0x3d3/0x445
           kmem_cache_alloc+0x29d/0x2b0
           mpol_new+0xa3/0x140
           sys_mbind+0x142/0x620
           system_call_fastpath+0x16/0x1b
          INFO: Freed in __mpol_put+0x27/0x30 age=46268 cpu=6 pid=32154
           __slab_free+0x2e/0x1de
           kmem_cache_free+0x25a/0x260
           __mpol_put+0x27/0x30
           remove_vma+0x68/0x90
           exit_mmap+0x118/0x140
           mmput+0x73/0x110
           exit_mm+0x108/0x130
           do_exit+0x162/0xb90
           do_group_exit+0x4f/0xc0
           sys_exit_group+0x17/0x20
           system_call_fastpath+0x16/0x1b
          INFO: Slab 0xffffea0005192600 objects=27 used=27 fp=0x          (null) flags=0x20000000004080
          INFO: Object 0xffff880146498250 @offset=592 fp=0xffff88014649b9d0
      
      This implied a reference counting bug and the problem happened during
      mbind().
      
      mbind() applies a new memory policy to a range and uses mbind_range() to
      merge existing VMAs or split them as necessary.  In the event of splits,
      mpol_dup() will allocate a new struct mempolicy and maintain existing
      reference counts whose rules are documented in
      Documentation/vm/numa_memory_policy.txt .
      
      The problem occurs with shared memory policies.  The vm_op->set_policy
      increments the reference count if necessary and split_vma() and
      vma_merge() have already handled the existing reference counts.
      However, policy_vma() screws it up by replacing an existing
      vma->vm_policy with one that potentially has the wrong reference count
      leading to a premature free.  This patch removes the damage caused by
      policy_vma().
      
      With this patch applied Dave's trinity tool runs an mbind test for 5
      minutes without error.  /proc/slabinfo reported that there are no
      numa_policy or shared_policy_node objects allocated after the test
      completed and the shared memory region was deleted.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Cc: Dave Jones <davej@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Stephen Wilson <wilsons@start.ca>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      05f144a0
  5. 21 5月, 2012 13 次提交
  6. 20 5月, 2012 1 次提交
    • H
      memcg,thp: fix res_counter:96 regression · 62ade86a
      Hugh Dickins 提交于
      Occasionally, testing memcg's move_charge_at_immigrate on rc7 shows
      a flurry of hundreds of warnings at kernel/res_counter.c:96, where
      res_counter_uncharge_locked() does WARN_ON(counter->usage < val).
      
      The first trace of each flurry implicates __mem_cgroup_cancel_charge()
      of mc.precharge, and an audit of mc.precharge handling points to
      mem_cgroup_move_charge_pte_range()'s THP handling in commit 12724850
      ("memcg: avoid THP split in task migration").
      
      Checking !mc.precharge is good everywhere else, when a single page is to
      be charged; but here the "mc.precharge -= HPAGE_PMD_NR" likely to
      follow, is liable to result in underflow (a lot can change since the
      precharge was estimated).
      
      Simply check against HPAGE_PMD_NR: there's probably a better
      alternative, trying precharge for more, splitting if unsuccessful; but
      this one-liner is safer for now - no kernel/res_counter.c:96 warnings
      seen in 26 hours.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      62ade86a
  7. 18 5月, 2012 1 次提交
    • M
      slub: missing test for partial pages flush work in flush_all() · 02e1a9cd
      majianpeng 提交于
      I found some kernel messages such as:
      
          SLUB raid5-md127: kmem_cache_destroy called for cache that still has objects.
          Pid: 6143, comm: mdadm Tainted: G           O 3.4.0-rc6+        #75
          Call Trace:
          kmem_cache_destroy+0x328/0x400
          free_conf+0x2d/0xf0 [raid456]
          stop+0x41/0x60 [raid456]
          md_stop+0x1a/0x60 [md_mod]
          do_md_stop+0x74/0x470 [md_mod]
          md_ioctl+0xff/0x11f0 [md_mod]
          blkdev_ioctl+0xd8/0x7a0
          block_ioctl+0x3b/0x40
          do_vfs_ioctl+0x96/0x560
          sys_ioctl+0x91/0xa0
          system_call_fastpath+0x16/0x1b
      
      Then using kmemleak I found these messages:
      
          unreferenced object 0xffff8800b6db7380 (size 112):
            comm "mdadm", pid 5783, jiffies 4294810749 (age 90.589s)
            hex dump (first 32 bytes):
              01 01 db b6 ad 4e ad de ff ff ff ff ff ff ff ff  .....N..........
              ff ff ff ff ff ff ff ff 98 40 4a 82 ff ff ff ff  .........@J.....
            backtrace:
              kmemleak_alloc+0x21/0x50
              kmem_cache_alloc+0xeb/0x1b0
              kmem_cache_open+0x2f1/0x430
              kmem_cache_create+0x158/0x320
              setup_conf+0x649/0x770 [raid456]
              run+0x68b/0x840 [raid456]
              md_run+0x529/0x940 [md_mod]
              do_md_run+0x18/0xc0 [md_mod]
              md_ioctl+0xba8/0x11f0 [md_mod]
              blkdev_ioctl+0xd8/0x7a0
              block_ioctl+0x3b/0x40
              do_vfs_ioctl+0x96/0x560
              sys_ioctl+0x91/0xa0
              system_call_fastpath+0x16/0x1b
      
      This bug was introduced by commit a8364d55 ("slub: only IPI CPUs that
      have per cpu obj to flush"), which did not include checks for per cpu
      partial pages being present on a cpu.
      Signed-off-by: Nmajianpeng <majianpeng@gmail.com>
      Cc: Gilad Ben-Yossef <gilad@benyossef.com>
      Acked-by: NChristoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Tested-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      02e1a9cd
  8. 16 5月, 2012 2 次提交
  9. 12 5月, 2012 1 次提交
    • H
      mm: raise MemFree by reverting percpu_pagelist_fraction to 0 · 1b76b02f
      Hugh Dickins 提交于
      Why is there less MemFree than there used to be?  It perturbed a test,
      so I've just been bisecting linux-next, and now find the offender went
      upstream yesterday.
      
      Commit 93278814 "mm: fix division by 0 in percpu_pagelist_fraction()"
      mistakenly initialized percpu_pagelist_fraction to the sysctl's minimum 8,
      which leaves 1/8th of memory on percpu lists (on each cpu??); but most of
      us expect it to be left unset at 0 (and it's not then used as a divisor).
      
        MemTotal: 8061476kB  8061476kB  8061476kB  8061476kB  8061476kB  8061476kB
        Repetitive test with percpu_pagelist_fraction 8:
        MemFree:  6948420kB  6237172kB  6949696kB  6840692kB  6949048kB  6862984kB
        Same test with percpu_pagelist_fraction back to 0:
        MemFree:  7945000kB  7944908kB  7948568kB  7949060kB  7948796kB  7948812kB
      Signed-off-by: NHugh Dickins <hughd@google.com>
      [ We really should fix the crazy sysctl interface too, but that's a
        separate thing - Linus ]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1b76b02f
  10. 11 5月, 2012 4 次提交
  11. 10 5月, 2012 2 次提交
  12. 08 5月, 2012 1 次提交
  13. 07 5月, 2012 2 次提交
  14. 06 5月, 2012 2 次提交
  15. 03 5月, 2012 1 次提交
  16. 26 4月, 2012 4 次提交
    • S
      mm: fix NULL ptr dereference in move_pages · 6e8b09ea
      Sasha Levin 提交于
      Commit 3268c63e ("mm: fix move/migrate_pages() race on task struct") has
      added an odd construct where 'mm' is checked for being NULL, and if it is,
      it would get dereferenced anyways by mput()ing it.
      Signed-off-by: NSasha Levin <levinsasha928@gmail.com>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hugh Dickins <hughd@google.com>
      Acked-by: NChristoph Lameter <cl@linux.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6e8b09ea
    • S
      mm: fix NULL ptr dereference in migrate_pages · f2a9ef88
      Sasha Levin 提交于
      Commit 3268c63e ("mm: fix move/migrate_pages() race on task struct") has
      added an odd construct where 'mm' is checked for being NULL, and if it is,
      it would get dereferenced anyways by mput()ing it.
      
      This would lead to the following NULL ptr deref and BUG() when calling
      migrate_pages() with a pid that has no mm struct:
      
      [25904.193704] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
      [25904.194235] IP: [<ffffffff810b0de7>] mmput+0x27/0xf0
      [25904.194235] PGD 773e6067 PUD 77da0067 PMD 0
      [25904.194235] Oops: 0002 [#1] PREEMPT SMP
      [25904.194235] CPU 2
      [25904.194235] Pid: 31608, comm: trinity Tainted: G        W    3.4.0-rc2-next-20120412-sasha #69
      [25904.194235] RIP: 0010:[<ffffffff810b0de7>]  [<ffffffff810b0de7>] mmput+0x27/0xf0
      [25904.194235] RSP: 0018:ffff880077d49e08  EFLAGS: 00010202
      [25904.194235] RAX: 0000000000000286 RBX: 0000000000000000 RCX: 0000000000000000
      [25904.194235] RDX: ffff880075ef8000 RSI: 000000000000023d RDI: 0000000000000286
      [25904.194235] RBP: ffff880077d49e18 R08: 0000000000000001 R09: 0000000000000001
      [25904.194235] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      [25904.194235] R13: 00000000ffffffea R14: ffff880034287740 R15: ffff8800218d3010
      [25904.194235] FS:  00007fc8b244c700(0000) GS:ffff880029800000(0000) knlGS:0000000000000000
      [25904.194235] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [25904.194235] CR2: 0000000000000050 CR3: 00000000767c6000 CR4: 00000000000406e0
      [25904.194235] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [25904.194235] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [25904.194235] Process trinity (pid: 31608, threadinfo ffff880077d48000, task ffff880075ef8000)
      [25904.194235] Stack:
      [25904.194235]  ffff8800342876c0 0000000000000000 ffff880077d49f78 ffffffff811b8020
      [25904.194235]  ffffffff811b7d91 ffff880075ef8000 ffff88002256d200 0000000000000000
      [25904.194235]  00000000000003ff 0000000000000000 0000000000000000 0000000000000000
      [25904.194235] Call Trace:
      [25904.194235]  [<ffffffff811b8020>] sys_migrate_pages+0x340/0x3a0
      [25904.194235]  [<ffffffff811b7d91>] ? sys_migrate_pages+0xb1/0x3a0
      [25904.194235]  [<ffffffff8266cbb9>] system_call_fastpath+0x16/0x1b
      [25904.194235] Code: c9 c3 66 90 55 31 d2 48 89 e5 be 3d 02 00 00 48 83 ec 10 48 89 1c 24 4c 89 64 24 08 48 89 fb 48 c7 c7 cf 0e e1 82 e8 69 18 03 00 <f0> ff 4b 50 0f 94 c0 84 c0 0f 84 aa 00 00 00 48 89 df e8 72 f1
      [25904.194235] RIP  [<ffffffff810b0de7>] mmput+0x27/0xf0
      [25904.194235]  RSP <ffff880077d49e08>
      [25904.194235] CR2: 0000000000000050
      [25904.348999] ---[ end trace a307b3ed40206b4b ]---
      Signed-off-by: NSasha Levin <levinsasha928@gmail.com>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f2a9ef88
    • Y
      mm: fix up the vmscan stat in vmstat · 904249aa
      Ying Han 提交于
      The "pgsteal" stat is confusing because it counts both direct reclaim as
      well as background reclaim.  However, we have "kswapd_steal" which also
      counts background reclaim value.
      
      This patch fixes it and also makes it match the existng "pgscan_" stats.
      
      Test:
      pgsteal_kswapd_dma32 447623
      pgsteal_kswapd_normal 42272677
      pgsteal_kswapd_movable 0
      pgsteal_direct_dma32 2801
      pgsteal_direct_normal 44353270
      pgsteal_direct_movable 0
      Signed-off-by: NYing Han <yinghan@google.com>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Acked-by: NChristoph Lameter <cl@linux.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
      Reviewed-by: NMinchan Kim <minchan@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      904249aa
    • K
      mm/hugetlb: fix warning in alloc_huge_page/dequeue_huge_page_vma · b1c12cbc
      Konstantin Khlebnikov 提交于
      Fix a gcc warning (and bug?) introduced in cc9a6c87 ("cpuset: mm: reduce
      large amounts of memory barrier related damage v3")
      
      Local variable "page" can be uninitialized if the nodemask from vma policy
      does not intersects with nodemask from cpuset.  Even if it doesn't happens
      it is better to initialize this variable explicitly than to introduce
      a kernel oops in a weird corner case.
      
      mm/hugetlb.c: In function `alloc_huge_page':
      mm/hugetlb.c:1135:5: warning: `page' may be used uninitialized in this function
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b1c12cbc