1. 01 Nov, 2011: 3 commits
    • mm/huge_memory: fix copying user highpage · 0089e485
      Hillf Danton committed
      The THP copy-on-write handler falls back to regular-sized pages for a huge
      page replacement upon allocation failure or if THP has been individually
      disabled in the target VMA.  The loop responsible for copying page-sized
      chunks accidentally uses multiples of PAGE_SHIFT instead of PAGE_SIZE as
      the virtual address argument for copy_user_highpage().
      Signed-off-by: Hillf Danton <dhillf@gmail.com>
      Acked-by: Johannes Weiner <jweiner@redhat.com>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/huge_memory.c: quiet sparse noise · 2f1da642
      H Hartley Sweeten committed
      Quiet the sparse noise:
      
      warning: symbol 'khugepaged_scan' was not declared. Should it be static?
      warning: context imbalance in 'khugepaged_scan_mm_slot' - unexpected unlock
      Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • thp: mremap support and TLB optimization · 37a1c49a
      Andrea Arcangeli committed
      This adds THP support to mremap, which reduces the number of
      split_huge_page() calls.
      
      Here are some benchmarks, using a program like this:
      
      ===
      #define _GNU_SOURCE
      #include <sys/mman.h>
      #include <stdlib.h>
      #include <stdio.h>
      #include <string.h>
      #include <sys/time.h>
      
      #define SIZE (5UL*1024*1024*1024)
      
      int main(void)
      {
      	static struct timeval oldstamp, newstamp;
      	long diffsec;
      	char *p, *p2, *p3, *p4;

      	/* 2M-aligned source, reference copy, and destination. */
      	if (posix_memalign((void **)&p, 2*1024*1024, SIZE))
      		perror("memalign"), exit(1);
      	if (posix_memalign((void **)&p2, 2*1024*1024, SIZE))
      		perror("memalign"), exit(1);
      	if (posix_memalign((void **)&p3, 2*1024*1024, 4096))
      		perror("memalign"), exit(1);

      	/* Fault everything in (THP-backed when THP is enabled). */
      	memset(p, 0xff, SIZE);
      	memset(p2, 0xff, SIZE);
      	memset(p3, 0x77, 4096);

      	/* Time only the mremap of p onto the 2M-aligned address p3. */
      	gettimeofday(&oldstamp, NULL);
      	p4 = mremap(p, SIZE, SIZE, MREMAP_FIXED|MREMAP_MAYMOVE, p3);
      	gettimeofday(&newstamp, NULL);
      	diffsec = newstamp.tv_sec - oldstamp.tv_sec;
      	diffsec = newstamp.tv_usec - oldstamp.tv_usec + 1000000 * diffsec;
      	printf("usec %ld\n", diffsec);
      	/* Check the mremap result itself, not the old pointer p. */
      	if (p4 == MAP_FAILED || p4 != p3)
      		perror("mremap"), exit(1);
      	/* The moved data must still match the reference copy. */
      	if (memcmp(p4, p2, SIZE))
      		printf("mremap bug\n"), exit(1);
      	printf("ok\n");

      	return 0;
      }
      ===
      
      THP on
      
       Performance counter stats for './largepage13' (3 runs):
      
                69195836 dTLB-loads                 ( +-   3.546% )  (scaled from 50.30%)
                   60708 dTLB-load-misses           ( +-  11.776% )  (scaled from 52.62%)
               676266476 dTLB-stores                ( +-   5.654% )  (scaled from 69.54%)
                   29856 dTLB-store-misses          ( +-   4.081% )  (scaled from 89.22%)
              1055848782 iTLB-loads                 ( +-   4.526% )  (scaled from 80.18%)
                    8689 iTLB-load-misses           ( +-   2.987% )  (scaled from 58.20%)
      
              7.314454164  seconds time elapsed   ( +-   0.023% )
      
      THP off
      
       Performance counter stats for './largepage13' (3 runs):
      
              1967379311 dTLB-loads                 ( +-   0.506% )  (scaled from 60.59%)
                 9238687 dTLB-load-misses           ( +-  22.547% )  (scaled from 61.87%)
              2014239444 dTLB-stores                ( +-   0.692% )  (scaled from 60.40%)
                 3312335 dTLB-store-misses          ( +-   7.304% )  (scaled from 67.60%)
              6764372065 iTLB-loads                 ( +-   0.925% )  (scaled from 79.00%)
                    8202 iTLB-load-misses           ( +-   0.475% )  (scaled from 70.55%)
      
              9.693655243  seconds time elapsed   ( +-   0.069% )
      
      grep thp /proc/vmstat
      thp_fault_alloc 35849
      thp_fault_fallback 0
      thp_collapse_alloc 3
      thp_collapse_alloc_failed 0
      thp_split 0
      
      thp_split 0 confirms that no THP was split despite plenty of hugepages
      being allocated.
      
      Measuring only the mremap time (that is, excluding the three memset
      calls and the final memcmp, which touches 10GB of memory):
      
      THP on
      
      usec 14824
      usec 14862
      usec 14859
      
      THP off
      
      usec 256416
      usec 255981
      usec 255847
      
      With an older kernel without the mremap optimizations (this patch
      optimizes the non-THP version too):
      
      THP on
      
      usec 392107
      usec 390237
      usec 404124
      
      THP off
      
      usec 444294
      usec 445237
      usec 445820
      
      I expect that a threaded program, which sends more IPIs on a large SMP
      system, would show an even larger difference.
      
      All debug options are off except DEBUG_VM to avoid skewing the
      results.
      
      The only restriction for a native 2M mremap like the one above is that
      both the source and destination addresses must be 2M aligned; otherwise
      the huge pmd can't be moved without a split.  That is a hardware
      limitation.
      
      [akpm@linux-foundation.org: coding-style nitpicking]
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: Johannes Weiner <jweiner@redhat.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 26 Jul, 2011: 1 commit
  3. 16 Jun, 2011: 1 commit
  4. 25 May, 2011: 2 commits
  5. 29 Apr, 2011: 1 commit
    • mm: thp: fix /dev/zero MAP_PRIVATE and vm_flags cleanups · 78f11a25
      Andrea Arcangeli committed
      The huge_memory.c THP page fault was allowed to run if vm_ops was NULL
      (which would succeed for /dev/zero MAP_PRIVATE, as its f_op->mmap wouldn't
      set up a special vma->vm_ops and it would fall back to regular anonymous
      memory), but other THP logic wasn't fully activated for vmas with a
      non-NULL vm_file (/dev/zero has a non-NULL vma->vm_file).
      
      So this removes the vm_file checks so that /dev/zero can also safely use
      THP (the other, albeit safer, approach to fixing this bug would have been
      to prevent the initial THP page fault from running if vm_file was set).
      
      After removing the vm_file checks, this also makes huge_memory.c stricter
      in khugepaged for the DEBUG_VM=y case.  It doesn't replace the vm_file
      check with an is_pfn_mapping check (but it keeps checking for VM_PFNMAP
      under VM_BUG_ON), because for an is_cow_mapping() mapping VM_PFNMAP should
      only be allowed to exist before the first page fault, and in turn only
      while vma->anon_vma is NULL (which prevents khugepaged registration).  So
      I tend to think the previous comment, which said that if vm_file was set,
      VM_PFNMAP might have been set and we could still be registered in
      khugepaged (even though anon_vma must be non-NULL for khugepaged
      registration), was too paranoid.  The is_linear_pfn_mapping check is
      also, I think, superfluous (as the comment describes), but it is safe to
      keep it under DEBUG_VM.
      
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=33682
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Reported-by: Caspar Zhang <bugs@casparzhang.com>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: <stable@kernel.org>		[2.6.38.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 15 Apr, 2011: 2 commits
  7. 23 Mar, 2011: 1 commit
  8. 14 Mar, 2011: 1 commit
  9. 05 Mar, 2011: 3 commits
  10. 16 Feb, 2011: 1 commit
  11. 12 Feb, 2011: 1 commit
    • memcg: fix leak of accounting at failure path of hugepage collapsing · 678ff896
      KAMEZAWA Hiroyuki committed
      mem_cgroup_uncharge_page() should be called in all failure cases after
      mem_cgroup_newpage_charge() is called in huge_memory.c::collapse_huge_page().
      
       [ 4209.076861] BUG: Bad page state in process khugepaged  pfn:1e9800
       [ 4209.077601] page:ffffea0006b14000 count:0 mapcount:0 mapping:          (null) index:0x2800
       [ 4209.078674] page flags: 0x40000000004000(head)
       [ 4209.079294] pc:ffff880214a30000 pc->flags:2146246697418756 pc->mem_cgroup:ffffc9000177a000
       [ 4209.082177] (/A)
       [ 4209.082500] Pid: 31, comm: khugepaged Not tainted 2.6.38-rc3-mm1 #1
       [ 4209.083412] Call Trace:
       [ 4209.083678]  [<ffffffff810f4454>] ? bad_page+0xe4/0x140
       [ 4209.084240]  [<ffffffff810f53e6>] ? free_pages_prepare+0xd6/0x120
       [ 4209.084837]  [<ffffffff8155621d>] ? rwsem_down_failed_common+0xbd/0x150
       [ 4209.085509]  [<ffffffff810f5462>] ? __free_pages_ok+0x32/0xe0
       [ 4209.086110]  [<ffffffff810f552b>] ? free_compound_page+0x1b/0x20
       [ 4209.086699]  [<ffffffff810fad6c>] ? __put_compound_page+0x1c/0x30
       [ 4209.087333]  [<ffffffff810fae1d>] ? put_compound_page+0x4d/0x200
       [ 4209.087935]  [<ffffffff810fb015>] ? put_page+0x45/0x50
       [ 4209.097361]  [<ffffffff8113f779>] ? khugepaged+0x9e9/0x1430
       [ 4209.098364]  [<ffffffff8107c870>] ? autoremove_wake_function+0x0/0x40
       [ 4209.099121]  [<ffffffff8113ed90>] ? khugepaged+0x0/0x1430
       [ 4209.099780]  [<ffffffff8107c236>] ? kthread+0x96/0xa0
       [ 4209.100452]  [<ffffffff8100dda4>] ? kernel_thread_helper+0x4/0x10
       [ 4209.101214]  [<ffffffff8107c1a0>] ? kthread+0x0/0xa0
       [ 4209.101842]  [<ffffffff8100dda0>] ? kernel_thread_helper+0x0/0x10
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 03 Feb, 2011: 1 commit
    • thp: fix the wrong reported address of hwpoisoned hugepages · a6d30ddd
      Jin Dongming committed
      When a tail page of a THP is poisoned, the head page is poisoned too,
      and the wrong address (that of the head page) is always sent with the
      SIGBUS.
      
      So when the poisoned page is used by a guest OS running on KVM, after
      qemu translates the address (hva->gpa), the wrong process in the guest
      OS is killed by SIGBUS.
      
      What we expect is that the process using the poisoned tail page is
      killed in the guest OS, not the process using the healthy head page.
      
      Since poisoning a healthy page is undesirable, avoid poisoning any page
      other than the one that is actually poisoned.
        (While we poison all pages in a huge page in the hugetlb case,
         we can avoid that for THP thanks to split_huge_page().)
      
      This fixes two things:
        1. Isolate only the poisoned page, so that the reported address
           is the address of the poisoned page.
        2. Make the poisoned page behave like a poisoned regular page.
      
      [akpm@linux-foundation.org: fix spello in comment]
      Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
      Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 21 Jan, 2011: 2 commits
  14. 14 Jan, 2011: 20 commits