1. 23 February 2017, 4 commits
  2. 11 January 2017, 1 commit
  3. 13 December 2016, 4 commits
  4. 12 November 2016, 1 commit
  5. 08 October 2016, 5 commits
  6. 12 August 2016, 1 commit
  7. 03 August 2016, 2 commits
  8. 01 August 2016, 1 commit
  9. 29 July 2016, 1 commit
  10. 27 July 2016, 2 commits
  11. 15 July 2016, 1 commit
    • mm: thp: refix false positive BUG in page_move_anon_rmap() · 5a49973d
      Committed by Hugh Dickins
      The VM_BUG_ON_PAGE in page_move_anon_rmap() is more trouble than it's
      worth: the syzkaller fuzzer hit it again.  It's still wrong for some THP
      cases, because linear_page_index() was never intended to apply to
      addresses before the start of a vma.
      
      That's easily fixed with a signed long cast inside linear_page_index();
      and Dmitry has tested such a patch, to verify the false positive.  But
      why extend linear_page_index() just for this case, when the avoidance in
      page_move_anon_rmap() has already grown ugly, and there's no reason for
      the check at all (nothing else there is using address or index)?
      
      Remove address arg from page_move_anon_rmap(), remove VM_BUG_ON_PAGE,
      remove CONFIG_DEBUG_VM PageTransHuge adjustment.
      
      And one more thing: should the compound_head(page) be done inside or
      outside page_move_anon_rmap()? It's usually pushed down to the lowest
      level nowadays (and mm/memory.c shows no other explicit use of it), so I
      think it's better done in page_move_anon_rmap() than by the caller.
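
      A hedged sketch of how the simplified helper might look after these
      changes (based on the description above; not the verbatim patch):

        void page_move_anon_rmap(struct page *page, struct vm_area_struct *vma)
        {
            struct anon_vma *anon_vma = vma->anon_vma;

            /* handle THP tail pages here instead of in every caller */
            page = compound_head(page);

            VM_BUG_ON_PAGE(!PageLocked(page), page);
            VM_BUG_ON_VMA(!anon_vma, vma);

            /* publish the new anon_vma together with the ANON flag bit */
            anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
            page->mapping = (struct address_space *) anon_vma;
        }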
      
      Fixes: 0798d3c0 ("mm: thp: avoid false positive VM_BUG_ON_PAGE in page_move_anon_rmap()")
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1607120444540.12528@eggly.anvils
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: <stable@vger.kernel.org>	[4.5+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5a49973d
  12. 06 July 2016, 1 commit
  13. 25 June 2016, 2 commits
    • mm/hugetlb: clear compound_mapcount when freeing gigantic pages · c8cc708a
      Committed by Gerald Schaefer
      While working on s390 support for gigantic hugepages I ran into the
      following "Bad page state" warning when freeing gigantic pages:
      
        BUG: Bad page state in process bash  pfn:580001
        page:000003d116000040 count:0 mapcount:0 mapping:ffffffff00000000 index:0x0
        flags: 0x7fffc0000000000()
        page dumped because: non-NULL mapping
      
      This is because page->compound_mapcount, which is part of a union with
      page->mapping, is initialized with -1 in prep_compound_gigantic_page(),
      and not cleared again during destroy_compound_gigantic_page().  Fix this
      by clearing the compound_mapcount in destroy_compound_gigantic_page()
      before clearing compound_head.
      
      Interestingly enough, the warning will not show up on x86_64, although
      this should not be architecture specific.  Apparently there is an
      endianness issue, combined with the fact that the union contains both a
      64 bit ->mapping pointer and a 32 bit atomic_t ->compound_mapcount as
      members.  The resulting bogus page->mapping on x86_64 therefore contains
      00000000ffffffff instead of ffffffff00000000 on s390, which will falsely
      trigger the PageAnon() check in free_pages_prepare() because
      page->mapping & PAGE_MAPPING_ANON is true on little-endian architectures
      like x86_64 in this case (the page is not compound anymore,
      ->compound_head was already cleared before).  As a result, page->mapping
      will be cleared before doing the checks in free_pages_check().
      
      Not sure if the bogus "PageAnon() returning true" on x86_64 for the
      first tail page of a gigantic page (at this stage) has other theoretical
      implications, but they would also be fixed with this patch.
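
      A hedged sketch of the fix described above (freeing-path helper from
      mm/hugetlb.c; surrounding details assumed, not the verbatim patch):

        static void destroy_compound_gigantic_page(struct page *page,
                                                   unsigned int order)
        {
            int i;
            int nr_pages = 1 << order;
            struct page *p = page + 1;

            /* undo the -1 set by prep_compound_gigantic_page() before the
             * tail pages lose their compound_head link below */
            atomic_set(compound_mapcount_ptr(page), 0);

            for (i = 1; i < nr_pages; i++, p = mem_map_next(p, page, i)) {
                clear_compound_head(p);
                set_page_refcounted(p);
            }

            set_compound_order(page, 0);
            __ClearPageHead(page);
        }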
      
      Link: http://lkml.kernel.org/r/1466612719-5642-1-git-send-email-gerald.schaefer@de.ibm.com
      Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c8cc708a
    • hugetlb: fix nr_pmds accounting with shared page tables · c17b1f42
      Committed by Kirill A. Shutemov
      We account HugeTLB's shared page table to all processes that share it.
      The accounting happens during huge_pmd_share().

      If somebody populates the pud entry under us, we should decrease the
      page table's refcount and decrease nr_pmds of the process.

      By mistake, I increased nr_pmds again in this case.  :-/ It leads to
      "BUG: non-zero nr_pmds on freeing mm: 2" on process exit.
      
      Let's fix this by increasing nr_pmds only when we're sure that the page
      table will be used.
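
      A hedged sketch of the corrected accounting in huge_pmd_share()
      (fragment only; locking context assumed from the surrounding function):

        spin_lock(ptl);
        if (pud_none(*pud)) {
            pud_populate(mm, pud,
                         (pmd_t *)((unsigned long)spte & PAGE_MASK));
            /* count the pmd page only when we actually start using it */
            mm_inc_nr_pmds(mm);
        } else {
            /* somebody else populated the pud: drop our reference and
             * do not touch nr_pmds */
            put_page(virt_to_page(spte));
        }
        spin_unlock(ptl);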
      
      Link: http://lkml.kernel.org/r/20160617122506.GC6534@node.shutemov.name
      Fixes: dc6c9a35 ("mm: account pmd page tables to the process")
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: zhongjiang <zhongjiang@huawei.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c17b1f42
  14. 10 June 2016, 1 commit
    • mm/hugetlb: fix huge page reserve accounting for private mappings · 67961f9d
      Committed by Mike Kravetz
      When creating a private mapping of a hugetlbfs file, it is possible to
      unmap pages via ftruncate or fallocate hole punch.  If subsequent faults
      repopulate these mappings, the reserve counts will go negative.  This is
      because the code currently assumes all faults to private mappings will
      consume reserves.  The problem can be recreated as follows:
      
       - mmap(MAP_PRIVATE) a file in hugetlbfs filesystem
       - write fault in pages in the mapping
       - fallocate(FALLOC_FL_PUNCH_HOLE) some pages in the mapping
       - write fault in pages in the hole
      
      This will result in negative huge page reserve counts and negative
      subpool usage counts for the hugetlbfs.  Note that this can also be
      recreated with ftruncate, but fallocate is more straightforward.
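
      A hedged userspace sketch of those reproduction steps (the mount point,
      file name and 2MB huge page size are assumptions for illustration):

        #define _GNU_SOURCE
        #include <fcntl.h>
        #include <string.h>
        #include <sys/mman.h>
        #include <unistd.h>

        #define HPAGE (2UL * 1024 * 1024)   /* assumed default huge page size */

        int main(void)
        {
            /* assumed hugetlbfs mount point and file name */
            int fd = open("/mnt/hugetlbfs/testfile", O_CREAT | O_RDWR, 0644);
            if (fd < 0 || ftruncate(fd, 4 * HPAGE))
                return 1;

            /* mmap(MAP_PRIVATE) a file in the hugetlbfs filesystem */
            char *addr = mmap(NULL, 4 * HPAGE, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE, fd, 0);
            if (addr == MAP_FAILED)
                return 1;

            memset(addr, 'x', 4 * HPAGE);   /* write fault in the mapping */

            /* punch a hole over the first half of the file */
            fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                      0, 2 * HPAGE);

            memset(addr, 'y', 2 * HPAGE);   /* write fault in the hole */
            return 0;
        }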
      
      This patch modifies the routines vma_needs_reserves and vma_has_reserves
      to examine the reserve map associated with private mappings similar to
      that for shared mappings.  However, the reserve map semantics for
      private and shared mappings are very different.  This results in subtly
      different code that is explained in the comments.
      
      Link: http://lkml.kernel.org/r/1464720957-15698-1-git-send-email-mike.kravetz@oracle.com
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      67961f9d
  15. 30 May 2016, 1 commit
  16. 21 May 2016, 1 commit
    • /dev/dax, core: file operations and dax-mmap · dee41079
      Committed by Dan Williams
      The "Device DAX" core enables dax mappings of performance / feature
      differentiated memory.  An open mapping or file handle keeps the backing
      struct device live, but new mappings are only possible while the device
      is enabled.   Faults are handled under rcu_read_lock to synchronize
      with the enabled state of the device.
      
      Similar to the filesystem-dax case the backing memory may optionally
      have struct page entries.  However, unlike fs-dax there is no support
      for private mappings, or mappings that are not backed by media (see
      use of zero-page in fs-dax).
      
      Mappings are always guaranteed to match the alignment of the dax_region.
      If the dax_region is configured to have a 2MB alignment, all mappings
      are guaranteed to be backed by a pmd entry.  Contrast this determinism
      with the fs-dax case where pmd mappings are opportunistic.  If userspace
      attempts to force a misaligned mapping, the driver will fail the mmap
      attempt.  See dax_dev_check_vma() for other scenarios that are rejected,
      like MAP_PRIVATE mappings.
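
      A hedged sketch of the kind of checks described above (the helper and
      field names are assumptions for illustration, not the exact driver API):

        static int dax_check_vma(unsigned long region_align,
                                 struct vm_area_struct *vma)
        {
            unsigned long mask = region_align - 1;

            /* device-dax has no support for private (copy-on-write) mappings */
            if ((vma->vm_flags & VM_SHARED) != VM_SHARED)
                return -EINVAL;

            /* the mapping must match the dax_region alignment, e.g. 2MB,
             * so that every fault can be satisfied by a pmd entry */
            if ((vma->vm_start & mask) || (vma->vm_end & mask))
                return -EINVAL;

            return 0;
        }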
      
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      dee41079
  17. 20 May 2016, 5 commits
    • mm/hugetlb: add same zone check in pfn_range_valid_gigantic() · f44b2dda
      Committed by Joonsoo Kim
      This patchset deals with some problematic sites that iterate pfn ranges.
      
      There is a system whose nodes' pfns overlap as follows:

        -----pfn-------->
        N0 N1 N2 N0 N1 N2

      Therefore, we need to take care of this overlap when iterating over a
      pfn range.

      I audited many iterating sites that use pfn_valid(), pfn_valid_within(),
      zone_start_pfn, etc., and the others look safe to me.  This is a
      preparation step for a new CMA implementation, ZONE_CMA
      (https://lkml.org/lkml/2015/2/12/95), because it would easily be
      overlapped with other zones.  But the zone overlap check is also needed
      for the general case, so I am sending it separately.
      
      This patch (of 5):
      
      alloc_gigantic_page() uses alloc_contig_range() and this requires that
      the requested range is in a single zone.  To satisfy this requirement,
      add this check to pfn_range_valid_gigantic().
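
      A hedged sketch of the check as described (names from mm/hugetlb.c; the
      exact body of the in-tree function may differ):

        static bool pfn_range_valid_gigantic(struct zone *z,
                                             unsigned long start_pfn,
                                             unsigned long nr_pages)
        {
            unsigned long i, end_pfn = start_pfn + nr_pages;
            struct page *page;

            for (i = start_pfn; i < end_pfn; i++) {
                if (!pfn_valid(i))
                    return false;

                page = pfn_to_page(i);

                /* new: reject ranges that cross into another zone, since
                 * alloc_contig_range() requires a single zone */
                if (page_zone(page) != z)
                    return false;

                if (PageReserved(page) || page_count(page) > 0 || PageHuge(page))
                    return false;
            }

            return true;
        }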
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Laura Abbott <lauraa@codeaurora.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f44b2dda
    • mm/hugetlb.c: use first_memory_node · 54f18d35
      Committed by Andrew Morton
      Instead of open-coding it.
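
      A hedged illustration of the substitution (the exact call site in
      mm/hugetlb.c is assumed; first_memory_node simply expands to the
      open-coded form):

        /* before */ int nid = first_node(node_states[N_MEMORY]);
        /* after  */ int nid = first_memory_node;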
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      54f18d35
    • mm/hugetlb: introduce hugetlb_bad_size() · 9fee021d
      Committed by Vaishali Thakkar
      When any unsupported hugepage size is specified, 'hugepagesz=' and
      'hugepages=' should be ignored during command line parsing until a
      supported hugepage size is found.  But currently an incorrect number of
      hugepages is allocated when an unsupported size is specified, because
      parsing fails to ignore the 'hugepages=' option.
      
      Test case:
      
      Note that this is specific to x86 architecture.
      
      Boot the kernel with command line option 'hugepagesz=256M hugepages=X'.
      After boot, dmesg output shows that X number of hugepages of the size 2M
      is pre-allocated instead of 0.
      
      So, to handle such command line options, introduce the new routine
      hugetlb_bad_size.  The routine hugetlb_bad_size sets the global variable
      parsed_valid_hugepagesz.  We use parsed_valid_hugepagesz to record that
      an unsupported hugepage size was found, so that subsequent 'hugepages='
      parameters can be ignored, and the variable is reset when a supported
      hugepage size is found.

      The routine hugetlb_bad_size can be called while setting the
      'hugepagesz=' parameter in architecture-specific code.
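
      A hedged sketch of the new plumbing (parser details around it are
      assumed, not the verbatim patch):

        static bool parsed_valid_hugepagesz __initdata = true;

        /* called from arch-specific 'hugepagesz=' parsing on a bad size */
        void __init hugetlb_bad_size(void)
        {
            parsed_valid_hugepagesz = false;
        }

        static int __init hugetlb_nrpages_setup(char *s)
        {
            /* ignore a 'hugepages=' that follows an unsupported 'hugepagesz=' */
            if (!parsed_valid_hugepagesz) {
                pr_warn("hugepages = %s preceded by an unsupported hugepagesz, ignoring\n", s);
                parsed_valid_hugepagesz = true;   /* re-arm for the next size */
                return 1;
            }
            /* ... normal nr_hugepages handling continues here ... */
            return 1;
        }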
      Signed-off-by: Vaishali Thakkar <vaishali.thakkar@oracle.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
      Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9fee021d
    • mm/hugetlb: optimize minimum size (min_size) accounting · 09a95e29
      Committed by Mike Kravetz
      It was observed that minimum size accounting associated with the
      hugetlbfs min_size mount option may not perform optimally and as
      expected.  As huge pages/reservations are released from the filesystem
      and given back to the global pools, they are reserved for subsequent
      filesystem use as long as the subpool reserved count is less than
      subpool minimum size.  It does not take into account used pages within
      the filesystem.  The filesystem size limits are not exceeded and this is
      technically not a bug.  However, better behavior would be to wait for
      the number of used pages/reservations associated with the filesystem to
      drop below the minimum size before taking reservations to satisfy
      minimum size.
      
      An optimization is also made to the hugepage_subpool_get_pages() routine
      which is called when pages/reservations are allocated.  This does not
      change behavior, but simply avoids the accounting if all reservations
      have already been taken (subpool reserved count == 0).
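
      A hedged sketch of the optimized minimum-size branch in
      hugepage_subpool_get_pages() (fragment only; field names and the
      surrounding locking and maximum-size accounting are assumed):

        /* minimum size accounting */
        if (spool->min_hpages != -1 && spool->rsv_hpages) {
            /* the added "&& spool->rsv_hpages" skips the accounting once all
             * subpool reserves are used up (reserved count == 0) */
            if (delta > spool->rsv_hpages) {
                /* only the excess must be taken from the global pool */
                ret = delta - spool->rsv_hpages;
                spool->rsv_hpages = 0;
            } else {
                ret = 0;                    /* fully covered by reserves */
                spool->rsv_hpages -= delta;
            }
        }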
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      09a95e29
    • include/linux/nodemask.h: create next_node_in() helper · 0edaf86c
      Committed by Andrew Morton
      Lots of code does
      
      	node = next_node(node, XXX);
      	if (node == MAX_NUMNODES)
      		node = first_node(XXX);
      
      so create next_node_in() to do this and use it in various places.
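
      A hedged sketch of such a wrapping helper (the in-tree version may be
      structured differently, e.g. out of line):

        static inline int next_node_in(int node, const nodemask_t mask)
        {
            int ret = next_node(node, mask);

            /* wrap around to the first node instead of returning MAX_NUMNODES */
            if (ret == MAX_NUMNODES)
                ret = first_node(mask);
            return ret;
        }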
      
      [mhocko@suse.com: use next_node_in() helper]
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Laura Abbott <lauraa@codeaurora.org>
      Cc: Hui Zhu <zhuhui@xiaomi.com>
      Cc: Wang Xiaoqiang <wangxq10@lzu.edu.cn>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0edaf86c
  18. 05 April 2016, 1 commit
    • mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Committed by Kirill A. Shutemov
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
      ago with the promise that one day it would be possible to implement the
      page cache with chunks bigger than PAGE_SIZE.

      This promise never materialized, and it is unlikely it ever will.
      
      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE.  And it's a constant source of confusion whether the
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.

      Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      the script below.  For some reason, coccinelle doesn't patch header
      files; I've called spatch for them manually.

      The only adjustment after coccinelle is reverting the changes to the
      PAGE_CACHE_ALIGN definition: we are going to drop it later.

      There are a few places in the code that coccinelle didn't reach; I'll
      fix them manually in a separate patch.  Comments and documentation will
      also be addressed in a separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  19. 18 March 2016, 1 commit
  20. 10 March 2016, 2 commits
    • mm/hugetlb: use EOPNOTSUPP in hugetlb sysctl handlers · 86613628
      Committed by Jan Stancek
      Replace ENOTSUPP with EOPNOTSUPP.  If hugepages are not supported, this
      value is propagated to userspace.  EOPNOTSUPP is part of uapi and is
      widely supported by libc libraries.
      
      It gives a nicer message to the user, rather than:
      
        # cat /proc/sys/vm/nr_hugepages
        cat: /proc/sys/vm/nr_hugepages: Unknown error 524
      
      Also, LTP's proc01 test was failing because this return code (524) was
      unexpected:
      
        proc01      1  TFAIL  :  proc01.c:396: read failed: /proc/sys/vm/nr_hugepages: errno=???(524): Unknown error 524
        proc01      2  TFAIL  :  proc01.c:396: read failed: /proc/sys/vm/nr_hugepages_mempolicy: errno=???(524): Unknown error 524
        proc01      3  TFAIL  :  proc01.c:396: read failed: /proc/sys/vm/nr_overcommit_hugepages: errno=???(524): Unknown error 524
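
      A hedged sketch of the change at the top of the hugetlb sysctl handlers
      (fragment only; handler bodies assumed unchanged):

        if (!hugepages_supported())
            return -EOPNOTSUPP;   /* was -ENOTSUPP, which userspace sees as
                                     "Unknown error 524" */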
      Signed-off-by: Jan Stancek <jstancek@redhat.com>
      Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      86613628
    • mm/hugetlb: hugetlb_no_page: rate-limit warning message · 910154d5
      Committed by Geoffrey Thomas
      The warning message "killed due to inadequate hugepage pool" simply
      indicates that SIGBUS was sent, not that the process was forcibly killed.
      If the process has a signal handler installed that does not fix the problem,
      this message can rapidly spam the kernel log.
      
      On my amd64 dev machine that does not have hugepages configured, I can
      reproduce the repeated warnings easily by setting vm.nr_hugepages=2 (i.e.,
      4 megabytes of huge pages) and running something that sets a signal
      handler and forks, like
      
        #include <sys/mman.h>
        #include <sys/wait.h>
        #include <signal.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <unistd.h>
      
        sig_atomic_t counter = 10;
        void handler(int signal)
        {
            if (counter-- == 0)
               exit(0);
        }
      
        int main(void)
        {
            int status;
            char *addr = mmap(NULL, 4 * 1048576, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
            if (addr == MAP_FAILED) {perror("mmap"); return 1;}
            *addr = 'x';
            switch (fork()) {
               case -1:
                  perror("fork"); return 1;
               case 0:
                  signal(SIGBUS, handler);
                  *addr = 'x';
                  break;
               default:
                  *addr = 'x';
                  wait(&status);
                  if (WIFSIGNALED(status)) {
                     psignal(WTERMSIG(status), "child");
                  }
                  break;
            }
        }
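
      A hedged sketch of the corresponding change in hugetlb_no_page()
      (message text taken from the warning quoted above):

        /* use the rate-limited printk variant so a looping faulter
         * cannot spam the kernel log */
        pr_warn_ratelimited("PID %d killed due to inadequate hugepage pool\n",
                            current->pid);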
      Signed-off-by: Geoffrey Thomas <geofft@ldpreload.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      910154d5
  21. 19 February 2016, 1 commit
    • mm/hugetlb.c: fix incorrect proc nr_hugepages value · f8b74815
      Committed by Vaishali Thakkar
      Currently an incorrect default hugepage pool size is reported by proc
      nr_hugepages when the number of pages for the default huge page size is
      specified twice.
      
      When multiple huge page sizes are supported, /proc/sys/vm/nr_hugepages
      indicates the current number of pre-allocated huge pages of the default
      size.  Basically /proc/sys/vm/nr_hugepages displays default_hstate->
      max_huge_pages and after boot time pre-allocation, max_huge_pages should
      equal the number of pre-allocated pages (nr_hugepages).
      
      Test case:
      
      Note that this is specific to x86 architecture.
      
      Boot the kernel with command line option 'default_hugepagesz=1G
      hugepages=X hugepagesz=2M hugepages=Y hugepagesz=1G hugepages=Z'.  After
      boot, 'cat /proc/sys/vm/nr_hugepages' and 'sysctl -a | grep hugepages'
      returns the value X.  However, dmesg output shows that Z huge pages were
      pre-allocated.
      
      So, the root cause of the problem here is that the global variable
      default_hstate_max_huge_pages is set if a default huge page size is
      specified (directly or indirectly) on the command line.  After the command
      line processing in hugetlb_init, if default_hstate_max_huge_pages is set,
      the value is assigned to default_hstate.max_huge_pages.  However,
      default_hstate.max_huge_pages may have already been set based on the
      number of pre-allocated huge pages of default_hstate size.
      
      The solution to this problem is that if hstate->max_huge_pages is already
      set, then it should not be overridden by the global max_huge_pages value.
      Basically, if the hugepages variable is set multiple times on the command
      line for a specific supported hugepage size, then the proc layer should
      report the last specified value.
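
      A hedged sketch of the resulting check in hugetlb_init() (surrounding
      code assumed, not the verbatim patch):

        if (default_hstate_max_huge_pages) {
            /* only apply the global default if an explicit
             * "hugepagesz=<default size> hugepages=Z" pair did not
             * already size the default hstate */
            if (!default_hstate.max_huge_pages)
                default_hstate.max_huge_pages = default_hstate_max_huge_pages;
        }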
      Signed-off-by: Vaishali Thakkar <vaishali.thakkar@oracle.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f8b74815
  22. 06 February 2016, 1 commit
    • mm, hugetlb: don't require CMA for runtime gigantic pages · 080fe206
      Committed by Vlastimil Babka
      Commit 944d9fec ("hugetlb: add support for gigantic page allocation
      at runtime") has added the runtime gigantic page allocation via
      alloc_contig_range(), making this support available only when CONFIG_CMA
      is enabled.  Because it doesn't depend on MIGRATE_CMA pageblocks and the
      associated infrastructure, it is possible with a few simple adjustments to
      require only CONFIG_MEMORY_ISOLATION instead of full CONFIG_CMA.
      
      After this patch, alloc_contig_range() and related functions are
      available and used for gigantic pages with just CONFIG_MEMORY_ISOLATION
      enabled.  Note CONFIG_CMA selects CONFIG_MEMORY_ISOLATION.  This allows
      supporting runtime gigantic pages without the CMA-specific checks in
      page allocator fastpaths.
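
      A hedged illustration of the idea (the exact preprocessor guard is an
      assumption): the alloc_contig_range() machinery that gigantic pages rely
      on can be built for memory isolation plus compaction alone, with CMA
      remaining just one of the options that pulls it in.

        #if (defined(CONFIG_MEMORY_ISOLATION) && defined(CONFIG_COMPACTION)) || \
            defined(CONFIG_CMA)
        /* runtime gigantic page allocation via alloc_contig_range() */
        static struct page *alloc_gigantic_page(int nid, unsigned int order);
        #endif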
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      080fe206