    mm, hugetlb: use memory policy when available · 099730d6
    Committed by Dave Hansen
    I have a hugetlbfs user who never explicitly allocates huge pages with
    'nr_hugepages'.  They only set 'nr_overcommit_hugepages' and then let the
    pages be allocated from the buddy allocator at fault time.
    
    This works, but they noticed that mbind() was not doing them any good and
    the pages were being allocated without respect for the policy they
    specified.
    
    The code in question is this:
    
    > struct page *alloc_huge_page(struct vm_area_struct *vma,
    ...
    >         page = dequeue_huge_page_vma(h, vma, addr, avoid_reserve, gbl_chg);
    >         if (!page) {
    >                 page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
    
    dequeue_huge_page_vma() is smart and will respect the VMA's memory policy.
    But it only grabs _existing_ huge pages from the huge page pool.  If the
    pool is empty, we fall back to alloc_buddy_huge_page(), which obviously
    can't do anything with the VMA's policy because it isn't even passed the
    VMA.
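    
    For reference, the signature in question is roughly the following
    (reconstructed from the call above, so treat it as a sketch rather than a
    verbatim quote of the source):
    
            static struct page *alloc_buddy_huge_page(struct hstate *h, int nid);
    
    There is simply no vma or addr parameter it could apply a policy with.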
    
    Almost everybody preallocates huge pages.  That's probably why nobody has
    ever noticed this.  Looking back at the git history, I don't think this
    _ever_ worked since alloc_buddy_huge_page() was introduced in 7893d1d5,
    8 years ago.
    
    The fix is to pass vma/addr down into the places where we actually call
    into the buddy allocator.  It's fairly straightforward plumbing.  This
    has been lightly tested.
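    
    As a rough sketch of that plumbing (illustrative only; the helper name and
    the huge_zonelist()/mpol_cond_put() lookup below are assumptions about one
    way to wire up the policy, not necessarily the exact final patch), the
    buddy-allocator path can derive a zonelist/nodemask from vma/addr before
    allocating:
    
            static struct page *__hugetlb_alloc_buddy_huge_page(struct hstate *h,
                            struct vm_area_struct *vma, unsigned long addr)
            {
                    gfp_t gfp = htlb_alloc_mask(h) | __GFP_COMP | __GFP_NOWARN;
                    struct mempolicy *mpol;
                    nodemask_t *nodemask;
                    struct zonelist *zl;
    
                    /* Consult the VMA's memory policy for this address... */
                    zl = huge_zonelist(vma, addr, gfp, &mpol, &nodemask);
                    mpol_cond_put(mpol);
    
                    /* ...and allocate from the buddy allocator with it. */
                    return __alloc_pages_nodemask(gfp, huge_page_order(h),
                                                  zl, nodemask);
            }
    
    alloc_buddy_huge_page() then just takes vma/addr from alloc_huge_page() and
    hands them to a helper like this whenever it actually has a VMA to work
    with.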
    Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
    Cc: David Rientjes <rientjes@google.com>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>