- 23 2月, 2017 4 次提交
-
-
由 Andrea Arcangeli 提交于
Add support for VM_FAULT_RETRY to follow_hugetlb_page() so that get_user_pages_unlocked/locked and "nonblocking/FOLL_NOWAIT" features will work on hugetlbfs. This is required for fully functional userfaultfd non-present support on hugetlbfs. Link: http://lkml.kernel.org/r/20161216144821.5183-25-aarcange@redhat.comSigned-off-by: NAndrea Arcangeli <aarcange@redhat.com> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Michael Rapoport <RAPOPORT@il.ibm.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mike Kravetz 提交于
When processing a hugetlb fault for no page present, check the vma to determine if faults are to be handled via userfaultfd. If so, drop the hugetlb_fault_mutex and call handle_userfault(). Link: http://lkml.kernel.org/r/20161216144821.5183-21-aarcange@redhat.comSigned-off-by: NMike Kravetz <mike.kravetz@oracle.com> Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Acked-by: NHillf Danton <hillf.zj@alibaba-inc.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Michael Rapoport <RAPOPORT@il.ibm.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mike Kravetz 提交于
The new routine copy_huge_page_from_user() uses kmap_atomic() to map PAGE_SIZE pages. However, this prevents page faults in the subsequent call to copy_from_user(). This is OK in the case where the routine is copied with mmap_sema held. However, in another case we want to allow page faults. So, add a new argument allow_pagefault to indicate if the routine should allow page faults. [dan.carpenter@oracle.com: unmap the correct pointer] Link: http://lkml.kernel.org/r/20170113082608.GA3548@mwanda [akpm@linux-foundation.org: kunmap() takes a page*, per Hugh] Link: http://lkml.kernel.org/r/20161216144821.5183-20-aarcange@redhat.comSigned-off-by: NMike Kravetz <mike.kravetz@oracle.com> Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Michael Rapoport <RAPOPORT@il.ibm.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Hugh Dickins <hughd@google.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mike Kravetz 提交于
hugetlb_mcopy_atomic_pte is the low level routine that implements the userfaultfd UFFDIO_COPY command. It is based on the existing mcopy_atomic_pte routine with modifications for huge pages. Link: http://lkml.kernel.org/r/20161216144821.5183-18-aarcange@redhat.comSigned-off-by: NMike Kravetz <mike.kravetz@oracle.com> Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Michael Rapoport <RAPOPORT@il.ibm.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 1月, 2017 1 次提交
-
-
由 Mike Kravetz 提交于
return_unused_surplus_pages() decrements the global reservation count, and frees any unused surplus pages that were backing the reservation. Commit 7848a4bf ("mm/hugetlb.c: add cond_resched_lock() in return_unused_surplus_pages()") added a call to cond_resched_lock in the loop freeing the pages. As a result, the hugetlb_lock could be dropped, and someone else could use the pages that will be freed in subsequent iterations of the loop. This could result in inconsistent global hugetlb page state, application api failures (such as mmap) failures or application crashes. When dropping the lock in return_unused_surplus_pages, make sure that the global reservation count (resv_huge_pages) remains sufficiently large to prevent someone else from claiming pages about to be freed. Analyzed by Paul Cassella. Fixes: 7848a4bf ("mm/hugetlb.c: add cond_resched_lock() in return_unused_surplus_pages()") Link: http://lkml.kernel.org/r/1483991767-6879-1-git-send-email-mike.kravetz@oracle.comSigned-off-by: NMike Kravetz <mike.kravetz@oracle.com> Reported-by: NPaul Cassella <cassella@cray.com> Suggested-by: NMichal Hocko <mhocko@kernel.org> Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: <stable@vger.kernel.org> [3.15+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 12月, 2016 4 次提交
-
-
由 Aneesh Kumar K.V 提交于
With commit e77b0852 ("mm/mmu_gather: track page size with mmu gather and force flush if page size change") we added the ability to force a tlb flush when the page size change in a mmu_gather loop. We did that by checking for a page size change every time we added a page to mmu_gather for lazy flush/remove. We can improve that by moving the page size change check early and not doing it every time we add a page. This also helps us to do tlb flush when invalidating a range covering dax mapping. Wrt dax mapping we don't have a backing struct page and hence we don't call tlb_remove_page, which earlier forced the tlb flush on page size change. Moving the page size change check earlier means we will do the same even for dax mapping. We also avoid doing this check on architecture other than powerpc. In a later patch we will remove page size check from tlb_remove_page(). Link: http://lkml.kernel.org/r/20161026084839.27299-5-aneesh.kumar@linux.vnet.ibm.comSigned-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Aneesh Kumar K.V 提交于
This add tlb_remove_hugetlb_entry similar to tlb_remove_pmd_tlb_entry. Link: http://lkml.kernel.org/r/20161026084839.27299-4-aneesh.kumar@linux.vnet.ibm.comSigned-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Aneesh Kumar K.V 提交于
No functional change by this patch. Link: http://lkml.kernel.org/r/20161018090234.22574-1-aneesh.kumar@linux.vnet.ibm.comSigned-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Aneesh Kumar K.V 提交于
We cannot use the pte value used in set_pte_at for pte_same comparison, because archs like ppc64, filter/add new pte flag in set_pte_at. Instead fetch the pte value inside hugetlb_cow. We are comparing pte value to make sure the pte didn't change since we dropped the page table lock. hugetlb_cow get called with page table lock held, and we can take a copy of the pte value before we drop the page table lock. With hugetlbfs, we optimize the MAP_PRIVATE write fault path with no previous mapping (huge_pte_none entries), by forcing a cow in the fault path. This avoid take an addition fault to covert a read-only mapping to read/write. Here we were comparing a recently instantiated pte (via set_pte_at) to the pte values from linux page table. As explained above on ppc64 such pte_same check returned wrong result, resulting in us taking an additional fault on ppc64. Fixes: 6a119eae ("powerpc/mm: Add a _PAGE_PTE bit") Link: http://lkml.kernel.org/r/20161018154245.18023-1-aneesh.kumar@linux.vnet.ibm.comSigned-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reported-by: NJan Stancek <jstancek@redhat.com> Acked-by: NHillf Danton <hillf.zj@alibaba-inc.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Scott Wood <scottwood@freescale.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 12 11月, 2016 1 次提交
-
-
由 Mike Kravetz 提交于
Error paths in hugetlb_cow() and hugetlb_no_page() may free a newly allocated huge page. If a reservation was associated with the huge page, alloc_huge_page() consumed the reservation while allocating. When the newly allocated page is freed in free_huge_page(), it will increment the global reservation count. However, the reservation entry in the reserve map will remain. This is not an issue for shared mappings as the entry in the reserve map indicates a reservation exists. But, an entry in a private mapping reserve map indicates the reservation was consumed and no longer exists. This results in an inconsistency between the reserve map and the global reservation count. This 'leaks' a reserved huge page. Create a new routine restore_reserve_on_error() to restore the reserve entry in these specific error paths. This routine makes use of a new function vma_add_reservation() which will add a reserve entry for a specific address/page. In general, these error paths were rarely (if ever) taken on most architectures. However, powerpc contained arch specific code that that resulted in an extra fault and execution of these error paths on all private mappings. Fixes: 67961f9d ("mm/hugetlb: fix huge page reserve accounting for private mappings) Link: http://lkml.kernel.org/r/1476933077-23091-2-git-send-email-mike.kravetz@oracle.comSigned-off-by: NMike Kravetz <mike.kravetz@oracle.com> Reported-by: NJan Stancek <jstancek@redhat.com> Tested-by: NJan Stancek <jstancek@redhat.com> Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: NHillf Danton <hillf.zj@alibaba-inc.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Kirill A . Shutemov <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 08 10月, 2016 5 次提交
-
-
由 zhong jiang 提交于
When the huge page is added to the page cahce (huge_add_to_page_cache), the page private flag will be cleared. since this code (remove_inode_hugepages) will only be called for pages in the page cahce, PagePrivate(page) will always be false. The patch remove the code without any functional change. Link: http://lkml.kernel.org/r/1475113323-29368-1-git-send-email-zhongjiang@huawei.comSigned-off-by: Nzhong jiang <zhongjiang@huawei.com> Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Tested-by: NMike Kravetz <mike.kravetz@oracle.com> Acked-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Yisheng Xie 提交于
Avoid making ifdef get pretty unwieldy if many ARCHs support gigantic page. No functional change with this patch. Link: http://lkml.kernel.org/r/1475227569-63446-2-git-send-email-xieyisheng1@huawei.comSigned-off-by: NYisheng Xie <xieyisheng1@huawei.com> Suggested-by: NMichal Hocko <mhocko@suse.com> Acked-by: NMichal Hocko <mhocko@suse.com> Acked-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: NHillf Danton <hillf.zj@alibaba-inc.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Rob Herring <robh+dt@kernel.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Gerald Schaefer 提交于
For every pfn aligned to minimum_order, dissolve_free_huge_pages() will call dissolve_free_huge_page() which takes the hugetlb spinlock, even if the page is not huge at all or a hugepage that is in-use. Improve this by doing the PageHuge() and page_count() checks already in dissolve_free_huge_pages() before calling dissolve_free_huge_page(). In dissolve_free_huge_page(), when holding the spinlock, those checks need to be revalidated. Link: http://lkml.kernel.org/r/20160926172811.94033-4-gerald.schaefer@de.ibm.comSigned-off-by: NGerald Schaefer <gerald.schaefer@de.ibm.com> Acked-by: NMichal Hocko <mhocko@suse.com> Acked-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Rui Teng <rui.teng@linux.vnet.ibm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Gerald Schaefer 提交于
In dissolve_free_huge_pages(), free hugepages will be dissolved without making sure that there are enough of them left to satisfy hugepage reservations. Fix this by adding a return value to dissolve_free_huge_pages() and checking h->free_huge_pages vs. h->resv_huge_pages. Note that this may lead to the situation where dissolve_free_huge_page() returns an error and all free hugepages that were dissolved before that error are lost, while the memory block still cannot be set offline. Fixes: c8721bbb ("mm: memory-hotplug: enable memory hotplug to handle hugepage") Link: http://lkml.kernel.org/r/20160926172811.94033-3-gerald.schaefer@de.ibm.comSigned-off-by: NGerald Schaefer <gerald.schaefer@de.ibm.com> Acked-by: NMichal Hocko <mhocko@suse.com> Acked-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Rui Teng <rui.teng@linux.vnet.ibm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Gerald Schaefer 提交于
Patch series "mm/hugetlb: memory offline issues with hugepages", v4. This addresses several issues with hugepages and memory offline. While the first patch fixes a panic, and is therefore rather important, the last patch is just a performance optimization. The second patch fixes a theoretical issue with reserved hugepages, while still leaving some ugly usability issue, see description. This patch (of 3): dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a list corruption and addressing exception when trying to set a memory block offline that is part (but not the first part) of a "gigantic" hugetlb page with a size > memory block size. When no other smaller hugetlb page sizes are present, the VM_BUG_ON() will trigger directly. In the other case we will run into an addressing exception later, because dissolve_free_huge_page() will not work on the head page of the compound hugetlb page which will result in a NULL hstate from page_hstate(). To fix this, first remove the VM_BUG_ON() because it is wrong, and then use the compound head page in dissolve_free_huge_page(). This means that an unused pre-allocated gigantic page that has any part of itself inside the memory block that is going offline will be dissolved completely. Losing an unused gigantic hugepage is preferable to failing the memory offline, for example in the situation where a (possibly faulty) memory DIMM needs to go offline. Fixes: c8721bbb ("mm: memory-hotplug: enable memory hotplug to handle hugepage") Link: http://lkml.kernel.org/r/20160926172811.94033-2-gerald.schaefer@de.ibm.comSigned-off-by: NGerald Schaefer <gerald.schaefer@de.ibm.com> Acked-by: NMichal Hocko <mhocko@suse.com> Acked-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Rui Teng <rui.teng@linux.vnet.ibm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 12 8月, 2016 1 次提交
-
-
由 zhong jiang 提交于
When memory hotplug operates, free hugepages will be freed if the movable node is offline. Therefore, /proc/sys/vm/nr_hugepages will be incorrect. Fix it by reducing max_huge_pages when the node is offlined. n-horiguchi@ah.jp.nec.com said: : dissolve_free_huge_page intends to break a hugepage into buddy, and the : destination hugepage is supposed to be allocated from the pool of the : destination node, so the system-wide pool size is reduced. So adding : h->max_huge_pages-- makes sense to me. Link: http://lkml.kernel.org/r/1470624546-902-1-git-send-email-zhongjiang@huawei.comSigned-off-by: Nzhong jiang <zhongjiang@huawei.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 03 8月, 2016 2 次提交
-
-
由 Michal Hocko 提交于
Zhong Jiang has reported a BUG_ON from huge_pte_alloc hitting when he runs his database load with memory online and offline running in parallel. The reason is that huge_pmd_share might detect a shared pmd which is currently migrated and so it has migration pte which is !pte_huge. There doesn't seem to be any easy way to prevent from the race and in fact seeing the migration swap entry is not harmful. Both callers of huge_pte_alloc are prepared to handle them. copy_hugetlb_page_range will copy the swap entry and make it COW if needed. hugetlb_fault will back off and so the page fault is retries if the page is still under migration and waits for its completion in hugetlb_fault. That means that the BUG_ON is wrong and we should update it. Let's simply check that all present ptes are pte_huge instead. Link: http://lkml.kernel.org/r/20160721074340.GA26398@dhcp22.suse.czSigned-off-by: NMichal Hocko <mhocko@suse.com> Reported-by: Nzhongjiang <zhongjiang@huawei.com> Acked-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jia He 提交于
In powerpc servers with large memory(32TB), we watched several soft lockups for hugepage under stress tests. The call traces are as follows: 1. get_page_from_freelist+0x2d8/0xd50 __alloc_pages_nodemask+0x180/0xc20 alloc_fresh_huge_page+0xb0/0x190 set_max_huge_pages+0x164/0x3b0 2. prep_new_huge_page+0x5c/0x100 alloc_fresh_huge_page+0xc8/0x190 set_max_huge_pages+0x164/0x3b0 This patch fixes such soft lockups. It is safe to call cond_resched() there because it is out of spin_lock/unlock section. Link: http://lkml.kernel.org/r/1469674442-14848-1-git-send-email-hejianet@gmail.comSigned-off-by: NJia He <hejianet@gmail.com> Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: NMichal Hocko <mhocko@suse.com> Acked-by: NDave Hansen <dave.hansen@linux.intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 01 8月, 2016 1 次提交
-
-
由 Aneesh Kumar K.V 提交于
Some archs like ppc64 need to do special things when flushing tlb for hugepage. Add a new helper to flush hugetlb tlb range. This helps us to avoid flushing the entire tlb mapping for the pid. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 29 7月, 2016 1 次提交
-
-
由 Naoya Horiguchi 提交于
dequeue_hwpoisoned_huge_page() can be called without page lock hold, so let's remove incorrect comment. The reason why the page lock is not really needed is that dequeue_hwpoisoned_huge_page() checks page_huge_active() inside hugetlb_lock, which allows us to avoid trying to dequeue a hugepage that are just allocated but not linked to active list yet, even without taking page lock. Link: http://lkml.kernel.org/r/20160720092901.GA15995@www9186uo.sakura.ne.jpSigned-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reported-by: NZhan Chen <zhanc1@andrew.cmu.edu> Acked-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 27 7月, 2016 2 次提交
-
-
由 Aneesh Kumar K.V 提交于
This allows an arch which needs to do special handing with respect to different page size when flushing tlb to implement the same in mmu gather. Link: http://lkml.kernel.org/r/1465049193-22197-3-git-send-email-aneesh.kumar@linux.vnet.ibm.comSigned-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Hugh Dickins <hughd@google.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Aneesh Kumar K.V 提交于
For hugetlb like THP (and unlike regular page), we do tlb flush after dropping ptl. Because of the above, we don't need to track force_flush like we do now. Instead we can simply call tlb_remove_page() which will do the flush if needed. No functionality change in this patch. Link: http://lkml.kernel.org/r/1465049193-22197-1-git-send-email-aneesh.kumar@linux.vnet.ibm.comSigned-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 7月, 2016 1 次提交
-
-
由 Hugh Dickins 提交于
The VM_BUG_ON_PAGE in page_move_anon_rmap() is more trouble than it's worth: the syzkaller fuzzer hit it again. It's still wrong for some THP cases, because linear_page_index() was never intended to apply to addresses before the start of a vma. That's easily fixed with a signed long cast inside linear_page_index(); and Dmitry has tested such a patch, to verify the false positive. But why extend linear_page_index() just for this case? when the avoidance in page_move_anon_rmap() has already grown ugly, and there's no reason for the check at all (nothing else there is using address or index). Remove address arg from page_move_anon_rmap(), remove VM_BUG_ON_PAGE, remove CONFIG_DEBUG_VM PageTransHuge adjustment. And one more thing: should the compound_head(page) be done inside or outside page_move_anon_rmap()? It's usually pushed down to the lowest level nowadays (and mm/memory.c shows no other explicit use of it), so I think it's better done in page_move_anon_rmap() than by caller. Fixes: 0798d3c0 ("mm: thp: avoid false positive VM_BUG_ON_PAGE in page_move_anon_rmap()") Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1607120444540.12528@eggly.anvilsSigned-off-by: NHugh Dickins <hughd@google.com> Reported-by: NDmitry Vyukov <dvyukov@google.com> Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: <stable@vger.kernel.org> [4.5+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 06 7月, 2016 1 次提交
-
-
由 Gerald Schaefer 提交于
This adds support for 2GB hugetlbfs pages on s390. Reviewed-by: NMartin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: NGerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 25 6月, 2016 2 次提交
-
-
由 Gerald Schaefer 提交于
While working on s390 support for gigantic hugepages I ran into the following "Bad page state" warning when freeing gigantic pages: BUG: Bad page state in process bash pfn:580001 page:000003d116000040 count:0 mapcount:0 mapping:ffffffff00000000 index:0x0 flags: 0x7fffc0000000000() page dumped because: non-NULL mapping This is because page->compound_mapcount, which is part of a union with page->mapping, is initialized with -1 in prep_compound_gigantic_page(), and not cleared again during destroy_compound_gigantic_page(). Fix this by clearing the compound_mapcount in destroy_compound_gigantic_page() before clearing compound_head. Interestingly enough, the warning will not show up on x86_64, although this should not be architecture specific. Apparently there is an endianness issue, combined with the fact that the union contains both a 64 bit ->mapping pointer and a 32 bit atomic_t ->compound_mapcount as members. The resulting bogus page->mapping on x86_64 therefore contains 00000000ffffffff instead of ffffffff00000000 on s390, which will falsely trigger the PageAnon() check in free_pages_prepare() because page->mapping & PAGE_MAPPING_ANON is true on little-endian architectures like x86_64 in this case (the page is not compound anymore, ->compound_head was already cleared before). As a result, page->mapping will be cleared before doing the checks in free_pages_check(). Not sure if the bogus "PageAnon() returning true" on x86_64 for the first tail page of a gigantic page (at this stage) has other theoretical implications, but they would also be fixed with this patch. Link: http://lkml.kernel.org/r/1466612719-5642-1-git-send-email-gerald.schaefer@de.ibm.comSigned-off-by: NGerald Schaefer <gerald.schaefer@de.ibm.com> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
We account HugeTLB's shared page table to all processes who share it. The accounting happens during huge_pmd_share(). If somebody populates pud entry under us, we should decrease pagetable's refcount and decrease nr_pmds of the process. By mistake, I increase nr_pmds again in this case. :-/ It will lead to "BUG: non-zero nr_pmds on freeing mm: 2" on process' exit. Let's fix this by increasing nr_pmds only when we're sure that the page table will be used. Link: http://lkml.kernel.org/r/20160617122506.GC6534@node.shutemov.name Fixes: dc6c9a35 ("mm: account pmd page tables to the process") Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Nzhongjiang <zhongjiang@huawei.com> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Acked-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 10 6月, 2016 1 次提交
-
-
由 Mike Kravetz 提交于
When creating a private mapping of a hugetlbfs file, it is possible to unmap pages via ftruncate or fallocate hole punch. If subsequent faults repopulate these mappings, the reserve counts will go negative. This is because the code currently assumes all faults to private mappings will consume reserves. The problem can be recreated as follows: - mmap(MAP_PRIVATE) a file in hugetlbfs filesystem - write fault in pages in the mapping - fallocate(FALLOC_FL_PUNCH_HOLE) some pages in the mapping - write fault in pages in the hole This will result in negative huge page reserve counts and negative subpool usage counts for the hugetlbfs. Note that this can also be recreated with ftruncate, but fallocate is more straight forward. This patch modifies the routines vma_needs_reserves and vma_has_reserves to examine the reserve map associated with private mappings similar to that for shared mappings. However, the reserve map semantics for private and shared mappings are very different. This results in subtly different code that is explained in the comments. Link: http://lkml.kernel.org/r/1464720957-15698-1-git-send-email-mike.kravetz@oracle.comSigned-off-by: NMike Kravetz <mike.kravetz@oracle.com> Acked-by: NHillf Danton <hillf.zj@alibaba-inc.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 30 5月, 2016 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 21 5月, 2016 1 次提交
-
-
由 Dan Williams 提交于
The "Device DAX" core enables dax mappings of performance / feature differentiated memory. An open mapping or file handle keeps the backing struct device live, but new mappings are only possible while the device is enabled. Faults are handled under rcu_read_lock to synchronize with the enabled state of the device. Similar to the filesystem-dax case the backing memory may optionally have struct page entries. However, unlike fs-dax there is no support for private mappings, or mappings that are not backed by media (see use of zero-page in fs-dax). Mappings are always guaranteed to match the alignment of the dax_region. If the dax_region is configured to have a 2MB alignment, all mappings are guaranteed to be backed by a pmd entry. Contrast this determinism with the fs-dax case where pmd mappings are opportunistic. If userspace attempts to force a misaligned mapping, the driver will fail the mmap attempt. See dax_dev_check_vma() for other scenarios that are rejected, like MAP_PRIVATE mappings. Cc: Hannes Reinecke <hare@suse.de> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Acked-by: N"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 20 5月, 2016 5 次提交
-
-
由 Joonsoo Kim 提交于
This patchset deals with some problematic sites that iterate pfn ranges. There is a system thats node's pfns are overlapped as follows: -----pfn--------> N0 N1 N2 N0 N1 N2 Therefore, we need to take care of this overlapping when iterating pfn range. I audit many iterating sites that uses pfn_valid(), pfn_valid_within(), zone_start_pfn and etc. and others looks safe to me. This is a preparation step for a new CMA implementation, ZONE_CMA (https://lkml.org/lkml/2015/2/12/95), because it would be easily overlapped with other zones. But, zone overlap check is also needed for the general case so I send it separately. This patch (of 5): alloc_gigantic_page() uses alloc_contig_range() and this requires that the requested range is in a single zone. To satisfy this requirement, add this check to pfn_range_valid_gigantic(). Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrew Morton 提交于
Instead of open-coding it. Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vaishali Thakkar 提交于
When any unsupported hugepage size is specified, 'hugepagesz=' and 'hugepages=' should be ignored during command line parsing until any supported hugepage size is found. But currently incorrect number of hugepages are allocated when unsupported size is specified as it fails to ignore the 'hugepages=' command. Test case: Note that this is specific to x86 architecture. Boot the kernel with command line option 'hugepagesz=256M hugepages=X'. After boot, dmesg output shows that X number of hugepages of the size 2M is pre-allocated instead of 0. So, to handle such command line options, introduce new routine hugetlb_bad_size. The routine hugetlb_bad_size sets the global variable parsed_valid_hugepagesz. We are using parsed_valid_hugepagesz to save the state when unsupported hugepagesize is found so that we can ignore the 'hugepages=' parameters after that and then reset the variable when supported hugepage size is found. The routine hugetlb_bad_size can be called while setting 'hugepagesz=' parameter in an architecture specific code. Signed-off-by: NVaishali Thakkar <vaishali.thakkar@oracle.com> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com> Cc: Dominik Dingel <dingel@linux.vnet.ibm.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: James Hogan <james.hogan@imgtec.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mike Kravetz 提交于
It was observed that minimum size accounting associated with the hugetlbfs min_size mount option may not perform optimally and as expected. As huge pages/reservations are released from the filesystem and given back to the global pools, they are reserved for subsequent filesystem use as long as the subpool reserved count is less than subpool minimum size. It does not take into account used pages within the filesystem. The filesystem size limits are not exceeded and this is technically not a bug. However, better behavior would be to wait for the number of used pages/reservations associated with the filesystem to drop below the minimum size before taking reservations to satisfy minimum size. An optimization is also made to the hugepage_subpool_get_pages() routine which is called when pages/reservations are allocated. This does not change behavior, but simply avoids the accounting if all reservations have already been taken (subpool reserved count == 0). Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com> Acked-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: David Rientjes <rientjes@google.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrew Morton 提交于
Lots of code does node = next_node(node, XXX); if (node == MAX_NUMNODES) node = first_node(XXX); so create next_node_in() to do this and use it in various places. [mhocko@suse.com: use next_node_in() helper] Acked-by: NVlastimil Babka <vbabka@suse.cz> Acked-by: NMichal Hocko <mhocko@kernel.org> Signed-off-by: NMichal Hocko <mhocko@suse.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Hui Zhu <zhuhui@xiaomi.com> Cc: Wang Xiaoqiang <wangxq10@lzu.edu.cn> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 05 4月, 2016 1 次提交
-
-
由 Kirill A. Shutemov 提交于
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 3月, 2016 1 次提交
-
-
由 Joe Perches 提交于
There are a mixture of pr_warning and pr_warn uses in mm. Use pr_warn consistently. Miscellanea: - Coalesce formats - Realign arguments Signed-off-by: NJoe Perches <joe@perches.com> Acked-by: Tejun Heo <tj@kernel.org> [percpu] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 10 3月, 2016 2 次提交
-
-
由 Jan Stancek 提交于
Replace ENOTSUPP with EOPNOTSUPP. If hugepages are not supported, this value is propagated to userspace. EOPNOTSUPP is part of uapi and is widely supported by libc libraries. It gives nicer message to user, rather than: # cat /proc/sys/vm/nr_hugepages cat: /proc/sys/vm/nr_hugepages: Unknown error 524 And also LTP's proc01 test was failing because this ret code (524) was unexpected: proc01 1 TFAIL : proc01.c:396: read failed: /proc/sys/vm/nr_hugepages: errno=???(524): Unknown error 524 proc01 2 TFAIL : proc01.c:396: read failed: /proc/sys/vm/nr_hugepages_mempolicy: errno=???(524): Unknown error 524 proc01 3 TFAIL : proc01.c:396: read failed: /proc/sys/vm/nr_overcommit_hugepages: errno=???(524): Unknown error 524 Signed-off-by: NJan Stancek <jstancek@redhat.com> Acked-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: NDavid Rientjes <rientjes@google.com> Acked-by: NHillf Danton <hillf.zj@alibaba-inc.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Geoffrey Thomas 提交于
The warning message "killed due to inadequate hugepage pool" simply indicates that SIGBUS was sent, not that the process was forcibly killed. If the process has a signal handler installed does not fix the problem, this message can rapidly spam the kernel log. On my amd64 dev machine that does not have hugepages configured, I can reproduce the repeated warnings easily by setting vm.nr_hugepages=2 (i.e., 4 megabytes of huge pages) and running something that sets a signal handler and forks, like #include <sys/mman.h> #include <signal.h> #include <stdlib.h> #include <unistd.h> sig_atomic_t counter = 10; void handler(int signal) { if (counter-- == 0) exit(0); } int main(void) { int status; char *addr = mmap(NULL, 4 * 1048576, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0); if (addr == MAP_FAILED) {perror("mmap"); return 1;} *addr = 'x'; switch (fork()) { case -1: perror("fork"); return 1; case 0: signal(SIGBUS, handler); *addr = 'x'; break; default: *addr = 'x'; wait(&status); if (WIFSIGNALED(status)) { psignal(WTERMSIG(status), "child"); } break; } } Signed-off-by: NGeoffrey Thomas <geofft@ldpreload.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 19 2月, 2016 1 次提交
-
-
由 Vaishali Thakkar 提交于
Currently incorrect default hugepage pool size is reported by proc nr_hugepages when number of pages for the default huge page size is specified twice. When multiple huge page sizes are supported, /proc/sys/vm/nr_hugepages indicates the current number of pre-allocated huge pages of the default size. Basically /proc/sys/vm/nr_hugepages displays default_hstate-> max_huge_pages and after boot time pre-allocation, max_huge_pages should equal the number of pre-allocated pages (nr_hugepages). Test case: Note that this is specific to x86 architecture. Boot the kernel with command line option 'default_hugepagesz=1G hugepages=X hugepagesz=2M hugepages=Y hugepagesz=1G hugepages=Z'. After boot, 'cat /proc/sys/vm/nr_hugepages' and 'sysctl -a | grep hugepages' returns the value X. However, dmesg output shows that Z huge pages were pre-allocated. So, the root cause of the problem here is that the global variable default_hstate_max_huge_pages is set if a default huge page size is specified (directly or indirectly) on the command line. After the command line processing in hugetlb_init, if default_hstate_max_huge_pages is set, the value is assigned to default_hstae.max_huge_pages. However, default_hstate.max_huge_pages may have already been set based on the number of pre-allocated huge pages of default_hstate size. The solution to this problem is if hstate->max_huge_pages is already set then it should not set as a result of global max_huge_pages value. Basically if the value of the variable hugepages is set multiple times on a command line for a specific supported hugepagesize then proc layer should consider the last specified value. Signed-off-by: NVaishali Thakkar <vaishali.thakkar@oracle.com> Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 06 2月, 2016 1 次提交
-
-
由 Vlastimil Babka 提交于
Commit 944d9fec ("hugetlb: add support for gigantic page allocation at runtime") has added the runtime gigantic page allocation via alloc_contig_range(), making this support available only when CONFIG_CMA is enabled. Because it doesn't depend on MIGRATE_CMA pageblocks and the associated infrastructure, it is possible with few simple adjustments to require only CONFIG_MEMORY_ISOLATION instead of full CONFIG_CMA. After this patch, alloc_contig_range() and related functions are available and used for gigantic pages with just CONFIG_MEMORY_ISOLATION enabled. Note CONFIG_CMA selects CONFIG_MEMORY_ISOLATION. This allows supporting runtime gigantic pages without the CMA-specific checks in page allocator fastpaths. Signed-off-by: NVlastimil Babka <vbabka@suse.cz> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-