- 09 9月, 2015 4 次提交
-
-
由 Mike Kravetz 提交于
Modify truncate_hugepages() to take a range of pages (start, end) instead of simply start. If an end value of LLONG_MAX is passed, the current "truncate" functionality is maintained. Existing callers are modified to pass LLONG_MAX as end of range. By keying off end == LLONG_MAX, the routine behaves differently for truncate and hole punch. Page removal is now synchronized with page allocation via faults by using the fault mutex table. The hole punch case can experience the rare region_del error and must handle accordingly. Add the routine hugetlb_fix_reserve_counts to fix up reserve counts in the case where region_del returns an error. Since the routine handles more than just the truncate case, it is renamed to remove_inode_hugepages(). To be consistent, the routine truncate_huge_page() is renamed remove_huge_page(). Downstream of remove_inode_hugepages(), the routine hugetlb_unreserve_pages() is also modified to take a range of pages. hugetlb_unreserve_pages is modified to detect an error from region_del and pass it back to the caller. Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com> Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: NHillf Danton <hillf.zj@alibaba-inc.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mike Kravetz 提交于
hugetlb page faults are currently synchronized by the table of mutexes (htlb_fault_mutex_table). fallocate code will need to synchronize with the page fault code when it allocates or deletes pages. Expose interfaces so that fallocate operations can be synchronized with page faults. Minor name changes to be more consistent with other global hugetlb symbols. Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com> Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: NHillf Danton <hillf.zj@alibaba-inc.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mike Kravetz 提交于
fallocate hole punch will want to remove a specific range of pages. The existing region_truncate() routine deletes all region/reserve map entries after a specified offset. region_del() will provide this same functionality if the end of region is specified as LONG_MAX. Hence, region_del() can replace region_truncate(). Unlike region_truncate(), region_del() can return an error in the rare case where it can not allocate memory for a region descriptor. This ONLY happens in the case where an existing region must be split. Current callers passing LONG_MAX as end of range will never experience this error and do not need to deal with error handling. Future callers of region_del() (such as fallocate hole punch) will need to handle this error. Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com> Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: NHillf Danton <hillf.zj@alibaba-inc.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mike Kravetz 提交于
hugetlbfs is used today by applications that want a high degree of control over huge page usage. Often, large hugetlbfs files are used to map a large number huge pages into the application processes. The applications know when page ranges within these large files will no longer be used, and ideally would like to release them back to the subpool or global pools for other uses. The fallocate() system call provides an interface for preallocation and hole punching within files. This patch set adds fallocate functionality to hugetlbfs. fallocate hole punch will want to remove a specific range of pages. When pages are removed, their associated entries in the region/reserve map will also be removed. This will break an assumption in the region_chg/region_add calling sequence. If a new region descriptor must be allocated, it is done as part of the region_chg processing. In this way, region_add can not fail because it does not need to attempt an allocation. To prepare for fallocate hole punch, create a "cache" of descriptors that can be used by region_add if necessary. region_chg will ensure there are sufficient entries in the cache. It will be necessary to track the number of in progress add operations to know a sufficient number of descriptors reside in the cache. A new routine region_abort is added to adjust this in progress count when add operations are aborted. vma_abort_reservation is also added for callers creating reservations with vma_needs_reservation/vma_commit_reservation. [akpm@linux-foundation.org: fix typo in comment, use more cols] Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com> Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: NHillf Danton <hillf.zj@alibaba-inc.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 05 9月, 2015 2 次提交
-
-
由 Nicholas Krause 提交于
This makes vma_has_reserves() return bool due to this particular function only returning either one or zero as its return value. Signed-off-by: NNicholas Krause <xerofoify@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Nicholas Krause 提交于
This makes vma_shareable() return bool now due to this particular function only ever returning either one or zero as its return value. Signed-off-by: NNicholas Krause <xerofoify@gmail.com> Acked-by: NMike Kravetz <mike.kravetz@oracle.com> Acked-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 26 6月, 2015 1 次提交
-
-
由 Dominik Dingel 提交于
With s390 dropping support for emulated hugepages, the last user of arch_prepare_hugepage and arch_release_hugepage is gone. Signed-off-by: NDominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: NMartin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 6月, 2015 5 次提交
-
-
由 Mike Kravetz 提交于
alloc_huge_page and hugetlb_reserve_pages use region_chg to calculate the number of pages which will be added to the reserve map. Subpool and global reserve counts are adjusted based on the output of region_chg. Before the pages are actually added to the reserve map, these routines could race and add fewer pages than expected. If this happens, the subpool and global reserve counts are not correct. Compare the number of pages actually added (region_add) to those expected to added (region_chg). If fewer pages are actually added, this indicates a race and adjust counters accordingly. Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com> Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: NDavidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mike Kravetz 提交于
Modify region_add() to keep track of regions(pages) added to the reserve map and return this value. The return value can be compared to the return value of region_chg() to determine if the map was modified between calls. Make vma_commit_reservation() also pass along the return value of region_add(). In the normal case, we want vma_commit_reservation to return the same value as the preceding call to vma_needs_reservation. Create a common __vma_reservation_common routine to help keep the special case return values in sync Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mike Kravetz 提交于
While working on hugetlbfs fallocate support, I noticed the following race in the existing code. It is unlikely that this race is hit very often in the current code. However, if more functionality to add and remove pages to hugetlbfs mappings (such as fallocate) is added the likelihood of hitting this race will increase. alloc_huge_page and hugetlb_reserve_pages use information from the reserve map to determine if there are enough available huge pages to complete the operation, as well as adjust global reserve and subpool usage counts. The order of operations is as follows: - call region_chg() to determine the expected change based on reserve map - determine if enough resources are available for this operation - adjust global counts based on the expected change - call region_add() to update the reserve map The issue is that reserve map could change between the call to region_chg and region_add. In this case, the counters which were adjusted based on the output of region_chg will not be correct. In order to hit this race today, there must be an existing shared hugetlb mmap created with the MAP_NORESERVE flag. A page fault to allocate a huge page via this mapping must occur at the same another task is mapping the same region without the MAP_NORESERVE flag. The patch set does not prevent the race from happening. Rather, it adds simple functionality to detect when the race has occurred. If a race is detected, then the incorrect counts are adjusted. Review comments pointed out the need for documentation of the existing region/reserve map routines. This patch set also adds documentation in this area. This patch (of 3): This is a documentation only patch and does not modify any code. Descriptions of the routines used for reserve map/region tracking are added. Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Naoya Horiguchi 提交于
Currently the initial value of order in dissolve_free_huge_page is 64 or 32, which leads to the following warning in static checker: mm/hugetlb.c:1203 dissolve_free_huge_pages() warn: potential right shift more than type allows '9,18,64' This is a potential risk of infinite loop, because 1 << order (== 0) is used in for-loop like this: for (pfn =3D start_pfn; pfn < end_pfn; pfn +=3D 1 << order) ... So this patch fixes it by using global minimum_order calculated at boot time. text data bss dec hex filename 28313 469 84236 113018 1b97a mm/hugetlb.o 28256 473 84236 112965 1b945 mm/hugetlb.o (patched) Fixes: c8721bbb ("mm: memory-hotplug: enable memory hotplug to handle hugepage") Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Zhang Zhen 提交于
Currently we have many duplicates in definitions of huge_pmd_unshare. In all architectures this function just returns 0 when CONFIG_ARCH_WANT_HUGE_PMD_SHARE is N. This patch puts the default implementation in mm/hugetlb.c and lets these architectures use the common code. Signed-off-by: NZhang Zhen <zhenzhang.zhang@huawei.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: David Rientjes <rientjes@google.com> Cc: James Yang <James.Yang@freescale.com> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 16 4月, 2015 5 次提交
-
-
由 Naoya Horiguchi 提交于
Now we have an easy access to hugepages' activeness, so existing helpers to get the information can be cleaned up. [akpm@linux-foundation.org: s/PageHugeActive/page_huge_active/] Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Hugh Dickins <hughd@google.com> Reviewed-by: NMichal Hocko <mhocko@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Naoya Horiguchi 提交于
We are not safe from calling isolate_huge_page() on a hugepage concurrently, which can make the victim hugepage in invalid state and results in BUG_ON(). The root problem of this is that we don't have any information on struct page (so easily accessible) about hugepages' activeness. Note that hugepages' activeness means just being linked to hstate->hugepage_activelist, which is not the same as normal pages' activeness represented by PageActive flag. Normal pages are isolated by isolate_lru_page() which prechecks PageLRU before isolation, so let's do similarly for hugetlb with a new paeg_huge_active(). set/clear_page_huge_active() should be called within hugetlb_lock. But hugetlb_cow() and hugetlb_no_page() don't do this, being justified because in these functions set_page_huge_active() is called right after the hugepage is allocated and no other thread tries to isolate it. [akpm@linux-foundation.org: s/PageHugeActive/page_huge_active/, make it return bool] [fengguang.wu@intel.com: set_page_huge_active() can be static] Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Hugh Dickins <hughd@google.com> Reviewed-by: NMichal Hocko <mhocko@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NFengguang Wu <fengguang.wu@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mike Kravetz 提交于
Make 'min_size=<value>' be an option when mounting a hugetlbfs. This option takes the same value as the 'size' option. min_size can be specified without specifying size. If both are specified, min_size must be less that or equal to size else the mount will fail. If min_size is specified, then at mount time an attempt is made to reserve min_size pages. If the reservation fails, the mount fails. At umount time, the reserved pages are released. Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mike Kravetz 提交于
The same routines that perform subpool maximum size accounting hugepage_subpool_get/put_pages() are modified to also perform minimum size accounting. When a delta value is passed to these routines, calculate how global reservations must be adjusted to maintain the subpool minimum size. The routines now return this global reserve count adjustment. This global reserve count adjustment is then passed to the global accounting routine hugetlb_acct_memory(). Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mike Kravetz 提交于
hugetlbfs allocates huge pages from the global pool as needed. Even if the global pool contains a sufficient number pages for the filesystem size at mount time, those global pages could be grabbed for some other use. As a result, filesystem huge page allocations may fail due to lack of pages. Applications such as a database want to use huge pages for performance reasons. hugetlbfs filesystem semantics with ownership and modes work well to manage access to a pool of huge pages. However, the application would like some reasonable assurance that allocations will not fail due to a lack of huge pages. At application startup time, the application would like to configure itself to use a specific number of huge pages. Before starting, the application can check to make sure that enough huge pages exist in the system global pools. However, there are no guarantees that those pages will be available when needed by the application. What the application wants is exclusive use of a subset of huge pages. Add a new hugetlbfs mount option 'min_size=<value>' to indicate that the specified number of pages will be available for use by the filesystem. At mount time, this number of huge pages will be reserved for exclusive use of the filesystem. If there is not a sufficient number of free pages, the mount will fail. As pages are allocated to and freeed from the filesystem, the number of reserved pages is adjusted so that the specified minimum is maintained. This patch (of 4): Add a field to the subpool structure to indicate the minimimum number of huge pages to always be used by this subpool. This minimum count includes allocated pages as well as reserved pages. If the minimum number of pages for the subpool have not been allocated, pages are reserved up to this minimum. An additional field (rsv_hpages) is used to track the number of pages reserved to meet this minimum size. The hstate pointer in the subpool is convenient to have when reserving and unreserving the pages. Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 4月, 2015 2 次提交
-
-
由 David Rientjes 提交于
If __get_user_pages() is faulting a significant number of hugetlb pages, usually as the result of mmap(MAP_LOCKED), it can potentially allocate a very large amount of memory. If the process has been oom killed, this will cause a lot of memory to potentially deplete memory reserves. In the same way that commit 4779280d ("mm: make get_user_pages() interruptible") aborted for pending SIGKILLs when faulting non-hugetlb memory, based on the premise of commit 462e00cc ("oom: stop allocating user memory if TIF_MEMDIE is set"), hugetlb page faults now terminate when the process has been oom killed. Signed-off-by: NDavid Rientjes <rientjes@google.com> Acked-by: NRik van Riel <riel@redhat.com> Acked-by: NGreg Thelen <gthelen@google.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: NDavidlohr Bueso <dave@stgolabs.net> Acked-by: N"Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Gerald Schaefer 提交于
Commit 61f77eda ("mm/hugetlb: reduce arch dependent code around follow_huge_*") broke follow_huge_pmd() on s390, where pmd and pte layout differ and using pte_page() on a huge pmd will return wrong results. Using pmd_page() instead fixes this. All architectures that were touched by that commit have pmd_page() defined, so this should not break anything on other architectures. Fixes: 61f77eda "mm/hugetlb: reduce arch dependent code around follow_huge_*" Signed-off-by: NGerald Schaefer <gerald.schaefer@de.ibm.com> Acked-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Hugh Dickins <hughd@google.com> Cc: Michal Hocko <mhocko@suse.cz>, Andrea Arcangeli <aarcange@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 3月, 2015 1 次提交
-
-
由 David Rientjes 提交于
Now that gigantic pages are dynamically allocatable, care must be taken to ensure that p->first_page is valid before setting PageTail. If this isn't done, then it is possible to race and have compound_head() return NULL. Signed-off-by: NDavid Rientjes <rientjes@google.com> Acked-by: NDavidlohr Bueso <dave@stgolabs.net> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: NHillf Danton <hillf.zj@alibaba-inc.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 12 2月, 2015 7 次提交
-
-
由 Kirill A. Shutemov 提交于
Dave noticed that unprivileged process can allocate significant amount of memory -- >500 MiB on x86_64 -- and stay unnoticed by oom-killer and memory cgroup. The trick is to allocate a lot of PMD page tables. Linux kernel doesn't account PMD tables to the process, only PTE. The use-cases below use few tricks to allocate a lot of PMD page tables while keeping VmRSS and VmPTE low. oom_score for the process will be 0. #include <errno.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <sys/mman.h> #include <sys/prctl.h> #define PUD_SIZE (1UL << 30) #define PMD_SIZE (1UL << 21) #define NR_PUD 130000 int main(void) { char *addr = NULL; unsigned long i; prctl(PR_SET_THP_DISABLE); for (i = 0; i < NR_PUD ; i++) { addr = mmap(addr + PUD_SIZE, PUD_SIZE, PROT_WRITE|PROT_READ, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0); if (addr == MAP_FAILED) { perror("mmap"); break; } *addr = 'x'; munmap(addr, PMD_SIZE); mmap(addr, PMD_SIZE, PROT_WRITE|PROT_READ, MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED, -1, 0); if (addr == MAP_FAILED) perror("re-mmap"), exit(1); } printf("PID %d consumed %lu KiB in PMD page tables\n", getpid(), i * 4096 >> 10); return pause(); } The patch addresses the issue by account PMD tables to the process the same way we account PTE. The main place where PMD tables is accounted is __pmd_alloc() and free_pmd_range(). But there're few corner cases: - HugeTLB can share PMD page tables. The patch handles by accounting the table to all processes who share it. - x86 PAE pre-allocates few PMD tables on fork. - Architectures with FIRST_USER_ADDRESS > 0. We need to adjust sanity check on exit(2). Accounting only happens on configuration where PMD page table's level is present (PMD is not folded). As with nr_ptes we use per-mm counter. The counter value is used to calculate baseline for badness score by oom-killer. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: NDave Hansen <dave.hansen@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Reviewed-by: NCyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: David Rientjes <rientjes@google.com> Tested-by: NSedat Dilek <sedat.dilek@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Naoya Horiguchi 提交于
If __unmap_hugepage_range() tries to unmap the address range over which hugepage migration is on the way, we get the wrong page because pte_page() doesn't work for migration entries. This patch simply clears the pte for migration entries as we do for hwpoison entries. Fixes: 290408d4 ("hugetlb: hugepage migration core") Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Hugh Dickins <hughd@google.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: <stable@vger.kernel.org> [2.6.36+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Naoya Horiguchi 提交于
There is a race condition between hugepage migration and change_protection(), where hugetlb_change_protection() doesn't care about migration entries and wrongly overwrites them. That causes unexpected results like kernel crash. HWPoison entries also can cause the same problem. This patch adds is_hugetlb_entry_(migration|hwpoisoned) check in this function to do proper actions. Fixes: 290408d4 ("hugetlb: hugepage migration core") Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Hugh Dickins <hughd@google.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: <stable@vger.kernel.org> [2.6.36+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Naoya Horiguchi 提交于
When running the test which causes the race as shown in the previous patch, we can hit the BUG "get_page() on refcount 0 page" in hugetlb_fault(). This race happens when pte turns into migration entry just after the first check of is_hugetlb_entry_migration() in hugetlb_fault() passed with false. To fix this, we need to check pte_present() again after huge_ptep_get(). This patch also reorders taking ptl and doing pte_page(), because pte_page() should be done in ptl. Due to this reordering, we need use trylock_page() in page != pagecache_page case to respect locking order. Fixes: 66aebce7 ("hugetlb: fix race condition in hugetlb_fault()") Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Hugh Dickins <hughd@google.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: <stable@vger.kernel.org> [3.2+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Naoya Horiguchi 提交于
We have a race condition between move_pages() and freeing hugepages, where move_pages() calls follow_page(FOLL_GET) for hugepages internally and tries to get its refcount without preventing concurrent freeing. This race crashes the kernel, so this patch fixes it by moving FOLL_GET code for hugepages into follow_huge_pmd() with taking the page table lock. This patch intentionally removes page==NULL check after pte_page. This is justified because pte_page() never returns NULL for any architectures or configurations. This patch changes the behavior of follow_huge_pmd() for tail pages and then tail pages can be pinned/returned. So the caller must be changed to properly handle the returned tail pages. We could have a choice to add the similar locking to follow_huge_(addr|pud) for consistency, but it's not necessary because currently these functions don't support FOLL_GET flag, so let's leave it for future development. Here is the reproducer: $ cat movepages.c #include <stdio.h> #include <stdlib.h> #include <numaif.h> #define ADDR_INPUT 0x700000000000UL #define HPS 0x200000 #define PS 0x1000 int main(int argc, char *argv[]) { int i; int nr_hp = strtol(argv[1], NULL, 0); int nr_p = nr_hp * HPS / PS; int ret; void **addrs; int *status; int *nodes; pid_t pid; pid = strtol(argv[2], NULL, 0); addrs = malloc(sizeof(char *) * nr_p + 1); status = malloc(sizeof(char *) * nr_p + 1); nodes = malloc(sizeof(char *) * nr_p + 1); while (1) { for (i = 0; i < nr_p; i++) { addrs[i] = (void *)ADDR_INPUT + i * PS; nodes[i] = 1; status[i] = 0; } ret = numa_move_pages(pid, nr_p, addrs, nodes, status, MPOL_MF_MOVE_ALL); if (ret == -1) err("move_pages"); for (i = 0; i < nr_p; i++) { addrs[i] = (void *)ADDR_INPUT + i * PS; nodes[i] = 0; status[i] = 0; } ret = numa_move_pages(pid, nr_p, addrs, nodes, status, MPOL_MF_MOVE_ALL); if (ret == -1) err("move_pages"); } return 0; } $ cat hugepage.c #include <stdio.h> #include <sys/mman.h> #include <string.h> #define ADDR_INPUT 0x700000000000UL #define HPS 0x200000 int main(int argc, char *argv[]) { int nr_hp = strtol(argv[1], NULL, 0); char *p; while (1) { p = mmap((void *)ADDR_INPUT, nr_hp * HPS, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0); if (p != (void *)ADDR_INPUT) { perror("mmap"); break; } memset(p, 0, nr_hp * HPS); munmap(p, nr_hp * HPS); } } $ sysctl vm.nr_hugepages=40 $ ./hugepage 10 & $ ./movepages 10 $(pgrep -f hugepage) Fixes: e632a938 ("mm: migrate: add hugepage migration code to move_pages()") Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reported-by: NHugh Dickins <hughd@google.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: <stable@vger.kernel.org> [3.12+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Naoya Horiguchi 提交于
Migrating hugepages and hwpoisoned hugepages are considered as non-present hugepages, and they are referenced via migration entries and hwpoison entries in their page table slots. This behavior causes race condition because pmd_huge() doesn't tell non-huge pages from migrating/hwpoisoned hugepages. follow_page_mask() is one example where the kernel would call follow_page_pte() for such hugepage while this function is supposed to handle only normal pages. To avoid this, this patch makes pmd_huge() return true when pmd_none() is true *and* pmd_present() is false. We don't have to worry about mixing up non-present pmd entry with normal pmd (pointing to leaf level pte entry) because pmd_present() is true in normal pmd. The same race condition could happen in (x86-specific) gup_pmd_range(), where this patch simply adds pmd_present() check instead of pmd_huge(). This is because gup_pmd_range() is fast path. If we have non-present hugepage in this function, we will go into gup_huge_pmd(), then return 0 at flag mask check, and finally fall back to the slow path. Fixes: 290408d4 ("hugetlb: hugepage migration core") Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Hugh Dickins <hughd@google.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: <stable@vger.kernel.org> [2.6.36+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Naoya Horiguchi 提交于
Currently we have many duplicates in definitions around follow_huge_addr(), follow_huge_pmd(), and follow_huge_pud(), so this patch tries to remove the m. The basic idea is to put the default implementation for these functions in mm/hugetlb.c as weak symbols (regardless of CONFIG_ARCH_WANT_GENERAL_HUGETL B), and to implement arch-specific code only when the arch needs it. For follow_huge_addr(), only powerpc and ia64 have their own implementation, and in all other architectures this function just returns ERR_PTR(-EINVAL). So this patch sets returning ERR_PTR(-EINVAL) as default. As for follow_huge_(pmd|pud)(), if (pmd|pud)_huge() is implemented to always return 0 in your architecture (like in ia64 or sparc,) it's never called (the callsite is optimized away) no matter how implemented it is. So in such architectures, we don't need arch-specific implementation. In some architecture (like mips, s390 and tile,) their current arch-specific follow_huge_(pmd|pud)() are effectively identical with the common code, so this patch lets these architecture use the common code. One exception is metag, where pmd_huge() could return non-zero but it expects follow_huge_pmd() to always return NULL. This means that we need arch-specific implementation which returns NULL. This behavior looks strange to me (because non-zero pmd_huge() implies that the architecture supports PMD-based hugepage, so follow_huge_pmd() can/should return some relevant value,) but that's beyond this cleanup patch, so let's keep it. Justification of non-trivial changes: - in s390, follow_huge_pmd() checks !MACHINE_HAS_HPAGE at first, and this patch removes the check. This is OK because we can assume MACHINE_HAS_HPAGE is true when follow_huge_pmd() can be called (note that pmd_huge() has the same check and always returns 0 for !MACHINE_HAS_HPAGE.) - in s390 and mips, we use HPAGE_MASK instead of PMD_MASK as done in common code. This patch forces these archs use PMD_MASK, but it's OK because they are identical in both archs. In s390, both of HPAGE_SHIFT and PMD_SHIFT are 20. In mips, HPAGE_SHIFT is defined as (PAGE_SHIFT + PAGE_SHIFT - 3) and PMD_SHIFT is define as (PAGE_SHIFT + PAGE_SHIFT + PTE_ORDER - 3), but PTE_ORDER is always 0, so these are identical. Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: NHugh Dickins <hughd@google.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Steve Capper <steve.capper@linaro.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 2月, 2015 1 次提交
-
-
由 Andrey Ryabinin 提交于
hugetlb_treat_as_movable declared as unsigned long, but proc_dointvec() used for parsing it: static struct ctl_table vm_table[] = { ... { .procname = "hugepages_treat_as_movable", .data = &hugepages_treat_as_movable, .maxlen = sizeof(int), .mode = 0644, .proc_handler = proc_dointvec, }, This seems harmless, but it's better to use int type here. Signed-off-by: NAndrey Ryabinin <a.ryabinin@samsung.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Manfred Spraul <manfred@colorfullife.com> Acked-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 14 12月, 2014 4 次提交
-
-
由 Luiz Capitulino 提交于
This function is only called during initialization. Signed-off-by: NLuiz Capitulino <lcapitulino@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Acked-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Luiz Capitulino 提交于
No reason to duplicate the code of an existing macro. Signed-off-by: NLuiz Capitulino <lcapitulino@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Acked-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Davidlohr Bueso 提交于
The i_mmap_mutex is a close cousin of the anon vma lock, both protecting similar data, one for file backed pages and the other for anon memory. To this end, this lock can also be a rwsem. In addition, there are some important opportunities to share the lock when there are no tree modifications. This conversion is straightforward. For now, all users take the write lock. [sfr@canb.auug.org.au: update fremap.c] Signed-off-by: NDavidlohr Bueso <dbueso@suse.de> Reviewed-by: NRik van Riel <riel@redhat.com> Acked-by: N"Kirill A. Shutemov" <kirill@shutemov.name> Acked-by: NHugh Dickins <hughd@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: NMel Gorman <mgorman@suse.de> Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Davidlohr Bueso 提交于
Convert all open coded mutex_lock/unlock calls to the i_mmap_[lock/unlock]_write() helpers. Signed-off-by: NDavidlohr Bueso <dbueso@suse.de> Acked-by: NRik van Riel <riel@redhat.com> Acked-by: N"Kirill A. Shutemov" <kirill@shutemov.name> Acked-by: NHugh Dickins <hughd@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: NMel Gorman <mgorman@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 12月, 2014 1 次提交
-
-
由 Hillf Danton 提交于
First, after flushing TLB, we have no need to scan pte from start again. Second, before bail out loop, the address is forwarded one step. Signed-off-by: NHillf Danton <hillf.zj@alibaba-inc.com> Reviewed-by: NMichal Hocko <mhocko@suse.cz> Acked-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 27 10月, 2014 1 次提交
-
-
由 Vladimir Davydov 提交于
Current cpuset API for checking if a zone/node is allowed to allocate from looks rather awkward. We have hardwall and softwall versions of cpuset_node_allowed with the softwall version doing literally the same as the hardwall version if __GFP_HARDWALL is passed to it in gfp flags. If it isn't, the softwall version may check the given node against the enclosing hardwall cpuset, which it needs to take the callback lock to do. Such a distinction was introduced by commit 02a0e53d ("cpuset: rework cpuset_zone_allowed api"). Before, we had the only version with the __GFP_HARDWALL flag determining its behavior. The purpose of the commit was to avoid sleep-in-atomic bugs when someone would mistakenly call the function without the __GFP_HARDWALL flag for an atomic allocation. The suffixes introduced were intended to make the callers think before using the function. However, since the callback lock was converted from mutex to spinlock by the previous patch, the softwall check function cannot sleep, and these precautions are no longer necessary. So let's simplify the API back to the single check. Suggested-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NVladimir Davydov <vdavydov@parallels.com> Acked-by: NChristoph Lameter <cl@linux.com> Acked-by: NZefan Li <lizefan@huawei.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 10 10月, 2014 1 次提交
-
-
由 Sasha Levin 提交于
Trivially convert a few VM_BUG_ON calls to VM_BUG_ON_VMA to extract more information when they trigger. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NSasha Levin <sasha.levin@oracle.com> Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michel Lespinasse <walken@google.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 07 8月, 2014 5 次提交
-
-
由 Li Zhong 提交于
It is possible for some platforms, such as powerpc to set HPAGE_SHIFT to 0 to indicate huge pages not supported. When this is the case, hugetlbfs could be disabled during boot time: hugetlbfs: disabling because there are no supported hugepage sizes Then in dissolve_free_huge_pages(), order is kept maximum (64 for 64bits), and the for loop below won't end: for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << order) As suggested by Naoya, below fix checks hugepages_supported() before calling dissolve_free_huge_pages(). [rientjes@google.com: no legitimate reason to call dissolve_free_huge_pages() when !hugepages_supported()] Signed-off-by: NLi Zhong <zhong@linux.vnet.ibm.com> Acked-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NDavid Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> [3.12+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Rientjes 提交于
They are unnecessary: "zero" can be used in place of "hugetlb_zero" and passing extra2 == NULL is equivalent to infinity. Signed-off-by: NDavid Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: NLuiz Capitulino <lcapitulino@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Rientjes 提交于
Three different interfaces alter the maximum number of hugepages for an hstate: - /proc/sys/vm/nr_hugepages for global number of hugepages of the default hstate, - /sys/kernel/mm/hugepages/hugepages-X/nr_hugepages for global number of hugepages for a specific hstate, and - /sys/kernel/mm/hugepages/hugepages-X/nr_hugepages/mempolicy for number of hugepages for a specific hstate over the set of allowed nodes. Generalize the code so that a single function handles all of these writes instead of duplicating the code in two different functions. This decreases the number of lines of code, but also reduces the size of .text by about half a percent since set_max_huge_pages() can be inlined. Signed-off-by: NDavid Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: NLuiz Capitulino <lcapitulino@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Acked-by: NDavidlohr Bueso <davidlohr@hp.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Davidlohr Bueso 提交于
When returning from hugetlb_cow(), we always (1) put back the refcount for each referenced page -- always 'old', and 'new' if allocation was successful. And (2) retake the page table lock right before returning, as the callers expects. This logic can be simplified and encapsulated, as proposed in this patch. In addition to cleaner code, we also shave a few bytes off the instruction text: text data bss dec hex filename 28399 462 41328 70189 1122d mm/hugetlb.o-baseline 28367 462 41328 70157 1120d mm/hugetlb.o-patched Passes libhugetlbfs testcases. Signed-off-by: NDavidlohr Bueso <davidlohr@hp.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Acked-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Davidlohr Bueso 提交于
This function always returns 1, thus no need to check return value in hugetlb_cow(). By doing so, we can get rid of the unnecessary WARN_ON call. While this logic perhaps existed as a way of identifying future unmap_ref_private() mishandling, reality is it serves no apparent purpose. Signed-off-by: NDavidlohr Bueso <davidlohr@hp.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Acked-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-