1. 01 2月, 2018 4 次提交
    • M
      mm, hugetlb: do not rely on overcommit limit during migration · ab5ac90a
      Michal Hocko 提交于
      hugepage migration relies on __alloc_buddy_huge_page to get a new page.
      This has 2 main disadvantages.
      
      1) it doesn't allow to migrate any huge page if the pool is used
         completely which is not an exceptional case as the pool is static and
         unused memory is just wasted.
      
      2) it leads to a weird semantic when migration between two numa nodes
         might increase the pool size of the destination NUMA node while the
         page is in use.  The issue is caused by per NUMA node surplus pages
         tracking (see free_huge_page).
      
      Address both issues by changing the way how we allocate and account
      pages allocated for migration.  Those should temporal by definition.  So
      we mark them that way (we will abuse page flags in the 3rd page) and
      update free_huge_page to free such pages to the page allocator.  Page
      migration path then just transfers the temporal status from the new page
      to the old one which will be freed on the last reference.  The global
      surplus count will never change during this path but we still have to be
      careful when migrating a per-node suprlus page.  This is now handled in
      move_hugetlb_state which is called from the migration path and it copies
      the hugetlb specific page state and fixes up the accounting when needed
      
      Rename __alloc_buddy_huge_page to __alloc_surplus_huge_page to better
      reflect its purpose.  The new allocation routine for the migration path
      is __alloc_migrate_huge_page.
      
      The user visible effect of this patch is that migrated pages are really
      temporal and they travel between NUMA nodes as per the migration
      request:
      
      Before migration
        /sys/devices/system/node/node0/hugepages/hugepages-2048kB/free_hugepages:0
        /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages:1
        /sys/devices/system/node/node0/hugepages/hugepages-2048kB/surplus_hugepages:0
        /sys/devices/system/node/node1/hugepages/hugepages-2048kB/free_hugepages:0
        /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages:0
        /sys/devices/system/node/node1/hugepages/hugepages-2048kB/surplus_hugepages:0
      
      After
        /sys/devices/system/node/node0/hugepages/hugepages-2048kB/free_hugepages:0
        /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages:0
        /sys/devices/system/node/node0/hugepages/hugepages-2048kB/surplus_hugepages:0
        /sys/devices/system/node/node1/hugepages/hugepages-2048kB/free_hugepages:0
        /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages:1
        /sys/devices/system/node/node1/hugepages/hugepages-2048kB/surplus_hugepages:0
      
      with the previous implementation, both nodes would have nr_hugepages:1
      until the page is freed.
      
      Link: http://lkml.kernel.org/r/20180103093213.26329-4-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andrea Reale <ar@linux.vnet.ibm.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Zi Yan <zi.yan@cs.rutgers.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ab5ac90a
    • M
      hugetlb: implement memfd sealing · ff62a342
      Marc-André Lureau 提交于
      Implements memfd sealing, similar to shmem:
       - WRITE: deny fallocate(PUNCH_HOLE). mmap() write is denied in
         memfd_add_seals(). write() doesn't exist for hugetlbfs.
       - SHRINK: added similar check as shmem_setattr()
       - GROW: added similar check as shmem_setattr() & shmem_fallocate()
      
      Except write() operation that doesn't exist with hugetlbfs, that should
      make sealing as close as it can be to shmem support.
      
      Link: http://lkml.kernel.org/r/20171107122800.25517-5-marcandre.lureau@redhat.comSigned-off-by: NMarc-André Lureau <marcandre.lureau@redhat.com>
      Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: David Herrmann <dh.herrmann@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ff62a342
    • M
      hugetlb: expose hugetlbfs_inode_info in header · da14c1e5
      Marc-André Lureau 提交于
      hugetlbfs inode information will need to be accessed by code in
      mm/shmem.c for file sealing operations.  Move inode information
      definition from .c file to header for needed access.
      
      Link: http://lkml.kernel.org/r/20171107122800.25517-4-marcandre.lureau@redhat.comSigned-off-by: NMarc-André Lureau <marcandre.lureau@redhat.com>
      Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: David Herrmann <dh.herrmann@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      da14c1e5
    • M
      mm, hugetlb: remove hugepages_treat_as_movable sysctl · d6cb41cc
      Michal Hocko 提交于
      hugepages_treat_as_movable has been introduced by 396faf03 ("Allow
      huge page allocations to use GFP_HIGH_MOVABLE") to allow hugetlb
      allocations from ZONE_MOVABLE even when hugetlb pages were not
      migrateable.  The purpose of the movable zone was different at the time.
      It aimed at reducing memory fragmentation and hugetlb pages being long
      lived and large werre not contributing to the fragmentation so it was
      acceptable to use the zone back then.
      
      Things have changed though and the primary purpose of the zone became
      migratability guarantee.  If we allow non migrateable hugetlb pages to
      be in ZONE_MOVABLE memory hotplug might fail to offline the memory.
      
      Remove the knob and only rely on hugepage_migration_supported to allow
      movable zones.
      
      Mel said:
      
      : Primarily it was aimed at allowing the hugetlb pool to safely shrink with
      : the ability to grow it again.  The use case was for batched jobs, some of
      : which needed huge pages and others that did not but didn't want the memory
      : useless pinned in the huge pages pool.
      :
      : I suspect that more users rely on THP than hugetlbfs for flexible use of
      : huge pages with fallback options so I think that removing the option
      : should be ok.
      
      Link: http://lkml.kernel.org/r/20171003072619.8654-1-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Reported-by: NAlexandru Moise <00moses.alexander00@gmail.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Alexandru Moise <00moses.alexander00@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d6cb41cc
  2. 30 11月, 2017 1 次提交
    • D
      mm: fix device-dax pud write-faults triggered by get_user_pages() · 1501899a
      Dan Williams 提交于
      Currently only get_user_pages_fast() can safely handle the writable gup
      case due to its use of pud_access_permitted() to check whether the pud
      entry is writable.  In the gup slow path pud_write() is used instead of
      pud_access_permitted() and to date it has been unimplemented, just calls
      BUG_ON().
      
          kernel BUG at ./include/linux/hugetlb.h:244!
          [..]
          RIP: 0010:follow_devmap_pud+0x482/0x490
          [..]
          Call Trace:
           follow_page_mask+0x28c/0x6e0
           __get_user_pages+0xe4/0x6c0
           get_user_pages_unlocked+0x130/0x1b0
           get_user_pages_fast+0x89/0xb0
           iov_iter_get_pages_alloc+0x114/0x4a0
           nfs_direct_read_schedule_iovec+0xd2/0x350
           ? nfs_start_io_direct+0x63/0x70
           nfs_file_direct_read+0x1e0/0x250
           nfs_file_read+0x90/0xc0
      
      For now this just implements a simple check for the _PAGE_RW bit similar
      to pmd_write.  However, this implies that the gup-slow-path check is
      missing the extra checks that the gup-fast-path performs with
      pud_access_permitted.  Later patches will align all checks to use the
      'access_permitted' helper if the architecture provides it.
      
      Note that the generic 'access_permitted' helper fallback is the simple
      _PAGE_RW check on architectures that do not define the
      'access_permitted' helper(s).
      
      [dan.j.williams@intel.com: fix powerpc compile error]
        Link: http://lkml.kernel.org/r/151129126165.37405.16031785266675461397.stgit@dwillia2-desk3.amr.corp.intel.com
      Link: http://lkml.kernel.org/r/151043109938.2842.14834662818213616199.stgit@dwillia2-desk3.amr.corp.intel.com
      Fixes: a00cc7d9 ("mm, x86: add support for PUD-sized transparent hugepages")
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>	[x86]
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1501899a
  3. 02 11月, 2017 1 次提交
    • G
      License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman 提交于
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information it it.
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information,
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from of the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging was:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there was new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2441318
  4. 15 8月, 2017 1 次提交
    • A
      mm/hugetlb: Allow arch to override and call the weak function · e24a1307
      Aneesh Kumar K.V 提交于
      When running in guest mode ppc64 supports a different mechanism for hugetlb
      allocation/reservation. The LPAR management application called HMC can
      be used to reserve a set of hugepages and we pass the details of
      reserved pages via device tree to the guest. (more details in
      htab_dt_scan_hugepage_blocks()) . We do the memblock_reserve of the range
      and later in the boot sequence, we add the reserved range to huge_boot_pages.
      
      But to enable 16G hugetlb on baremetal config (when we are not running as guest)
      we want to do memblock reservation during boot. Generic code already does this
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      e24a1307
  5. 11 7月, 2017 5 次提交
  6. 07 7月, 2017 7 次提交
  7. 06 7月, 2017 1 次提交
    • D
      hugetlbfs: Implement show_options · 4a25220d
      David Howells 提交于
      Implement the show_options superblock op for hugetlbfs as part of a bid to
      get rid of s_options and generic_show_options() to make it easier to
      implement a context-based mount where the mount options can be passed
      individually over a file descriptor.
      
      Note that the uid and gid should possibly be displayed relative to the
      viewer's user namespace.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      cc: Nadia Yvette Chambers <nyc@holomorphy.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      4a25220d
  8. 10 3月, 2017 1 次提交
  9. 23 2月, 2017 2 次提交
  10. 08 10月, 2016 2 次提交
  11. 21 5月, 2016 2 次提交
  12. 20 5月, 2016 1 次提交
    • V
      mm/hugetlb: introduce hugetlb_bad_size() · 9fee021d
      Vaishali Thakkar 提交于
      When any unsupported hugepage size is specified, 'hugepagesz=' and
      'hugepages=' should be ignored during command line parsing until any
      supported hugepage size is found.  But currently incorrect number of
      hugepages are allocated when unsupported size is specified as it fails
      to ignore the 'hugepages=' command.
      
      Test case:
      
      Note that this is specific to x86 architecture.
      
      Boot the kernel with command line option 'hugepagesz=256M hugepages=X'.
      After boot, dmesg output shows that X number of hugepages of the size 2M
      is pre-allocated instead of 0.
      
      So, to handle such command line options, introduce new routine
      hugetlb_bad_size.  The routine hugetlb_bad_size sets the global variable
      parsed_valid_hugepagesz.  We are using parsed_valid_hugepagesz to save
      the state when unsupported hugepagesize is found so that we can ignore
      the 'hugepages=' parameters after that and then reset the variable when
      supported hugepage size is found.
      
      The routine hugetlb_bad_size can be called while setting 'hugepagesz='
      parameter in an architecture specific code.
      Signed-off-by: NVaishali Thakkar <vaishali.thakkar@oracle.com>
      Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
      Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9fee021d
  13. 16 1月, 2016 1 次提交
  14. 15 1月, 2016 1 次提交
  15. 22 12月, 2015 1 次提交
    • D
      arm64: hugetlb: add support for PTE contiguous bit · 66b3923a
      David Woods 提交于
      The arm64 MMU supports a Contiguous bit which is a hint that the TTE
      is one of a set of contiguous entries which can be cached in a single
      TLB entry.  Supporting this bit adds new intermediate huge page sizes.
      
      The set of huge page sizes available depends on the base page size.
      Without using contiguous pages the huge page sizes are as follows.
      
       4KB:   2MB  1GB
      64KB: 512MB
      
      With a 4KB granule, the contiguous bit groups together sets of 16 pages
      and with a 64KB granule it groups sets of 32 pages.  This enables two new
      huge page sizes in each case, so that the full set of available sizes
      is as follows.
      
       4KB:  64KB   2MB  32MB  1GB
      64KB:   2MB 512MB  16GB
      
      If a 16KB granule is used then the contiguous bit groups 128 pages
      at the PTE level and 32 pages at the PMD level.
      
      If the base page size is set to 64KB then 2MB pages are enabled by
      default.  It is possible in the future to make 2MB the default huge
      page size for both 4KB and 64KB granules.
      Reviewed-by: NChris Metcalf <cmetcalf@ezchip.com>
      Reviewed-by: NSteve Capper <steve.capper@linaro.org>
      Signed-off-by: NDavid Woods <dwoods@ezchip.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      66b3923a
  16. 06 11月, 2015 1 次提交
  17. 09 9月, 2015 5 次提交
    • M
      hugetlbfs: add hugetlbfs_fallocate() · 70c3547e
      Mike Kravetz 提交于
      This is based on the shmem version, but it has diverged quite a bit.  We
      have no swap to worry about, nor the new file sealing.  Add
      synchronication via the fault mutex table to coordinate page faults,
      fallocate allocation and fallocate hole punch.
      
      What this allows us to do is move physical memory in and out of a
      hugetlbfs file without having it mapped.  This also gives us the ability
      to support MADV_REMOVE since it is currently implemented using
      fallocate().  MADV_REMOVE lets madvise() remove pages from the middle of
      a hugetlbfs file, which wasn't possible before.
      
      hugetlbfs fallocate only operates on whole huge pages.
      
      Based on code by Dave Hansen.
      Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: NHillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      70c3547e
    • M
      hugetlbfs: New huge_add_to_page_cache helper routine · ab76ad54
      Mike Kravetz 提交于
      Currently, there is only a single place where hugetlbfs pages are added
      to the page cache.  The new fallocate code be adding a second one, so
      break the functionality out into its own helper.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: NHillf Danton <hillf.zj@alibaba-inc.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ab76ad54
    • M
      hugetlbfs: truncate_hugepages() takes a range of pages · b5cec28d
      Mike Kravetz 提交于
      Modify truncate_hugepages() to take a range of pages (start, end)
      instead of simply start.  If an end value of LLONG_MAX is passed, the
      current "truncate" functionality is maintained.  Existing callers are
      modified to pass LLONG_MAX as end of range.  By keying off end ==
      LLONG_MAX, the routine behaves differently for truncate and hole punch.
      Page removal is now synchronized with page allocation via faults by
      using the fault mutex table.  The hole punch case can experience the
      rare region_del error and must handle accordingly.
      
      Add the routine hugetlb_fix_reserve_counts to fix up reserve counts in
      the case where region_del returns an error.
      
      Since the routine handles more than just the truncate case, it is
      renamed to remove_inode_hugepages().  To be consistent, the routine
      truncate_huge_page() is renamed remove_huge_page().
      
      Downstream of remove_inode_hugepages(), the routine
      hugetlb_unreserve_pages() is also modified to take a range of pages.
      hugetlb_unreserve_pages is modified to detect an error from region_del and
      pass it back to the caller.
      Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: NHillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b5cec28d
    • M
      mm/hugetlb: expose hugetlb fault mutex for use by fallocate · c672c7f2
      Mike Kravetz 提交于
      hugetlb page faults are currently synchronized by the table of mutexes
      (htlb_fault_mutex_table).  fallocate code will need to synchronize with
      the page fault code when it allocates or deletes pages.  Expose
      interfaces so that fallocate operations can be synchronized with page
      faults.  Minor name changes to be more consistent with other global
      hugetlb symbols.
      Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: NHillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c672c7f2
    • M
      mm/hugetlb: add cache of descriptors to resv_map for region_add · 5e911373
      Mike Kravetz 提交于
      hugetlbfs is used today by applications that want a high degree of
      control over huge page usage.  Often, large hugetlbfs files are used to
      map a large number huge pages into the application processes.  The
      applications know when page ranges within these large files will no
      longer be used, and ideally would like to release them back to the
      subpool or global pools for other uses.  The fallocate() system call
      provides an interface for preallocation and hole punching within files.
      This patch set adds fallocate functionality to hugetlbfs.
      
      fallocate hole punch will want to remove a specific range of pages.
      When pages are removed, their associated entries in the region/reserve
      map will also be removed.  This will break an assumption in the
      region_chg/region_add calling sequence.  If a new region descriptor must
      be allocated, it is done as part of the region_chg processing.  In this
      way, region_add can not fail because it does not need to attempt an
      allocation.
      
      To prepare for fallocate hole punch, create a "cache" of descriptors
      that can be used by region_add if necessary.  region_chg will ensure
      there are sufficient entries in the cache.  It will be necessary to
      track the number of in progress add operations to know a sufficient
      number of descriptors reside in the cache.  A new routine region_abort
      is added to adjust this in progress count when add operations are
      aborted.  vma_abort_reservation is also added for callers creating
      reservations with vma_needs_reservation/vma_commit_reservation.
      
      [akpm@linux-foundation.org: fix typo in comment, use more cols]
      Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: NHillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5e911373
  18. 18 7月, 2015 1 次提交
  19. 16 4月, 2015 2 次提交