1. 26 Nov, 2021 (1 commit)
  2. 15 Oct, 2021 (1 commit)
  3. 29 Sep, 2021 (1 commit)
  4. 14 Jul, 2021 (2 commits)
  5. 03 Jun, 2021 (1 commit)
  6. 09 Mar, 2021 (1 commit)
  7. 13 Aug, 2020 (1 commit)
  8. 08 Aug, 2020 (1 commit)
  9. 10 Jun, 2020 (1 commit)
  10. 04 Jun, 2020 (1 commit)
    • hugetlbfs: get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs · 88590253
      Committed by Shijie Hu
      In a 32-bit program running on the arm64 architecture, when the address
      space below the mmap base is completely exhausted, shmat() for huge
      pages returns ENOMEM, while shmat() for normal pages can still succeed
      in non-legacy mode.  This is inconsistent.
      
      For normal pages, the calling trace of get_unmapped_area() is:
      
      	=> mm->get_unmapped_area()
      	if on legacy mode,
      		=> arch_get_unmapped_area()
      			=> vm_unmapped_area()
      	if on no-legacy mode,
      		=> arch_get_unmapped_area_topdown()
      			=> vm_unmapped_area()
      
      For huge pages, the calling trace of get_unmapped_area() is:
      
      	=> file->f_op->get_unmapped_area()
      		=> hugetlb_get_unmapped_area()
      			=> vm_unmapped_area()
      
      To solve this issue, we only need to make hugetlb_get_unmapped_area()
      follow the same path as mm->get_unmapped_area().  Add *bottomup() and
      *topdown() variants for hugetlbfs, and check the current
      mm->get_unmapped_area() to decide which one to use.  If
      mm->get_unmapped_area is equal to arch_get_unmapped_area_topdown(),
      hugetlb_get_unmapped_area() calls the top-down routine; otherwise it
      calls the bottom-up routine.
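      
      A rough sketch of the resulting dispatch, assuming the *bottomup()/
      *topdown() helper names described above; the actual code in
      fs/hugetlbfs/inode.c also performs length, alignment and MAP_FIXED
      checks:
      
      	unsigned long
      	hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
      				  unsigned long len, unsigned long pgoff,
      				  unsigned long flags)
      	{
      		struct mm_struct *mm = current->mm;
      
      		/* ... length/alignment checks and MAP_FIXED handling ... */
      
      		/*
      		 * Mirror mm->get_unmapped_area(): use the top-down walk
      		 * only when the process uses the non-legacy (top-down)
      		 * layout.
      		 */
      		if (mm->get_unmapped_area == arch_get_unmapped_area_topdown)
      			return hugetlb_get_unmapped_area_topdown(file, addr,
      							len, pgoff, flags);
      		return hugetlb_get_unmapped_area_bottomup(file, addr,
      							len, pgoff, flags);
      	}
      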
      Reported-by: kbuild test robot <lkp@intel.com>
      Signed-off-by: Shijie Hu <hushijie3@huawei.com>
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: Xiaoming Ni <nixiaoming@huawei.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: yangerkun <yangerkun@huawei.com>
      Cc: ChenGang <cg.chen@huawei.com>
      Cc: Chen Jie <chenjie6@huawei.com>
      Link: http://lkml.kernel.org/r/20200518065338.113664-1-hushijie3@huawei.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 03 Apr, 2020 (2 commits)
    • hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race · 87bf91d3
      Committed by Mike Kravetz
      hugetlbfs page faults can race with truncate and hole punch operations.
      Current code in the page fault path attempts to handle this by 'backing
      out' operations if we encounter the race.  One obvious omission in the
      current code is removing a page newly added to the page cache.  This is
      straightforward to address, but there is a more subtle and
      difficult issue of backing out hugetlb reservations.  To handle this
      correctly, the 'reservation state' before page allocation needs to be
      noted so that it can be properly backed out.  There are four distinct
      possibilities for reservation state: shared/reserved, shared/no-resv,
      private/reserved and private/no-resv.  Backing out a reservation may
      require memory allocation which could fail so that needs to be taken
      into account as well.
      
      Instead of writing the required complicated code for this rare
      occurrence, just eliminate the race.  i_mmap_rwsem is now held in read
      mode for the duration of page fault processing.  Hold i_mmap_rwsem in
      write mode when modifying i_size.  In this way, truncation cannot
      proceed when page faults are being processed.  In addition, i_size
      will not change during fault processing so a single check can be made
      to ensure faults are not beyond (proposed) end of file.  Faults can
      still race with hole punch, but that race is handled by existing code
      and the use of hugetlb_fault_mutex.
      
      With this modification, checks for races with truncation in the page
      fault path can be simplified and removed.  remove_inode_hugepages no
      longer needs to take hugetlb_fault_mutex in the case of truncation.
      Comments are expanded to explain reasoning behind locking.
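      
      As a simplified sketch of the locking shape described above (the
      i_mmap_lock_* wrappers are the existing kernel helpers; everything
      else, including error handling, is elided):
      
      	/* page fault path, simplified */
      	i_mmap_lock_read(mapping);	/* excludes truncation */
      	size = i_size_read(inode) >> huge_page_shift(h);
      	if (idx >= size)		/* single check; i_size is stable here */
      		goto backout_unlocked;
      	/* ... allocate/lookup page, take page lock, set the pte ... */
      	i_mmap_unlock_read(mapping);
      
      	/* truncation path, simplified */
      	i_mmap_lock_write(mapping);	/* waits for in-flight faults */
      	i_size_write(inode, offset);
      	hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0);
      	i_mmap_unlock_write(mapping);
      	remove_inode_hugepages(inode, offset, LLONG_MAX);
      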
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
      Link: http://lkml.kernel.org/r/20200316205756.146666-3-mike.kravetz@oracle.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization · c0d0381a
      Committed by Mike Kravetz
      Patch series "hugetlbfs: use i_mmap_rwsem for more synchronization", v2.
      
      While discussing the issue with huge_pte_offset [1], I remembered that
      there were more outstanding hugetlb races.  These issues are:
      
      1) For shared pmds, huge PTE pointers returned by huge_pte_alloc can become
         invalid via a call to huge_pmd_unshare by another thread.
      2) hugetlbfs page faults can race with truncation causing invalid global
         reserve counts and state.
      
      A previous attempt was made to use i_mmap_rwsem in this manner as
      described at [2].  However, those patches were reverted starting with [3]
      due to locking issues.
      
      To effectively use i_mmap_rwsem to address the above issues it needs to be
      held (in read mode) during page fault processing.  However, during fault
      processing we need to lock the page we will be adding.  Lock ordering
      requires we take page lock before i_mmap_rwsem.  Waiting until after
      taking the page lock is too late in the fault process for the
      synchronization we want to do.
      
      To address this lock ordering issue, the following patches change the lock
      ordering for hugetlb pages.  This is not too invasive as hugetlbfs
      processing is done separate from core mm in many places.  However, I don't
      really like this idea.  Much ugliness is contained in the new routine
      hugetlb_page_mapping_lock_write() of patch 1.
      
      The only other way I can think of to address these issues is by catching
      all the races.  After catching a race, cleanup, backout, retry ...  etc,
      as needed.  This can get really ugly, especially for huge page
      reservations.  At one time, I started writing some of the reservation
      backout code for page faults and it got so ugly and complicated I went
      down the path of adding synchronization to avoid the races.  Any other
      suggestions would be welcome.
      
      [1] https://lore.kernel.org/linux-mm/1582342427-230392-1-git-send-email-longpeng2@huawei.com/
      [2] https://lore.kernel.org/linux-mm/20181222223013.22193-1-mike.kravetz@oracle.com/
      [3] https://lore.kernel.org/linux-mm/20190103235452.29335-1-mike.kravetz@oracle.com
      [4] https://lore.kernel.org/linux-mm/1584028670.7365.182.camel@lca.pw/
      [5] https://lore.kernel.org/lkml/20200312183142.108df9ac@canb.auug.org.au/
      
      This patch (of 2):
      
      While looking at BUGs associated with invalid huge page map counts, it
      was observed that a huge pte pointer could become 'invalid' and
      point to another task's page table.  Consider the following:
      
      A task takes a page fault on a shared hugetlbfs file and calls
      huge_pte_alloc to get a ptep.  Suppose the returned ptep points to a
      shared pmd.
      
      Now, another task truncates the hugetlbfs file.  As part of truncation, it
      unmaps everyone who has the file mapped.  If the range being truncated is
      covered by a shared pmd, huge_pmd_unshare will be called.  For all but the
      last user of the shared pmd, huge_pmd_unshare will clear the pud pointing
      to the pmd.  If the task in the middle of the page fault is not the last
      user, the ptep returned by huge_pte_alloc now points to another task's
      page table or worse.  This leads to bad things such as incorrect page
      map/reference counts or invalid memory references.
      
      To fix, expand the use of i_mmap_rwsem as follows:
      - i_mmap_rwsem is held in read mode whenever huge_pmd_share is called.
        huge_pmd_share is only called via huge_pte_alloc, so callers of
        huge_pte_alloc take i_mmap_rwsem before calling.  In addition, callers
        of huge_pte_alloc continue to hold the semaphore until finished with
        the ptep.
      - i_mmap_rwsem is held in write mode whenever huge_pmd_unshare is called.
      
      One problem with this scheme is that it requires taking i_mmap_rwsem
      before taking the page lock during page faults.  This is not the order
      specified in the rest of mm code.  Handling of hugetlbfs pages is mostly
      isolated today.  Therefore, we use this alternative locking order for
      PageHuge() pages.
      
               mapping->i_mmap_rwsem
                 hugetlb_fault_mutex (hugetlbfs specific page fault mutex)
                   page->flags PG_locked (lock_page)
      
      To help with lock ordering issues, hugetlb_page_mapping_lock_write() is
      introduced to write lock the i_mmap_rwsem associated with a page.
      
      In most cases it is easy to get address_space via vma->vm_file->f_mapping.
      However, in the case of migration or memory errors for anon pages we do
      not have an associated vma.  A new routine _get_hugetlb_page_mapping()
      will use anon_vma to get address_space in these cases.
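      
      A simplified sketch of the fault-side ordering (the real hugetlb_fault()
      path does much more; huge_pte_alloc() is shown with the signature it
      had at the time of this patch):
      
      	/* hugetlb_fault(), simplified */
      	mapping = vma->vm_file->f_mapping;
      	i_mmap_lock_read(mapping);	/* keeps any shared pmd alive */
      	ptep = huge_pte_alloc(mm, address, huge_page_size(h));
      	if (!ptep) {
      		i_mmap_unlock_read(mapping);
      		return VM_FAULT_OOM;
      	}
      	/* ... hugetlb_fault_mutex, lock_page(), install/modify the pte ... */
      	i_mmap_unlock_read(mapping);
      
      	/* huge_pmd_unshare() callers now hold i_mmap_rwsem in write mode */
      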
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
      Link: http://lkml.kernel.org/r/20200316205756.146666-2-mike.kravetz@oracle.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 08 Feb, 2020 (3 commits)
  13. 04 Jan, 2020 (1 commit)
  14. 02 Dec, 2019 (4 commits)
  15. 05 Jul, 2019 (1 commit)
  16. 15 May, 2019 (2 commits)
    • hugetlbfs: always use address space in inode for resv_map pointer · f27a5136
      Committed by Mike Kravetz
      Continuing discussion about 58b6e5e8 ("hugetlbfs: fix memory leak for
      resv_map") brought up the issue that inode->i_mapping may not point to the
      address space embedded within the inode at inode eviction time.  The
      hugetlbfs truncate routine handles this by explicitly using inode->i_data.
      However, code cleaning up the resv_map will still use the address space
      pointed to by inode->i_mapping.  Luckily, private_data is NULL for address
      spaces in all such cases today, but there is no guarantee this will
      continue.
      
      Change all hugetlbfs code getting a resv_map pointer to explicitly get it
      from the address space embedded within the inode.  In addition, add more
      comments in the code to indicate why this is being done.
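      
      As a sketch, the resv_map lookup now always dereferences the address
      space embedded in the inode (i_data) rather than whatever i_mapping
      points to:
      
      	static struct resv_map *inode_resv_map(struct inode *inode)
      	{
      		/*
      		 * At inode evict time, i_mapping may not point to the
      		 * original address space within the inode.  The resv_map
      		 * was stored in that original address space, so always use
      		 * the one embedded within the inode.
      		 */
      		return (struct resv_map *)(&inode->i_data)->private_data;
      	}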
      
      Link: http://lkml.kernel.org/r/20190419204435.16984-1-mike.kravetz@oracle.com
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reported-by: Yufen Yu <yuyufen@huawei.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb: use same fault hash key for shared and private mappings · 1b426bac
      Committed by Mike Kravetz
      hugetlb uses a fault mutex hash table to prevent concurrent page faults
      on the same pages.  The key differs for shared and private mappings:
      shared mappings key off the address_space and file index, while private
      mappings key off the mm and virtual address.  Consider a private
      mapping of a populated hugetlbfs file.  A fault will map the page from
      the file and, if needed, do a COW to map a writable page.
      
      Hugetlbfs hole punch uses the fault mutex to prevent mappings of file
      pages.  It uses the address_space file index key.  However, private
      mappings will use a different key and could race with this code to map
      the file page.  This causes problems (BUG) for the page cache remove
      code as it expects the page to be unmapped.  A sample stack is:
      
      page dumped because: VM_BUG_ON_PAGE(page_mapped(page))
      kernel BUG at mm/filemap.c:169!
      ...
      RIP: 0010:unaccount_page_cache_page+0x1b8/0x200
      ...
      Call Trace:
      __delete_from_page_cache+0x39/0x220
      delete_from_page_cache+0x45/0x70
      remove_inode_hugepages+0x13c/0x380
      ? __add_to_page_cache_locked+0x162/0x380
      hugetlbfs_fallocate+0x403/0x540
      ? _cond_resched+0x15/0x30
      ? __inode_security_revalidate+0x5d/0x70
      ? selinux_file_permission+0x100/0x130
      vfs_fallocate+0x13f/0x270
      ksys_fallocate+0x3c/0x80
      __x64_sys_fallocate+0x1a/0x20
      do_syscall_64+0x5b/0x180
      entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      There seems to be another potential COW issue/race with this approach
      of different private and shared keys as noted in commit 8382d914
      ("mm, hugetlb: improve page-fault scalability").
      
      Since every hugetlb mapping (even anon and private) is actually a file
      mapping, just use the address_space index key for all mappings.  This
      results in potentially more hash collisions.  However, this should not
      be the common case.
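      
      A sketch of the single-key scheme (close to the shape of
      hugetlb_fault_mutex_hash() after this change; the exact parameter list
      may differ):
      
      	u32 hugetlb_fault_mutex_hash(struct hstate *h,
      				     struct address_space *mapping,
      				     pgoff_t idx)
      	{
      		unsigned long key[2];
      		u32 hash;
      
      		/* one key scheme for shared, private and anon mappings */
      		key[0] = (unsigned long) mapping;
      		key[1] = idx;
      
      		hash = jhash2((u32 *)&key, sizeof(key) / sizeof(u32), 0);
      		return hash & (num_fault_mutexes - 1);
      	}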
      
      Link: http://lkml.kernel.org/r/20190328234704.27083-3-mike.kravetz@oracle.com
      Link: http://lkml.kernel.org/r/20190412165235.t4sscoujczfhuiyt@linux-r8p5
      Fixes: b5cec28d ("hugetlbfs: truncate_hugepages() takes a range of pages")
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  17. 02 May, 2019 (1 commit)
  18. 06 Apr, 2019 (1 commit)
  19. 06 Mar, 2019 (1 commit)
    • mm/memfd: add an F_SEAL_FUTURE_WRITE seal to memfd · ab3948f5
      Committed by Joel Fernandes (Google)
      Android uses ashmem for sharing memory regions.  We are looking forward
      to migrating all usecases of ashmem to memfd so that we can eventually
      remove the ashmem driver from staging, while also benefiting from using
      memfd and contributing to it.  Note that staging drivers are not ABI
      and can generally be removed at any time.
      
      One of the main usecases Android has is the ability to create a region
      and mmap it as writable, then protect against any "future" writes while
      keeping the existing, already mmap'ed writable region active.
      This allows us to implement a usecase where
      receivers of the shared memory buffer can get a read-only view, while
      the sender continues to write to the buffer.  See CursorWindow
      documentation in Android for more details:
      
        https://developer.android.com/reference/android/database/CursorWindow
      
      This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
      To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
      which prevents any future mmap and write syscalls from succeeding while
      keeping the existing mmap active.
      
      A better way to implement the F_SEAL_FUTURE_WRITE seal was discussed
      [1] last week, one that does not need to modify core VFS structures to
      get the same behavior from the seal.  This avoids several side effects
      pointed out by Andy.  Self-tests are provided in a later patch to
      verify the expected semantics.
      
      [1] https://lore.kernel.org/lkml/20181111173650.GA256781@google.com/
      
      Thanks a lot to Andy for suggestions to improve code.
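      
      A minimal userspace illustration of the intended usage (error handling
      omitted; the fallback define mirrors the uapi value in case the libc
      headers do not yet expose it):
      
      	#define _GNU_SOURCE
      	#include <fcntl.h>
      	#include <sys/mman.h>
      	#include <unistd.h>
      
      	#ifndef F_SEAL_FUTURE_WRITE
      	#define F_SEAL_FUTURE_WRITE 0x0010	/* from linux/fcntl.h */
      	#endif
      
      	int main(void)
      	{
      		int fd = memfd_create("buffer", MFD_ALLOW_SEALING);
      		ftruncate(fd, 4096);
      
      		/* sender maps the region writable before sealing */
      		char *w = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
      			       MAP_SHARED, fd, 0);
      
      		/* forbid any new writable access to the fd */
      		fcntl(fd, F_ADD_SEALS, F_SEAL_FUTURE_WRITE);
      
      		w[0] = 'x';	/* existing writable mapping still works */
      
      		/* from now on, a receiver of fd can only get a read-only
      		 * view: mmap(PROT_WRITE, MAP_SHARED) and write(fd, ...)
      		 * fail with EPERM
      		 */
      		return 0;
      	}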
      
      Link: http://lkml.kernel.org/r/20190112203816.85534-2-joel@joelfernandes.org
      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Acked-by: John Stultz <john.stultz@linaro.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jann Horn <jannh@google.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: J. Bruce Fields <bfields@fieldses.org>
      Cc: Jeff Layton <jlayton@kernel.org>
      Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  20. 02 Mar, 2019 (1 commit)
    • hugetlbfs: fix races and page leaks during migration · cb6acd01
      Committed by Mike Kravetz
      hugetlb pages should only be migrated if they are 'active'.  The
      routines set/clear_page_huge_active() modify the active state of hugetlb
      pages.
      
      When a new hugetlb page is allocated at fault time, set_page_huge_active
      is called before the page is locked.  Therefore, another thread could
      race and migrate the page while it is being added to the page table by the
      fault code.  This race is somewhat hard to trigger, but can be seen by
      strategically adding udelay to simulate worst case scheduling behavior.
      Depending on 'how' the code races, various BUG()s could be triggered.
      
      To address this issue, simply delay the set_page_huge_active call until
      after the page is successfully added to the page table.
      
      Hugetlb pages can also be leaked at migration time if the pages are
      associated with a file in an explicitly mounted hugetlbfs filesystem.
      For example, consider a two node system with 4GB worth of huge pages
      available.  A program mmaps a 2G file in a hugetlbfs filesystem.  It
      then migrates the pages associated with the file from one node to
      another.  When the program exits, huge page counts are as follows:
      
        node0
        1024    free_hugepages
        1024    nr_hugepages
      
        node1
        0       free_hugepages
        1024    nr_hugepages
      
        Filesystem                         Size  Used Avail Use% Mounted on
        nodev                              4.0G  2.0G  2.0G  50% /var/opt/hugepool
      
      That is as expected.  2G of huge pages are taken from the free_hugepages
      counts, and 2G is the size of the file in the explicitly mounted
      filesystem.  If the file is then removed, the counts become:
      
        node0
        1024    free_hugepages
        1024    nr_hugepages
      
        node1
        1024    free_hugepages
        1024    nr_hugepages
      
        Filesystem                         Size  Used Avail Use% Mounted on
        nodev                              4.0G  2.0G  2.0G  50% /var/opt/hugepool
      
      Note that the filesystem still shows 2G of pages used, while there
      actually are no huge pages in use.  The only way to 'fix' the filesystem
      accounting is to unmount the filesystem.
      
      If a hugetlb page is associated with an explicitly mounted filesystem,
      this information is contained in the page_private field.  At migration
      time, this information is not preserved.  To fix, simply transfer
      page_private from old to new page at migration time if necessary.
      
      There is a related race with removing a huge page from a file and
      migration.  When a huge page is removed from the pagecache, the
      page_mapping() field is cleared, yet page_private remains set until the
      page is actually freed by free_huge_page().  A page could be migrated
      while in this state.  However, since page_mapping() is not set the
      hugetlbfs specific routine to transfer page_private is not called and we
      leak the page count in the filesystem.
      
      To fix that, check for this condition before migrating a huge page.  If
      the condition is detected, return EBUSY for the page.
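      
      A sketch of the migration-side fix as it would sit in
      hugetlbfs_migrate_page(), simplified from the description above:
      
      	rc = migrate_huge_page_move_mapping(mapping, newpage, page);
      	if (rc != MIGRATEPAGE_SUCCESS)
      		return rc;
      
      	/*
      	 * page_private carries the hugetlbfs subpool information; transfer
      	 * it to the new page so filesystem accounting is not leaked.
      	 */
      	if (page_private(page)) {
      		set_page_private(newpage, page_private(page));
      		set_page_private(page, 0);
      	}
      
      	migrate_page_copy(newpage, page);
      	return MIGRATEPAGE_SUCCESS;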
      
      Link: http://lkml.kernel.org/r/74510272-7319-7372-9ea6-ec914734c179@oracle.com
      Link: http://lkml.kernel.org/r/20190212221400.3512-1-mike.kravetz@oracle.com
      Fixes: bcc54222 ("mm: hugetlb: introduce page_huge_active")
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: <stable@vger.kernel.org>
      [mike.kravetz@oracle.com: v2]
        Link: http://lkml.kernel.org/r/7534d322-d782-8ac6-1c8d-a8dc380eb3ab@oracle.com
      [mike.kravetz@oracle.com: update comment and changelog]
        Link: http://lkml.kernel.org/r/420bcfd6-158b-38e4-98da-26d0cd85bd01@oracle.com
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  21. 28 Feb, 2019 (1 commit)
  22. 09 Jan, 2019 (1 commit)
  23. 29 Dec, 2018 (1 commit)
    • hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race · c86aa7bb
      Committed by Mike Kravetz
      hugetlbfs page faults can race with truncate and hole punch operations.
      Current code in the page fault path attempts to handle this by 'backing
      out' operations if we encounter the race.  One obvious omission in the
      current code is removing a page newly added to the page cache.  This is
      straightforward to address, but there is a more subtle and
      difficult issue of backing out hugetlb reservations.  To handle this
      correctly, the 'reservation state' before page allocation needs to be
      noted so that it can be properly backed out.  There are four distinct
      possibilities for reservation state: shared/reserved, shared/no-resv,
      private/reserved and private/no-resv.  Backing out a reservation may
      require memory allocation which could fail so that needs to be taken into
      account as well.
      
      Instead of writing the required complicated code for this rare occurrence,
      just eliminate the race.  i_mmap_rwsem is now held in read mode for the
      duration of page fault processing.  Hold i_mmap_rwsem longer in truncation
      and hold punch code to cover the call to remove_inode_hugepages.
      
      With this modification, code in remove_inode_hugepages checking for races
      becomes 'dead' as it can no longer happen.  Remove the dead code and
      expand comments to explain reasoning.  Similarly, checks for races with
      truncation in the page fault path can be simplified and removed.
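      
      A simplified sketch of the truncation side after this change (roughly
      hugetlb_vmtruncate() in fs/hugetlbfs/inode.c; details omitted):
      
      	i_mmap_lock_write(mapping);
      	i_size_write(inode, offset);
      	if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
      		hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0);
      	/* keep i_mmap_rwsem held so faults cannot race with removal */
      	remove_inode_hugepages(inode, offset, LLONG_MAX);
      	i_mmap_unlock_write(mapping);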
      
      [mike.kravetz@oracle.com: incorporate suggestions from Kirill]
        Link: http://lkml.kernel.org/r/20181222223013.22193-3-mike.kravetz@oracle.com
      Link: http://lkml.kernel.org/r/20181218223557.5202-3-mike.kravetz@oracle.com
      Fixes: ebed4bfc ("hugetlb: fix absurd HugePages_Rsvd")
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  24. 23 Aug, 2018 (1 commit)
  25. 27 Jul, 2018 (1 commit)
  26. 12 Jul, 2018 (2 commits)
  27. 06 Apr, 2018 (1 commit)
  28. 23 Mar, 2018 (1 commit)
  29. 01 Feb, 2018 (2 commits)
  30. 30 Nov, 2017 (1 commit)
    • fs/hugetlbfs/inode.c: change put_page/unlock_page order in hugetlbfs_fallocate() · 72639e6d
      Committed by Nadav Amit
      hugetlbfs_fallocate() currently performs put_page() before unlock_page().
      This opens a small time window, from the time the page is added to the
      page cache until it is unlocked, in which the page might be removed
      from the page cache by another core.  If the page is removed during
      this window, it might cause memory corruption, as the wrong page will
      be unlocked.
      
      It is arguable whether this scenario can happen in a real system, and
      there are several mitigating factors.  The issue was found by code
      inspection (actually grep), and not by actually triggering the flow.
      Yet, since putting the page before unlocking is incorrect it should be
      fixed, if only to prevent future breakage or someone copy-pasting this
      code.
      
      Mike said:
       "I am of the opinion that this does not need to be sent to stable.
        Although the ordering is current code is incorrect, there is no way
        for this to be a problem with current locking. In addition, I verified
        that the perhaps bigger issue with sys_fadvise64(POSIX_FADV_DONTNEED)
        for hugetlbfs and other filesystems is addressed in 3a77d214 ("mm:
        fadvise: avoid fadvise for fs without backing device")"
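      
      The corrected ordering in hugetlbfs_fallocate(), as a small sketch:
      
      	/*
      	 * The page was locked by add_to_page_cache() and the extra
      	 * reference came from alloc_huge_page().  Drop the lock before
      	 * the reference; otherwise, if the page were removed from the
      	 * page cache concurrently, put_page() could free it and
      	 * unlock_page() would then touch a freed (or reused) page.
      	 */
      	unlock_page(page);
      	put_page(page);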
      
      Link: http://lkml.kernel.org/r/20170826191124.51642-1-namit@vmware.com
      Fixes: 70c3547e ("hugetlbfs: add hugetlbfs_fallocate()")
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>