1. 08 Apr 2020 (18 commits)
  2. 07 Apr 2020 (1 commit)
  3. 06 Apr 2020 (2 commits)
  4. 04 Apr 2020 (11 commits)
  5. 03 Apr 2020 (8 commits)
    • dax,iomap: Add helper dax_iomap_zero() to zero a range · 4f3b4f16
      Authored by Vivek Goyal
      Add a helper dax_iomap_zero() to zero a range. This patch basically
      merges __dax_zero_page_range() and iomap_dax_zero().
      Suggested-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20200228163456.1587-7-vgoyal@redhat.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • dax: Use new dax zero page method for zeroing a page · 0a23f9ff
      Authored by Vivek Goyal
      Use the new dax native zero-page method for zeroing a page if the I/O
      is page aligned.  Otherwise fall back to direct_access() + memcpy().
      
      This removes one of the dependencies on the block device in the dax path.
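      A minimal userspace sketch of that decision pattern follows.  The names
      zero_page_fast() and dax_zero_range() are illustrative stand-ins and not
      the kernel's dax API:

        #include <stdint.h>
        #include <stdio.h>
        #include <string.h>

        #define PAGE_SIZE 4096u

        /* Stand-in for a page-granular "native" zeroing primitive. */
        static void zero_page_fast(uint8_t *page_addr)
        {
            memset(page_addr, 0, PAGE_SIZE);
        }

        /*
         * Zero 'len' bytes at offset 'off' inside a directly mapped region
         * 'base': take the page-granular fast path when the request is exactly
         * one aligned page, otherwise fall back to zeroing the mapped bytes
         * directly (the analogue of direct_access() + copying zeroes).
         */
        static void dax_zero_range(uint8_t *base, size_t off, size_t len)
        {
            if ((off % PAGE_SIZE) == 0 && len == PAGE_SIZE)
                zero_page_fast(base + off);
            else
                memset(base + off, 0, len);
        }

        int main(void)
        {
            static uint8_t region[2 * PAGE_SIZE];

            dax_zero_range(region, 0, PAGE_SIZE);   /* aligned: fast path */
            dax_zero_range(region, PAGE_SIZE, 512); /* partial: fallback  */
            printf("%d %d\n", region[0], region[PAGE_SIZE]);
            return 0;
        }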
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Link: https://lore.kernel.org/r/20200228163456.1587-6-vgoyal@redhat.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • NFS: Add a module parameter to set nfs_mountpoint_expiry_timeout · f30a6ea0
      Authored by Trond Myklebust
      Setting nfs_mountpoint_expiry_timeout to a negative value stops
      mountpoint expiration, while setting it to a positive value restarts
      the scheduler.
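      The control flow implied by that rule looks roughly like the sketch
      below.  This is a hedged illustration with made-up helper names, not the
      NFS implementation (the real code arms and cancels delayed work):

        #include <stdio.h>

        /* Illustrative stubs standing in for the expiry worker machinery. */
        static void cancel_expiry_work(void)
        {
            printf("mountpoint expiration stopped\n");
        }

        static void schedule_expiry_work(long timeout_seconds)
        {
            printf("expiry rescheduled, timeout %ld seconds\n", timeout_seconds);
        }

        /* The sign of the new parameter value selects the behaviour. */
        static void set_mountpoint_expiry_timeout(long timeout_seconds)
        {
            if (timeout_seconds < 0)
                cancel_expiry_work();
            else if (timeout_seconds > 0)
                schedule_expiry_work(timeout_seconds);
        }

        int main(void)
        {
            set_mountpoint_expiry_timeout(-1);   /* stop expiration    */
            set_mountpoint_expiry_timeout(500);  /* restart the timer  */
            return 0;
        }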
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    • NFS: finish_automount() requires us to hold 2 refs to the mount record · 75da9858
      Authored by Trond Myklebust
      We must not return from nfs_d_automount() without holding 2 references
      to the mount record.  Doing so will trigger the BUG() in finish_automount().
      Also ensure that we don't try to reschedule the automount timer with
      a negative or zero timeout value.
      
      Fixes: 22a1ae9a ("NFS: If nfs_mountpoint_expiry_timeout < 0, do not expire submounts")
      Cc: stable@vger.kernel.org # v5.5+
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    • NFS: Fix a few constant_table array definitions · 529af905
      Authored by Scott Mayhew
      nfs_vers_tokens, nfs_xprt_protocol_tokens, and nfs_secflavor_tokens were
      all missing an empty item at the end of the array, allowing
      lookup_constant() to potentially walk off the end and trigger an oops.
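      The pattern involved is a sentinel-terminated table: the lookup walks
      entries until it reaches an empty terminating element, so a table defined
      without that final item lets the walk run past the end of the array when
      the name is not found.  A self-contained sketch of the idiom, modeled on
      the fs_parser constant_table shape (the table contents are illustrative):

        #include <stdio.h>
        #include <string.h>

        struct constant_table {
            const char *name;
            int         value;
        };

        /* Walks until the terminating entry whose .name is NULL. */
        static int lookup_constant(const struct constant_table *tbl,
                                   const char *name, int not_found)
        {
            for (; tbl->name; tbl++)
                if (strcmp(tbl->name, name) == 0)
                    return tbl->value;
            return not_found;
        }

        static const struct constant_table vers_tokens_example[] = {
            { "3",   3  },
            { "4",   4  },
            { "4.1", 41 },
            {}  /* required terminator: without it, a failed lookup keeps
                 * reading past the array, which is the reported oops */
        };

        int main(void)
        {
            printf("%d\n", lookup_constant(vers_tokens_example, "4", -1));
            printf("%d\n", lookup_constant(vers_tokens_example, "4.2", -1));
            return 0;
        }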
      Reported-by: Olga Kornievskaia <aglo@umich.edu>
      Signed-off-by: Scott Mayhew <smayhew@redhat.com>
      Fixes: e38bb238 ("NFS: Convert mount option parsing to use functionality from fs_parser.h")
      Cc: stable@vger.kernel.org # v5.6
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    • hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race · 87bf91d3
      Authored by Mike Kravetz
      hugetlbfs page faults can race with truncate and hole punch operations.
      Current code in the page fault path attempts to handle this by 'backing
      out' operations if we encounter the race.  One obvious omission in the
      current code is removing a page newly added to the page cache.  This is
      pretty straightforward to address, but there is a more subtle and
      difficult issue of backing out hugetlb reservations.  To handle this
      correctly, the 'reservation state' before page allocation needs to be
      noted so that it can be properly backed out.  There are four distinct
      possibilities for reservation state: shared/reserved, shared/no-resv,
      private/reserved and private/no-resv.  Backing out a reservation may
      require memory allocation which could fail so that needs to be taken
      into account as well.
      
      Instead of writing the required complicated code for this rare
      occurrence, just eliminate the race.  i_mmap_rwsem is now held in read
      mode for the duration of page fault processing.  Hold i_mmap_rwsem in
      write mode when modifying i_size.  In this way, truncation cannot
      proceed when page faults are being processed.  In addition, i_size
      will not change during fault processing so a single check can be made
      to ensure faults are not beyond (proposed) end of file.  Faults can
      still race with hole punch, but that race is handled by existing code
      and the use of hugetlb_fault_mutex.
      
      With this modification, checks for races with truncation in the page
      fault path can be simplified and removed.  remove_inode_hugepages no
      longer needs to take hugetlb_fault_mutex in the case of truncation.
      Comments are expanded to explain reasoning behind locking.
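      
      The rwsem usage can be pictured with a small userspace analogy, with a
      pthread rwlock standing in for i_mmap_rwsem; the names and simplifications
      here are mine, not the kernel code.  Fault processing takes the lock
      shared, truncation takes it exclusive around the i_size update, so a fault
      either completes against the old size or starts after the new size is
      visible:

        #include <pthread.h>
        #include <stdio.h>

        static pthread_rwlock_t i_mmap_rwsem = PTHREAD_RWLOCK_INITIALIZER;
        static long i_size = 8 * 4096;   /* file size in bytes (illustrative) */

        /* Fault path: hold the lock in read mode for the whole fault. */
        static int handle_fault(long fault_offset)
        {
            int within_file;

            pthread_rwlock_rdlock(&i_mmap_rwsem);
            /* i_size cannot change while the read lock is held, so a single
             * check against (proposed) end of file is sufficient.
             * (Page allocation and pte setup elided.) */
            within_file = fault_offset < i_size;
            pthread_rwlock_unlock(&i_mmap_rwsem);
            return within_file;
        }

        /* Truncate path: take the lock in write mode while changing i_size. */
        static void truncate_file(long new_size)
        {
            pthread_rwlock_wrlock(&i_mmap_rwsem);
            i_size = new_size;
            /* (Unmapping and removing pages beyond new_size elided.) */
            pthread_rwlock_unlock(&i_mmap_rwsem);
        }

        int main(void)
        {
            printf("fault at 4096 before truncate: %d\n", handle_fault(4096));
            truncate_file(0);
            printf("fault at 4096 after truncate:  %d\n", handle_fault(4096));
            return 0;
        }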
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
      Link: http://lkml.kernel.org/r/20200316205756.146666-3-mike.kravetz@oracle.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization · c0d0381a
      Authored by Mike Kravetz
      Patch series "hugetlbfs: use i_mmap_rwsem for more synchronization", v2.
      
      While discussing the issue with huge_pte_offset [1], I remembered that
      there were more outstanding hugetlb races.  These issues are:
      
      1) For shared pmds, huge PTE pointers returned by huge_pte_alloc can become
         invalid via a call to huge_pmd_unshare by another thread.
      2) hugetlbfs page faults can race with truncation causing invalid global
         reserve counts and state.
      
      A previous attempt was made to use i_mmap_rwsem in this manner as
      described at [2].  However, those patches were reverted starting with [3]
      due to locking issues.
      
      To effectively use i_mmap_rwsem to address the above issues it needs to be
      held (in read mode) during page fault processing.  However, during fault
      processing we need to lock the page we will be adding.  Lock ordering
      requires we take page lock before i_mmap_rwsem.  Waiting until after
      taking the page lock is too late in the fault process for the
      synchronization we want to do.
      
      To address this lock ordering issue, the following patches change the lock
      ordering for hugetlb pages.  This is not too invasive as hugetlbfs
      processing is done separately from core mm in many places.  However, I don't
      really like this idea.  Much ugliness is contained in the new routine
      hugetlb_page_mapping_lock_write() of patch 1.
      
      The only other way I can think of to address these issues is by catching
      all the races.  After catching a race, cleanup, backout, retry ...  etc,
      as needed.  This can get really ugly, especially for huge page
      reservations.  At one time, I started writing some of the reservation
      backout code for page faults and it got so ugly and complicated I went
      down the path of adding synchronization to avoid the races.  Any other
      suggestions would be welcome.
      
      [1] https://lore.kernel.org/linux-mm/1582342427-230392-1-git-send-email-longpeng2@huawei.com/
      [2] https://lore.kernel.org/linux-mm/20181222223013.22193-1-mike.kravetz@oracle.com/
      [3] https://lore.kernel.org/linux-mm/20190103235452.29335-1-mike.kravetz@oracle.com
      [4] https://lore.kernel.org/linux-mm/1584028670.7365.182.camel@lca.pw/
      [5] https://lore.kernel.org/lkml/20200312183142.108df9ac@canb.auug.org.au/
      
      This patch (of 2):
      
      While looking at BUGs associated with invalid huge page map counts, it was
      discovered and observed that a huge pte pointer could become 'invalid' and
      point to another task's page table.  Consider the following:
      
      A task takes a page fault on a shared hugetlbfs file and calls
      huge_pte_alloc to get a ptep.  Suppose the returned ptep points to a
      shared pmd.
      
      Now, another task truncates the hugetlbfs file.  As part of truncation, it
      unmaps everyone who has the file mapped.  If the range being truncated is
      covered by a shared pmd, huge_pmd_unshare will be called.  For all but the
      last user of the shared pmd, huge_pmd_unshare will clear the pud pointing
      to the pmd.  If the task in the middle of the page fault is not the last
      user, the ptep returned by huge_pte_alloc now points to another task's
      page table or worse.  This leads to bad things such as incorrect page
      map/reference counts or invalid memory references.
      
      To fix, expand the use of i_mmap_rwsem as follows:
      - i_mmap_rwsem is held in read mode whenever huge_pmd_share is called.
        huge_pmd_share is only called via huge_pte_alloc, so callers of
        huge_pte_alloc take i_mmap_rwsem before calling.  In addition, callers
        of huge_pte_alloc continue to hold the semaphore until finished with
        the ptep.
      - i_mmap_rwsem is held in write mode whenever huge_pmd_unshare is called.
      
      One problem with this scheme is that it requires taking i_mmap_rwsem
      before taking the page lock during page faults.  This is not the order
      specified in the rest of mm code.  Handling of hugetlbfs pages is mostly
      isolated today.  Therefore, we use this alternative locking order for
      PageHuge() pages.
      
               mapping->i_mmap_rwsem
                 hugetlb_fault_mutex (hugetlbfs specific page fault mutex)
                   page->flags PG_locked (lock_page)
      
      To help with lock ordering issues, hugetlb_page_mapping_lock_write() is
      introduced to write lock the i_mmap_rwsem associated with a page.
      
      In most cases it is easy to get address_space via vma->vm_file->f_mapping.
      However, in the case of migration or memory errors for anon pages we do
      not have an associated vma.  A new routine _get_hugetlb_page_mapping()
      will use anon_vma to get address_space in these cases.
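      
      The lock ordering documented above means every path that needs more than
      one of these locks must acquire them in the same sequence.  A minimal
      sketch of that discipline follows, using plain pthread primitives as
      stand-ins; it is an illustration of the ordering, not the kernel
      implementation:

        #include <pthread.h>

        /* Stand-ins for the three levels, in the documented order. */
        static pthread_rwlock_t i_mmap_rwsem        = PTHREAD_RWLOCK_INITIALIZER;
        static pthread_mutex_t  hugetlb_fault_mutex = PTHREAD_MUTEX_INITIALIZER;
        static pthread_mutex_t  page_lock           = PTHREAD_MUTEX_INITIALIZER;

        /* Fault path: i_mmap_rwsem (read) -> hugetlb_fault_mutex -> page lock.
         * Holding i_mmap_rwsem across the fault keeps a shared-pmd ptep from
         * being unshared underneath us. */
        static void fault_path(void)
        {
            pthread_rwlock_rdlock(&i_mmap_rwsem);
            pthread_mutex_lock(&hugetlb_fault_mutex);
            pthread_mutex_lock(&page_lock);
            /* (Fault handling elided.) */
            pthread_mutex_unlock(&page_lock);
            pthread_mutex_unlock(&hugetlb_fault_mutex);
            pthread_rwlock_unlock(&i_mmap_rwsem);
        }

        /* Unshare path: exclusive i_mmap_rwsem, so no fault holds it shared. */
        static void unshare_path(void)
        {
            pthread_rwlock_wrlock(&i_mmap_rwsem);
            /* (huge_pmd_unshare analogue elided.) */
            pthread_rwlock_unlock(&i_mmap_rwsem);
        }

        int main(void)
        {
            fault_path();
            unshare_path();
            return 0;
        }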
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
      Link: http://lkml.kernel.org/r/20200316205756.146666-2-mike.kravetz@oracle.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/userfaultfd: honor FAULT_FLAG_KILLABLE in fault path · 3e69ad08
      Authored by Peter Xu
      The userfaultfd fault path was killable by default even if the caller did
      not set FAULT_FLAG_KILLABLE.  That made sense previously, because the gup
      code did not set FAULT_FLAG_KILLABLE properly.  Now that the previous
      patch applies FAULT_FLAG_KILLABLE to the gup code as well, userfaultfd
      should honor FAULT_FLAG_KILLABLE too.
      
      Because FAULT_FLAG_KILLABLE is currently set unconditionally in the gup
      code, this patch should have no functional change.  It also cleans up
      the code a little by introducing some helpers.
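      The shape of the helper-based logic is roughly the following.  This is an
      illustrative sketch with invented flag and state values; the real kernel
      flag bits, task states, and helper name differ.  The point is that the
      blocking state used while waiting on userfaultfd is derived from the
      caller's fault flags instead of being hard-coded to killable:

        #include <stdio.h>

        /* Illustrative flag values, not the kernel's definitions. */
        #define FAULT_FLAG_INTERRUPTIBLE  0x1
        #define FAULT_FLAG_KILLABLE       0x2

        enum blocking_state {
            TASK_UNINTERRUPTIBLE,   /* wait cannot be interrupted at all   */
            TASK_KILLABLE,          /* wait can be ended by a fatal signal */
            TASK_INTERRUPTIBLE,     /* wait can be ended by any signal     */
        };

        /* Pick the wait state from the caller's fault flags rather than
         * assuming a killable wait unconditionally. */
        static enum blocking_state blocking_state_for(unsigned int flags)
        {
            if (flags & FAULT_FLAG_INTERRUPTIBLE)
                return TASK_INTERRUPTIBLE;
            if (flags & FAULT_FLAG_KILLABLE)
                return TASK_KILLABLE;
            return TASK_UNINTERRUPTIBLE;
        }

        int main(void)
        {
            printf("killable caller:     %d\n",
                   blocking_state_for(FAULT_FLAG_KILLABLE));
            printf("non-killable caller: %d\n", blocking_state_for(0));
            return 0;
        }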
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Brian Geffon <bgeffon@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
      Cc: Martin Cracauer <cracauer@cons.org>
      Cc: Marty McFadden <mcfadden8@llnl.gov>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Maya Gokhale <gokhale2@llnl.gov>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Link: http://lkml.kernel.org/r/20200220160300.9941-1-peterx@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>