1. 15 May 2019, 1 commit
  2. 14 Mar 2019, 1 commit
    • fs/dax: Deposit pagetable even when installing zero page · 11cf9d86
      Committed by Aneesh Kumar K.V
      Architectures like ppc64 use the deposited page table to store hardware
      page table slot information. Make sure we deposit a page table when
      using zero page at the pmd level for hash.
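
      A minimal sketch of the shape of such a fix in the fs/dax.c PMD
      zero-page path (placement illustrative; pgtable_trans_huge_deposit()
      and arch_needs_pgtable_deposit() are the generic helpers involved):

      	pgtable_t pgtable = NULL;

      	if (arch_needs_pgtable_deposit()) {
      		/* pre-allocate a PTE page; ppc64 hash stashes
      		 * hardware slot information in the deposit */
      		pgtable = pte_alloc_one(vma->vm_mm);
      		if (!pgtable)
      			return VM_FAULT_FALLBACK;
      	}
      	...
      	pmd_entry = pmd_mkhuge(mk_pmd(zero_page, vmf->vma->vm_page_prot));
      	if (pgtable) {
      		/* deposit alongside the zero-page PMD, mirroring
      		 * the normal huge-page install path */
      		pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
      		mm_inc_nr_ptes(vma->vm_mm);
      	}
      	set_pmd_at(vma->vm_mm, pmd_addr, vmf->pmd, pmd_entry);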
      
      Without this we hit
      
      Unable to handle kernel paging request for data at address 0x00000000
      Faulting instruction address: 0xc000000000082a74
      Oops: Kernel access of bad area, sig: 11 [#1]
      ....
      
      NIP [c000000000082a74] __hash_page_thp+0x224/0x5b0
      LR [c0000000000829a4] __hash_page_thp+0x154/0x5b0
      Call Trace:
       hash_page_mm+0x43c/0x740
       do_hash_page+0x2c/0x3c
       copy_from_iter_flushcache+0xa4/0x4a0
       pmem_copy_from_iter+0x2c/0x50 [nd_pmem]
       dax_copy_from_iter+0x40/0x70
       dax_iomap_actor+0x134/0x360
       iomap_apply+0xfc/0x1b0
       dax_iomap_rw+0xac/0x130
       ext4_file_write_iter+0x254/0x460 [ext4]
       __vfs_write+0x120/0x1e0
       vfs_write+0xd8/0x220
       SyS_write+0x6c/0x110
       system_call+0x3c/0x130
      
      Fixes: b5beae5e ("powerpc/pseries: Add driver for PAPR SCM regions")
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  3. 02 Mar 2019, 1 commit
  4. 13 Feb 2019, 2 commits
  5. 29 Dec 2018, 1 commit
  6. 22 Dec 2018, 1 commit
  7. 05 Dec 2018, 1 commit
    • dax: Fix unlock mismatch with updated API · 27359fd6
      Committed by Matthew Wilcox
      Internal to dax_unlock_mapping_entry(), dax_unlock_entry() is used to
      store a replacement entry in the Xarray at the given xas-index with the
      DAX_LOCKED bit clear. When called, dax_unlock_entry() expects to be
      given the unlocked value of the entry relative to the current Xarray
      state.
      
      In most contexts dax_unlock_entry() operates in the same scope as the
      matched dax_lock_entry(). However, in the dax_unlock_mapping_entry()
      case the implementation needs to recall the original entry. In the case
      where the original entry is a 'pmd' entry, the pfn used to perform the
      lookup may be misaligned relative to the value retrieved from the
      Xarray.
      
      Change the api to return the unlock cookie from dax_lock_page() and pass
      it to dax_unlock_page(). This fixes a bug where dax_unlock_page()
      assumed the page was PMD-aligned whenever the entry was a PMD entry; the
      bug manifests with signatures like:
      
       WARNING: CPU: 38 PID: 1396 at fs/dax.c:340 dax_insert_entry+0x2b2/0x2d0
       RIP: 0010:dax_insert_entry+0x2b2/0x2d0
       [..]
       Call Trace:
        dax_iomap_pte_fault.isra.41+0x791/0xde0
        ext4_dax_huge_fault+0x16f/0x1f0
        ? up_read+0x1c/0xa0
        __do_fault+0x1f/0x160
        __handle_mm_fault+0x1033/0x1490
        handle_mm_fault+0x18b/0x3d0
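
      The reshaped locking interface, as described above (dax_entry_t is the
      opaque cookie type this change introduces; the body is illustrative):

      	/* dax_lock_page() returns a cookie recording which entry was
      	 * locked; dax_unlock_page() consumes it instead of re-deriving
      	 * the entry (and its size) from the struct page. */
      	dax_entry_t cookie = dax_lock_page(page);

      	if (cookie) {
      		/* ... operate on the locked mapping entry ... */
      		dax_unlock_page(page, cookie);
      	}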
      
      Link: https://lkml.kernel.org/r/20181130154902.GL10377@bombadil.infradead.org
      Fixes: 9f32d221 ("dax: Convert dax_lock_mapping_entry to XArray")
      Reported-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
      Tested-by: Dan Williams <dan.j.williams@intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  8. 29 Nov 2018, 2 commits
  9. 19 Nov 2018, 1 commit
  10. 18 Nov 2018, 2 commits
    • dax: Fix huge page faults · 0e40de03
      Committed by Matthew Wilcox
      Using xas_load() with a PMD-sized xa_state would work if either a
      PMD-sized entry was present or a PTE-sized entry was present in the
      first 64 entries (of the 512 PTEs in a PMD on x86).  If there was no
      PTE in the first 64 entries, grab_mapping_entry() would believe there
      were no entries present, allocate a PMD-sized entry and overwrite the
      PTE in the page cache.
      
      Use xas_find_conflict() instead, which turns out to simplify
      both get_unlocked_entry() and grab_mapping_entry().  Also remove a
      WARN_ON_ONCE from grab_mapping_entry() as it will have already triggered
      in get_unlocked_entry().
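
      A rough sketch of the resulting lookup helper, assuming the fs/dax.c
      internals named above (the waiting logic is elided):

      	static void *get_unlocked_entry(struct xa_state *xas)
      	{
      		void *entry;

      		for (;;) {
      			/* unlike xas_load(), xas_find_conflict() also
      			 * returns an entry of a different order that
      			 * overlaps the requested range */
      			entry = xas_find_conflict(xas);
      			if (!entry || WARN_ON_ONCE(!xa_is_value(entry)) ||
      					!dax_is_locked(entry))
      				return entry;
      			/* ... wait for the lock holder, then retry ... */
      		}
      	}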
      
      Fixes: cfc93c6c ("dax: Convert dax_insert_pfn_mkwrite to XArray")
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
    • dax: Fix dax_unlock_mapping_entry for PMD pages · fda490d3
      Committed by Matthew Wilcox
      Device DAX PMD pages do not set the PageHead bit for compound pages.
      Fix for now by retrieving the PMD bit from the entry, but eventually we
      will be passed the page size by the caller.
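
      Sketch of the interim fix, assuming fs/dax.c's entry helpers (context
      abbreviated):

      	/* device-DAX compound pages lack PageHead, so recover the
      	 * size from the Xarray entry rather than from page flags */
      	entry = xa_load(&mapping->i_pages, page->index);
      	entry = dax_make_entry(page_to_pfn_t(page),
      			dax_is_pmd_entry(entry) ? DAX_PMD : 0);
      	dax_unlock_entry(&xas, entry);
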
      Reported-by: Dan Williams <dan.j.williams@intel.com>
      Fixes: 9f32d221 ("dax: Convert dax_lock_mapping_entry to XArray")
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
  11. 17 Nov 2018, 3 commits
  12. 21 Oct 2018, 8 commits
  13. 09 Oct 2018, 1 commit
    • filesystem-dax: Fix dax_layout_busy_page() livelock · d7782145
      Committed by Dan Williams
      In the presence of multi-order entries the typical
      pagevec_lookup_entries() pattern may loop forever:
      
      	while (index < end && pagevec_lookup_entries(&pvec, mapping, index,
      				min(end - index, (pgoff_t)PAGEVEC_SIZE),
      				indices)) {
      		...
      		for (i = 0; i < pagevec_count(&pvec); i++) {
      			index = indices[i];
      			...
      		}
      		index++; /* BUG */
      	}
      
      The loop updates 'index' for each index found and then increments to the
      next possible page to continue the lookup. However, if the last entry in
      the pagevec is multi-order then the next possible page index is more
      than 1 page away. Fix this locally for the filesystem-dax case by
      checking for dax-multi-order entries. Going forward new users of
      multi-order entries need to be similarly careful, or we need a generic
      way to report the page increment in the radix iterator.
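
      A sketch of that local fix (dax_radix_order() reads the order stored in
      a dax entry; only the tail-of-pagevec case needs care):

      	pgoff_t nr_pages = 1;

      	for (i = 0; i < pagevec_count(&pvec); i++) {
      		...
      		/* if the last entry in the pagevec is multi-order,
      		 * step over everything it covers, not one page */
      		if (i + 1 >= pagevec_count(&pvec))
      			nr_pages = 1UL << dax_radix_order(entry);
      		...
      	}
      	index += nr_pages;	/* was: index++ */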
      
      Fixes: 5fac7408 ("mm, fs, dax: handle layout changes to pinned dax mappings")
      Cc: <stable@vger.kernel.org>
      Cc: Ross Zwisler <zwisler@kernel.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  14. 30 Sep 2018, 1 commit
    • xarray: Replace exceptional entries · 3159f943
      Committed by Matthew Wilcox
      Introduce xarray value entries and tagged pointers to replace radix
      tree exceptional entries.  This is a slight change in encoding to allow
      the use of an extra bit (we can now store BITS_PER_LONG - 1 bits in a
      value entry).  It is also a change in emphasis; exceptional entries are
      intimidating and different.  As the comment explains, you can choose
      to store values or pointers in the xarray and they are both first-class
      citizens.
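
      For illustration, the value-entry helpers this introduces:

      	#include <linux/xarray.h>

      	/* store a small integer directly in an array slot, tagged so
      	 * it can never be mistaken for a pointer */
      	void *entry = xa_mk_value(42);

      	if (xa_is_value(entry))
      		pr_debug("stored %lu\n", xa_to_value(entry)); /* 42 */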
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
      Reviewed-by: Josef Bacik <jbacik@fb.com>
  15. 28 Sep 2018, 1 commit
  16. 12 Sep 2018, 1 commit
  17. 31 Jul 2018, 1 commit
  18. 30 Jul 2018, 1 commit
  19. 24 Jul 2018, 1 commit
    • filesystem-dax: Introduce dax_lock_mapping_entry() · c2a7d2a1
      Committed by Dan Williams
      In preparation for implementing support for memory poison (media error)
      handling via dax mappings, implement a lock_page() equivalent. Poison
      error handling requires rmap and needs guarantees that the page->mapping
      association is maintained / valid (inode not freed) for the duration of
      the lookup.
      
      In the device-dax case it is sufficient to simply hold a dev_pagemap
      reference. In the filesystem-dax case we need to use the entry lock.
      
      Export the entry lock via dax_lock_mapping_entry() that uses
      rcu_read_lock() to protect against the inode being freed, and
      revalidates the page->mapping association under xa_lock().
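
      A condensed sketch of that scheme (the entry-lock wait and failure
      paths are elided):

      	bool dax_lock_mapping_entry(struct page *page)
      	{
      		struct address_space *mapping;

      		rcu_read_lock();	/* inode can't be freed under us */
      		for (;;) {
      			mapping = READ_ONCE(page->mapping);
      			if (!mapping || !dax_mapping(mapping))
      				break;
      			xa_lock_irq(&mapping->i_pages);
      			if (mapping != page->mapping) {
      				/* raced with truncate; look up again */
      				xa_unlock_irq(&mapping->i_pages);
      				continue;
      			}
      			/* ... lock the entry for page->index ... */
      			xa_unlock_irq(&mapping->i_pages);
      			break;
      		}
      		rcu_read_unlock();
      		...
      	}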
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Dave Jiang <dave.jiang@intel.com>
  20. 21 Jul 2018, 1 commit
    • filesystem-dax: Set page->index · 73449daf
      Committed by Dan Williams
      In support of enabling memory_failure() handling for filesystem-dax
      mappings, set ->index to the pgoff of the page. The rmap implementation
      requires ->index to bound the search through the vma interval tree. The
      index is set and cleared at dax_associate_entry() and
      dax_disassociate_entry() time respectively.
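
      Sketch of the association step (for_each_mapped_pfn() walks the pfns
      covered by a dax entry; details abbreviated):

      	static void dax_associate_entry(void *entry,
      			struct address_space *mapping,
      			struct vm_area_struct *vma, unsigned long address)
      	{
      		unsigned long size = dax_entry_size(entry), pfn, index;
      		int i = 0;

      		/* pgoff of the start of the (possibly huge) mapping */
      		index = linear_page_index(vma, address & ~(size - 1));
      		for_each_mapped_pfn(entry, pfn) {
      			struct page *page = pfn_to_page(pfn);

      			page->mapping = mapping;
      			page->index = index + i++; /* bounds rmap search */
      		}
      	}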
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Dave Jiang <dave.jiang@intel.com>
  21. 08 Jun 2018, 1 commit
  22. 03 Jun 2018, 1 commit
  23. 23 May 2018, 2 commits
    • dax: Report bytes remaining in dax_iomap_actor() · a77d4786
      Committed by Dan Williams
      In preparation for protecting the dax read(2) path from media errors
      with copy_to_iter_mcsafe() (via dax_copy_to_iter()), convert the
      implementation to report the bytes successfully transferred.
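
      Sketch of the reworked accounting inside dax_iomap_actor() (a short
      copy now ends the loop with partial progress reported rather than
      turning the whole call into an error):

      	xfer = dax_copy_to_iter(dax_dev, pgoff, kaddr, map_len, iter);

      	pos += xfer;
      	length -= xfer;
      	done += xfer;

      	if (xfer == 0)
      		ret = -EFAULT;	/* nothing transferred at all */
      	if (xfer < map_len)
      		break;		/* report the bytes that did complete */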
      
      Cc: <x86@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • dax: Introduce a ->copy_to_iter dax operation · b3a9a0c3
      Committed by Dan Williams
      Similar to the ->copy_from_iter() operation, a platform may want to
      deploy an architecture or device specific routine for handling reads
      from a dax_device like /dev/pmemX. On x86 this routine will point to a
      machine check safe version of copy_to_iter(). For now, add the plumbing
      to device-mapper and the dax core.
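
      The plumbing reduces to a read-side twin of the existing hook in
      dax_operations, roughly:

      	/* include/linux/dax.h (abridged) */
      	struct dax_operations {
      		...
      		/* copy_from_iter: required operation for writes */
      		size_t (*copy_from_iter)(struct dax_device *, pgoff_t,
      				void *, size_t, struct iov_iter *);
      		/* copy_to_iter: required operation for reads */
      		size_t (*copy_to_iter)(struct dax_device *, pgoff_t,
      				void *, size_t, struct iov_iter *);
      	};

      	size_t dax_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff,
      			void *addr, size_t bytes, struct iov_iter *i);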
      
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  24. 22 May 2018, 1 commit
    • mm, fs, dax: handle layout changes to pinned dax mappings · 5fac7408
      Committed by Dan Williams
      Background:
      
      get_user_pages() in the filesystem pins file-backed memory pages for
      access by devices performing dma. However, it only pins the memory pages,
      not the page-to-file offset association. If a file is truncated the
      pages are mapped out of the file and dma may continue indefinitely into
      a page that is owned by a device driver. This breaks coherency of the
      file vs dma, but the assumption is that if userspace wants the
      file-space truncated it does not matter what data is inbound from the
      device, it is not relevant anymore. The only expectation is that dma can
      safely continue while the filesystem reallocates the block(s).
      
      Problem:
      
      This expectation that dma can safely continue while the filesystem
      changes the block map is broken by dax. With dax the target dma page
      *is* the filesystem block. The model of leaving the page pinned for dma,
      but truncating the file block out of the file, means that the filesystem
      is free to reallocate a block under active dma to another file and now
      the expected data-incoherency situation has turned into active
      data-corruption.
      
      Solution:
      
      Defer all filesystem operations (fallocate(), truncate()) on a dax mode
      file while any page/block in the file is under active dma. This solution
      assumes that dma is transient. Cases where dma operations are known to
      not be transient, like RDMA, have been explicitly disabled via
      commits like 5f1d43de "IB/core: disable memory registration of
      filesystem-dax vmas".
      
      The dax_layout_busy_page() routine is called by filesystems with a lock
      held against mm faults (i_mmap_lock) to find pinned / busy dax pages.
      The process of looking up a busy page invalidates all mappings
      to trigger any subsequent get_user_pages() to block on i_mmap_lock.
      The filesystem continues to call dax_layout_busy_page() until it finally
      returns no more active pages. This approach assumes that the page
      pinning is transient; if that assumption is violated, the system would
      likely have hung from the uncompleted I/O.
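
      A sketch of how a filesystem drives that loop, modeled loosely on the
      xfs break-layouts pattern (wait_for_pin() is a hypothetical stand-in
      for waiting on the page's reference count to drop to idle):

      	/* caller holds i_mmap_lock against new page faults */
      	for (;;) {
      		struct page *page = dax_layout_busy_page(inode->i_mapping);

      		if (!page)
      			break;	/* no pinned dax pages remain */
      		error = wait_for_pin(page);	/* hypothetical helper */
      		if (error)
      			break;
      	}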
      
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Reported-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  25. 17 Apr 2018, 1 commit
  26. 12 Apr 2018, 1 commit
  27. 03 Apr 2018, 1 commit
    • fs, dax: use page->mapping to warn if truncate collides with a busy page · d2c997c0
      Committed by Dan Williams
      Catch cases where extent unmap operations encounter pages that are
      pinned / busy. Typically these are pinned pages under active dma.
      This warning is a canary for potential data corruption as truncated
      blocks could be allocated to a new file while the device is still
      performing i/o.
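
      Sketch of the check at dax_disassociate_entry() time ('trunc' is true
      when called from the truncate path; an elevated page reference count is
      the sign of a busy page):

      	static void dax_disassociate_entry(void *entry,
      			struct address_space *mapping, bool trunc)
      	{
      		unsigned long pfn;

      		for_each_mapped_pfn(entry, pfn) {
      			struct page *page = pfn_to_page(pfn);

      			/* truncating a page someone still holds is the
      			 * canary for potential data corruption */
      			WARN_ON_ONCE(trunc && page_ref_count(page) > 1);
      			WARN_ON_ONCE(page->mapping && page->mapping != mapping);
      			page->mapping = NULL;
      		}
      	}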
      
      Here is an example of a collision that this implementation catches:
      
       WARNING: CPU: 2 PID: 1286 at fs/dax.c:343 dax_disassociate_entry+0x55/0x80
       [..]
       Call Trace:
        __dax_invalidate_mapping_entry+0x6c/0xf0
        dax_delete_mapping_entry+0xf/0x20
        truncate_exceptional_pvec_entries.part.12+0x1af/0x200
        truncate_inode_pages_range+0x268/0x970
        ? tlb_gather_mmu+0x10/0x20
        ? up_write+0x1c/0x40
        ? unmap_mapping_range+0x73/0x140
        xfs_free_file_space+0x1b6/0x5b0 [xfs]
        ? xfs_file_fallocate+0x7f/0x320 [xfs]
        ? down_write_nested+0x40/0x70
        ? xfs_ilock+0x21d/0x2f0 [xfs]
        xfs_file_fallocate+0x162/0x320 [xfs]
        ? rcu_read_lock_sched_held+0x3f/0x70
        ? rcu_sync_lockdep_assert+0x2a/0x50
        ? __sb_start_write+0xd0/0x1b0
        ? vfs_fallocate+0x20c/0x270
        vfs_fallocate+0x154/0x270
        SyS_fallocate+0x43/0x80
        entry_SYSCALL_64_fastpath+0x1f/0x96
      
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>