1. 16 6月, 2017 3 次提交
    • D
      x86, dax, libnvdimm: remove wb_cache_pmem() indirection · 4e4f00a9
      Dan Williams 提交于
      With all handling of the CONFIG_ARCH_HAS_PMEM_API case being moved to
      libnvdimm and the pmem driver directly we do not need to provide global
      wrappers and fallbacks in the CONFIG_ARCH_HAS_PMEM_API=n case. The pmem
      driver will simply not link to arch_wb_cache_pmem() in that case.  Same
      as before, pmem flushing is only defined for x86_64, via
      clean_cache_range(), but it is straightforward to add other archs in the
      future.
      
      arch_wb_cache_pmem() is an exported function since the pmem module needs
      to find it, but it is privately declared in drivers/nvdimm/pmem.h because
      there are no consumers outside of the pmem driver.
      
      Cc: <x86@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Oliver O'Halloran <oohall@gmail.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Suggested-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      4e4f00a9
    • D
      x86, dax: replace clear_pmem() with open coded memset + dax_ops->flush · 81f55870
      Dan Williams 提交于
      The clear_pmem() helper simply combines a memset() plus a cache flush.
      Now that the flush routine is optionally provided by the dax device
      driver we can avoid unnecessary cache management on dax devices fronting
      volatile memory.
      
      With clear_pmem() gone we can follow on with a patch to make pmem cache
      management completely defined within the pmem driver.
      
      Cc: <x86@kernel.org>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      81f55870
    • D
      filesystem-dax: convert to dax_copy_from_iter() · fec53774
      Dan Williams 提交于
      Now that all possible providers of the dax_operations copy_from_iter
      method are implemented, switch filesytem-dax to call the driver rather
      than copy_to_iter_pmem.
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      fec53774
  2. 10 5月, 2017 1 次提交
  3. 26 4月, 2017 1 次提交
    • D
      x86, dax, pmem: remove indirection around memcpy_from_pmem() · 6abccd1b
      Dan Williams 提交于
      memcpy_from_pmem() maps directly to memcpy_mcsafe(). The wrapper
      serves no real benefit aside from affording a more generic function name
      than the x86-specific 'mcsafe'. However this would not be the first time
      that x86 terminology leaked into the global namespace. For lack of
      better name, just use memcpy_mcsafe() directly.
      
      This conversion also catches a place where we should have been using
      plain memcpy, acpi_nfit_blk_single_io().
      
      Cc: <x86@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Acked-by: NTony Luck <tony.luck@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      6abccd1b
  4. 13 4月, 2017 1 次提交
    • D
      x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions · 11e63f6d
      Dan Williams 提交于
      Before we rework the "pmem api" to stop abusing __copy_user_nocache()
      for memcpy_to_pmem() we need to fix cases where we may strand dirty data
      in the cpu cache. The problem occurs when copy_from_iter_pmem() is used
      for arbitrary data transfers from userspace. There is no guarantee that
      these transfers, performed by dax_iomap_actor(), will have aligned
      destinations or aligned transfer lengths. Backstop the usage
      __copy_user_nocache() with explicit cache management in these unaligned
      cases.
      
      Yes, copy_from_iter_pmem() is now too big for an inline, but addressing
      that is saved for a later patch that moves the entirety of the "pmem
      api" into the pmem driver directly.
      
      Fixes: 5de490da ("pmem: add copy_from_iter_pmem() and clear_pmem()")
      Cc: <stable@vger.kernel.org>
      Cc: <x86@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      11e63f6d
  5. 05 9月, 2016 1 次提交
  6. 13 7月, 2016 2 次提交
  7. 29 3月, 2016 1 次提交
  8. 10 3月, 2016 1 次提交
  9. 23 1月, 2016 1 次提交
    • R
      pmem: add wb_cache_pmem() to the PMEM API · 3f4a2670
      Ross Zwisler 提交于
      __arch_wb_cache_pmem() was already an internal implementation detail of
      the x86 PMEM API, but this functionality needs to be exported as part of
      the general PMEM API to handle the fsync/msync case for DAX mmaps.
      
      One thing worth noting is that we really do want this to be part of the
      PMEM API as opposed to a stand-alone function like clflush_cache_range()
      because of ordering restrictions.  By having wb_cache_pmem() as part of
      the PMEM API we can leave it unordered, call it multiple times to write
      back large amounts of memory, and then order the multiple calls with a
      single wmb_pmem().
      Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jeff Layton <jlayton@poochiereds.net>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3f4a2670
  10. 16 1月, 2016 1 次提交
    • D
      pmem, dax: clean up clear_pmem() · 52db400f
      Dan Williams 提交于
      To date, we have implemented two I/O usage models for persistent memory,
      PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
      userspace).  This series adds a third, DAX-GUP, that allows DAX mappings
      to be the target of direct-i/o.  It allows userspace to coordinate
      DMA/RDMA from/to persistent memory.
      
      The implementation leverages the ZONE_DEVICE mm-zone that went into
      4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
      and dynamically mapped by a device driver.  The pmem driver, after
      mapping a persistent memory range into the system memmap via
      devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
      page-backed pmem-pfns via flags in the new pfn_t type.
      
      The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
      resulting pte(s) inserted into the process page tables with a new
      _PAGE_DEVMAP flag.  Later, when get_user_pages() is walking ptes it keys
      off _PAGE_DEVMAP to pin the device hosting the page range active.
      Finally, get_page() and put_page() are modified to take references
      against the device driver established page mapping.
      
      Finally, this need for "struct page" for persistent memory requires
      memory capacity to store the memmap array.  Given the memmap array for a
      large pool of persistent may exhaust available DRAM introduce a
      mechanism to allocate the memmap from persistent memory.  The new
      "struct vmem_altmap *" parameter to devm_memremap_pages() enables
      arch_add_memory() to use reserved pmem capacity rather than the page
      allocator.
      
      This patch (of 25):
      
      Both __dax_pmd_fault, and clear_pmem() were taking special steps to
      clear memory a page at a time to take advantage of non-temporal
      clear_page() implementations.  However, x86_64 does not use non-temporal
      instructions for clear_page(), and arch_clear_pmem() was always
      incurring the cost of __arch_wb_cache_pmem().
      
      Clean up the assumption that doing clear_pmem() a page at a time is more
      performant.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Reported-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Christoffer Dall <christoffer.dall@linaro.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      52db400f
  11. 28 8月, 2015 2 次提交
    • D
      x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB · 96601adb
      Dan Williams 提交于
      Given that a write-back (WB) mapping plus non-temporal stores is
      expected to be the most efficient way to access PMEM, update the
      definition of ARCH_HAS_PMEM_API to imply arch support for
      WB-mapped-PMEM.  This is needed as a pre-requisite for adding PMEM to
      the direct map and mapping it with struct page.
      
      The above clarification for X86_64 means that memcpy_to_pmem() is
      permitted to use the non-temporal arch_memcpy_to_pmem() rather than
      needlessly fall back to default_memcpy_to_pmem() when the pcommit
      instruction is not available.  When arch_memcpy_to_pmem() is not
      guaranteed to flush writes out of cache, i.e. on older X86_32
      implementations where non-temporal stores may just dirty cache,
      ARCH_HAS_PMEM_API is simply disabled.
      
      The default fall back for persistent memory handling remains.  Namely,
      map it with the WT (write-through) cache-type and hope for the best.
      
      arch_has_pmem_api() is updated to only indicate whether the arch
      provides the proper helpers to meet the minimum "writes are visible
      outside the cache hierarchy after memcpy_to_pmem() + wmb_pmem()".  Code
      that cares whether wmb_pmem() actually flushes writes to pmem must now
      call arch_has_wmb_pmem() directly.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      [hch: set ARCH_HAS_PMEM_API=n on x86_32]
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      [toshi: x86_32 compile fixes]
      Signed-off-by: NToshi Kani <toshi.kani@hp.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      96601adb
    • R
      nd_blk: change aperture mapping from WC to WB · 67a3e8fe
      Ross Zwisler 提交于
      This should result in a pretty sizeable performance gain for reads.  For
      rough comparison I did some simple read testing using PMEM to compare
      reads of write combining (WC) mappings vs write-back (WB).  This was
      done on a random lab machine.
      
      PMEM reads from a write combining mapping:
      	# dd of=/dev/null if=/dev/pmem0 bs=4096 count=100000
      	100000+0 records in
      	100000+0 records out
      	409600000 bytes (410 MB) copied, 9.2855 s, 44.1 MB/s
      
      PMEM reads from a write-back mapping:
      	# dd of=/dev/null if=/dev/pmem0 bs=4096 count=1000000
      	1000000+0 records in
      	1000000+0 records out
      	4096000000 bytes (4.1 GB) copied, 3.44034 s, 1.2 GB/s
      
      To be able to safely support a write-back aperture I needed to add
      support for the "read flush" _DSM flag, as outlined in the DSM spec:
      
      http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
      
      This flag tells the ND BLK driver that it needs to flush the cache lines
      associated with the aperture after the aperture is moved but before any
      new data is read.  This ensures that any stale cache lines from the
      previous contents of the aperture will be discarded from the processor
      cache, and the new data will be read properly from the DIMM.  We know
      that the cache lines are clean and will be discarded without any
      writeback because either a) the previous aperture operation was a read,
      and we never modified the contents of the aperture, or b) the previous
      aperture operation was a write and we must have written back the dirtied
      contents of the aperture to the DIMM before the I/O was completed.
      
      In order to add support for the "read flush" flag I needed to add a
      generic routine to invalidate cache lines, mmio_flush_range().  This is
      protected by the ARCH_HAS_MMIO_FLUSH Kconfig variable, and is currently
      only supported on x86.
      Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      67a3e8fe
  12. 21 8月, 2015 4 次提交