1. 01 9月, 2017 7 次提交
    • R
      libnvdimm, nd_blk: remove mmio_flush_range() · 5deb67f7
      Robin Murphy 提交于
      mmio_flush_range() suffers from a lack of clearly-defined semantics,
      and is somewhat ambiguous to port to other architectures where the
      scope of the writeback implied by "flush" and ordering might matter,
      but MMIO would tend to imply non-cacheable anyway. Per the rationale
      in 67a3e8fe ("nd_blk: change aperture mapping from WC to WB"), the
      only existing use is actually to invalidate clean cache lines for
      ARCH_MEMREMAP_PMEM type mappings *without* writeback. Since the recent
      cleanup of the pmem API, that also now happens to be the exact purpose
      of arch_invalidate_pmem(), which would be a far more well-defined tool
      for the job.
      
      Rather than risk potentially inconsistent implementations of
      mmio_flush_range() for the sake of one callsite, streamline things by
      removing it entirely and instead move the ARCH_MEMREMAP_PMEM related
      definitions up to the libnvdimm level, so they can be shared by NFIT
      as well. This allows NFIT to be enabled for arm64.
      Signed-off-by: NRobin Murphy <robin.murphy@arm.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      5deb67f7
    • V
      libnvdimm, btt: rework error clearing · d9b83c75
      Vishal Verma 提交于
      Clearing errors or badblocks during a BTT write requires sending an ACPI
      DSM, which means potentially sleeping. Since a BTT IO happens in atomic
      context (preemption disabled, spinlocks may be held), we cannot perform
      error clearing in the course of an IO. Due to this error clearing for
      BTT IOs has hitherto been disabled.
      
      In this patch we move error clearing out of the atomic section, and thus
      re-enable error clearing with BTTs. When we are about to add a block to
      the free list, we check if it was previously marked as an error, and if
      it was, we add it to the freelist, but also set a flag that says error
      clearing will be required. We then drop the lane (ending the atomic
      context), and send a zero buffer so that the error can be cleared. The
      error flag in the free list is protected by the nd 'lane', and is set
      only be a thread while it holds that lane. When the error is cleared,
      the flag is cleared, but while holding a mutex for that freelist index.
      
      When writing, we check for two things -
      1/ If the freelist mutex is held or if the error flag is set. If so,
      this is an error block that is being (or about to be) cleared.
      2/ If the block is a known badblock based on nsio->bb
      
      The second check is required because the BTT map error flag for a map
      entry only gets set when an error LBA is read. If we write to a new
      location that may not have the map error flag set, but still might be in
      the region's badblock list, we can trigger an EIO on the write, which is
      undesirable and completely avoidable.
      
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      d9b83c75
    • V
      libnvdimm: fix potential deadlock while clearing errors · 0930a750
      Vishal Verma 提交于
      With the ACPI NFIT 'DSM' methods, acpi can be called from IO paths.
      Specifically, the DSM to clear media errors is called during writes, so
      that we can provide a writes-fix-errors model.
      
      However it is easy to imagine a scenario like:
       -> write through the nvdimm driver
         -> acpi allocation
           -> writeback, causes more IO through the nvdimm driver
             -> deadlock
      
      Fix this by using memalloc_noio_{save,restore}, which sets the GFP_NOIO
      flag for the current scope when issuing commands/IOs that are expected
      to clear errors.
      
      Cc: <linux-acpi@vger.kernel.org>
      Cc: <linux-nvdimm@lists.01.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Robert Moore <robert.moore@intel.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      0930a750
    • V
      libnvdimm, btt: cache sector_size in arena_info · 75892004
      Vishal Verma 提交于
      In preparation for the error clearing rework, add sector_size in the
      arena_info struct.
      Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      75892004
    • V
      libnvdimm, btt: ensure that flags were also unchanged during a map_read · 1398199d
      Vishal Verma 提交于
      In btt_map_read, we read the map twice to make sure that the map entry
      didn't change after we added it to the read tracking table. In
      anticipation of expanding the use of the error bit, also make sure that
      the error and zero flags are constant across the two map reads.
      Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      1398199d
    • V
      libnvdimm, btt: refactor map entry operations with macros · 0595d539
      Vishal Verma 提交于
      Add helpers for converting a raw map entry to just the block number, or
      either of the 'e' or 'z' flags in preparation for actually using the
      error flag to mark blocks with media errors.
      Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      0595d539
    • V
      libnvdimm, btt: fix a missed NVDIMM_IO_ATOMIC case in the write path · 1db1f3ce
      Vishal Verma 提交于
      The IO context conversion for rw_bytes missed a case in the BTT write
      path (btt_map_write) which should've been marked as atomic.
      
      In reality this should not cause a problem, because map writes are to
      small for nsio_rw_bytes to attempt error clearing, but it should be
      fixed for posterity.
      
      Add a might_sleep() in the non-atomic section of nsio_rw_bytes so that
      things like the nfit unit tests, which don't actually sleep, can catch
      bugs like this.
      
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      1db1f3ce
  2. 30 8月, 2017 2 次提交
  3. 16 8月, 2017 1 次提交
  4. 12 8月, 2017 2 次提交
  5. 05 8月, 2017 1 次提交
  6. 26 7月, 2017 1 次提交
    • O
      libnvdimm: Stop using HPAGE_SIZE · 0dd69643
      Oliver O'Halloran 提交于
      Currently libnvdimm uses HPAGE_SIZE as the default alignment for DAX and
      PFN devices. HPAGE_SIZE is the default hugetlbfs page size and when
      hugetlbfs is disabled it defaults to PAGE_SIZE. Given DAX has more
      in common with THP than hugetlbfs we should proably be using
      HPAGE_PMD_SIZE, but this is undefined when THP is disabled so lets just
      give it a new name.
      
      The other usage of HPAGE_SIZE in libnvdimm is when determining how large
      the altmap should be. For the reasons mentioned above it doesn't really
      make sense to use HPAGE_SIZE here either. PMD_SIZE seems to be safe to
      use in generic code and it happens to match the vmemmap allocation block
      on x86 and Power. It's still a hack, but it's a slightly nicer hack.
      Signed-off-by: NOliver O'Halloran <oohall@gmail.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      0dd69643
  7. 18 7月, 2017 1 次提交
  8. 04 7月, 2017 3 次提交
  9. 01 7月, 2017 4 次提交
  10. 30 6月, 2017 5 次提交
    • V
      libnvdimm, btt: fix btt_rw_page not returning errors · c13c43d5
      Vishal Verma 提交于
      btt_rw_page was not propagating errors frm btt_do_bvec, resulting in any
      IO errors via the rw_page path going unnoticed. the pmem driver recently
      fixed this in e10624f8 pmem: fail io-requests to known bad blocks
      but same problem in BTT went neglected.
      
      Fixes: 5212e11f ("nd_btt: atomic sector updates")
      Cc: <stable@vger.kernel.org>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      c13c43d5
    • D
      acpi, nfit: quiet invalid block-aperture-region warnings · d5d51fec
      Dan Williams 提交于
      This state is already visible by userspace since the BLK region will not
      be enabled, and it is otherwise benign as it usually indicates that the
      DIMM is not configured.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      d5d51fec
    • V
      libnvdimm, btt: BTT updates for UEFI 2.7 format · 14e49454
      Vishal Verma 提交于
      The UEFI 2.7 specification defines an updated BTT metadata format,
      bumping the revision to 2.0. Add support for the new format, while
      retaining compatibility for the old 1.1 format.
      
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Linda Knippers <linda.knippers@hpe.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      14e49454
    • D
      libnvdimm, pmem: disable dax flushing when pmem is fronting a volatile region · 0b277961
      Dan Williams 提交于
      The pmem driver attaches to both persistent and volatile memory ranges
      advertised by the ACPI NFIT. When the region is volatile it is redundant
      to spend cycles flushing caches at fsync(). Check if the hosting region
      is volatile and do not set dax_write_cache() if it is.
      
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      0b277961
    • D
      libnvdimm, pmem, dax: export a cache control attribute · 6e0c90d6
      Dan Williams 提交于
      The dax_flush() operation can be turned into a nop on platforms where
      firmware arranges for cpu caches to be flushed on a power-fail event.
      The ACPI 6.2 specification defines a mechanism for the platform to
      indicate this capability so the kernel can select the proper default.
      However, for other platforms, the administrator must toggle this setting
      manually.
      
      Given this flush setting is a dax-specific mechanism we advertise it
      through a 'dax' attribute group hanging off a host device. For example,
      a 'pmem0' block-device gets a 'dax' sysfs-subdirectory with a
      'write_cache' attribute to control response to dax cache flush requests.
      This is similar to the 'queue/write_cache' attribute that appears under
      block devices.
      
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Suggested-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      6e0c90d6
  11. 28 6月, 2017 5 次提交
    • D
      libnvdimm, nfit: enable support for volatile ranges · c9e582aa
      Dan Williams 提交于
      Allow volatile nfit ranges to participate in all the same infrastructure
      provided for persistent memory regions. A resulting resulting namespace
      device will still be called "pmem", but the parent region type will be
      "nd_volatile". This is in preparation for disabling the dax ->flush()
      operation in the pmem driver when it is hosted on a volatile range.
      
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      c9e582aa
    • D
      libnvdimm, pmem: fix persistence warning · c00b396e
      Dan Williams 提交于
      The pmem driver assumes if platform firmware describes the memory
      devices associated with a persistent memory range and
      CONFIG_ARCH_HAS_PMEM_API=y that it has all the mechanism necessary to
      flush data to a power-fail safe zone. We warn if the firmware does not
      describe memory devices, but we also need to warn if the architecture
      does not claim pmem support.
      
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      c00b396e
    • D
      x86, libnvdimm, pmem: remove global pmem api · ca6a4657
      Dan Williams 提交于
      Now that all callers of the pmem api have been converted to dax helpers that
      call back to the pmem driver, we can remove include/linux/pmem.h and
      asm/pmem.h.
      
      Cc: <x86@kernel.org>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Oliver O'Halloran <oohall@gmail.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      ca6a4657
    • D
      x86, libnvdimm, pmem: move arch_invalidate_pmem() to libnvdimm · f2b61257
      Dan Williams 提交于
      Kill this globally defined wrapper and move to libnvdimm so that we can
      ultimately remove include/linux/pmem.h and asm/pmem.h.
      
      Cc: <x86@kernel.org>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      f2b61257
    • C
      block: don't bother with bounce limits for make_request drivers · 0b0bcacc
      Christoph Hellwig 提交于
      We only call blk_queue_bounce for request-based drivers, so stop messing
      with it for make_request based drivers.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      0b0bcacc
  12. 16 6月, 2017 8 次提交
    • D
      x86, dax, libnvdimm: remove wb_cache_pmem() indirection · 4e4f00a9
      Dan Williams 提交于
      With all handling of the CONFIG_ARCH_HAS_PMEM_API case being moved to
      libnvdimm and the pmem driver directly we do not need to provide global
      wrappers and fallbacks in the CONFIG_ARCH_HAS_PMEM_API=n case. The pmem
      driver will simply not link to arch_wb_cache_pmem() in that case.  Same
      as before, pmem flushing is only defined for x86_64, via
      clean_cache_range(), but it is straightforward to add other archs in the
      future.
      
      arch_wb_cache_pmem() is an exported function since the pmem module needs
      to find it, but it is privately declared in drivers/nvdimm/pmem.h because
      there are no consumers outside of the pmem driver.
      
      Cc: <x86@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Oliver O'Halloran <oohall@gmail.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Suggested-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      4e4f00a9
    • D
      dax, pmem: introduce an optional 'flush' dax_operation · 3c1cebff
      Dan Williams 提交于
      Filesystem-DAX flushes caches whenever it writes to the address returned
      through dax_direct_access() and when writing back dirty radix entries.
      That flushing is only required in the pmem case, so add a dax operation
      to allow pmem to take this extra action, but skip it for other dax
      capable devices that do not provide a flush routine.
      
      An example for this differentiation might be a volatile ram disk where
      there is no expectation of persistence. In fact the pmem driver itself might
      front such an address range specified by the NFIT. So, this "no flush"
      property might be something passed down by the bus / libnvdimm.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      3c1cebff
    • T
      libnvdimm, pmem: Add sysfs notifications to badblocks · 975750a9
      Toshi Kani 提交于
      Sysfs "badblocks" information may be updated during run-time that:
       - MCE, SCI, and sysfs "scrub" may add new bad blocks
       - Writes and ioctl() may clear bad blocks
      
      Add support to send sysfs notifications to sysfs "badblocks" file
      under region and pmem directories when their badblocks information
      is re-evaluated (but is not necessarily changed) during run-time.
      Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Linda Knippers <linda.knippers@hpe.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      975750a9
    • D
      libnvdimm, label: switch to using v1.2 labels by default · 8990cdf1
      Dan Williams 提交于
      The rules for which version of the label specification are in effect at
      any given point in time are as follows:
      
      1/ If a DIMM has an existing / valid index block then the version
         specified is used regardless if it is a previous version.
      
      2/ By default when the kernel is initializing new index blocks the
         latest specification version (v1.2 at time of writing) is used.
      
      3/ An environment that wants to force create v1.1 label-sets must
         arrange for userspace to disable all active regions / namespaces /
         dimms and write a valid set of v1.1 index blocks to the dimms.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      8990cdf1
    • D
      libnvdimm, label: add address abstraction identifiers · b3fde74e
      Dan Williams 提交于
      Starting with v1.2 labels, 'address abstractions' can be hinted via an
      address abstraction id that implies an info-block format. The standard
      address abstraction in the specification is the v2 format of the
      Block-Translation-Table (BTT). Support for that is saved for a later
      patch, for now we add support for the Linux supported address
      abstractions BTT (v1), PFN, and DAX.
      
      The new 'holder_class' attribute for namespace devices is added for
      tooling to specify the 'abstraction_guid' to store in the namespace label.
      For v1.1 labels this field is undefined and any setting of
      'holder_class' away from the default 'none' value will only have effect
      until the driver is unloaded. Setting 'holder_class' requires that
      whatever device tries to claim the namespace must be of the specified
      class.
      
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      b3fde74e
    • D
      libnvdimm, label: add v1.2 label checksum support · 355d8388
      Dan Williams 提交于
      The v1.2 namespace label specification adds a fletcher checksum to each
      label instance. Add generation and validation support for the new field.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      355d8388
    • D
      libnvdimm, label: update 'nlabel' and 'position' handling for local namespaces · 3934d841
      Dan Williams 提交于
      The v1.2 namespace label specification requires 'nlabel' and 'position'
      to be valid for the first ("lowest dpa") label in the set. It also
      requires all non-first labels to set those fields to 0xff.
      
      Linux does not much care if these values are correct, because we can
      just trust the count of labels with the matching uuid like the v1.1
      case. However, we set them correctly in case other environments care.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      3934d841
    • D
      libnvdimm, label: populate 'isetcookie' for blk-aperture namespaces · 8f2bc243
      Dan Williams 提交于
      Starting with the v1.2 definition of namespace labels, the isetcookie
      field is populated and validated for blk-aperture namespaces. This adds
      some safety against inadvertent copying of namespace labels from one
      DIMM-device to another.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      8f2bc243