1. 01 9月, 2017 8 次提交
    • R
      libnvdimm, nd_blk: remove mmio_flush_range() · 5deb67f7
      Robin Murphy 提交于
      mmio_flush_range() suffers from a lack of clearly-defined semantics,
      and is somewhat ambiguous to port to other architectures where the
      scope of the writeback implied by "flush" and ordering might matter,
      but MMIO would tend to imply non-cacheable anyway. Per the rationale
      in 67a3e8fe ("nd_blk: change aperture mapping from WC to WB"), the
      only existing use is actually to invalidate clean cache lines for
      ARCH_MEMREMAP_PMEM type mappings *without* writeback. Since the recent
      cleanup of the pmem API, that also now happens to be the exact purpose
      of arch_invalidate_pmem(), which would be a far more well-defined tool
      for the job.
      
      Rather than risk potentially inconsistent implementations of
      mmio_flush_range() for the sake of one callsite, streamline things by
      removing it entirely and instead move the ARCH_MEMREMAP_PMEM related
      definitions up to the libnvdimm level, so they can be shared by NFIT
      as well. This allows NFIT to be enabled for arm64.
      Signed-off-by: NRobin Murphy <robin.murphy@arm.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      5deb67f7
    • V
      libnvdimm, btt: rework error clearing · d9b83c75
      Vishal Verma 提交于
      Clearing errors or badblocks during a BTT write requires sending an ACPI
      DSM, which means potentially sleeping. Since a BTT IO happens in atomic
      context (preemption disabled, spinlocks may be held), we cannot perform
      error clearing in the course of an IO. Due to this error clearing for
      BTT IOs has hitherto been disabled.
      
      In this patch we move error clearing out of the atomic section, and thus
      re-enable error clearing with BTTs. When we are about to add a block to
      the free list, we check if it was previously marked as an error, and if
      it was, we add it to the freelist, but also set a flag that says error
      clearing will be required. We then drop the lane (ending the atomic
      context), and send a zero buffer so that the error can be cleared. The
      error flag in the free list is protected by the nd 'lane', and is set
      only be a thread while it holds that lane. When the error is cleared,
      the flag is cleared, but while holding a mutex for that freelist index.
      
      When writing, we check for two things -
      1/ If the freelist mutex is held or if the error flag is set. If so,
      this is an error block that is being (or about to be) cleared.
      2/ If the block is a known badblock based on nsio->bb
      
      The second check is required because the BTT map error flag for a map
      entry only gets set when an error LBA is read. If we write to a new
      location that may not have the map error flag set, but still might be in
      the region's badblock list, we can trigger an EIO on the write, which is
      undesirable and completely avoidable.
      
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      d9b83c75
    • V
      libnvdimm: fix potential deadlock while clearing errors · 0930a750
      Vishal Verma 提交于
      With the ACPI NFIT 'DSM' methods, acpi can be called from IO paths.
      Specifically, the DSM to clear media errors is called during writes, so
      that we can provide a writes-fix-errors model.
      
      However it is easy to imagine a scenario like:
       -> write through the nvdimm driver
         -> acpi allocation
           -> writeback, causes more IO through the nvdimm driver
             -> deadlock
      
      Fix this by using memalloc_noio_{save,restore}, which sets the GFP_NOIO
      flag for the current scope when issuing commands/IOs that are expected
      to clear errors.
      
      Cc: <linux-acpi@vger.kernel.org>
      Cc: <linux-nvdimm@lists.01.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Robert Moore <robert.moore@intel.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      0930a750
    • V
      libnvdimm, btt: cache sector_size in arena_info · 75892004
      Vishal Verma 提交于
      In preparation for the error clearing rework, add sector_size in the
      arena_info struct.
      Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      75892004
    • V
      libnvdimm, btt: ensure that flags were also unchanged during a map_read · 1398199d
      Vishal Verma 提交于
      In btt_map_read, we read the map twice to make sure that the map entry
      didn't change after we added it to the read tracking table. In
      anticipation of expanding the use of the error bit, also make sure that
      the error and zero flags are constant across the two map reads.
      Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      1398199d
    • V
      libnvdimm, btt: refactor map entry operations with macros · 0595d539
      Vishal Verma 提交于
      Add helpers for converting a raw map entry to just the block number, or
      either of the 'e' or 'z' flags in preparation for actually using the
      error flag to mark blocks with media errors.
      Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      0595d539
    • V
      libnvdimm, btt: fix a missed NVDIMM_IO_ATOMIC case in the write path · 1db1f3ce
      Vishal Verma 提交于
      The IO context conversion for rw_bytes missed a case in the BTT write
      path (btt_map_write) which should've been marked as atomic.
      
      In reality this should not cause a problem, because map writes are to
      small for nsio_rw_bytes to attempt error clearing, but it should be
      fixed for posterity.
      
      Add a might_sleep() in the non-atomic section of nsio_rw_bytes so that
      things like the nfit unit tests, which don't actually sleep, can catch
      bugs like this.
      
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      1db1f3ce
    • D
      libnvdimm, nfit: export an 'ecc_unit_size' sysfs attribute · a15797f4
      Dan Williams 提交于
      When the nfit driver initializes it runs an ARS (Address Range Scrub)
      operation across every pmem range. Part of that process involves
      determining the ARS capabilities of a given address range. One of the
      capabilities that is reported is the 'Clear Uncorrectable Error Range
      Length Unit Size' (see: ACPI 6.2 section 9.20.7.4 Function Index 1 -
      Query ARS Capabilities). This property is of interest to userspace
      software as it indicates the boundary at which the NVDIMM may need to
      perform read-modify-write cycles to maintain ECC blocks.
      
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      a15797f4
  2. 30 8月, 2017 2 次提交
  3. 16 8月, 2017 1 次提交
  4. 12 8月, 2017 2 次提交
  5. 08 8月, 2017 1 次提交
  6. 05 8月, 2017 1 次提交
  7. 26 7月, 2017 1 次提交
    • O
      libnvdimm: Stop using HPAGE_SIZE · 0dd69643
      Oliver O'Halloran 提交于
      Currently libnvdimm uses HPAGE_SIZE as the default alignment for DAX and
      PFN devices. HPAGE_SIZE is the default hugetlbfs page size and when
      hugetlbfs is disabled it defaults to PAGE_SIZE. Given DAX has more
      in common with THP than hugetlbfs we should proably be using
      HPAGE_PMD_SIZE, but this is undefined when THP is disabled so lets just
      give it a new name.
      
      The other usage of HPAGE_SIZE in libnvdimm is when determining how large
      the altmap should be. For the reasons mentioned above it doesn't really
      make sense to use HPAGE_SIZE here either. PMD_SIZE seems to be safe to
      use in generic code and it happens to match the vmemmap allocation block
      on x86 and Power. It's still a hack, but it's a slightly nicer hack.
      Signed-off-by: NOliver O'Halloran <oohall@gmail.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      0dd69643
  8. 23 7月, 2017 2 次提交
  9. 21 7月, 2017 4 次提交
  10. 20 7月, 2017 18 次提交