1. 22 Mar, 2018 2 commits
    • libnvdimm, nfit: fix persistence domain reporting · fe9a552e
      Dan Williams committed
      The persistence domain is a point in the platform where once writes
      reach that destination the platform claims it will make them persistent
      relative to power loss. In the ACPI NFIT this is currently communicated
      as 2 bits in the "NFIT - Platform Capabilities Structure". The bits
      comprise a hierarchy, i.e. bit0 "CPU Cache Flush to NVDIMM Durability on
      Power Loss Capable" implies bit1 "Memory Controller Flush to NVDIMM
      Durability on Power Loss Capable".
      
      Commit 96c3a239 "libnvdimm: expose platform persistence attr..."
      shows the persistence domain as flags, but it's really an enumerated
      hierarchy.
      
      Fix this newly introduced user ABI to show the closest available
      persistence domain before userspace develops dependencies on seeing, or
      needing to develop code to tolerate, the raw NFIT flags communicated
      through the libnvdimm-generic region attribute.
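
To illustrate the difference between raw flags and an enumerated hierarchy, here is a minimal Python sketch that picks the closest available persistence domain from the two capability bits. The constant names, the function name, and the returned strings are assumptions made for this example; they are not taken from the commit.

```python
from typing import Optional

# Hypothetical names for the two NFIT platform capability bits.
CAP_CPU_CACHE_FLUSH = 1 << 0   # bit0: CPU cache flushed to NVDIMM on power loss
CAP_MEM_CTRL_FLUSH = 1 << 1    # bit1: memory controller flushed on power loss


def persistence_domain(capabilities: int) -> Optional[str]:
    """Return the closest persistence domain, or None when unknown."""
    # bit0 implies bit1, so test the stronger guarantee first.
    if capabilities & CAP_CPU_CACHE_FLUSH:
        return "cpu_cache"
    if capabilities & CAP_MEM_CTRL_FLUSH:
        return "memory_controller"
    return None  # nothing known: a caller could hide the attribute
```

Returning a single value rather than a bitmask means a consumer never has to interpret combinations of flags.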
      
      Fixes: 96c3a239 ("libnvdimm: expose platform persistence attr...")
      Reviewed-by: Dave Jiang <dave.jiang@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • libnvdimm, region: hide persistence_domain when unknown · 896196dc
      Dan Williams committed
      Similar to other region attributes, do not emit the persistence_domain
      attribute if its contents are empty.
      
      Fixes: 96c3a239 ("libnvdimm: expose platform persistence attr...")
      Cc: Dave Jiang <dave.jiang@intel.com>
      Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  2. 14 Mar, 2018 1 commit
  3. 08 Mar, 2018 1 commit
  4. 03 Mar, 2018 1 commit
    • libnvdimm: re-enable deep flush for pmem devices via fsync() · 5fdf8e5b
      Dave Jiang committed
      Re-enable deep flush so that users always have a way to be sure that a
      write makes it all the way out to media. Writes from the PMEM driver
      always arrive at the NVDIMM since movnt is used to bypass the cache, and
      the driver relies on the ADR (Asynchronous DRAM Refresh) mechanism to
      flush write buffers on power failure. The Deep Flush mechanism is there
      to explicitly write buffers to protect against (rare) ADR failure.  This
      change prevents a regression in deep flush behavior so that applications
      can continue to depend on fsync() as a mechanism to trigger deep flush
      in the filesystem-DAX case.
      
      Fixes: 06e8ccda ("acpi: nfit: Add support for detect platform CPU cache...")
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Dave Jiang <dave.jiang@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  5. 03 Feb, 2018 1 commit
  6. 02 Feb, 2018 2 commits
  7. 20 Jan, 2018 1 commit
    • libnvdimm, btt: fix uninitialized err_lock · d08cd5e0
      Jeff Moyer committed
      When a sector mode namespace is initially created, the arena's err_lock
      is not initialized.  If, on the other hand, the namespace already
      exists, the mutex is initialized.  To fix the issue, I moved the mutex
      initialization into the arena_alloc, which is called by both
      discover_arenas and create_arenas.
      
      This was discovered on an older kernel where mutex_trylock checks the
      count to determine whether the lock is held.  Because the data structure
      is kzalloc-d, that count was 0 (held), and I/O to the device would hang
      forever waiting for the lock to be released (see btt_write_pg, for
      example).  Current kernels have a different mutex implementation that
      checks for a non-null owner, and so this doesn't show up as a problem.
      If that lock were ever contended, it might cause issues, but you'd have
      to be really unlucky, I think.
      
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  8. 09 Jan, 2018 1 commit
  9. 22 Dec, 2017 2 commits
  10. 20 Dec, 2017 2 commits
    • libnvdimm, dax: fix 1GB-aligned namespaces vs physical misalignment · 41fce90f
      Dan Williams committed
      The following namespace configuration attempt:
      
          # ndctl create-namespace -e namespace0.0 -m devdax -a 1G -f
          libndctl: ndctl_dax_enable: dax0.1: failed to enable
            Error: namespace0.0: failed to enable
      
          failed to reconfigure namespace: No such device or address
      
      ...fails when the backing memory range is not physically aligned to 1G:
      
          # cat /proc/iomem | grep Persistent
          210000000-30fffffff : Persistent Memory (legacy)
      
      In the above example the 4G persistent memory range starts and ends on a
      256MB boundary.
      
      We already handle this correctly for cases that violate section
      alignment (128MB), i.e. collisions against "System RAM", and we simply
      need to extend that padding/truncation for the 1GB alignment use case.
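
The padding/truncation arithmetic for the example range above can be sketched as follows. This is only the alignment math under simplifying assumptions; the function names are hypothetical, and this is not the kernel's actual pfn code.

```python
GIB = 1 << 30


def align_up(value: int, align: int) -> int:
    """Round value up to the next multiple of align (a power of two)."""
    return (value + align - 1) & ~(align - 1)


def align_down(value: int, align: int) -> int:
    """Round value down to a multiple of align (a power of two)."""
    return value & ~(align - 1)


def pad_and_truncate(start: int, end_inclusive: int, align: int):
    """Return (start_pad, end_trunc, usable) for an aligned sub-range."""
    end = end_inclusive + 1                  # work with an exclusive end
    aligned_start = align_up(start, align)
    aligned_end = align_down(end, align)
    if aligned_end <= aligned_start:
        raise ValueError("range too small for the requested alignment")
    return aligned_start - start, end - aligned_end, aligned_end - aligned_start


# The range reported by /proc/iomem in the example above.
start_pad, end_trunc, usable = pad_and_truncate(0x210000000, 0x30FFFFFFF, GIB)
print(hex(start_pad), hex(end_trunc), usable // GIB)  # 0x30000000 0x10000000 3
```

Under these assumptions, the 4G range loses 768MB at the start and 256MB at the end, leaving 3G of 1G-aligned capacity.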
      
      Cc: <stable@vger.kernel.org>
      Fixes: 315c5625 ("libnvdimm, pfn: add 'align' attribute...")
      Reported-and-tested-by: Jane Chu <jane.chu@oracle.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • libnvdimm, pfn: fix start_pad handling for aligned namespaces · 19deaa21
      Dan Williams committed
      The alignment checks at pfn driver startup fail to properly account for
      the 'start_pad' in the case where the namespace is misaligned relative
      to its internal alignment. This is typically triggered in 1G aligned
      namespaces, but could theoretically trigger with smaller namespace
      alignments. When this triggers the kernel reports messages of the form:
      
          dax2.1: bad offset: 0x3c000000 dax disabled align: 0x40000000
      
      Cc: <stable@vger.kernel.org>
      Fixes: 1ee6667c ("libnvdimm, pfn, dax: fix initialization vs autodetect...")
      Reported-by: Jane Chu <jane.chu@oracle.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  11. 05 Dec, 2017 1 commit
    • nfit, libnvdimm: deprecate the generic SMART ioctl · cdd77d3e
      Dan Williams committed
      The kernel's ND_IOCTL_SMART_THRESHOLD command is based on a payload
      definition that has become broken / out-of-sync with recent versions of
      the NVDIMM_FAMILY_INTEL definition. Deprecate the use of the
      ND_IOCTL_SMART_THRESHOLD command in favor of the ND_CMD_CALL approach
      taken by NVDIMM_FAMILY_{HPE,MSFT}, where we can manage the per-vendor
      variance in userspace.
      
      In a couple of years, when the new scheme is widely deployed in userspace
      packages, the ND_IOCTL_SMART_THRESHOLD support can be removed. For now
      we prevent new binaries from compiling against the kernel header
      definitions, but the kernel remains compatible with old binaries. The
      libndctl.h [1] header is now the authoritative interface definition for
      NVDIMM SMART.
      
      [1]: https://github.com/pmem/ndctl
      
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  12. 16 Nov, 2017 1 commit
  13. 03 Nov, 2017 2 commits
  14. 02 Nov, 2017 1 commit
    • License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman committed
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information in it,
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information.
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging were:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that were:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there were new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In the initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      
      Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  15. 12 Oct, 2017 1 commit
  16. 08 Oct, 2017 3 commits
    • libnvdimm, namespace: make a couple of functions static · 65853a1d
      Colin Ian King committed
      The functions create_namespace_pmem and create_namespace_blk are local
      to the source and do not need to be in global scope, so make them static.
      
      Cleans up sparse warnings:
      symbol 'create_namespace_pmem' was not declared. Should it be static?
      symbol 'create_namespace_blk' was not declared. Should it be static?
      
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • libnvdimm: introduce 'flags' attribute for DIMM 'lock' and 'alias' status · efbf6f50
      Dan Williams committed
      Given that we now have two mechanisms for a DIMM to indicate that it
      is locked:
      
          * NVDIMM_FAMILY_INTEL 'get_config_size' _DSM command
      
          * ACPI 6.2 Label Storage Read / Write commands
      
      ...export the generic libnvdimm DIMM status in a new 'flags' attribute.
      
      This attribute can also reflect the 'alias' state which indicates
      whether the nvdimm core is enforcing labels for aliased-region-capacity
      that the given dimm is an interleave-set member.
      
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • acpi, nfit: add support for the _LSI, _LSR, and _LSW label methods · 4b27db7e
      Dan Williams committed
      ACPI 6.2 adds support for named methods to access the label storage area
      of an NVDIMM. We prefer these new methods if available and otherwise
      fall back to the NVDIMM_FAMILY_INTEL _DSMs. The kernel ioctls,
      ND_IOCTL_{GET,SET}_CONFIG_{SIZE,DATA}, remain generic and the driver
      translates the 'package' payloads into the NVDIMM_FAMILY_INTEL 'buffer'
      format to maintain compatibility with existing userspace and keep the
      output buffer parsing code in the driver common.
      
      The output payloads are mostly compatible save for the 'label area
      locked' status that moves from the 'config_size' (_LSI) command to the
      'config_read' (_LSR) command status.
      
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  17. 29 Sep, 2017 5 commits
  18. 19 Sep, 2017 1 commit
    • libnvdimm, namespace: fix btt claim class crash · 33a56086
      Dan Williams committed
      Maurice reports:
      
          BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
          IP: holder_class_store+0x253/0x2b0 [libnvdimm]
      
      ...while trying to reconfigure an NVDIMM-N namespace into 'sector' /
      'btt' mode. The crash points to this line:
      
          (gdb) li *(holder_class_store+0x253)
          0x7773 is in holder_class_store (drivers/nvdimm/namespace_devs.c:1420).
          1415            for (i = 0; i < nd_region->ndr_mappings; i++) {
          1416                    struct nd_mapping *nd_mapping = &nd_region->mapping[i];
          1417                    struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
          1418                    struct nd_namespace_index *nsindex;
          1419
          1420                    nsindex = to_namespace_index(ndd, ndd->ns_current);
      
      ...where we are failing because ndd is NULL due to NVDIMM-N dimms not
      supporting labels.
      
      Long story short, default to the BTTv1 format in the label-less /
      NVDIMM-N case.
      
      Fixes: 14e49454 ("libnvdimm, btt: BTT updates for UEFI 2.7 format")
      Cc: <stable@vger.kernel.org>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Reported-by: Maurice A. Saldivar <maurice.a.saldivar@hpe.com>
      Tested-by: Maurice A. Saldivar <maurice.a.saldivar@hpe.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  19. 11 Sep, 2017 1 commit
    • dax: remove the pmem_dax_ops->flush abstraction · c3ca015f
      Mikulas Patocka committed
      Commit abebfbe2 ("dm: add ->flush() dax operation support") is
      buggy. A DM device may be composed of multiple underlying devices and
      all of them need to be flushed. That commit just routes the flush
      request to the first device and ignores the other devices.
      
      It could be fixed by adding more complex logic to the device mapper. But
      there is only one implementation of the method pmem_dax_ops->flush - that
      is pmem_dax_flush() - and it calls arch_wb_cache_pmem(). Consequently, we
      don't need the pmem_dax_ops->flush abstraction at all, we can call
      arch_wb_cache_pmem() directly from dax_flush() because dax_dev->ops->flush
      can't ever reach anything different from arch_wb_cache_pmem().
      
      It should be also pointed out that for some uses of persistent memory it
      is needed to flush only a very small amount of data (such as 1 cacheline),
      and it would be overkill if we go through that device mapper machinery for
      a single flushed cache line.
      
      Fix this by removing the pmem_dax_ops->flush abstraction and call
      arch_wb_cache_pmem() directly from dax_flush(). Also, remove the device
      mapper code that forwards the flushes.
      
      Fixes: abebfbe2 ("dm: add ->flush() dax operation support")
      Cc: stable@vger.kernel.org
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  20. 10 Sep, 2017 1 commit
  21. 08 Sep, 2017 1 commit
  22. 07 Sep, 2017 1 commit
  23. 05 Sep, 2017 1 commit
    • libnvdimm, nfit: move the check on nd_reserved2 to the endpoint · 9edcad53
      Meng Xu committed
      Delay the check of nd_reserved2 to the actual endpoint (acpi_nfit_ctl)
      that uses it, as a prevention of a potential double-fetch bug.
      
      While examining the kernel source code, I found a dangerous operation that
      could turn into a double-fetch situation (a race condition bug) where
      the same userspace memory region is fetched twice into the kernel, with sanity
      checks after the first fetch but no checks after the second fetch.
      
      In the case of _IOC_NR(ioctl_cmd) == ND_CMD_CALL:
      
      1. The first fetch happens in line 935 copy_from_user(&pkg, p, sizeof(pkg))
      
      2. subsequently `pkg.nd_reserved2` is asserted to be all zeroes
      (line 984 to 986).
      
      3. The second fetch happens in line 1022 copy_from_user(buf, p, buf_len)
      
      4. Given that `p` can be fully controlled in userspace, an attacker can
      race to overwrite the header part of `p`, say,
      `((struct nd_cmd_pkg *)p)->nd_reserved2`, with an arbitrary value
      (say nine 0xFFFFFFFF for `nd_reserved2`) after the first fetch but before the
      second fetch. The changed value will be copied to `buf`.
      
      5. There are no checks on the second fetch until its use in
      line 1034: nd_cmd_clear_to_send(nvdimm_bus, nvdimm, cmd, buf) and
      line 1038: nd_desc->ndctl(nd_desc, nvdimm, cmd, buf, buf_len, &cmd_rc),
      which means that the assumed relation, that `p->nd_reserved2` is all zeroes, might
      not hold after the second fetch. And once control passes to these functions
      we lose the context to assert the assumed relation.
      
      6. Based on my manual analysis, `p->nd_reserved2` is not used in function
      `nd_cmd_clear_to_send` and potential implementations of `nd_desc->ndctl`,
      so there is no working exploit against it right now. However, this could
      easily turn into an exploitable one if careless developers start to use
      `p->nd_reserved2` later and assume that it is all zeroes.
      
      Move the validation of the nd_reserved2 field to the ->ndctl()
      implementation where it has a stable buffer to evaluate.
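
The double-fetch pattern described above can be modelled in a few lines of Python. This is a deterministic simulation, not kernel code: the buffer layout, the `race` callback standing in for a concurrent userspace writer, and all function names are assumptions made for the illustration.

```python
RESERVED = slice(4, 8)  # hypothetical position of the reserved field


def _check_reserved(data: bytes) -> None:
    if any(data[RESERVED]):
        raise ValueError("reserved field must be zero")


def handle_check_first_fetch(user: bytearray, race=lambda: None) -> bytes:
    """Validate the first copy only: the pattern the commit warns about."""
    header = bytes(user[:8])      # first fetch
    _check_reserved(header)       # check passes on the first copy
    race()                        # window in which userspace rewrites memory
    return bytes(user)            # second fetch, never re-validated


def handle_check_final_copy(user: bytearray, race=lambda: None) -> bytes:
    """Validate the stable copy that is actually consumed."""
    _ = bytes(user[:8])           # first fetch (e.g. to learn sizes)
    race()
    buf = bytes(user)             # second fetch into a kernel-owned buffer
    _check_reserved(buf)          # check the data that will be used
    return buf
```

With a `race` callback that sets a byte in the reserved field, the first handler returns a buffer with a non-zero reserved field, while the second raises `ValueError`.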
      
      Signed-off-by: Meng Xu <mengxu.gatech@gmail.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  24. 01 Sep, 2017 6 commits
    • libnvdimm: fix integer overflow static analysis warning · 58738c49
      Dan Williams committed
      Dan reports:
          The patch 62232e45: "libnvdimm: control (ioctl) messages for
          nvdimm_bus and nvdimm devices" from Jun 8, 2015, leads to the
          following static checker warning:
      
                  drivers/nvdimm/bus.c:1018 __nd_ioctl()
                  warn: integer overflows 'buf_len'
      
          From a casual review, this seems like it might be a real bug.  On
          the first iteration we load some data into in_env[].  On the second
          iteration we read a user controlled "in_size" from nd_cmd_in_size().
          It can go up to UINT_MAX - 1.  A high number means we will fill the
          whole in_env[] buffer.  But we potentially keep looping and adding
          more to in_len so now it can be any value.
      
          It's simple enough to change, but it feels weird that we keep looping
          even though in_env is totally full.  Shouldn't we just return an
          error if we don't have space for desc->in_num?
      
      We keep looping because the size of the total input is allowed to be
      bigger than the 'envelope' which is a subset of the payload that tells
      us how much data to expect. For safety explicitly check that buf_len
      does not overflow which is what the checker flagged.
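
As a rough illustration of the hazard, the Python sketch below emulates 32-bit wraparound when summing user-controlled sizes and contrasts it with an explicit bound check. The 4 MiB limit and all names are assumptions for this example, not the values or identifiers used in the driver.

```python
U32_MAX = 0xFFFFFFFF
MAX_BUFLEN = 4 * 1024 * 1024  # assumed upper bound on the total buffer size


def total_len_wrapping(sizes):
    """Sum sizes the way an unchecked 32-bit accumulator would."""
    total = 0
    for size in sizes:
        total = (total + size) & U32_MAX   # silently wraps past 2**32
    return total


def total_len_checked(sizes, limit=MAX_BUFLEN):
    """Sum sizes, rejecting the request as soon as the total exceeds limit."""
    total = 0
    for size in sizes:
        total += size
        if total > limit:
            raise OverflowError("requested buffer length exceeds the limit")
    return total
```

For example, `total_len_wrapping([U32_MAX - 1, 16])` yields 14, a tiny length that no longer reflects the request, while `total_len_checked` rejects the same input.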
      
      Cc: <stable@vger.kernel.org>
      Fixes: 62232e45: "libnvdimm: control (ioctl) messages for nvdimm_bus..."
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • libnvdimm, nd_blk: remove mmio_flush_range() · 5deb67f7
      Robin Murphy committed
      mmio_flush_range() suffers from a lack of clearly-defined semantics,
      and is somewhat ambiguous to port to other architectures where the
      scope of the writeback implied by "flush" and ordering might matter,
      but MMIO would tend to imply non-cacheable anyway. Per the rationale
      in 67a3e8fe ("nd_blk: change aperture mapping from WC to WB"), the
      only existing use is actually to invalidate clean cache lines for
      ARCH_MEMREMAP_PMEM type mappings *without* writeback. Since the recent
      cleanup of the pmem API, that also now happens to be the exact purpose
      of arch_invalidate_pmem(), which would be a far more well-defined tool
      for the job.
      
      Rather than risk potentially inconsistent implementations of
      mmio_flush_range() for the sake of one callsite, streamline things by
      removing it entirely and instead move the ARCH_MEMREMAP_PMEM related
      definitions up to the libnvdimm level, so they can be shared by NFIT
      as well. This allows NFIT to be enabled for arm64.
      
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • libnvdimm, btt: rework error clearing · d9b83c75
      Vishal Verma committed
      Clearing errors or badblocks during a BTT write requires sending an ACPI
      DSM, which means potentially sleeping. Since a BTT IO happens in atomic
      context (preemption disabled, spinlocks may be held), we cannot perform
      error clearing in the course of an IO. Due to this, error clearing for
      BTT IOs has hitherto been disabled.
      
      In this patch we move error clearing out of the atomic section, and thus
      re-enable error clearing with BTTs. When we are about to add a block to
      the free list, we check if it was previously marked as an error, and if
      it was, we add it to the freelist, but also set a flag that says error
      clearing will be required. We then drop the lane (ending the atomic
      context), and send a zero buffer so that the error can be cleared. The
      error flag in the free list is protected by the nd 'lane', and is set
      only by a thread while it holds that lane. When the error is cleared,
      the flag is cleared, but while holding a mutex for that freelist index.
      
      When writing, we check for two things -
      1/ If the freelist mutex is held or if the error flag is set. If so,
      this is an error block that is being (or about to be) cleared.
      2/ If the block is a known badblock based on nsio->bb
      
      The second check is required because the BTT map error flag for a map
      entry only gets set when an error LBA is read. If we write to a new
      location that may not have the map error flag set, but still might be in
      the region's badblock list, we can trigger an EIO on the write, which is
      undesirable and completely avoidable.
      
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • libnvdimm: fix potential deadlock while clearing errors · 0930a750
      Vishal Verma committed
      With the ACPI NFIT 'DSM' methods, acpi can be called from IO paths.
      Specifically, the DSM to clear media errors is called during writes, so
      that we can provide a writes-fix-errors model.
      
      However it is easy to imagine a scenario like:
       -> write through the nvdimm driver
         -> acpi allocation
           -> writeback, causes more IO through the nvdimm driver
             -> deadlock
      
      Fix this by using memalloc_noio_{save,restore}, which sets the GFP_NOIO
      flag for the current scope when issuing commands/IOs that are expected
      to clear errors.
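
The save/restore scoping idea can be illustrated with a small Python analogue. This is not the kernel API: the context manager, the thread-local flag, and the helper below are hypothetical stand-ins that only mimic how a scoped "no IO" marker nests and is restored.

```python
import threading
from contextlib import contextmanager

_state = threading.local()


@contextmanager
def noio_scope():
    """Mark the current thread as 'no IO allowed' for the enclosed block."""
    previous = getattr(_state, "noio", False)   # save the outer setting
    _state.noio = True
    try:
        yield
    finally:
        _state.noio = previous                  # restore, so scopes nest


def allocation_may_do_io() -> bool:
    """Report whether an allocation in this thread could start new IO."""
    return not getattr(_state, "noio", False)
```

Because the previous value is saved and restored, leaving an inner scope does not clear the restriction imposed by an outer one.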
      
      Cc: <linux-acpi@vger.kernel.org>
      Cc: <linux-nvdimm@lists.01.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Robert Moore <robert.moore@intel.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • libnvdimm, btt: cache sector_size in arena_info · 75892004
      Vishal Verma committed
      In preparation for the error clearing rework, add sector_size in the
      arena_info struct.
      
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • libnvdimm, btt: ensure that flags were also unchanged during a map_read · 1398199d
      Vishal Verma committed
      In btt_map_read, we read the map twice to make sure that the map entry
      didn't change after we added it to the read tracking table. In
      anticipation of expanding the use of the error bit, also make sure that
      the error and zero flags are constant across the two map reads.
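
The "read twice and compare" idea, extended to cover the flags, might look like the following Python sketch. The tuple layout `(lba, zero_flag, error_flag)`, the retry limit, and the function name are assumptions for illustration; the real driver logic also involves the read tracking table.

```python
def stable_map_read(read_entry, max_tries=8):
    """Re-read a map entry until two consecutive reads agree.

    read_entry is a callable returning (lba, zero_flag, error_flag).
    Comparing whole tuples means a change in either flag also forces a
    retry, not just a change in the mapped block.
    """
    previous = read_entry()
    for _ in range(max_tries):
        current = read_entry()
        if current == previous:
            return current
        previous = current
    raise RuntimeError("map entry kept changing")
```

A caller would pass a function that reads the entry for one logical block. If only the error flag changes between the first two reads, the helper retries and returns the later, stable value.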
      
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>