1. 07 4月, 2018 2 次提交
    • D
      nfit, address-range-scrub: rework and simplify ARS state machine · bc6ba808
      Dan Williams 提交于
      ARS is an operation that can take 10s to 100s of seconds to find media
      errors that should rarely be present. If the platform crashes due to
      media errors in persistent memory, the expectation is that the BIOS will
      report those known errors in a 'short' ARS request.
      
      A 'short' ARS request asks platform firmware to return an ARS payload
      with all known errors, but without issuing a 'long' scrub. At driver
      init a short request is issued to all PMEM ranges before registering
      regions. Then, in the background, a long ARS is scheduled for each
      region.
      
      The ARS implementation is simplified to centralize ARS completion work
      in the ars_complete() helper. The timeout is removed since there is no
      facility to cancel ARS, and this otherwise arranges for system init to
      never be blocked waiting for a 'long' ARS. The ars_state flags are used
      to coordinate ARS requests from driver init, ARS requests from
      userspace, and ARS requests in response to media error notifications.
      
      Given that there is no notification of ARS completion the implementation
      still needs to poll. It backs off exponentially to a maximum poll period
      of 30 minutes.
      Suggested-by: NToshi Kani <toshi.kani@hpe.com>
      Co-developed-by: NDave Jiang <dave.jiang@intel.com>
      Signed-off-by: NDave Jiang <dave.jiang@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      bc6ba808
    • D
      nfit, address-range-scrub: determine one platform max_ars value · 459d0ddb
      Dan Williams 提交于
      acpi_nfit_query_poison() is awkward in that it requires an nfit_spa
      argument in order to determine what max_ars value to use. Instead probe
      for the minimum max_ars across all scrub-capable ranges in the system
      and drop the nfit_spa argument.
      
      This enables a larger rework / simplification of the ARS state machine
      whereby the status can be retrieved once and then iterated over all
      address ranges to reap completions.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      459d0ddb
  2. 06 4月, 2018 1 次提交
  3. 04 4月, 2018 1 次提交
  4. 03 4月, 2018 1 次提交
  5. 29 3月, 2018 1 次提交
    • D
      acpi, nfit: rework NVDIMM leaf method detection · 466d1493
      Dan Williams 提交于
      Some BIOSen do not handle 0-byte transfer lengths for the _LSR and _LSW
      (label storage read/write) methods. This causes Linux to fallback to the
      deprecated _DSM path, or otherwise disable label support.
      
      Introduce acpi_nvdimm_has_method() to detect whether a method is
      available rather than calling the method, require _LSI and _LSR to be
      paired, and require read support before enabling write support.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 4b27db7e ("acpi, nfit: add support for the _LS...")
      Suggested-by: NErik Schmauss <erik.schmauss@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      466d1493
  6. 22 3月, 2018 1 次提交
  7. 06 3月, 2018 1 次提交
  8. 03 2月, 2018 1 次提交
  9. 02 2月, 2018 2 次提交
  10. 05 12月, 2017 1 次提交
    • D
      acpi, nfit: fix health event notification · adf68957
      Dan Williams 提交于
      Integration testing with a BIOS that generates injected health event
      notifications fails to communicate those events to userspace. The nfit
      driver neglects to link the ACPI DIMM device with the necessary driver
      data so acpi_nvdimm_notify() fails this lookup:
      
              nfit_mem = dev_get_drvdata(dev);
              if (nfit_mem && nfit_mem->flags_attr)
                      sysfs_notify_dirent(nfit_mem->flags_attr);
      
      Add the necessary linkage when installing the notification handler and
      clean it up when the nfit driver instance is torn down.
      
      Cc: <stable@vger.kernel.org>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Fixes: ba9c8dd3 ("acpi, nfit: add dimm device notification support")
      Reported-by: NDaniel Osawa <daniel.k.osawa@intel.com>
      Tested-by: NDaniel Osawa <daniel.k.osawa@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      adf68957
  11. 13 11月, 2017 1 次提交
    • D
      acpi, nfit: validate commands against the device type · 0e7f0741
      Dan Williams 提交于
      Fix occasions in acpi_nfit_ctl where we check the command type without
      validating whether we are parsing dimm vs bus level commands. Where the
      command numbers alias between dimms and bus we can make the wrong
      assumption just checking the raw command number. For example, with a
      simple nfit_test mock up of the clear-error command we trigger the
      following:
      
          BUG: unable to handle kernel NULL pointer dereference at 0000000000000094
          IP: acpi_nfit_ctl+0x29b/0x930 [nfit]
          [..]
          Call Trace:
           nfit_test_probe+0xb85/0xc09 [nfit_test]
           platform_drv_probe+0x3b/0xa0
           ? platform_drv_probe+0x3b/0xa0
           driver_probe_device+0x29c/0x450
           ? test_alloc+0x180/0x180 [nfit_test]
           __driver_attach+0xe3/0xf0
           ? driver_probe_device+0x450/0x450
           bus_for_each_dev+0x73/0xc0
           driver_attach+0x1e/0x20
           bus_add_driver+0x173/0x270
           driver_register+0x60/0xe0
           __platform_driver_register+0x36/0x40
           nfit_test_init+0x2a1/0x1000 [nfit_test]
      
      Fixes: 4b27db7e ("acpi, nfit: add support for the _LSI, _LSR, and...")
      Reported-by: NVishal Verma <vishal.l.verma@intel.com>
      Tested-by: NVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      0e7f0741
  12. 03 11月, 2017 1 次提交
  13. 31 10月, 2017 1 次提交
  14. 30 10月, 2017 1 次提交
  15. 08 10月, 2017 2 次提交
  16. 05 9月, 2017 1 次提交
    • M
      libnvdimm, nfit: move the check on nd_reserved2 to the endpoint · 9edcad53
      Meng Xu 提交于
      Delay the check of nd_reserved2 to the actual endpoint (acpi_nfit_ctl)
      that uses it, as a prevention of a potential double-fetch bug.
      
      While examining the kernel source code, I found a dangerous operation that
      could turn into a double-fetch situation (a race condition bug) where
      the same userspace memory region are fetched twice into kernel with sanity
      checks after the first fetch while missing checks after the second fetch.
      
      In the case of _IOC_NR(ioctl_cmd) == ND_CMD_CALL:
      
      1. The first fetch happens in line 935 copy_from_user(&pkg, p, sizeof(pkg)
      
      2. subsequently `pkg.nd_reserved2` is asserted to be all zeroes
      (line 984 to 986).
      
      3. The second fetch happens in line 1022 copy_from_user(buf, p, buf_len)
      
      4. Given that `p` can be fully controlled in userspace, an attacker can
      race condition to override the header part of `p`, say,
      `((struct nd_cmd_pkg *)p)->nd_reserved2` to arbitrary value
      (say nine 0xFFFFFFFF for `nd_reserved2`) after the first fetch but before the
      second fetch. The changed value will be copied to `buf`.
      
      5. There is no checks on the second fetches until the use of it in
      line 1034: nd_cmd_clear_to_send(nvdimm_bus, nvdimm, cmd, buf) and
      line 1038: nd_desc->ndctl(nd_desc, nvdimm, cmd, buf, buf_len, &cmd_rc)
      which means that the assumed relation, `p->nd_reserved2` are all zeroes might
      not hold after the second fetch. And once the control goes to these functions
      we lose the context to assert the assumed relation.
      
      6. Based on my manual analysis, `p->nd_reserved2` is not used in function
      `nd_cmd_clear_to_send` and potential implementations of `nd_desc->ndctl`
      so there is no working exploit against it right now. However, this could
      easily turns to an exploitable one if careless developers start to use
      `p->nd_reserved2` later and assume that they are all zeroes.
      
      Move the validation of the nd_reserved2 field to the ->ndctl()
      implementation where it has a stable buffer to evaluate.
      Signed-off-by: NMeng Xu <mengxu.gatech@gmail.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      9edcad53
  17. 01 9月, 2017 2 次提交
    • R
      libnvdimm, nd_blk: remove mmio_flush_range() · 5deb67f7
      Robin Murphy 提交于
      mmio_flush_range() suffers from a lack of clearly-defined semantics,
      and is somewhat ambiguous to port to other architectures where the
      scope of the writeback implied by "flush" and ordering might matter,
      but MMIO would tend to imply non-cacheable anyway. Per the rationale
      in 67a3e8fe ("nd_blk: change aperture mapping from WC to WB"), the
      only existing use is actually to invalidate clean cache lines for
      ARCH_MEMREMAP_PMEM type mappings *without* writeback. Since the recent
      cleanup of the pmem API, that also now happens to be the exact purpose
      of arch_invalidate_pmem(), which would be a far more well-defined tool
      for the job.
      
      Rather than risk potentially inconsistent implementations of
      mmio_flush_range() for the sake of one callsite, streamline things by
      removing it entirely and instead move the ARCH_MEMREMAP_PMEM related
      definitions up to the libnvdimm level, so they can be shared by NFIT
      as well. This allows NFIT to be enabled for arm64.
      Signed-off-by: NRobin Murphy <robin.murphy@arm.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      5deb67f7
    • D
      libnvdimm, nfit: export an 'ecc_unit_size' sysfs attribute · a15797f4
      Dan Williams 提交于
      When the nfit driver initializes it runs an ARS (Address Range Scrub)
      operation across every pmem range. Part of that process involves
      determining the ARS capabilities of a given address range. One of the
      capabilities that is reported is the 'Clear Uncorrectable Error Range
      Length Unit Size' (see: ACPI 6.2 section 9.20.7.4 Function Index 1 -
      Query ARS Capabilities). This property is of interest to userspace
      software as it indicates the boundary at which the NVDIMM may need to
      perform read-modify-write cycles to maintain ECC blocks.
      
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      a15797f4
  18. 29 8月, 2017 1 次提交
    • B
      acpi/nfit: Fix COMPLETION_INITIALIZER_ONSTACK() abuse · 1c322ac0
      Boqun Feng 提交于
      COMPLETION_INITIALIZER_ONSTACK() is supposed to be used as an initializer,
      in other words, it should only be used in assignment expressions or
      compound literals. So the usage in drivers/acpi/nfit/core.c:
      
      	COMPLETION_INITIALIZER_ONSTACK(flush.cmp);
      
      ... is inappropriate.
      
      Besides, this usage could also break the build for another fix that
      reduces stack sizes caused by COMPLETION_INITIALIZER_ONSTACK(), because
      that fix changes COMPLETION_INITIALIZER_ONSTACK() from rvalue to lvalue,
      and usage as above will report the following error:
      
      	drivers/acpi/nfit/core.c: In function 'acpi_nfit_flush_probe':
      	include/linux/completion.h:77:3: error: value computed is not used [-Werror=unused-value]
      	  (*({ init_completion(&work); &work; }))
      
      This patch fixes this by replacing COMPLETION_INITIALIZER_ONSTACK()
      with init_completion() in acpi_nfit_flush_probe(), which does the
      same initialization without any other problems.
      Signed-off-by: NBoqun Feng <boqun.feng@gmail.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NDan Williams <dan.j.williams@intel.com>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: walken@google.com
      Cc: willy@infradead.org
      Link: http://lkml.kernel.org/r/20170824142239.15178-1-boqun.feng@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1c322ac0
  19. 08 8月, 2017 1 次提交
  20. 05 8月, 2017 1 次提交
  21. 18 7月, 2017 1 次提交
  22. 03 7月, 2017 1 次提交
  23. 01 7月, 2017 3 次提交
  24. 30 6月, 2017 1 次提交
  25. 28 6月, 2017 2 次提交
    • D
      libnvdimm, nfit: enable support for volatile ranges · c9e582aa
      Dan Williams 提交于
      Allow volatile nfit ranges to participate in all the same infrastructure
      provided for persistent memory regions. A resulting resulting namespace
      device will still be called "pmem", but the parent region type will be
      "nd_volatile". This is in preparation for disabling the dax ->flush()
      operation in the pmem driver when it is hosted on a volatile range.
      
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      c9e582aa
    • D
      x86, libnvdimm, pmem: remove global pmem api · ca6a4657
      Dan Williams 提交于
      Now that all callers of the pmem api have been converted to dax helpers that
      call back to the pmem driver, we can remove include/linux/pmem.h and
      asm/pmem.h.
      
      Cc: <x86@kernel.org>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Oliver O'Halloran <oohall@gmail.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      ca6a4657
  26. 16 6月, 2017 4 次提交
  27. 10 6月, 2017 1 次提交
    • D
      x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations · 0aed55af
      Dan Williams 提交于
      The pmem driver has a need to transfer data with a persistent memory
      destination and be able to rely on the fact that the destination writes are not
      cached. It is sufficient for the writes to be flushed to a cpu-store-buffer
      (non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync()
      to ensure data-writes have reached a power-fail-safe zone in the platform. The
      fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver which will turn
      around and fence previous writes with an "sfence".
      
      Implement a __copy_from_user_inatomic_flushcache, memcpy_page_flushcache, and
      memcpy_flushcache, that guarantee that the destination buffer is not dirty in
      the cpu cache on completion. The new copy_from_iter_flushcache and sub-routines
      will be used to replace the "pmem api" (include/linux/pmem.h +
      arch/x86/include/asm/pmem.h). The availability of copy_from_iter_flushcache()
      and memcpy_flushcache() are gated by the CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
      config symbol, and fallback to copy_from_iter_nocache() and plain memcpy()
      otherwise.
      
      This is meant to satisfy the concern from Linus that if a driver wants to do
      something beyond the normal nocache semantics it should be something private to
      that driver [1], and Al's concern that anything uaccess related belongs with
      the rest of the uaccess code [2].
      
      The first consumer of this interface is a new 'copy_from_iter' dax operation so
      that pmem can inject cache maintenance operations without imposing this
      overhead on other dax-capable drivers.
      
      [1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html
      [2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html
      
      Cc: <x86@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      0aed55af
  28. 07 6月, 2017 1 次提交
  29. 06 6月, 2017 1 次提交
  30. 05 5月, 2017 1 次提交