1. 05 9月, 2017 1 次提交
    • M
      libnvdimm, nfit: move the check on nd_reserved2 to the endpoint · 9edcad53
      Meng Xu 提交于
      Delay the check of nd_reserved2 to the actual endpoint (acpi_nfit_ctl)
      that uses it, as a prevention of a potential double-fetch bug.
      
      While examining the kernel source code, I found a dangerous operation that
      could turn into a double-fetch situation (a race condition bug) where
      the same userspace memory region are fetched twice into kernel with sanity
      checks after the first fetch while missing checks after the second fetch.
      
      In the case of _IOC_NR(ioctl_cmd) == ND_CMD_CALL:
      
      1. The first fetch happens in line 935 copy_from_user(&pkg, p, sizeof(pkg)
      
      2. subsequently `pkg.nd_reserved2` is asserted to be all zeroes
      (line 984 to 986).
      
      3. The second fetch happens in line 1022 copy_from_user(buf, p, buf_len)
      
      4. Given that `p` can be fully controlled in userspace, an attacker can
      race condition to override the header part of `p`, say,
      `((struct nd_cmd_pkg *)p)->nd_reserved2` to arbitrary value
      (say nine 0xFFFFFFFF for `nd_reserved2`) after the first fetch but before the
      second fetch. The changed value will be copied to `buf`.
      
      5. There is no checks on the second fetches until the use of it in
      line 1034: nd_cmd_clear_to_send(nvdimm_bus, nvdimm, cmd, buf) and
      line 1038: nd_desc->ndctl(nd_desc, nvdimm, cmd, buf, buf_len, &cmd_rc)
      which means that the assumed relation, `p->nd_reserved2` are all zeroes might
      not hold after the second fetch. And once the control goes to these functions
      we lose the context to assert the assumed relation.
      
      6. Based on my manual analysis, `p->nd_reserved2` is not used in function
      `nd_cmd_clear_to_send` and potential implementations of `nd_desc->ndctl`
      so there is no working exploit against it right now. However, this could
      easily turns to an exploitable one if careless developers start to use
      `p->nd_reserved2` later and assume that they are all zeroes.
      
      Move the validation of the nd_reserved2 field to the ->ndctl()
      implementation where it has a stable buffer to evaluate.
      Signed-off-by: NMeng Xu <mengxu.gatech@gmail.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      9edcad53
  2. 01 9月, 2017 2 次提交
    • R
      libnvdimm, nd_blk: remove mmio_flush_range() · 5deb67f7
      Robin Murphy 提交于
      mmio_flush_range() suffers from a lack of clearly-defined semantics,
      and is somewhat ambiguous to port to other architectures where the
      scope of the writeback implied by "flush" and ordering might matter,
      but MMIO would tend to imply non-cacheable anyway. Per the rationale
      in 67a3e8fe ("nd_blk: change aperture mapping from WC to WB"), the
      only existing use is actually to invalidate clean cache lines for
      ARCH_MEMREMAP_PMEM type mappings *without* writeback. Since the recent
      cleanup of the pmem API, that also now happens to be the exact purpose
      of arch_invalidate_pmem(), which would be a far more well-defined tool
      for the job.
      
      Rather than risk potentially inconsistent implementations of
      mmio_flush_range() for the sake of one callsite, streamline things by
      removing it entirely and instead move the ARCH_MEMREMAP_PMEM related
      definitions up to the libnvdimm level, so they can be shared by NFIT
      as well. This allows NFIT to be enabled for arm64.
      Signed-off-by: NRobin Murphy <robin.murphy@arm.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      5deb67f7
    • D
      libnvdimm, nfit: export an 'ecc_unit_size' sysfs attribute · a15797f4
      Dan Williams 提交于
      When the nfit driver initializes it runs an ARS (Address Range Scrub)
      operation across every pmem range. Part of that process involves
      determining the ARS capabilities of a given address range. One of the
      capabilities that is reported is the 'Clear Uncorrectable Error Range
      Length Unit Size' (see: ACPI 6.2 section 9.20.7.4 Function Index 1 -
      Query ARS Capabilities). This property is of interest to userspace
      software as it indicates the boundary at which the NVDIMM may need to
      perform read-modify-write cycles to maintain ECC blocks.
      
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      a15797f4
  3. 08 8月, 2017 1 次提交
  4. 05 8月, 2017 1 次提交
  5. 18 7月, 2017 1 次提交
  6. 03 7月, 2017 1 次提交
  7. 01 7月, 2017 3 次提交
  8. 30 6月, 2017 1 次提交
  9. 28 6月, 2017 2 次提交
    • D
      libnvdimm, nfit: enable support for volatile ranges · c9e582aa
      Dan Williams 提交于
      Allow volatile nfit ranges to participate in all the same infrastructure
      provided for persistent memory regions. A resulting resulting namespace
      device will still be called "pmem", but the parent region type will be
      "nd_volatile". This is in preparation for disabling the dax ->flush()
      operation in the pmem driver when it is hosted on a volatile range.
      
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      c9e582aa
    • D
      x86, libnvdimm, pmem: remove global pmem api · ca6a4657
      Dan Williams 提交于
      Now that all callers of the pmem api have been converted to dax helpers that
      call back to the pmem driver, we can remove include/linux/pmem.h and
      asm/pmem.h.
      
      Cc: <x86@kernel.org>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Oliver O'Halloran <oohall@gmail.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      ca6a4657
  10. 16 6月, 2017 4 次提交
  11. 10 6月, 2017 1 次提交
    • D
      x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations · 0aed55af
      Dan Williams 提交于
      The pmem driver has a need to transfer data with a persistent memory
      destination and be able to rely on the fact that the destination writes are not
      cached. It is sufficient for the writes to be flushed to a cpu-store-buffer
      (non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync()
      to ensure data-writes have reached a power-fail-safe zone in the platform. The
      fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver which will turn
      around and fence previous writes with an "sfence".
      
      Implement a __copy_from_user_inatomic_flushcache, memcpy_page_flushcache, and
      memcpy_flushcache, that guarantee that the destination buffer is not dirty in
      the cpu cache on completion. The new copy_from_iter_flushcache and sub-routines
      will be used to replace the "pmem api" (include/linux/pmem.h +
      arch/x86/include/asm/pmem.h). The availability of copy_from_iter_flushcache()
      and memcpy_flushcache() are gated by the CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
      config symbol, and fallback to copy_from_iter_nocache() and plain memcpy()
      otherwise.
      
      This is meant to satisfy the concern from Linus that if a driver wants to do
      something beyond the normal nocache semantics it should be something private to
      that driver [1], and Al's concern that anything uaccess related belongs with
      the rest of the uaccess code [2].
      
      The first consumer of this interface is a new 'copy_from_iter' dax operation so
      that pmem can inject cache maintenance operations without imposing this
      overhead on other dax-capable drivers.
      
      [1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html
      [2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html
      
      Cc: <x86@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      0aed55af
  12. 07 6月, 2017 1 次提交
  13. 06 6月, 2017 1 次提交
  14. 05 5月, 2017 2 次提交
  15. 29 4月, 2017 1 次提交
    • D
      acpi, nfit: kill ACPI_NFIT_DEBUG · 7699a6a3
      Dan Williams 提交于
      Inevitably when one actually needs to debug a DSM issue it's on a
      distribution kernel that has CONFIG_ACPI_NFIT_DEBUG=n. The config symbol
      was only there to avoid the compile error due to the missing fallback for
      print_hex_dump_debug in the CONFIG_DYNAMIC_DEBUG=n case. That was fixed
      with commit cdf17449 "hexdump: do not print debug dumps for
      !CONFIG_DEBUG", so the config symbol can just be dropped.
      
      Cc: Joe Perches <joe@perches.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      7699a6a3
  16. 26 4月, 2017 1 次提交
    • D
      x86, dax, pmem: remove indirection around memcpy_from_pmem() · 6abccd1b
      Dan Williams 提交于
      memcpy_from_pmem() maps directly to memcpy_mcsafe(). The wrapper
      serves no real benefit aside from affording a more generic function name
      than the x86-specific 'mcsafe'. However this would not be the first time
      that x86 terminology leaked into the global namespace. For lack of
      better name, just use memcpy_mcsafe() directly.
      
      This conversion also catches a place where we should have been using
      plain memcpy, acpi_nfit_blk_single_io().
      
      Cc: <x86@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Acked-by: NTony Luck <tony.luck@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      6abccd1b
  17. 19 4月, 2017 1 次提交
  18. 18 4月, 2017 3 次提交
  19. 15 4月, 2017 1 次提交
  20. 13 4月, 2017 4 次提交
  21. 28 3月, 2017 1 次提交
  22. 01 3月, 2017 1 次提交
    • D
      nfit, libnvdimm: fix interleave set cookie calculation · 86ef58a4
      Dan Williams 提交于
      The interleave-set cookie is a sum that sanity checks the composition of
      an interleave set has not changed from when the namespace was initially
      created.  The checksum is calculated by sorting the DIMMs by their
      location in the interleave-set. The comparison for the sort must be
      64-bit wide, not byte-by-byte as performed by memcmp() in the broken
      case.
      
      Fix the implementation to accept correct cookie values in addition to
      the Linux "memcmp" order cookies, but only allow correct cookies to be
      generated going forward. It does mean that namespaces created by
      third-party-tooling, or created by newer kernels with this fix, will not
      validate on older kernels. However, there are a couple mitigating
      conditions:
      
          1/ platforms with namespace-label capable NVDIMMs are not widely
             available.
      
          2/ interleave-sets with a single-dimm are by definition not affected
             (nothing to sort). This covers the QEMU-KVM NVDIMM emulation case.
      
      The cookie stored in the namespace label will be fixed by any write the
      namespace label, the most straightforward way to achieve this is to
      write to the "alt_name" attribute of a namespace in sysfs.
      
      Cc: <stable@vger.kernel.org>
      Fixes: eaf96153 ("libnvdimm, nfit: add interleave-set state-tracking infrastructure")
      Reported-by: NNicholas Moulin <nicholas.w.moulin@linux.intel.com>
      Tested-by: NNicholas Moulin <nicholas.w.moulin@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      86ef58a4
  23. 04 2月, 2017 1 次提交
    • D
      acpi, nfit: fix acpi_nfit_flush_probe() crash · e471486c
      Dan Williams 提交于
      We queue an on-stack work item to 'nfit_wq' and wait for it to complete
      as part of a 'flush_probe' request. However, if the user cancels the
      wait we need to make sure the item is flushed from the queue otherwise
      we are leaving an out-of-scope stack address on the work list.
      
       BUG: unable to handle kernel paging request at ffffbcb3c72f7cd0
       IP: [<ffffffffa9413a7b>] __list_add+0x1b/0xb0
       [..]
       RIP: 0010:[<ffffffffa9413a7b>]  [<ffffffffa9413a7b>] __list_add+0x1b/0xb0
       RSP: 0018:ffffbcb3c7ba7c00  EFLAGS: 00010046
       [..]
       Call Trace:
        [<ffffffffa90bb11a>] insert_work+0x3a/0xc0
        [<ffffffffa927fdda>] ? seq_open+0x5a/0xa0
        [<ffffffffa90bb30a>] __queue_work+0x16a/0x460
        [<ffffffffa90bbb08>] queue_work_on+0x38/0x40
        [<ffffffffc0cf2685>] acpi_nfit_flush_probe+0x95/0xc0 [nfit]
        [<ffffffffc0cf25d0>] ? nfit_visible+0x40/0x40 [nfit]
        [<ffffffffa9571495>] wait_probe_show+0x25/0x60
        [<ffffffffa9546b30>] dev_attr_show+0x20/0x50
      
      Fixes: 7ae0fa43 ("nfit, libnvdimm: async region scrub workqueue")
      Cc: <stable@vger.kernel.org>
      Reviewed-by: NVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      e471486c
  24. 21 12月, 2016 1 次提交
    • L
      ACPI / osl: Remove acpi_get_table_with_size()/early_acpi_os_unmap_memory() users · 6b11d1d6
      Lv Zheng 提交于
      This patch removes the users of the deprectated APIs:
       acpi_get_table_with_size()
       early_acpi_os_unmap_memory()
      The following APIs should be used instead of:
       acpi_get_table()
       acpi_put_table()
      
      The deprecated APIs are invented to be a replacement of acpi_get_table()
      during the early stage so that the early mapped pointer will not be stored
      in ACPICA core and thus the late stage acpi_get_table() won't return a
      wrong pointer. The mapping size is returned just because it is required by
      early_acpi_os_unmap_memory() to unmap the pointer during early stage.
      
      But as the mapping size equals to the acpi_table_header.length
      (see acpi_tb_init_table_descriptor() and acpi_tb_validate_table()), when
      such a convenient result is returned, driver code will start to use it
      instead of accessing acpi_table_header to obtain the length.
      
      Thus this patch cleans up the drivers by replacing returned table size with
      acpi_table_header.length, and should be a no-op.
      Reported-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NLv Zheng <lv.zheng@intel.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      6b11d1d6
  25. 07 12月, 2016 3 次提交
    • D
      tools/testing/nvdimm: unit test acpi_nfit_ctl() · a7de92da
      Dan Williams 提交于
      A recent flurry of bug discoveries in the nfit driver's DSM marshalling
      routine has highlighted the fact that we do not have unit test coverage
      for this routine. Add a self-test of acpi_nfit_ctl() routine before
      probing the "nfit_test.0" device. This mocks stimulus to acpi_nfit_ctl()
      and if any of the tests fail "nfit_test.0" will be unavailable causing
      the rest of the tests to not run / fail.
      
      This unit test will also be a place to land reproductions of quirky BIOS
      behavior discovered in the field and ensure the kernel does not regress
      against implementations it has seen in practice.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      a7de92da
    • D
      acpi, nfit: fix bus vs dimm confusion in xlat_status · d6eb270c
      Dan Williams 提交于
      Given dimms and bus commands share the same command number space we need
      to be careful that we are translating status in the correct context.
      Otherwise we can, for example, fail an ND_CMD_GET_CONFIG_SIZE command
      because max_xfer is zero. It fails because that condition erroneously
      correlates with the 'cleared == 0' failure of ND_CMD_CLEAR_ERROR.
      
      Cc: <stable@vger.kernel.org>
      Fixes: aef25338 ("libnvdimm, nfit: centralize command status translation")
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      d6eb270c
    • D
      acpi, nfit: validate ars_status output buffer size · 82aa37cf
      Dan Williams 提交于
      If an ARS Status command returns truncated output, do not process
      partial records or otherwise consume non-status fields.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 0caeef63 ("libnvdimm: Add a poison list and export badblocks")
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      82aa37cf