1. 30 4月, 2020 2 次提交
  2. 15 1月, 2020 1 次提交
    • D
      acpi/nfit, device-dax: Identify differentiated memory with a unique numa-node · a0a4e71f
      Dan Williams 提交于
      commit 8fc5c73554db0ac18c0c6ac5b2099ab917f83bdf upstream
      
      Persistent memory, as described by the ACPI NFIT (NVDIMM Firmware
      Interface Table), is the first known instance of a memory range
      described by a unique "target" proximity domain. Where "initiator" and
      "target" proximity domains is an approach that the ACPI HMAT
      (Heterogeneous Memory Attributes Table) uses to described the unique
      performance properties of a memory range relative to a given initiator
      (e.g. CPU or DMA device).
      
      Currently the numa-node for a /dev/pmemX block-device or /dev/daxX.Y
      char-device follows the traditional notion of 'numa-node' where the
      attribute conveys the closest online numa-node. That numa-node attribute
      is useful for cpu-binding and memory-binding processes *near* the
      device. However, when the memory range backing a 'pmem', or 'dax' device
      is onlined (memory hot-add) the memory-only-numa-node representing that
      address needs to be differentiated from the set of online nodes. In
      other words, the numa-node association of the device depends on whether
      you can bind processes *near* the cpu-numa-node in the offline
      device-case, or bind process *on* the memory-range directly after the
      backing address range is onlined.
      
      Allow for the case that platform firmware describes persistent memory
      with a unique proximity domain, i.e. when it is distinct from the
      proximity of DRAM and CPUs that are on the same socket. Plumb the Linux
      numa-node translation of that proximity through the libnvdimm region
      device to namespaces that are in device-dax mode. With this in place the
      proposed kmem driver [1] can optionally discover a unique numa-node
      number for the address range as it transitions the memory from an
      offline state managed by a device-driver to an online memory range
      managed by the core-mm.
      
      [1]: https://lore.kernel.org/lkml/20181022201317.8558C1D8@viggo.jf.intel.comReported-by: NFan Du <fan.du@intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Oliver O'Halloran" <oohall@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Reviewed-by: NYang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      [yshi: Removed PowerPC stuff which is not applicable 4.19]
      Signed-off-by: NYang Shi <yang.shi@linux.alibaba.com>
      Reviewed-by: NGavin Shan <shan.gavin@linux.alibaba.com>
      a0a4e71f
  3. 12 10月, 2019 1 次提交
    • A
      libnvdimm/region: Initialize bad block for volatile namespaces · 2e93d24a
      Aneesh Kumar K.V 提交于
      [ Upstream commit c42adf87e4e7ed77f6ffe288dc90f980d07d68df ]
      
      We do check for a bad block during namespace init and that use
      region bad block list. We need to initialize the bad block
      for volatile regions for this to work. We also observe a lockdep
      warning as below because the lock is not initialized correctly
      since we skip bad block init for volatile regions.
      
       INFO: trying to register non-static key.
       the code is fine but needs lockdep annotation.
       turning off the locking correctness validator.
       CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc1-15699-g3dee241c937e #149
       Call Trace:
       [c0000000f95cb250] [c00000000147dd84] dump_stack+0xe8/0x164 (unreliable)
       [c0000000f95cb2a0] [c00000000022ccd8] register_lock_class+0x308/0xa60
       [c0000000f95cb3a0] [c000000000229cc0] __lock_acquire+0x170/0x1ff0
       [c0000000f95cb4c0] [c00000000022c740] lock_acquire+0x220/0x270
       [c0000000f95cb580] [c000000000a93230] badblocks_check+0xc0/0x290
       [c0000000f95cb5f0] [c000000000d97540] nd_pfn_validate+0x5c0/0x7f0
       [c0000000f95cb6d0] [c000000000d98300] nd_dax_probe+0xd0/0x1f0
       [c0000000f95cb760] [c000000000d9b66c] nd_pmem_probe+0x10c/0x160
       [c0000000f95cb790] [c000000000d7f5ec] nvdimm_bus_probe+0x10c/0x240
       [c0000000f95cb820] [c000000000d0f844] really_probe+0x254/0x4e0
       [c0000000f95cb8b0] [c000000000d0fdfc] driver_probe_device+0x16c/0x1e0
       [c0000000f95cb930] [c000000000d10238] device_driver_attach+0x68/0xa0
       [c0000000f95cb970] [c000000000d1040c] __driver_attach+0x19c/0x1c0
       [c0000000f95cb9f0] [c000000000d0c4c4] bus_for_each_dev+0x94/0x130
       [c0000000f95cba50] [c000000000d0f014] driver_attach+0x34/0x50
       [c0000000f95cba70] [c000000000d0e208] bus_add_driver+0x178/0x2f0
       [c0000000f95cbb00] [c000000000d117c8] driver_register+0x108/0x170
       [c0000000f95cbb70] [c000000000d7edb0] __nd_driver_register+0xe0/0x100
       [c0000000f95cbbd0] [c000000001a6baa4] nd_pmem_driver_init+0x34/0x48
       [c0000000f95cbbf0] [c0000000000106f4] do_one_initcall+0x1d4/0x4b0
       [c0000000f95cbcd0] [c0000000019f499c] kernel_init_freeable+0x544/0x65c
       [c0000000f95cbdb0] [c000000000010d6c] kernel_init+0x2c/0x180
       [c0000000f95cbe20] [c00000000000b954] ret_from_kernel_thread+0x5c/0x68
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Link: https://lore.kernel.org/r/20190919083355.26340-1-aneesh.kumar@linux.ibm.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      2e93d24a
  4. 09 8月, 2019 1 次提交
    • D
      libnvdimm/bus: Fix wait_nvdimm_bus_probe_idle() ABBA deadlock · 2364ed0d
      Dan Williams 提交于
      commit ca6bf264f6d856f959c4239cda1047b587745c67 upstream.
      
      A multithreaded namespace creation/destruction stress test currently
      deadlocks with the following lockup signature:
      
          INFO: task ndctl:2924 blocked for more than 122 seconds.
                Tainted: G           OE     5.2.0-rc4+ #3382
          "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
          ndctl           D    0  2924   1176 0x00000000
          Call Trace:
           ? __schedule+0x27e/0x780
           schedule+0x30/0xb0
           wait_nvdimm_bus_probe_idle+0x8a/0xd0 [libnvdimm]
           ? finish_wait+0x80/0x80
           uuid_store+0xe6/0x2e0 [libnvdimm]
           kernfs_fop_write+0xf0/0x1a0
           vfs_write+0xb7/0x1b0
           ksys_write+0x5c/0xd0
           do_syscall_64+0x60/0x240
      
           INFO: task ndctl:2923 blocked for more than 122 seconds.
                 Tainted: G           OE     5.2.0-rc4+ #3382
           "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
           ndctl           D    0  2923   1175 0x00000000
           Call Trace:
            ? __schedule+0x27e/0x780
            ? __mutex_lock+0x489/0x910
            schedule+0x30/0xb0
            schedule_preempt_disabled+0x11/0x20
            __mutex_lock+0x48e/0x910
            ? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
            ? __lock_acquire+0x23f/0x1710
            ? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
            nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
            __dax_pmem_probe+0x5e/0x210 [dax_pmem_core]
            ? nvdimm_bus_probe+0x1d0/0x2c0 [libnvdimm]
            dax_pmem_probe+0xc/0x20 [dax_pmem]
            nvdimm_bus_probe+0x90/0x2c0 [libnvdimm]
            really_probe+0xef/0x390
            driver_probe_device+0xb4/0x100
      
      In this sequence an 'nd_dax' device is being probed and trying to take
      the lock on its backing namespace to validate that the 'nd_dax' device
      indeed has exclusive access to the backing namespace. Meanwhile, another
      thread is trying to update the uuid property of that same backing
      namespace. So one thread is in the probe path trying to acquire the
      lock, and the other thread has acquired the lock and tries to flush the
      probe path.
      
      Fix this deadlock by not holding the namespace device_lock over the
      wait_nvdimm_bus_probe_idle() synchronization step. In turn this requires
      the device_lock to be held on entry to wait_nvdimm_bus_probe_idle() and
      subsequently dropped internally to wait_nvdimm_bus_probe_idle().
      
      Cc: <stable@vger.kernel.org>
      Fixes: bf9bccc1 ("libnvdimm: pmem label sets and namespace instantiation")
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Tested-by: NJane Chu <jane.chu@oracle.com>
      Link: https://lore.kernel.org/r/156341210094.292348.2384694131126767789.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      2364ed0d
  5. 13 12月, 2018 1 次提交
  6. 14 11月, 2018 1 次提交
    • D
      libnvdimm, region: Fail badblocks listing for inactive regions · 8f696986
      Dan Williams 提交于
      commit 5d394eee upstream.
      
      While experimenting with region driver loading the following backtrace
      was triggered:
      
       INFO: trying to register non-static key.
       the code is fine but needs lockdep annotation.
       turning off the locking correctness validator.
       [..]
       Call Trace:
        dump_stack+0x85/0xcb
        register_lock_class+0x571/0x580
        ? __lock_acquire+0x2ba/0x1310
        ? kernfs_seq_start+0x2a/0x80
        __lock_acquire+0xd4/0x1310
        ? dev_attr_show+0x1c/0x50
        ? __lock_acquire+0x2ba/0x1310
        ? kernfs_seq_start+0x2a/0x80
        ? lock_acquire+0x9e/0x1a0
        lock_acquire+0x9e/0x1a0
        ? dev_attr_show+0x1c/0x50
        badblocks_show+0x70/0x190
        ? dev_attr_show+0x1c/0x50
        dev_attr_show+0x1c/0x50
      
      This results from a missing successful call to devm_init_badblocks()
      from nd_region_probe(). Block attempts to show badblocks while the
      region is not enabled.
      
      Fixes: 6a6bef90 ("libnvdimm: add mechanism to publish badblocks...")
      Cc: <stable@vger.kernel.org>
      Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: NDave Jiang <dave.jiang@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8f696986
  7. 26 7月, 2018 2 次提交
  8. 07 6月, 2018 1 次提交
    • R
      libnvdimm, pmem: Do not flush power-fail protected CPU caches · 546eb031
      Ross Zwisler 提交于
      This commit:
      
      5fdf8e5b ("libnvdimm: re-enable deep flush for pmem devices via fsync()")
      
      intended to make sure that deep flush was always available even on
      platforms which support a power-fail protected CPU cache.  An unintended
      side effect of this change was that we also lost the ability to skip
      flushing CPU caches on those power-fail protected CPU cache.
      
      Fix this by skipping the low level cache flushing in dax_flush() if we have
      CPU caches which are power-fail protected.  The user can still override this
      behavior by manually setting the write_cache state of a namespace.  See
      libndctl's ndctl_namespace_write_cache_is_enabled(),
      ndctl_namespace_enable_write_cache() and
      ndctl_namespace_disable_write_cache() functions.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 5fdf8e5b ("libnvdimm: re-enable deep flush for pmem devices via fsync()")
      Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      546eb031
  9. 07 4月, 2018 1 次提交
  10. 04 4月, 2018 1 次提交
  11. 22 3月, 2018 2 次提交
    • D
      libnvdimm, nfit: fix persistence domain reporting · fe9a552e
      Dan Williams 提交于
      The persistence domain is a point in the platform where once writes
      reach that destination the platform claims it will make them persistent
      relative to power loss. In the ACPI NFIT this is currently communicated
      as 2 bits in the "NFIT - Platform Capabilities Structure". The bits
      comprise a hierarchy, i.e. bit0 "CPU Cache Flush to NVDIMM Durability on
      Power Loss Capable" implies bit1 "Memory Controller Flush to NVDIMM
      Durability on Power Loss Capable".
      
      Commit 96c3a239 "libnvdimm: expose platform persistence attr..."
      shows the persistence domain as flags, but it's really an enumerated
      hierarchy.
      
      Fix this newly introduced user ABI to show the closest available
      persistence domain before userspace develops dependencies on seeing, or
      needing to develop code to tolerate, the raw NFIT flags communicated
      through the libnvdimm-generic region attribute.
      
      Fixes: 96c3a239 ("libnvdimm: expose platform persistence attr...")
      Reviewed-by: NDave Jiang <dave.jiang@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      fe9a552e
    • D
      libnvdimm, region: hide persistence_domain when unknown · 896196dc
      Dan Williams 提交于
      Similar to other region attributes, do not emit the persistence_domain
      attribute if its contents are empty.
      
      Fixes: 96c3a239 ("libnvdimm: expose platform persistence attr...")
      Cc: Dave Jiang <dave.jiang@intel.com>
      Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      896196dc
  12. 02 2月, 2018 1 次提交
  13. 29 9月, 2017 1 次提交
  14. 05 8月, 2017 1 次提交
  15. 30 6月, 2017 2 次提交
  16. 28 6月, 2017 3 次提交
    • D
      libnvdimm, nfit: enable support for volatile ranges · c9e582aa
      Dan Williams 提交于
      Allow volatile nfit ranges to participate in all the same infrastructure
      provided for persistent memory regions. A resulting resulting namespace
      device will still be called "pmem", but the parent region type will be
      "nd_volatile". This is in preparation for disabling the dax ->flush()
      operation in the pmem driver when it is hosted on a volatile range.
      
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      c9e582aa
    • D
      libnvdimm, pmem: fix persistence warning · c00b396e
      Dan Williams 提交于
      The pmem driver assumes if platform firmware describes the memory
      devices associated with a persistent memory range and
      CONFIG_ARCH_HAS_PMEM_API=y that it has all the mechanism necessary to
      flush data to a power-fail safe zone. We warn if the firmware does not
      describe memory devices, but we also need to warn if the architecture
      does not claim pmem support.
      
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      c00b396e
    • D
      x86, libnvdimm, pmem: remove global pmem api · ca6a4657
      Dan Williams 提交于
      Now that all callers of the pmem api have been converted to dax helpers that
      call back to the pmem driver, we can remove include/linux/pmem.h and
      asm/pmem.h.
      
      Cc: <x86@kernel.org>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Oliver O'Halloran <oohall@gmail.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      ca6a4657
  17. 16 6月, 2017 1 次提交
  18. 10 6月, 2017 1 次提交
    • D
      x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations · 0aed55af
      Dan Williams 提交于
      The pmem driver has a need to transfer data with a persistent memory
      destination and be able to rely on the fact that the destination writes are not
      cached. It is sufficient for the writes to be flushed to a cpu-store-buffer
      (non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync()
      to ensure data-writes have reached a power-fail-safe zone in the platform. The
      fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver which will turn
      around and fence previous writes with an "sfence".
      
      Implement a __copy_from_user_inatomic_flushcache, memcpy_page_flushcache, and
      memcpy_flushcache, that guarantee that the destination buffer is not dirty in
      the cpu cache on completion. The new copy_from_iter_flushcache and sub-routines
      will be used to replace the "pmem api" (include/linux/pmem.h +
      arch/x86/include/asm/pmem.h). The availability of copy_from_iter_flushcache()
      and memcpy_flushcache() are gated by the CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
      config symbol, and fallback to copy_from_iter_nocache() and plain memcpy()
      otherwise.
      
      This is meant to satisfy the concern from Linus that if a driver wants to do
      something beyond the normal nocache semantics it should be something private to
      that driver [1], and Al's concern that anything uaccess related belongs with
      the rest of the uaccess code [2].
      
      The first consumer of this interface is a new 'copy_from_iter' dax operation so
      that pmem can inject cache maintenance operations without imposing this
      overhead on other dax-capable drivers.
      
      [1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html
      [2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html
      
      Cc: <x86@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      0aed55af
  19. 05 5月, 2017 1 次提交
  20. 30 4月, 2017 1 次提交
    • D
      libnvdimm: rework region badblocks clearing · 23f49844
      Dan Williams 提交于
      Toshi noticed that the new support for a region-level badblocks missed
      the case where errors are cleared due to BTT I/O.
      
      An initial attempt to fix this ran into a "sleeping while atomic"
      warning due to taking the nvdimm_bus_lock() in the BTT I/O path to
      satisfy the locking requirements of __nvdimm_bus_badblocks_clear().
      However, that lock is not needed since we are not acting on any data that
      is subject to change under that lock. The badblocks instance has its own
      internal lock to handle mutations of the error list.
      
      So, in order to make it clear that we are just acting on region devices,
      rename __nvdimm_bus_badblocks_clear() to nvdimm_clear_badblocks_regions().
      Eliminate the lock and consolidate all support routines for the new
      nvdimm_account_cleared_poison() in drivers/nvdimm/bus.c. Finally, to the
      opportunity to cleanup to some unnecessary casts, make the calling
      convention of nvdimm_clear_badblocks_regions() clearer by replacing struct
      resource with the minimal struct clear_badblocks_context, and use the
      DEVICE_ATTR macro.
      
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Reported-by: NToshi Kani <toshi.kani@hpe.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      23f49844
  21. 29 4月, 2017 1 次提交
    • D
      libnvdimm, region: sysfs trigger for nvdimm_flush() · ab630891
      Dan Williams 提交于
      The nvdimm_flush() mechanism helps to reduce the impact of an ADR
      (asynchronous-dimm-refresh) failure. The ADR mechanism handles flushing
      platform WPQ (write-pending-queue) buffers when power is removed. The
      nvdimm_flush() mechanism performs that same function on-demand.
      
      When a pmem namespace is associated with a block device, an
      nvdimm_flush() is triggered with every block-layer REQ_FUA, or REQ_FLUSH
      request. These requests are typically associated with filesystem
      metadata updates. However, when a namespace is in device-dax mode,
      userspace (think database metadata) needs another path to perform the
      same flushing. In other words this is not required to make data
      persistent, but in the case of metadata it allows for a smaller failure
      domain in the unlikely event of an ADR failure.
      
      The new 'deep_flush' attribute is visible when the individual DIMMs
      backing a given interleave-set are described by platform firmware. In
      ACPI terms this is "NVDIMM Region Mapping Structures" and associated
      "Flush Hint Address Structures". Reads return "1" if the region supports
      triggering WPQ flushes on all DIMMs. Reads return "0" the flush
      operation is a platform nop, and in that case the attribute is
      read-only.
      
      Why sysfs and not an ioctl? An ioctl requires establishing a new
      ioctl function number space for device-dax. Given that this would be
      called on a device-dax fd an application could be forgiven for
      accidentally calling this on a filesystem-dax fd. Placing this interface
      in libnvdimm sysfs removes that potential for collision with a
      filesystem ioctl, and it keeps ioctls out of the generic device-dax
      implementation.
      
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      ab630891
  22. 25 4月, 2017 1 次提交
    • D
      libnvdimm, region: fix flush hint detection crash · bc042fdf
      Dan Williams 提交于
      In the case where a dimm does not have any associated flush hints the
      ndrd->flush_wpq array may be uninitialized leading to crashes with the
      following signature:
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
       IP: region_visible+0x10f/0x160 [libnvdimm]
      
       Call Trace:
        internal_create_group+0xbe/0x2f0
        sysfs_create_groups+0x40/0x80
        device_add+0x2d8/0x650
        nd_async_device_register+0x12/0x40 [libnvdimm]
        async_run_entry_fn+0x39/0x170
        process_one_work+0x212/0x6c0
        ? process_one_work+0x197/0x6c0
        worker_thread+0x4e/0x4a0
        kthread+0x10c/0x140
        ? process_one_work+0x6c0/0x6c0
        ? kthread_create_on_node+0x60/0x60
        ret_from_fork+0x31/0x40
      
      Cc: <stable@vger.kernel.org>
      Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
      Fixes: f284a4f2 ("libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush()")
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      bc042fdf
  23. 13 4月, 2017 2 次提交
  24. 01 3月, 2017 1 次提交
    • D
      nfit, libnvdimm: fix interleave set cookie calculation · 86ef58a4
      Dan Williams 提交于
      The interleave-set cookie is a sum that sanity checks the composition of
      an interleave set has not changed from when the namespace was initially
      created.  The checksum is calculated by sorting the DIMMs by their
      location in the interleave-set. The comparison for the sort must be
      64-bit wide, not byte-by-byte as performed by memcmp() in the broken
      case.
      
      Fix the implementation to accept correct cookie values in addition to
      the Linux "memcmp" order cookies, but only allow correct cookies to be
      generated going forward. It does mean that namespaces created by
      third-party-tooling, or created by newer kernels with this fix, will not
      validate on older kernels. However, there are a couple mitigating
      conditions:
      
          1/ platforms with namespace-label capable NVDIMMs are not widely
             available.
      
          2/ interleave-sets with a single-dimm are by definition not affected
             (nothing to sort). This covers the QEMU-KVM NVDIMM emulation case.
      
      The cookie stored in the namespace label will be fixed by any write the
      namespace label, the most straightforward way to achieve this is to
      write to the "alt_name" attribute of a namespace in sysfs.
      
      Cc: <stable@vger.kernel.org>
      Fixes: eaf96153 ("libnvdimm, nfit: add interleave-set state-tracking infrastructure")
      Reported-by: NNicholas Moulin <nicholas.w.moulin@linux.intel.com>
      Tested-by: NNicholas Moulin <nicholas.w.moulin@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      86ef58a4
  25. 16 12月, 2016 1 次提交
  26. 08 10月, 2016 2 次提交
    • D
      libnvdimm, namespace: allow creation of multiple pmem-namespaces per region · 98a29c39
      Dan Williams 提交于
      Similar to BLK regions, publish new seed namespace devices to allow
      unused PMEM region capacity to be consumed by additional namespaces.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      98a29c39
    • D
      libnvdimm, region: update nd_region_available_dpa() for multi-pmem support · a1f3e4d6
      Dan Williams 提交于
      The free dpa (dimm-physical-address) space calculation reports how much
      free space is available with consideration for aliased BLK + PMEM
      regions.  Recall that BLK capacity is allocated from high addresses and
      PMEM is allocated from low addresses in their respective regions.
      
      nd_region_available_dpa() accounts for the fact that the largest
      encroachment (lowest starting address) into PMEM capacity by a BLK
      allocation limits the available capacity to that point, regardless if
      there is BLK allocation hole at a higher address.  Similarly, for the
      multi-pmem case we need to track the largest encroachment (highest
       ending address) of a PMEM allocation in BLK capacity regardless of
      whether there is an allocation hole that a BLK allocation could fill at
      a lower address.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      a1f3e4d6
  27. 01 10月, 2016 3 次提交
  28. 25 9月, 2016 1 次提交
  29. 19 9月, 2016 1 次提交
  30. 12 7月, 2016 1 次提交
    • D
      libnvdimm: cycle flush hints · 0c27af60
      Dan Williams 提交于
      When the NFIT provides multiple flush hint addresses per-dimm it is
      expressing that the platform is capable of processing multiple flush
      requests in parallel.  There is some fixed cost per flush request, let
      the cost be shared in parallel on multiple cpus.
      
      Since there may not be enough flush hint addresses for each cpu to have
      one, keep a per-cpu index of the last used hint, hash it with current
      pid, and assume that access pattern and scheduler randomness will keep
      the flush-hint usage somewhat staggered across cpus.
      
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      0c27af60