- 30 Apr, 2020 2 commits
-
-
Committed by Pankaj Gupta
fix #27138800

commit fefc1d97fa4b5e016bbe15447dc3edcd9e1bcb9f upstream.

This patch adds the 'DAXDEV_SYNC' flag, which is set for an nd_region that performs synchronous flush. The flag is later used to disable MAP_SYNC functionality on ext4 and xfs filesystems for devices that don't support synchronous flush.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
-
Committed by Pankaj Gupta
fix #27138800

commit c5d4355d10d414a96ca870b731756b89d068d57a upstream.

This patch adds functionality to perform a flush from guest to host over VIRTIO. A callback is registered based on the 'nd_region' type: the virtio_pmem driver requires this special flush function, and all other region types register the existing flush function. An error returned by a host fsync failure is reported to userspace.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
-
- 15 Jan, 2020 1 commit
-
-
Committed by Dan Williams
commit 8fc5c73554db0ac18c0c6ac5b2099ab917f83bdf upstream

Persistent memory, as described by the ACPI NFIT (NVDIMM Firmware Interface Table), is the first known instance of a memory range described by a unique "target" proximity domain. "Initiator" and "target" proximity domains are the approach that the ACPI HMAT (Heterogeneous Memory Attributes Table) uses to describe the unique performance properties of a memory range relative to a given initiator (e.g. CPU or DMA device).

Currently the numa-node for a /dev/pmemX block-device or /dev/daxX.Y char-device follows the traditional notion of 'numa-node' where the attribute conveys the closest online numa-node. That numa-node attribute is useful for cpu-binding and memory-binding processes *near* the device. However, when the memory range backing a 'pmem' or 'dax' device is onlined (memory hot-add), the memory-only-numa-node representing that address needs to be differentiated from the set of online nodes. In other words, the numa-node association of the device depends on whether you can bind processes *near* the cpu-numa-node in the offline device-case, or bind processes *on* the memory-range directly after the backing address range is onlined.

Allow for the case that platform firmware describes persistent memory with a unique proximity domain, i.e. when it is distinct from the proximity of DRAM and CPUs that are on the same socket. Plumb the Linux numa-node translation of that proximity through the libnvdimm region device to namespaces that are in device-dax mode. With this in place the proposed kmem driver [1] can optionally discover a unique numa-node number for the address range as it transitions the memory from an offline state managed by a device-driver to an online memory range managed by the core-mm.

[1]: https://lore.kernel.org/lkml/20181022201317.8558C1D8@viggo.jf.intel.com

Reported-by: Fan Du <fan.du@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Oliver O'Halloran" <oohall@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
[yshi: Removed PowerPC stuff which is not applicable to 4.19]
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Gavin Shan <shan.gavin@linux.alibaba.com>
-
- 12 Oct, 2019 1 commit
-
-
Committed by Aneesh Kumar K.V
[ Upstream commit c42adf87e4e7ed77f6ffe288dc90f980d07d68df ]

We check for a bad block during namespace init, and that check uses the region bad block list. We need to initialize the bad block list for volatile regions for this to work. We also observe the lockdep warning below, because the lock is not initialized correctly when bad block init is skipped for volatile regions.

 INFO: trying to register non-static key.
 the code is fine but needs lockdep annotation.
 turning off the locking correctness validator.
 CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc1-15699-g3dee241c937e #149
 Call Trace:
 [c0000000f95cb250] [c00000000147dd84] dump_stack+0xe8/0x164 (unreliable)
 [c0000000f95cb2a0] [c00000000022ccd8] register_lock_class+0x308/0xa60
 [c0000000f95cb3a0] [c000000000229cc0] __lock_acquire+0x170/0x1ff0
 [c0000000f95cb4c0] [c00000000022c740] lock_acquire+0x220/0x270
 [c0000000f95cb580] [c000000000a93230] badblocks_check+0xc0/0x290
 [c0000000f95cb5f0] [c000000000d97540] nd_pfn_validate+0x5c0/0x7f0
 [c0000000f95cb6d0] [c000000000d98300] nd_dax_probe+0xd0/0x1f0
 [c0000000f95cb760] [c000000000d9b66c] nd_pmem_probe+0x10c/0x160
 [c0000000f95cb790] [c000000000d7f5ec] nvdimm_bus_probe+0x10c/0x240
 [c0000000f95cb820] [c000000000d0f844] really_probe+0x254/0x4e0
 [c0000000f95cb8b0] [c000000000d0fdfc] driver_probe_device+0x16c/0x1e0
 [c0000000f95cb930] [c000000000d10238] device_driver_attach+0x68/0xa0
 [c0000000f95cb970] [c000000000d1040c] __driver_attach+0x19c/0x1c0
 [c0000000f95cb9f0] [c000000000d0c4c4] bus_for_each_dev+0x94/0x130
 [c0000000f95cba50] [c000000000d0f014] driver_attach+0x34/0x50
 [c0000000f95cba70] [c000000000d0e208] bus_add_driver+0x178/0x2f0
 [c0000000f95cbb00] [c000000000d117c8] driver_register+0x108/0x170
 [c0000000f95cbb70] [c000000000d7edb0] __nd_driver_register+0xe0/0x100
 [c0000000f95cbbd0] [c000000001a6baa4] nd_pmem_driver_init+0x34/0x48
 [c0000000f95cbbf0] [c0000000000106f4] do_one_initcall+0x1d4/0x4b0
 [c0000000f95cbcd0] [c0000000019f499c] kernel_init_freeable+0x544/0x65c
 [c0000000f95cbdb0] [c000000000010d6c] kernel_init+0x2c/0x180
 [c0000000f95cbe20] [c00000000000b954] ret_from_kernel_thread+0x5c/0x68

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Link: https://lore.kernel.org/r/20190919083355.26340-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
-
- 09 Aug, 2019 1 commit
-
-
Committed by Dan Williams
commit ca6bf264f6d856f959c4239cda1047b587745c67 upstream.

A multithreaded namespace creation/destruction stress test currently deadlocks with the following lockup signature:

 INFO: task ndctl:2924 blocked for more than 122 seconds.
 Tainted: G OE 5.2.0-rc4+ #3382
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 ndctl D 0 2924 1176 0x00000000
 Call Trace:
 ? __schedule+0x27e/0x780
 schedule+0x30/0xb0
 wait_nvdimm_bus_probe_idle+0x8a/0xd0 [libnvdimm]
 ? finish_wait+0x80/0x80
 uuid_store+0xe6/0x2e0 [libnvdimm]
 kernfs_fop_write+0xf0/0x1a0
 vfs_write+0xb7/0x1b0
 ksys_write+0x5c/0xd0
 do_syscall_64+0x60/0x240

 INFO: task ndctl:2923 blocked for more than 122 seconds.
 Tainted: G OE 5.2.0-rc4+ #3382
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 ndctl D 0 2923 1175 0x00000000
 Call Trace:
 ? __schedule+0x27e/0x780
 ? __mutex_lock+0x489/0x910
 schedule+0x30/0xb0
 schedule_preempt_disabled+0x11/0x20
 __mutex_lock+0x48e/0x910
 ? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
 ? __lock_acquire+0x23f/0x1710
 ? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
 nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
 __dax_pmem_probe+0x5e/0x210 [dax_pmem_core]
 ? nvdimm_bus_probe+0x1d0/0x2c0 [libnvdimm]
 dax_pmem_probe+0xc/0x20 [dax_pmem]
 nvdimm_bus_probe+0x90/0x2c0 [libnvdimm]
 really_probe+0xef/0x390
 driver_probe_device+0xb4/0x100

In this sequence an 'nd_dax' device is being probed and is trying to take the lock on its backing namespace to validate that the 'nd_dax' device indeed has exclusive access to the backing namespace. Meanwhile, another thread is trying to update the uuid property of that same backing namespace. So one thread is in the probe path trying to acquire the lock, and the other thread has acquired the lock and is trying to flush the probe path.

Fix this deadlock by not holding the namespace device_lock over the wait_nvdimm_bus_probe_idle() synchronization step. In turn this requires the device_lock to be held on entry to wait_nvdimm_bus_probe_idle() and subsequently dropped internally to wait_nvdimm_bus_probe_idle().

Cc: <stable@vger.kernel.org>
Fixes: bf9bccc1 ("libnvdimm: pmem label sets and namespace instantiation")
Cc: Vishal Verma <vishal.l.verma@intel.com>
Tested-by: Jane Chu <jane.chu@oracle.com>
Link: https://lore.kernel.org/r/156341210094.292348.2384694131126767789.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
-
- 13 Dec, 2018 1 commit
-
-
Committed by Dan Williams
commit ae86cbfef3818300f1972e52f67a93211acb0e24 upstream.

Commit cfe30b87 "libnvdimm, pmem: adjust for section collisions with 'System RAM'" enabled Linux to work around occasions where platform firmware arranges for "System RAM" and "Persistent Memory" to collide within a single section boundary. Unfortunately, as reported in this issue [1], platform firmware can inflict the same collision between persistent memory regions.

The approach of interrogating iomem_resource does not work in this case because platform firmware may merge multiple regions into a single iomem_resource range. Instead provide a method to interrogate regions that share the same parent bus.

This is a stop-gap until the core-MM can grow support for hotplug on sub-section boundaries.

[1]: https://github.com/pmem/ndctl/issues/76

Fixes: cfe30b87 ("libnvdimm, pmem: adjust for section collisions with...")
Cc: <stable@vger.kernel.org>
Reported-by: Patrick Geary <patrickg@supermicro.com>
Tested-by: Patrick Geary <patrickg@supermicro.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 14 Nov, 2018 1 commit
-
-
Committed by Dan Williams
commit 5d394eee upstream.

While experimenting with region driver loading the following backtrace was triggered:

 INFO: trying to register non-static key.
 the code is fine but needs lockdep annotation.
 turning off the locking correctness validator.
 [..]
 Call Trace:
 dump_stack+0x85/0xcb
 register_lock_class+0x571/0x580
 ? __lock_acquire+0x2ba/0x1310
 ? kernfs_seq_start+0x2a/0x80
 __lock_acquire+0xd4/0x1310
 ? dev_attr_show+0x1c/0x50
 ? __lock_acquire+0x2ba/0x1310
 ? kernfs_seq_start+0x2a/0x80
 ? lock_acquire+0x9e/0x1a0
 lock_acquire+0x9e/0x1a0
 ? dev_attr_show+0x1c/0x50
 badblocks_show+0x70/0x190
 ? dev_attr_show+0x1c/0x50
 dev_attr_show+0x1c/0x50

This results from a missing successful call to devm_init_badblocks() from nd_region_probe(). Block attempts to show badblocks while the region is not enabled.

Fixes: 6a6bef90 ("libnvdimm: add mechanism to publish badblocks...")
Cc: <stable@vger.kernel.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 26 Jul, 2018 2 commits
-
-
Committed by Keith Busch
The 'available_size' attribute, which shows the combined total of all unallocated space, isn't always useful for knowing how large a namespace a user may be able to allocate if the region is fragmented. This patch exports the largest extent of unallocated space that may be allocated to create a new namespace.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
-
Committed by Keith Busch
This patch finds the max contiguous area to determine the largest pmem namespace size that can be created. If the requested size exceeds the largest available extent, an ENOSPC error is returned.

This fixes the allocation underrun error and wrong error return code that have otherwise been observed as the following kernel warning:

 WARNING: CPU: <CPU> PID: <PID> at drivers/nvdimm/namespace_devs.c:913 size_store

Fixes: a1f3e4d6 ("libnvdimm, region: update nd_region_available_dpa() for multi-pmem support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
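The difference between total free space and the largest allocatable extent can be sketched in plain userspace C. This is an illustration only, not the kernel code: `struct extent`, `largest_free_extent()`, and `check_request()` are hypothetical names, and the allocated extents are assumed to be sorted by start address and non-overlapping.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one allocated range: [start, start + len). */
struct extent {
	uint64_t start;
	uint64_t len;
};

/*
 * Return the size of the largest gap in [0, region_size) that is not
 * covered by any allocated extent. Assumes 'alloc' is sorted by start
 * address and that the extents do not overlap.
 */
static uint64_t largest_free_extent(const struct extent *alloc, size_t n,
				    uint64_t region_size)
{
	uint64_t cursor = 0;	/* end of the previous allocation */
	uint64_t best = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t gap = alloc[i].start - cursor;

		if (gap > best)
			best = gap;
		cursor = alloc[i].start + alloc[i].len;
	}
	/* Trailing free space after the last allocation. */
	if (region_size - cursor > best)
		best = region_size - cursor;
	return best;
}

/* Reject a request that no single free extent can satisfy. */
static int check_request(const struct extent *alloc, size_t n,
			 uint64_t region_size, uint64_t requested)
{
	if (requested > largest_free_extent(alloc, n, region_size))
		return -ENOSPC;
	return 0;
}
```

With allocations at [10, 30) and [50, 60) in a region of size 100, the total free space is 70 but the largest single extent is 40, so a request for 60 fails with -ENOSPC even though the summed free space would suggest it fits.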
-
- 07 Jun, 2018 1 commit
-
-
Committed by Ross Zwisler
This commit:

5fdf8e5b ("libnvdimm: re-enable deep flush for pmem devices via fsync()")

intended to make sure that deep flush was always available, even on platforms which support a power-fail protected CPU cache. An unintended side effect of this change was that we also lost the ability to skip flushing CPU caches on platforms with power-fail protected CPU caches.

Fix this by skipping the low level cache flushing in dax_flush() if we have CPU caches which are power-fail protected. The user can still override this behavior by manually setting the write_cache state of a namespace. See libndctl's ndctl_namespace_write_cache_is_enabled(), ndctl_namespace_enable_write_cache() and ndctl_namespace_disable_write_cache() functions.

Cc: <stable@vger.kernel.org>
Fixes: 5fdf8e5b ("libnvdimm: re-enable deep flush for pmem devices via fsync()")
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- 07 Apr, 2018 1 commit
-
-
Committed by Oliver O'Halloran
We want to be able to cross reference the region and bus devices with the device tree node that they were spawned from. libNVDIMM handles creating the actual devices for these internally, so we need to pass in a pointer to the relevant node in the descriptor.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- 04 Apr, 2018 1 commit
-
-
Committed by Dan Williams
For debug, it is useful for bus providers to be able to retrieve the 'struct device' associated with an nd_region instance that they registered. We already have to_nd_region() to perform the reverse cast operation; in fact, its duplicate declaration can be removed from the private drivers/nvdimm/nd.h header.

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- 22 Mar, 2018 2 commits
-
-
Committed by Dan Williams
The persistence domain is a point in the platform where, once writes reach that destination, the platform claims it will make them persistent relative to power loss. In the ACPI NFIT this is currently communicated as 2 bits in the "NFIT - Platform Capabilities Structure". The bits comprise a hierarchy, i.e. bit0 "CPU Cache Flush to NVDIMM Durability on Power Loss Capable" implies bit1 "Memory Controller Flush to NVDIMM Durability on Power Loss Capable".

Commit 96c3a239 "libnvdimm: expose platform persistence attr..." shows the persistence domain as flags, but it's really an enumerated hierarchy.

Fix this newly introduced user ABI to show the closest available persistence domain before userspace develops dependencies on seeing, or needing to develop code to tolerate, the raw NFIT flags communicated through the libnvdimm-generic region attribute.

Fixes: 96c3a239 ("libnvdimm: expose platform persistence attr...")
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
Committed by Dan Williams
Similar to other region attributes, do not emit the persistence_domain attribute if its contents are empty.

Fixes: 96c3a239 ("libnvdimm: expose platform persistence attr...")
Cc: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
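The two persistence_domain commits of this date (report the closest domain of an enumerated hierarchy, and emit nothing when no capability is advertised) can be sketched as follows. This is a userspace illustration, not the kernel implementation: the macro names and the returned strings are assumptions chosen for the example, and only the bit0/bit1 hierarchy comes from the commit text.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions, following the hierarchy described in the text. */
#define CAP_CPU_CACHE_FLUSH	(1u << 0)	/* bit0: CPU cache flushed on power loss */
#define CAP_MEM_CTRL_FLUSH	(1u << 1)	/* bit1: memory controller flushed on power loss */

/*
 * Report the closest persistence domain as a single enumerated value
 * instead of echoing every raw flag. Returns NULL when no capability is
 * advertised, so that a caller can hide the attribute entirely.
 */
static const char *persistence_domain(uint32_t caps)
{
	if (caps & CAP_CPU_CACHE_FLUSH)
		return "cpu_cache";
	if (caps & CAP_MEM_CTRL_FLUSH)
		return "memory_controller";
	return NULL;
}
```

Because bit0 implies bit1, checking the stronger capability first is enough; a platform that sets both bits is reported once, as the CPU-cache domain.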
-
- 02 Feb, 2018 1 commit
-
-
Committed by Dave Jiang
Provide a sysfs attribute for nd_region that shows the persistence capabilities of the platform.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
-
- 29 Sep, 2017 1 commit
-
-
Committed by Dan Williams
For the same reason that /proc/iomem returns 0's for non-root readers and ACPI tables are root-only, make the 'resource' attribute for region devices only readable by root. Otherwise we disclose physical address information.

Fixes: 802f4be6 ("libnvdimm: Add 'resource' sysfs attribute to regions")
Cc: <stable@vger.kernel.org>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Reported-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- 05 Aug, 2017 1 commit
-
-
Committed by Dan Williams
It is useful to be able to know the position of a DIMM in an interleave-set. Consider the case where the order of the DIMMs changes, causing a namespace to be invalidated because the interleave-set cookie no longer matches. If the before and after state of each DIMM position is known, this state can be debugged by the system owner.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- 30 Jun, 2017 2 commits
-
-
Committed by Dan Williams
This state is already visible to userspace since the BLK region will not be enabled, and it is otherwise benign as it usually indicates that the DIMM is not configured.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
Committed by Dan Williams
The pmem driver attaches to both persistent and volatile memory ranges advertised by the ACPI NFIT. When the region is volatile it is redundant to spend cycles flushing caches at fsync(). Check if the hosting region is volatile and do not set dax_write_cache() if it is.

Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- 28 Jun, 2017 3 commits
-
-
Committed by Dan Williams
Allow volatile nfit ranges to participate in all the same infrastructure provided for persistent memory regions. A resulting namespace device will still be called "pmem", but the parent region type will be "nd_volatile". This is in preparation for disabling the dax ->flush() operation in the pmem driver when it is hosted on a volatile range.

Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
Committed by Dan Williams
The pmem driver assumes that, if platform firmware describes the memory devices associated with a persistent memory range and CONFIG_ARCH_HAS_PMEM_API=y, it has all the mechanisms necessary to flush data to a power-fail safe zone. We warn if the firmware does not describe memory devices, but we also need to warn if the architecture does not claim pmem support.

Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
Committed by Dan Williams
Now that all callers of the pmem api have been converted to dax helpers that call back to the pmem driver, we can remove include/linux/pmem.h and asm/pmem.h.

Cc: <x86@kernel.org>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Oliver O'Halloran <oohall@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- 16 Jun, 2017 1 commit
-
-
Committed by Dan Williams
The interleave-set-cookie algorithm is extended to incorporate all the same components that are used to generate an nvdimm unique-id. For backwards compatibility we still maintain the old v1.1 definition.

Reported-by: Nicholas Moulin <nicholas.w.moulin@intel.com>
Reported-by: Kaushik Kanetkar <kaushik.a.kanetkar@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- 10 Jun, 2017 1 commit
-
-
Committed by Dan Williams
The pmem driver has a need to transfer data with a persistent memory destination and be able to rely on the fact that the destination writes are not cached. It is sufficient for the writes to be flushed to a cpu-store-buffer (non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync() to ensure data-writes have reached a power-fail-safe zone in the platform. The fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver which will turn around and fence previous writes with an "sfence".

Implement a __copy_from_user_inatomic_flushcache, memcpy_page_flushcache, and memcpy_flushcache, that guarantee that the destination buffer is not dirty in the cpu cache on completion. The new copy_from_iter_flushcache and sub-routines will be used to replace the "pmem api" (include/linux/pmem.h + arch/x86/include/asm/pmem.h). The availability of copy_from_iter_flushcache() and memcpy_flushcache() is gated by the CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE config symbol; they fall back to copy_from_iter_nocache() and plain memcpy() otherwise.

This is meant to satisfy the concern from Linus that if a driver wants to do something beyond the normal nocache semantics it should be something private to that driver [1], and Al's concern that anything uaccess related belongs with the rest of the uaccess code [2].

The first consumer of this interface is a new 'copy_from_iter' dax operation so that pmem can inject cache maintenance operations without imposing this overhead on other dax-capable drivers.

[1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html
[2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html

Cc: <x86@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- 05 May, 2017 1 commit
-
-
Committed by Dan Williams
This is a preparation patch for handling locked nvdimm label regions, a new concept introduced by the latest DSM document on pmem.io [1]. A future patch will leverage nvdimm_set_locked() at DIMM probe time to flag regions that cannot be enabled. There should be no functional difference resulting from this change.

[1]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example-V1.3.pdf

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- 30 Apr, 2017 1 commit
-
-
Committed by Dan Williams
Toshi noticed that the new support for region-level badblocks missed the case where errors are cleared due to BTT I/O. An initial attempt to fix this ran into a "sleeping while atomic" warning due to taking the nvdimm_bus_lock() in the BTT I/O path to satisfy the locking requirements of __nvdimm_bus_badblocks_clear(). However, that lock is not needed since we are not acting on any data that is subject to change under that lock. The badblocks instance has its own internal lock to handle mutations of the error list.

So, in order to make it clear that we are just acting on region devices, rename __nvdimm_bus_badblocks_clear() to nvdimm_clear_badblocks_regions(). Eliminate the lock and consolidate all support routines for the new nvdimm_account_cleared_poison() in drivers/nvdimm/bus.c. Finally, take the opportunity to clean up some unnecessary casts, make the calling convention of nvdimm_clear_badblocks_regions() clearer by replacing struct resource with the minimal struct clear_badblocks_context, and use the DEVICE_ATTR macro.

Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Reported-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- 29 Apr, 2017 1 commit
-
-
Committed by Dan Williams
The nvdimm_flush() mechanism helps to reduce the impact of an ADR (asynchronous-DRAM-refresh) failure. The ADR mechanism handles flushing platform WPQ (write-pending-queue) buffers when power is removed. The nvdimm_flush() mechanism performs that same function on-demand.

When a pmem namespace is associated with a block device, an nvdimm_flush() is triggered with every block-layer REQ_FUA or REQ_FLUSH request. These requests are typically associated with filesystem metadata updates. However, when a namespace is in device-dax mode, userspace (think database metadata) needs another path to perform the same flushing. In other words this is not required to make data persistent, but in the case of metadata it allows for a smaller failure domain in the unlikely event of an ADR failure.

The new 'deep_flush' attribute is visible when the individual DIMMs backing a given interleave-set are described by platform firmware. In ACPI terms this is "NVDIMM Region Mapping Structures" and associated "Flush Hint Address Structures". Reads return "1" if the region supports triggering WPQ flushes on all DIMMs. Reads return "0" if the flush operation is a platform nop, and in that case the attribute is read-only.

Why sysfs and not an ioctl? An ioctl requires establishing a new ioctl function number space for device-dax. Given that this would be called on a device-dax fd, an application could be forgiven for accidentally calling this on a filesystem-dax fd. Placing this interface in libnvdimm sysfs removes that potential for collision with a filesystem ioctl, and it keeps ioctls out of the generic device-dax implementation.

Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- 25 Apr, 2017 1 commit
-
-
Committed by Dan Williams
In the case where a dimm does not have any associated flush hints the ndrd->flush_wpq array may be uninitialized, leading to crashes with the following signature:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
 IP: region_visible+0x10f/0x160 [libnvdimm]
 Call Trace:
 internal_create_group+0xbe/0x2f0
 sysfs_create_groups+0x40/0x80
 device_add+0x2d8/0x650
 nd_async_device_register+0x12/0x40 [libnvdimm]
 async_run_entry_fn+0x39/0x170
 process_one_work+0x212/0x6c0
 ? process_one_work+0x197/0x6c0
 worker_thread+0x4e/0x4a0
 kthread+0x10c/0x140
 ? process_one_work+0x6c0/0x6c0
 ? kthread_create_on_node+0x60/0x60
 ret_from_fork+0x31/0x40

Cc: <stable@vger.kernel.org>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Fixes: f284a4f2 ("libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush()")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- 13 Apr, 2017 2 commits
-
-
Committed by Dave Jiang
Add a sysfs attribute to export the physical address of the region. This supports user-application poison clearing via ND_IOCTL_CLEAR_ERROR.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
Committed by Dave Jiang
The badblocks sysfs file will be exported at the region level. When the nvdimm event notifier fires for NVDIMM_REVALIDATE_POISON, the badblocks in the region will be updated.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- 01 Mar, 2017 1 commit
-
-
Committed by Dan Williams
The interleave-set cookie is a checksum that sanity-checks that the composition of an interleave set has not changed from when the namespace was initially created. The checksum is calculated by sorting the DIMMs by their location in the interleave-set. The comparison for the sort must be 64-bit wide, not byte-by-byte as performed by memcmp() in the broken case.

Fix the implementation to accept correct cookie values in addition to the Linux "memcmp" order cookies, but only allow correct cookies to be generated going forward. It does mean that namespaces created by third-party-tooling, or created by newer kernels with this fix, will not validate on older kernels. However, there are a couple of mitigating conditions:

1/ platforms with namespace-label capable NVDIMMs are not widely available.

2/ interleave-sets with a single-dimm are by definition not affected (nothing to sort). This covers the QEMU-KVM NVDIMM emulation case.

The cookie stored in the namespace label will be fixed by any write to the namespace label; the most straightforward way to achieve this is to write to the "alt_name" attribute of a namespace in sysfs.

Cc: <stable@vger.kernel.org>
Fixes: eaf96153 ("libnvdimm, nfit: add interleave-set state-tracking infrastructure")
Reported-by: Nicholas Moulin <nicholas.w.moulin@linux.intel.com>
Tested-by: Nicholas Moulin <nicholas.w.moulin@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
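Why a byte-by-byte comparison yields a different sort order than a 64-bit comparison can be shown with a small userspace sketch. The helper names (`cmp_u64()`, `put_le64()`, `cmp_bytes_le()`) are hypothetical and are not taken from the kernel; the little-endian byte layout is built explicitly so the example behaves the same on any host.

```c
#include <stdint.h>
#include <string.h>

/* Numeric three-way comparison of two 64-bit keys. */
static int cmp_u64(uint64_t a, uint64_t b)
{
	if (a < b)
		return -1;
	return a > b;
}

/* Store 'v' in little-endian byte order, independent of the host CPU. */
static void put_le64(unsigned char out[8], uint64_t v)
{
	for (int i = 0; i < 8; i++)
		out[i] = (unsigned char)(v >> (8 * i));
}

/*
 * Byte-by-byte comparison of the little-endian encodings, i.e. what
 * memcmp() sees on a little-endian machine. It inspects the least
 * significant byte first, so it can disagree with cmp_u64().
 */
static int cmp_bytes_le(uint64_t a, uint64_t b)
{
	unsigned char ba[8], bb[8];

	put_le64(ba, a);
	put_le64(bb, b);
	return memcmp(ba, bb, sizeof(ba));
}
```

For the keys 0x00FF and 0x0100, `cmp_u64()` orders 0x00FF first, while `cmp_bytes_le()` orders 0x0100 first because its lowest byte is 0x00. Two sorts that disagree like this feed the DIMMs into the checksum in different orders, which is how the two cookie variants arise.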
-
- 16 Dec, 2016 1 commit
-
-
Committed by Dan Williams
For warnings that should only ever trigger during development and testing, replace WARN statements with lockdep_assert_held. The lockdep pattern is prevalent, and these paths are well covered by libnvdimm unit tests.

Reported-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- 08 Oct, 2016 2 commits
-
-
Committed by Dan Williams
Similar to BLK regions, publish new seed namespace devices to allow unused PMEM region capacity to be consumed by additional namespaces.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
Committed by Dan Williams
The free dpa (dimm-physical-address) space calculation reports how much free space is available with consideration for aliased BLK + PMEM regions. Recall that BLK capacity is allocated from high addresses and PMEM is allocated from low addresses in their respective regions.

nd_region_available_dpa() accounts for the fact that the largest encroachment (lowest starting address) into PMEM capacity by a BLK allocation limits the available capacity to that point, regardless of whether there is a BLK allocation hole at a higher address. Similarly, for the multi-pmem case we need to track the largest encroachment (highest ending address) of a PMEM allocation in BLK capacity, regardless of whether there is an allocation hole that a BLK allocation could fill at a lower address.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- 01 Oct, 2016 3 commits
-
-
Committed by Dan Williams
In preparation for enabling multiple namespaces per pmem region, convert the label tracking to use a linked list. In particular this will allow select_pmem_id() to move labels from the unvalidated state to the validated state. Currently we only track one validated set per-region.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
Committed by Dan Williams
Before we add more libnvdimm-private fields to nd_mapping, make it clear which parameters are input vs libnvdimm internals. Use struct nd_mapping_desc instead of struct nd_mapping in nd_region_desc and make struct nd_mapping private to libnvdimm.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
Committed by Dave Jiang
The existing implementation writes to all the flush hint addresses for a given ND region. This is not necessary as the flushes are per imc and not per DIMM. Search the mappings and clear out the duplicates at init to avoid multiple flushes to the same imc.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
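Clearing duplicate flush hint addresses at init time might look roughly like the following userspace sketch. It is an assumption-laden illustration rather than the kernel code: `dedupe_hints()` is a hypothetical helper, and hint addresses are modeled as plain 64-bit integers where 0 means "no hint".

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Zero out every repeated address in 'hints' so that each distinct
 * address survives exactly once (its first occurrence). Returns the
 * number of distinct non-zero addresses that remain.
 */
static size_t dedupe_hints(uint64_t *hints, size_t n)
{
	size_t unique = 0;

	for (size_t i = 0; i < n; i++) {
		if (!hints[i])
			continue;	/* empty or already cleared slot */
		unique++;
		for (size_t j = i + 1; j < n; j++)
			if (hints[j] == hints[i])
				hints[j] = 0;
	}
	return unique;
}
```

A flush routine that skips zero slots then writes to each memory controller once, however many DIMMs share it. The quadratic scan is acceptable here because it runs once at init over a small table.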
-
- 25 Sep, 2016 1 commit
-
-
Committed by Dan Williams
The definition of the flush hint table as:

 void __iomem *flush_wpq[0][0];

...passed the unit test, but is broken as flush_wpq[0][1] and flush_wpq[1][0] refer to the same entry. Fix this to use a helper that calculates a slot in the table based on the geometry of flush hints in the region.

This is important to get right since virtualization solutions use this mechanism to trigger hypervisor flushes to platform persistence.

Reported-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
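A flat table with an explicit index helper avoids the aliasing described above. The sketch below is a userspace illustration under assumptions: `struct hint_table`, `hint_slot()`, and `hint_table_init()` are hypothetical names, and the row width is taken to be a fixed number of hints per DIMM.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flat table of flush hint addresses for one region. */
struct hint_table {
	unsigned int hints_per_dimm;	/* row width (the table geometry) */
	unsigned int num_dimms;
	void **slots;			/* num_dimms * hints_per_dimm entries */
};

/* Map a (dimm, hint) pair to a unique slot in the flat array. */
static size_t hint_slot(const struct hint_table *t, unsigned int dimm,
			unsigned int hint)
{
	return (size_t)dimm * t->hints_per_dimm + hint;
}

/* Allocate a zeroed table; returns 0 on success, -1 on allocation failure. */
static int hint_table_init(struct hint_table *t, unsigned int num_dimms,
			   unsigned int hints_per_dimm)
{
	t->num_dimms = num_dimms;
	t->hints_per_dimm = hints_per_dimm;
	t->slots = calloc((size_t)num_dimms * hints_per_dimm, sizeof(*t->slots));
	return t->slots ? 0 : -1;
}
```

With a zero-length inner dimension the compiler's row stride is 0 bytes, which is why `[0][1]` and `[1][0]` collapsed onto one entry; computing the slot from the run-time geometry gives every (dimm, hint) pair its own entry.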
-
- 19 Sep, 2016 1 commit
-
-
Committed by Oliver O'Halloran
nd_activate_region() iomaps any hint addresses required when activating a region. To prevent duplicate mappings it checks the PFN of the hint to be mapped against the PFNs of the already mapped hints. Unfortunately it doesn't convert the PFN back into a physical address before passing it to devm_nvdimm_ioremap(). Instead it applies PHYS_PFN a second time, which ends about as well as you would imagine.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
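The effect of shifting twice can be seen in a few lines of userspace C. The helpers below are stand-ins written for this example (the kernel has its own conversion macros), and a 4 KiB page size is assumed.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumes 4 KiB pages */

/* Physical address -> page frame number. */
static uint64_t phys_to_pfn(uint64_t phys)
{
	return phys >> SKETCH_PAGE_SHIFT;
}

/* Page frame number -> physical address of the start of that page. */
static uint64_t pfn_to_phys(uint64_t pfn)
{
	return pfn << SKETCH_PAGE_SHIFT;
}
```

For a hint at physical address 0x12345000 the PFN is 0x12345. Converting back with `pfn_to_phys()` recovers 0x12345000, whereas applying `phys_to_pfn()` to the PFN again yields 0x12, an unrelated address that would then be handed to the mapping routine.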
-
- 12 Jul, 2016 1 commit
-
-
Committed by Dan Williams
When the NFIT provides multiple flush hint addresses per-dimm it is expressing that the platform is capable of processing multiple flush requests in parallel. There is some fixed cost per flush request, so let the cost be shared in parallel on multiple cpus.

Since there may not be enough flush hint addresses for each cpu to have one, keep a per-cpu index of the last used hint, hash it with the current pid, and assume that access pattern and scheduler randomness will keep the flush-hint usage somewhat staggered across cpus.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
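The hint-selection idea can be sketched in userspace C as below. This is only an approximation of the described scheme: `hash32()` and `next_hint()` are hypothetical helpers, the multiplicative constant and the 8-bit stride width are assumptions, the per-cpu counter is modeled as a caller-supplied variable, and the number of hints is assumed to be a power of two.

```c
#include <stdint.h>

/* Multiplicative hash of a 32-bit value down to 'bits' bits (1..32). */
static uint32_t hash32(uint32_t val, unsigned int bits)
{
	return (val * 0x61C88647u) >> (32 - bits);
}

/*
 * Advance a per-CPU counter by a pid-dependent stride and reduce it to
 * a hint index. 'num_hints' must be a power of two.
 */
static uint32_t next_hint(uint32_t *cpu_idx, uint32_t pid, uint32_t num_hints)
{
	*cpu_idx += hash32(pid + *cpu_idx, 8);
	return *cpu_idx & (num_hints - 1);
}
```

Each call moves the counter by a stride that depends on both the caller's pid and the previous position, so tasks on different CPUs tend to land on different hints without any shared state or locking.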
-