- 13 4月, 2017 1 次提交
-
-
由 Dave Jiang 提交于
badblocks sysfs file will be export at region level. When nvdimm event notifier happens for NVDIMM_REVALIATE_POISON, the badblocks in the region will be updated. Signed-off-by: NDave Jiang <dave.jiang@intel.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 01 3月, 2017 1 次提交
-
-
由 Dan Williams 提交于
The interleave-set cookie is a sum that sanity checks the composition of an interleave set has not changed from when the namespace was initially created. The checksum is calculated by sorting the DIMMs by their location in the interleave-set. The comparison for the sort must be 64-bit wide, not byte-by-byte as performed by memcmp() in the broken case. Fix the implementation to accept correct cookie values in addition to the Linux "memcmp" order cookies, but only allow correct cookies to be generated going forward. It does mean that namespaces created by third-party-tooling, or created by newer kernels with this fix, will not validate on older kernels. However, there are a couple mitigating conditions: 1/ platforms with namespace-label capable NVDIMMs are not widely available. 2/ interleave-sets with a single-dimm are by definition not affected (nothing to sort). This covers the QEMU-KVM NVDIMM emulation case. The cookie stored in the namespace label will be fixed by any write the namespace label, the most straightforward way to achieve this is to write to the "alt_name" attribute of a namespace in sysfs. Cc: <stable@vger.kernel.org> Fixes: eaf96153 ("libnvdimm, nfit: add interleave-set state-tracking infrastructure") Reported-by: NNicholas Moulin <nicholas.w.moulin@linux.intel.com> Tested-by: NNicholas Moulin <nicholas.w.moulin@linux.intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 19 10月, 2016 2 次提交
-
-
由 Dan Williams 提交于
Platforms like QEMU-KVM implement an NFIT table and label DSMs. However, since that environment does not define an aliased configuration, the labels are currently ignored and the kernel registers a single full-sized pmem-namespace per region. Now that the kernel supports sub-divisions of pmem regions the labels have a purpose. Arrange for the labels to be honored when we find an existing / valid namespace index block. Cc: <qemu-devel@nongnu.org> Cc: Haozhong Zhang <haozhong.zhang@intel.com> Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Toshi Kani 提交于
nd_iostat_start() and nd_iostat_end() implement the same functionality that generic_start_io_acct() and generic_end_io_acct() already provide. Change nd_iostat_start() and nd_iostat_end() to call the generic iostat interfaces. There is no change in the nd interfaces. Signed-off-by: NToshi Kani <toshi.kani@hpe.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dave Chinner <david@fromorbit.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 01 10月, 2016 2 次提交
-
-
由 Dan Williams 提交于
In preparation for enabling multiple namespaces per pmem region, convert the label tracking to use a linked list. In particular this will allow select_pmem_id() to move labels from the unvalidated state to the validated state. Currently we only track one validated set per-region. Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Dan Williams 提交于
Before we add more libnvdimm-private fields to nd_mapping make it clear which parameters are input vs libnvdimm internals. Use struct nd_mapping_desc instead of struct nd_mapping in nd_region_desc and make struct nd_mapping private to libnvdimm. Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 25 9月, 2016 1 次提交
-
-
由 Dan Williams 提交于
The definition of the flush hint table as: void __iomem *flush_wpq[0][0]; ...passed the unit test, but is broken as flush_wpq[0][1] and flush_wpq[1][0] refer to the same entry. Fix this to use a helper that calculates a slot in the table based on the geometry of flush hints in the region. This is important to get right since virtualization solutions use this mechanism to trigger hypervisor flushes to platform persistence. Reported-by: NDave Jiang <dave.jiang@intel.com> Tested-by: NDave Jiang <dave.jiang@intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 02 9月, 2016 1 次提交
-
-
由 Toshi Kani 提交于
'ndctl list --buses --dimms' does not list any NVDIMM-Ns since they are considered as idle. ndctl checks if any driver is attached to nmem device. nvdimm_probe() always fails in nvdimm_init_nsarea() since NVDIMM-Ns do not implement optinal ND_CMD_GET_CONFIG_DATA command. Change nvdimm_probe() to accept the case that the CONFIG_DATA command is not implemented for NVDIMM-Ns. The driver attaches without ndd, which keeps it no-op to the device. Reported-by: NBrian Boylston <brian.boylston@hpe.com> Signed-off-by: NToshi Kani <toshi.kani@hpe.com> Cc: Dan Williams <dan.j.williams@intel.com> Tested-by: NJohannes Thumshirn <jthumshirn@suse.de> Acked-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 09 8月, 2016 1 次提交
-
-
由 Vishal Verma 提交于
To be consistent with other namespaces, expose a 'size' attribute for BTT devices also. Cc: Dan Williams <dan.j.williams@intel.com> Reported-by: NLinda Knippers <linda.knippers@hpe.com> Signed-off-by: NVishal Verma <vishal.l.verma@intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 12 7月, 2016 3 次提交
-
-
由 Dan Williams 提交于
When the NFIT provides multiple flush hint addresses per-dimm it is expressing that the platform is capable of processing multiple flush requests in parallel. There is some fixed cost per flush request, let the cost be shared in parallel on multiple cpus. Since there may not be enough flush hint addresses for each cpu to have one, keep a per-cpu index of the last used hint, hash it with current pid, and assume that access pattern and scheduler randomness will keep the flush-hint usage somewhat staggered across cpus. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Dan Williams 提交于
In preparation for triggering flushes of a DIMM's writes-posted-queue (WPQ) via the pmem driver move mapping of flush hint addresses to the region driver. Since this uses devm_nvdimm_memremap() the flush addresses will remain mapped while any region to which the dimm belongs is active. We need to communicate more information to the nvdimm core to facilitate this mapping, namely each dimm object now carries an array of flush hint address resources. Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Dan Williams 提交于
Now that all shared mappings are handled by devm_nvdimm_memremap() we no longer need nfit_spa_map() nor do we need to trigger a callback to the bus provider at region disable time. Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 21 5月, 2016 1 次提交
-
-
由 Dan Williams 提交于
For autodetecting a previously established dax configuration we need the info block to indicate block-device vs device-dax mode, and we need to have the default namespace probe hand-off the configuration to the dax_pmem driver. Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 10 5月, 2016 1 次提交
-
-
由 Dan Williams 提交于
Device DAX is the device-centric analogue of Filesystem DAX (CONFIG_FS_DAX). It allows persistent memory ranges to be allocated and mapped without need of an intervening file system. This initial infrastructure arranges for a libnvdimm pfn-device to be represented as a different device-type so that it can be attached to a driver other than the pmem driver. Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 23 4月, 2016 5 次提交
-
-
由 Dan Williams 提交于
Now that pmem internals have been disentangled from pfn setup, that code can move to the core. This is in preparation for adding another user of the pfn-device capabilities. Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Dan Williams 提交于
In preparation for providing an alternative (to block device) access mechanism to persistent memory, convert pmem_rw_bytes() to nsio_rw_bytes(). This allows ->rw_bytes() functionality without requiring a 'struct pmem_device' to be instantiated. In other words, when ->rw_bytes() is in use i/o is driven through 'struct nd_namespace_io', otherwise it is driven through 'struct pmem_device' and the block layer. This consolidates the disjoint calls to devm_exit_badblocks() and devm_memunmap() into a common devm_nsio_disable() and cleans up the init path to use a unified pmem_attach_disk() implementation. Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Dan Williams 提交于
Pass the device performing the probe so we can use a devm allocation for the btt superblock. Cc: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Dan Williams 提交于
Pass the device performing the probe so we can use a devm allocation for the pfn superblock. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Dan Williams 提交于
We can derive the common namespace from other information. We also do not need to cache it because all the usages are in slow paths. Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 08 4月, 2016 1 次提交
-
-
由 Dan Williams 提交于
When section alignment padding is in effect we need to shift / truncate the range that is queried for poison by the 'start_pad' or 'end_trunc' reservations. It's easiest if we just pass in an adjusted resource range rather than deriving it from the passed in namespace. With the resource range resolution pushed out to the caller we can also push the namespace-to-region lookup to the caller and drop the implicit pmem-type assumption about the passed in namespace object. Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 10 3月, 2016 1 次提交
-
-
由 Dan Williams 提交于
If a write is directed at a known bad block perform the following: 1/ write the data 2/ send a clear poison command 3/ invalidate the poison out of the cache hierarchy Cc: <x86@kernel.org> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: NVishal Verma <vishal.l.verma@intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 06 3月, 2016 1 次提交
-
-
由 Dan Williams 提交于
In preparation for asynchronous address range scrub support add an ability for the pmem driver to dynamically consume address range scrub results. Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 10 1月, 2016 3 次提交
-
-
由 Dan Williams 提交于
If a device will ever have badblocks it should always have a badblocks instance available. So, similar to md, embed a badblocks instance in pmem_device. This reduces pointer chasing in the i/o fast path, and simplifies the init path. Reported-by: NVishal Verma <vishal.l.verma@intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Dan Williams 提交于
nd-core.h is private to the libnvdimm core internals and should not be used by drivers. Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Vishal Verma 提交于
During region creation, perform Address Range Scrubs (ARS) for the SPA (System Physical Address) ranges to retrieve known poison locations from firmware. Add a new data structure 'nd_poison' which is used as a list in nvdimm_bus to store these poison locations. When creating a pmem namespace, if there is any known poison associated with its physical address space, convert the poison ranges to bad sectors that are exposed using the badblocks interface. Signed-off-by: NVishal Verma <vishal.l.verma@intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 13 12月, 2015 1 次提交
-
-
由 Dan Williams 提交于
When setting aside capacity for struct page it must be aligned to the largest mapping size that is to be made available via DAX. Make the alignment configurable to enable support for 1GiB page-size mappings. The offset for PFN_MODE_RAM may now be larger than SZ_8K, so fixup the offset check in nvdimm_namespace_attach_pfn(). Reported-by: NToshi Kani <toshi.kani@hpe.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 11 12月, 2015 1 次提交
-
-
由 Dan Williams 提交于
The alignment constraint isn't necessary now that devm_memremap_pages() allows for unaligned mappings. Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 29 8月, 2015 3 次提交
-
-
由 Dan Williams 提交于
The expectation is that the legacy / non-standard pmem discovery method (e820 type-12) will only ever be used to describe small quantities of persistent memory. Larger capacities will be described via the ACPI NFIT. When "allocate struct page from pmem" support is added this default policy can be overridden by assigning a legacy pmem namespace to a pfn device, however this would be only be necessary if a platform used the legacy mechanism to define a very large range. Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Dan Williams 提交于
Enable the pmem driver to handle PFN device instances. Attaching a pmem namespace to a pfn device triggers the driver to allocate and initialize struct page entries for pmem. Memory capacity for this allocation comes exclusively from RAM for now which is suitable for low PMEM to RAM ratios. This mechanism will be expanded later for setting an "allocate from PMEM" policy. Cc: Boaz Harrosh <boaz@plexistor.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Dan Williams 提交于
Implement the base infrastructure for libnvdimm PFN devices. Similar to BTT devices they take a namespace as a backing device and layer functionality on top. In this case the functionality is reserving space for an array of 'struct page' entries to be handed out through pfn_to_page(). For now this is just the basic libnvdimm-device-model for configuring the base PFN device. As the namespace claiming mechanism for PFN devices is mostly identical to BTT devices drivers/nvdimm/claim.c is created to house the common bits. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 15 8月, 2015 1 次提交
-
-
由 Vishal Verma 提交于
When a BTT is instantiated on a namespace it must validate the namespace uuid matches the 'parent_uuid' stored in the btt superblock. This property enforces that changing the namespace UUID invalidates all former BTT instances on that storage. For "IO namespaces" that don't have a label or UUID, the parent_uuid is set to zero, and this validation is skipped. For such cases, old BTTs have to be invalidated by forcing the namespace to raw mode, and overwriting the BTT info blocks. Based on a patch by Dan Williams <dan.j.williams@intel.com> Signed-off-by: NVishal Verma <vishal.l.verma@intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 01 8月, 2015 1 次提交
-
-
由 Randy Dunlap 提交于
Fix multiple build warnings when CONFIG_BTT is not enabled: In file included from ../drivers/nvdimm/bus.c:29:0: ../drivers/nvdimm/nd.h:169:15: warning: return type defaults to 'int' [-Wreturn-type] static inline nd_btt_probe(struct nd_namespace_common *ndns, void *drvdata) ^ Signed-off-by: NRandy Dunlap <rdunlap@infradead.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: linux-nvdimm@lists.01.org Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 26 6月, 2015 7 次提交
-
-
由 Toshi Kani 提交于
ACPI NFIT table has System Physical Address Range Structure entries that describe a proximity ID of each range when ACPI_NFIT_PROXIMITY_VALID is set in the flags. Change acpi_nfit_register_region() to map a proximity ID to its node ID, and set it to a new numa_node field of nd_region_desc, which is then conveyed to the nd_region device. The device core arranges for btt and namespace devices to inherit their node from their parent region. Signed-off-by: NToshi Kani <toshi.kani@hp.com> [djbw: move set_dev_node() from region.c to bus.c] Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Dan Williams 提交于
Upon detection of an unarmed dimm in a region, arrange for descendant BTT, PMEM, or BLK instances to be read-only. A dimm is primarily marked "unarmed" via flags passed by platform firmware (NFIT). The flags in the NFIT memory device sub-structure indicate the state of the data on the nvdimm relative to its energy source or last "flush to persistence". For the most part there is nothing the driver can do but advertise the state of these flags in sysfs and emit a message if firmware indicates that the contents of the device may be corrupted. However, for the case of ACPI_NFIT_MEM_ARMED, the driver can arrange for the block devices incorporating that nvdimm to be marked read-only. This is a safe default as the data is still available and new writes are held off until the administrator either forces read-write mode, or the energy source becomes armed. A 'read_only' attribute is added to REGION devices to allow for overriding the default read-only policy of all descendant block devices. Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Dan Williams 提交于
This is disabled by default as the overhead is prohibitive, but if the user takes the action to turn it on we'll oblige. Reviewed-by: NVishal Verma <vishal.l.verma@linux.intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Vishal Verma 提交于
Support multiple block sizes (sector + metadata) for nd_blk in the same way as done for the BTT. Add the idea of an 'internal' lbasize, which is properly aligned and padded, and store metadata in this space. Signed-off-by: NVishal Verma <vishal.l.verma@linux.intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Vishal Verma 提交于
Support multiple block sizes (sector + metadata) using the blk integrity framework. This registers a new integrity template that defines the protection information tuple size based on the configured metadata size, and simply acts as a passthrough for protection information generated by another layer. The metadata is written to the storage as-is, and read back with each sector. Signed-off-by: NVishal Verma <vishal.l.verma@linux.intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Ross Zwisler 提交于
The libnvdimm implementation handles allocating dimm address space (DPA) between PMEM and BLK mode interfaces. After DPA has been allocated from a BLK-region to a BLK-namespace the nd_blk driver attaches to handle I/O as a struct bio based block device. Unlike PMEM, BLK is required to handle platform specific details like mmio register formats and memory controller interleave. For this reason the libnvdimm generic nd_blk driver calls back into the bus provider to carry out the I/O. This initial implementation handles the BLK interface defined by the ACPI 6 NFIT [1] and the NVDIMM DSM Interface Example [2] composed from DCR (dimm control region), BDW (block data window), IDT (interleave descriptor) NFIT structures and the hardware register format. [1]: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf [2]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf Cc: Andy Lutomirski <luto@amacapital.net> Cc: Boaz Harrosh <boaz@plexistor.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jens Axboe <axboe@fb.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com> Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Vishal Verma 提交于
BTT stands for Block Translation Table, and is a way to provide power fail sector atomicity semantics for block devices that have the ability to perform byte granularity IO. It relies on the capability of libnvdimm namespace devices to do byte aligned IO. The BTT works as a stacked blocked device, and reserves a chunk of space from the backing device for its accounting metadata. It is a bio-based driver because all IO is done synchronously, and there is no queuing or asynchronous completions at either the device or the driver level. The BTT uses 'lanes' to index into various 'on-disk' data structures, and lanes also act as a synchronization mechanism in case there are more CPUs than available lanes. We did a comparison between two lane lock strategies - first where we kept an atomic counter around that tracked which was the last lane that was used, and 'our' lane was determined by atomically incrementing that. That way, for the nr_cpus > nr_lanes case, theoretically, no CPU would be blocked waiting for a lane. The other strategy was to use the cpu number we're scheduled on to and hash it to a lane number. Theoretically, this could block an IO that could've otherwise run using a different, free lane. But some fio workloads showed that the direct cpu -> lane hash performed faster than tracking 'last lane' - my reasoning is the cache thrash caused by moving the atomic variable made that approach slower than simply waiting out the in-progress IO. This supports the conclusion that the driver can be a very simple bio-based one that does synchronous IOs instead of queuing. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Boaz Harrosh <boaz@plexistor.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jens Axboe <axboe@fb.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Neil Brown <neilb@suse.de> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Greg KH <gregkh@linuxfoundation.org> [jmoyer: fix nmi watchdog timeout in btt_map_init] [jmoyer: move btt initialization to module load path] [jmoyer: fix memory leak in the btt initialization path] [jmoyer: Don't overwrite corrupted arenas] Signed-off-by: NVishal Verma <vishal.l.verma@linux.intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 25 6月, 2015 1 次提交
-
-
由 Dan Williams 提交于
NVDIMM namespaces, in addition to accepting "struct bio" based requests, also have the capability to perform byte-aligned accesses. By default only the bio/block interface is used. However, if another driver can make effective use of the byte-aligned capability it can claim namespace interface and use the byte-aligned ->rw_bytes() interface. The BTT driver is the initial first consumer of this mechanism to allow adding atomic sector update semantics to a pmem or blk namespace. This patch is the sysfs infrastructure to allow configuring a BTT instance for a namespace. Enabling that BTT and performing i/o is in a subsequent patch. Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Neil Brown <neilb@suse.de> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-