- 15 Aug, 2020 1 commit
Committed by Matthew Wilcox (Oracle)
This function returns the number of bytes in a THP. It is like page_size(), but compiles to just PAGE_SIZE if CONFIG_TRANSPARENT_HUGEPAGE is disabled.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: http://lkml.kernel.org/r/20200629151959.15779-5-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
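The compile-time behaviour this commit describes can be pictured with a small user-space sketch. It is a model under stated assumptions, not the kernel implementation: `struct page_model`, its `order` field, `MODEL_PAGE_SIZE`, and the `MODEL_CONFIG_THP` switch are all hypothetical names invented here.

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL

/* Hypothetical stand-in for a (possibly compound) page; order 0 is a base page. */
struct page_model {
    unsigned int order;
};

/* Models page_size(): always derived from the page's order at run time. */
static inline unsigned long model_page_size(const struct page_model *page)
{
    return MODEL_PAGE_SIZE << page->order;
}

/* Models thp_size(): collapses to a constant when the THP option is off. */
#ifdef MODEL_CONFIG_THP
static inline unsigned long model_thp_size(const struct page_model *page)
{
    return model_page_size(page);
}
#else
static inline unsigned long model_thp_size(const struct page_model *page)
{
    (void)page;                 /* unused: the result is a compile-time constant */
    return MODEL_PAGE_SIZE;
}
#endif
```

With `MODEL_CONFIG_THP` left undefined, the compiler can fold every call to `model_thp_size()` into the constant, which is the benefit the commit message points at.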
-
- 01 Jul, 2020 1 commit
Committed by Christoph Hellwig
The make_request_fn is a little weird in that it sits directly in struct request_queue instead of an operation vector. Replace it with a block_device_operations method called submit_bio (which describes much better what it does). Also remove the request_queue argument to it, as the queue can be derived pretty trivially from the bio.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
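The shape of this change, a per-queue function pointer becoming a method in a shared operations table, can be sketched in user-space C. All types and names below (`blkdev_ops_model`, `disk_model`, `bio_model`, `model_pmem_submit_bio`) are hypothetical stand-ins; the real kernel structures have many more fields and different signatures.

```c
#include <stddef.h>

struct bio_model;

/* One shared, read-only operations table per driver. */
struct blkdev_ops_model {
    int (*submit_bio)(struct bio_model *bio);
};

struct disk_model {
    const struct blkdev_ops_model *fops;
    void *queue;        /* stands in for struct request_queue */
    int submitted;      /* counts bios seen, for demonstration only */
};

struct bio_model {
    struct disk_model *disk;
};

/* The queue no longer needs to be an argument: it is reachable from the bio. */
static void *model_bio_queue(struct bio_model *bio)
{
    return bio->disk->queue;
}

/* A driver's method takes only the bio. */
static int model_pmem_submit_bio(struct bio_model *bio)
{
    bio->disk->submitted++;
    return 0;
}

static const struct blkdev_ops_model model_pmem_fops = {
    .submit_bio = model_pmem_submit_bio,
};

/* The core dispatches through the operations table. */
static int model_submit(struct bio_model *bio)
{
    return bio->disk->fops->submit_bio(bio);
}
```

Keeping the pointer in a `const` table means every device of the same driver shares one copy of it.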
-
- 09 Jun, 2020 1 commit
Committed by Christoph Hellwig
This seems to lead to some crazy include loops when using asm-generic/cacheflush.h on more architectures, so leave it to the arch header for now.
[hch@lst.de: fix warning]
Link: http://lkml.kernel.org/r/20200520173520.GA11199@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Will Deacon <will@kernel.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: http://lkml.kernel.org/r/20200515143646.3857579-7-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 27 May, 2020 1 commit
Committed by Christoph Hellwig
Switch dm to use the nicer bio accounting helpers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 14 May, 2020 1 commit
Committed by Christoph Hellwig
In preparation for removing queuedata as an argument to make_request_fn(), drop the dependency on ->queuedata.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20200508161517.252308-16-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- 03 Apr, 2020 3 commits
Committed by Vivek Goyal
The zero_page_range() dax operation is mandatory for dax devices. Right now that check happens in the dax_zero_page_range() function. Dan thinks that's too late and it's better to do the check earlier, in alloc_dax(). I also modified alloc_dax() to return a pointer with an error code in it in case of failure. Right now it returns NULL and the caller assumes the failure happened due to -ENOMEM. But with this ->zero_page_range() check, I need to return -EINVAL instead.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Link: https://lore.kernel.org/r/20200401161125.GB9398@redhat.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
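The switch from "NULL means -ENOMEM" to an error-carrying pointer can be sketched in user space. The helpers below imitate the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() idiom but are local models, and `dax_ops_model`, `dax_dev_model`, `model_alloc_dax`, and `model_zero_noop` are hypothetical names; the real alloc_dax() takes different arguments.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno in a pointer value. */
static inline void *model_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* Pointers in the top MODEL_MAX_ERRNO addresses are treated as errors. */
static inline int model_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

static inline long model_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

struct dax_ops_model {
    int (*zero_page_range)(void *dev, unsigned long pgoff, size_t nr_pages);
};

struct dax_dev_model {
    const struct dax_ops_model *ops;
};

/* Example method so that a valid ops table can be built. */
static int model_zero_noop(void *dev, unsigned long pgoff, size_t nr_pages)
{
    (void)dev; (void)pgoff; (void)nr_pages;
    return 0;
}

static struct dax_dev_model *model_alloc_dax(const struct dax_ops_model *ops)
{
    struct dax_dev_model *dev;

    /* Fail early, with a distinct code, if the mandatory method is missing. */
    if (ops && !ops->zero_page_range)
        return model_err_ptr(-EINVAL);

    dev = calloc(1, sizeof(*dev));
    if (!dev)
        return model_err_ptr(-ENOMEM);
    dev->ops = ops;
    return dev;
}
```

A caller would test the result with `model_is_err()` and propagate `model_ptr_err()` instead of assuming every failure is an allocation failure.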
-
Committed by Vivek Goyal
Add a dax operation zero_page_range, to zero a page. This will also clear any known poison in the page being zeroed. As of now, zeroing of one page is allowed in a single call. There are no callers which are trying to zero more than a page in a single call. Once we grow callers which zero more than a page in a single call, we can add that support. The primary reason for not doing that yet is that it would add a little complexity to the dm implementation, where a range might span multiple underlying targets and one would have to split the range into multiple sub-ranges and call zero_page_range() on individual targets.
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Link: https://lore.kernel.org/r/20200228163456.1587-3-vgoyal@redhat.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
Committed by Vivek Goyal
This splits pmem_do_bvec() into pmem_do_read() and pmem_do_write(). pmem_do_write() will be used by pmem zero_page_range() as well, hence sharing the same code.
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Link: https://lore.kernel.org/r/20200228163456.1587-2-vgoyal@redhat.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- 28 Mar, 2020 1 commit
Committed by Christoph Hellwig
Current make_request based drivers use either blk_alloc_queue_node or blk_alloc_queue to allocate a queue, and then set up the make_request_fn function pointer and a few parameters using the blk_queue_make_request helper. Simplify this by passing the make_request pointer to blk_alloc_queue, and while at it merge the _node variant into the main helper by always passing a node_id, and remove the superfluous gfp_mask parameter. A lower-level __blk_alloc_queue is kept for the blk-mq case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 01 Feb, 2020 1 commit
Committed by Dan Williams
After the removal of the device-public infrastructure there are only 2 ->page_free() callbacks in the kernel. One of those is a device-private callback in the nouveau driver, the other is a generic wakeup needed in the DAX case. In the hopes that all ->page_free() callbacks can be migrated to common core kernel functionality, move the device-private specific actions in __put_devmap_managed_page() under the is_device_private_page() conditional, including the ->page_free() callback. For the other page types just open-code the generic wakeup. Yes, the wakeup is only needed in the MEMORY_DEVICE_FSDAX case, but it does no harm in the MEMORY_DEVICE_DEVDAX and MEMORY_DEVICE_PCI_P2PDMA case.
Link: http://lkml.kernel.org/r/20200107224558.2362728-4-jhubbard@nvidia.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 18 Nov, 2019 1 commit
Committed by Dan Williams
The entire point of nd-core.h is to hide functionality that no leaf driver should touch. In fact, the commit that added it had no need to include it.
Fixes: 06e8ccda ("acpi: nfit: Add support for detect platform...")
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- 15 Nov, 2019 1 commit
Committed by Aneesh Kumar K.V
The nvdimm core currently maps the full namespace to an ioremap range while probing the namespace mode. This can result in probe failures on architectures that have limited ioremap space. For example, with a large btt namespace that consumes most of the I/O remap range, depending on the sequence of namespace initialization, the user can find a pfn namespace initialization failure due to unavailable I/O remap space, which the nvdimm core uses for temporary mapping. The nvdimm core can avoid this failure by only mapping the reserved info block area to check for the pfn superblock type, and mapping the full namespace resource only before using the namespace. Given that personalities like BTT can be layered on top of any namespace type, create a generic form of devm_nsio_enable (devm_namespace_enable) and use it inside the per-personality attach routines. Now devm_namespace_enable() is always paired with disable unless the mapping is going to be used for long-term runtime access.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Link: https://lore.kernel.org/r/20191017073308.32645-1-aneesh.kumar@linux.ibm.com
[djbw: reworks to move devm_namespace_{en,dis}able into *attach helpers]
Reported-by: kbuild test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/20191031105741.102793-2-aneesh.kumar@linux.ibm.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- 06 Sep, 2019 1 commit
Committed by Aneesh Kumar K.V
In order to support marking namespaces with unsupported features/versions as disabled, the nvdimm core should advance the namespace seed on these probe failures. Otherwise, these failed namespaces will be considered a seed namespace and will be wrongly used while creating new namespaces. Add -EOPNOTSUPP as a return value from the pmem probe callback to indicate a namespace initialization failure due to a pfn superblock feature/version mismatch.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Link: https://lore.kernel.org/r/20190905154603.10349-3-aneesh.kumar@linux.ibm.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- 19 Jul, 2019 1 commit
Committed by Dan Williams
For good reason, the standard device_lock() is marked lockdep_set_novalidate_class() because there is simply no sane way to describe the myriad ways the device_lock() is ordered with other locks. However, that leaves subsystems that know their own local device_lock() ordering rules to find lock ordering mistakes manually. Instead, introduce an optional / additional lockdep-enabled lock that a subsystem can acquire in all the same paths that the device_lock() is acquired. A conversion of the NFIT driver and NVDIMM subsystem to a lockdep-validated device_lock() scheme is included. The debug_nvdimm_lock() implementation implements the correct lock-class and stacking order for the libnvdimm device topology hierarchy. Yes, this is a hack, but hopefully it is a useful hack for other subsystems' device_lock() debug sessions.
Quoting Greg: "Yeah, it feels a bit hacky but it's really up to a subsystem to mess up using it as much as anything else, so user beware :) I don't object to it if it makes things easier for you to debug."
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/156341210661.292348.7014034644265455704.stgit@dwillia2-desk3.amr.corp.intel.com
-
- 06 Jul, 2019 2 commits
Committed by Pankaj Gupta
This patch adds a 'DAXDEV_SYNC' flag which is set for an nd_region doing synchronous flush. This is later used to disable MAP_SYNC functionality on ext4 and xfs filesystems for devices that don't support synchronous flush.
Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
Committed by Pankaj Gupta
This patch adds functionality to perform flush from guest to host over VIRTIO. We are registering a callback based on the 'nd_region' type. The virtio_pmem driver requires this special flush function. For the rest of the region types we are registering the existing flush function. Report the error returned by a host fsync failure to userspace.
Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- 03 Jul, 2019 5 commits
Committed by Christoph Hellwig
Add a flags field to struct dev_pagemap to replace the altmap_valid boolean to be a little more extensible. Also add a pgmap_altmap() helper to find the optional altmap and clean up the code using the altmap using it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
Committed by Christoph Hellwig
struct dev_pagemap is always embedded into a containing structure, so there is no need for an additional private data field.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
Committed by Christoph Hellwig
Just check if there is a ->page_free operation set and take care of the static key enable, as well as the put using device managed resources. Also check that a ->page_free is provided for the pgmap types that require it, and check for a valid type as well while we are at it. Note that this also fixes the fact that hmm never called dev_pagemap_put_ops and thus would leave the slow path enabled forever, even after a device driver unload or disable.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
Committed by Christoph Hellwig
Passing the actual typed structure leads to more understandable code vs just passing the ref member.
Reported-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
Committed by Christoph Hellwig
The dev_pagemap is growing too many callbacks. Move them into a separate ops structure so that they are not duplicated for multiple instances, and an attacker can't easily overwrite them.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
- 14 Jun, 2019 1 commit
Committed by Dan Williams
Logan noticed that devm_memremap_pages_release() kills the percpu_ref, drops all the page references that were acquired at init, and then immediately proceeds to unplug, arch_remove_memory(), the backing pages for the pagemap. If for some reason device shutdown actually collides with a busy / elevated-ref-count page then arch_remove_memory() should be deferred until after that reference is dropped. As it stands the "wait for last page ref drop" happens *after* devm_memremap_pages_release() returns, which is obviously too late and can lead to crashes. Fix this situation by assigning the responsibility to wait for the percpu_ref to go idle to devm_memremap_pages() with a new ->cleanup() callback. Implement the new cleanup callback for all devm_memremap_pages() users: pmem, devdax, hmm, and p2pdma.
Link: http://lkml.kernel.org/r/155727339156.292046.5432007428235387859.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: 41e94a85 ("add devm_memremap_pages")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 05 Jun, 2019 1 commit
Committed by Thomas Gleixner
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms and conditions of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details
extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 263 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141901.208660670@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 21 May, 2019 2 commits
Committed by Dan Williams
Jeff discovered that performance improves from ~375K iops to ~519K iops on a simple psync-write fio workload when moving the location of 'struct page' from the default PMEM location to DRAM. This result is surprising because the expectation is that 'struct page' for dax is only needed for third party references to dax mappings. For example, a dax-mapped buffer passed to another system call for direct-I/O requires 'struct page' for sending the request down the driver stack and pinning the page. There is no usage of 'struct page' for first party access to a file via read(2)/write(2) and friends.
However, this "no page needed" expectation is violated by CONFIG_HARDENED_USERCOPY and the check_copy_size() performed in copy_from_iter_full_nocache() and copy_to_iter_mcsafe(). The check_heap_object() helper routine assumes the buffer is backed by a slab allocator (DRAM) page and applies some checks. Those checks are invalid, dax pages do not originate from the slab, and redundant, dax_iomap_actor() has already validated that the I/O is within bounds. Specifically that routine validates that the logical file offset is within bounds of the file, then it does a sector-to-pfn translation which validates that the physical mapping is within bounds of the block device.
Bypass additional hardened usercopy overhead and call the 'no check' versions of the copy_{to,from}_iter operations directly.
Fixes: 0aed55af ("x86, uaccess: introduce copy_from_iter_flushcache...")
Cc: <stable@vger.kernel.org>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Matthew Wilcox <willy@infradead.org>
Reported-and-tested-by: Jeff Smits <jeff.smits@intel.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
Committed by Dan Williams
Pankaj reports that starting with commit ad428cdb "dax: Check the end of the block-device capacity with dax_direct_access()" device-mapper no longer allows dax operation. This results from the stricter checks in __bdev_dax_supported() that validate that the start and end of a block-device map to the same 'pagemap' instance. Teach the dax-core and device-mapper to validate the 'pagemap' on a per-target basis. This is accomplished by refactoring the bdev_dax_supported() internals into generic_fsdax_supported() which takes a sector range to validate. Consequently generic_fsdax_supported() is suitable to be used in a device-mapper ->iterate_devices() callback. A new ->dax_supported() operation is added to allow composite devices to split and route upper-level bdev_dax_supported() requests.
Fixes: ad428cdb ("dax: Check the end of the block-device...")
Cc: <stable@vger.kernel.org>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reported-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
Tested-by: Pankaj Gupta <pagupta@redhat.com>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- 08 Apr, 2019 1 commit
Committed by Li RongQing
If the offset is not zero and the length is bigger than PAGE_SIZE, this will cause an out-of-bounds access to page memory.
Fixes: 98cc093c ("block, THP: make block_device_operations.rw_page support THP")
Co-developed-by: Liang ZhiCheng <liangzhicheng@baidu.com>
Signed-off-by: Liang ZhiCheng <liangzhicheng@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
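The class of bug named here, a per-page copy loop that ignores the starting offset when sizing its first chunk, can be illustrated with a user-space sketch. This is a generic model of a page-chunked copy, not the driver's code; `model_write_pages` and `MODEL_PAGE_SIZE` are hypothetical, and the caller is assumed to pass `off < MODEL_PAGE_SIZE` and enough page buffers.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u

/*
 * Copy len bytes from src into an array of separate page buffers,
 * starting off bytes into pages[0]. Each chunk is clamped to the room
 * left in the current page, so a nonzero offset never spills past the
 * end of that page. Returns the number of pages touched.
 */
static size_t model_write_pages(unsigned char **pages, unsigned int off,
                                const unsigned char *src, unsigned int len)
{
    size_t touched = 0;

    while (len) {
        unsigned int room = MODEL_PAGE_SIZE - off;
        unsigned int chunk = len < room ? len : room;

        memcpy(pages[touched] + off, src, chunk);
        src += chunk;
        len -= chunk;
        off = 0;        /* only the first page starts mid-page */
        touched++;
    }
    return touched;
}
```

Clamping the chunk to `MODEL_PAGE_SIZE` alone, without subtracting `off`, would let the first `memcpy()` run past the end of `pages[0]` whenever `off` is nonzero and `len` is large enough.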
-
- 29 Dec, 2018 1 commit
Committed by Dan Williams
The last step before devm_memremap_pages() returns success is to allocate a release action, devm_memremap_pages_release(), to tear the entire setup down. However, the result from devm_add_action() is not checked. Checking the error from devm_add_action() is not enough. The api currently relies on the fact that the percpu_ref it is using is killed by the time the devm_memremap_pages_release() is run. Rather than continue this awkward situation, offload the responsibility of killing the percpu_ref to devm_memremap_pages_release() directly. This allows devm_memremap_pages() to do the right thing relative to init failures and shutdown.
Without this change we could fail to register the teardown of devm_memremap_pages(). The likelihood of hitting this failure is tiny as small memory allocations almost always succeed. However, the impact of the failure is large given any future reconfiguration, or disable/enable, of an nvdimm namespace will fail forever as subsequent calls to devm_memremap_pages() will fail to setup the pgmap_radix since there will be stale entries for the physical address range.
An argument could be made to require that the ->kill() operation be set in the @pgmap arg rather than passed in separately. However, it helps code readability, tracking the lifetime of a given instance, to be able to grep the kill routine directly at the devm_memremap_pages() call site.
Link: http://lkml.kernel.org/r/154275558526.76910.7535251937849268605.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Fixes: e8d51348 ("memremap: change devm_memremap_pages interface...")
Reviewed-by: "Jérôme Glisse" <jglisse@redhat.com>
Reported-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 16 Nov, 2018 1 commit
Committed by Christoph Hellwig
With the legacy request path gone there is no real need to override the queue_lock.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 09 Oct, 2018 1 commit
Committed by Dan Williams
The driver is only initializing bb_res in the devm_memremap_pages() paths, but the raw namespace case is passing an uninitialized bb_res to nvdimm_badblocks_populate().
Fixes: e8d51348 ("memremap: change devm_memremap_pages interface...")
Cc: <stable@vger.kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Reported-by: Jacek Zloch <jacek.zloch@intel.com>
Reported-by: Krzysztof Rusocki <krzysztof.rusocki@intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- 28 Sep, 2018 1 commit
Committed by Hannes Reinecke
Update device_add_disk() to take a 'groups' argument so that individual drivers can register a device with additional sysfs attributes. This avoids the race condition the driver would otherwise have if these groups were to be created with sysfs_add_groups().
Signed-off-by: Martin Wilck <martin.wilck@suse.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 21 Aug, 2018 1 commit
Committed by Dan Williams
Use clear_mce_nospec() to restore WB mode for the kernel linear mapping of a pmem page that was marked 'HWPoison'. A page with 'HWPoison' set has also been marked UC in PAT (page attribute table) via set_mce_nospec() to prevent speculative retrievals of poison. The 'HWPoison' flag is only cleared when overwriting an entire page.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
-
- 31 Jul, 2018 1 commit
Committed by Huaisheng Ye
pmem_direct_access() needs to check whether the pointers kaddr and pfn are NULL before assigning through them. If either one is NULL, there is no need to calculate its value. A NULL pointer means the caller has no need for kaddr or pfn, so this patch prepares for allowing callers to pass in NULL instead of having to pass in a pointer or local variable that they then just throw away.
Signed-off-by: Huaisheng Ye <yehs1@lenovo.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
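The optional-output pattern described in this commit can be sketched in user space. The structure and function below are simplified, hypothetical models (`pmem_model`, `model_direct_access`, `MODEL_PAGE_SHIFT`); the real pmem_direct_access() has a different signature and also deals with bad blocks and data offsets.

```c
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12

struct pmem_model {
    unsigned char *virt_addr;   /* start of the mapped region */
    unsigned long base_pfn;     /* frame number of the first page */
};

/* Fill in only the outputs the caller asked for; NULL means "not needed". */
static long model_direct_access(struct pmem_model *pmem, unsigned long pgoff,
                                long nr_pages, void **kaddr,
                                unsigned long *pfn)
{
    if (kaddr)
        *kaddr = pmem->virt_addr + (pgoff << MODEL_PAGE_SHIFT);
    if (pfn)
        *pfn = pmem->base_pfn + pgoff;
    return nr_pages;
}
```

A caller that only wants the address can then pass `NULL` for `pfn` rather than declaring a throwaway local.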
-
- 18 Jul, 2018 1 commit
Committed by Tejun Heo
c11f0c0b ("block/mm: make bdev_ops->rw_page() take a bool for read/write") replaced @OP with boolean @is_write, which limited the amount of information going into ->rw_page() and more importantly page_endio(), which removed the need to expose block internals to mm. Unfortunately, we want to track discards separately and @is_write isn't enough information. This patch updates bdev_ops->rw_page() to take REQ_OP instead but leaves page_endio() to take bool @is_write. This allows the block part of operations to have enough information while not leaking it to mm.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
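The layering this commit describes, with the block side seeing the full operation code while the mm-side completion only learns read versus write, can be modelled as follows. The enum values, `model_stats`, and all function names are hypothetical, and treating a discard as write-like in `model_op_is_write()` is an assumption of this sketch rather than something the commit message states.

```c
#include <stdbool.h>

/* Hypothetical operation codes; the real REQ_OP_* values are not reproduced. */
enum model_req_op { MODEL_OP_READ, MODEL_OP_WRITE, MODEL_OP_DISCARD };

struct model_stats {
    unsigned long reads, writes, discards;
    bool last_endio_was_write;
};

/* Assumption: anything that is not a read is reported as a write. */
static bool model_op_is_write(enum model_req_op op)
{
    return op != MODEL_OP_READ;
}

/* The mm-facing completion takes only a bool. */
static void model_page_endio(struct model_stats *st, bool is_write)
{
    st->last_endio_was_write = is_write;
}

/* The block-facing entry point takes the full op and can account discards. */
static int model_rw_page(struct model_stats *st, enum model_req_op op)
{
    switch (op) {
    case MODEL_OP_READ:
        st->reads++;
        break;
    case MODEL_OP_WRITE:
        st->writes++;
        break;
    case MODEL_OP_DISCARD:
        st->discards++;
        break;
    }
    model_page_endio(st, model_op_is_write(op));
    return 0;
}
```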
-
- 29 Jun, 2018 1 commit
Committed by Ross Zwisler
QUEUE_FLAG_DAX is an indication that a given block device supports filesystem DAX and should not be set for PMEM namespaces which are in "raw" mode. These namespaces lack struct page and are prevented from participating in filesystem DAX as of commit 569d0365 ("dax: require 'struct page' by default for filesystem dax").
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Mike Snitzer <snitzer@redhat.com>
Fixes: 569d0365 ("dax: require 'struct page' by default for filesystem dax")
Cc: stable@vger.kernel.org
Acked-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
- 07 Jun, 2018 2 commits
Committed by Ross Zwisler
Prior to this commit we would only do a "deep flush" (have nvdimm_flush() write to each of the flush hints for a region) in response to an msync/fsync/sync call if the nvdimm_has_cache() returned true at the time we were setting up the request queue. This happens due to the write cache value passed in to blk_queue_write_cache(), which then causes the block layer to send down BIOs with REQ_FUA and REQ_PREFLUSH set. We do have a "write_cache" sysfs entry for namespaces, i.e.:
/sys/bus/nd/devices/pfn0.1/block/pmem0/dax/write_cache
which can be used to control whether or not the kernel thinks a given namespace has a write cache, but this didn't modify the deep flush behavior that we set up when the driver was initialized. Instead, it only modified whether or not DAX would flush CPU caches via dax_flush() in response to *sync calls.
Simplify this by making the *sync deep flush always happen, regardless of the write cache setting of a namespace. The DAX CPU cache flushing will still be controlled by the write_cache setting of the namespace.
Cc: <stable@vger.kernel.org>
Fixes: 5fdf8e5b ("libnvdimm: re-enable deep flush for pmem devices via fsync()")
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
Committed by Ross Zwisler
Complete the move from REQ_FLUSH to REQ_PREFLUSH that apparently started way back in v4.8.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- 23 May, 2018 2 commits
Committed by Dan Williams
Use the machine check safe version of copy_to_iter() for the ->copy_to_iter() operation published by the pmem driver.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
Committed by Dan Williams
Similar to the ->copy_from_iter() operation, a platform may want to deploy an architecture or device specific routine for handling reads from a dax_device like /dev/pmemX. On x86 this routine will point to a machine check safe version of copy_to_iter(). For now, add the plumbing to device-mapper and the dax core.
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- 22 May, 2018 1 commit
Committed by Dan Williams
In preparation for fixing dax-dma-vs-unmap issues, filesystems need to be able to rely on the fact that they will get wakeups on dev_pagemap page-idle events. Introduce MEMORY_DEVICE_FS_DAX and generic_dax_page_free() as common indicator / infrastructure for dax filesystems to require. With this change there are no users of the MEMORY_DEVICE_HOST designation, so remove it.
The HMM sub-system extended dev_pagemap to arrange a callback when a dev_pagemap managed page is freed. Since a dev_pagemap page is free / idle when its reference count is 1 it requires an additional branch to check the page-type at put_page() time. Given put_page() is a hot-path we do not want to incur that check if HMM is not in use, so a static branch is used to avoid that overhead when not necessary.
Now, the FS_DAX implementation wants to reuse this mechanism for receiving dev_pagemap ->page_free() callbacks. Rework the HMM-specific static-key into a generic mechanism that either HMM or FS_DAX code paths can enable.
For ARCH=um builds, and any other arch that lacks ZONE_DEVICE support, care must be taken to compile out the DEV_PAGEMAP_OPS infrastructure. However, we still need to support FS_DAX in the FS_DAX_LIMITED case implemented by the s390/dcssblk driver.
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Thomas Meyer <thomas@m3y3r.de>
Reported-by: Dave Jiang <dave.jiang@intel.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- 15 May, 2018 1 commit
Committed by Dan Williams
Machine check safe memory copies are currently deployed in the pmem driver whenever reading from persistent memory media, so that -EIO is returned rather than triggering a kernel panic. While this protects most pmem accesses, it is not complete in the filesystem-dax case. When filesystem-dax is enabled reads may bypass the block layer and the driver via dax_iomap_actor() and its usage of copy_to_iter().
In preparation for creating a copy_to_iter() variant that can handle machine checks, teach memcpy_mcsafe() to return the number of bytes remaining rather than -EFAULT when an exception occurs.
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: hch@lst.de
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/152539238119.31796.14318473522414462886.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
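The change in return convention, from an error code to "bytes remaining", can be modelled in user space. There is no real machine check here: `poison_at` is a hypothetical parameter that simulates where a media error would interrupt the copy, and both function names are invented for illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Model of a copy that can fault part-way through. poison_at is the
 * offset into src at which the simulated media error sits; pass a
 * value >= len for a clean copy.
 * Returns the number of bytes NOT copied (0 on full success).
 */
static size_t model_copy_mcsafe(void *dst, const void *src, size_t len,
                                size_t poison_at)
{
    size_t good = poison_at < len ? poison_at : len;

    memcpy(dst, src, good);
    return len - good;
}

/* A caller can turn the remainder into a short-read byte count. */
static size_t model_copy_to_buf(void *dst, const void *src, size_t len,
                                size_t poison_at)
{
    size_t rem = model_copy_mcsafe(dst, src, len, poison_at);

    return len - rem;   /* bytes actually delivered */
}
```

Returning the remainder lets an iterator-style caller report a partial transfer, which a bare error code could not express.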
-