1. 15 Aug 2020, 1 commit
  2. 01 Jul 2020, 1 commit
  3. 09 Jun 2020, 1 commit
  4. 27 May 2020, 1 commit
  5. 14 May 2020, 1 commit
  6. 03 Apr 2020, 3 commits
  7. 28 Mar 2020, 1 commit
    • block: simplify queue allocation · 3d745ea5
      Committed by Christoph Hellwig
      Currently, make_request-based drivers use either blk_alloc_queue_node or
      blk_alloc_queue to allocate a queue, and then set up the make_request_fn
      function pointer and a few parameters using the blk_queue_make_request
      helper.  Simplify this by passing the make_request pointer to
      blk_alloc_queue, and while at it merge the _node variant into the main
      helper by always passing a node_id, and remove the superfluous gfp_mask
      parameter.  A lower-level __blk_alloc_queue is kept for the blk-mq case.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
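
      A hedged sketch of the driver-side change described above; my_make_request()
      and the node argument are illustrative stand-ins for a bio-based driver's
      submission handler and NUMA node, and the two helpers target the before/after
      kernels respectively, so they would not both compile against a single tree.

      #include <linux/blkdev.h>
      #include <linux/bio.h>

      /* Hypothetical bio-based submission handler. */
      static blk_qc_t my_make_request(struct request_queue *q, struct bio *bio)
      {
              bio_io_error(bio);              /* placeholder body */
              return BLK_QC_T_NONE;
      }

      /* Before 3d745ea5: allocate, then wire up the handler separately. */
      static struct request_queue *old_style_alloc(int node)
      {
              struct request_queue *q = blk_alloc_queue_node(GFP_KERNEL, node);

              if (!q)
                      return NULL;
              blk_queue_make_request(q, my_make_request);
              return q;
      }

      /* After 3d745ea5: one call takes the handler and the NUMA node. */
      static struct request_queue *new_style_alloc(int node)
      {
              return blk_alloc_queue(my_make_request, node);
      }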
  8. 01 Feb 2020, 1 commit
    • mm: Cleanup __put_devmap_managed_page() vs ->page_free() · 429589d6
      Committed by Dan Williams
      After the removal of the device-public infrastructure there are only two
      ->page_free() callbacks in the kernel.  One of those is a
      device-private callback in the nouveau driver, the other is a generic
      wakeup needed in the DAX case.  In the hopes that all ->page_free()
      callbacks can be migrated to common core kernel functionality, move the
      device-private specific actions in __put_devmap_managed_page() under the
      is_device_private_page() conditional, including the ->page_free()
      callback.  For the other page types just open-code the generic wakeup.
      
      Yes, the wakeup is only needed in the MEMORY_DEVICE_FSDAX case, but it
      does no harm in the MEMORY_DEVICE_DEVDAX and MEMORY_DEVICE_PCI_P2PDMA
      case.
      
      Link: http://lkml.kernel.org/r/20200107224558.2362728-4-jhubbard@nvidia.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: John Hubbard <jhubbard@nvidia.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Björn Töpel <bjorn.topel@intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Leon Romanovsky <leonro@mellanox.com>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
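
      A condensed, hedged paraphrase of the control flow described above (the
      page-flag and memcg bookkeeping is omitted); this is not the verbatim
      kernel helper.

      #include <linux/mm.h>
      #include <linux/memremap.h>
      #include <linux/wait_bit.h>

      static void put_devmap_managed_page_sketch(struct page *page)
      {
              int count = page_ref_dec_return(page);

              if (count == 1) {
                      /* refcount 1 means the devmap page is free / idle */
                      if (is_device_private_page(page)) {
                              /* device-private still needs its ->page_free() */
                              page->mapping = NULL;
                              page->pgmap->ops->page_free(page);
                      } else {
                              /* fsdax (harmless for devdax/p2pdma): generic wakeup */
                              wake_up_var(&page->_refcount);
                      }
              } else if (!count) {
                      __put_page(page);
              }
      }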
  9. 18 Nov 2019, 1 commit
  10. 15 Nov 2019, 1 commit
  11. 06 Sep 2019, 1 commit
  12. 19 Jul 2019, 1 commit
    • driver-core, libnvdimm: Let device subsystems add local lockdep coverage · 87a30e1f
      Committed by Dan Williams
      For good reason, the standard device_lock() is marked
      lockdep_set_novalidate_class() because there is simply no sane way to
      describe the myriad ways the device_lock() is ordered with other locks.
      However, that leaves subsystems that know their own local device_lock()
      ordering rules to find lock ordering mistakes manually. Instead,
      introduce an optional / additional lockdep-enabled lock that a subsystem
      can acquire in all the same paths that the device_lock() is acquired.
      
      A conversion of the NFIT driver and NVDIMM subsystem to a
      lockdep-validate device_lock() scheme is included. The
      debug_nvdimm_lock() implementation provides the correct lock-class and
      stacking order for the libnvdimm device topology hierarchy.
      
      Yes, this is a hack, but hopefully it is a useful hack for other
      subsystems device_lock() debug sessions. Quoting Greg:
      
          "Yeah, it feels a bit hacky but it's really up to a subsystem to mess up
           using it as much as anything else, so user beware :)
      
           I don't object to it if it makes things easier for you to debug."
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Link: https://lore.kernel.org/r/156341210661.292348.7014034644265455704.stgit@dwillia2-desk3.amr.corp.intel.com
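
      A hedged sketch of the wrapper pattern described above: the subsystem takes
      the real, lockdep-novalidate device_lock() and, in the same paths, a shadow
      mutex that lockdep does validate.  The subclass used here is illustrative;
      the libnvdimm conversion derives the lock class from the device topology.

      #include <linux/device.h>
      #include <linux/mutex.h>

      static void my_subsys_device_lock(struct device *dev)
      {
              device_lock(dev);
      #ifdef CONFIG_PROVE_LOCKING
              /* shadow lock added to struct device by this commit */
              mutex_lock_nested(&dev->lockdep_mutex, SINGLE_DEPTH_NESTING);
      #endif
      }

      static void my_subsys_device_unlock(struct device *dev)
      {
      #ifdef CONFIG_PROVE_LOCKING
              mutex_unlock(&dev->lockdep_mutex);
      #endif
              device_unlock(dev);
      }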
  13. 06 Jul 2019, 2 commits
  14. 03 Jul 2019, 5 commits
  15. 14 Jun 2019, 1 commit
  16. 05 Jun 2019, 1 commit
  17. 21 May 2019, 2 commits
    • libnvdimm/pmem: Bypass CONFIG_HARDENED_USERCOPY overhead · 52f476a3
      Committed by Dan Williams
      Jeff discovered that performance improves from ~375K iops to ~519K iops
      on a simple psync-write fio workload when moving the location of 'struct
      page' from the default PMEM location to DRAM. This result is surprising
      because the expectation is that 'struct page' for dax is only needed for
      third party references to dax mappings. For example, a dax-mapped buffer
      passed to another system call for direct-I/O requires 'struct page' for
      sending the request down the driver stack and pinning the page. There is
      no usage of 'struct page' for first party access to a file via
      read(2)/write(2) and friends.
      
      However, this "no page needed" expectation is violated by
      CONFIG_HARDENED_USERCOPY and the check_copy_size() performed in
      copy_from_iter_full_nocache() and copy_to_iter_mcsafe(). The
      check_heap_object() helper routine assumes the buffer is backed by a
      slab allocator (DRAM) page and applies some checks.  Those checks are
      invalid (dax pages do not originate from the slab) and redundant
      (dax_iomap_actor() has already validated that the I/O is within bounds).
      Specifically that routine validates that the logical file offset is
      within bounds of the file, then it does a sector-to-pfn translation
      which validates that the physical mapping is within bounds of the block
      device.
      
      Bypass additional hardened usercopy overhead and call the 'no check'
      versions of the copy_{to,from}_iter operations directly.
      
      Fixes: 0aed55af ("x86, uaccess: introduce copy_from_iter_flushcache...")
      Cc: <stable@vger.kernel.org>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Reported-and-tested-by: Jeff Smits <jeff.smits@intel.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Acked-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
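
      A hedged sketch of the bypass, modeled on the pmem driver's dax copy
      callbacks of that era; the prototypes are reproduced from memory, so treat
      the exact signatures as assumptions rather than a verbatim diff.

      #include <linux/dax.h>
      #include <linux/uio.h>

      static size_t pmem_copy_from_iter_sketch(struct dax_device *dax_dev,
                      pgoff_t pgoff, void *addr, size_t bytes, struct iov_iter *i)
      {
              /*
               * _copy_from_iter_flushcache() skips check_copy_size(): the dax
               * target is not a slab page, and dax_iomap_actor() has already
               * bounds-checked the I/O.
               */
              return _copy_from_iter_flushcache(addr, bytes, i);
      }

      static size_t pmem_copy_to_iter_sketch(struct dax_device *dax_dev,
                      pgoff_t pgoff, void *addr, size_t bytes, struct iov_iter *i)
      {
              /* likewise, the 'no check' mcsafe variant for reads from pmem */
              return _copy_to_iter_mcsafe(addr, bytes, i);
      }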
    • dax: Arrange for dax_supported check to span multiple devices · 7bf7eac8
      Committed by Dan Williams
      Pankaj reports that starting with commit ad428cdb "dax: Check the
      end of the block-device capacity with dax_direct_access()" device-mapper
      no longer allows dax operation. This results from the stricter checks in
      __bdev_dax_supported() that validate that the start and end of a
      block-device map to the same 'pagemap' instance.
      
      Teach the dax-core and device-mapper to validate the 'pagemap' on a
      per-target basis. This is accomplished by refactoring the
      bdev_dax_supported() internals into generic_fsdax_supported() which
      takes a sector range to validate. Consequently generic_fsdax_supported()
      is suitable to be used in a device-mapper ->iterate_devices() callback.
      A new ->dax_supported() operation is added to allow composite devices to
      split and route upper-level bdev_dax_supported() requests.
      
      Fixes: ad428cdb ("dax: Check the end of the block-device...")
      Cc: <stable@vger.kernel.org>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reported-by: Pankaj Gupta <pagupta@redhat.com>
      Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
      Tested-by: Pankaj Gupta <pagupta@redhat.com>
      Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
      Reviewed-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
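
      A hedged sketch of the new hook for a simple pass-through device with a
      single backing range; the signatures are reproduced from memory for the
      v5.2-era API, and a real stacking driver such as device-mapper would
      instead iterate its targets and check each component range.

      #include <linux/dax.h>
      #include <linux/blkdev.h>

      static bool my_stacked_dax_supported(struct dax_device *dax_dev,
                      struct block_device *bdev, int blocksize,
                      sector_t start, sector_t sectors)
      {
              /* validate that this component range maps to a dax-capable pagemap */
              return generic_fsdax_supported(dax_dev, bdev, blocksize,
                                             start, sectors);
      }

      static const struct dax_operations my_dax_ops = {
              /* .direct_access, .copy_{from,to}_iter, ... omitted */
              .dax_supported = my_stacked_dax_supported,
      };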
  18. 08 Apr 2019, 1 commit
  19. 29 Dec 2018, 1 commit
    • mm, devm_memremap_pages: fix shutdown handling · a95c90f1
      Committed by Dan Williams
      The last step before devm_memremap_pages() returns success is to allocate
      a release action, devm_memremap_pages_release(), to tear the entire setup
      down.  However, the result from devm_add_action() is not checked.
      
      Checking the error from devm_add_action() is not enough.  The api
      currently relies on the fact that the percpu_ref it is using is killed by
      the time the devm_memremap_pages_release() is run.  Rather than continue
      this awkward situation, offload the responsibility of killing the
      percpu_ref to devm_memremap_pages_release() directly.  This allows
      devm_memremap_pages() to do the right thing relative to init failures and
      shutdown.
      
      Without this change we could fail to register the teardown of
      devm_memremap_pages().  The likelihood of hitting this failure is tiny as
      small memory allocations almost always succeed.  However, the impact of
      the failure is large: any future reconfiguration, or disable/enable,
      of an nvdimm namespace will fail forever, because subsequent calls to
      devm_memremap_pages() will fail to set up the pgmap_radix while stale
      entries remain for the physical address range.
      
      An argument could be made to require that the ->kill() operation be set in
      the @pgmap arg rather than passed in separately.  However, it helps code
      readability, tracking the lifetime of a given instance, to be able to grep
      the kill routine directly at the devm_memremap_pages() call site.
      
      Link: http://lkml.kernel.org/r/154275558526.76910.7535251937849268605.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Fixes: e8d51348 ("memremap: change devm_memremap_pages interface...")
      Reviewed-by: "Jérôme Glisse" <jglisse@redhat.com>
      Reported-by: Logan Gunthorpe <logang@deltatee.com>
      Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
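
      A hedged caller-side sketch of the contract described above, assuming the
      ~v4.20/v5.0 layout in which the teardown routine is supplied through the
      dev_pagemap's ->kill() hook (later kernels moved it into dev_pagemap_ops);
      the percpu_ref and the kill routine here are hypothetical.

      #include <linux/device.h>
      #include <linux/memremap.h>
      #include <linux/percpu-refcount.h>

      static void my_pgmap_kill(struct percpu_ref *ref)
      {
              percpu_ref_kill(ref);   /* stop new references; release completes later */
      }

      static void *map_device_pages(struct device *dev, struct dev_pagemap *pgmap,
                                    struct percpu_ref *ref)
      {
              pgmap->ref = ref;
              pgmap->kill = my_pgmap_kill;
              /*
               * After this fix devm_memremap_pages() itself registers the release
               * action and invokes ->kill() on both init-failure and shutdown
               * paths, so the caller no longer kills the ref up front.
               */
              return devm_memremap_pages(dev, pgmap);
      }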
  20. 16 Nov 2018, 1 commit
  21. 09 Oct 2018, 1 commit
  22. 28 Sep 2018, 1 commit
  23. 21 Aug 2018, 1 commit
  24. 31 Jul 2018, 1 commit
  25. 18 Jul 2018, 1 commit
    • block: make bdev_ops->rw_page() take a REQ_OP instead of bool · 3f289dcb
      Committed by Tejun Heo
      c11f0c0b ("block/mm: make bdev_ops->rw_page() take a bool for
      read/write") replaced @op with the boolean @is_write.  That limited the
      information passed into ->rw_page() and, more importantly, page_endio(),
      and removed the need to expose block internals to mm.
      
      Unfortunately, we want to track discards separately and @is_write
      isn't enough information.  This patch updates bdev_ops->rw_page() to
      take REQ_OP instead but leaves page_endio() to take bool @is_write.
      This allows the block part of operations to have enough information
      while not leaking it to mm.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Mike Christie <mchristi@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
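
      A hedged sketch of the updated hook; the driver names are hypothetical,
      and only the prototype change the commit describes is the point.

      #include <linux/blkdev.h>
      #include <linux/pagemap.h>
      #include <linux/module.h>

      static int my_rw_page(struct block_device *bdev, sector_t sector,
                            struct page *page, unsigned int op)
      {
              bool is_write = op_is_write(op);   /* REQ_OP_* now visible to the driver */

              /* ... perform the transfer for 'op' ... */

              /* page_endio() still takes a bool, so the REQ_OP does not leak into mm */
              page_endio(page, is_write, 0);
              return 0;
      }

      static const struct block_device_operations my_bdev_ops = {
              .owner   = THIS_MODULE,
              .rw_page = my_rw_page,
      };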
  26. 29 Jun 2018, 1 commit
  27. 07 Jun 2018, 2 commits
    • libnvdimm, pmem: Unconditionally deep flush on *sync · ce7f11a2
      Committed by Ross Zwisler
      Prior to this commit we would only do a "deep flush" (have nvdimm_flush()
      write to each of the flush hints for a region) in response to an
      msync/fsync/sync call if nvdimm_has_cache() returned true at the time
      we were setting up the request queue.  This happens due to the write cache
      value passed in to blk_queue_write_cache(), which then causes the block
      layer to send down BIOs with REQ_FUA and REQ_PREFLUSH set.  We do have a
      "write_cache" sysfs entry for namespaces, i.e.:
      
        /sys/bus/nd/devices/pfn0.1/block/pmem0/dax/write_cache
      
      which can be used to control whether or not the kernel thinks a given
      namespace has a write cache, but this didn't modify the deep flush behavior
      that we set up when the driver was initialized.  Instead, it only modified
      whether or not DAX would flush CPU caches via dax_flush() in response to
      *sync calls.
      
      Simplify this by making the *sync deep flush always happen, regardless of
      the write cache setting of a namespace.  The DAX CPU cache flushing will
      still be controlled by the write_cache setting of the namespace.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 5fdf8e5b ("libnvdimm: re-enable deep flush for pmem devices via fsync()")
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
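
      A hedged sketch of the behaviour described above, loosely modeled on the
      pmem driver of that era (the nvdimm_flush() prototype shown is the
      v4.18-era one); it is illustrative, not a verbatim diff.

      #include <linux/blkdev.h>
      #include <linux/libnvdimm.h>

      static void my_setup_queue_flush(struct request_queue *q, bool fua)
      {
              /*
               * Always advertise a writeback cache so the block layer sends
               * REQ_PREFLUSH/REQ_FUA; deep flush no longer depends on the
               * namespace's write_cache attribute.
               */
              blk_queue_write_cache(q, true, fua);
      }

      static void my_handle_flush(struct bio *bio, struct nd_region *nd_region)
      {
              if (bio->bi_opf & (REQ_PREFLUSH | REQ_FUA))
                      nvdimm_flush(nd_region);   /* write the region's flush hints */
      }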
    • libnvdimm, pmem: Complete REQ_FLUSH => REQ_PREFLUSH · d2d6364d
      Committed by Ross Zwisler
      Complete the move from REQ_FLUSH to REQ_PREFLUSH that apparently started
      way back in v4.8.
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  28. 23 May 2018, 2 commits
  29. 22 May 2018, 1 commit
    • mm: introduce MEMORY_DEVICE_FS_DAX and CONFIG_DEV_PAGEMAP_OPS · e7638488
      Committed by Dan Williams
      In preparation for fixing dax-dma-vs-unmap issues, filesystems need to
      be able to rely on the fact that they will get wakeups on dev_pagemap
      page-idle events. Introduce MEMORY_DEVICE_FS_DAX and
      generic_dax_page_free() as common indicator / infrastructure for dax
      filesystems to require. With this change there are no users of the
      MEMORY_DEVICE_HOST designation, so remove it.
      
      The HMM sub-system extended dev_pagemap to arrange a callback when a
      dev_pagemap managed page is freed. Since a dev_pagemap page is free /
      idle when its reference count is 1 it requires an additional branch to
      check the page-type at put_page() time. Given put_page() is a hot-path
      we do not want to incur that check if HMM is not in use, so a static
      branch is used to avoid that overhead when not necessary.
      
      Now, the FS_DAX implementation wants to reuse this mechanism for
      receiving dev_pagemap ->page_free() callbacks. Rework the HMM-specific
      static-key into a generic mechanism that either HMM or FS_DAX code paths
      can enable.
      
      For ARCH=um builds, and any other arch that lacks ZONE_DEVICE support,
      care must be taken to compile out the DEV_PAGEMAP_OPS infrastructure.
      However, we still need to support FS_DAX in the FS_DAX_LIMITED case
      implemented by the s390/dcssblk driver.
      
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Reported-by: kbuild test robot <lkp@intel.com>
      Reported-by: Thomas Meyer <thomas@m3y3r.de>
      Reported-by: Dave Jiang <dave.jiang@intel.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
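
      A hedged sketch of the static-branch pattern described above; the key and
      helper names are hypothetical, not the exact kernel symbols.

      #include <linux/jump_label.h>
      #include <linux/mm.h>

      static DEFINE_STATIC_KEY_FALSE(my_devmap_key);

      static inline bool my_page_is_devmap_managed(struct page *page)
      {
              /* no cost in the put_page() hot path unless HMM or FS_DAX enabled the key */
              if (!static_branch_unlikely(&my_devmap_key))
                      return false;
              return is_zone_device_page(page);
      }

      /* HMM or FS_DAX flips the key only when ->page_free() callbacks are needed. */
      static void my_enable_devmap_callbacks(void)
      {
              static_branch_enable(&my_devmap_key);
      }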
  30. 15 May 2018, 1 commit
    • x86/asm/memcpy_mcsafe: Return bytes remaining · 60622d68
      Committed by Dan Williams
      Machine check safe memory copies are currently deployed in the pmem
      driver whenever reading from persistent memory media, so that -EIO is
      returned rather than triggering a kernel panic. While this protects most
      pmem accesses, it is not complete in the filesystem-dax case. When
      filesystem-dax is enabled reads may bypass the block layer and the
      driver via dax_iomap_actor() and its usage of copy_to_iter().
      
      In preparation for creating a copy_to_iter() variant that can handle
      machine checks, teach memcpy_mcsafe() to return the number of bytes
      remaining rather than -EFAULT when an exception occurs.
      Co-developed-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: hch@lst.de
      Cc: linux-fsdevel@vger.kernel.org
      Cc: linux-nvdimm@lists.01.org
      Link: http://lkml.kernel.org/r/152539238119.31796.14318473522414462886.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
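
      A hedged caller-side sketch of the new contract: memcpy_mcsafe() now
      reports how many bytes were left uncopied (0 on success) instead of
      returning -EFAULT; the wrapper name and buffers are illustrative.

      #include <linux/string.h>
      #include <linux/errno.h>

      static int my_read_pmem(void *dst, const void *pmem_src, size_t len)
      {
              unsigned long rem = memcpy_mcsafe(dst, pmem_src, len);

              if (rem)                /* a machine check truncated the copy */
                      return -EIO;    /* 'rem' bytes at the tail were not copied */
              return 0;
      }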