1. 14 Oct 2020, 2 commits
    • mm/memremap_pages: support multiple ranges per invocation · b7b3c01b
      Authored by Dan Williams
      In support of device-dax growing the ability to front physically
      dis-contiguous ranges of memory, update devm_memremap_pages() to track
      multiple ranges with a single reference counter and devm instance.
      
      Convert all [devm_]memremap_pages() users to specify the number of ranges
      they are mapping in their 'struct dev_pagemap' instance.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brice Goglin <Brice.Goglin@inria.fr>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Hulk Robot <hulkci@huawei.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Jason Yan <yanaijie@huawei.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: Jia He <justin.he@arm.com>
      Cc: Joao Martins <joao.m.martins@oracle.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: kernel test robot <lkp@intel.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lkml.kernel.org/r/159643103789.4062302.18426128170217903785.stgit@dwillia2-desk3.amr.corp.intel.com
      Link: https://lkml.kernel.org/r/160106116293.30709.13350662794915396198.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b7b3c01b
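      A minimal sketch of a single-range caller after this change, assuming the
      multi-range 'struct dev_pagemap' layout the series introduces (nr_range
      plus the range/ranges[] union); the helper name and the
      MEMORY_DEVICE_GENERIC type choice are illustrative:

          #include <linux/device.h>
          #include <linux/memremap.h>
          #include <linux/range.h>

          /* Map one physically contiguous span; a multi-range user would set
           * nr_range accordingly and fill pgmap->ranges[] instead. */
          static void *example_map_one_range(struct device *dev,
                                             struct dev_pagemap *pgmap,
                                             resource_size_t start,
                                             resource_size_t size)
          {
                  pgmap->range = (struct range) {
                          .start = start,
                          .end   = start + size - 1,
                  };
                  pgmap->nr_range = 1;    /* callers now state how many ranges */
                  pgmap->type = MEMORY_DEVICE_GENERIC;

                  return devm_memremap_pages(dev, pgmap);
          }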
    • mm/memremap_pages: convert to 'struct range' · a4574f63
      Authored by Dan Williams
      The 'struct resource' in 'struct dev_pagemap' is only used for holding
      resource span information.  The other fields, 'name', 'flags', 'desc',
      'parent', 'sibling', and 'child' are all unused wasted space.
      
      This is in preparation for introducing a multi-range extension of
      devm_memremap_pages().
      
      The bulk of this change is unwinding all the places internal to libnvdimm
      that used 'struct resource' unnecessarily, and replacing instances of
      'struct dev_pagemap'.res with 'struct dev_pagemap'.range.
      
      P2PDMA had a minor usage of the resource flags field, but only to report
      failures with "%pR".  That is replaced with an open coded print of the
      range.
      
      [dan.carpenter@oracle.com: mm/hmm/test: use after free in dmirror_allocate_chunk()]
        Link: https://lkml.kernel.org/r/20200926121402.GA7467@kadam
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>	[xen]
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brice Goglin <Brice.Goglin@inria.fr>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Hulk Robot <hulkci@huawei.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Jason Yan <yanaijie@huawei.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jia He <justin.he@arm.com>
      Cc: Joao Martins <joao.m.martins@oracle.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: kernel test robot <lkp@intel.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lkml.kernel.org/r/159643103173.4062302.768998885691711532.stgit@dwillia2-desk3.amr.corp.intel.com
      Link: https://lkml.kernel.org/r/160106115761.30709.13539840236873663620.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a4574f63
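      A small sketch of the res -> range conversion from a consumer's point of
      view, assuming the post-conversion 'struct dev_pagemap'; the reporting
      helper itself is illustrative, not part of the patch:

          #include <linux/kernel.h>
          #include <linux/memremap.h>
          #include <linux/range.h>

          static void report_pgmap_span(struct dev_pagemap *pgmap)
          {
                  /* Formerly pgmap->res.start / resource_size(&pgmap->res),
                   * printed with "%pR"; now a plain struct range. */
                  pr_info("pgmap spans %#llx-%#llx (%llu bytes)\n",
                          pgmap->range.start, pgmap->range.end,
                          range_len(&pgmap->range));
          }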
  2. 06 Oct 2020, 1 commit
    • x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}() · ec6347bb
      Authored by Dan Williams
      In reaction to a proposal to introduce a memcpy_mcsafe_fast()
      implementation, Linus points out that memcpy_mcsafe() is poorly named
      relative to communicating the scope of the interface: specifically, which
      addresses are valid to pass as source and destination, and which faults /
      exceptions are handled.
      
      Of particular concern is that even though x86 might be able to handle
      the semantics of copy_mc_to_user() with its common copy_user_generic()
      implementation, other archs likely need / want an explicit path for this
      case:
      
        On Fri, May 1, 2020 at 11:28 AM Linus Torvalds <torvalds@linux-foundation.org> wrote:
        >
        > On Thu, Apr 30, 2020 at 6:21 PM Dan Williams <dan.j.williams@intel.com> wrote:
        > >
        > > However now I see that copy_user_generic() works for the wrong reason.
        > > It works because the exception on the source address due to poison
        > > looks no different than a write fault on the user address to the
        > > caller, it's still just a short copy. So it makes copy_to_user() work
        > > for the wrong reason relative to the name.
        >
        > Right.
        >
        > And it won't work that way on other architectures. On x86, we have a
        > generic function that can take faults on either side, and we use it
        > for both cases (and for the "in_user" case too), but that's an
        > artifact of the architecture oddity.
        >
        > In fact, it's probably wrong even on x86 - because it can hide bugs -
        > but writing those things is painful enough that everybody prefers
        > having just one function.
      
      Replace a single top-level memcpy_mcsafe() with either
      copy_mc_to_user() or copy_mc_to_kernel().
      
      Introduce an x86 copy_mc_fragile() name as the rename for the
      low-level x86 implementation formerly named memcpy_mcsafe(). It is used
      as the slow / careful backend that is supplanted by a fast
      copy_mc_generic() in a follow-on patch.
      
      One side-effect of this reorganization is that separating copy_mc_64.S
      to its own file means that perf no longer needs to track dependencies
      for its memcpy_64.S benchmarks.
      
       [ bp: Massage a bit. ]
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Tony Luck <tony.luck@intel.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Cc: <stable@vger.kernel.org>
      Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com
      Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com
      ec6347bb
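      A hedged sketch of the renamed interface from the consumer side; the
      wrapper function is illustrative, only copy_mc_to_kernel() comes from
      this patch:

          #include <linux/errno.h>
          #include <linux/uaccess.h>

          static int read_possibly_poisoned(void *dst, const void *src, size_t len)
          {
                  unsigned long rem;

                  /* Returns the number of bytes NOT copied; a short copy means a
                   * machine-check / poison exception was taken on the source. */
                  rem = copy_mc_to_kernel(dst, src, len);
                  return rem ? -EIO : 0;
          }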
  3. 25 Sep 2020, 1 commit
  4. 02 Sep 2020, 1 commit
  5. 15 Aug 2020, 1 commit
  6. 01 Jul 2020, 1 commit
  7. 09 Jun 2020, 1 commit
  8. 27 May 2020, 1 commit
  9. 14 May 2020, 1 commit
  10. 03 Apr 2020, 3 commits
  11. 28 Mar 2020, 1 commit
    • block: simplify queue allocation · 3d745ea5
      Authored by Christoph Hellwig
      Current make_request-based drivers use either blk_alloc_queue_node or
      blk_alloc_queue to allocate a queue, and then set up the make_request_fn
      function pointer and a few parameters using the blk_queue_make_request
      helper.  Simplify this by passing the make_request pointer to
      blk_alloc_queue, and while at it merge the _node variant into the main
      helper by always passing a node_id, and remove the superfluous gfp_mask
      parameter.  A lower-level __blk_alloc_queue is kept for the blk-mq case.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      3d745ea5
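      A brief sketch of the simplified allocation from a driver's point of
      view; the example_make_request handler and the enclosing helper are
      illustrative:

          static int example_create_queue(struct device *dev,
                                          struct request_queue **qp)
          {
                  /* One call now sets both the make_request_fn and the NUMA
                   * node; no follow-up blk_queue_make_request() is needed. */
                  *qp = blk_alloc_queue(example_make_request, dev_to_node(dev));
                  if (!*qp)
                          return -ENOMEM;
                  return 0;
          }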
  12. 01 Feb 2020, 1 commit
    • mm: Cleanup __put_devmap_managed_page() vs ->page_free() · 429589d6
      Authored by Dan Williams
      After the removal of the device-public infrastructure there are only two
      ->page_free() callbacks left in the kernel.  One of those is a
      device-private callback in the nouveau driver; the other is a generic
      wakeup needed in the DAX case.  In the hopes that all ->page_free()
      callbacks can be migrated to common core kernel functionality, move the
      device-private specific actions in __put_devmap_managed_page() under the
      is_device_private_page() conditional, including the ->page_free()
      callback.  For the other page types just open-code the generic wakeup.
      
      Yes, the wakeup is only needed in the MEMORY_DEVICE_FSDAX case, but it
      does no harm in the MEMORY_DEVICE_DEVDAX and MEMORY_DEVICE_PCI_P2PDMA
      case.
      
      Link: http://lkml.kernel.org/r/20200107224558.2362728-4-jhubbard@nvidia.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: John Hubbard <jhubbard@nvidia.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Björn Töpel <bjorn.topel@intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Leon Romanovsky <leonro@mellanox.com>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      429589d6
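      A rough sketch of the resulting shape; this is simplified, and the
      refcount bookkeeping of the real __put_devmap_managed_page() is omitted:

          static void example_devmap_page_release(struct page *page)
          {
                  if (is_device_private_page(page)) {
                          /* device-private: defer to the driver's callback */
                          page->pgmap->ops->page_free(page);
                  } else {
                          /* FSDAX / DEVDAX / PCI_P2PDMA: open-coded wakeup */
                          wake_up_var(&page->_refcount);
                  }
          }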
  13. 18 Nov 2019, 1 commit
  14. 15 Nov 2019, 1 commit
  15. 06 Sep 2019, 1 commit
  16. 19 Jul 2019, 1 commit
    • driver-core, libnvdimm: Let device subsystems add local lockdep coverage · 87a30e1f
      Authored by Dan Williams
      For good reason, the standard device_lock() is marked
      lockdep_set_novalidate_class() because there is simply no sane way to
      describe the myriad ways the device_lock() is ordered with other locks.
      However, that leaves subsystems that know their own local device_lock()
      ordering rules to find lock ordering mistakes manually. Instead,
      introduce an optional / additional lockdep-enabled lock that a subsystem
      can acquire in all the same paths that the device_lock() is acquired.
      
      A conversion of the NFIT driver and NVDIMM subsystem to a
      lockdep-validated device_lock() scheme is included. The
      debug_nvdimm_lock() implementation provides the correct lock class and
      stacking order for the libnvdimm device topology hierarchy.
      
      Yes, this is a hack, but hopefully it is a useful hack for other
      subsystems device_lock() debug sessions. Quoting Greg:
      
          "Yeah, it feels a bit hacky but it's really up to a subsystem to mess up
           using it as much as anything else, so user beware :)
      
           I don't object to it if it makes things easier for you to debug."
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Link: https://lore.kernel.org/r/156341210661.292348.7014034644265455704.stgit@dwillia2-desk3.amr.corp.intel.com
      87a30e1f
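      An illustrative sketch of the pattern; the to_example_dev() container
      accessor and the lockdep_mutex field are loose stand-ins for the
      libnvdimm conversion, not a literal copy of it:

          static void example_subsys_device_lock(struct device *dev)
          {
                  device_lock(dev);       /* real lock, lockdep-novalidate */
          #ifdef CONFIG_DEBUG_LOCK_ALLOC
                  /* shadow lock taken in the same paths, purely so lockdep can
                   * validate the subsystem's own ordering rules */
                  mutex_lock(&to_example_dev(dev)->lockdep_mutex);
          #endif
          }

          static void example_subsys_device_unlock(struct device *dev)
          {
          #ifdef CONFIG_DEBUG_LOCK_ALLOC
                  mutex_unlock(&to_example_dev(dev)->lockdep_mutex);
          #endif
                  device_unlock(dev);
          }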
  17. 06 Jul 2019, 2 commits
  18. 03 Jul 2019, 5 commits
  19. 14 Jun 2019, 1 commit
  20. 05 Jun 2019, 1 commit
  21. 21 May 2019, 2 commits
    • libnvdimm/pmem: Bypass CONFIG_HARDENED_USERCOPY overhead · 52f476a3
      Authored by Dan Williams
      Jeff discovered that performance improves from ~375K iops to ~519K iops
      on a simple psync-write fio workload when moving the location of 'struct
      page' from the default PMEM location to DRAM. This result is surprising
      because the expectation is that 'struct page' for dax is only needed for
      third party references to dax mappings. For example, a dax-mapped buffer
      passed to another system call for direct-I/O requires 'struct page' for
      sending the request down the driver stack and pinning the page. There is
      no usage of 'struct page' for first party access to a file via
      read(2)/write(2) and friends.
      
      However, this "no page needed" expectation is violated by
      CONFIG_HARDENED_USERCOPY and the check_copy_size() performed in
      copy_from_iter_full_nocache() and copy_to_iter_mcsafe(). The
      check_heap_object() helper routine assumes the buffer is backed by a
      slab allocator (DRAM) page and applies some checks.  Those checks are
      invalid (dax pages do not originate from the slab) and redundant
      (dax_iomap_actor() has already validated that the I/O is within bounds).
      Specifically that routine validates that the logical file offset is
      within bounds of the file, then it does a sector-to-pfn translation
      which validates that the physical mapping is within bounds of the block
      device.
      
      Bypass additional hardened usercopy overhead and call the 'no check'
      versions of the copy_{to,from}_iter operations directly.
      
      Fixes: 0aed55af ("x86, uaccess: introduce copy_from_iter_flushcache...")
      Cc: <stable@vger.kernel.org>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Reported-and-tested-by: Jeff Smits <jeff.smits@intel.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Acked-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      52f476a3
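      A sketch of the bypass as it lands in the pmem dax_operations; the
      surrounding function names follow the pmem driver, and the exact
      spelling of the leading-underscore helpers of that era should be treated
      as an assumption:

          static size_t pmem_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff,
                          void *addr, size_t bytes, struct iov_iter *i)
          {
                  /* underscore variant skips check_copy_size() / hardened
                   * usercopy checks that assume slab-backed pages */
                  return _copy_from_iter_flushcache(addr, bytes, i);
          }

          static size_t pmem_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff,
                          void *addr, size_t bytes, struct iov_iter *i)
          {
                  return _copy_to_iter_mcsafe(addr, bytes, i);
          }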
    • dax: Arrange for dax_supported check to span multiple devices · 7bf7eac8
      Authored by Dan Williams
      Pankaj reports that starting with commit ad428cdb "dax: Check the
      end of the block-device capacity with dax_direct_access()", device-mapper
      no longer allows dax operation. This results from the stricter checks in
      __bdev_dax_supported() that validate that the start and end of a
      block-device map to the same 'pagemap' instance.
      
      Teach the dax-core and device-mapper to validate the 'pagemap' on a
      per-target basis. This is accomplished by refactoring the
      bdev_dax_supported() internals into generic_fsdax_supported() which
      takes a sector range to validate. Consequently generic_fsdax_supported()
      is suitable to be used in a device-mapper ->iterate_devices() callback.
      A new ->dax_supported() operation is added to allow composite devices to
      split and route upper-level bdev_dax_supported() requests.
      
      Fixes: ad428cdb ("dax: Check the end of the block-device...")
      Cc: <stable@vger.kernel.org>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reported-by: Pankaj Gupta <pagupta@redhat.com>
      Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
      Tested-by: Pankaj Gupta <pagupta@redhat.com>
      Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
      Reviewed-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      7bf7eac8
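      A sketch of how a simple, non-composite dax provider can satisfy the new
      operation by routing straight to the generic helper; the pmem_* names
      mirror the pmem driver loosely:

          static const struct dax_operations pmem_dax_ops = {
                  .direct_access  = pmem_dax_direct_access,
                  /* composite devices (device-mapper) instead implement this to
                   * split the sector range and query each underlying target */
                  .dax_supported  = generic_fsdax_supported,
                  .copy_from_iter = pmem_copy_from_iter,
                  .copy_to_iter   = pmem_copy_to_iter,
          };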
  22. 08 Apr 2019, 1 commit
  23. 29 Dec 2018, 1 commit
    • mm, devm_memremap_pages: fix shutdown handling · a95c90f1
      Authored by Dan Williams
      The last step before devm_memremap_pages() returns success is to allocate
      a release action, devm_memremap_pages_release(), to tear the entire setup
      down.  However, the result from devm_add_action() is not checked.
      
      Checking the error from devm_add_action() is not enough.  The API
      currently relies on the fact that the percpu_ref it is using is killed by
      the time the devm_memremap_pages_release() is run.  Rather than continue
      this awkward situation, offload the responsibility of killing the
      percpu_ref to devm_memremap_pages_release() directly.  This allows
      devm_memremap_pages() to do the right thing relative to init failures and
      shutdown.
      
      Without this change we could fail to register the teardown of
      devm_memremap_pages().  The likelihood of hitting this failure is tiny as
      small memory allocations almost always succeed.  However, the impact of
      the failure is large given any future reconfiguration, or disable/enable,
      of an nvdimm namespace will fail forever as subsequent calls to
      devm_memremap_pages() will fail to setup the pgmap_radix since there will
      be stale entries for the physical address range.
      
      An argument could be made to require that the ->kill() operation be set in
      the @pgmap arg rather than passed in separately.  However, it helps code
      readability, tracking the lifetime of a given instance, to be able to grep
      the kill routine directly at the devm_memremap_pages() call site.
      
      Link: http://lkml.kernel.org/r/154275558526.76910.7535251937849268605.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Fixes: e8d51348 ("memremap: change devm_memremap_pages interface...")
      Reviewed-by: "Jérôme Glisse" <jglisse@redhat.com>
      Reported-by: Logan Gunthorpe <logang@deltatee.com>
      Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a95c90f1
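      A minimal sketch of the contract after this fix, assuming the ->kill()
      callback is carried in 'struct dev_pagemap' next to its percpu_ref, as
      discussed above; the wrapper names here are illustrative:

          static void example_kill_ref(struct percpu_ref *ref)
          {
                  percpu_ref_kill(ref);
          }

          static void *example_setup(struct device *dev, struct dev_pagemap *pgmap,
                                     struct percpu_ref *ref)
          {
                  pgmap->ref = ref;
                  pgmap->kill = example_kill_ref; /* grep-able at the call site */
                  /* On init failure and on devm-driven shutdown the memremap
                   * core now invokes ->kill() itself, so no extra teardown is
                   * registered here. */
                  return devm_memremap_pages(dev, pgmap);
          }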
  24. 16 Nov 2018, 1 commit
  25. 09 Oct 2018, 1 commit
  26. 28 Sep 2018, 1 commit
  27. 21 Aug 2018, 1 commit
  28. 31 Jul 2018, 1 commit
  29. 18 Jul 2018, 1 commit
    • block: make bdev_ops->rw_page() take a REQ_OP instead of bool · 3f289dcb
      Authored by Tejun Heo
      c11f0c0b ("block/mm: make bdev_ops->rw_page() take a bool for
      read/write") replaced @OP with boolean @is_write, which limited the
      amount of information going into ->rw_page() and more importantly
      page_endio(), which removed the need to expose block internals to mm.
      
      Unfortunately, we want to track discards separately and @is_write
      isn't enough information.  This patch updates bdev_ops->rw_page() to
      take REQ_OP instead but leaves page_endio() to take bool @is_write.
      This allows the block part of operations to have enough information
      while not leaking it to mm.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Mike Christie <mchristi@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      3f289dcb
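      A condensed sketch of an ->rw_page() implementation under the new
      signature; the do_one_page() helper is hypothetical, and the point is the
      op handling plus the page_endio() narrowing:

          static int example_rw_page(struct block_device *bdev, sector_t sector,
                                     struct page *page, unsigned int op)
          {
                  blk_status_t rc;

                  /* op is a REQ_OP_* value, so discards could be told apart
                   * from plain writes here if the driver cared */
                  rc = do_one_page(bdev->bd_disk->private_data, page, sector, op);
                  if (rc == BLK_STS_OK)
                          page_endio(page, op_is_write(op), 0);  /* still a bool */
                  return blk_status_to_errno(rc);
          }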
  30. 29 Jun 2018, 1 commit
  31. 07 Jun 2018, 1 commit
    • libnvdimm, pmem: Unconditionally deep flush on *sync · ce7f11a2
      Authored by Ross Zwisler
      Prior to this commit we would only do a "deep flush" (have nvdimm_flush()
      write to each of the flush hints for a region) in response to an
      msync/fsync/sync call if the nvdimm_has_cache() returned true at the time
      we were setting up the request queue.  This happens due to the write cache
      value passed in to blk_queue_write_cache(), which then causes the block
      layer to send down BIOs with REQ_FUA and REQ_PREFLUSH set.  We do have a
      "write_cache" sysfs entry for namespaces, i.e.:
      
        /sys/bus/nd/devices/pfn0.1/block/pmem0/dax/write_cache
      
      which can be used to control whether or not the kernel thinks a given
      namespace has a write cache, but this didn't modify the deep flush behavior
      that we set up when the driver was initialized.  Instead, it only modified
      whether or not DAX would flush CPU caches via dax_flush() in response to
      *sync calls.
      
      Simplify this by making the *sync deep flush always happen, regardless of
      the write cache setting of a namespace.  The DAX CPU cache flushing will
      still be controlled by the write_cache setting of the namespace.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 5fdf8e5b ("libnvdimm: re-enable deep flush for pmem devices via fsync()")
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      ce7f11a2
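      A sketch of the attach-time effect described above; variable names follow
      the pmem driver loosely and are illustrative:

          /* Always claim a write cache to the block layer, so REQ_PREFLUSH /
           * REQ_FUA (and therefore nvdimm_flush()) reach the driver on every
           * *sync, regardless of the namespace write_cache knob ... */
          blk_queue_write_cache(q, true, nvdimm_has_flush(nd_region) > 0);

          /* ... while that knob keeps controlling only DAX CPU-cache flushing */
          dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));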