1. 14 10月, 2020 5 次提交
    • D
      mm/memremap_pages: support multiple ranges per invocation · b7b3c01b
      Dan Williams 提交于
      In support of device-dax growing the ability to front physically
      dis-contiguous ranges of memory, update devm_memremap_pages() to track
      multiple ranges with a single reference counter and devm instance.
      
      Convert all [devm_]memremap_pages() users to specify the number of ranges
      they are mapping in their 'struct dev_pagemap' instance.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: "Jérôme Glisse" <jglisse@redhat.co
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brice Goglin <Brice.Goglin@inria.fr>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Hulk Robot <hulkci@huawei.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Jason Yan <yanaijie@huawei.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: Jia He <justin.he@arm.com>
      Cc: Joao Martins <joao.m.martins@oracle.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: kernel test robot <lkp@intel.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lkml.kernel.org/r/159643103789.4062302.18426128170217903785.stgit@dwillia2-desk3.amr.corp.intel.com
      Link: https://lkml.kernel.org/r/160106116293.30709.13350662794915396198.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b7b3c01b
    • D
      mm/memremap_pages: convert to 'struct range' · a4574f63
      Dan Williams 提交于
      The 'struct resource' in 'struct dev_pagemap' is only used for holding
      resource span information.  The other fields, 'name', 'flags', 'desc',
      'parent', 'sibling', and 'child' are all unused wasted space.
      
      This is in preparation for introducing a multi-range extension of
      devm_memremap_pages().
      
      The bulk of this change is unwinding all the places internal to libnvdimm
      that used 'struct resource' unnecessarily, and replacing instances of
      'struct dev_pagemap'.res with 'struct dev_pagemap'.range.
      
      P2PDMA had a minor usage of the resource flags field, but only to report
      failures with "%pR".  That is replaced with an open coded print of the
      range.
      
      [dan.carpenter@oracle.com: mm/hmm/test: use after free in dmirror_allocate_chunk()]
        Link: https://lkml.kernel.org/r/20200926121402.GA7467@kadamSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>	[xen]
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brice Goglin <Brice.Goglin@inria.fr>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Hulk Robot <hulkci@huawei.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Jason Yan <yanaijie@huawei.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jia He <justin.he@arm.com>
      Cc: Joao Martins <joao.m.martins@oracle.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: kernel test robot <lkp@intel.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lkml.kernel.org/r/159643103173.4062302.768998885691711532.stgit@dwillia2-desk3.amr.corp.intel.com
      Link: https://lkml.kernel.org/r/160106115761.30709.13539840236873663620.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a4574f63
    • D
      device-dax: introduce 'struct dev_dax' typed-driver operations · f11cf813
      Dan Williams 提交于
      In preparation for introducing seed devices the dax-bus core needs to be
      able to intercept ->probe() and ->remove() operations.  Towards that end
      arrange for the bus and drivers to switch from raw 'struct device' driver
      operations to 'struct dev_dax' typed operations.
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Jason Yan <yanaijie@huawei.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Brice Goglin <Brice.Goglin@inria.fr>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jia He <justin.he@arm.com>
      Cc: Joao Martins <joao.m.martins@oracle.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: kernel test robot <lkp@intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lkml.kernel.org/r/160106113357.30709.4541750544799737855.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f11cf813
    • D
      device-dax: make pgmap optional for instance creation · f5516ec5
      Dan Williams 提交于
      The passed in dev_pagemap is only required in the pmem case as the
      libnvdimm core may have reserved a vmem_altmap for dev_memremap_pages() to
      place the memmap in pmem directly.  In the hmem case there is no agent
      reserving an altmap so it can all be handled by a core internal default.
      
      Pass the resource range via a new @range property of 'struct
      dev_dax_data'.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Brice Goglin <Brice.Goglin@inria.fr>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jia He <justin.he@arm.com>
      Cc: Joao Martins <joao.m.martins@oracle.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Hulk Robot <hulkci@huawei.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Jason Yan <yanaijie@huawei.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: kernel test robot <lkp@intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lkml.kernel.org/r/159643099958.4062302.10379230791041872886.stgit@dwillia2-desk3.amr.corp.intel.com
      Link: https://lkml.kernel.org/r/160106110513.30709.4303239334850606031.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f5516ec5
    • D
      device-dax: drop the dax_region.pfn_flags attribute · ec826909
      Dan Williams 提交于
      All callers specify the same flags to alloc_dax_region(), so there is no
      need to allow for anything other than PFN_DEV|PFN_MAP, or carry a
      ->pfn_flags around on the region.  Device-dax instances are always page
      backed.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brice Goglin <Brice.Goglin@inria.fr>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jia He <justin.he@arm.com>
      Cc: Joao Martins <joao.m.martins@oracle.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Hulk Robot <hulkci@huawei.com>
      Cc: Jason Yan <yanaijie@huawei.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: kernel test robot <lkp@intel.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Link: https://lkml.kernel.org/r/159643098829.4062302.13611520567669439046.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ec826909
  2. 04 9月, 2020 1 次提交
  3. 03 6月, 2020 1 次提交
    • J
      vfs: track per-sb writeback errors and report them to syncfs · 735e4ae5
      Jeff Layton 提交于
      Patch series "vfs: have syncfs() return error when there are writeback
      errors", v6.
      
      Currently, syncfs does not return errors when one of the inodes fails to
      be written back.  It will return errors based on the legacy AS_EIO and
      AS_ENOSPC flags when syncing out the block device fails, but that's not
      particularly helpful for filesystems that aren't backed by a blockdev.
      It's also possible for a stray sync to lose those errors.
      
      The basic idea in this set is to track writeback errors at the
      superblock level, so that we can quickly and easily check whether
      something bad happened without having to fsync each file individually.
      syncfs is then changed to reliably report writeback errors after they
      occur, much in the same fashion as fsync does now.
      
      This patch (of 2):
      
      Usually we suggest that applications call fsync when they want to ensure
      that all data written to the file has made it to the backing store, but
      that can be inefficient when there are a lot of open files.
      
      Calling syncfs on the filesystem can be more efficient in some
      situations, but the error reporting doesn't currently work the way most
      people expect.  If a single inode on a filesystem reports a writeback
      error, syncfs won't necessarily return an error.  syncfs only returns an
      error if __sync_blockdev fails, and on some filesystems that's a no-op.
      
      It would be better if syncfs reported an error if there were any
      writeback failures.  Then applications could call syncfs to see if there
      are any errors on any open files, and could then call fsync on all of
      the other descriptors to figure out which one failed.
      
      This patch adds a new errseq_t to struct super_block, and has
      mapping_set_error also record writeback errors there.
      
      To report those errors, we also need to keep an errseq_t in struct file
      to act as a cursor.  This patch adds a dedicated field for that purpose,
      which slots nicely into 4 bytes of padding at the end of struct file on
      x86_64.
      
      An earlier version of this patch used an O_PATH file descriptor to cue
      the kernel that the open file should track the superblock error and not
      the inode's writeback error.
      
      I think that API is just too weird though.  This is simpler and should
      make syncfs error reporting "just work" even if someone is multiplexing
      fsync and syncfs on the same fds.
      Signed-off-by: NJeff Layton <jlayton@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: Andres Freund <andres@anarazel.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: David Howells <dhowells@redhat.com>
      Link: http://lkml.kernel.org/r/20200428135155.19223-1-jlayton@kernel.org
      Link: http://lkml.kernel.org/r/20200428135155.19223-2-jlayton@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      735e4ae5
  4. 03 7月, 2019 4 次提交
  5. 14 6月, 2019 1 次提交
  6. 15 5月, 2019 1 次提交
  7. 07 1月, 2019 8 次提交
    • D
      device-dax: Add /sys/class/dax backwards compatibility · 730926c3
      Dan Williams 提交于
      On the expectation that some environments may not upgrade libdaxctl
      (userspace component that depends on the /sys/class/dax hierarchy),
      provide a default / legacy dax_pmem_compat driver. The dax_pmem_compat
      driver implements the original /sys/class/dax sysfs layout rather than
      /sys/bus/dax. When userspace is upgraded it can blacklist this module
      and switch to the dax_pmem driver going forward.
      
      CONFIG_DEV_DAX_PMEM_COMPAT and supporting code will be deleted according
      to the dax_pmem entry in Documentation/ABI/obsolete/.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      730926c3
    • D
      device-dax: Add support for a dax override driver · d200781e
      Dan Williams 提交于
      Introduce the 'new_id' concept for enabling a custom device-driver attach
      policy for dax-bus drivers. The intended use is to have a mechanism for
      hot-plugging device-dax ranges into the page allocator on-demand. With
      this in place the default policy of using device-dax for performance
      differentiated memory can be overridden by user-space policy that can
      arrange for the memory range to be managed as 'System RAM' with
      user-defined NUMA and other performance attributes.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      d200781e
    • D
      device-dax: Move resource pinning+mapping into the common driver · 89ec9f2c
      Dan Williams 提交于
      Move the responsibility of calling devm_request_resource() and
      devm_memremap_pages() into the common device-dax driver. This is another
      preparatory step to allowing an alternate personality driver for a
      device-dax range.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      89ec9f2c
    • D
      device-dax: Introduce bus + driver model · 9567da0b
      Dan Williams 提交于
      In support of multiple device-dax instances per device-dax-region and
      allowing the 'kmem' driver to attach to dax-instances instead of the
      current device-node access, convert the dax sub-system from a class to a
      bus. Recall that the kmem driver takes reserved / special purpose
      memories and assigns them to be managed by the core-mm.
      
      Aside from the fact the device-dax instances are registered and probed
      on a bus, two other lifetime-management changes are made:
      
      1/ Delay attaching a cdev until driver probe time
      
      2/ A new run_dax() helper is introduced to allow restoring dax-operation
         after a kill_dax() event. So, at driver ->probe() time we run_dax()
         and at ->remove() time we kill_dax() and invalidate all mappings.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      9567da0b
    • D
      device-dax: Start defining a dax bus model · 51cf784c
      Dan Williams 提交于
      Towards eliminating the dax_class, move the dax-device-attribute
      enabling to a new bus.c file in the core. The amount of code
      thrash of sub-sequent patches is reduced as no logic changes are made,
      just pure code movement.
      
      A temporary export of unregister_dex_dax() and dax_attribute_groups is
      needed to preserve compilation, but those symbols become static again in
      a follow-on patch.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      51cf784c
    • D
      device-dax: Remove multi-resource infrastructure · 753a0850
      Dan Williams 提交于
      The multi-resource implementation anticipated discontiguous sub-division
      support. That has not yet materialized, delete the infrastructure and
      related code.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      753a0850
    • D
      device-dax: Kill dax_region base · 93694f96
      Dan Williams 提交于
      Nothing consumes this attribute of a region and devres otherwise
      remembers the value for de-allocation purposes.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      93694f96
    • D
      device-dax: Kill dax_region ida · 21b9e979
      Dan Williams 提交于
      Commit bbb3be17 "device-dax: fix sysfs duplicate warnings" arranged
      for passing a dax instance-id to devm_create_dax_dev(), rather than
      generating one internally. Remove the dax_region ida and related code.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      21b9e979
  8. 23 9月, 2018 1 次提交
  9. 05 9月, 2018 1 次提交
  10. 18 8月, 2018 1 次提交
  11. 21 7月, 2018 3 次提交
  12. 29 6月, 2018 1 次提交
  13. 07 6月, 2018 1 次提交
    • K
      treewide: Use struct_size() for kmalloc()-family · acafe7e3
      Kees Cook 提交于
      One of the more common cases of allocation size calculations is finding
      the size of a structure that has a zero-sized array at the end, along
      with memory for some number of elements for that array. For example:
      
      struct foo {
          int stuff;
          void *entry[];
      };
      
      instance = kmalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);
      
      Instead of leaving these open-coded and prone to type mistakes, we can
      now use the new struct_size() helper:
      
      instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL);
      
      This patch makes the changes for kmalloc()-family (and kvmalloc()-family)
      uses. It was done via automatic conversion with manual review for the
      "CHECKME" non-standard cases noted below, using the following Coccinelle
      script:
      
      // pkey_cache = kmalloc(sizeof *pkey_cache + tprops->pkey_tbl_len *
      //                      sizeof *pkey_cache->table, GFP_KERNEL);
      @@
      identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
      expression GFP;
      identifier VAR, ELEMENT;
      expression COUNT;
      @@
      
      - alloc(sizeof(*VAR) + COUNT * sizeof(*VAR->ELEMENT), GFP)
      + alloc(struct_size(VAR, ELEMENT, COUNT), GFP)
      
      // mr = kzalloc(sizeof(*mr) + m * sizeof(mr->map[0]), GFP_KERNEL);
      @@
      identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
      expression GFP;
      identifier VAR, ELEMENT;
      expression COUNT;
      @@
      
      - alloc(sizeof(*VAR) + COUNT * sizeof(VAR->ELEMENT[0]), GFP)
      + alloc(struct_size(VAR, ELEMENT, COUNT), GFP)
      
      // Same pattern, but can't trivially locate the trailing element name,
      // or variable name.
      @@
      identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
      expression GFP;
      expression SOMETHING, COUNT, ELEMENT;
      @@
      
      - alloc(sizeof(SOMETHING) + COUNT * sizeof(ELEMENT), GFP)
      + alloc(CHECKME_struct_size(&SOMETHING, ELEMENT, COUNT), GFP)
      Signed-off-by: NKees Cook <keescook@chromium.org>
      acafe7e3
  14. 20 4月, 2018 1 次提交
  15. 06 4月, 2018 1 次提交
  16. 07 3月, 2018 1 次提交
  17. 24 1月, 2018 1 次提交
  18. 30 11月, 2017 1 次提交
  19. 20 10月, 2017 1 次提交
  20. 19 7月, 2017 1 次提交
    • D
      device-dax: fix sysfs duplicate warnings · bbb3be17
      Dan Williams 提交于
      Fix warnings of the form...
      
           WARNING: CPU: 10 PID: 4983 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x62/0x80
           sysfs: cannot create duplicate filename '/class/dax/dax12.0'
           Call Trace:
            dump_stack+0x63/0x86
            __warn+0xcb/0xf0
            warn_slowpath_fmt+0x5a/0x80
            ? kernfs_path_from_node+0x4f/0x60
            sysfs_warn_dup+0x62/0x80
            sysfs_do_create_link_sd.isra.2+0x97/0xb0
            sysfs_create_link+0x25/0x40
            device_add+0x266/0x630
            devm_create_dax_dev+0x2cf/0x340 [dax]
            dax_pmem_probe+0x1f5/0x26e [dax_pmem]
            nvdimm_bus_probe+0x71/0x120
      
      ...by reusing the namespace id for the device-dax instance name.
      
      Now that we have decided that there will never by more than one
      device-dax instance per libnvdimm-namespace parent device [1], we can
      directly reuse the namepace ids. There are some possible follow-on
      cleanups, but those are saved for a later patch to simplify the -stable
      backport.
      
      [1]: https://lists.01.org/pipermail/linux-nvdimm/2016-December/008266.html
      
      Fixes: 98a29c39 ("libnvdimm, namespace: allow creation of multiple pmem...")
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: <stable@vger.kernel.org>
      Reported-by: NDariusz Dokupil <dariusz.dokupil@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      bbb3be17
  21. 18 7月, 2017 1 次提交
  22. 06 7月, 2017 1 次提交
    • J
      fs: new infrastructure for writeback error handling and reporting · 5660e13d
      Jeff Layton 提交于
      Most filesystems currently use mapping_set_error and
      filemap_check_errors for setting and reporting/clearing writeback errors
      at the mapping level. filemap_check_errors is indirectly called from
      most of the filemap_fdatawait_* functions and from
      filemap_write_and_wait*. These functions are called from all sorts of
      contexts to wait on writeback to finish -- e.g. mostly in fsync, but
      also in truncate calls, getattr, etc.
      
      The non-fsync callers are problematic. We should be reporting writeback
      errors during fsync, but many places spread over the tree clear out
      errors before they can be properly reported, or report errors at
      nonsensical times.
      
      If I get -EIO on a stat() call, there is no reason for me to assume that
      it is because some previous writeback failed. The fact that it also
      clears out the error such that a subsequent fsync returns 0 is a bug,
      and a nasty one since that's potentially silent data corruption.
      
      This patch adds a small bit of new infrastructure for setting and
      reporting errors during address_space writeback. While the above was my
      original impetus for adding this, I think it's also the case that
      current fsync semantics are just problematic for userland. Most
      applications that call fsync do so to ensure that the data they wrote
      has hit the backing store.
      
      In the case where there are multiple writers to the file at the same
      time, this is really hard to determine. The first one to call fsync will
      see any stored error, and the rest get back 0. The processes with open
      fds may not be associated with one another in any way. They could even
      be in different containers, so ensuring coordination between all fsync
      callers is not really an option.
      
      One way to remedy this would be to track what file descriptor was used
      to dirty the file, but that's rather cumbersome and would likely be
      slow. However, there is a simpler way to improve the semantics here
      without incurring too much overhead.
      
      This set adds an errseq_t to struct address_space, and a corresponding
      one is added to struct file. Writeback errors are recorded in the
      mapping's errseq_t, and the one in struct file is used as the "since"
      value.
      
      This changes the semantics of the Linux fsync implementation such that
      applications can now use it to determine whether there were any
      writeback errors since fsync(fd) was last called (or since the file was
      opened in the case of fsync having never been called).
      
      Note that those writeback errors may have occurred when writing data
      that was dirtied via an entirely different fd, but that's the case now
      with the current mapping_set_error/filemap_check_error infrastructure.
      This will at least prevent you from getting a false report of success.
      
      The new behavior is still consistent with the POSIX spec, and is more
      reliable for application developers. This patch just adds some basic
      infrastructure for doing this, and ensures that the f_wb_err "cursor"
      is properly set when a file is opened. Later patches will change the
      existing code to use this new infrastructure for reporting errors at
      fsync time.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      5660e13d
  23. 20 4月, 2017 2 次提交
    • D
      dax: introduce dax_operations · 6568b08b
      Dan Williams 提交于
      Track a set of dax_operations per dax_device that can be set at
      alloc_dax() time. These operations will be used to stop the abuse of
      block_device_operations for communicating dax capabilities to
      filesystems. It will also be used to replace the "pmem api" and move
      pmem-specific cache maintenance, and other dax-driver-specific
      filesystem-dax operations, to dax device methods. In particular this
      allows us to stop abusing __copy_user_nocache(), via memcpy_to_pmem(),
      with a driver specific replacement.
      
      This is a standalone introduction of the operations. Follow on patches
      convert each dax-driver and teach fs/dax.c to use ->direct_access() from
      dax_operations instead of block_device_operations.
      Suggested-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      6568b08b
    • D
      dax: add a facility to lookup a dax device by 'host' device name · 72058005
      Dan Williams 提交于
      For the current block_device based filesystem-dax path, we need a way
      for it to lookup the dax_device associated with a block_device. Add a
      'host' property of a dax_device that can be used for this purpose. It is
      a free form string, but for a dax_device associated with a block device
      it is the bdev name.
      
      This is a stop-gap until filesystems are able to mount on a dax-inode
      directly.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      72058005