1. 30 11月, 2017 1 次提交
  2. 20 10月, 2017 1 次提交
  3. 19 7月, 2017 1 次提交
    • D
      device-dax: fix sysfs duplicate warnings · bbb3be17
      Dan Williams 提交于
      Fix warnings of the form...
      
           WARNING: CPU: 10 PID: 4983 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x62/0x80
           sysfs: cannot create duplicate filename '/class/dax/dax12.0'
           Call Trace:
            dump_stack+0x63/0x86
            __warn+0xcb/0xf0
            warn_slowpath_fmt+0x5a/0x80
            ? kernfs_path_from_node+0x4f/0x60
            sysfs_warn_dup+0x62/0x80
            sysfs_do_create_link_sd.isra.2+0x97/0xb0
            sysfs_create_link+0x25/0x40
            device_add+0x266/0x630
            devm_create_dax_dev+0x2cf/0x340 [dax]
            dax_pmem_probe+0x1f5/0x26e [dax_pmem]
            nvdimm_bus_probe+0x71/0x120
      
      ...by reusing the namespace id for the device-dax instance name.
      
      Now that we have decided that there will never by more than one
      device-dax instance per libnvdimm-namespace parent device [1], we can
      directly reuse the namepace ids. There are some possible follow-on
      cleanups, but those are saved for a later patch to simplify the -stable
      backport.
      
      [1]: https://lists.01.org/pipermail/linux-nvdimm/2016-December/008266.html
      
      Fixes: 98a29c39 ("libnvdimm, namespace: allow creation of multiple pmem...")
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: <stable@vger.kernel.org>
      Reported-by: NDariusz Dokupil <dariusz.dokupil@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      bbb3be17
  4. 18 7月, 2017 1 次提交
  5. 06 7月, 2017 1 次提交
    • J
      fs: new infrastructure for writeback error handling and reporting · 5660e13d
      Jeff Layton 提交于
      Most filesystems currently use mapping_set_error and
      filemap_check_errors for setting and reporting/clearing writeback errors
      at the mapping level. filemap_check_errors is indirectly called from
      most of the filemap_fdatawait_* functions and from
      filemap_write_and_wait*. These functions are called from all sorts of
      contexts to wait on writeback to finish -- e.g. mostly in fsync, but
      also in truncate calls, getattr, etc.
      
      The non-fsync callers are problematic. We should be reporting writeback
      errors during fsync, but many places spread over the tree clear out
      errors before they can be properly reported, or report errors at
      nonsensical times.
      
      If I get -EIO on a stat() call, there is no reason for me to assume that
      it is because some previous writeback failed. The fact that it also
      clears out the error such that a subsequent fsync returns 0 is a bug,
      and a nasty one since that's potentially silent data corruption.
      
      This patch adds a small bit of new infrastructure for setting and
      reporting errors during address_space writeback. While the above was my
      original impetus for adding this, I think it's also the case that
      current fsync semantics are just problematic for userland. Most
      applications that call fsync do so to ensure that the data they wrote
      has hit the backing store.
      
      In the case where there are multiple writers to the file at the same
      time, this is really hard to determine. The first one to call fsync will
      see any stored error, and the rest get back 0. The processes with open
      fds may not be associated with one another in any way. They could even
      be in different containers, so ensuring coordination between all fsync
      callers is not really an option.
      
      One way to remedy this would be to track what file descriptor was used
      to dirty the file, but that's rather cumbersome and would likely be
      slow. However, there is a simpler way to improve the semantics here
      without incurring too much overhead.
      
      This set adds an errseq_t to struct address_space, and a corresponding
      one is added to struct file. Writeback errors are recorded in the
      mapping's errseq_t, and the one in struct file is used as the "since"
      value.
      
      This changes the semantics of the Linux fsync implementation such that
      applications can now use it to determine whether there were any
      writeback errors since fsync(fd) was last called (or since the file was
      opened in the case of fsync having never been called).
      
      Note that those writeback errors may have occurred when writing data
      that was dirtied via an entirely different fd, but that's the case now
      with the current mapping_set_error/filemap_check_error infrastructure.
      This will at least prevent you from getting a false report of success.
      
      The new behavior is still consistent with the POSIX spec, and is more
      reliable for application developers. This patch just adds some basic
      infrastructure for doing this, and ensures that the f_wb_err "cursor"
      is properly set when a file is opened. Later patches will change the
      existing code to use this new infrastructure for reporting errors at
      fsync time.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      5660e13d
  6. 20 4月, 2017 2 次提交
    • D
      dax: introduce dax_operations · 6568b08b
      Dan Williams 提交于
      Track a set of dax_operations per dax_device that can be set at
      alloc_dax() time. These operations will be used to stop the abuse of
      block_device_operations for communicating dax capabilities to
      filesystems. It will also be used to replace the "pmem api" and move
      pmem-specific cache maintenance, and other dax-driver-specific
      filesystem-dax operations, to dax device methods. In particular this
      allows us to stop abusing __copy_user_nocache(), via memcpy_to_pmem(),
      with a driver specific replacement.
      
      This is a standalone introduction of the operations. Follow on patches
      convert each dax-driver and teach fs/dax.c to use ->direct_access() from
      dax_operations instead of block_device_operations.
      Suggested-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      6568b08b
    • D
      dax: add a facility to lookup a dax device by 'host' device name · 72058005
      Dan Williams 提交于
      For the current block_device based filesystem-dax path, we need a way
      for it to lookup the dax_device associated with a block_device. Add a
      'host' property of a dax_device that can be used for this purpose. It is
      a free form string, but for a dax_device associated with a block device
      it is the bdev name.
      
      This is a stop-gap until filesystems are able to mount on a dax-inode
      directly.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      72058005
  7. 13 4月, 2017 6 次提交
    • D
      dax: refactor dax-fs into a generic provider of 'struct dax_device' instances · 7b6be844
      Dan Williams 提交于
      We want dax capable drivers to be able to publish a set of dax
      operations [1]. However, we do not want to further abuse block_devices
      to advertise these operations. Instead we will attach these operations
      to a dax device and add a lookup mechanism to go from block device path
      to a dax device. A dax capable driver like pmem or brd is responsible
      for registering a dax device, alongside a block device, and then a dax
      capable filesystem is responsible for retrieving the dax device by path
      name if it wants to call dax_operations.
      
      For now, we refactor the dax pseudo-fs to be a generic facility, rather
      than an implementation detail, of the device-dax use case. Where a "dax
      device" is just an inode + dax infrastructure, and "Device DAX" is a
      mapping service layered on top of that base 'struct dax_device'.
      "Filesystem DAX" is then a mapping service that layers a filesystem on
      top of that same base device. Filesystem DAX is associated with a
      block_device for now, but perhaps directly to a dax device in the
      future, or for new pmem-only filesystems.
      
      [1]: https://lkml.org/lkml/2017/1/19/880Suggested-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      7b6be844
    • D
      device-dax: rename 'dax_dev' to 'dev_dax' · 5f0694b3
      Dan Williams 提交于
      In preparation for introducing a struct dax_device type to the kernel
      global type namespace, rename dax_dev to dev_dax. A 'dax_device'
      instance will be a generic device-driver object for any provider of dax
      functionality. A 'dev_dax' object is a device-dax-driver local /
      internal instance.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      5f0694b3
    • O
      device-dax: improve fault handler debug output · 76202620
      Oliver O'Halloran 提交于
      A couple of minor improvements to the debug output in the fault handlers:
      
      a) Print the region alignment and fault size when we sent a SIGBUS
         because the region alignment is greater than the fault size.
      b) Fix the message in the PFN_{DEV|MAP} check.
      c) Additionally print the fault size enum value in the huge fault handler.
      Signed-off-by: NOliver O'Halloran <oohall@gmail.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      76202620
    • P
      device-dax: fix dax_dev_huge_fault() unknown fault size handling · 54eafcc9
      Pushkar Jambhlekar 提交于
      The default case for dax_dev_huge_fault() fault size handling mistakenly
      returns when it should unlock. This is not a problem in practice since
      the only three possible fault sizes are handled. Going forward, if the
      core mm adds a new fault size beyond pte, pmd, or pud device-dax should
      abort VM_FAULT_SIGBUS requests not VM_FAULT_FALLBACK since device-dax
      guarantees a configured fault granularity for all faults.
      Signed-off-by: NPushkar Jambhlekar <pushkar.iit@gmail.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      54eafcc9
    • D
      device-dax, tools/testing/nvdimm: enable device-dax with mock resources · efebc711
      Dave Jiang 提交于
      Provide a replacement pgoff_to_phys() that translates an nfit_test
      resource (allocated by vmalloc()) to a pfn.
      Signed-off-by: NDave Jiang <dave.jiang@intel.com>
      Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      efebc711
    • D
      device-dax: switch to srcu, fix rcu_read_lock() vs pte allocation · 956a4cd2
      Dan Williams 提交于
      The following warning triggers with a new unit test that stresses the
      device-dax interface.
      
       ===============================
       [ ERR: suspicious RCU usage.  ]
       4.11.0-rc4+ #1049 Tainted: G           O
       -------------------------------
       ./include/linux/rcupdate.h:521 Illegal context switch in RCU read-side critical section!
      
       other info that might help us debug this:
      
       rcu_scheduler_active = 2, debug_locks = 0
       2 locks held by fio/9070:
        #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8d0739d7>] __do_page_fault+0x167/0x4f0
        #1:  (rcu_read_lock){......}, at: [<ffffffffc03fbd02>] dax_dev_huge_fault+0x32/0x620 [dax]
      
       Call Trace:
        dump_stack+0x86/0xc3
        lockdep_rcu_suspicious+0xd7/0x110
        ___might_sleep+0xac/0x250
        __might_sleep+0x4a/0x80
        __alloc_pages_nodemask+0x23a/0x360
        alloc_pages_current+0xa1/0x1f0
        pte_alloc_one+0x17/0x80
        __pte_alloc+0x1e/0x120
        __get_locked_pte+0x1bf/0x1d0
        insert_pfn.isra.70+0x3a/0x100
        ? lookup_memtype+0xa6/0xd0
        vm_insert_mixed+0x64/0x90
        dax_dev_huge_fault+0x520/0x620 [dax]
        ? dax_dev_huge_fault+0x32/0x620 [dax]
        dax_dev_fault+0x10/0x20 [dax]
        __do_fault+0x1e/0x140
        __handle_mm_fault+0x9af/0x10d0
        handle_mm_fault+0x16d/0x370
        ? handle_mm_fault+0x47/0x370
        __do_page_fault+0x28c/0x4f0
        trace_do_page_fault+0x58/0x2a0
        do_async_page_fault+0x1a/0xa0
        async_page_fault+0x28/0x30
      
      Inserting a page table entry may trigger an allocation while we are
      holding a read lock to keep the device instance alive for the duration
      of the fault. Use srcu for this keep-alive protection.
      
      Fixes: dee41079 ("/dev/dax, core: file operations and dax-mmap")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      956a4cd2
  8. 21 3月, 2017 2 次提交
  9. 11 3月, 2017 3 次提交
    • D
      device-dax: fix debug output typo · 52084f89
      Dave Jiang 提交于
      The debug output for return the return data of pgoff_to_phys() in the
      fault handlers has 'phys' and 'pgoff' incorrectly swapped.
      Reported-by: NJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: NDave Jiang <dave.jiang@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      52084f89
    • D
      device-dax: fix pud fault fallback handling · 70b085b0
      Dave Jiang 提交于
      Jeff Moyer reports:
      
          With a device dax alignment of 4KB or 2MB, I get sigbus when running
          the attached fio job file for the current kernel (4.11.0-rc1+).  If
          I specify an alignment of 1GB, it works.
      
          I turned on debug output, and saw that it was failing in the huge
          fault code.
      
           dax dax1.0: dax_open
           dax dax1.0: dax_mmap
           dax dax1.0: dax_dev_huge_fault: fio: write (0x7f08f0a00000 -
           dax dax1.0: __dax_dev_pud_fault: phys_to_pgoff(0xffffffffcf60)
           dax dax1.0: dax_release
      
          fio config for reproduce:
          [global]
          ioengine=dev-dax
          direct=0
          filename=/dev/dax0.0
          bs=2m
      
          [write]
          rw=write
      
          [read]
          stonewall
          rw=read
      
      The driver fails to fallback when taking a fault that is larger than
      the device alignment, or handling a larger fault when a smaller
      mapping is already established. While we could support larger
      mappings for a device with a smaller alignment, that change is
      too large for the immediate fix. The simplest change is to force
      fallback until the fault size matches the alignment.
      Reported-by: NJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: NDave Jiang <dave.jiang@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      70b085b0
    • D
      device-dax: fix pmd/pte fault fallback handling · 0134ed4f
      Dave Jiang 提交于
      Jeff Moyer reports:
      
          With a device dax alignment of 4KB or 2MB, I get sigbus when running
          the attached fio job file for the current kernel (4.11.0-rc1+).  If
          I specify an alignment of 1GB, it works.
      
          I turned on debug output, and saw that it was failing in the huge
          fault code.
      
           dax dax1.0: dax_open
           dax dax1.0: dax_mmap
           dax dax1.0: dax_dev_huge_fault: fio: write (0x7f08f0a00000 -
           dax dax1.0: __dax_dev_pud_fault: phys_to_pgoff(0xffffffffcf60
           dax dax1.0: dax_release
      
          fio config for reproduce:
          [global]
          ioengine=dev-dax
          direct=0
          filename=/dev/dax0.0
          bs=2m
      
          [write]
          rw=write
      
          [read]
          stonewall
          rw=read
      
      The driver fails to fallback when taking a fault that is larger than
      the device alignment, or handling a larger fault when a smaller
      mapping is already established. While we could support larger
      mappings for a device with a smaller alignment, that change is
      too large for the immediate fix. The simplest change is to force
      fallback until the fault size matches the alignment.
      
      Fixes: dee41079 ("/dev/dax, core: file operations and dax-mmap")
      Cc: <stable@vger.kernel.org>
      Reported-by: NJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: NDave Jiang <dave.jiang@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      0134ed4f
  10. 02 3月, 2017 1 次提交
  11. 25 2月, 2017 4 次提交
  12. 23 2月, 2017 2 次提交
  13. 18 12月, 2016 1 次提交
  14. 15 12月, 2016 1 次提交
  15. 07 12月, 2016 1 次提交
    • D
      device-dax: fix private mapping restriction, permit read-only · 325896ff
      Dan Williams 提交于
      Hugh notes in response to commit 4cb19355 "device-dax: fail all
      private mapping attempts":
      
        "I think that is more restrictive than you intended: haven't tried, but I
        believe it rejects a PROT_READ, MAP_SHARED, O_RDONLY fd mmap, leaving no
        way to mmap /dev/dax without write permission to it."
      
      Indeed it does restrict read-only mappings, switch to checking
      VM_MAYSHARE, not VM_SHARED.
      
      Cc: <stable@vger.kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Pawel Lebioda <pawel.lebioda@intel.com>
      Fixes: 4cb19355 ("device-dax: fail all private mapping attempts")
      Reported-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      325896ff
  16. 17 11月, 2016 1 次提交
  17. 08 10月, 2016 2 次提交
  18. 04 9月, 2016 1 次提交
  19. 24 8月, 2016 8 次提交