1. 09 5月, 2017 1 次提交
    • D
      block, dax: move "select DAX" from BLOCK to FS_DAX · ef510424
      Dan Williams 提交于
      For configurations that do not enable DAX filesystems or drivers, do not
      require the DAX core to be built.
      
      Given that the 'direct_access' method has been removed from
      'block_device_operations', we can also go ahead and remove the
      block-related dax helper functions from fs/block_dev.c to
      drivers/dax/super.c. This keeps dax details out of the block layer and
      lets the DAX core be built as a module in the FS_DAX=n case.
      
      Filesystems need to include dax.h to call bdev_dax_supported().
      
      Cc: linux-xfs@vger.kernel.org
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: NJan Kara <jack@suse.com>
      Reported-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      ef510424
  2. 05 5月, 2017 1 次提交
  3. 30 4月, 2017 1 次提交
    • D
      libnvdimm: rework region badblocks clearing · 23f49844
      Dan Williams 提交于
      Toshi noticed that the new support for a region-level badblocks missed
      the case where errors are cleared due to BTT I/O.
      
      An initial attempt to fix this ran into a "sleeping while atomic"
      warning due to taking the nvdimm_bus_lock() in the BTT I/O path to
      satisfy the locking requirements of __nvdimm_bus_badblocks_clear().
      However, that lock is not needed since we are not acting on any data that
      is subject to change under that lock. The badblocks instance has its own
      internal lock to handle mutations of the error list.
      
      So, in order to make it clear that we are just acting on region devices,
      rename __nvdimm_bus_badblocks_clear() to nvdimm_clear_badblocks_regions().
      Eliminate the lock and consolidate all support routines for the new
      nvdimm_account_cleared_poison() in drivers/nvdimm/bus.c. Finally, to the
      opportunity to cleanup to some unnecessary casts, make the calling
      convention of nvdimm_clear_badblocks_regions() clearer by replacing struct
      resource with the minimal struct clear_badblocks_context, and use the
      DEVICE_ATTR macro.
      
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Reported-by: NToshi Kani <toshi.kani@hpe.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      23f49844
  4. 26 4月, 2017 6 次提交
  5. 21 4月, 2017 3 次提交
    • D
      dm: add dax_device and dax_operations support · f26c5719
      Dan Williams 提交于
      Allocate a dax_device to represent the capacity of a device-mapper
      instance. Provide a ->direct_access() method via the new dax_operations
      indirection that mirrors the functionality of the current direct_access
      support via block_device_operations.  Once fs/dax.c has been converted
      to use dax_operations the old dm_blk_direct_access() will be removed.
      
      A new helper dm_dax_get_live_target() is introduced to separate some of
      the dm-specifics from the direct_access implementation.
      
      This enabling is only for the top-level dm representation to upper
      layers. Converting target direct_access implementations is deferred to a
      separate patch.
      
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Reviewed-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      f26c5719
    • D
      dax: introduce dax_direct_access() · b0686260
      Dan Williams 提交于
      Replace bdev_direct_access() with dax_direct_access() that uses
      dax_device and dax_operations instead of a block_device and
      block_device_operations for dax. Once all consumers of the old api have
      been converted bdev_direct_access() will be deleted.
      
      Given that block device partitioning decisions can cause dax page
      alignment constraints to be violated this also introduces the
      bdev_dax_pgoff() helper. It handles calculating a logical pgoff relative
      to the dax_device and also checks for page alignment.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      b0686260
    • D
      block: kill bdev_dax_capable() · d8f07aee
      Dan Williams 提交于
      This is leftover dead code that has since been replaced by
      bdev_dax_supported().
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      d8f07aee
  6. 20 4月, 2017 3 次提交
    • D
      pmem: add dax_operations support · c1d6e828
      Dan Williams 提交于
      Setup a dax_device to have the same lifetime as the pmem block device
      and add a ->direct_access() method that is equivalent to
      pmem_direct_access(). Once fs/dax.c has been converted to use
      dax_operations the old pmem_direct_access() will be removed.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      c1d6e828
    • D
      dax: introduce dax_operations · 6568b08b
      Dan Williams 提交于
      Track a set of dax_operations per dax_device that can be set at
      alloc_dax() time. These operations will be used to stop the abuse of
      block_device_operations for communicating dax capabilities to
      filesystems. It will also be used to replace the "pmem api" and move
      pmem-specific cache maintenance, and other dax-driver-specific
      filesystem-dax operations, to dax device methods. In particular this
      allows us to stop abusing __copy_user_nocache(), via memcpy_to_pmem(),
      with a driver specific replacement.
      
      This is a standalone introduction of the operations. Follow on patches
      convert each dax-driver and teach fs/dax.c to use ->direct_access() from
      dax_operations instead of block_device_operations.
      Suggested-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      6568b08b
    • D
      dax: add a facility to lookup a dax device by 'host' device name · 72058005
      Dan Williams 提交于
      For the current block_device based filesystem-dax path, we need a way
      for it to lookup the dax_device associated with a block_device. Add a
      'host' property of a dax_device that can be used for this purpose. It is
      a free form string, but for a dax_device associated with a block device
      it is the bdev name.
      
      This is a stop-gap until filesystems are able to mount on a dax-inode
      directly.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      72058005
  7. 14 4月, 2017 1 次提交
    • D
      libnvdimm: fix clear poison locking with spinlock and GFP_NOWAIT allocation · b3b454f6
      Dave Jiang 提交于
      The following warning results from holding a lane spinlock,
      preempt_disable(), or the btt map spinlock and then trying to take the
      reconfig_mutex to walk the poison list and potentially add new entries.
      
      BUG: sleeping function called from invalid context at kernel/locking/mutex.
      c:747
      in_atomic(): 1, irqs_disabled(): 0, pid: 17159, name: dd
      [..]
      Call Trace:
      dump_stack+0x85/0xc8
      ___might_sleep+0x184/0x250
      __might_sleep+0x4a/0x90
      __mutex_lock+0x58/0x9b0
      ? nvdimm_bus_lock+0x21/0x30 [libnvdimm]
      ? __nvdimm_bus_badblocks_clear+0x2f/0x60 [libnvdimm]
      ? acpi_nfit_forget_poison+0x79/0x80 [nfit]
      ? _raw_spin_unlock+0x27/0x40
      mutex_lock_nested+0x1b/0x20
      nvdimm_bus_lock+0x21/0x30 [libnvdimm]
      nvdimm_forget_poison+0x25/0x50 [libnvdimm]
      nvdimm_clear_poison+0x106/0x140 [libnvdimm]
      nsio_rw_bytes+0x164/0x270 [libnvdimm]
      btt_write_pg+0x1de/0x3e0 [nd_btt]
      ? blk_queue_enter+0x30/0x290
      btt_make_request+0x11a/0x310 [nd_btt]
      ? blk_queue_enter+0xb7/0x290
      ? blk_queue_enter+0x30/0x290
      generic_make_request+0x118/0x3b0
      
      A spinlock is introduced to protect the poison list. This allows us to not
      having to acquire the reconfig_mutex for touching the poison list. The
      add_poison() function has been broken out into two helper functions. One to
      allocate the poison entry and the other to apppend the entry. This allows us
      to unlock the poison_lock in non-I/O path and continue to be able to allocate
      the poison entry with GFP_KERNEL. We will use GFP_NOWAIT in the I/O path in
      order to satisfy being in atomic context.
      Reviewed-by: NVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: NDave Jiang <dave.jiang@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      b3b454f6
  8. 13 4月, 2017 2 次提交
    • D
      dax: refactor dax-fs into a generic provider of 'struct dax_device' instances · 7b6be844
      Dan Williams 提交于
      We want dax capable drivers to be able to publish a set of dax
      operations [1]. However, we do not want to further abuse block_devices
      to advertise these operations. Instead we will attach these operations
      to a dax device and add a lookup mechanism to go from block device path
      to a dax device. A dax capable driver like pmem or brd is responsible
      for registering a dax device, alongside a block device, and then a dax
      capable filesystem is responsible for retrieving the dax device by path
      name if it wants to call dax_operations.
      
      For now, we refactor the dax pseudo-fs to be a generic facility, rather
      than an implementation detail, of the device-dax use case. Where a "dax
      device" is just an inode + dax infrastructure, and "Device DAX" is a
      mapping service layered on top of that base 'struct dax_device'.
      "Filesystem DAX" is then a mapping service that layers a filesystem on
      top of that same base device. Filesystem DAX is associated with a
      block_device for now, but perhaps directly to a dax device in the
      future, or for new pmem-only filesystems.
      
      [1]: https://lkml.org/lkml/2017/1/19/880Suggested-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      7b6be844
    • D
      libnvdimm: add support for clear poison list and badblocks for device dax · 006358b3
      Dave Jiang 提交于
      Providing mechanism to clear poison list via the ndctl ND_CMD_CLEAR_ERROR
      call. We will update the poison list and also the badblocks at region level
      if the region is in dax mode or in pmem mode and not active. In other
      words we force badblocks to be cleared through write requests if the
      address is currently accessed through a block device, otherwise it can
      only be done via the ioctl+dsm path.
      Signed-off-by: NDave Jiang <dave.jiang@intel.com>
      Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      006358b3
  9. 22 3月, 2017 5 次提交
  10. 21 3月, 2017 1 次提交
    • L
      chardev: add helper function to register char devs with a struct device · 233ed09d
      Logan Gunthorpe 提交于
      Credit for this patch goes is shared with Dan Williams [1]. I've
      taken things one step further to make the helper function more
      useful and clean up calling code.
      
      There's a common pattern in the kernel whereby a struct cdev is placed
      in a structure along side a struct device which manages the life-cycle
      of both. In the naive approach, the reference counting is broken and
      the struct device can free everything before the chardev code
      is entirely released.
      
      Many developers have solved this problem by linking the internal kobjs
      in this fashion:
      
      cdev.kobj.parent = &parent_dev.kobj;
      
      The cdev code explicitly gets and puts a reference to it's kobj parent.
      So this seems like it was intended to be used this way. Dmitrty Torokhov
      first put this in place in 2012 with this commit:
      
      2f0157f1 char_dev: pin parent kobject
      
      and the first instance of the fix was then done in the input subsystem
      in the following commit:
      
      4a215aad Input: fix use-after-free introduced with dynamic minor changes
      
      Subsequently over the years, however, this issue seems to have tripped
      up multiple developers independently. For example, see these commits:
      
      0d5b7dae iio: Prevent race between IIO chardev opening and IIO device
      (by Lars-Peter Clausen in 2013)
      
      ba0ef854 tpm: Fix initialization of the cdev
      (by Jason Gunthorpe in 2015)
      
      5b28dde5 [media] media: fix use-after-free in cdev_put() when app exits
      after driver unbind
      (by Shauh Khan in 2016)
      
      This technique is similarly done in at least 15 places within the kernel
      and probably should have been done so in another, at least, 5 places.
      The kobj line also looks very suspect in that one would not expect
      drivers to have to mess with kobject internals in this way.
      Even highly experienced kernel developers can be surprised by this
      code, as seen in [2].
      
      To help alleviate this situation, and hopefully prevent future
      wasted effort on this problem, this patch introduces a helper function
      to register a char device along with its parent struct device.
      This creates a more regular API for tying a char device to its parent
      without the developer having to set members in the underlying kobject.
      
      This patch introduce cdev_device_add and cdev_device_del which
      replaces a common pattern including setting the kobj parent, calling
      cdev_add and then calling device_add. It also introduces cdev_set_parent
      for the few cases that set the kobject parent without using device_add.
      
      [1] https://lkml.org/lkml/2017/2/13/700
      [2] https://lkml.org/lkml/2017/2/10/370Signed-off-by: NLogan Gunthorpe <logang@deltatee.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Reviewed-by: NHans Verkuil <hans.verkuil@cisco.com>
      Reviewed-by: NAlexandre Belloni <alexandre.belloni@free-electrons.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      233ed09d
  11. 17 3月, 2017 8 次提交
  12. 16 3月, 2017 4 次提交
    • G
      crypto: ccp - Assign DMA commands to the channel's CCP · 7c468447
      Gary R Hook 提交于
      The CCP driver generally uses a round-robin approach when
      assigning operations to available CCPs. For the DMA engine,
      however, the DMA mappings of the SGs are associated with a
      specific CCP. When an IOMMU is enabled, the IOMMU is
      programmed based on this specific device.
      
      If the DMA operations are not performed by that specific
      CCP then addressing errors and I/O page faults will occur.
      
      Update the CCP driver to allow a specific CCP device to be
      requested for an operation and use this in the DMA engine
      support.
      
      Cc: <stable@vger.kernel.org> # 4.9.x-
      Signed-off-by: NGary R Hook <gary.hook@amd.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      7c468447
    • D
      vmbus: remove hv_event_tasklet_disable/enable · dad72a1d
      Dexuan Cui 提交于
      With the recent introduction of per-channel tasklet, we need to update
      the way we handle the 3 concurrency issues:
      
      1. hv_process_channel_removal -> percpu_channel_deq vs.
         vmbus_chan_sched -> list_for_each_entry(..., percpu_list);
      
      2. vmbus_process_offer -> percpu_channel_enq/deq vs. vmbus_chan_sched.
      
      3. vmbus_close_internal vs. the per-channel tasklet vmbus_on_event;
      
      The first 2 issues can be handled by Stephen's recent patch
      "vmbus: use rcu for per-cpu channel list", and the third issue
      can be handled by calling tasklet_disable in vmbus_close_internal here.
      
      We don't need the original hv_event_tasklet_disable/enable since we
      now use per-channel tasklet instead of the previous per-CPU tasklet,
      and actually we must remove them due to the side effect now:
      vmbus_process_offer -> hv_event_tasklet_enable -> tasklet_schedule will
      start the per-channel callback prematurely, cauing NULL dereferencing
      (the channel may haven't been properly configured to run the callback yet).
      
      Fixes: 631e63a9 ("vmbus: change to per channel tasklet")
      Signed-off-by: NDexuan Cui <decui@microsoft.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Tested-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: NK. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dad72a1d
    • S
      vmbus: use rcu for per-cpu channel list · 8200f208
      Stephen Hemminger 提交于
      The per-cpu channel list is now referred to in the interrupt
      routine. This is mostly safe since the host will not normally generate
      an interrupt when channel is being deleted but if it did then there
      would be a use after free problem.
      
      To solve, this use RCU protection on ther per-cpu list.
      
      Fixes: 631e63a9 ("vmbus: change to per channel tasklet")
      Signed-off-by: NStephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: NK. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8200f208
    • E
      fscrypt: eliminate ->prepare_context() operation · 94840e3c
      Eric Biggers 提交于
      The only use of the ->prepare_context() fscrypt operation was to allow
      ext4 to evict inline data from the inode before ->set_context().
      However, there is no reason why this cannot be done as simply the first
      step in ->set_context(), and in fact it makes more sense to do it that
      way because then the policy modes and flags get validated before any
      real work is done.  Therefore, merge ext4_prepare_context() into
      ext4_set_context(), and remove ->prepare_context().
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      94840e3c
  13. 14 3月, 2017 3 次提交
  14. 13 3月, 2017 1 次提交
    • D
      bpf: improve read-only handling · 65869a47
      Daniel Borkmann 提交于
      Improve bpf_{prog,jit_binary}_{un,}lock_ro() by throwing a
      one-time warning in case of an error when the image couldn't
      be set read-only, and also mark struct bpf_prog as locked when
      bpf_prog_lock_ro() was called.
      
      Reason for the latter is that bpf_prog_unlock_ro() is called from
      various places including error paths, and we shouldn't mess with
      page attributes when really not needed.
      
      For bpf_jit_binary_unlock_ro() this is not needed as jited flag
      implicitly indicates this, thus for archs with ARCH_HAS_SET_MEMORY
      we're guaranteed to have a previously locked image. Overall, this
      should also help us to identify any further potential issues with
      set_memory_*() helpers.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      65869a47