1. 05 Aug 2016 (2 commits)
  2. 21 Jul 2016 (5 commits)
  3. 13 Jul 2016 (1 commit)
    •
      pmem: kill __pmem address space · 7a9eb206
      Authored by Dan Williams
      The __pmem address space was meant to annotate codepaths that touch
      persistent memory and need to coordinate a call to wmb_pmem().  Now that
      wmb_pmem() is gone, there is little need to keep this annotation.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      7a9eb206
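      For context, a hedged sketch of what the removed annotation looked
      like: the sparse definition below matches include/linux/compiler.h of
      this era, while the pointer declarations are purely illustrative.

        /* sparse-only address-space annotation for persistent memory */
        #ifdef __CHECKER__
        # define __pmem __attribute__((noderef, address_space(5)))
        #else
        # define __pmem
        #endif

        void __pmem *pmem_addr;   /* before: sparse flags mixed-space derefs */
        void *plain_addr;         /* after this commit: a plain pointer */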
  4. 28 Jun 2016 (1 commit)
    •
      block: Convert fifo_time from ulong to u64 · 9828c2c6
      Authored by Jan Kara
      Currently rq->fifo_time is unsigned long, but CFQ stores a nanosecond
      timestamp in it, which would overflow on 32-bit archs. Convert it to
      u64 to avoid the overflow. Since rq->fifo_time is unioned with struct
      call_single_data, this does not change the size of struct request in
      any way.
      
      We have to slightly fix up block/deadline-iosched.c so that the
      comparison happens in the right types.
      
      Fixes: 9a7f38c4
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      9828c2c6
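      To make the overflow concrete, a minimal sketch (the helper name is
      illustrative, not from the patch): on a 32-bit arch unsigned long is
      32 bits, so a nanosecond clock stored in it wraps after 2^32 ns,
      roughly 4.3 seconds, while a u64 holds centuries of nanoseconds.

        #include <linux/types.h>

        /* expiry test on the widened field, done in a 64-bit type */
        static inline bool sketch_fifo_expired(u64 fifo_time, u64 now_ns)
        {
                return now_ns >= fifo_time;  /* no 32-bit wraparound */
        }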
  5. 09 Jun 2016 (2 commits)
  6. 08 Jun 2016 (6 commits)
  7. 19 May 2016 (1 commit)
  8. 17 May 2016 (3 commits)
    •
      block: Update blkdev_dax_capable() for consistency · a8078b1f
      Authored by Toshi Kani
      blkdev_dax_capable() is similar to bdev_dax_supported(), but needs
      to remain a separate interface for checking the dax capability of
      a raw block device.
      
      Rename and relocate blkdev_dax_capable() so that the two are
      maintained consistently, and call bdev_direct_access() for the dax
      capability check.
      
      There is no change in behavior.
      
      Link: https://lkml.org/lkml/2016/5/9/950
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Boaz Harrosh <boaz@plexistor.com>
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      a8078b1f
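      A hedged sketch of the kind of capability probe described, using the
      blk_dax_ctl/bdev_direct_access() interface introduced by commit
      b2e0d162 further down this page; the helper name is hypothetical and
      error handling is simplified.

        /* Probe dax capability by requesting a one-page direct_access
         * mapping at sector 0; a negative return means no dax. */
        static bool sketch_dax_capable(struct block_device *bdev)
        {
                struct blk_dax_ctl dax = {
                        .sector = 0,
                        .size = PAGE_SIZE,
                };

                return bdev_direct_access(bdev, &dax) >= 0;
        }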
    •
      block: Add bdev_dax_supported() for dax mount checks · 2d96afc8
      Authored by Toshi Kani
      DAX imposes additional requirements on a device.  Add
      bdev_dax_supported(), which performs all the precondition checks
      necessary for a filesystem to mount the device with the dax option.
      
      Also add a new check to verify that a partition is aligned to 4KB.
      When a partition is unaligned, any dax read/write access fails,
      except for metadata updates.
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Boaz Harrosh <boaz@plexistor.com>
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      2d96afc8
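      A hedged usage sketch from the filesystem side; the signature of this
      era took the superblock and blocksize, and sketch_clear_dax_opt() is
      a hypothetical stand-in for clearing the dax mount option.

        /* Mount-time precondition check, as a filesystem might do it */
        static void sketch_mount_dax_check(struct super_block *sb)
        {
                if (bdev_dax_supported(sb, sb->s_blocksize)) {
                        pr_warn("DAX unsupported by block device. Turning off DAX.\n");
                        sketch_clear_dax_opt(sb);  /* fall back to non-dax I/O */
                }
        }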
    •
      block: Add vfs_msg() interface · 2af3a815
      Authored by Toshi Kani
      In preparation for moving DAX capability checks from filesystem code
      to the block layer, add a VFS message interface that aligns with the
      filesystems' message format.
      
      For instance, a vfs_msg() message followed by XFS messages in the
      case of a dax mount error may look like:
      
        VFS (pmem0p1): error: unaligned partition for dax
        XFS (pmem0p1): DAX unsupported by block device. Turning off DAX.
        XFS (pmem0p1): Mounting V5 Filesystem
         :
      
      vfs_msg() is largely based on ext4_msg().
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Boaz Harrosh <boaz@plexistor.com>
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      2af3a815
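      Since vfs_msg() is modeled on ext4_msg(), a hedged sketch of that
      printk pattern (argument list approximated; the real function lives
      in fs/block_dev.c):

        /* Prefix messages with "VFS (<device>):" via the %pV vaf idiom */
        static void sketch_vfs_msg(struct block_device *bdev,
                                   const char *prefix, const char *fmt, ...)
        {
                struct va_format vaf;
                va_list args;

                va_start(args, fmt);
                vaf.fmt = fmt;
                vaf.va = &args;
                printk("%sVFS (%pg): %pV\n", prefix, bdev, &vaf);
                va_end(args);
        }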
  9. 02 May 2016 (1 commit)
  10. 14 Apr 2016 (1 commit)
  11. 13 Apr 2016 (3 commits)
  12. 05 Apr 2016 (1 commit)
    •
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Authored by Kirill A. Shutemov
      The PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long*
      time ago with the promise that one day it would be possible to
      implement the page cache with bigger chunks than PAGE_SIZE.
      
      That promise never materialized, and it is unlikely it ever will.
      
      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE, and it is a constant source of confusion whether the
      PAGE_CACHE_* or PAGE_* constants should be used in a particular case,
      especially on the border between fs and mm.
      
      Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too
      much breakage to be doable.
      
      Let's stop pretending that pages in the page cache are special.  They
      are not.
      
      The changes are pretty straightforward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      the script below.  For some reason, coccinelle doesn't patch header
      files, so I've called spatch for them manually.
      
      The only adjustment after coccinelle is a revert of the changes to the
      PAGE_CACHE_ALIGN definition: we are going to drop it later.
      
      There are a few places in the code that coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation will
      also be addressed in a separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
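      The semantic patch above rewrites call sites mechanically; an
      illustrative before/after of the kind of code it transforms (not
      taken from any specific file):

        /* before: page-cache-specific constants and helpers */
        index  = pos >> PAGE_CACHE_SHIFT;
        offset = pos & (PAGE_CACHE_SIZE - 1);
        page_cache_get(page);
        page_cache_release(page);

        /* after: the plain PAGE_* equivalents */
        index  = pos >> PAGE_SHIFT;
        offset = pos & (PAGE_SIZE - 1);
        get_page(page);
        put_page(page);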
  13. 05 Mar 2016 (1 commit)
  14. 04 Mar 2016 (3 commits)
  15. 19 Feb 2016 (1 commit)
    •
      block: Add blk_set_runtime_active() · d07ab6d1
      Authored by Mika Westerberg
      If a block device is left runtime suspended during system suspend, the
      driver's resume hook typically corrects the runtime PM status of the
      device back to "active" after it is resumed. However, this is not
      enough, as the queue's runtime PM status is still "suspended". As long
      as it is in this state, blk_pm_peek_request() returns NULL and thus
      prevents new requests from being processed.
      
      Add a new function, blk_set_runtime_active(), that can be used to
      force the queue status back to "active" as needed.
      Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Acked-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      d07ab6d1
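      A hedged sketch of the intended call site in a driver's system-resume
      path; sketch_get_queue() is a hypothetical helper and the surrounding
      runtime-PM calls are illustrative.

        /* After system resume, mark both the device and its request
         * queue runtime-PM "active" so requests flow again. */
        static int sketch_driver_resume(struct device *dev)
        {
                struct request_queue *q = sketch_get_queue(dev);

                pm_runtime_disable(dev);
                pm_runtime_set_active(dev);
                pm_runtime_enable(dev);
                blk_set_runtime_active(q);  /* queue side of "active" */
                return 0;
        }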
  16. 05 Feb 2016 (1 commit)
  17. 16 Jan 2016 (2 commits)
    •
      mm, dax, pmem: introduce pfn_t · 34c0fd54
      Authored by Dan Williams
      For the purpose of communicating the optional presence of a 'struct
      page' for the pfn returned from ->direct_access(), introduce a type
      that encapsulates a page-frame-number plus flags.  These flags contain
      the historical "page_link" encoding for a scatterlist entry, but can
      also denote "device memory", a set of pfns that are not part of the
      kernel's linear mapping by default but are accessed via the same
      memory controller as RAM.
      
      The motivation for this new type is large capacity persistent memory
      that needs struct page entries in the 'memmap' to support 3rd party DMA
      (i.e.  O_DIRECT I/O with a persistent memory source/target).  However,
      we also need it in support of maintaining a list of mapped inodes which
      need to be unmapped at driver teardown or freeze_bdev() time.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      34c0fd54
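      A hedged sketch of the introduced type; the flag layout approximates
      include/linux/pfn_t.h as added by this commit, with the flags packed
      into the top bits of the 64-bit value.

        typedef struct {
                u64 val;  /* pfn in the low bits, flags in the high bits */
        } pfn_t;

        #define PFN_SG_CHAIN (1ULL << (BITS_PER_LONG_LONG - 1)) /* sg chain */
        #define PFN_SG_LAST  (1ULL << (BITS_PER_LONG_LONG - 2)) /* sg end */
        #define PFN_DEV      (1ULL << (BITS_PER_LONG_LONG - 3)) /* device memory */
        #define PFN_MAP      (1ULL << (BITS_PER_LONG_LONG - 4)) /* has memmap */

        /* a struct page exists unless the pfn is unmapped device memory */
        static inline bool sketch_pfn_t_has_page(pfn_t pfn)
        {
                return (pfn.val & PFN_MAP) == PFN_MAP ||
                       (pfn.val & PFN_DEV) == 0;
        }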
    •
      dax: fix lifetime of in-kernel dax mappings with dax_map_atomic() · b2e0d162
      Authored by Dan Williams
      The DAX implementation needs to protect new calls to ->direct_access(),
      and usage of their return values, against the driver for the underlying
      block device being disabled.  Use blk_queue_enter()/blk_queue_exit() to
      hold off blk_cleanup_queue() from proceeding, or otherwise fail new
      mapping requests if the request_queue is being torn down.
      
      This also introduces blk_dax_ctl to simplify the interface from fs/dax.c
      through dax_map_atomic() to bdev_direct_access().
      
      [willy@linux.intel.com: fix read() of a hole]
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b2e0d162
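      A hedged sketch of the described pattern, simplified from fs/dax.c;
      blk_queue_enter() of this era took a gfp argument, and the __pmem
      annotation on the returned address is dropped here for brevity.

        /* Hold off blk_cleanup_queue() across a ->direct_access() call
         * by taking a request_queue usage reference. */
        static void *sketch_dax_map(struct block_device *bdev,
                                    struct blk_dax_ctl *dax)
        {
                struct request_queue *q = bdev->bd_queue;
                long avail;

                if (blk_queue_enter(q, GFP_NOWAIT))  /* queue being torn down */
                        return ERR_PTR(-ENXIO);

                avail = bdev_direct_access(bdev, dax);
                if (avail < 0) {
                        blk_queue_exit(q);  /* drop the usage reference */
                        return ERR_PTR(avail);
                }
                return dax->addr;  /* caller pairs with blk_queue_exit() */
        }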
  18. 29 Dec 2015 (1 commit)
  19. 23 Dec 2015 (1 commit)
    •
      block: defer timeouts to a workqueue · 287922eb
      Authored by Christoph Hellwig
      Timer context is not very useful for drivers to perform any meaningful
      abort action from.  So instead of calling the driver from this useless
      context, defer the work to a workqueue as soon as possible.
      
      Note that while a delayed_work item would seem the right thing here, I
      didn't dare to use it due to the magic in blk_add_timer() that pokes
      deep into timer internals.  But maybe this encourages Tejun to add a
      sensible API for that to the workqueue API and we'll all be fine in
      the end :)
      
      Contains a major update from Keith Busch:
      
      "This patch removes synchronizing the timeout work so that the timer can
       start a freeze on its own queue. The timer enters the queue, so timer
       context can only start a freeze, but not wait for frozen."
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      287922eb
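      A hedged sketch of the deferral: the timer callback runs in softirq
      context and only kicks a work item, so the real abort handling runs
      later from kblockd in process context (names approximate blk-core).

        static void sketch_rq_timed_out_timer(unsigned long data)
        {
                struct request_queue *q = (struct request_queue *)data;

                /* defer to the workqueue; no driver callbacks from here */
                kblockd_schedule_work(&q->timeout_work);
        }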
  20. 02 Dec 2015 (1 commit)
  21. 30 Nov 2015 (1 commit)
    •
      block: Always check queue limits for cloned requests · bf4e6b4e
      Authored by Hannes Reinecke
      When a cloned request is retried on another queue, it always needs to
      be checked against the limits of that queue.  Otherwise the
      calculations for nr_phys_segments might be wrong, leading to a crash
      in scsi_init_sgtable().
      
      To clarify this, the patch renames blk_rq_check_limits() to
      blk_cloned_rq_check_limits() and removes the symbol export, as the
      new function should only be used for cloned requests and never
      exported.
      
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Ewan Milne <emilne@redhat.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Fixes: e2a60da7 ("block: Clean up special command handling logic")
      Cc: stable@vger.kernel.org # 3.7+
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      bf4e6b4e
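      A hedged sketch of the call site the rename clarifies, modeled on the
      cloned-request insertion path (helper name and error value are
      illustrative):

        static int sketch_insert_cloned(struct request_queue *q,
                                        struct request *rq)
        {
                /* re-validate against the limits of the queue the request
                 * is being retried on, before dispatching it there */
                if (blk_cloned_rq_check_limits(q, rq))
                        return -EIO;

                /* proceed to queue rq on q */
                return 0;
        }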
  22. 26 Nov 2015 (1 commit)
    •
      block/sd: Fix device-imposed transfer length limits · ca369d51
      Authored by Martin K. Petersen
      Commit 4f258a46 ("sd: Fix maximum I/O size for BLOCK_PC requests")
      had the unfortunate side-effect of removing an implicit clamp to
      BLK_DEF_MAX_SECTORS for REQ_TYPE_FS requests in the block layer
      code. This caused problems for some SMR drives.
      
      Debugging this issue revealed a few problems with the existing
      infrastructure since the block layer didn't know how to deal with
      device-imposed limits, only limits set by the I/O controller.
      
       - Introduce a new queue limit, max_dev_sectors, which is used by the
         ULD to signal the maximum sectors for a REQ_TYPE_FS request.
      
       - Ensure that max_dev_sectors is correctly stacked and taken into
         account when overriding max_sectors through sysfs.
      
       - Rework sd_read_block_limits() so it saves the max_xfer and opt_xfer
         values for later processing.
      
       - In sd_revalidate() set the queue's max_dev_sectors based on the
         MAXIMUM TRANSFER LENGTH value in the Block Limits VPD. If this value
         is not reported, fall back to a cap based on the CDB TRANSFER LENGTH
         field size.
      
       - In sd_revalidate(), use OPTIMAL TRANSFER LENGTH from the Block Limits
         VPD--if reported and sane--to signal the preferred device transfer
         size for FS requests. Otherwise use BLK_DEF_MAX_SECTORS.
      
       - blk_limits_max_hw_sectors() is no longer used and can be removed.
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=93581
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Tested-by: sweeneygj@gmx.com
      Tested-by: Arzeets <anatol.pomozov@gmail.com>
      Tested-by: David Eisner <david.eisner@oriel.oxon.org>
      Tested-by: Mario Kicherer <dev@kicherer.org>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      ca369d51
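      A hedged sketch of how the new limit participates in the max_sectors
      computation, modeled loosely on the queue-limits code of this era
      (helper name illustrative; 0 in max_dev_sectors means "not reported"):

        static unsigned int sketch_effective_max_sectors(struct queue_limits *lim)
        {
                unsigned int max = min_t(unsigned int, lim->max_hw_sectors,
                                         BLK_DEF_MAX_SECTORS);

                if (lim->max_dev_sectors)  /* device-imposed cap, if any */
                        max = min(max, lim->max_dev_sectors);

                return max;
        }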