1. 04 Jul, 2017 1 commit
  2. 29 Jun, 2017 1 commit
    • block: provide bio_uninit() free freeing integrity/task associations · 9ae3b3f5
      Committed by Jens Axboe
      Wen reports significant memory leaks with DIF and O_DIRECT:
      
      "With nvme device + T10 enabled, On a system it has 256GB and started
      logging /proc/meminfo & /proc/slabinfo for every minute and in an hour
      it increased by 15968128 kB or ~15+GB.. Approximately 256 MB / minute
      leaking.
      
      /proc/meminfo | grep SUnreclaim...
      
      SUnreclaim:      6752128 kB
      SUnreclaim:      6874880 kB
      SUnreclaim:      7238080 kB
      ....
      SUnreclaim:     22307264 kB
      SUnreclaim:     22485888 kB
      SUnreclaim:     22720256 kB
      
      When testcases with T10 enabled call into __blkdev_direct_IO_simple,
      code doesn't free memory allocated by bio_integrity_alloc. The patch
      fixes the issue. HTX has been run with +60 hours without failure."
      
      Since __blkdev_direct_IO_simple() allocates the bio on the stack, it
      doesn't go through the regular bio free. This means that any ancillary
      data allocated with the bio through the stack is not freed. Hence, we
      can leak the integrity data associated with the bio, if the device is
      using DIF/DIX.
      
      Fix this by providing a bio_uninit() and export it, so that we can use
      it to free this data. Note that this is a minimal fix for this issue.
      Any current user of bio's that are allocated outside of
      bio_alloc_bioset() suffers from this issue, most notably some drivers.
      We will fix those in a more comprehensive patch for 4.13. This also
      means that the commit marked as being fixed by this isn't the real
      culprit, it's just the most obvious one out there.
      
      Fixes: 542ff7bf ("block: new direct I/O implementation")
      Reported-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
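The leak pattern described in this commit can be illustrated outside the kernel. The following is a userspace sketch only, not the kernel code: `struct toy_bio`, `toy_bio_init()`, `toy_bio_attach_integrity()`, `toy_bio_uninit()` and `toy_direct_io()` are hypothetical names invented for the illustration. It shows why an object that lives on the stack still needs an explicit "uninit" step when it owns heap-allocated ancillary data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for a bio; not the kernel's struct bio. */
struct toy_bio {
    void *integrity;    /* ancillary data, heap-allocated on demand */
};

static int live_allocs; /* outstanding ancillary allocations */

void toy_bio_init(struct toy_bio *bio)
{
    memset(bio, 0, sizeof(*bio));
}

int toy_bio_attach_integrity(struct toy_bio *bio, size_t len)
{
    bio->integrity = malloc(len);
    if (!bio->integrity)
        return -1;
    live_allocs++;
    return 0;
}

/* Releases what the bio owns; the bio itself is not freed here. */
void toy_bio_uninit(struct toy_bio *bio)
{
    if (bio->integrity) {
        free(bio->integrity);
        bio->integrity = NULL;
        live_allocs--;
    }
}

/*
 * One I/O using an on-stack bio. Returns the number of outstanding
 * ancillary allocations afterwards, or -1 if allocation failed.
 * Passing call_uninit == 0 reproduces the leak on purpose.
 */
int toy_direct_io(int call_uninit)
{
    struct toy_bio bio;

    toy_bio_init(&bio);
    if (toy_bio_attach_integrity(&bio, 64) != 0)
        return -1;
    if (call_uninit)
        toy_bio_uninit(&bio);
    return live_allocs;
}
```

Calling `toy_direct_io(1)` leaves no allocation outstanding, while `toy_direct_io(0)` leaves one behind per call, which mirrors the steadily growing `SUnreclaim` figures in the report.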
  3. 21 Jun, 2017 1 commit
  4. 20 Jun, 2017 1 commit
    • block: return on congested block device · 03a07c92
      Committed by Goldwyn Rodrigues
      A new bio operation flag REQ_NOWAIT is introduced to identify bio's
      originating from iocb with IOCB_NOWAIT. This flag indicates that
      submission should return immediately if a request cannot be made,
      instead of retrying.
      
      Stacked devices such as md (the ones with make_request_fn hooks)
      currently are not supported because they may block for housekeeping.
      For example, an md can have a part of the device suspended.
      For this reason, only request based devices are supported.
      In the future, this feature will be expanded to stacked devices
      by teaching them how to handle the REQ_NOWAIT flags.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
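The "return immediately instead of waiting" behaviour can be sketched in userspace as follows. Everything here is hypothetical and chosen for illustration: `TOY_REQ_NOWAIT` is not the kernel's flag value, `struct toy_queue` is not a real request queue, and returning `-EAGAIN` is an assumption of the sketch rather than something the commit message states.

```c
#include <assert.h>
#include <errno.h>

#define TOY_REQ_NOWAIT  (1u << 0) /* hypothetical flag bit */
#define TOY_WOULD_SLEEP 1         /* sketch-only marker: a real caller would block here */

/* Hypothetical queue with a fixed number of free request slots. */
struct toy_queue {
    unsigned int free_slots;
};

/*
 * Try to take a request slot.
 * Returns 0 on success, -EAGAIN if the queue is full and the caller
 * asked not to wait, or TOY_WOULD_SLEEP where a blocking caller would
 * have gone to sleep until a slot is freed.
 */
int toy_submit(struct toy_queue *q, unsigned int flags)
{
    if (q->free_slots > 0) {
        q->free_slots--;
        return 0;
    }
    if (flags & TOY_REQ_NOWAIT)
        return -EAGAIN;
    return TOY_WOULD_SLEEP;
}
```

With a single free slot, the first `toy_submit(&q, TOY_REQ_NOWAIT)` succeeds and the second fails fast with `-EAGAIN`, whereas a call without the flag reaches the point where it would have to wait.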
  5. 19 Jun, 2017 3 commits
  6. 09 Jun, 2017 1 commit
  7. 12 Apr, 2017 1 commit
  8. 26 Mar, 2017 1 commit
  9. 25 Mar, 2017 1 commit
  10. 23 Mar, 2017 1 commit
  11. 16 Feb, 2017 1 commit
  12. 09 Dec, 2016 1 commit
    • block: improve handling of the magic discard payload · f9d03f96
      Committed by Christoph Hellwig
      Instead of allocating a single unused biovec for discard requests, send
      them down without any payload.  Instead we allow the driver to add a
      "special" payload using a biovec embedded into struct request (unioned
      over other fields never used while in the driver), and overloading
      the number of segments for this case.
      
      This has a couple of advantages:
      
       - we don't have to allocate the bio_vec
       - the amount of special casing for discard requests in the block
         layer is significantly reduced
       - using this same scheme for other request types is trivial,
         which will be important for implementing the new WRITE_ZEROES
         op on devices where it actually requires a payload (e.g. SCSI)
       - we can get rid of playing games with the request length, as
         we'll never touch it and completions will work just fine
       - it will allow us to support ranged discard operations in the
         future by merging non-contiguous discard bios into a single
         request
       - last but not least it removes a lot of code
      
      This patch is the common base for my WIP series for ranged discards and to
      remove discard_zeroes_data in favor of always using REQ_OP_WRITE_ZEROES,
      so it would be good to get it in quickly.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  13. 01 Dec, 2016 1 commit
  14. 30 Nov, 2016 1 commit
    • block: add bio_iov_iter_get_pages() · 38161995
      Committed by Kent Overstreet
      This is a helper that pins down a range from an iov_iter and adds it to
      a bio without requiring a separate memory allocation for the page array.
      It will be used for upcoming direct I/O implementations for block devices
      and iomap based file systems.
      Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
      [hch: ported to the iov_iter interface, renamed and added comments.
            All blame should be directed to me and all fame should go to Kent
            after this!]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      
      (cherry picked from commit 9cd56d916aa481ce8f56d9c5302a6ed90c2e0b5f)
  15. 22 Nov, 2016 1 commit
  16. 03 Nov, 2016 1 commit
  17. 01 Nov, 2016 2 commits
  18. 28 Oct, 2016 1 commit
  19. 22 Sep, 2016 1 commit
  20. 14 Sep, 2016 1 commit
  21. 16 Aug, 2016 1 commit
  22. 08 Aug, 2016 1 commit
    • block: rename bio bi_rw to bi_opf · 1eff9d32
      Committed by Jens Axboe
      Since commit 63a4cc24, bio->bi_rw contains flags in the lower
      portion and the op code in the higher portions. This means that
      old code that relies on manually setting bi_rw is most likely
      going to be broken. Instead of letting that brokenness linger,
      rename the member, to force old and out-of-tree code to break
      at compile time instead of at runtime.
      
      No intended functional changes in this commit.
      Signed-off-by: Jens Axboe <axboe@fb.com>
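To see why code that writes the combined field by hand misbehaves, here is a minimal userspace sketch of packing an operation code into the high bits of a word and flags into the low bits. The layout is assumed for illustration: the 3-bit op width, the `TOY_*` names and the helper functions are hypothetical and do not reproduce the kernel's actual definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: 32-bit word, top 3 bits hold the op, the rest hold flags. */
#define TOY_OP_BITS   3
#define TOY_OP_SHIFT  (32 - TOY_OP_BITS)
#define TOY_FLAG_MASK ((UINT32_C(1) << TOY_OP_SHIFT) - 1)

enum toy_op { TOY_OP_READ = 0, TOY_OP_WRITE = 1, TOY_OP_DISCARD = 2 };

#define TOY_FLAG_SYNC (UINT32_C(1) << 0) /* hypothetical flag in the low bits */

/* Combine an op and flags into one word; flags must not spill into the op bits. */
uint32_t toy_set_op_flags(enum toy_op op, uint32_t flags)
{
    assert((flags & ~TOY_FLAG_MASK) == 0);
    return ((uint32_t)op << TOY_OP_SHIFT) | flags;
}

enum toy_op toy_get_op(uint32_t opf)
{
    return (enum toy_op)(opf >> TOY_OP_SHIFT);
}

uint32_t toy_get_flags(uint32_t opf)
{
    return opf & TOY_FLAG_MASK;
}
```

Under this layout, legacy-style code that stores a bare `1` into the field, intending "write", is decoded as a read with a flag set: `toy_get_op(1)` yields `TOY_OP_READ`. Renaming the member turns that silent misinterpretation into a compile error.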
  23. 05 Aug, 2016 1 commit
  24. 21 Jul, 2016 1 commit
  25. 10 Jun, 2016 4 commits
  26. 08 Jun, 2016 5 commits
  27. 06 May, 2016 1 commit
    • block: make bio_inc_remaining() interface accessible again · 0ef5a50c
      Committed by Mike Snitzer
      Commit 326e1dbb ("block: remove management of bi_remaining when
      restoring original bi_end_io") made bio_inc_remaining() private to bio.c
      because the only use-case that made sense was confined to the
      bio_chain() interface.
      
      Since that time DM thinp went on to use bio_chain() in its relatively
      complex implementation of async discard support.  That implementation,
      even when converted over to use the new async __blkdev_issue_discard()
      interface, depends on deferred completion of the original discard bio --
      which is most appropriately implemented using bio_inc_remaining().
      
      DM thinp foolishly duplicated bio_inc_remaining(), local to dm-thin.c as
      __bio_inc_remaining(), so re-exporting bio_inc_remaining() allows us to
      put an end to that foolishness.
      
      All said, bio_inc_remaining() should really only be used in conjunction
      with bio_chain().  It isn't intended for generic bio reference counting.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
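The deferred-completion idea behind a "remaining" counter can be modelled in a few lines of userspace C. This is a simplified sketch under stated assumptions: `struct toy_bio` and the `toy_*` functions are hypothetical, the counter is a plain `int` rather than an atomic, and the real kernel implementation differs in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a bio with a completion counter. */
struct toy_bio {
    int remaining;          /* completions still owed before the bio is done */
    int completed;          /* set to 1 once the bio has really completed */
    struct toy_bio *parent; /* set by toy_chain(), NULL otherwise */
};

void toy_init(struct toy_bio *bio)
{
    bio->remaining = 1;     /* every bio owes its own completion */
    bio->completed = 0;
    bio->parent = NULL;
}

/* Defer completion of bio by one more toy_endio() call. */
void toy_inc_remaining(struct toy_bio *bio)
{
    bio->remaining++;
}

/* Drop one pending completion; finish the bio, then its parent chain, at zero. */
void toy_endio(struct toy_bio *bio)
{
    while (bio) {
        assert(bio->remaining > 0);
        if (--bio->remaining > 0)
            return;
        bio->completed = 1;
        bio = bio->parent;
    }
}

/* Make parent wait for child: the parent completes only after the child does. */
void toy_chain(struct toy_bio *child, struct toy_bio *parent)
{
    child->parent = parent;
    toy_inc_remaining(parent);
}
```

After `toy_chain(&child, &parent)`, calling `toy_endio(&parent)` alone does not complete the parent; it completes only once `toy_endio(&child)` has also run. The counter here is paired with chaining and is not used as a general-purpose reference count, in line with the caveat in the commit message.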
  28. 05 Apr, 2016 1 commit
    • mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Committed by Kirill A. Shutemov
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CACHE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
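The first two rewrite rules in this commit rely on the shift amount being zero when both constants are equal. A tiny userspace sketch makes that explicit; the `TOY_*` macros and the 4 KiB page size are assumptions made for the illustration, not the kernel's definitions.

```c
#include <assert.h>

#define TOY_PAGE_SHIFT       12             /* assumed 4 KiB pages */
#define TOY_PAGE_CACHE_SHIFT TOY_PAGE_SHIFT /* the two were always equal */

/* Old style: convert a page-cache index to a page index via a shift by zero. */
unsigned long toy_old_index(unsigned long index)
{
    return index << (TOY_PAGE_CACHE_SHIFT - TOY_PAGE_SHIFT);
}

/* New style: the conversion is the identity, so the expression is used as-is. */
unsigned long toy_new_index(unsigned long index)
{
    return index;
}
```

Both functions return the same value for every input, which is why the semantic patch can replace the shifted expression with the bare expression.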
  29. 13 Mar, 2016 1 commit
  30. 04 Mar, 2016 1 commit