1. 15 5月, 2018 3 次提交
  2. 22 3月, 2018 1 次提交
  3. 24 1月, 2018 1 次提交
  4. 07 1月, 2018 1 次提交
  5. 21 12月, 2017 1 次提交
    • S
      block-throttle: avoid double charge · 111be883
      Shaohua Li 提交于
      If a bio is throttled and split after throttling, the bio could be
      resubmited and enters the throttling again. This will cause part of the
      bio to be charged multiple times. If the cgroup has an IO limit, the
      double charge will significantly harm the performance. The bio split
      becomes quite common after arbitrary bio size change.
      
      To fix this, we always set the BIO_THROTTLED flag if a bio is throttled.
      If the bio is cloned/split, we copy the flag to new bio too to avoid a
      double charge. However, cloned bio could be directed to a new disk,
      keeping the flag be a problem. The observation is we always set new disk
      for the bio in this case, so we can clear the flag in bio_set_dev().
      
      This issue exists for a long time, arbitrary bio size change just makes
      it worse, so this should go into stable at least since v4.2.
      
      V1-> V2: Not add extra field in bio based on discussion with Tejun
      
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: stable@vger.kernel.org
      Acked-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NShaohua Li <shli@fb.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      111be883
  6. 23 11月, 2017 1 次提交
  7. 17 11月, 2017 1 次提交
  8. 26 10月, 2017 1 次提交
    • B
      block, locking/lockdep: Assign a lock_class per gendisk used for wait_for_completion() · e319e1fb
      Byungchul Park 提交于
      Darrick posted the following warning and Dave Chinner analyzed it:
      
      > ======================================================
      > WARNING: possible circular locking dependency detected
      > 4.14.0-rc1-fixes #1 Tainted: G        W
      > ------------------------------------------------------
      > loop0/31693 is trying to acquire lock:
      >  (&(&ip->i_mmaplock)->mr_lock){++++}, at: [<ffffffffa00f1b0c>] xfs_ilock+0x23c/0x330 [xfs]
      >
      > but now in release context of a crosslock acquired at the following:
      >  ((complete)&ret.event){+.+.}, at: [<ffffffff81326c1f>] submit_bio_wait+0x7f/0xb0
      >
      > which lock already depends on the new lock.
      >
      > the existing dependency chain (in reverse order) is:
      >
      > -> #2 ((complete)&ret.event){+.+.}:
      >        lock_acquire+0xab/0x200
      >        wait_for_completion_io+0x4e/0x1a0
      >        submit_bio_wait+0x7f/0xb0
      >        blkdev_issue_zeroout+0x71/0xa0
      >        xfs_bmapi_convert_unwritten+0x11f/0x1d0 [xfs]
      >        xfs_bmapi_write+0x374/0x11f0 [xfs]
      >        xfs_iomap_write_direct+0x2ac/0x430 [xfs]
      >        xfs_file_iomap_begin+0x20d/0xd50 [xfs]
      >        iomap_apply+0x43/0xe0
      >        dax_iomap_rw+0x89/0xf0
      >        xfs_file_dax_write+0xcc/0x220 [xfs]
      >        xfs_file_write_iter+0xf0/0x130 [xfs]
      >        __vfs_write+0xd9/0x150
      >        vfs_write+0xc8/0x1c0
      >        SyS_write+0x45/0xa0
      >        entry_SYSCALL_64_fastpath+0x1f/0xbe
      >
      > -> #1 (&xfs_nondir_ilock_class){++++}:
      >        lock_acquire+0xab/0x200
      >        down_write_nested+0x4a/0xb0
      >        xfs_ilock+0x263/0x330 [xfs]
      >        xfs_setattr_size+0x152/0x370 [xfs]
      >        xfs_vn_setattr+0x6b/0x90 [xfs]
      >        notify_change+0x27d/0x3f0
      >        do_truncate+0x5b/0x90
      >        path_openat+0x237/0xa90
      >        do_filp_open+0x8a/0xf0
      >        do_sys_open+0x11c/0x1f0
      >        entry_SYSCALL_64_fastpath+0x1f/0xbe
      >
      > -> #0 (&(&ip->i_mmaplock)->mr_lock){++++}:
      >        up_write+0x1c/0x40
      >        xfs_iunlock+0x1d0/0x310 [xfs]
      >        xfs_file_fallocate+0x8a/0x310 [xfs]
      >        loop_queue_work+0xb7/0x8d0
      >        kthread_worker_fn+0xb9/0x1f0
      >
      > Chain exists of:
      >   &(&ip->i_mmaplock)->mr_lock --> &xfs_nondir_ilock_class --> (complete)&ret.event
      >
      >  Possible unsafe locking scenario by crosslock:
      >
      >        CPU0                    CPU1
      >        ----                    ----
      >   lock(&xfs_nondir_ilock_class);
      >   lock((complete)&ret.event);
      >                                lock(&(&ip->i_mmaplock)->mr_lock);
      >                                unlock((complete)&ret.event);
      >
      >                *** DEADLOCK ***
      
      The warning is a false positive, caused by the fact that all
      wait_for_completion()s in submit_bio_wait() are waiting with the same
      lock class.
      
      However, some bios have nothing to do with others, for example in the case
      of loop devices, there's no direct connection between the bios of an upper
      device and the bios of a lower device(=loop device).
      
      The safest way to assign different lock classes to different devices is
      to do it for each gendisk. In other words, this patch assigns a
      lockdep_map per gendisk and uses it when initializing completion in
      submit_bio_wait().
      Analyzed-by: NDave Chinner <david@fromorbit.com>
      Reported-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NByungchul Park <byungchul.park@lge.com>
      Reviewed-by: NJens Axboe <axboe@kernel.dk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: amir73il@gmail.com
      Cc: axboe@kernel.dk
      Cc: david@fromorbit.com
      Cc: hch@infradead.org
      Cc: idryomov@gmail.com
      Cc: johan@kernel.org
      Cc: johannes.berg@intel.com
      Cc: kernel-team@lge.com
      Cc: linux-block@vger.kernel.org
      Cc: linux-fsdevel@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: linux-xfs@vger.kernel.org
      Cc: oleg@redhat.com
      Cc: tj@kernel.org
      Link: http://lkml.kernel.org/r/1508921765-15396-10-git-send-email-byungchul.park@lge.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e319e1fb
  9. 25 10月, 2017 1 次提交
  10. 17 10月, 2017 1 次提交
  11. 12 10月, 2017 11 次提交
  12. 11 10月, 2017 3 次提交
  13. 07 10月, 2017 1 次提交
  14. 26 9月, 2017 1 次提交
  15. 26 8月, 2017 1 次提交
  16. 24 8月, 2017 1 次提交
    • C
      block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Christoph Hellwig 提交于
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different life time rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      74d46992
  17. 10 8月, 2017 1 次提交
  18. 02 8月, 2017 1 次提交
  19. 11 7月, 2017 1 次提交
    • S
      block: call bio_uninit in bio_endio · b222dd2f
      Shaohua Li 提交于
      bio_free isn't a good place to free cgroup info. There are a
      lot of cases bio is allocated in special way (for example, in stack) and
      never gets called by bio_put hence bio_free, we are leaking memory. This
      patch moves the free to bio endio, which should be called anyway. The
      bio_uninit call in bio_free is kept, in case the bio never gets called
      bio endio.
      
      This assumes ->bi_end_io() doesn't access cgroup info, which seems true
      in my audit.
      
      This along with Christoph's integrity patch should fix the memory leak
      issue.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NShaohua Li <shli@fb.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      b222dd2f
  20. 04 7月, 2017 3 次提交
  21. 29 6月, 2017 1 次提交
    • J
      block: provide bio_uninit() free freeing integrity/task associations · 9ae3b3f5
      Jens Axboe 提交于
      Wen reports significant memory leaks with DIF and O_DIRECT:
      
      "With nvme devive + T10 enabled, On a system it has 256GB and started
      logging /proc/meminfo & /proc/slabinfo for every minute and in an hour
      it increased by 15968128 kB or ~15+GB.. Approximately 256 MB / minute
      leaking.
      
      /proc/meminfo | grep SUnreclaim...
      
      SUnreclaim:      6752128 kB
      SUnreclaim:      6874880 kB
      SUnreclaim:      7238080 kB
      ....
      SUnreclaim:     22307264 kB
      SUnreclaim:     22485888 kB
      SUnreclaim:     22720256 kB
      
      When testcases with T10 enabled call into __blkdev_direct_IO_simple,
      code doesn't free memory allocated by bio_integrity_alloc. The patch
      fixes the issue. HTX has been run with +60 hours without failure."
      
      Since __blkdev_direct_IO_simple() allocates the bio on the stack, it
      doesn't go through the regular bio free. This means that any ancillary
      data allocated with the bio through the stack is not freed. Hence, we
      can leak the integrity data associated with the bio, if the device is
      using DIF/DIX.
      
      Fix this by providing a bio_uninit() and export it, so that we can use
      it to free this data. Note that this is a minimal fix for this issue.
      Any current user of bio's that are allocated outside of
      bio_alloc_bioset() suffers from this issue, most notably some drivers.
      We will fix those in a more comprehensive patch for 4.13. This also
      means that the commit marked as being fixed by this isn't the real
      culprit, it's just the most obvious one out there.
      
      Fixes: 542ff7bf ("block: new direct I/O implementation")
      Reported-by: NWen Xiong <wenxiong@linux.vnet.ibm.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      9ae3b3f5
  22. 28 6月, 2017 1 次提交
  23. 19 6月, 2017 2 次提交
    • N
      block: remove bio_clone() and all references. · 9b10f6a9
      NeilBrown 提交于
      bio_clone() is no longer used.
      Only bio_clone_bioset() or bio_clone_fast().
      This is for the best, as bio_clone() used fs_bio_set,
      and filesystems are unlikely to want to use bio_clone().
      
      So remove bio_clone() and all references.
      This includes a fix to some incorrect documentation.
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      9b10f6a9
    • N
      blk: make the bioset rescue_workqueue optional. · 47e0fb46
      NeilBrown 提交于
      This patch converts bioset_create() to not create a workqueue by
      default, so alloctions will never trigger punt_bios_to_rescuer().  It
      also introduces a new flag BIOSET_NEED_RESCUER which tells
      bioset_create() to preserve the old behavior.
      
      All callers of bioset_create() that are inside block device drivers,
      are given the BIOSET_NEED_RESCUER flag.
      
      biosets used by filesystems or other top-level users do not
      need rescuing as the bio can never be queued behind other
      bios.  This includes fs_bio_set, blkdev_dio_pool,
      btrfs_bioset, xfs_ioend_bioset, and one allocated by
      target_core_iblock.c.
      
      biosets used by md/raid do not need rescuing as
      their usage was recently audited and revised to never
      risk deadlock.
      
      It is hoped that most, if not all, of the remaining biosets
      can end up being the non-rescued version.
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Credit-to: Ming Lei <ming.lei@redhat.com> (minor fixes)
      Reviewed-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      47e0fb46