1. 19 Jun, 2017: 6 commits
  2. 18 Jun, 2017: 1 commit
    • loop: Add PF_LESS_THROTTLE to block/loop device thread. · b2ee7d46
      NeilBrown committed
      When a filesystem is mounted from a loop device, writes are
      throttled by balance_dirty_pages() twice: once when writing
      to the filesystem and once when the loop_handle_cmd() writes
      to the backing file.  This double-throttling can trigger
      positive feedback loops that create significant delays.  The
      throttling at the lower level is seen by the upper level as
      a slow device, so it throttles extra hard.
      
      The PF_LESS_THROTTLE flag was created to handle exactly this
      circumstance, though with an NFS filesystem mounted from a
      local NFS server.  It reduces the throttling on the lower
      layer so that it can proceed largely unthrottled.
      
      To demonstrate this, create a filesystem on a loop device
      and write (e.g. with dd) several large files which combine
      to consume significantly more than the limit set by
      /proc/sys/vm/dirty_ratio or dirty_bytes.  Measure the total
      time taken.
      
      When I do this directly on a device (no loop device) the
      total time for several runs (mkfs, mount, write 200 files,
      umount) is fairly stable: 28-35 seconds.
      When I do this over a loop device the times are much worse
      and less stable: 52-460 seconds, half below 100 seconds and
      half above.
      When I apply this patch, the times become stable again,
      though not as fast as the no-loop-back case: 53-72 seconds.
      
      There may be room for further improvement as the total overhead still
      seems too high, but this is a big improvement.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Ming Lei <tom.leiming@gmail.com>
      Suggested-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
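A rough user-space model of the effect, not kernel code: the patch marks the loop worker thread with PF_LESS_THROTTLE, and the dirty-page accounting then grants such a task a more generous limit, so the lower layer is not throttled as hard as the filesystem above it. The toy_* names, the flag value and the 25% bonus are assumptions for illustration; the real thresholds are computed in mm/page-writeback.c.

```c
#include <stdbool.h>

/* Hypothetical flag bit standing in for PF_LESS_THROTTLE. */
#define TOY_PF_LESS_THROTTLE 0x1u

struct toy_task {
    unsigned int flags;
};

/* Dirty-page limit for a task; flagged tasks get an assumed 25% bonus. */
static unsigned long toy_dirty_limit(const struct toy_task *tsk, unsigned long base)
{
    unsigned long thresh = base;

    if (tsk->flags & TOY_PF_LESS_THROTTLE)
        thresh += thresh / 4;
    return thresh;
}

/* True if a task with `dirty` dirty pages would be throttled. */
static bool toy_should_throttle(const struct toy_task *tsk, unsigned long dirty,
                                unsigned long base)
{
    return dirty > toy_dirty_limit(tsk, base);
}

/* The loop worker marks itself once, analogous to setting
 * PF_LESS_THROTTLE in current->flags. */
static void toy_loop_worker_init(struct toy_task *worker)
{
    worker->flags |= TOY_PF_LESS_THROTTLE;
}
```

With a base limit of 1000 pages and 1100 dirty pages, an ordinary writer is throttled while the flagged worker (limit 1250 in this model) keeps going.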
  3. 16 Jun, 2017: 1 commit
  4. 09 Jun, 2017: 5 commits
    • block: switch bios to blk_status_t · 4e4cbee9
      Christoph Hellwig committed
      Replace bi_error with a new bi_status to allow for a clear conversion.
      Note that device mapper overloaded bi_error with a private value, which
      we'll have to keep around at least for now and thus propagate to a
      proper blk_status_t value.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blk-mq: switch ->queue_rq return value to blk_status_t · fc17b653
      Christoph Hellwig committed
      Use the same values for request completion errors as for the return
      value from ->queue_rq.  BLK_STS_RESOURCE is special-cased to cause
      a requeue, and all the others are completed as-is.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
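The completion rule above (BLK_STS_RESOURCE means requeue, any other status completes the request as-is) can be modeled with a small dispatch loop. This is a hypothetical user-space sketch: the status enum, struct toy_request and toy_queue_rq() are invented stand-ins, not the blk-mq API.

```c
#include <stddef.h>

/* Hypothetical subset of status codes for this sketch. */
typedef enum { TOY_STS_OK = 0, TOY_STS_IOERR, TOY_STS_RESOURCE } toy_status_t;

struct toy_request {
    int id;                 /* negative ids simulate a hard I/O error */
    int busy_attempts;      /* times the "driver" reports it is out of resources */
    toy_status_t completed; /* final status recorded on completion */
    int done;
};

/* Hypothetical driver hook standing in for ->queue_rq. */
static toy_status_t toy_queue_rq(struct toy_request *rq)
{
    if (rq->busy_attempts > 0) {
        rq->busy_attempts--;
        return TOY_STS_RESOURCE;
    }
    return rq->id < 0 ? TOY_STS_IOERR : TOY_STS_OK;
}

/*
 * Dispatch every pending request once per pass. TOY_STS_RESOURCE leaves the
 * request pending (requeued) for the next pass; any other status completes it
 * as-is. Assumes the driver eventually makes progress. Returns the number of
 * passes needed to complete all requests.
 */
static int toy_dispatch(struct toy_request *rqs, size_t n)
{
    int passes = 0;
    size_t pending = n;

    while (pending > 0) {
        passes++;
        for (size_t i = 0; i < n; i++) {
            if (rqs[i].done)
                continue;
            toy_status_t st = toy_queue_rq(&rqs[i]);
            if (st == TOY_STS_RESOURCE)
                continue; /* requeue: try again on the next pass */
            rqs[i].completed = st;
            rqs[i].done = 1;
            pending--;
        }
    }
    return passes;
}
```

Keeping a single status type for both the submission hook and completion means the dispatcher needs only one special case rather than a separate return-code convention.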
    • block: introduce new block status code type · 2a842aca
      Christoph Hellwig committed
      Currently we use normal Linux errno values in the block layer, and while
      we accept any error, a few have overloaded magic meanings.  This patch
      instead introduces a new blk_status_t value that holds block layer specific
      status codes and explicitly explains their meaning.  Helpers to convert from
      and to the previous special meanings are provided for now, but I suspect
      we want to get rid of them in the long run - those drivers that have an
      errno input (e.g. networking) usually get errnos that don't know about
      the special block layer overloads, and similarly returning them to userspace
      will usually return something that strictly speaking isn't correct
      for file system operations, but that's left as an exercise for later.
      
      For now the set of errors is a very limited set that closely corresponds
      to the previous overloaded errno values, but there is some low-hanging
      fruit to improve it.
      
      blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse
      typechecking, so that we can easily catch places passing the wrong values.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
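A minimal sketch of the pattern described above: a __bitwise typedef that sparse can type-check, plus table-driven conversion helpers to and from errno. The annotation macros follow the usual kernel idiom, but the numeric status values, the chosen errno pairings and the toy_* helper names are assumptions for illustration rather than a copy of the kernel's tables. With a normal compiler the annotations expand to nothing; running sparse (which defines __CHECKER__) is what turns mixed-type assignments into warnings.

```c
#include <errno.h>
#include <stddef.h>

#ifdef __CHECKER__
#define __bitwise __attribute__((bitwise)) /* sparse: distinct, non-mixable type */
#define __force   __attribute__((force))   /* sparse: allow an explicit cast */
#else
#define __bitwise
#define __force
#endif

typedef unsigned char __bitwise blk_status_t;

/* Illustrative subset; the numeric values are assumptions for this sketch. */
#define BLK_STS_OK       0
#define BLK_STS_NOTSUPP  ((__force blk_status_t)1)
#define BLK_STS_TIMEOUT  ((__force blk_status_t)2)
#define BLK_STS_NOSPC    ((__force blk_status_t)3)
#define BLK_STS_RESOURCE ((__force blk_status_t)9)
#define BLK_STS_IOERR    ((__force blk_status_t)10)

/* One row per status: the errno it corresponds to. */
static const struct {
    blk_status_t sts;
    int err;
} toy_blk_errors[] = {
    { BLK_STS_OK,       0 },
    { BLK_STS_NOTSUPP,  -EOPNOTSUPP },
    { BLK_STS_TIMEOUT,  -ETIMEDOUT },
    { BLK_STS_NOSPC,    -ENOSPC },
    { BLK_STS_RESOURCE, -ENOMEM },
    { BLK_STS_IOERR,    -EIO },
};

#define TOY_NR_ERRORS (sizeof(toy_blk_errors) / sizeof(toy_blk_errors[0]))

/* Unknown errnos collapse to a generic I/O error. */
static blk_status_t toy_errno_to_blk_status(int err)
{
    for (size_t i = 0; i < TOY_NR_ERRORS; i++)
        if (toy_blk_errors[i].err == err)
            return toy_blk_errors[i].sts;
    return BLK_STS_IOERR;
}

/* Unknown status values map back to -EIO. */
static int toy_blk_status_to_errno(blk_status_t status)
{
    for (size_t i = 0; i < TOY_NR_ERRORS; i++)
        if (toy_blk_errors[i].sts == status)
            return toy_blk_errors[i].err;
    return -EIO;
}
```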
    • nbd: set sk->sk_sndtimeo for our sockets · dc88e34d
      Josef Bacik committed
      If the nbd server stops receiving packets altogether, we will get stuck
      waiting indefinitely for it to receive them, as the TCP buffer will never
      empty, which looks like a deadlock.  Fix this by setting the sk send
      timeout to our configured timeout; that way, if the server really
      misbehaves, we'll disconnect cleanly instead of waiting forever.
      Reported-by: Dan Melnic <dmm@fb.com>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
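Inside the kernel, the fix sets the socket's send timeout (sk->sk_sndtimeo). A rough user-space analog is the SO_SNDTIMEO socket option. In this sketch, demo_send_timeout() is a hypothetical helper that writes into a socketpair whose peer never reads, so the send buffer fills just as it would with an nbd server that stopped receiving; with the timeout set, send() fails with EAGAIN/EWOULDBLOCK instead of blocking forever. Linux semantics are assumed.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 1 if send() gave up with EAGAIN/EWOULDBLOCK after the timeout,
 * 0 if it failed for another reason, -1 on setup error.
 */
static int demo_send_timeout(long timeout_ms)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    struct timeval tv = {
        .tv_sec = timeout_ms / 1000,
        .tv_usec = (timeout_ms % 1000) * 1000,
    };
    if (setsockopt(sv[0], SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    char buf[4096];
    memset(buf, 0, sizeof(buf));

    int timed_out = 0;
    for (;;) {
        /* The peer (sv[1]) never reads, so the send buffer eventually fills
         * and a blocking send() would otherwise wait forever. */
        ssize_t n = send(sv[0], buf, sizeof(buf), 0);
        if (n < 0) {
            timed_out = (errno == EAGAIN || errno == EWOULDBLOCK);
            break;
        }
    }

    close(sv[0]);
    close(sv[1]);
    return timed_out;
}
```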
    • loop: fix error handling regression · b040ad9c
      Arnd Bergmann committed
      gcc points out an unusual indentation:
      
      drivers/block/loop.c: In function 'loop_set_status':
      drivers/block/loop.c:1149:3: error: this 'if' clause does not guard... [-Werror=misleading-indentation]
         if (figure_loop_size(lo, info->lo_offset, info->lo_sizelimit,
         ^~
      drivers/block/loop.c:1152:4: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the 'if'
          goto exit;
      
      This was introduced by a new feature that accidentally moved the opening
      braces from one condition to another. Adding a second pair of braces
      makes it work correctly again and also more readable.
      
      Fixes: f2c6df7d ("loop: support 4k physical blocksize")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
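The shape of the fix, reduced to a self-contained example. toy_figure_size() and toy_set_status() are hypothetical stand-ins for figure_loop_size() and loop_set_status(), and the outer `changed` condition is a simplification; the point is only the inner braces.

```c
#include <errno.h>

/* Hypothetical stand-in for figure_loop_size(): nonzero means rejected. */
static int toy_figure_size(long offset, long sizelimit)
{
    return offset < 0 || sizelimit < 0;
}

/*
 * Corrected shape of the check. Without the inner braces, i.e.
 *
 *     if (toy_figure_size(offset, sizelimit))
 *             err = -EFBIG;
 *             goto exit;
 *
 * the goto is not guarded by the if, so every call that reaches it exits
 * early, even when the size is valid.
 */
static int toy_set_status(long offset, long sizelimit, int changed, int *applied)
{
    int err = 0;

    *applied = 0;
    if (changed) {
        if (toy_figure_size(offset, sizelimit)) {
            err = -EFBIG;
            goto exit;
        }
    }
    *applied = 1; /* stands in for the rest of the status update */
exit:
    return err;
}
```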
  5. 08 Jun, 2017: 3 commits
    • loop: support 4k physical blocksize · f2c6df7d
      Hannes Reinecke committed
      When generating bootable VM images, certain systems (most notably
      s390x) require devices with a 4k blocksize. This patch implements
      a new flag 'LO_FLAGS_BLOCKSIZE' which sets the physical
      blocksize to that of the underlying device and allows the
      logical blocksize to be changed up to the physical blocksize.
      Signed-off-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
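One way to express the rule "the logical blocksize may be raised only up to the physical blocksize" as a check. This is a sketch under assumptions: the power-of-two requirement and the 512-byte minimum are typical block-layer constraints assumed here, and toy_valid_logical_bs() is a hypothetical name, not the validation added by this commit.

```c
#include <stdbool.h>

/* Hypothetical check: is `logical` an acceptable logical blocksize for a
 * loop device whose physical blocksize is `physical`? */
static bool toy_valid_logical_bs(unsigned int logical, unsigned int physical)
{
    /* Must be a power of two (exactly one bit set). */
    if (logical == 0 || (logical & (logical - 1)) != 0)
        return false;
    /* Assumed lower bound: the traditional 512-byte sector. */
    if (logical < 512)
        return false;
    /* Logical size may be raised only up to the physical size. */
    return logical <= physical;
}
```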
    • 51001b7d
    • Fix loop device flush before configure v3 · 64604957
      James Wang committed
      While installing SLES-12 (based on v4.4), I found that the installer
      will stall for 60+ seconds during LVM disk scan.  The root cause was
      determined to be the removal of a bound device check in loop_flush()
      by commit b5dd2f60 ("block: loop: improve performance via blk-mq").
      
      Restoring this check, which examines ->lo_state as set by
      loop_set_fd(), eliminates the bad behavior.
      
      Test method:
      modprobe loop max_loop=64
      dd if=/dev/zero of=disk bs=512 count=200K
      for((i=0;i<4;i++))do losetup -f disk; done
      mkfs.ext4 -F /dev/loop0
      for((i=0;i<4;i++))do mkdir t$i; mount /dev/loop$i t$i;done
      for f in `ls /dev/loop[0-9]*|sort`; do \
      	echo $f; dd if=$f of=/dev/null  bs=512 count=1; \
      	done
      
      Test output:  stock          patched
      /dev/loop0    18.1217e-05    8.3842e-05
      /dev/loop1     6.1114e-05    0.000147979
      /dev/loop10    0.414701      0.000116564
      /dev/loop11    0.7474        6.7942e-05
      /dev/loop12    0.747986      8.9082e-05
      /dev/loop13    0.746532      7.4799e-05
      /dev/loop14    0.480041      9.3926e-05
      /dev/loop15    1.26453       7.2522e-05
      
      Note that from loop10 onward, the device is not mounted, yet the
      stock kernel consumes several orders of magnitude more wall time
      than it does for a mounted device.
      (Thanks to Mike Galbraith <efault@gmx.de> for a changelog review.)
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: James Wang <jnwang@suse.com>
      Fixes: b5dd2f60 ("block: loop: improve performance via blk-mq")
      Signed-off-by: Jens Axboe <axboe@fb.com>
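The restored check boils down to an early return when the device is not bound. The hypothetical model below illustrates that guard; the state names and the flush counter are invented for the sketch and only mirror the role of ->lo_state, while the counted path stands in for whatever flush work the real driver performs.

```c
#include <stdbool.h>

/* Hypothetical, simplified device states mirroring the role of ->lo_state. */
enum toy_lo_state { TOY_LO_UNBOUND, TOY_LO_BOUND, TOY_LO_RUNDOWN };

struct toy_loop_device {
    enum toy_lo_state lo_state;
    unsigned int flushes; /* counts how often the expensive path ran */
};

/*
 * A device that was never bound (no backing file, nothing in flight) has
 * nothing to flush, so return immediately instead of doing expensive work.
 * Returns true if the expensive path ran.
 */
static bool toy_loop_flush(struct toy_loop_device *lo)
{
    if (lo->lo_state != TOY_LO_BOUND)
        return false;
    lo->flushes++; /* stands in for the real flush work */
    return true;
}
```

In the timing table above, the unmounted devices from loop10 onward are exactly the case this guard short-circuits.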
  6. 02 Jun, 2017: 2 commits
  7. 30 May, 2017: 3 commits
  8. 29 May, 2017: 1 commit
    • rbd: implement REQ_OP_WRITE_ZEROES · 6ac56951
      Ilya Dryomov committed
      Commit 93c1defe ("rbd: remove the discard_zeroes_data flag")
      explicitly didn't implement REQ_OP_WRITE_ZEROES for rbd, while the
      following commit 48920ff2 ("block: remove the discard_zeroes_data
      flag") dropped ->discard_zeroes_data in favor of REQ_OP_WRITE_ZEROES.
      
      rbd does support efficient zeroing via CEPH_OSD_OP_ZERO opcode and will
      release either some or all blocks depending on whether the zeroing
      request is rbd_obj_bytes() aligned.  This is how we currently implement
      discards, so REQ_OP_WRITE_ZEROES can be identical to REQ_OP_DISCARD for
      now.  Caveats:
      
      - REQ_NOUNMAP is ignored, but AFAICT that's true of at least two other
        current implementations - nvme and loop
      
      - there is no ->write_zeroes_alignment and blk_bio_write_zeroes_split()
        is hence less helpful than blk_bio_discard_split(), but this can (and
        should) be fixed on the rbd side
      
      In the future we will split these into two code paths to respect
      REQ_NOUNMAP on zeroout and save on zeroing blocks that couldn't be
      released on discard.
      
      Fixes: 93c1defe ("rbd: remove the discard_zeroes_data flag")
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Jason Dillaman <dillaman@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
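Whether a zeroing request can release blocks depends on how it lines up with object boundaries. The helper below is a hypothetical illustration of that alignment arithmetic: obj_size plays the role of rbd_obj_bytes(), fully covered objects are counted as releasable, and partial head and tail pieces are assumed to be zeroed in place.

```c
#include <stdint.h>

struct toy_zero_split {
    uint64_t head_bytes; /* partial object at the start: zeroed, not released */
    uint64_t whole_objs; /* fully covered objects: can be released */
    uint64_t tail_bytes; /* partial object at the end: zeroed, not released */
};

/* obj_size stands in for rbd_obj_bytes() and must be nonzero. */
static struct toy_zero_split toy_split_zeroout(uint64_t off, uint64_t len,
                                               uint64_t obj_size)
{
    struct toy_zero_split s = { 0, 0, 0 };
    uint64_t end = off + len;
    uint64_t first_full = (off + obj_size - 1) / obj_size * obj_size; /* round up */
    uint64_t last_full = end / obj_size * obj_size;                   /* round down */

    if (first_full >= last_full) {
        /* No object is fully covered: the whole range is a partial zeroing. */
        s.head_bytes = len;
        return s;
    }
    s.head_bytes = first_full - off;
    s.whole_objs = (last_full - first_full) / obj_size;
    s.tail_bytes = end - last_full;
    return s;
}
```

With 4 MiB objects, an aligned 8 MiB request covers two whole objects, while the same length starting at 1 MiB yields a 3 MiB head, one whole object and a 1 MiB tail.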
  9. 16 May, 2017: 1 commit
  10. 12 May, 2017: 1 commit
  11. 09 May, 2017: 3 commits
  12. 04 May, 2017: 13 commits