1. 25 Jan 2021, 3 commits
    • block: propagate BLKROSET on the whole device to all partitions · 947139bf
      Committed by Christoph Hellwig
      Change the policy so that a BLKROSET on the whole device also affects
      partitions.  To quote Martin K. Petersen:
      
      It's very common for database folks to twiddle the read-only state of
      block devices and partitions. I know that our users will find it very
      counter-intuitive that setting /dev/sda read-only won't prevent writes
      to /dev/sda1.
      
      The existing behavior is inconsistent in the sense that doing:
      
        # blockdev --setro /dev/sda
        # echo foo > /dev/sda1
      
      permits writes. But:
      
        # blockdev --setro /dev/sda
        <something triggers revalidate>
        # echo foo > /dev/sda1
      
      doesn't.
      
      And a subsequent:
      
        # blockdev --setrw /dev/sda
        # echo foo > /dev/sda1
      
      doesn't work either since sda1's read-only policy has been inherited
      from the whole-disk device.
      
      You need to do:
      
        # blockdev --rereadpt
      
      after setting the whole-disk device rw to effectuate the same change on
      the partitions, otherwise they are stuck being read-only indefinitely.
      
      However, setting the read-only policy on a partition does *not* require
      the revalidate step. As a matter of fact, doing the revalidate will blow
      away the policy setting you just made.
      
      So the user needs to take different actions depending on whether they
      are trying to read-protect a whole-disk device or a partition, despite
      using the same ioctl. That is really confusing.
      
      I have lost count of how many times our customers have had data
      clobbered because of the ambiguity of the existing whole-disk device
      policy. The current behavior violates the principle of least surprise
      by letting the user think they write-protected the whole disk when
      they actually didn't.
      Suggested-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
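
      For context, blockdev --setro/--setrw and the shell examples above go
      through the BLKROSET ioctl. A minimal user-space sketch of the same
      operation (error handling trimmed; the device path is only an example):

        #include <fcntl.h>
        #include <stdio.h>
        #include <unistd.h>
        #include <sys/ioctl.h>
        #include <linux/fs.h>           /* BLKROSET, BLKROGET */

        int main(void)
        {
                int fd = open("/dev/sda", O_RDONLY);    /* example device */
                int ro = 1;                             /* 1 = read-only, 0 = read-write */

                if (fd < 0)
                        return 1;
                if (ioctl(fd, BLKROSET, &ro) < 0)       /* what blockdev --setro issues */
                        perror("BLKROSET");
                if (ioctl(fd, BLKROGET, &ro) == 0)      /* read the flag back */
                        printf("read-only: %d\n", ro);

                close(fd);
                return 0;
        }

      With the policy change above, issuing BLKROSET on the whole-disk device
      now makes its partitions read-only as well.
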
    • block: add a hard-readonly flag to struct gendisk · 52f019d4
      Committed by Christoph Hellwig
      Commit 20bd1d02 ("scsi: sd: Keep disk read-only when re-reading
      partition") addressed a long-standing problem with user read-only
      policy being overridden as a result of a device-initiated revalidate.
      The commit has since been reverted due to a regression that left some
      USB devices read-only indefinitely.
      
      To fix the underlying problems with revalidate we need to keep track
      of hardware state and user policy separately.
      
      The gendisk has been updated to reflect the current hardware state set
      by the device driver. This is done to allow returning the device to
      the hardware state once the user clears the BLKROSET flag.
      
      The resulting semantics are as follows:
      
       - If BLKROSET sets a given partition read-only, that partition will
         remain read-only even if the underlying storage stack initiates a
         revalidate. However, the BLKRRPART ioctl will cause the partition
         table to be dropped and any user policy on partitions will be lost.
      
       - If BLKROSET has not been set, both the whole disk device and any
         partitions will reflect the current write-protect state of the
         underlying device.
      
      Based on a patch from Martin K. Petersen <martin.petersen@oracle.com>.
      Reported-by: Oleksii Kurochko <olkuroch@cisco.com>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201221
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
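
      The semantics above boil down to two independent read-only sources that
      are OR-ed together when a write is attempted. The sketch below is only
      an illustrative model; the field and function names are hypothetical
      and do not come from struct gendisk:

        #include <stdbool.h>

        struct ro_state {
                bool hw_read_only;      /* hardware state, set by the driver on revalidate */
                bool user_read_only;    /* user policy, set via BLKROSET */
        };

        /* Writes are refused if either source says read-only. */
        static bool effectively_read_only(const struct ro_state *s)
        {
                return s->hw_read_only || s->user_read_only;
        }

        /* A device-initiated revalidate only updates the hardware state and
         * leaves the user policy untouched; clearing the user flag later
         * simply falls back to whatever the hardware reports. */
        static void revalidate(struct ro_state *s, bool hw_write_protected)
        {
                s->hw_read_only = hw_write_protected;
        }
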
    • block: remove the NULL bdev check in bdev_read_only · 6f0d9689
      Committed by Christoph Hellwig
      Only a single caller can end up in bdev_read_only, so move the check
      there.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 08 Jan 2021, 1 commit
  3. 06 Jan 2021, 3 commits
    • block: fix use-after-free in disk_part_iter_next · aebf5db9
      Committed by Ming Lei
      Make sure that bdgrab() is done on the 'block_device' instance before
      referring to it, to avoid a use-after-free.
      
      Cc: <stable@vger.kernel.org>
      Reported-by: syzbot+825f0f9657d4e528046e@syzkaller.appspotmail.com
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
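
      The underlying rule is the usual one for reference-counted objects:
      take a reference while the object is still known to be live, then
      dereference it. A generic user-space sketch of that pattern (names are
      illustrative; this is not the kernel's partition iterator):

        #include <stdatomic.h>
        #include <stdbool.h>
        #include <stdlib.h>

        struct part {
                atomic_int ref;
                /* ... payload ... */
        };

        /* Roughly what bdgrab() provides: succeed only while the object
         * still has a live reference, so the caller never touches freed
         * memory. */
        static bool part_tryget(struct part *p)
        {
                int old = atomic_load(&p->ref);

                while (old > 0)
                        if (atomic_compare_exchange_weak(&p->ref, &old, old + 1))
                                return true;
                return false;
        }

        static void part_put(struct part *p)
        {
                if (atomic_fetch_sub(&p->ref, 1) == 1)
                        free(p);
        }
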
    • bfq: Fix computation of shallow depth · 6d4d2735
      Committed by Jan Kara
      BFQ computes the number of tags it allows to be allocated for each
      request type based on the tag bitmap. However, it uses 1 << bitmap.shift
      as the number of available tags, which is wrong: 'shift' is just an
      internal bitmap value containing the logarithm of how many bits the
      bitmap uses in each bitmap word. Thus the number of tags allowed for
      some request types can be far too low. Use the proper bitmap.depth,
      which holds the number of tags, instead.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
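
      The difference matters because 'shift' only describes the bitmap's
      internal word layout, not its capacity. An illustration with made-up
      numbers (the struct below merely mimics the relevant sbitmap fields):

        #include <stdio.h>

        struct tag_bitmap {             /* illustrative stand-in for struct sbitmap */
                unsigned int depth;     /* total number of tags */
                unsigned int shift;     /* log2 of bits used per bitmap word */
        };

        int main(void)
        {
                struct tag_bitmap bm = { .depth = 256, .shift = 6 };

                unsigned int wrong = 1U << bm.shift;    /* 64: just the word size */
                unsigned int right = bm.depth;          /* 256: the real tag count */

                printf("wrong=%u right=%u\n", wrong, right);
                return 0;
        }

      With numbers like these, sizing the per-type allowance from 64 instead
      of 256 tags is what starves some request types.
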
    • blk-iocost: fix NULL iocg deref from racing against initialization · d16baa3f
      Committed by Tejun Heo
      When initializing iocost for a queue, its rqos should be registered
      before the blkcg policy is activated so that policy data initialization
      can look up the associated ioc. This unfortunately means that the rqos
      methods can be called on bios before iocgs are attached to all existing
      blkgs.
      
      While the race is theoretically possible in ioc_rqos_throttle(), it
      mostly happened in ioc_rqos_merge() due to the difference in how the
      two look up the ioc. The former determines it from the passed-in @rqos
      and then bails before dereferencing the iocg if the looked-up ioc is
      disabled, which is most likely the case if initialization is still in
      progress. The latter looked up the ioc by dereferencing the possibly
      NULL iocg, making it much more prone to actually triggering the bug.
      
      * Make ioc_rqos_merge() use the same method as ioc_rqos_throttle() to look
        up ioc for consistency.
      
      * Make ioc_rqos_throttle() and ioc_rqos_merge() test for NULL iocg before
        dereferencing it.
      
      * Explain the danger of NULL iocgs in blk_iocost_init().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Jonathan Lemon <bsd@fb.com>
      Cc: stable@vger.kernel.org # v5.4+
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
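
      In outline, the hardening is to derive the ioc the same way in both
      paths and to bail out before touching a possibly NULL iocg. The sketch
      below only models that shape; the types and checks are simplified
      stand-ins, not the real blk-iocost code:

        #include <stdbool.h>

        struct ioc  { bool enabled; };                  /* per-queue iocost state */
        struct iocg { struct ioc *ioc; long vtime; };   /* per-cgroup state */

        /* Both the throttle and merge paths derive the ioc from the
         * queue-level object and return early while initialization may
         * still be attaching iocgs to existing blkgs. */
        static void merge_path(struct ioc *ioc, struct iocg *iocg)
        {
                if (!ioc || !ioc->enabled || !iocg)
                        return;                 /* init still in progress */

                iocg->vtime += 1;               /* safe: iocg checked above */
        }
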
  4. 30 Dec 2020, 1 commit
  5. 22 Dec 2020, 1 commit
  6. 18 Dec 2020, 1 commit
  7. 17 Dec 2020, 2 commits
  8. 13 Dec 2020, 3 commits
  9. 10 Dec 2020, 4 commits
  10. 08 Dec 2020, 11 commits
  11. 05 Dec 2020, 5 commits
  12. 03 Dec 2020, 3 commits
    • blk-throttle: don't check whether or not lower limit is valid if CONFIG_BLK_DEV_THROTTLING_LOW is off · acaf523a
      Committed by Yu Kuai
      
      blk_throtl_update_limit_valid() walks all descendants to see whether any
      of them has a nonzero 'LIMIT_LOW' bps/iops limit for READ or WRITE.
      However, these limits are always zero if CONFIG_BLK_DEV_THROTTLING_LOW
      is not set, so a lot of time is wasted iterating over descendants for
      nothing.
      
      Thus, make blk_throtl_update_limit_valid() do nothing in that case.
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
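
      The natural shape for such a fix is to compile the descendant walk only
      when the option is enabled and to provide an empty stub otherwise. A
      sketch of that pattern (bodies abbreviated, not the exact patch):

        struct throtl_data;     /* defined in block/blk-throttle.c */

        #ifdef CONFIG_BLK_DEV_THROTTLING_LOW
        static void blk_throtl_update_limit_valid(struct throtl_data *td)
        {
                /* walk descendants and check their LIMIT_LOW bps/iops limits */
        }
        #else
        static inline void blk_throtl_update_limit_valid(struct throtl_data *td)
        {
                /* LIMIT_LOW is always zero without THROTTLING_LOW,
                 * so there is nothing to recompute */
        }
        #endif
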
    • block: fix inflight statistics of part0 · b0d97557
      Committed by Jeffle Xu
      The inflight count of partition 0 doesn't include inflight IOs to its
      sub-partitions, since blk-mq currently calculates the inflight count of
      a specific partition by simply comparing partition pointers.
      
      Thus the following case is possible:
      
      $ cat /sys/block/vda/inflight
             0        0
      $ cat /sys/block/vda/vda1/inflight
             0      128
      
      A single-queue device (on an earlier kernel, e.g. v3.10) does not have
      this issue:
      
      $cat /sys/block/sda/sda3/inflight
             0       33
      $cat /sys/block/sda/inflight
             0       33
      
      Partition 0 should be handled specially since it represents the whole
      disk. This issue was introduced by commit bf0ddaba ("blk-mq: fix
      sysfs inflight counter").
      
      Besides, this patch also fixes the inflight statistics of part 0 in
      /proc/diskstats. Before this patch, the inflight statistics of part 0
      did not include those of its sub-partitions. (The 'inflight' field is
      marked with asterisks below.)
      
      $cat /proc/diskstats
       259       0 nvme0n1 45974469 0 367814768 6445794 1 0 1 0 *0* 111062 6445794 0 0 0 0 0 0
       259       2 nvme0n1p1 45974058 0 367797952 6445727 0 0 0 0 *33* 111001 6445727 0 0 0 0 0 0
      
      This was introduced by commit f299b7c7 ("blk-mq: provide internal
      in-flight variant").
      
      Fixes: bf0ddaba ("blk-mq: fix sysfs inflight counter")
      Fixes: f299b7c7 ("blk-mq: provide internal in-flight variant")
      Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      [axboe: adapt for 5.11 partition change]
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
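
      Conceptually, the fix is to treat partition 0 as matching any request
      on the same disk rather than requiring pointer equality. An
      illustrative model of that comparison (simplified types, not the actual
      blk-mq iterator):

        #include <stdbool.h>

        struct disk;                            /* opaque here */
        struct partition {
                struct disk *disk;
                int partno;                     /* 0 means the whole-disk device */
        };

        static bool counts_for(const struct partition *rq_part,
                               const struct partition *stat_part)
        {
                /* part0 represents the whole disk, so any request on the
                 * same disk counts toward it; other partitions still need
                 * an exact match */
                if (stat_part->partno == 0)
                        return rq_part->disk == stat_part->disk;
                return rq_part == stat_part;
        }
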
    • bio: optimise bvec iteration · 22b56c29
      Committed by Pavel Begunkov
      __bio_for_each_bvec(), __bio_for_each_segment() and bio_copy_data_iter()
      all satisfy the preconditions of bvec_iter_advance_single(), which is a
      faster and slimmer version of bvec_iter_advance(). Add
      bio_advance_iter_single() and convert them to it.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
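
      The single-step variant is cheap because the caller guarantees the
      advance never crosses a bvec boundary, so the general loop collapses to
      one comparison. A simplified sketch of that idea (modelled on, but not
      copied from, the kernel helper):

        struct bvec { unsigned int len; };      /* simplified bio_vec */
        struct bvec_iter {
                unsigned int idx;               /* current bvec */
                unsigned int done;              /* bytes consumed in current bvec */
                unsigned int size;              /* bytes left in the iteration */
        };

        /* 'bytes' is promised to fit within the current bvec. */
        static void iter_advance_single(const struct bvec *bv,
                                        struct bvec_iter *iter, unsigned int bytes)
        {
                unsigned int done = iter->done + bytes;

                if (done == bv[iter->idx].len) {        /* finished this bvec */
                        done = 0;
                        iter->idx++;
                }
                iter->done = done;
                iter->size -= bytes;
        }
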
  13. 02 Dec 2020, 2 commits