1. 13 5月, 2020 2 次提交
  2. 28 3月, 2020 1 次提交
  3. 12 3月, 2020 1 次提交
    • A
      block, zoned: fix integer overflow with BLKRESETZONE et al · 11bde986
      Alexey Dobriyan 提交于
      Check for overflow in addition before checking for end-of-block-device.
      
      Steps to reproduce:
      
      	#define _GNU_SOURCE 1
      	#include <sys/ioctl.h>
      	#include <sys/types.h>
      	#include <sys/stat.h>
      	#include <fcntl.h>
      
      	typedef unsigned long long __u64;
      
      	struct blk_zone_range {
      	        __u64 sector;
      	        __u64 nr_sectors;
      	};
      
      	#define BLKRESETZONE    _IOW(0x12, 131, struct blk_zone_range)
      
      	int main(void)
      	{
      	        int fd = open("/dev/nullb0", O_RDWR|O_DIRECT);
      	        struct blk_zone_range zr = {4096, 0xfffffffffffff000ULL};
      	        ioctl(fd, BLKRESETZONE, &zr);
      	        return 0;
      	}
      
      BUG: KASAN: null-ptr-deref in submit_bio_wait+0x74/0xe0
      Write of size 8 at addr 0000000000000040 by task a.out/1590
      
      CPU: 8 PID: 1590 Comm: a.out Not tainted 5.6.0-rc1-00019-g359c92c0 #2
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190711_202441-buildvm-armv7-10.arm.fedoraproject.org-2.fc31 04/01/2014
      Call Trace:
       dump_stack+0x76/0xa0
       __kasan_report.cold+0x5/0x3e
       kasan_report+0xe/0x20
       submit_bio_wait+0x74/0xe0
       blkdev_zone_mgmt+0x26f/0x2a0
       blkdev_zone_mgmt_ioctl+0x14b/0x1b0
       blkdev_ioctl+0xb28/0xe60
       block_ioctl+0x69/0x80
       ksys_ioctl+0x3af/0xa50
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlexey Dobriyan (SK hynix) <adobriyan@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      11bde986
  4. 09 1月, 2020 1 次提交
  5. 04 12月, 2019 1 次提交
  6. 03 12月, 2019 5 次提交
  7. 13 11月, 2019 5 次提交
    • C
      block: rework zone reporting · d4100351
      Christoph Hellwig 提交于
      Avoid the need to allocate a potentially large array of struct blk_zone
      in the block layer by switching the ->report_zones method interface to
      a callback model. Now the caller simply supplies a callback that is
      executed on each reported zone, and private data for it.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NShin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: NHannes Reinecke <hare@suse.de>
      Reviewed-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      d4100351
    • D
      block: Remove partition support for zoned block devices · 5eac3eb3
      Damien Le Moal 提交于
      No known partitioning tool supports zoned block devices, especially the
      host managed flavor with strong sequential write constraints.
      Furthermore, there are also no known user nor use cases for partitioned
      zoned block devices.
      
      This patch removes partition device creation for zoned block devices,
      which allows simplifying the processing of zone commands for zoned
      block devices. A warning is added if a partition table is found on the
      device.
      
      For report zones operations no zone sector information remapping is
      necessary anymore, simplifying the code. Of note is that remapping of
      zone reports for DM targets is still necessary as done by
      dm_remap_zone_report().
      
      Similarly, remaping of a zone reset bio is not necessary anymore.
      Testing for the applicability of the zone reset all request also becomes
      simpler and only needs to check that the number of sectors of the
      requested zone range is equal to the disk capacity.
      Reviewed-by: NHannes Reinecke <hare@suse.de>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      5eac3eb3
    • D
      block: Simplify report zones execution · ceeb373a
      Damien Le Moal 提交于
      All kernel users of blkdev_report_zones() as well as applications use
      through ioctl(BLKZONEREPORT) expect to potentially get less zone
      descriptors than requested. As such, the use of the internal report
      zones command execution loop implemented by blk_report_zones() is
      not necessary and can even be harmful to performance by causing the
      execution of inefficient small zones report command to service the
      reminder of a requested zone array.
      
      This patch removes blk_report_zones(), simplifying the code. Also
      remove a now incorrect comment in dm_blk_report_zones().
      Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NJavier Gonzalez <javier@javigon.com>
      Reviewed-by: NHannes Reinecke <hare@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      ceeb373a
    • C
      block: cleanup the !zoned case in blk_revalidate_disk_zones · c98c3d09
      Christoph Hellwig 提交于
      blk_revalidate_disk_zones is never called for non-zoned devices.  Just
      return early and warn instead of trying to handle this case.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: NHannes Reinecke <hare@suse.de>
      Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      c98c3d09
    • D
      block: Enhance blk_revalidate_disk_zones() · d9dd7308
      Damien Le Moal 提交于
      For ZBC and ZAC zoned devices, the scsi driver revalidation processing
      implemented by sd_revalidate_disk() includes a call to
      sd_zbc_read_zones() which executes a full disk zone report used to
      check that all zones of the disk are the same size. This processing is
      followed by a call to blk_revalidate_disk_zones(), used to initialize
      the device request queue zone bitmaps (zone type and zone write lock
      bitmaps). To do so, blk_revalidate_disk_zones() also executes a full
      device zone report to obtain zone types. As a result, the entire
      zoned block device revalidation process includes two full device zone
      report.
      
      By moving the zone size checks into blk_revalidate_disk_zones(), this
      process can be optimized to a single full device zone report, leading to
      shorter device scan and revalidation times. This patch implements this
      optimization, reducing the original full device zone report implemented
      in sd_zbc_check_zones() to a single, small, report zones command
      execution to obtain the size of the first zone of the device. Checks
      whether all zones of the device are the same size as the first zone
      size are moved to the generic blk_check_zone() function called from
      blk_revalidate_disk_zones().
      
      This optimization also has the following benefits:
      1) fewer memory allocations in the scsi layer during disk revalidation
         as the potentailly large buffer for zone report execution is not
         needed.
      2) Implement zone checks in a generic manner, reducing the burden on
         device driver which only need to obtain the zone size and check that
         this size is a power of 2 number of LBAs. Any new type of zoned
         block device will benefit from this.
      Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NHannes Reinecke <hare@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      d9dd7308
  8. 07 11月, 2019 4 次提交
  9. 05 8月, 2019 1 次提交
  10. 12 7月, 2019 2 次提交
  11. 10 7月, 2019 1 次提交
    • D
      block: Fix potential overflow in blk_report_zones() · 113ab72e
      Damien Le Moal 提交于
      For large values of the number of zones reported and/or large zone
      sizes, the sector increment calculated with
      
      blk_queue_zone_sectors(q) * n
      
      in blk_report_zones() loop can overflow the unsigned int type used for
      the calculation as both "n" and blk_queue_zone_sectors() value are
      unsigned int. E.g. for a device with 256 MB zones (524288 sectors),
      overflow happens with 8192 or more zones reported.
      
      Changing the return type of blk_queue_zone_sectors() to sector_t, fixes
      this problem and avoids overflow problem for all other callers of this
      helper too. The same change is also applied to the bdev_zone_sectors()
      helper.
      
      Fixes: e76239a3 ("block: add a report_zones method")
      Cc: stable@vger.kernel.org
      Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      113ab72e
  12. 01 5月, 2019 1 次提交
  13. 12 12月, 2018 1 次提交
  14. 16 11月, 2018 1 次提交
  15. 26 10月, 2018 5 次提交
    • D
      block: Introduce blk_revalidate_disk_zones() · bf505456
      Damien Le Moal 提交于
      Drivers exposing zoned block devices have to initialize and maintain
      correctness (i.e. revalidate) of the device zone bitmaps attached to
      the device request queue (seq_zones_bitmap and seq_zones_wlock).
      
      To simplify coding this, introduce a generic helper function
      blk_revalidate_disk_zones() suitable for most (and likely all) cases.
      This new function always update the seq_zones_bitmap and seq_zones_wlock
      bitmaps as well as the queue nr_zones field when called for a disk
      using a request based queue. For a disk using a BIO based queue, only
      the number of zones is updated since these queues do not have
      schedulers and so do not need the zone bitmaps.
      
      With this change, the zone bitmap initialization code in sd_zbc.c can be
      replaced with a call to this function in sd_zbc_read_zones(), which is
      called from the disk revalidate block operation method.
      
      A call to blk_revalidate_disk_zones() is also added to the null_blk
      driver for devices created with the zoned mode enabled.
      
      Finally, to ensure that zoned devices created with dm-linear or
      dm-flakey expose the correct number of zones through sysfs, a call to
      blk_revalidate_disk_zones() is added to dm_table_set_restrictions().
      
      The zone bitmaps allocated and initialized with
      blk_revalidate_disk_zones() are freed automatically from
      __blk_release_queue() using the block internal function
      blk_queue_free_zone_bitmaps().
      Reviewed-by: NHannes Reinecke <hare@suse.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      bf505456
    • C
      block: add a report_zones method · e76239a3
      Christoph Hellwig 提交于
      Dispatching a report zones command through the request queue is a major
      pain due to the command reply payload rewriting necessary. Given that
      blkdev_report_zones() is executing everything synchronously, implement
      report zones as a block device file operation instead, allowing major
      simplification of the code in many places.
      
      sd, null-blk, dm-linear and dm-flakey being the only block device
      drivers supporting exposing zoned block devices, these drivers are
      modified to provide the device side implementation of the
      report_zones() block device file operation.
      
      For device mappers, a new report_zones() target type operation is
      defined so that the upper block layer calls blkdev_report_zones() can
      be propagated down to the underlying devices of the dm targets.
      Implementation for this new operation is added to the dm-linear and
      dm-flakey targets.
      Reviewed-by: NHannes Reinecke <hare@suse.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      [Damien]
      * Changed method block_device argument to gendisk
      * Various bug fixes and improvements
      * Added support for null_blk, dm-linear and dm-flakey.
      Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      e76239a3
    • D
      block: Improve zone reset execution · a2d6b3a2
      Damien Le Moal 提交于
      There is no need to synchronously execute all REQ_OP_ZONE_RESET BIOs
      necessary to reset a range of zones. Similarly to what is done for
      discard BIOs in blk-lib.c, all zone reset BIOs can be chained and
      executed asynchronously and a synchronous call done only for the last
      BIO of the chain.
      
      Modify blkdev_reset_zones() to operate similarly to
      blkdev_issue_discard() using the next_bio() helper for chaining BIOs. To
      avoid code duplication of that function in blk_zoned.c, rename
      next_bio() into blk_next_bio() and declare it as a block internal
      function in blk.h.
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NHannes Reinecke <hare@suse.com>
      Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      a2d6b3a2
    • D
      block: Limit allocation of zone descriptors for report zones · 2e85fbaf
      Damien Le Moal 提交于
      There is no point in allocating more zone descriptors than the number of
      zones a block device has for doing a zone report. Avoid doing that in
      blkdev_report_zones_ioctl() by limiting the number of zone decriptors
      allocated internally to process the user request.
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NHannes Reinecke <hare@suse.com>
      Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      2e85fbaf
    • D
      block: Introduce blkdev_nr_zones() helper · a91e1380
      Damien Le Moal 提交于
      Introduce the blkdev_nr_zones() helper function to get the total
      number of zones of a zoned block device. This number is always 0 for a
      regular block device (q->limits.zoned == BLK_ZONED_NONE case).
      
      Replace hard-coded number of zones calculation in dmz_get_zoned_device()
      with a call to this helper.
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NHannes Reinecke <hare@suse.com>
      Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      a91e1380
  16. 09 7月, 2018 1 次提交
  17. 13 6月, 2018 1 次提交
    • K
      treewide: kvmalloc() -> kvmalloc_array() · 344476e1
      Kees Cook 提交于
      The kvmalloc() function has a 2-factor argument form, kvmalloc_array(). This
      patch replaces cases of:
      
              kvmalloc(a * b, gfp)
      
      with:
              kvmalloc_array(a * b, gfp)
      
      as well as handling cases of:
      
              kvmalloc(a * b * c, gfp)
      
      with:
      
              kvmalloc(array3_size(a, b, c), gfp)
      
      as it's slightly less ugly than:
      
              kvmalloc_array(array_size(a, b), c, gfp)
      
      This does, however, attempt to ignore constant size factors like:
      
              kvmalloc(4 * 1024, gfp)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        kvmalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        kvmalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        kvmalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kvmalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kvmalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        kvmalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        kvmalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        kvmalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        kvmalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        kvmalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(TYPE) * (COUNT_ID)
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(TYPE) * COUNT_ID
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(TYPE) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(TYPE) * COUNT_CONST
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(THING) * (COUNT_ID)
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(THING) * COUNT_ID
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(THING) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(THING)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(THING) * COUNT_CONST
      +	COUNT_CONST, sizeof(THING)
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
      - kvmalloc
      + kvmalloc_array
        (
      -	SIZE * COUNT
      +	COUNT, SIZE
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        kvmalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kvmalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kvmalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kvmalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kvmalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kvmalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kvmalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kvmalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        kvmalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kvmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kvmalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kvmalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kvmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        kvmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        kvmalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvmalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvmalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvmalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvmalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvmalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvmalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvmalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products,
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        kvmalloc(C1 * C2 * C3, ...)
      |
        kvmalloc(
      -	(E1) * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kvmalloc(
      -	(E1) * (E2) * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kvmalloc(
      -	(E1) * (E2) * (E3)
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kvmalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants,
      // keeping sizeof() as the second factor argument.
      @@
      expression THING, E1, E2;
      type TYPE;
      constant C1, C2, C3;
      @@
      
      (
        kvmalloc(sizeof(THING) * C2, ...)
      |
        kvmalloc(sizeof(TYPE) * C2, ...)
      |
        kvmalloc(C1 * C2 * C3, ...)
      |
        kvmalloc(C1 * C2, ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(TYPE) * (E2)
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(TYPE) * E2
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(THING) * (E2)
      +	E2, sizeof(THING)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	sizeof(THING) * E2
      +	E2, sizeof(THING)
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	(E1) * E2
      +	E1, E2
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	(E1) * (E2)
      +	E1, E2
        , ...)
      |
      - kvmalloc
      + kvmalloc_array
        (
      -	E1 * E2
      +	E1, E2
        , ...)
      )
      Signed-off-by: NKees Cook <keescook@chromium.org>
      344476e1
  18. 23 5月, 2018 1 次提交
    • B
      blkdev_report_zones_ioctl(): Use vmalloc() to allocate large buffers · 327ea4ad
      Bart Van Assche 提交于
      Avoid that complaints similar to the following appear in the kernel log
      if the number of zones is sufficiently large:
      
        fio: page allocation failure: order:9, mode:0x140c0c0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null)
        Call Trace:
        dump_stack+0x63/0x88
        warn_alloc+0xf5/0x190
        __alloc_pages_slowpath+0x8f0/0xb0d
        __alloc_pages_nodemask+0x242/0x260
        alloc_pages_current+0x6a/0xb0
        kmalloc_order+0x18/0x50
        kmalloc_order_trace+0x26/0xb0
        __kmalloc+0x20e/0x220
        blkdev_report_zones_ioctl+0xa5/0x1a0
        blkdev_ioctl+0x1ba/0x930
        block_ioctl+0x41/0x50
        do_vfs_ioctl+0xaa/0x610
        SyS_ioctl+0x79/0x90
        do_syscall_64+0x79/0x1b0
        entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      Fixes: 3ed05a98 ("blk-zoned: implement ioctls")
      Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com>
      Cc: Shaun Tancheff <shaun.tancheff@seagate.com>
      Cc: Damien Le Moal <damien.lemoal@hgst.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      327ea4ad
  19. 09 3月, 2018 1 次提交
  20. 06 1月, 2018 1 次提交
    • C
      block: introduce zoned block devices zone write locking · 6cc77e9c
      Christoph Hellwig 提交于
      Components relying only on the request_queue structure for accessing
      block devices (e.g. I/O schedulers) have a limited knowledged of the
      device characteristics. In particular, the device capacity cannot be
      easily discovered, which for a zoned block device also result in the
      inability to easily know the number of zones of the device (the zone
      size is indicated by the chunk_sectors field of the queue limits).
      
      Introduce the nr_zones field to the request_queue structure to simplify
      access to this information. Also, add the bitmap seq_zone_bitmap which
      indicates which zones of the device are sequential zones (write
      preferred or write required) and the bitmap seq_zones_wlock which
      indicates if a zone is write locked, that is, if a write request
      targeting a zone was dispatched to the device. These fields are
      initialized by the low level block device driver (sd.c for ZBC/ZAC
      disks). They are not initialized by stacking drivers (device mappers)
      handling zoned block devices (e.g. dm-linear).
      
      Using this, I/O schedulers can introduce zone write locking to control
      request dispatching to a zoned block device and avoid write request
      reordering by limiting to at most a single write request per zone
      outside of the scheduler at any time.
      
      Based on previous patches from Damien Le Moal.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      [Damien]
      * Fixed comments and identation in blkdev.h
      * Changed helper functions
      * Fixed this commit message
      Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      6cc77e9c
  21. 24 8月, 2017 1 次提交
    • C
      block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Christoph Hellwig 提交于
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different life time rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      74d46992
  22. 12 1月, 2017 1 次提交
    • D
      block: Rename blk_queue_zone_size and bdev_zone_size · f99e8648
      Damien Le Moal 提交于
      All block device data fields and functions returning a number of 512B
      sectors are by convention named xxx_sectors while names in the form
      xxx_size are generally used for a number of bytes. The blk_queue_zone_size
      and bdev_zone_size functions were not following this convention so rename
      them.
      
      No functional change is introduced by this patch.
      Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
      
      Collapsed the two patches, they were nonsensically split and broke
      bisection.
      Signed-off-by: NJens Axboe <axboe@fb.com>
      f99e8648
  23. 25 10月, 2016 1 次提交