1. 13 Aug, 2021 3 commits
  2. 12 Aug, 2021 1 commit
  3. 10 Aug, 2021 5 commits
  4. 03 Aug, 2021 1 commit
  5. 29 Jun, 2021 1 commit
  6. 26 Jun, 2021 6 commits
  7. 22 Jun, 2021 1 commit
  8. 18 Jun, 2021 1 commit
  9. 17 Jun, 2021 1 commit
  10. 16 Jun, 2021 1 commit
  11. 15 Jun, 2021 12 commits
  12. 14 Jun, 2021 3 commits
  13. 12 Jun, 2021 1 commit
  14. 09 Jun, 2021 2 commits
    • C
      bcache: avoid oversized read request in cache missing code path · 41fe8d08
      Coly Li committed
      In the cache-miss code path of a cached device, if a proper location
      in the internal B+ tree is matched for a cache miss range, function
      cached_dev_cache_miss() is called from cache_lookup_fn() in the
      following code block,
      [code block 1]
        526         unsigned int sectors = KEY_INODE(k) == s->iop.inode
        527                 ? min_t(uint64_t, INT_MAX,
        528                         KEY_START(k) - bio->bi_iter.bi_sector)
        529                 : INT_MAX;
        530         int ret = s->d->cache_miss(b, s, bio, sectors);
      
      Here s->d->cache_miss() is the callback function pointer initialized as
      cached_dev_cache_miss(). The last parameter 'sectors' is an important
      hint for calculating the size of the read request sent to the backing
      device for the missing cache data.
      
      The current calculation in the above code block may generate an
      oversized value of 'sectors', which may consequently trigger 2
      different kernel panics by BUG() or BUG_ON() as listed below,
      
      1) BUG_ON() inside bch_btree_insert_key(),
      [code block 2]
         886         BUG_ON(b->ops->is_extents && !KEY_SIZE(k));
      2) BUG() inside biovec_slab(),
      [code block 3]
         51         default:
         52                 BUG();
         53                 return NULL;
      
      All the above panics originate from cached_dev_cache_miss() via the
      oversized parameter 'sectors'.
      
      Inside cached_dev_cache_miss(), parameter 'sectors' is used to calculate
      the size of the data read from the backing device for the cache miss.
      This size is stored in s->insert_bio_sectors by the following line of
      code,
      [code block 4]
        909    s->insert_bio_sectors = min(sectors, bio_sectors(bio) + reada);
      
      Then the actual key to be inserted into the internal B+ tree is
      generated and stored in s->iop.replace_key by the following lines of
      code,
      [code block 5]
        911   s->iop.replace_key = KEY(s->iop.inode,
        912                    bio->bi_iter.bi_sector + s->insert_bio_sectors,
        913                    s->insert_bio_sectors);
      The oversized parameter 'sectors' may trigger panic 1) by BUG_ON() from
      the above code block.
      
      The bio sent to the backing device for the missing data is allocated
      with a hint from s->insert_bio_sectors by the following lines of code,
      [code block 6]
        926    cache_bio = bio_alloc_bioset(GFP_NOWAIT,
        927                 DIV_ROUND_UP(s->insert_bio_sectors, PAGE_SECTORS),
        928                 &dc->disk.bio_split);
      The oversized parameter 'sectors' may trigger panic 2) by BUG() from the
      above code block.
      
      Now let me explain how the panics happen with the oversized 'sectors'.
      In code block 5, replace_key is generated by macro KEY(). From the
      definition of macro KEY(),
      [code block 7]
        71 #define KEY(inode, offset, size)                                  \
        72 ((struct bkey) {                                                  \
        73      .high = (1ULL << 63) | ((__u64) (size) << 20) | (inode),     \
        74      .low = (offset)                                              \
        75 })
      
      Here 'size' is a 16-bit field embedded in the 64-bit member 'high' of
      struct bkey. But in code block 1, "KEY_START(k) -
      bio->bi_iter.bi_sector" can very probably be larger than (1<<16) - 1,
      which makes the bkey size calculation in code block 5 overflow. In one
      bug report the value of parameter 'sectors' is 131072 (= 1 << 17). The
      overflowed 'sectors' results in an overflowed s->insert_bio_sectors in
      code block 4, which then makes the size field of s->iop.replace_key 0
      in code block 5. The 0-sized s->iop.replace_key is then inserted into
      the internal B+ tree as the cache missing check key (a special key to
      detect and avoid a race between a normal write request and a cache
      missing read request) as,
      [code block 8]
        915   ret = bch_btree_insert_check_key(b, &s->op, &s->iop.replace_key);
      
      Then the 0-sized s->iop.replace_key as 3rd parameter triggers the bkey
      size check BUG_ON() in code block 2, and causes the kernel panic 1).
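
The truncation described above can be reproduced in user space. The following is only a minimal sketch: struct bkey_sketch, make_key(), key_size() and stored_size() are hypothetical names introduced here, and the reader for the size field assumes the 16-bit field sits at bit 20 of 'high', as the KEY() macro in code block 7 suggests.

```c
#include <stdint.h>

/* Simplified stand-in for struct bkey: only the two words set by KEY(). */
struct bkey_sketch {
	uint64_t high;
	uint64_t low;
};

/* Same bit layout as the KEY() macro quoted in code block 7. */
static struct bkey_sketch make_key(uint64_t inode, uint64_t offset,
				   uint64_t size)
{
	struct bkey_sketch k = {
		.high = (1ULL << 63) | (size << 20) | inode,
		.low = offset,
	};
	return k;
}

/* Assumed reader for the 16-bit size field stored at bit 20 of 'high'. */
static unsigned int key_size(const struct bkey_sketch *k)
{
	return (unsigned int)((k->high >> 20) & 0xFFFFu);
}

/* The size that would be read back after storing 'sectors' in a key. */
static unsigned int stored_size(uint64_t sectors)
{
	struct bkey_sketch k = make_key(1, 0, sectors);

	return key_size(&k);
}
```

With this layout, stored_size(131072) reads back as 0 because bit 17 of the size lands outside the 16-bit field, while any value up to 65535 is read back unchanged.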
      
      The other kernel panic comes from code block 6, and is caused by an
      oversized bvecs number derived from s->insert_bio_sectors in code
      block 4,
              min(sectors, bio_sectors(bio) + reada)
      There are two possibilities for an oversized result,
      - bio_sectors(bio) is valid, but bio_sectors(bio) + reada is oversized.
      - sectors < bio_sectors(bio) + reada, but sectors is oversized.
      
      From a bug report the result of "DIV_ROUND_UP(s->insert_bio_sectors,
      PAGE_SECTORS)" from code block 6 can be 344, 282, 946, 342 and many
      other values which are larger than BIO_MAX_VECS (a.k.a. 256). When
      calling bio_alloc_bioset() with such a larger-than-256 value as the 2nd
      parameter, this value will eventually be passed to biovec_slab() as
      parameter 'nr_vecs' in the following code path,
         bio_alloc_bioset() ==> bvec_alloc() ==> biovec_slab()
      Because parameter 'nr_vecs' is larger than 256, the panic by BUG()
      in code block 3 is triggered inside biovec_slab().
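
The bvecs arithmetic above can be checked with a small user-space sketch. It assumes 4096-byte pages and 512-byte sectors, so 8 sectors per page; the macro and function names are hypothetical and are not the kernel's own definitions.

```c
/* Assumption: 4096-byte pages and 512-byte sectors, i.e. 8 sectors/page. */
#define SKETCH_PAGE_SECTORS 8u
/* The bvecs limit named in the text above. */
#define SKETCH_BIO_MAX_VECS 256u

/* Round-up division, mirroring DIV_ROUND_UP(sectors, PAGE_SECTORS). */
static unsigned int nr_vecs_for(unsigned int sectors)
{
	return (sectors + SKETCH_PAGE_SECTORS - 1) / SKETCH_PAGE_SECTORS;
}

/* Nonzero when the request would need more bvecs than the limit allows. */
static int exceeds_bvec_limit(unsigned int sectors)
{
	return nr_vecs_for(sectors) > SKETCH_BIO_MAX_VECS;
}
```

Under these assumptions, 2048 sectors is the largest request that still fits in 256 bvecs, and a request of 2752 sectors needs 344 bvecs, one of the values seen in the bug report.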
      
      From the above analysis, we know that the 4th parameter 'sectors' passed
      into cached_dev_cache_miss() may cause overflow in code blocks 5 and 6,
      and finally cause kernel panics in code blocks 2 and 3. And if the
      result of bio_sectors(bio) + reada exceeds the valid bvecs number, it
      may also trigger the kernel panic in code block 3 from code block 6.
      
      Now that the almost-useless readahead size for cache missing requests
      sent to the backing device is removed, this patch can fix the oversized
      issue with a simpler method.
      - add a local variable size_limit, and set it to the minimum of the
        max bkey size and the max bio bvecs number.
      - set s->insert_bio_sectors to the minimum of size_limit, sectors,
        and the size of bio in sectors.
      - replace sectors by s->insert_bio_sectors to do bio_next_split.
      
      By the above method with size_limit, s->insert_bio_sectors will never
      result in an oversized replace_key size or bio bvecs number. And the
      split bio 'miss' from bio_next_split() will always match the size of
      'cache_bio', that is, the current maximum bio size we can send to the
      backing device for fetching the cache missing data.
      
      The current problematic code can be partially found since Linux
      v3.13-rc1, therefore all maintained stable kernels should try to apply
      this fix.
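
The clamping steps listed above can be sketched in user space as follows. This is not the patch itself: all names are hypothetical, and the sketch assumes that the bvecs limit is converted to sectors as 256 bvecs times 8 sectors per page and that the max bkey size is (1 << 16) - 1.

```c
/* Assumptions for this sketch, not taken from kernel headers. */
#define LIMIT_PAGE_SECTORS  8u   /* 4096-byte pages, 512-byte sectors */
#define LIMIT_BIO_MAX_VECS  256u /* max bvecs per bio */
#define LIMIT_KEY_SIZE_BITS 16u  /* width of the bkey size field */

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Largest request, in sectors, that fits both the bkey size and one bio. */
static unsigned int size_limit(void)
{
	return min_u(LIMIT_BIO_MAX_VECS * LIMIT_PAGE_SECTORS,
		     (1u << LIMIT_KEY_SIZE_BITS) - 1);
}

/* Clamp the cache-miss read to size_limit, 'sectors' and the bio size. */
static unsigned int insert_bio_sectors(unsigned int sectors,
				       unsigned int bio_sectors)
{
	return min_u(size_limit(), min_u(sectors, bio_sectors));
}
```

With these numbers the bvecs limit (2048 sectors) is the tighter bound, so the value of 131072 from the bug report would be clamped to 2048, which is nonzero in the 16-bit size field and needs at most 256 bvecs.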
      Reported-by: Alexander Ullrich <ealex1979@gmail.com>
      Reported-by: Diego Ercolani <diego.ercolani@gmail.com>
      Reported-by: Jan Szubiak <jan.szubiak@linuxpolska.pl>
      Reported-by: Marco Rebhan <me@dblsaiko.net>
      Reported-by: Matthias Ferdinand <bcache@mfedv.net>
      Reported-by: Victor Westerhuis <victor@westerhu.is>
      Reported-by: Vojtech Pavlik <vojtech@suse.cz>
      Reported-and-tested-by: Rolf Fokkens <rolf@rolffokkens.nl>
      Reported-and-tested-by: Thorsten Knabe <linux@thorsten-knabe.de>
      Signed-off-by: Coly Li <colyli@suse.de>
      Cc: stable@vger.kernel.org
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Nix <nix@esperi.org.uk>
      Cc: Takashi Iwai <tiwai@suse.com>
      Link: https://lore.kernel.org/r/20210607125052.21277-3-colyli@suse.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      41fe8d08
    • C
      bcache: remove bcache device self-defined readahead · 1616a4c2
      Coly Li committed
      For read cache missing, bcache defines a readahead size for the read I/O
      request to the backing device for the missing data. This readahead size
      is initialized to 0, and almost no one uses it, in order to avoid
      unnecessary read amplification onto the backing device and write
      amplification onto the cache device. Considering that upper layer file
      system code already has readahead logic and works fine with the
      readahead_cache_policy sysfs file interface, we don't have to keep the
      bcache self-defined readahead anymore.
      
      This patch removes the bcache self-defined readahead for cache missing
      requests to the backing device, and the readahead sysfs file interfaces
      are removed as well.
      
      This is preparation for the next patch, which fixes the potential kernel
      panic due to oversized requests with a simpler method.
      Reported-by: Alexander Ullrich <ealex1979@gmail.com>
      Reported-by: Diego Ercolani <diego.ercolani@gmail.com>
      Reported-by: Jan Szubiak <jan.szubiak@linuxpolska.pl>
      Reported-by: Marco Rebhan <me@dblsaiko.net>
      Reported-by: Matthias Ferdinand <bcache@mfedv.net>
      Reported-by: Victor Westerhuis <victor@westerhu.is>
      Reported-by: Vojtech Pavlik <vojtech@suse.cz>
      Reported-and-tested-by: Rolf Fokkens <rolf@rolffokkens.nl>
      Reported-and-tested-by: Thorsten Knabe <linux@thorsten-knabe.de>
      Signed-off-by: Coly Li <colyli@suse.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: stable@vger.kernel.org
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Nix <nix@esperi.org.uk>
      Cc: Takashi Iwai <tiwai@suse.com>
      Link: https://lore.kernel.org/r/20210607125052.21277-2-colyli@suse.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      1616a4c2
  15. 05 Jun, 2021 1 commit
    • D
      dm crypt: Fix zoned block device support · f34ee1dc
      Damien Le Moal committed
      Zone append BIOs (REQ_OP_ZONE_APPEND) always specify the start sector
      of the zone to be written instead of the actual sector location to
      write. The write location is determined by the device and returned to
      the host upon completion of the operation. This interface, while simple
      and efficient for writing into sequential zones of a zoned block
      device, is incompatible with the use of sector values to calculate a
      cypher block IV. All data written in a zone ends up using the same IV
      values corresponding to the first sectors of the zone, but read
      operations will specify any sector within the zone, resulting in an IV
      mismatch between encryption and decryption.
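
The mismatch described above can be illustrated with a deliberately simplified sketch. It assumes a hypothetical IV generator that uses the sector number itself as the IV; the function names are invented for illustration and do not correspond to dm-crypt's own code.

```c
#include <stdint.h>

/* Hypothetical sector-derived IV: the IV is simply the sector number. */
static uint64_t iv_for_sector(uint64_t sector)
{
	return sector;
}

/*
 * IV used when encrypting a zone append. The BIO only carries the zone
 * start sector; the sector actually written is unknown until completion.
 */
static uint64_t zone_append_write_iv(uint64_t zone_start)
{
	return iv_for_sector(zone_start);
}

/* IV used when decrypting: the read BIO names the actual sector. */
static uint64_t read_iv(uint64_t actual_sector)
{
	return iv_for_sector(actual_sector);
}

/* Nonzero when decryption would use the same IV as encryption. */
static int iv_matches(uint64_t zone_start, uint64_t actual_sector)
{
	return zone_append_write_iv(zone_start) == read_iv(actual_sector);
}
```

In this model the IVs agree only when the device happens to place the data at the very start of the zone; any later write location within the zone is decrypted with a different IV than the one used for encryption.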
      
      To solve this problem, report to DM core that zone append operations are
      not supported. This results in the zone append operations being emulated
      using regular write operations.
      Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      f34ee1dc