1. 15 6月, 2020 3 次提交
    • C
      block, bfq: fix overwrite of bfq_group pointer in bfq_find_set_group() · 8cf52b47
      Carlo Nonato 提交于
      task #28557799
      
      [ Upstream commit 14afc59361976c0ba39e3a9589c3eaa43ebc7e1d ]
      
      The bfq_find_set_group() function takes as input a blkcg (which represents
      a cgroup) and retrieves the corresponding bfq_group, then it updates the
      bfq internal group hierarchy (see comments inside the function for why
      this is needed) and finally it returns the bfq_group.
      In the hierarchy update cycle, the pointer holding the correct bfq_group
      that has to be returned is mistakenly used to traverse the hierarchy
      bottom to top, meaning that in each iteration it gets overwritten with the
      parent of the current group. Since the update cycle stops at root's
      children (depth = 2), the overwrite becomes a problem only if the blkcg
      describes a cgroup at a hierarchy level deeper than that (depth > 2). In
      this case the root's child that happens to be also an ancestor of the
      correct bfq_group is returned. The main consequence is that processes
      contained in a cgroup at depth greater than 2 are wrongly placed in the
      group described above by BFQ.
      
      This commits fixes this problem by using a different bfq_group pointer in
      the update cycle in order to avoid the overwrite of the variable holding
      the original group reference.
      Reported-by: NKwon Je Oh <kwonje.oh2@gmail.com>
      Signed-off-by: NCarlo Nonato <carlo.nonato95@gmail.com>
      Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NJeffle Xu <jefflexu@linux.alibaba.com>
      Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      8cf52b47
    • M
      block: fix an integer overflow in logical block size · 8b05616d
      Mikulas Patocka 提交于
      task #28557799
      
      commit ad6bf88a6c19a39fb3b0045d78ea880325dfcf15 upstream.
      
      Logical block size has type unsigned short. That means that it can be at
      most 32768. However, there are architectures that can run with 64k pages
      (for example arm64) and on these architectures, it may be possible to
      create block devices with 64k block size.
      
      For exmaple (run this on an architecture with 64k pages):
      
      Mount will fail with this error because it tries to read the superblock using 2-sector
      access:
        device-mapper: writecache: I/O is not aligned, sector 2, size 1024, block size 65536
        EXT4-fs (dm-0): unable to read superblock
      
      This patch changes the logical block size from unsigned short to unsigned
      int to avoid the overflow.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NJeffle Xu <jefflexu@linux.alibaba.com>
      Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      8b05616d
    • Y
      block: fix memleak when __blk_rq_map_user_iov() is failed · a15ce925
      Yang Yingliang 提交于
      task #28557799
      
      [ Upstream commit 3b7995a98ad76da5597b488fa84aa5a56d43b608 ]
      
      When I doing fuzzy test, get the memleak report:
      
      BUG: memory leak
      unreferenced object 0xffff88837af80000 (size 4096):
        comm "memleak", pid 3557, jiffies 4294817681 (age 112.499s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          20 00 00 00 10 01 00 00 00 00 00 00 01 00 00 00   ...............
        backtrace:
          [<000000001c894df8>] bio_alloc_bioset+0x393/0x590
          [<000000008b139a3c>] bio_copy_user_iov+0x300/0xcd0
          [<00000000a998bd8c>] blk_rq_map_user_iov+0x2f1/0x5f0
          [<000000005ceb7f05>] blk_rq_map_user+0xf2/0x160
          [<000000006454da92>] sg_common_write.isra.21+0x1094/0x1870
          [<00000000064bb208>] sg_write.part.25+0x5d9/0x950
          [<000000004fc670f6>] sg_write+0x5f/0x8c
          [<00000000b0d05c7b>] __vfs_write+0x7c/0x100
          [<000000008e177714>] vfs_write+0x1c3/0x500
          [<0000000087d23f34>] ksys_write+0xf9/0x200
          [<000000002c8dbc9d>] do_syscall_64+0x9f/0x4f0
          [<00000000678d8e9a>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      If __blk_rq_map_user_iov() is failed in blk_rq_map_user_iov(),
      the bio(s) which is allocated before this failing will leak. The
      refcount of the bio(s) is init to 1 and increased to 2 by calling
      bio_get(), but __blk_rq_unmap_user() only decrease it to 1, so
      the bio cannot be freed. Fix it by calling blk_rq_unmap_user().
      Reviewed-by: NBob Liu <bob.liu@oracle.com>
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NJeffle Xu <jefflexu@linux.alibaba.com>
      Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      a15ce925
  2. 12 6月, 2020 2 次提交
    • J
      io_uring: check file O_NONBLOCK state for accept · e8758c9b
      Jiufei Xue 提交于
      task #27774850
      
      commit e697deed834de15d2322d0619d51893022c90ea2 upstream.
      
      If the socket is O_NONBLOCK, we should complete the accept request
      with -EAGAIN when data is not ready.
      Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      e8758c9b
    • J
      ext4: fix partial cluster initialization when splitting extent · b4a0105f
      Jeffle Xu 提交于
      fix #27212891
      
      Fix the bug when calculating the physical block number of the first
      block in the split extent.
      
      This bug will cause xfstests shared/298 failure on ext4 with bigalloc
      enabled occasionally. Ext4 error messages indicate that previously freed
      blocks are being freed again, and the following fsck will fail due to
      the inconsistency of block bitmap and bg descriptor.
      
      The following is an example case:
      
      1. First, Initialize a ext4 filesystem with cluster size '16K', block size
      '4K', in which case, one cluster contains four blocks.
      
      2. Create one file (e.g., xxx.img) on this ext4 filesystem. Now the extent
      tree of this file is like:
      
      ...
      36864:[0]4:220160
      36868:[0]14332:145408
      51200:[0]2:231424
      ...
      
      3. Then execute PUNCH_HOLE fallocate on this file. The hole range is
      like:
      
      ..
      ext4_ext_remove_space: dev 254,16 ino 12 since 49506 end 49506 depth 1
      ext4_ext_remove_space: dev 254,16 ino 12 since 49544 end 49546 depth 1
      ext4_ext_remove_space: dev 254,16 ino 12 since 49605 end 49607 depth 1
      ...
      
      4. Then the extent tree of this file after punching is like
      
      ...
      49507:[0]37:158047
      49547:[0]58:158087
      ...
      
      5. Detailed procedure of punching hole [49544, 49546]
      
      5.1. The block address space:
      ```
      lblk        ~49505  49506   49507~49543     49544~49546    49547~
      	  ---------+------+-------------+----------------+--------
      	    extent | hole |   extent	|	hole	 | extent
      	  ---------+------+-------------+----------------+--------
      pblk       ~158045  158046  158047~158083  158084~158086   158087~
      ```
      
      5.2. The detailed layout of cluster 39521:
      ```
      		cluster 39521
      	<------------------------------->
      
      		hole		  extent
      	<----------------------><--------
      
      lblk      49544   49545   49546   49547
      	+-------+-------+-------+-------+
      	|	|	|	|	|
      	+-------+-------+-------+-------+
      pblk     158084  1580845  158086  158087
      ```
      
      5.3. The ftrace output when punching hole [49544, 49546]:
      - ext4_ext_remove_space (start 49544, end 49546)
        - ext4_ext_rm_leaf (start 49544, end 49546, last_extent [49507(158047), 40], partial [pclu 39522 lblk 0 state 2])
          - ext4_remove_blocks (extent [49507(158047), 40], from 49544 to 49546, partial [pclu 39522 lblk 0 state 2]
            - ext4_free_blocks: (block 158084 count 4)
              - ext4_mballoc_free (extent 1/6753/1)
      
      5.4. Ext4 error message in dmesg:
      EXT4-fs error (device vdb): mb_free_blocks:1457: group 1, block 158084:freeing already freed block (bit 6753); block bitmap corrupt.
      EXT4-fs error (device vdb): ext4_mb_generate_buddy:747: group 1, block bitmap and bg descriptor inconsistent: 19550 vs 19551 free clusters
      
      In this case, the whole cluster 39521 is freed mistakenly when freeing
      pblock 158084~158086 (i.e., the first three blocks of this cluster),
      although pblock 158087 (the last remaining block of this cluster) has
      not been freed yet.
      
      The root cause of this isuue is that, the pclu of the partial cluster is
      calculated mistakenly in ext4_ext_remove_space(). The correct
      partial_cluster.pclu (i.e., the cluster number of the first block in the
      next extent, that is, lblock 49597 (pblock 158086)) should be 39521 rather
      than 39522.
      
      Fixes: f4226d9e ("ext4: fix partial cluster initialization")
      Signed-off-by: NJeffle Xu <jefflexu@linux.alibaba.com>
      Reviewed-by: NEric Whitney <enwlinux@gmail.com>
      Cc: stable@kernel.org # v3.19+
      Link: https://lore.kernel.org/r/1590121124-37096-1-git-send-email-jefflexu@linux.alibaba.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      b4a0105f
  3. 11 6月, 2020 1 次提交
    • J
      alinux: blk-mq: remove QUEUE_FLAG_POLL from default MQ flags · 294d5fb2
      Joseph Qi 提交于
      fix #28528017
      
      In case of virtio-blk device, checking /sys/block/<device>/queue/io_poll
      will show 1 and user can't disable it. Actually virtio-blk doesn't
      support poll yet, so it will confuse end user. The root cause is mq
      initialization will default set bit QUEUE_FLAG_POLL.
      
      This fix takes ideas from the following upstream commits:
      6544d229bf43 ("block: enable polling by default if a poll map is initalized")
      6e0de61107f0 ("blk-mq: remove QUEUE_FLAG_POLL from default MQ flags")
      Since we don't want to get HCTX_TYPE_POLL related logic involved, so
      just check mq_ops->poll and then set QUEUE_FLAG_POLL.
      Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      294d5fb2
  4. 09 6月, 2020 14 次提交
  5. 08 6月, 2020 1 次提交
  6. 05 6月, 2020 2 次提交
  7. 04 6月, 2020 17 次提交