1. 29 Jun, 2020 2 commits
  2. 24 Jun, 2020 3 commits
  3. 23 Jun, 2020 3 commits
    • Y
      alinux: sched: Add switch for scheduler_tick load tracking · bcaf8afd
      Yihao Wu committed
      to #28739709
      
      Assume workloads are composed of massive numbers of short tasks. Then
      periodic load tracking is unnecessary, because load tracking is already
      guaranteed by the frequent sleeps and wake-ups.
      
      If these massive short tasks run in their individual cgroups, the load
      tracking becomes extremely heavy.
      
      This patch adds a switch to bypass scheduler_tick load tracking, in
      order to reduce scheduler overhead, without sacrificing much balance
      in this scenario.
      
      Performance Tests:
      
      1) 1100+ tasks in their individual cgroups, on a 96-HT Skylake machine
      
      	sched overhead(each HT): 0.74% -> 0.48%
      
      	(This test's baseline is from the previous patch)
      
      2) sysbench-threads with 96 threads, running for 5min
      
      	latency_ms 95th: 63.07 -> 54.01
      
      Besides these, no regression is found on our test platform.
      Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
      Acked-by: Michael Wang <yun.wang@linux.alibaba.com>
      bcaf8afd
    • Y
      alinux: sched: Add switch for update_blocked_averages · bb48b716
      Yihao Wu committed
      to #28739709
      
      Unless the workloads are IO-bound, update_blocked_averages doesn't help
      load balance. This patch adds a switch to bypass update_blocked_averages
      when prior knowledge about the workloads indicates IO is negligible.
      
      Performance Tests:
      
      1) 1100+ tasks in their individual cgroups, on a 96-HT Skylake machine
      
      	sched overhead(each HT): 3.78% -> 0.74%
      
      2) cgroup-overhead benchmark in our sched-test suite on a 96-HT Skylake
      
      	overhead: 21.06 -> 18.08
      
      3) unixbench context1 with 96 threads running for 1min
      
      	Score: 15409.40 -> 16821.77
      
      Besides these, UnixBench shows some performance ups and downs, but
      overall its performance hasn't changed.
      Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
      Acked-by: Michael Wang <yun.wang@linux.alibaba.com>
      bb48b716
    • Y
      alinux: mm: thp: add fast_cow switch · 56a432f5
      Yang Shi committed
      task #27327988
      
      The commit ("thp: change CoW semantics for anon-THP") rewrites the THP
      CoW page fault handler to allocate a base page only, but there is a
      request to keep the old behavior just in case. So, introduce a new
      sysfs knob, fast_cow, to control the behavior; the default is the new
      behavior. Write 0 to that knob to switch to the old behavior.
      Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
      Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
      [ caspar: fix checkpatch.pl warnings ]
      Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
      56a432f5
  4. 15 Jun, 2020 2 commits
    • S
      block: Fix use-after-free issue accessing struct io_cq · fba123ba
      Sahitya Tummala committed
      task #28557799
      
      [ Upstream commit 30a2da7b7e225ef6c87a660419ea04d3cef3f6a7 ]
      
      There is a potential race between ioc_release_fn() and
      ioc_clear_queue(), as shown below, which leads to the kernel
      crash observed. It can also result in a use-after-free issue.
      
      context#1:				context#2:
      ioc_release_fn()			__ioc_clear_queue() gets the same icq
      ->spin_lock(&ioc->lock);		->spin_lock(&ioc->lock);
      ->ioc_destroy_icq(icq);
        ->list_del_init(&icq->q_node);
        ->call_rcu(&icq->__rcu_head,
        	icq_free_icq_rcu);
      ->spin_unlock(&ioc->lock);
      					->ioc_destroy_icq(icq);
      					  ->hlist_del_init(&icq->ioc_node);
      					  This results into below crash as this memory
      					  is now used by icq->__rcu_head in context#1.
      					  There is a chance that icq could be free'd
      					  as well.
      
      22150.386550:   <6> Unable to handle kernel write to read-only memory
      at virtual address ffffffaa8d31ca50
      ...
      Call trace:
      22150.607350:   <2>  ioc_destroy_icq+0x44/0x110
      22150.611202:   <2>  ioc_clear_queue+0xac/0x148
      22150.615056:   <2>  blk_cleanup_queue+0x11c/0x1a0
      22150.619174:   <2>  __scsi_remove_device+0xdc/0x128
      22150.623465:   <2>  scsi_forget_host+0x2c/0x78
      22150.627315:   <2>  scsi_remove_host+0x7c/0x2a0
      22150.631257:   <2>  usb_stor_disconnect+0x74/0xc8
      22150.635371:   <2>  usb_unbind_interface+0xc8/0x278
      22150.639665:   <2>  device_release_driver_internal+0x198/0x250
      22150.644897:   <2>  device_release_driver+0x24/0x30
      22150.649176:   <2>  bus_remove_device+0xec/0x140
      22150.653204:   <2>  device_del+0x270/0x460
      22150.656712:   <2>  usb_disable_device+0x120/0x390
      22150.660918:   <2>  usb_disconnect+0xf4/0x2e0
      22150.664684:   <2>  hub_event+0xd70/0x17e8
      22150.668197:   <2>  process_one_work+0x210/0x480
      22150.672222:   <2>  worker_thread+0x32c/0x4c8
      
      Fix this by adding a new ICQ_DESTROYED flag, set in ioc_destroy_icq(),
      to indicate that this icq has already been destroyed. Also, ensure
      __ioc_clear_queue() accesses the icq within rcu_read_lock/unlock so
      that the icq doesn't get freed while it is still in use.
      Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
      Co-developed-by: Pradeep P V K <ppvk@codeaurora.org>
      Signed-off-by: Pradeep P V K <ppvk@codeaurora.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      fba123ba
    • M
      block: fix an integer overflow in logical block size · 8b05616d
      Mikulas Patocka committed
      task #28557799
      
      commit ad6bf88a6c19a39fb3b0045d78ea880325dfcf15 upstream.
      
      Logical block size has type unsigned short. That means that it can be at
      most 32768. However, there are architectures that can run with 64k pages
      (for example arm64) and on these architectures, it may be possible to
      create block devices with 64k block size.
      
      For example, on an architecture with 64k pages it is possible to
      create a device-mapper device with a 64k block size; mount will then
      fail with the error below, because it tries to read the superblock
      using a 2-sector access:
        device-mapper: writecache: I/O is not aligned, sector 2, size 1024, block size 65536
        EXT4-fs (dm-0): unable to read superblock
      
      This patch changes the logical block size from unsigned short to unsigned
      int to avoid the overflow.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      8b05616d
  5. 11 Jun, 2020 1 commit
    • J
      alinux: blk-mq: remove QUEUE_FLAG_POLL from default MQ flags · 294d5fb2
      Joseph Qi committed
      fix #28528017
      
      In the case of a virtio-blk device, checking
      /sys/block/<device>/queue/io_poll shows 1 and the user can't disable
      it. virtio-blk doesn't actually support poll yet, so this confuses end
      users. The root cause is that MQ initialization sets the
      QUEUE_FLAG_POLL bit by default.
      
      This fix takes ideas from the following upstream commits:
      6544d229bf43 ("block: enable polling by default if a poll map is initalized")
      6e0de61107f0 ("blk-mq: remove QUEUE_FLAG_POLL from default MQ flags")
      Since we don't want to get HCTX_TYPE_POLL-related logic involved, just
      check mq_ops->poll and then set QUEUE_FLAG_POLL.
      Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      294d5fb2
  6. 09 Jun, 2020 5 commits
  7. 05 Jun, 2020 2 commits
  8. 04 Jun, 2020 2 commits
  9. 28 May, 2020 20 commits