1. 15 Jun, 2023 (5 commits)
  2. 12 Jun, 2023 (5 commits)
  3. 09 Jun, 2023 (2 commits)
    • sched: smart grid: init sched_grid_qos structure on QOS purpose · ce35ded5
      Wang ShaoBo authored
      hulk inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7BQZ0
      CVE: NA
      
      ----------------------------------------
      
       As smart grid scheduling (SGS) may shrink resources and affect task QOS,
       we provide methods for evaluating task QOS in each divided grid. We mainly
       focus on the following two aspects:
      
          1. Evaluate whether resources (such as CPU or memory) meet our demand
          2. Ensure the least impact when working with governors (cpufreq and cpuidle)
      
       To tackle these questions, we have summarized several sampling methods
       that obtain tasks' characteristics while reducing scheduling noise as
       much as possible:
      
         1. We detect the key factors that determine how sensitive a process is
            to cpufreq or cpuidle adjustment, and use them to guide the
            cpufreq/cpuidle governor
         2. We dynamically monitor process memory bandwidth and adjust memory
            allocation to minimize remote (cross-node) memory access
         3. We provide a variety of load-tracking mechanisms to adapt to
            different types of task load change
      
           ---------------------------------     -----------------
          |            class A              |   |     class B     |
          |    --------        --------     |   |     --------    |
          |   | group0 |      | group1 |    |---|    | group2 |   |----------+
          |    --------        --------     |   |     --------    |          |
          |    CPU/memory sensitive type    |   |   balance type  |          |
           ----------------+----------------     --------+--------           |
                           v                             v                   | (target cpufreq)
           -------------------------------------------------------           | (sensitivity)
          |              Not satisfied with QOS?                  |          |
           --------------------------+----------------------------           |
                                     v                                       v
           -------------------------------------------------------     ----------------
          |              expand or shrink resource                |<--|  energy model  |
           ----------------------------+--------------------------     ----------------
                                       v                                     |
           -----------          -----------          ------------            v
          |           |        |           |        |            |     ---------------
          |   GRID0   +--------+   GRID1   +--------+   GRID2    |<-- |   governor    |
          |           |        |           |        |            |     ---------------
           -----------          -----------          ------------
                         \            |            /
                          \  -------------------  /
                            |  pages migration  |
                             -------------------
      
       We will introduce the energy model in a follow-up implementation, and
       will consider dynamic affinity adjustment between the divided grids at
       runtime.
       Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
       Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
       Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
       Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
    • sched: Introduce smart grid scheduling strategy for cfs · 713cfd26
      Hui Tang authored
      hulk inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7BQZ0
      CVE: NA
      
      ----------------------------------------
      
      We want to dynamically expand or shrink the affinity range of tasks
      based on the CPU topology level while meeting the minimum resource
      requirements of tasks.
      
       We divide several levels of affinity domains according to the sched domains:
      
      level4   * SOCKET  [                                                  ]
      level3   * DIE     [                             ]
      level2   * MC      [             ] [             ]
      level1   * SMT     [     ] [     ] [     ] [     ]
      level0   * CPU      0   1   2   3   4   5   6   7
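       As an illustrative sketch (not kernel code), the levels above can be
       modeled as aligned blocks of CPUs of growing width; the function name
       and the widths are assumptions taken from the 8-CPU diagram:
       
       ```c
       #include <stdint.h>
       
       /* Hypothetical model of the affinity levels in the diagram above:
        * at level L, CPU c's affinity set is the aligned block of width[L]
        * CPUs that contains c (widths from the 8-CPU example). */
       static uint64_t affinity_mask(int cpu, int level)
       {
           static const int width[] = { 1, 2, 4, 8, 8 }; /* CPU, SMT, MC, DIE, SOCKET */
           int w = width[level];
           int base = (cpu / w) * w;         /* start of the aligned block */
           return ((1ULL << w) - 1) << base; /* w consecutive bits set */
       }
       ```
       
       For example, at the SMT level (level1) CPU 3 shares its affinity block
       with CPU 2, while at the MC level (level2) it shares with CPUs 0-3.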
      
       Whether users tend toward power saving or performance affects the
       strategy for adjusting affinity. When the power-saving mode is selected,
       we choose a more appropriate affinity based on the energy model to
       reduce power consumption, while considering the QOS of resources such
       as CPU and memory. For instance, if the current task CPU load is less
       than required, smart grid judges, according to the energy model, whether
       to aggregate tasks into a smaller range.
      
       The main difference from EAS is that we pay more attention to the power
       consumption introduced by mechanisms such as cpuidle and DVFS, and we
       classify tasks to reduce interference and ensure resource QOS in each
       divided unit, which is more suitable for general-purpose workloads on
       non-heterogeneous CPUs.
      
               --------        --------        --------
              | group0 |      | group1 |      | group2 |
               --------        --------        --------
                  |                |              |
                  v                |              v
              ---------------------+-----     -----------------
             |                  ---v--   |   |                 |
             |       DIE0      |  MC1 |  |   |      DIE1       |
             |                  ------   |   |                 |
              ---------------------------     -----------------
      
       We regularly count the resource satisfaction of groups and adjust the
       affinity accordingly; scheduling balance and memory migration are
       considered based on memory location to better meet resource requirements.
       Signed-off-by: Hui Tang <tanghui20@huawei.com>
       Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
       Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
       Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
       Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
  4. 08 Jun, 2023 (25 commits)
  5. 07 Jun, 2023 (3 commits)
    • block: Fix the partition start may overflow in add_partition() · a325a269
      Zhong Jinghua authored
      hulk inclusion
      category: bugfix
      bugzilla: 187268, https://gitee.com/openeuler/kernel/issues/I76JDY
      CVE: NA
      
      ----------------------------------------
      
       In block_ioctl, an unsigned number such as 0x8000000000000000 can be
       passed in as an input parameter, as below:
      
      block_ioctl
        blkdev_ioctl
          blkpg_ioctl
            blkpg_do_ioctl
              copy_from_user
              bdev_add_partition
                add_partition
                  p->start_sect = start; // start = 0x8000000000000000
      
       Then a warning was triggered when submitting a bio:
      
      WARNING: CPU: 0 PID: 382 at fs/iomap/apply.c:54
      Call trace:
       iomap_apply+0x644/0x6e0
       __iomap_dio_rw+0x5cc/0xa24
       iomap_dio_rw+0x4c/0xcc
       ext4_dio_read_iter
       ext4_file_read_iter
       ext4_file_read_iter+0x318/0x39c
       call_read_iter
       lo_rw_aio.isra.0+0x748/0x75c
       do_req_filebacked+0x2d4/0x370
       loop_handle_cmd
       loop_queue_work+0x94/0x23c
       kthread_worker_fn+0x160/0x6bc
       loop_kthread_worker_fn+0x3c/0x50
       kthread+0x20c/0x25c
       ret_from_fork+0x10/0x18
      
      Stack:
      
      submit_bio_noacct
        submit_bio_checks
          blk_partition_remap
            bio->bi_iter.bi_sector += p->start_sect
            // bio->bi_iter.bi_sector = 0xffc0000000000000 + 65408
      ..
      loop_queue_work
       loop_handle_cmd
        do_req_filebacked
         pos = ((loff_t) blk_rq_pos(rq) << 9) + lo->lo_offset // pos < 0
         lo_rw_aio
           call_read_iter
            ext4_dio_read_iter
             __iomap_dio_rw
              iomap_apply
               ext4_iomap_begin
                 map.m_lblk = offset >> blkbits
                   ext4_set_iomap
                   iomap->offset = (u64) map->m_lblk << blkbits
                   // iomap->offset = 64512
               WARN_ON(iomap.offset > pos) // iomap.offset = 64512 and pos < 0
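       The negative pos in the chain above can be reproduced in a small
       userspace model; file_pos() here is an illustrative stand-in for the
       arithmetic in do_req_filebacked(), not the driver code itself:
       
       ```c
       #include <stdint.h>
       
       /* Userspace stand-in for the loop driver's offset computation:
        * pos = ((loff_t)blk_rq_pos(rq) << 9) + lo->lo_offset.
        * With the remapped sector from the report, the byte offset goes
        * negative once interpreted as a signed loff_t. */
       static int64_t file_pos(uint64_t sector, int64_t lo_offset)
       {
           return (int64_t)(sector << 9) + lo_offset;
       }
       ```
       
       Feeding in the remapped bi_sector from the trace
       (0xffc0000000000000 + 65408) yields a negative pos, which is what
       trips the WARN_ON in iomap_apply().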
      
       Such input is invalid, since start + length > disk->part0.nr_sects.
       There is already a similar check in blk_add_partition().
       Fix it by adding the same check in bdev_add_partition().
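       A userspace sketch of the kind of range validation the fix adds (the
       function name and exact form are assumptions; the real check lives in
       bdev_add_partition() against disk->part0.nr_sects):
       
       ```c
       #include <stdint.h>
       
       typedef uint64_t sector_t;
       
       /* Illustrative model of the added validation: reject a partition
        * whose start sector is past the disk, or whose start + length
        * would overflow or exceed the disk capacity. Written as
        * length > capacity - start to avoid overflowing start + length. */
       static int check_partition_range(sector_t start, sector_t length,
                                        sector_t capacity)
       {
           if (start >= capacity || length > capacity - start)
               return -1; /* -EINVAL in the kernel */
           return 0;
       }
       ```
       
       The huge start value from the report (0x8000000000000000) is rejected
       up front, so it can never reach the bi_sector remap.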
       Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
       Reviewed-by: Hou Tao <houtao1@huawei.com>
       Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
    • block: refactor blkpg_ioctl · ad0628d3
      Christoph Hellwig authored
      mainline inclusion
      from mainline-v5.8-rc1
      commit fa9156ae
      category: bugfix
      bugzilla: 187268, https://gitee.com/openeuler/kernel/issues/I76JDY
      CVE: NA
      
      ----------------------------------------
      
      Split each sub-command out into a separate helper, and move those helpers
      to block/partitions/core.c instead of having a lot of partition
      manipulation logic open coded in block/ioctl.c.
      
       Signed-off-by: Christoph Hellwig <hch@lst.de>
       Signed-off-by: Jens Axboe <axboe@kernel.dk>
      
      conflicts:
      	block/ioctl.c
      	block/partition-generic.c
      	include/linux/genhd.h
       Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
       Reviewed-by: Hou Tao <houtao1@huawei.com>
       Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
    • nbd: get config_lock before sock_shutdown · 4b53fff3
      Zhong Jinghua authored
      hulk inclusion
      category: bugfix
      bugzilla: 188799, https://gitee.com/openeuler/kernel/issues/I79QWO
      CVE: NA
      
      ----------------------------------------
      
       Accessing config->socks in sock_shutdown may trigger a use-after-free
       (UAF). The reason is that sock_shutdown does not hold the config_lock,
       so nbd_ioctl can free config->socks at the same time.
      
      T0: NBD_SET_SOCK
      T1: NBD_DO_IT
      
      T0						T1
      
      nbd_ioctl
        mutex_lock(&nbd->config_lock)
        // get lock
        __nbd_ioctl
      	nbd_start_device_ioctl
      	  nbd_start_device
      	  mutex_unlock(&nbd->config_lock)
	  // release lock
      	  wait_event_interruptible
      	  (kill, enter sock_shutdown)
      	  sock_shutdown
      					nbd_ioctl
      					  mutex_lock(&nbd->config_lock)
      					  // get lock
      					  __nbd_ioctl
      					    nbd_add_socket
      					      krealloc
						kfree(p)
					        // old config->socks freed
	    nbd_sock *nsock = config->socks // use-after-free
      
       Fix it by taking config_lock before calling sock_shutdown.
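       A userspace pthread model of the fixed locking (illustrative only; the
       struct and function names mirror the nbd code but this is not the
       driver itself):
       
       ```c
       #include <pthread.h>
       #include <stdlib.h>
       
       /* Model of the fix: both the shutdown path and the add-socket path
        * take config_lock, so the socks array can no longer be reallocated
        * out from under the shutdown walk. */
       struct config {
           pthread_mutex_t lock; /* plays the role of nbd->config_lock */
           int *socks;
           int num;
       };
       
       static void sock_shutdown(struct config *c)
       {
           pthread_mutex_lock(&c->lock); /* the fix: hold the lock while walking */
           for (int i = 0; i < c->num; i++)
               c->socks[i] = -1;         /* "shut down" each socket */
           pthread_mutex_unlock(&c->lock);
       }
       
       static void add_socket(struct config *c, int fd)
       {
           pthread_mutex_lock(&c->lock);
           /* realloc frees the old array, as krealloc does in nbd_add_socket */
           c->socks = realloc(c->socks, (c->num + 1) * sizeof(int));
           c->socks[c->num++] = fd;
           pthread_mutex_unlock(&c->lock);
       }
       ```
       
       With both paths serialized on the same mutex, the T0/T1 interleaving
       from the trace above can no longer observe a freed socks pointer.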
       Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
       Reviewed-by: Yu Kuai <yukuai3@huawei.com>
       Reviewed-by: Hou Tao <houtao1@huawei.com>
       Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>