1. 19 Apr 2022, 29 commits
  2. 13 Apr 2022, 4 commits
  3. 12 Apr 2022, 7 commits
    • sched/fair: Add qos_throttle_list node in struct cfs_rq · 01c6cfa8
      Submitted by Zhang Qiao
      hulk inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I50PPU
      CVE: NA
      
      -----------------------------------------------------------------
      
      When unthrottling a cfs_rq in distribute_cfs_runtime(), another CPU
      may re-throttle that cfs_rq in qos_throttle_cfs_rq() before
      cfs_rq->throttle_list.next is accessed. Meanwhile, the QoS throttle
      path attaches the cfs_rq's throttle_list node to the per-CPU
      qos_throttled_cfs_rq list, which changes cfs_rq->throttle_list.next
      and causes a panic or hard lockup in distribute_cfs_runtime().
      
      Fix it by adding a qos_throttle_list node to struct cfs_rq so that
      the QoS throttle path no longer uses cfs_rq->throttle_list.
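      The effect of the fix can be pictured with a small userspace sketch
      (the list helpers and the demo function are simplified stand-ins,
      not the actual patch):

      ```c
      #include <stdio.h>

      /* Simplified stand-ins for the kernel's list primitives. */
      struct list_head { struct list_head *next, *prev; };

      static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }

      static void list_add(struct list_head *n, struct list_head *h)
      {
          n->next = h->next;
          n->prev = h;
          h->next->prev = n;
          h->next = n;
      }

      /* After the fix, cfs_rq carries two independent nodes: the bandwidth
       * path keeps using throttle_list while the QoS path uses the new
       * qos_throttle_list, so neither can corrupt the other's linkage. */
      struct cfs_rq {
          struct list_head throttle_list;
          struct list_head qos_throttle_list;
      };

      /* Returns 1 when queuing on the QoS list leaves the bandwidth
       * list's .next pointer intact, mimicking the race window above. */
      int qos_nodes_independent(void)
      {
          struct list_head cfs_b_throttled, qos_throttled;
          struct cfs_rq rq;

          INIT_LIST_HEAD(&cfs_b_throttled);
          INIT_LIST_HEAD(&qos_throttled);

          list_add(&rq.throttle_list, &cfs_b_throttled);   /* bandwidth throttle */
          list_add(&rq.qos_throttle_list, &qos_throttled); /* concurrent QoS throttle */

          /* distribute_cfs_runtime() can still safely follow throttle_list.next */
          return cfs_b_throttled.next == &rq.throttle_list;
      }

      int main(void)
      {
          printf("%d\n", qos_nodes_independent());
          return 0;
      }
      ```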
      Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
      Reviewed-by: zheng zucheng <zhengzucheng@huawei.com>
      Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
      Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • ARM: 9142/1: kasan: work around LPAE build warning · 5a61cbe1
      Submitted by Arnd Bergmann
      mainline inclusion
      from mainline-v5.16-rc1
      commit c2e6df3e
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I50HG3
      CVE: NA
      
      --------------------------------
      
      pgd_page_vaddr() returns an 'unsigned long' address, causing a warning
      with the memcpy() call in kasan_init():
      
      arch/arm/mm/kasan_init.c: In function 'kasan_init':
      include/asm-generic/pgtable-nop4d.h:44:50: error: passing argument 2 of '__memcpy' makes pointer from integer without a cast [-Werror=int-conversion]
         44 | #define pgd_page_vaddr(pgd)                     ((unsigned long)(p4d_pgtable((p4d_t){ pgd })))
            |                                                 ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            |                                                  |
            |                                                  long unsigned int
      arch/arm/include/asm/string.h:58:45: note: in definition of macro 'memcpy'
         58 | #define memcpy(dst, src, len) __memcpy(dst, src, len)
            |                                             ^~~
      arch/arm/mm/kasan_init.c:229:16: note: in expansion of macro 'pgd_page_vaddr'
        229 |                pgd_page_vaddr(*pgd_offset_k(KASAN_SHADOW_START)),
            |                ^~~~~~~~~~~~~~
      arch/arm/include/asm/string.h:21:47: note: expected 'const void *' but argument is of type 'long unsigned int'
         21 | extern void *__memcpy(void *dest, const void *src, __kernel_size_t n);
            |                                   ~~~~~~~~~~~~^~~
      
      Avoid this by adding an explicit typecast.
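      The shape of the fix, as an illustrative sketch rather than the
      patch itself (copy_shadow() is a hypothetical helper):

      ```c
      #include <string.h>
      #include <assert.h>

      /* pgd_page_vaddr() hands back an 'unsigned long', while memcpy()
       * expects 'const void *'. The explicit cast below is what silences
       * -Werror=int-conversion. */
      void copy_shadow(void *dst, unsigned long src_vaddr, size_t n)
      {
          memcpy(dst, (const void *)src_vaddr, n);
      }

      int main(void)
      {
          char src[4] = "abc";
          char dst[4] = {0};

          /* Feed the address through an unsigned long, mimicking the
           * return type of pgd_page_vaddr(). */
          copy_shadow(dst, (unsigned long)src, sizeof(src));
          assert(dst[0] == 'a' && dst[2] == 'c');
          return 0;
      }
      ```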
      
      Link: https://lore.kernel.org/all/CACRpkdb3DMvof3-xdtss0Pc6KM36pJA-iy=WhvtNVnsDpeJ24Q@mail.gmail.com/
      
      Fixes: 5615f69b ("ARM: 9016/2: Initialize the mapping of KASan shadow memory")
      Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • mm: kfence: fix missing objcg housekeeping for SLAB · d9b360f4
      Submitted by Muchun Song
      mainline inclusion
      from mainline-v5.18-rc1
      commit ae085d7f
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I50GZX
      CVE: NA
      
      -----------------------------------
      
      The objcg is not cleared and put for kfence object when it is freed,
      which could lead to memory leak for struct obj_cgroup and wrong
      statistics of NR_SLAB_RECLAIMABLE_B or NR_SLAB_UNRECLAIMABLE_B.
      
      Since the last freed object's objcg is not cleared,
      mem_cgroup_from_obj() could return the wrong memcg when this kfence
      object, which is not charged to any objcgs, is reallocated to other
      users.
      
      A real-world issue [1] is caused by this bug.
      
      Link: https://lore.kernel.org/all/000000000000cabcb505dae9e577@google.com/ [1]
      Reported-by: syzbot+f8c45ccc7d5d45fc5965@syzkaller.appspotmail.com
      Fixes: d3fb45f3 ("mm, kfence: insert KFENCE hooks for SLAB")
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Peng Liu <liupeng256@huawei.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • cgroup: Export cgroup.kill from cgroupv2 to cgroupv1 · c09fad1e
      Submitted by Lu Jialin
      hulk inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4X0YD?from=project-issue
      CVE: NA
      
      --------
      
      Export the cgroup.kill feature from cgroupv2 to cgroupv1, so that
      users can kill all processes in a cgroup and its sub-cgroups at once
      instead of killing them one by one.
      Signed-off-by: Lu Jialin <lujialin4@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • cgroup: introduce cgroup.kill · 55bc6cae
      Submitted by Christian Brauner
      mainline inclusion
      from mainline-v5.14-rc1
      commit 661ee628
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4X0YD?from=project-issue
      CVE: NA
      
      -----------------------------------
      
      Introduce the cgroup.kill file. It does what it says on the tin and
      allows a caller to kill a cgroup by writing "1" into cgroup.kill.
      The file is available in non-root cgroups.
      
      Killing cgroups is a process directed operation, i.e. the whole
      thread-group is affected. Consequently trying to write to cgroup.kill in
      threaded cgroups will be rejected and EOPNOTSUPP returned. This behavior
      aligns with cgroup.procs where reads in threaded-cgroups are rejected
      with EOPNOTSUPP.
      
      The cgroup.kill file is write-only since killing a cgroup is an
      event, not a state, which makes it different from e.g. freezer where
      a cgroup transitions between two states.
      
      As with all new cgroup features cgroup.kill is recursive by default.
      
      Killing a cgroup is protected against concurrent migrations through the
      cgroup mutex. To protect against forkbombs and to mitigate the effect of
      racing forks a new CGRP_KILL css set lock protected flag is introduced
      that is set prior to killing a cgroup and unset after the cgroup has
      been killed. We can then check in cgroup_post_fork() where we hold the
      css set lock already whether the cgroup is currently being killed. If so
      we send the child a SIGKILL signal immediately taking it down as soon as
      it returns to userspace. To make the killing of the child semantically
      clean it is killed after all cgroup attachment operations have been
      finalized.
      
      There are various use-cases of this interface:
      - Containers usually have a conservative layout where each container
        usually has a delegated cgroup. For such layouts there is a 1:1
        mapping between container and cgroup. If the container in addition
        uses a separate pid namespace then killing a container usually becomes
        a simple kill -9 <container-init-pid> from an ancestor pid namespace.
        However, there are quite a few scenarios where that isn't true. For
        example, there are containers that share the cgroup with other
        processes on purpose that are supposed to be bound to the lifetime of
        the container but are not in the same pidns of the container.
        Containers that are in a delegated cgroup but share the pid namespace
        with the host or other containers.
      - Service managers such as systemd use cgroups to group and organize
        processes belonging to a service. They usually rely on a recursive
        algorithm now to kill a service. With cgroup.kill this becomes a
        simple write to cgroup.kill.
      - Userspace OOM implementations can make good use of this feature to
        efficiently take down whole cgroups quickly.
      - The kill program can gain a new
        kill --cgroup /sys/fs/cgroup/delegated
        flag to take down cgroups.
      
      A few observations about the semantics:
      - If parent and child are in the same cgroup and CLONE_INTO_CGROUP is
        not specified we are not taking cgroup mutex meaning the cgroup can be
        killed while a process in that cgroup is forking.
        If the kill request happens right before cgroup_can_fork() and before
        the parent grabs its siglock the parent is guaranteed to see the
        pending SIGKILL. In addition we perform another check in
        cgroup_post_fork() whether the cgroup is being killed and if so take
        down the child (see above). This is robust enough and protects against
        forkbombs. If userspace really wants to have stricter
        protection the simple solution would be to grab the write side of the
        cgroup threadgroup rwsem which will force all ongoing forks to
        complete before killing starts. We concluded that this is not
        necessary as the semantics for concurrent forking should simply align
        with freezer where a similar check as cgroup_post_fork() is performed.
      
        For all other cases CLONE_INTO_CGROUP is required. In this case we
        will grab the cgroup mutex so the cgroup can't be killed while we
        fork. Once we're done with the fork and have dropped cgroup mutex we
        are visible and will be found by any subsequent kill request.
      - We obviously don't kill kthreads. This means a cgroup that has a
        kthread will not become empty after killing and consequently no
        unpopulated event will be generated. The assumption is that kthreads
        should be in the root cgroup only anyway so this is not an issue.
      - We skip killing tasks that already have pending fatal signals.
      - Freezer doesn't care about tasks in different pid namespaces, i.e. if
        you have two tasks in different pid namespaces the cgroup would still
        be frozen. The cgroup.kill mechanism consequently behaves the same
        way, i.e. we kill all processes and ignore in which pid namespace they
        exist.
      - If the caller is located in a cgroup that is killed the caller will
        obviously be killed as well.
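      A minimal userspace sketch of driving the interface (the helper name
      is ours, and a real target would be a path such as
      /sys/fs/cgroup/<name>/cgroup.kill; the path here is illustrative):

      ```c
      #include <stdio.h>

      /* Write "1" to the given cgroup.kill file, killing every process in
       * that cgroup and its descendants. Returns 0 on success, -1 on
       * failure; a write to a threaded cgroup's cgroup.kill would fail
       * with EOPNOTSUPP. */
      int cgroup_kill(const char *kill_file)
      {
          FILE *f = fopen(kill_file, "w");
          if (!f)
              return -1;
          if (fputs("1", f) == EOF) {
              fclose(f);
              return -1;
          }
          return fclose(f) == 0 ? 0 : -1;
      }

      int main(int argc, char **argv)
      {
          if (argc < 2) {
              fprintf(stderr, "usage: %s <path-to-cgroup.kill>\n", argv[0]);
              return 1;
          }
          return cgroup_kill(argv[1]) == 0 ? 0 : 1;
      }
      ```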
      
      Link: https://lore.kernel.org/r/20210503143922.3093755-1-brauner@kernel.org
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: cgroups@vger.kernel.org
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Reviewed-by: Serge Hallyn <serge@hallyn.com>
      Acked-by: Roman Gushchin <guro@fb.com>
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Lu Jialin <lujialin4@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • memcg: Fix inconsistent oom event behavior for OOM_MEMCG_KILL · 59119846
      Submitted by Lu Jialin
      hulk inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4X0YD?from=project-issue
      CVE: NA
      
      --------
      
      Since memory.events is now fully supported in cgroupv1, the problem
      of inconsistent oom event behavior for OOM_MEMCG_KILL occurs again.
      Fix it by adding a new condition under which the event continues to
      be counted. An event is therefore counted when either:
      1) the memcg is not the root memcg; or
      2) the memcg is the root memcg and the event is OOM_MEMCG_KILL on
      cgroupv1.
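      The two conditions can be sketched as a single predicate (the enum
      values and function name are illustrative, not the patch's):

      ```c
      #include <stdio.h>

      /* Illustrative event ids; only the memcg-kill one matters here. */
      enum memcg_event { EV_OOM, EV_OOM_MEMCG_KILL };

      /* Count an event when the memcg is not root, or when it is root but
       * the event is OOM_MEMCG_KILL on cgroupv1 (the added condition). */
      int should_count_event(int is_root, enum memcg_event ev, int on_cgroup_v1)
      {
          return !is_root || (on_cgroup_v1 && ev == EV_OOM_MEMCG_KILL);
      }

      int main(void)
      {
          /* non-root memcg: always counted */
          printf("%d\n", should_count_event(0, EV_OOM, 0));
          /* root memcg, OOM_MEMCG_KILL on v1: now counted too */
          printf("%d\n", should_count_event(1, EV_OOM_MEMCG_KILL, 1));
          /* root memcg, any other event: still skipped */
          printf("%d\n", should_count_event(1, EV_OOM, 1));
          return 0;
      }
      ```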
      Signed-off-by: Lu Jialin <lujialin4@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • memcg: Export memory.events and memory.events.local from cgroupv2 to cgroupv1 · bbe0840a
      Submitted by Lu Jialin
      hulk inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4X0YD?from=project-issue
      CVE: NA
      
      --------
      
      Export "memory.events" and "memory.events.local" from cgroupv2 to
      cgroupv1.
      
      There are some differences between v2 and v1:
      
      1) MEMCG_OOM_GROUP_KILL events are not included in cgroupv1, because
      there is no memory.oom.group member.
      
      2) MEMCG_MAX events are reported against "limit_in_bytes" in
      cgroupv1 instead of memory.max.
      
      3) The oom_kill event is included in memory.oom_control. oom_kill
      counts its descendants' events as well, while the new oom_kill_local
      counts only the cgroup's own oom_kill events.
      Signed-off-by: Lu Jialin <lujialin4@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>