1. 31 Jan, 2023 1 commit
  2. 23 Nov, 2022 1 commit
    • cgroup: support cgroup writeback on cgroupv1 · 644547a9
      Committed by Lu Jialin
      hulk inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZG61
      
      -------------------------------
      
      In cgroupv1, cgroup writeback is not supported because of two problems:
      1) blkcg_css and memcg_css are mounted on different cgroup trees.
         Therefore, the blkcg_css cannot be found from a given memcg_css.
      2) Buffered I/O is written back by a kthread, which runs in the root_blkcg.
         Therefore, blkcg cannot limit the wbps and wiops of buffered I/O.
      
      We address both problems to support cgroup writeback on cgroupv1:
      1) A memcg is attached to the blkcg_root css when the memcg is created.
      2) We add a member "wb_blkio_ino" to mem_cgroup_legacy_files.
         A user can attach a memcg to a certain blkcg by echoing the file
         inode number of the blkcg into the wb_blkio file of the memcg.
      3) inode_cgwb_enabled() returns true when memcg and io are both mounted
         on cgroupv2 or both on cgroupv1.
      4) Buffered I/O can find a blkcg through its memcg.
      
      Thus, a memcg can find a certain blkcg, and cgroup writeback can be
      supported on cgroupv1.
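
The association described above can be pictured with a small user-space sketch. It is only an illustration, not the kernel code: the struct names, the helper names, and the linear table lookup by inode number are all hypothetical, and the real patch works on `struct mem_cgroup` and blkcg css objects.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a blkcg and a memcg; not the kernel structs. */
struct blkcg_sketch {
    unsigned long ino;              /* inode number identifying the blkcg */
};

struct memcg_sketch {
    struct blkcg_sketch *wb_blkcg;  /* blkcg used for this memcg's writeback */
};

static struct blkcg_sketch root_blkcg_sketch = { .ino = 1 };

/* Step 1: a newly created memcg starts out attached to the root blkcg. */
static void memcg_sketch_init(struct memcg_sketch *memcg)
{
    memcg->wb_blkcg = &root_blkcg_sketch;
}

/*
 * Step 2: writing an inode number re-attaches the memcg to the blkcg with
 * that inode number. Returns 0 on success, -1 if no blkcg matches.
 */
static int memcg_sketch_attach(struct memcg_sketch *memcg,
                               struct blkcg_sketch *table, size_t n,
                               unsigned long ino)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].ino == ino) {
            memcg->wb_blkcg = &table[i];
            return 0;
        }
    }
    return -1;
}

/* Step 3: writeback is enabled only when both controllers share a hierarchy type. */
static bool cgwb_sketch_enabled(bool memcg_on_v2, bool io_on_v2)
{
    return memcg_on_v2 == io_on_v2;
}

/* Step 4: buffered writeback finds its blkcg through the memcg. */
static struct blkcg_sketch *memcg_sketch_wb_blkcg(const struct memcg_sketch *memcg)
{
    return memcg->wb_blkcg;
}
```

In this sketch a failed attach leaves the previous association untouched, which is one plausible choice; the commit message does not say how the kernel handles an unknown inode number.
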
      Signed-off-by: Lu Jialin <lujialin4@huawei.com>
  3. 09 Aug, 2022 2 commits
  4. 23 May, 2022 1 commit
    • mm: memcg: synchronize objcg lists with a dedicated spinlock · 9ba3dc33
      Committed by Roman Gushchin
      stable inclusion
      from stable-v5.10.102
      commit 8c8385972ea96adeb9b678c9390beaa4d94c4aae
      bugzilla: https://gitee.com/openeuler/kernel/issues/I567K6
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=8c8385972ea96adeb9b678c9390beaa4d94c4aae
      
      --------------------------------
      
      commit 0764db9b upstream.
      
      Alexander reported a circular lock dependency revealed by the mmap1 ltp
      test:
      
        LOCKDEP_CIRCULAR (suite: ltp, case: mtest06 (mmap1))
                WARNING: possible circular locking dependency detected
                5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1 Not tainted
                ------------------------------------------------------
                mmap1/202299 is trying to acquire lock:
                00000001892c0188 (css_set_lock){..-.}-{2:2}, at: obj_cgroup_release+0x4a/0xe0
                but task is already holding lock:
                00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
                which lock already depends on the new lock.
                the existing dependency chain (in reverse order) is:
                -> #1 (&sighand->siglock){-.-.}-{2:2}:
                       __lock_acquire+0x604/0xbd8
                       lock_acquire.part.0+0xe2/0x238
                       lock_acquire+0xb0/0x200
                       _raw_spin_lock_irqsave+0x6a/0xd8
                       __lock_task_sighand+0x90/0x190
                       cgroup_freeze_task+0x2e/0x90
                       cgroup_migrate_execute+0x11c/0x608
                       cgroup_update_dfl_csses+0x246/0x270
                       cgroup_subtree_control_write+0x238/0x518
                       kernfs_fop_write_iter+0x13e/0x1e0
                       new_sync_write+0x100/0x190
                       vfs_write+0x22c/0x2d8
                       ksys_write+0x6c/0xf8
                       __do_syscall+0x1da/0x208
                       system_call+0x82/0xb0
                -> #0 (css_set_lock){..-.}-{2:2}:
                       check_prev_add+0xe0/0xed8
                       validate_chain+0x736/0xb20
                       __lock_acquire+0x604/0xbd8
                       lock_acquire.part.0+0xe2/0x238
                       lock_acquire+0xb0/0x200
                       _raw_spin_lock_irqsave+0x6a/0xd8
                       obj_cgroup_release+0x4a/0xe0
                       percpu_ref_put_many.constprop.0+0x150/0x168
                       drain_obj_stock+0x94/0xe8
                       refill_obj_stock+0x94/0x278
                       obj_cgroup_charge+0x164/0x1d8
                       kmem_cache_alloc+0xac/0x528
                       __sigqueue_alloc+0x150/0x308
                       __send_signal+0x260/0x550
                       send_signal+0x7e/0x348
                       force_sig_info_to_task+0x104/0x180
                       force_sig_fault+0x48/0x58
                       __do_pgm_check+0x120/0x1f0
                       pgm_check_handler+0x11e/0x180
                other info that might help us debug this:
                 Possible unsafe locking scenario:
                       CPU0                    CPU1
                       ----                    ----
                  lock(&sighand->siglock);
                                               lock(css_set_lock);
                                               lock(&sighand->siglock);
                  lock(css_set_lock);
                 *** DEADLOCK ***
                2 locks held by mmap1/202299:
                 #0: 00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
                 #1: 00000001892ad560 (rcu_read_lock){....}-{1:2}, at: percpu_ref_put_many.constprop.0+0x0/0x168
                stack backtrace:
                CPU: 15 PID: 202299 Comm: mmap1 Not tainted 5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1
                Hardware name: IBM 3906 M04 704 (LPAR)
                Call Trace:
                  dump_stack_lvl+0x76/0x98
                  check_noncircular+0x136/0x158
                  check_prev_add+0xe0/0xed8
                  validate_chain+0x736/0xb20
                  __lock_acquire+0x604/0xbd8
                  lock_acquire.part.0+0xe2/0x238
                  lock_acquire+0xb0/0x200
                  _raw_spin_lock_irqsave+0x6a/0xd8
                  obj_cgroup_release+0x4a/0xe0
                  percpu_ref_put_many.constprop.0+0x150/0x168
                  drain_obj_stock+0x94/0xe8
                  refill_obj_stock+0x94/0x278
                  obj_cgroup_charge+0x164/0x1d8
                  kmem_cache_alloc+0xac/0x528
                  __sigqueue_alloc+0x150/0x308
                  __send_signal+0x260/0x550
                  send_signal+0x7e/0x348
                  force_sig_info_to_task+0x104/0x180
                  force_sig_fault+0x48/0x58
                  __do_pgm_check+0x120/0x1f0
                  pgm_check_handler+0x11e/0x180
                INFO: lockdep is turned off.
      
      In this example a slab allocation from __send_signal() caused a refill
      and drain of a percpu objcg stock, which resulted in the release of
      another, unrelated objcg.  The objcg release path requires taking the
      css_set_lock, which is used to synchronize objcg lists.
      
      This can create a circular dependency with the sighand lock
      (&sighand->siglock), which the freezer code takes while holding
      css_set_lock (to freeze a task).
      
      In general, using css_set_lock to synchronize objcg lists makes any
      slab allocation or deallocation performed while holding css_set_lock,
      or any intervening lock, risky.
      
      To fix the problem and make the code more robust let's stop using
      css_set_lock to synchronize objcg lists and use a new dedicated spinlock
      instead.
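
A minimal user-space sketch of the idea follows, assuming a C11 `atomic_flag` spinlock in place of the kernel's spinlock API. All names (`objcg_sketch`, `objcg_lock_sketch`, the list helpers) are hypothetical, and interrupt handling is not modeled. The point it illustrates is that the release path takes only a lock private to the objcg lists, so it adds no ordering against css_set_lock or the sighand lock.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for an objcg linked into a circular list. */
struct objcg_sketch {
    struct objcg_sketch *prev, *next;
};

/* Dedicated lock that guards objcg lists and nothing else (illustrative name). */
static atomic_flag objcg_lock_sketch = ATOMIC_FLAG_INIT;

static void objcg_sketch_lock(void)
{
    while (atomic_flag_test_and_set_explicit(&objcg_lock_sketch,
                                             memory_order_acquire))
        ; /* spin until the flag was previously clear */
}

static void objcg_sketch_unlock(void)
{
    atomic_flag_clear_explicit(&objcg_lock_sketch, memory_order_release);
}

static void objcg_sketch_list_init(struct objcg_sketch *head)
{
    head->prev = head->next = head;
}

/* Link objcg right after head, under the dedicated lock. */
static void objcg_sketch_add(struct objcg_sketch *head, struct objcg_sketch *objcg)
{
    objcg_sketch_lock();
    objcg->next = head->next;
    objcg->prev = head;
    head->next->prev = objcg;
    head->next = objcg;
    objcg_sketch_unlock();
}

/* Release path: unlink under the dedicated lock only. */
static void objcg_sketch_release(struct objcg_sketch *objcg)
{
    objcg_sketch_lock();
    objcg->prev->next = objcg->next;
    objcg->next->prev = objcg->prev;
    objcg->prev = objcg->next = objcg;
    objcg_sketch_unlock();
}

/* Count list entries, excluding the head. */
static size_t objcg_sketch_len(struct objcg_sketch *head)
{
    size_t n = 0;

    objcg_sketch_lock();
    for (struct objcg_sketch *p = head->next; p != head; p = p->next)
        n++;
    objcg_sketch_unlock();
    return n;
}
```

Because the lock is never held while calling into other subsystems, it sits at the leaf of the lock hierarchy in this sketch, which is what removes the circular dependency reported above.
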
      
      Link: https://lkml.kernel.org/r/Yfm1IHmoGdyUR81T@carbon.dhcp.thefacebook.com
      Fixes: bf4f0599 ("mm: memcg/slab: obj_cgroup API")
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
      Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
      Reviewed-by: Waiman Long <longman@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Reviewed-by: Jeremy Linton <jeremy.linton@arm.com>
      Tested-by: Jeremy Linton <jeremy.linton@arm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yu Liao <liaoyu15@huawei.com>
      Reviewed-by: Wei Li <liwei391@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
  5. 07 Apr, 2022 2 commits
  6. 20 Mar, 2022 1 commit
  7. 19 Jan, 2022 2 commits
    • mm/dynamic_hugetlb: establish the dynamic hugetlb feature framework · a8a836a3
      Committed by Liu Shixin
      hulk inclusion
      category: feature
      bugzilla: 46904, https://gitee.com/openeuler/kernel/issues/I4QSHG
      CVE: NA
      
      --------------------------------
      
      Dynamic hugetlb is a self-developed feature based on hugetlb and memcontrol.
      It supports splitting huge pages dynamically within a memory cgroup. Every mem_cgroup
      gets a new structure, dhugetlb_pool, which manages the pages configured for that mem_cgroup.
      Processes in a mem_cgroup configured with a dhugetlb_pool preferentially use the
      pages in that dhugetlb_pool.
      
      Dynamic hugetlb supports three page types: 1G and 2M huge pages, and 4K pages.
      In a mem_cgroup configured with a dhugetlb_pool, processes can allocate 1G/2M huge
      pages only from the dhugetlb_pool. There is no such constraint for 4K pages: if the
      dhugetlb_pool runs short of 4K pages, they can also be allocated from the buddy
      system. Users must therefore know how many huge pages they need before using
      dynamic hugetlb.
      
      Usage:
      1. Add 'dynamic_hugetlb=on' to the cmdline to enable the dynamic hugetlb feature.
      2. Preallocate some 1G hugepages through hugetlb.
      3. Create a mem_cgroup and configure a dhugetlb_pool for it.
      4. Configure the count of 1G/2M hugepages; the remaining pages in the dhugetlb_pool will
         be used as basic pages.
      5. Bind a process to the mem_cgroup. Its memory will then be allocated from the dhugetlb_pool.
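
Step 4 is easier to plan with the page arithmetic written out. The sketch below is only bookkeeping under stated assumptions: a 4K base page, so that one 2M page holds 512 base pages and one 1G page holds 512 2M pages, and small counts that do not overflow 64 bits. The function name and its return convention are hypothetical, and the kernel may track unsplit pages differently.

```c
#include <stdint.h>

/* Assumed page-size ratios for a 4K base page. */
#define SKETCH_4K_PER_2M 512ULL
#define SKETCH_2M_PER_1G 512ULL
#define SKETCH_4K_PER_1G (SKETCH_4K_PER_2M * SKETCH_2M_PER_1G)

/*
 * Given a pool built from pool_1g preallocated 1G pages, of which want_1g
 * stay as 1G pages and want_2m are handed out as 2M pages, return how many
 * 4K base pages remain. Returns -1 if the request exceeds the pool.
 */
static int64_t dhugetlb_sketch_remaining_4k(uint64_t pool_1g,
                                            uint64_t want_1g,
                                            uint64_t want_2m)
{
    uint64_t total_4k = pool_1g * SKETCH_4K_PER_1G;
    uint64_t reserved_4k = want_1g * SKETCH_4K_PER_1G +
                           want_2m * SKETCH_4K_PER_2M;

    if (reserved_4k > total_4k)
        return -1;
    return (int64_t)(total_4k - reserved_4k);
}
```

For example, a pool of four 1G pages that keeps one 1G page and 512 2M pages leaves 524288 base pages, i.e. 2 GiB, for 4K allocations.
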
      
      This patch adds the dhugetlb_pool structure for the dynamic hugetlb feature, the
      'dhugetlb.nr_pages' interface in mem_cgroup to configure the dhugetlb_pool, and the
      'dynamic_hugetlb=on' cmdline option to enable the feature.
      Signed-off-by: Liu Shixin <liushixin2@huawei.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
    • mm: declare several functions · 98ecb3cd
      Committed by Liu Shixin
      hulk inclusion
      category: feature
      bugzilla: 46904, https://gitee.com/openeuler/kernel/issues/I4QSHG
      CVE: NA
      
      --------------------------------
      
      Several functions will be used by the following patches for the
      dynamic hugetlb feature. Declare them.
      
      No functional changes.
      Signed-off-by: Liu Shixin <liushixin2@huawei.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
  8. 07 Jan, 2022 1 commit
  9. 31 Dec, 2021 1 commit
  10. 30 Nov, 2021 6 commits
  11. 30 Oct, 2021 6 commits
  12. 19 Oct, 2021 1 commit
  13. 26 Sep, 2021 4 commits
  14. 14 Jul, 2021 3 commits
  15. 08 Jul, 2021 2 commits
  16. 19 Apr, 2021 1 commit
  17. 27 Nov, 2020 1 commit
  18. 15 Nov, 2020 1 commit
  19. 19 Oct, 2020 1 commit
  20. 14 Oct, 2020 1 commit
  21. 15 Aug, 2020 1 commit
    • mm/memcontrol: fix a data race in scan count · e0e3f42f
      Committed by Qian Cai
      struct mem_cgroup_per_node mz.lru_zone_size[zone_idx][lru] could be
      accessed concurrently as noticed by KCSAN,
      
       BUG: KCSAN: data-race in lruvec_lru_size / mem_cgroup_update_lru_size
      
       write to 0xffff9c804ca285f8 of 8 bytes by task 50951 on cpu 12:
        mem_cgroup_update_lru_size+0x11c/0x1d0
        mem_cgroup_update_lru_size at mm/memcontrol.c:1266
        isolate_lru_pages+0x6a9/0xf30
        shrink_active_list+0x123/0xcc0
        shrink_lruvec+0x8fd/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
        __alloc_pages_nodemask+0x3bb/0x450
        alloc_pages_vma+0x8a/0x2c0
        do_anonymous_page+0x170/0x700
        __handle_mm_fault+0xc9f/0xd00
        handle_mm_fault+0xfc/0x2f0
        do_page_fault+0x263/0x6f9
        page_fault+0x34/0x40
      
       read to 0xffff9c804ca285f8 of 8 bytes by task 50964 on cpu 95:
        lruvec_lru_size+0xbb/0x270
        mem_cgroup_get_zone_lru_size at include/linux/memcontrol.h:536
        (inlined by) lruvec_lru_size at mm/vmscan.c:326
        shrink_lruvec+0x1d0/0x1380
        shrink_node+0x317/0xd80
        do_try_to_free_pages+0x1f7/0xa10
        try_to_free_pages+0x26c/0x5e0
        __alloc_pages_slowpath+0x458/0x1290
        __alloc_pages_nodemask+0x3bb/0x450
        alloc_pages_current+0xa6/0x120
        alloc_slab_page+0x3b1/0x540
        allocate_slab+0x70/0x660
        new_slab+0x46/0x70
        ___slab_alloc+0x4ad/0x7d0
        __slab_alloc+0x43/0x70
        kmem_cache_alloc+0x2c3/0x420
        getname_flags+0x4c/0x230
        getname+0x22/0x30
        do_sys_openat2+0x205/0x3b0
        do_sys_open+0x9a/0xf0
        __x64_sys_openat+0x62/0x80
        do_syscall_64+0x91/0xb47
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 95 PID: 50964 Comm: cc1 Tainted: G        W  O L    5.5.0-next-20200204+ #6
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
      
      The write is done under lru_lock, but the read is lockless.  The scan
      count is used to determine how aggressively the anon and file LRU lists
      should be scanned.  Load tearing could make that heuristic inefficient,
      so fix it by adding READ_ONCE() to the read.
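
The shape of the fix can be sketched in user space as follows. `READ_ONCE_SKETCH` is a simplified approximation of the kernel macro that relies on the GNU C `__typeof__` extension (GCC or Clang), and the struct, its array dimensions, and the accessor name are hypothetical. The volatile access asks the compiler to emit a single load of the full width rather than tearing, merging, or repeating it; it adds no locking or memory ordering.

```c
/*
 * Simplified stand-in for the kernel's READ_ONCE(): read x through a
 * volatile-qualified lvalue so the compiler performs exactly one access.
 */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical miniature of the per-node LRU size table. */
struct lru_sizes_sketch {
    unsigned long lru_zone_size[2][5];
};

/* Lockless reader: the writer is assumed to update the slot under a lock. */
static unsigned long lru_size_sketch_read(struct lru_sizes_sketch *mz,
                                          int zone_idx, int lru)
{
    return READ_ONCE_SKETCH(mz->lru_zone_size[zone_idx][lru]);
}
```

A reader like this may still observe a stale value, which is acceptable for a heuristic; what it avoids is a torn or re-read value.
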
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Link: http://lkml.kernel.org/r/20200206034945.2481-1-cai@lca.pw
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>