1. 15 Nov, 2022 4 commits
    • X
      mm/sharepool: Fix add group failed with errno 28 · 7086bdba
      Committed by Xu Qiang
      ascend inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I612UG
      CVE: NA
      
      --------------------------------
      
      We increase task->mm->mm_users by one when we add the task to a
      sharepool group. Correspondingly, we should drop the mm_users count when
      the task exits. Currently we hijack the mmput function: if it is not
      called from a task's exit path, it decreases mm->mm_users by one (just
      as mmput would do) and returns early; otherwise it decreases
      mm->mm_users by the number of groups the task was added to. This has two
      problems:
      1. It makes mmput and sp_group_exit hard to understand.
      2. Judging whether the task (and its mm) is exiting and then decreasing
         its mm_users count is not atomic. We use this condition:
           mm->mm_users == master->count + MM_WOULD_FREE(1)
         If someone else changes mm->mm_users between those two steps,
         mm->mm_users ends up wrong and the mm_struct can never be released.
      
      Consider the following interleaving:
      
              proc1                                        proc2
      
      1)      mmput
                |
                V
      2)  enter sp_group_exit and
          'mm->mm_users == master->count + 1' is true
      3)        |                                         mmget
                V
      4)  decrease mm->mm_users by master->count
                |
                V
      5)  enter __mmput and release mm_struct
          if mm->mm_users == 1
      6)                                                  mmput
      
      The statistics structure that has the same id as the task is leaked
      together with the mm_struct, so the next attempt to create a statistics
      structure with the same id fails.
      
      We fix this by moving sp_group_exit into do_exit(), where the task is
      actually exiting. We no longer need to judge whether the task is exiting
      when someone calls mmput, so there is no chance of changing mm_users
      wrongly.
      Signed-off-by: Xu Qiang <xuqiang36@huawei.com>
      Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
      7086bdba
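The interleaving above can be replayed deterministically in a small user-space model. This is only a sketch under stated assumptions: `struct mm_model`, `old_check_exiting`, `old_put_finish` and the other names are hypothetical stand-ins for `mm->mm_users`, `master->count` and the old mmput path as the commit message describes them, not the kernel code itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the counters named in the commit message. */
struct mm_model {
    int mm_users;     /* stands in for mm->mm_users */
    int group_count;  /* stands in for master->count */
    bool released;    /* set when the model would call __mmput() */
};

/* Step 2: the old "is this task exiting?" test. */
static bool old_check_exiting(const struct mm_model *mm)
{
    return mm->mm_users == mm->group_count + 1;
}

/* Steps 4-5: act on the (possibly stale) result of the check. */
static void old_put_finish(struct mm_model *mm, bool exiting)
{
    if (!exiting) {
        mm->mm_users--;              /* early-return path: never releases */
        return;
    }
    mm->mm_users -= mm->group_count; /* drop the group references */
    if (--mm->mm_users == 0)
        mm->released = true;
}

static void model_mmget(struct mm_model *mm)
{
    mm->mm_users++;
}

/* Check and update back to back, as when nobody interferes. */
static void old_mmput(struct mm_model *mm)
{
    old_put_finish(mm, old_check_exiting(mm));
}

/* Replays steps 1-6 of the diagram; returns whether the mm was released. */
static bool run_interleaved(void)
{
    struct mm_model mm = { 3, 2, false };   /* 1 own reference + 2 groups */
    bool exiting = old_check_exiting(&mm);  /* proc1, step 2: 3 == 2 + 1 */
    assert(exiting);
    model_mmget(&mm);                       /* proc2, step 3: mm_users = 4 */
    old_put_finish(&mm, exiting);           /* proc1, steps 4-5: ends at 1 */
    old_mmput(&mm);                         /* proc2, step 6: early return */
    return mm.released;
}

/* Same operations without the interleaving. */
static bool run_sequential(void)
{
    struct mm_model mm = { 3, 2, false };
    model_mmget(&mm);   /* proc2 takes a reference: 4 */
    old_mmput(&mm);     /* proc2 drops it: 3 */
    old_mmput(&mm);     /* proc1 exits: 3 - 2 - 1 = 0, released */
    return mm.released;
}
```

In this model `run_interleaved()` never sets `released`, which matches the leak described above, while `run_sequential()` does. Doing the cleanup from the exit path removes the need for the check altogether.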
    • Z
      mm: sharepool: Fix static check warning · fab907d0
      Committed by Zhang Zekun
      ascend inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I612UG
      CVE: NA
      
      --------------------------------
      
      Fix the following static check warning:
      Use parentheses to specify the order of evaluation instead of relying
      on default operator precedence. Parentheses should be used with bitwise
      operators.
      
      Fix this by adding parentheses to the expression.
      Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
      fab907d0
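The commit message does not show the expression that was changed, so the following is only an illustrative sketch of the class of problem the warning targets. `FLAG_A`, `FLAG_B` and both function names are hypothetical. In C, `==` binds tighter than `&`, so `flags & FLAG_B == 0` is read as `flags & (FLAG_B == 0)`.

```c
#include <assert.h>
#include <stdbool.h>

#define FLAG_A 0x1u  /* hypothetical flag bits, for illustration only */
#define FLAG_B 0x2u

/*
 * What the compiler makes of `flags & FLAG_B == 0`: the comparison is
 * evaluated first. It is written out explicitly here to show the parse.
 */
static bool flag_b_clear_default_priority(unsigned int flags)
{
    return flags & (FLAG_B == 0);
}

/* What was most likely intended: mask first, then compare. */
static bool flag_b_clear_parenthesized(unsigned int flags)
{
    return (flags & FLAG_B) == 0;
}
```

With `flags == FLAG_A`, the first form yields false and the second yields true, which is why explicit parentheses around bitwise operations are worth the extra characters.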
    • Z
      mm/sharepool: Use "tgid" instead of "pid" to find a task · 32c81f1b
      Committed by Zhang Zekun
      ascend inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I612UG
      CVE: NA
      
      --------------------------------
      
      To support the container scenario, use tgid instead of pid to find a
      specific task. In normal cases, "tgid" represents a process in
      init_pid_ns, so this patch should not introduce problems to existing code.
      
      Rename the input parameter "int pid" to "int tgid" in the following
      exported interfaces:
      1. mg_sp_group_id_by_pid()
      2. mg_sp_group_add_task()
      3. mg_sp_group_del_task()
      4. mg_sp_make_share_k2u()
      5. mg_sp_make_share_u2k()
      6. mg_sp_config_dvpp_range()
      
      Also rename the parameter in these static functions:
      1. __sp_find_spg_locked()
      2. __sp_find_spg()
      
      The following functions use "current->pid" to find the spg; change
      "current->pid" to "current->tgid":
      1. find_or_alloc_sp_group()
      2. sp_alloc_prepare()
      3. mg_sp_make_share_k2u()
      Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
      32c81f1b
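As a rough illustration of why the lookup key matters, the sketch below models tasks as (pid, tgid) pairs, where every thread of a process shares the tgid of its group leader. The table, `struct task_model` and the function names are hypothetical, and the model covers only the thread aspect of `current->pid` versus `current->tgid`, not pid namespaces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two ids carried by a task. */
struct task_model {
    int pid;   /* per-thread id, like current->pid */
    int tgid;  /* id of the thread group leader, like current->tgid */
};

static const struct task_model tasks[] = {
    { 100, 100 },  /* process A, main thread (group leader) */
    { 101, 100 },  /* process A, second thread */
    { 200, 200 },  /* process B, single-threaded */
};

/* Finds the group leader registered under the given process id. */
static const struct task_model *find_process(int id)
{
    for (size_t i = 0; i < sizeof(tasks) / sizeof(tasks[0]); i++) {
        if (tasks[i].tgid == id && tasks[i].pid == id)
            return &tasks[i];
    }
    return NULL;
}

/* Key a thread would use for the lookup before and after the change. */
static int lookup_key_pid(const struct task_model *t)  { return t->pid; }
static int lookup_key_tgid(const struct task_model *t) { return t->tgid; }
```

In this model the second thread of process A resolves to its process when it uses the tgid key, and to nothing when it uses its own pid.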
    • W
      ascend/arm64: Add ascend_enable_all kernel parameter · 66ae8ddd
      Committed by Wang Wensheng
      ascend inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I612UG
      CVE: NA
      
      --------------------------------
      
      This kernel parameter is used in the Ascend scenario and enables all the
      required options at once.
      Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
      66ae8ddd
  2. 11 Nov, 2022 25 commits
  3. 10 Nov, 2022 2 commits
    • J
      page_alloc: fix invalid watermark check on a negative value · 48bc7241
      Committed by Jaewon Kim
      stable inclusion
      from stable-v5.10.135
      commit 2670f76a563124478d0d14e603b38b73b99c389c
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZWFM
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2670f76a563124478d0d14e603b38b73b99c389c
      
      --------------------------------
      
      commit 9282012f upstream.
      
      There was a report that a task was waiting in
      throttle_direct_reclaim. The pgscan_direct_throttle counter in vmstat was
      increasing.
      
      This is a bug where zone_watermark_fast returns true even when free
      memory is very low. Commit f27ce0e1 ("page_alloc: consider highatomic
      reserve in watermark fast") changed the watermark fast path to consider
      the highatomic reserve, but it did not handle the negative value that
      can occur when the reserved_highatomic pageblock is bigger than the
      actual free memory.
      
      If the watermark is considered ok for the negative value, order-0
      allocating contexts will consume all free pages without direct reclaim,
      and eventually free pages may become depleted except for highatomic free
      pages.
      
      Then allocating contexts may fall into throttle_direct_reclaim. This
      symptom can easily happen on a system where wmark min is low and other
      reclaimers like kswapd do not free pages quickly.
      
      Handle the negative case by using MIN.
      
      Link: https://lkml.kernel.org/r/20220725095212.25388-1-jaewon31.kim@samsung.com
      Fixes: f27ce0e1 ("page_alloc: consider highatomic reserve in watermark fast")
      Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
      Reported-by: GyeongHwan Hong <gh21.hong@samsung.com>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Yong-Taek Lee <ytk.lee@samsung.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      Reviewed-by: Wei Li <liwei391@huawei.com>
      48bc7241
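The negative-value case can be sketched in user space as follows. This is a simplified model, not the kernel function: the two function names and the flat parameter list are hypothetical, and the sketch assumes the usable free count is a signed long that ends up compared against an unsigned mark, so a negative value turns into a very large one.

```c
#include <assert.h>
#include <stdbool.h>

static long min_long(long a, long b)
{
    return a < b ? a : b;
}

/* Model of the check without clamping: the subtraction can go negative. */
static bool watermark_fast_unclamped(long free_pages, long reserved,
                                     unsigned long mark)
{
    long usable_free = free_pages - reserved;

    /*
     * Comparing a long with an unsigned long converts the long to
     * unsigned; the cast only makes that conversion visible.
     */
    return (unsigned long)usable_free > mark;
}

/* Model of the fix: never subtract more than what is actually free. */
static bool watermark_fast_clamped(long free_pages, long reserved,
                                   unsigned long mark)
{
    long usable_free = free_pages - min_long(free_pages, reserved);

    return (unsigned long)usable_free > mark;
}
```

With 10 free pages, a reserve of 100 and a mark of 50, the unclamped model reports the watermark as ok while the clamped one does not. Both agree when free memory comfortably exceeds the reserve.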
    • W
      mm/mempolicy: fix uninit-value in mpol_rebind_policy() · 7057a3c7
      Committed by Wang Cheng
      stable inclusion
      from stable-v5.10.134
      commit ddb3f0b68863bd1c5f43177eea476bce316d4993
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZVR7
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ddb3f0b68863bd1c5f43177eea476bce316d4993
      
      --------------------------------
      
      commit 018160ad upstream.
      
      mpol_set_nodemask() (mm/mempolicy.c) does not set up the nodemask when
      pol->mode is MPOL_LOCAL.  Check pol->mode before accessing
      pol->w.cpuset_mems_allowed in mpol_rebind_policy() (mm/mempolicy.c).
      
      BUG: KMSAN: uninit-value in mpol_rebind_policy mm/mempolicy.c:352 [inline]
      BUG: KMSAN: uninit-value in mpol_rebind_task+0x2ac/0x2c0 mm/mempolicy.c:368
       mpol_rebind_policy mm/mempolicy.c:352 [inline]
       mpol_rebind_task+0x2ac/0x2c0 mm/mempolicy.c:368
       cpuset_change_task_nodemask kernel/cgroup/cpuset.c:1711 [inline]
       cpuset_attach+0x787/0x15e0 kernel/cgroup/cpuset.c:2278
       cgroup_migrate_execute+0x1023/0x1d20 kernel/cgroup/cgroup.c:2515
       cgroup_migrate kernel/cgroup/cgroup.c:2771 [inline]
       cgroup_attach_task+0x540/0x8b0 kernel/cgroup/cgroup.c:2804
       __cgroup1_procs_write+0x5cc/0x7a0 kernel/cgroup/cgroup-v1.c:520
       cgroup1_tasks_write+0x94/0xb0 kernel/cgroup/cgroup-v1.c:539
       cgroup_file_write+0x4c2/0x9e0 kernel/cgroup/cgroup.c:3852
       kernfs_fop_write_iter+0x66a/0x9f0 fs/kernfs/file.c:296
       call_write_iter include/linux/fs.h:2162 [inline]
       new_sync_write fs/read_write.c:503 [inline]
       vfs_write+0x1318/0x2030 fs/read_write.c:590
       ksys_write+0x28b/0x510 fs/read_write.c:643
       __do_sys_write fs/read_write.c:655 [inline]
       __se_sys_write fs/read_write.c:652 [inline]
       __x64_sys_write+0xdb/0x120 fs/read_write.c:652
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Uninit was created at:
       slab_post_alloc_hook mm/slab.h:524 [inline]
       slab_alloc_node mm/slub.c:3251 [inline]
       slab_alloc mm/slub.c:3259 [inline]
       kmem_cache_alloc+0x902/0x11c0 mm/slub.c:3264
       mpol_new mm/mempolicy.c:293 [inline]
       do_set_mempolicy+0x421/0xb70 mm/mempolicy.c:853
       kernel_set_mempolicy mm/mempolicy.c:1504 [inline]
       __do_sys_set_mempolicy mm/mempolicy.c:1510 [inline]
       __se_sys_set_mempolicy+0x44c/0xb60 mm/mempolicy.c:1507
       __x64_sys_set_mempolicy+0xd8/0x110 mm/mempolicy.c:1507
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      KMSAN: uninit-value in mpol_rebind_task (2)
      https://syzkaller.appspot.com/bug?id=d6eb90f952c2a5de9ea718a1b873c55cb13b59dc
      
      This patch seems to fix the bug below too.
      KMSAN: uninit-value in mpol_rebind_mm (2)
      https://syzkaller.appspot.com/bug?id=f2fecd0d7013f54ec4162f60743a2b28df40926b
      
      The uninit-value is pol->w.cpuset_mems_allowed in mpol_rebind_policy().
      When the syzkaller reproducer reaches the beginning of mpol_new(),
      
      	    mpol_new() mm/mempolicy.c
      	  do_mbind() mm/mempolicy.c
      	kernel_mbind() mm/mempolicy.c
      
      `mode` is 1 (MPOL_PREFERRED), nodes_empty(*nodes) is `true` and `flags`
      is 0. Then
      
      	mode = MPOL_LOCAL;
      	...
      	policy->mode = mode;
      	policy->flags = flags;
      
      will be executed. So in mpol_set_nodemask(),
      
      	    mpol_set_nodemask() mm/mempolicy.c
      	  do_mbind()
      	kernel_mbind()
      
      pol->mode is 4 (MPOL_LOCAL), so `nodemask` in `pol` is not initialized,
      yet it is later accessed in mpol_rebind_policy().
      
      Link: https://lkml.kernel.org/r/20220512123428.fq3wofedp6oiotd4@ppc.localdomain
      Signed-off-by: Wang Cheng <wanngchenng@gmail.com>
      Reported-by: <syzbot+217f792c92599518a2ab@syzkaller.appspotmail.com>
      Tested-by: <syzbot+217f792c92599518a2ab@syzkaller.appspotmail.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      Reviewed-by: Wei Li <liwei391@huawei.com>
      7057a3c7
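The shape of the guard described above can be sketched as a small model. `struct policy_model`, the `MODEL_` constants and `rebind_policy_model` are hypothetical names. The mode values 1 and 4 are taken from the commit message, and the mask is reduced to a plain unsigned long for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Mode values as quoted in the commit message. */
enum { MODEL_MPOL_PREFERRED = 1, MODEL_MPOL_LOCAL = 4 };

/* Hypothetical stand-in for the policy fields involved. */
struct policy_model {
    int mode;
    unsigned long cpuset_mems_allowed;  /* models pol->w.cpuset_mems_allowed */
};

/*
 * Returns true if the policy was updated. For MPOL_LOCAL the mask was
 * never set up, so the mode is checked before the mask is read.
 */
static bool rebind_policy_model(struct policy_model *pol, unsigned long newmask)
{
    if (!pol || pol->mode == MODEL_MPOL_LOCAL)
        return false;
    if (pol->cpuset_mems_allowed == newmask)
        return false;
    pol->cpuset_mems_allowed = newmask;
    return true;
}
```

In this model a local policy is left untouched whatever its mask field happens to contain, while a preferred policy with a set-up mask is rebound as before.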
  4. 09 Nov, 2022 9 commits