提交 · 5.10.0-23.0.0 · openeuler / Kernel

30 11月, 2021 40 次提交

memcg: unify memcg stat flushing · 863511a4

由 Shakeel Butt 提交于 11月 30, 2021

mainline inclusion
from mainline-v5.16-rc1
commit fd25a9e0
category: feature
bugzilla: 185803 https://gitee.com/openeuler/kernel/issues/I4JOG9?from=project-issue
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fd25a9e0e23b

---------------------------------------------------------------------

The memcg stats can be flushed in multiple context and potentially in
parallel too.  For example multiple parallel user space readers for
memcg stats will contend on the rstat locks with each other.  There is
no need for that.  We just need one flusher and everyone else can
benefit.

In addition after aa48e47e ("memcg: infrastructure to flush memcg
stats") the kernel periodically flush the memcg stats from the root, so,
the other flushers will potentially have much less work to do.

Link: https://lkml.kernel.org/r/20211001190040.48086-2-shakeelb@google.comSigned-off-by: NShakeel Butt <shakeelb@google.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Michal Koutný" <mkoutny@suse.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NLu Jialin <lujialin4@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

863511a4

memcg: flush stats only if updated · 490bc222

由 Shakeel Butt 提交于 11月 30, 2021

mainline inclusion
from mainline-v5.16-rc1
commit 11192d9c
category: feature
bugzilla: 185803 https://gitee.com/openeuler/kernel/issues/I4JOG9?from=project-issue
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=11192d9c124d

----------------------------------------------------------------------

At the moment, the kernel flushes the memcg stats on every refault and
also on every reclaim iteration.  Although rstat maintains per-cpu
update tree but on the flush the kernel still has to go through all the
cpu rstat update tree to check if there is anything to flush.  This
patch adds the tracking on the stats update side to make flush side more
clever by skipping the flush if there is no update.

The stats update codepath is very sensitive performance wise for many
workloads and benchmarks.  So, we can not follow what the commit
aa48e47e ("memcg: infrastructure to flush memcg stats") did which
was triggering async flush through queue_work() and caused a lot
performance regression reports.  That got reverted by the commit
1f828223 ("memcg: flush lruvec stats in the refault").

In this patch we kept the stats update codepath very minimal and let the
stats reader side to flush the stats only when the updates are over a
specific threshold.  For now the threshold is (nr_cpus * CHARGE_BATCH).

To evaluate the impact of this patch, an 8 GiB tmpfs file is created on
a system with swap-on-zram and the file was pushed to swap through
memory.force_empty interface.  On reading the whole file, the memcg stat
flush in the refault code path is triggered.  With this patch, we
observed 63% reduction in the read time of 8 GiB file.

Link: https://lkml.kernel.org/r/20211001190040.48086-1-shakeelb@google.comSigned-off-by: NShakeel Butt <shakeelb@google.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Reviewed-by: N"Michal Koutný" <mkoutny@suse.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NLu Jialin <lujialin4@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

490bc222

blk-cgroup: blk_cgroup_bio_start() should use irq-safe operations on blkg->iostat_cpu · 5f424003

由 Tejun Heo 提交于 11月 30, 2021

mainline inclusion
from mainline-v5.16-rc1
commit 3c08b093
category: feature
bugzilla: 185803 https://gitee.com/openeuler/kernel/issues/I4JOG9?from=project-issue
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3c08b0931eed

--------------------------------------------------------------

c3df5fb5 ("cgroup: rstat: fix A-A deadlock on 32bit around
u64_stats_sync") made u64_stats updates irq-safe to avoid A-A deadlocks.
Unfortunately, the conversion missed one in blk_cgroup_bio_start(). Fix it.

Fixes: 2d146aa3 ("mm: memcontrol: switch to rstat")
Cc: stable@vger.kernel.org # v5.13+
Reported-by: syzbot+9738c8815b375ce482a1@syzkaller.appspotmail.com
Signed-off-by: NTejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/YWi7NrQdVlxD6J9W@slm.duckdns.orgSigned-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NLu Jialin <lujialin4@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

5f424003

memcg: flush lruvec stats in the refault · 8a975bc4

由 Shakeel Butt 提交于 11月 30, 2021

mainline inclusion
from mainline-v5.15-rc3
commit 1f828223
category: feature
bugzilla: 185803 https://gitee.com/openeuler/kernel/issues/I4JOG9?from=project-issue
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1f828223b799

-----------------------------------------------------------------------

Prior to the commit 7e1c0d6f ("memcg: switch lruvec stats to rstat")
and the commit aa48e47e ("memcg: infrastructure to flush memcg
stats"), each lruvec memcg stats can be off by (nr_cgroups * nr_cpus *
32) at worst and for unbounded amount of time.  The commit aa48e47e
moved the lruvec stats to rstat infrastructure and the commit
7e1c0d6f bounded the error for all the lruvec stats to (nr_cpus *
32) at worst for at most 2 seconds.  More specifically it decoupled the
number of stats and the number of cgroups from the error rate.

However this reduction in error comes with the cost of triggering the
slowpath of stats update more frequently.  Previously in the slowpath
the kernel adds the stats up the memcg tree.  After aa48e47e, the
kernel triggers the asyn lruvec stats flush through queue_work().  This
causes regression reports from 0day kernel bot [1] as well as from
phoronix test suite [2].

We tried two options to fix the regression:

 1) Increase the threshold to trigger the slowpath in lruvec stats
    update codepath from 32 to 512.

 2) Remove the slowpath from lruvec stats update codepath and instead
    flush the stats in the page refault codepath. The assumption is that
    the kernel timely flush the stats, so, the update tree would be
    small in the refault codepath to not cause the preformance impact.

Following are the results of will-it-scale/page_fault[1|2|3] benchmark
on four settings i.e.  (1) 5.15-rc1 as baseline (2) 5.15-rc1 with
aa48e47e and 7e1c0d6f reverted (3) 5.15-rc1 with option-1
(4) 5.15-rc1 with option-2.

  test       (1)      (2)               (3)               (4)
  pg_f1   368563   406277 (10.23%)   399693  (8.44%)   416398 (12.97%)
  pg_f2   338399   372133  (9.96%)   369180  (9.09%)   381024 (12.59%)
  pg_f3   500853   575399 (14.88%)   570388 (13.88%)   576083 (15.02%)

From the above result, it seems like the option-2 not only solves the
regression but also improves the performance for at least these
benchmarks.

Feng Tang (intel) ran the aim7 benchmark with these two options and
confirms that option-1 reduces the regression but option-2 removes the
regression.

Michael Larabel (phoronix) ran multiple benchmarks with these options
and reported the results at [3] and it shows for most benchmarks
option-2 removes the regression introduced by the commit aa48e47e
("memcg: infrastructure to flush memcg stats").

Based on the experiment results, this patch proposed the option-2 as the
solution to resolve the regression.

Link: https://lore.kernel.org/all/20210726022421.GB21872@xsang-OptiPlex-9020 [1]
Link: https://www.phoronix.com/scan.php?page=article&item=linux515-compile-regress [2]
Link: https://openbenchmarking.org/result/2109226-DEBU-LINUX5104 [3]
Fixes: aa48e47e ("memcg: infrastructure to flush memcg stats")
Signed-off-by: NShakeel Butt <shakeelb@google.com>
Tested-by: NMichael Larabel <Michael@phoronix.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Hillf Danton <hdanton@sina.com>,
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NLu Jialin <lujialin4@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

8a975bc4

mm, memcg: remove unused functions · 20894dfe

由 Miaohe Lin 提交于 11月 30, 2021

mainline inclusion
from mainline-v5.15-rc1
commit bec49c06
category: feature
bugzilla: 185803 https://gitee.com/openeuler/kernel/issues/I4JOG9?from=project-issue
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bec49c067c67

-----------------------------------------------------------------------

Since commit 2d146aa3 ("mm: memcontrol: switch to rstat"), last user
of memcg_stat_item_in_bytes() is gone.  And since commit fa40d1ee
("mm: vmscan: memcontrol: remove mem_cgroup_select_victim_node()"), only
the declaration of mem_cgroup_select_victim_node() is remained here.
Remove them.

Link: https://lkml.kernel.org/r/20210807082835.61281-2-linmiaohe@huawei.comSigned-off-by: NMiaohe Lin <linmiaohe@huawei.com>
Reviewed-by: NShakeel Butt <shakeelb@google.com>
Reviewed-by: NMuchun Song <songmuchun@bytedance.com>
Acked-by: NRoman Gushchin <guro@fb.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Cc: Alex Shi <alexs@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NLu Jialin <lujialin4@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

20894dfe

memcg: infrastructure to flush memcg stats · 94bb1084

由 Shakeel Butt 提交于 11月 30, 2021

mainline inclusion
from mainline-v5.15-rc1
commit aa48e47e
category: feature
bugzilla: 185803 https://gitee.com/openeuler/kernel/issues/I4JOG9?from=project-issue
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=aa48e47e3906

------------------------------------------------------

At the moment memcg stats are read in four contexts:

1. memcg stat user interfaces
2. dirty throttling
3. page fault
4. memory reclaim

Currently the kernel flushes the stats for first two cases.  Flushing the
stats for remaining two casese may have performance impact.  Always
flushing the memcg stats on the page fault code path may negatively
impacts the performance of the applications.  In addition flushing in the
memory reclaim code path, though treated as slowpath, can become the
source of contention for the global lock taken for stat flushing because
when system or memcg is under memory pressure, many tasks may enter the
reclaim path.

This patch uses following mechanisms to solve these challenges:

1. Periodically flush the stats from root memcg every 2 seconds.  This
   will time limit the out of sync stats.

2. Asynchronously flush the stats after fixed number of stat updates.
   In the worst case the stat can be out of sync by O(nr_cpus * BATCH) for
   2 seconds.

3. For avoiding thundering herd to flush the stats particularly from
   the memory reclaim context, introduce memcg local spinlock and let only
   one flusher active at a time.  This could have been done through
   cgroup_rstat_lock lock but that lock is used by other subsystem and for
   userspace reading memcg stats.  So, it is better to keep flushers
   introduced by this patch decoupled from cgroup_rstat_lock.  However we
   would have to use irqsafe version of rstat flush but that is fine as
   this code path will be flushing for whole tree and do the work for
   everyone.  No one will be waiting for that worker.

[shakeelb@google.com: fix sleep-in-wrong context bug]
  Link: https://lkml.kernel.org/r/20210716212137.1391164-2-shakeelb@google.com

Link: https://lkml.kernel.org/r/20210714013948.270662-2-shakeelb@google.comSigned-off-by: NShakeel Butt <shakeelb@google.com>
Tested-by: NMarek Szyprowski <m.szyprowski@samsung.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Conflict:
	include/linux/memcontrol.h
Signed-off-by: NLu Jialin <lujialin4@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

94bb1084

memcg: switch lruvec stats to rstat · f43d14df

由 Shakeel Butt 提交于 11月 30, 2021

mainline inclusion
from mainline-v5.15-rc1
commit 7e1c0d6f
category: feature
bugzilla: 185803 https://gitee.com/openeuler/kernel/issues/I4JOG9?from=project-issue
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7e1c0d6f58207

-------------------------------------------------------------------------

The commit 2d146aa3 ("mm: memcontrol: switch to rstat") switched memcg
stats to rstat infrastructure but skipped the conversion of the lruvec
stats as such stats are read in the performance critical code paths and
flushing stats may have impacted the performances of the applications.
This patch converts the lruvec stats to rstat and later patches add
mechanisms to keep the performance impact to minimum.

The rstat conversion comes with the price i.e.  memory cost.  Effectively
this patch reverts the savings done by the commit f3344adf ("mm:
memcontrol: optimize per-lruvec stats counter memory usage").  However
this cost is justified due to negative impact of the inaccurate lruvec
stats on many heuristics.  One such case is reported in [1].

The memory reclaim code is filled with plethora of heuristics and many of
those heuristics reads the lruvec stats.  So, inaccurate stats can make
such heuristics ineffective.  [1] reports the impact of inaccurate lruvec
stats on the "cache trim mode" heuristic.  Inaccurate lruvec stats can
impact the deactivation and aging anon heuristics as well.

[1] https://lore.kernel.org/linux-mm/20210311004449.1170308-1-ying.huang@intel.com/

Link: https://lkml.kernel.org/r/20210716212137.1391164-1-shakeelb@google.com
Link: https://lkml.kernel.org/r/20210714013948.270662-1-shakeelb@google.comSigned-off-by: NShakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Michal Koutný <mkoutny@suse.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NLu Jialin <lujialin4@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

f43d14df

mm: memcontrol: fix blocking rstat function called from atomic cgroup1 thresholding code · 44ca785a

由 Johannes Weiner 提交于 11月 30, 2021

mainline inclusion
from mainline-v5.14-rc4
commit 30def935
category: feature
bugzilla: 185803 https://gitee.com/openeuler/kernel/issues/I4JOG9?from=project-issue
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=30def93565e5b

------------------------------------------------------------------------

Dan Carpenter reports:

    The patch 2d146aa3: "mm: memcontrol: switch to rstat" from Apr
    29, 2021, leads to the following static checker warning:

	    kernel/cgroup/rstat.c:200 cgroup_rstat_flush()
	    warn: sleeping in atomic context

    mm/memcontrol.c
      3572  static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
      3573  {
      3574          unsigned long val;
      3575
      3576          if (mem_cgroup_is_root(memcg)) {
      3577                  cgroup_rstat_flush(memcg->css.cgroup);
			    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    This is from static analysis and potentially a false positive.  The
    problem is that mem_cgroup_usage() is called from __mem_cgroup_threshold()
    which holds an rcu_read_lock().  And the cgroup_rstat_flush() function
    can sleep.

      3578                  val = memcg_page_state(memcg, NR_FILE_PAGES) +
      3579                          memcg_page_state(memcg, NR_ANON_MAPPED);
      3580                  if (swap)
      3581                          val += memcg_page_state(memcg, MEMCG_SWAP);
      3582          } else {
      3583                  if (!swap)
      3584                          val = page_counter_read(&memcg->memory);
      3585                  else
      3586                          val = page_counter_read(&memcg->memsw);
      3587          }
      3588          return val;
      3589  }

__mem_cgroup_threshold() indeed holds the rcu lock.  In addition, the
thresholding code is invoked during stat changes, and those contexts
have irqs disabled as well.  If the lock breaking occurs inside the
flush function, it will result in a sleep from an atomic context.

Use the irqsafe flushing variant in mem_cgroup_usage() to fix this.

Link: https://lkml.kernel.org/r/20210726150019.251820-1-hannes@cmpxchg.org
Fixes: 2d146aa3 ("mm: memcontrol: switch to rstat")
Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Acked-by: NChris Down <chris@chrisdown.name>
Reviewed-by: NRik van Riel <riel@surriel.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Reviewed-by: NShakeel Butt <shakeelb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NLu Jialin <lujialin4@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

44ca785a

cgroup: rstat: fix A-A deadlock on 32bit around u64_stats_sync · 45d2e559

由 Tejun Heo 提交于 11月 30, 2021

mainline inclusion
from mainline-v5.14-rc6
commit c3df5fb5
category: feature
bugzilla: 185803 https://gitee.com/openeuler/kernel/issues/I4JOG9?from=project-issue
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c3df5fb57fe8

-------------------------------------------------------------------------

0fa294fb ("cgroup: Replace cgroup_rstat_mutex with a spinlock") added
cgroup_rstat_flush_irqsafe() allowing flushing to happen from the irq
context. However, rstat paths use u64_stats_sync to synchronize access to
64bit stat counters on 32bit machines. u64_stats_sync is implemented using
seq_lock and trying to read from an irq context can lead to A-A deadlock if
the irq happens to interrupt the stat update.

Fix it by using the irqsafe variants - u64_stats_update_begin_irqsave() and
u64_stats_update_end_irqrestore() - in the update paths. Note that none of
this matters on 64bit machines. All these are just for 32bit SMP setups.

Note that the interface was introduced way back, its first and currently
only use was recently added by 2d146aa3 ("mm: memcontrol: switch to
rstat"). Stable tagging targets this commit.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reported-by: NRik van Riel <riel@surriel.com>
Fixes: 2d146aa3 ("mm: memcontrol: switch to rstat")
Cc: stable@vger.kernel.org # v5.13+
Conflict:
	block/blk-cgroup.c
Signed-off-by: NLu Jialin <lujialin4@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

45d2e559

kselftests: cgroup: update kmem test for new vmstat implementation · 4abad746

由 Johannes Weiner 提交于 11月 30, 2021

mainline inclusion
from mainline-v5.13-rc1
commit 4bbcc5a4
category: feature
bugzilla: 185803 https://gitee.com/openeuler/kernel/issues/I4JOG9?from=project-issue
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4bbcc5a41c54

------------------------------------------------------------------

With memcg having switched to rstat, memory.stat output is precise.
Update the cgroup selftest to reflect the expectations and error
tolerances of the new implementation.

Also add newly tracked types of memory to the memory.stat side of the
equation, since they're included in memory.current and could throw false
positives.

Link: https://lkml.kernel.org/r/20210209163304.77088-9-hannes@cmpxchg.orgSigned-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Reviewed-by: NShakeel Butt <shakeelb@google.com>
Reviewed-by: NMichal Koutný <mkoutny@suse.com>
Acked-by: NRoman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NLu Jialin <lujialin4@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

4abad746

mm: memcontrol: consolidate lruvec stat flushing · c6963baf

由 Johannes Weiner 提交于 11月 30, 2021

mainline inclusion
from mainline-v5.13-rc1
commit 2cd21c89
category: feature
bugzilla: 185803 https://gitee.com/openeuler/kernel/issues/I4JOG9?from=project-issue
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2cd21c89800c

-----------------------------------------------------------------------

There are two functions to flush the per-cpu data of an lruvec into the
rest of the cgroup tree: when the cgroup is being freed, and when a CPU
disappears during hotplug.  The difference is whether all CPUs or just
one is being collected, but the rest of the flushing code is the same.
Merge them into one function and share the common code.

Link: https://lkml.kernel.org/r/20210209163304.77088-8-hannes@cmpxchg.orgSigned-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Reviewed-by: NShakeel Butt <shakeelb@google.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Acked-by: NRoman Gushchin <guro@fb.com>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NLu Jialin <lujialin4@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

c6963baf

mm: memcontrol: switch to rstat · 0030a2ab

由 Johannes Weiner 提交于 11月 30, 2021

mainline inclusion
from mainline-v5.13-rc1
commit 2d146aa3
category: feature
bugzilla: 185803 https://gitee.com/openeuler/kernel/issues/I4JOG9?from=project-issue
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2d146aa3aa84

----------------------------------------------------------------------

Replace the memory controller's custom hierarchical stats code with the
generic rstat infrastructure provided by the cgroup core.

The current implementation does batched upward propagation from the
write side (i.e.  as stats change).  The per-cpu batches introduce an
error, which is multiplied by the number of subgroups in a tree.  In
systems with many CPUs and sizable cgroup trees, the error can be large
enough to confuse users (e.g.  32 batch pages * 32 CPUs * 32 subgroups
results in an error of up to 128M per stat item).  This can entirely
swallow allocation bursts inside a workload that the user is expecting
to see reflected in the statistics.

In the past, we've done read-side aggregation, where a memory.stat read
would have to walk the entire subtree and add up per-cpu counts.  This
became problematic with lazily-freed cgroups: we could have large
subtrees where most cgroups were entirely idle.  Hence the switch to
change-driven upward propagation.  Unfortunately, it needed to trade
accuracy for speed due to the write side being so hot.

Rstat combines the best of both worlds: from the write side, it cheaply
maintains a queue of cgroups that have pending changes, so that the read
side can do selective tree aggregation.  This way the reported stats
will always be precise and recent as can be, while the aggregation can
skip over potentially large numbers of idle cgroups.

The way rstat works is that it implements a tree for tracking cgroups
with pending local changes, as well as a flush function that walks the
tree upwards.  The controller then drives this by 1) telling rstat when
a local cgroup stat changes (e.g.  mod_memcg_state) and 2) when a flush
is required to get uptodate hierarchy stats for a given subtree (e.g.
when memory.stat is read).  The controller also provides a flush
callback that is called during the rstat flush walk for each cgroup and
aggregates its local per-cpu counters and propagates them upwards.

This adds a second vmstats to struct mem_cgroup (MEMCG_NR_STAT +
NR_VM_EVENT_ITEMS) to track pending subtree deltas during upward
aggregation.  It removes 3 words from the per-cpu data.  It eliminates
memcg_exact_page_state(), since memcg_page_state() is now exact.

[akpm@linux-foundation.org: merge fix]
[hannes@cmpxchg.org: fix a sleep in atomic section problem]
  Link: https://lkml.kernel.org/r/20210315234100.64307-1-hannes@cmpxchg.org

Link: https://lkml.kernel.org/r/20210209163304.77088-7-hannes@cmpxchg.orgSigned-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Reviewed-by: NRoman Gushchin <guro@fb.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Reviewed-by: NShakeel Butt <shakeelb@google.com>
Reviewed-by: NMichal Koutný <mkoutny@suse.com>
Acked-by: NBalbir Singh <bsingharora@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Conflict:
	include/linux/memcontrol.h
Signed-off-by: NLu Jialin <lujialin4@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

0030a2ab

cgroup: rstat: punt root-level optimization to individual controllers · 5604a5b3

由 Johannes Weiner 提交于 11月 30, 2021

mainline inclusion
from mianline-v5.13-rc1
commit dc26532a
category: feature
bugzilla: 185803 https://gitee.com/openeuler/kernel/issues/I4JOG9?from=project-issue
CVE: NA

Reference : https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=dc26532aed0ab

---------------------------------------------------------------------

Current users of the rstat code can source root-level statistics from
the native counters of their respective subsystem, allowing them to
forego aggregation at the root level.  This optimization is currently
implemented inside the generic rstat code, which doesn't track the root
cgroup and doesn't invoke the subsystem flush callbacks on it.

However, the memory controller cannot do this optimization, because
cgroup1 breaks out memory specifically for the local level, including at
the root level.  In preparation for the memory controller switching to
rstat, move the optimization from rstat core to the controllers.

Afterwards, rstat will always track the root cgroup for changes and
invoke the subsystem callbacks on it; and it's up to the subsystem to
special-case and skip aggregation of the root cgroup if it can source
this information through other, cheaper means.

This is the case for the io controller and the cgroup base stats.  In
their respective flush callbacks, check whether the parent is the root
cgroup, and if so, skip the unnecessary upward propagation.

The extra cost of tracking the root cgroup is negligible: on stat
changes, we actually remove a branch that checks for the root.  The
queueing for a flush touches only per-cpu data, and only the first stat
change since a flush requires a (per-cpu) lock.

Link: https://lkml.kernel.org/r/20210209163304.77088-6-hannes@cmpxchg.orgSigned-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Acked-by: NTejun Heo <tj@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NLu Jialin <lujialin4@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

5604a5b3

cgroup: rstat: support cgroup1 · 3191fa63

由 Johannes Weiner 提交于 11月 30, 2021

mainline inclusion
from mainline-v5.13-rc1
commit a7df69b8
category: feature
bugzilla: 185803 https://gitee.com/openeuler/kernel/issues/I4JOG9?from=project-issue
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a7df69b81aac

--------------------------------------------------------------------

Rstat currently only supports the default hierarchy in cgroup2.  In
order to replace memcg's private stats infrastructure - used in both
cgroup1 and cgroup2 - with rstat, the latter needs to support cgroup1.

The initialization and destruction callbacks for regular cgroups are
already in place.  Remove the cgroup_on_dfl() guards to handle cgroup1.

The initialization of the root cgroup is currently hardcoded to only
handle cgrp_dfl_root.cgrp.  Move those callbacks to cgroup_setup_root()
and cgroup_destroy_root() to handle the default root as well as the
various cgroup1 roots we may set up during mounting.

The linking of css to cgroups happens in code shared between cgroup1 and
cgroup2 as well.  Simply remove the cgroup_on_dfl() guard.

Linkage of the root css to the root cgroup is a bit trickier: per
default, the root css of a subsystem controller belongs to the default
hierarchy (i.e.  the cgroup2 root).  When a controller is mounted in its
cgroup1 version, the root css is stolen and moved to the cgroup1 root;
on unmount, the css moves back to the default hierarchy.  Annotate
rebind_subsystems() to move the root css linkage along between roots.

Link: https://lkml.kernel.org/r/20210209163304.77088-5-hannes@cmpxchg.orgSigned-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Reviewed-by: NRoman Gushchin <guro@fb.com>
Reviewed-by: NShakeel Butt <shakeelb@google.com>
Acked-by: NTejun Heo <tj@kernel.org>
Reviewed-by: NMichal Koutný <mkoutny@suse.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NLu Jialin <lujialin4@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

3191fa63

mm: memcontrol: privatize memcg_page_state query functions · 06876391

由 Johannes Weiner 提交于 11月 30, 2021

mainline inclusion
from mainline-v5.13-rc1
commit a18e6e6e
category: feature
bugzilla: 185803 https://gitee.com/openeuler/kernel/issues/I4JOG9?from=project-issue
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a18e6e6e150a

-------------------------------------------------------------------

There are no users outside of the memory controller itself. The rest
of the kernel cares either about node or lruvec stats.

Link: https://lkml.kernel.org/r/20210209163304.77088-4-hannes@cmpxchg.orgSigned-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Reviewed-by: NShakeel Butt <shakeelb@google.com>
Reviewed-by: NRoman Gushchin <guro@fb.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Reviewed-by: NMichal Koutný <mkoutny@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Conflict:
	include/linux/memcontrol.h
Signed-off-by: NLu Jialin <lujialin4@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

06876391

mm: memcontrol: kill mem_cgroup_nodeinfo() · 82f0ea81

由 Johannes Weiner 提交于 11月 30, 2021

mainline inclusion
from mainline-v5.13-rc1
commit a3747b53
category: feature
bugzilla: 185803 https://gitee.com/openeuler/kernel/issues/I4JOG9?from=project-issue
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a3747b53b1771

----------------------------------------------------

No need to encapsulate a simple struct member access.

Link: https://lkml.kernel.org/r/20210209163304.77088-3-hannes@cmpxchg.orgSigned-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Reviewed-by: NShakeel Butt <shakeelb@google.com>
Reviewed-by: NRoman Gushchin <guro@fb.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Reviewed-by: NMichal Koutný <mkoutny@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Conflict:
	mm/memcontrol.c
Signed-off-by: NLu Jialin <lujialin4@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

82f0ea81

mm: memcontrol: fix cpuhotplug statistics flushing · 1d36d2c1

由 Johannes Weiner 提交于 11月 30, 2021

mainline inclusion
from mainline-5.13-rc1
commit a3d4c05a
category: feature
bugzilla:185803 https://gitee.com/openeuler/kernel/issues/I4JOG9?from=project-issue
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a3d4c05a447486

---------------------------------------------------

Patch series "mm: memcontrol: switch to rstat", v3.

This series converts memcg stats tracking to the streamlined rstat
infrastructure provided by the cgroup core code.  rstat is already used by
the CPU controller and the IO controller.  This change is motivated by
recent accuracy problems in memcg's custom stats code, as well as the
benefits of sharing common infra with other controllers.

The current memcg implementation does batched tree aggregation on the
write side: local stat changes are cached in per-cpu counters, which are
then propagated upward in batches when a threshold (32 pages) is exceeded.
This is cheap, but the error introduced by the lazy upward propagation
adds up: 32 pages times CPUs times cgroups in the subtree.  We've had
complaints from service owners that the stats do not reliably track and
react to allocation behavior as expected, sometimes swallowing the results
of entire test applications.

The original memcg stat implementation used to do tree aggregation
exclusively on the read side: local stats would only ever be tracked in
per-cpu counters, and a memory.stat read would iterate the entire subtree
and sum those counters up.  This didn't keep up with the times:

 - Cgroup trees are much bigger now. We switched to lazily-freed
   cgroups, where deleted groups would hang around until their remaining
   page cache has been reclaimed. This can result in large subtrees that
   are expensive to walk, while most of the groups are idle and their
   statistics don't change much anymore.

 - Automated monitoring increased. With the proliferation of userspace
   oom killing, proactive reclaim, and higher-resolution logging of
   workload trends in general, top-level stat files are polled at least
   once a second in many deployments.

 - The lifetime of cgroups got shorter. Where most cgroup setups in the
   past would have a few large policy-oriented cgroups for everything
   running on the system, newer cgroup deployments tend to create one
   group per application - which gets deleted again as the processes
   exit. An aggregation scheme that doesn't retain child data inside the
   parents loses event history of the subtree.

Rstat addresses all three of those concerns through intelligent,
persistent read-side aggregation.  As statistics change at the local
level, rstat tracks - on a per-cpu basis - only those parts of a subtree
that have changes pending and require aggregation.  The actual
aggregation occurs on the colder read side - which can now skip over
(potentially large) numbers of recently idle cgroups.

===

The test_kmem cgroup selftest is currently failing due to excessive
cumulative vmstat drift from 100 subgroups:

    ok 1 test_kmem_basic
    memory.current = 8810496
    slab + anon + file + kernel_stack = 17074568
    slab = 6101384
    anon = 946176
    file = 0
    kernel_stack = 10027008
    not ok 2 test_kmem_memcg_deletion
    ok 3 test_kmem_proc_kpagecgroup
    ok 4 test_kmem_kernel_stacks
    ok 5 test_kmem_dead_cgroups
    ok 6 test_percpu_basic

As you can see, memory.stat items far exceed memory.current.  The kernel
stack alone is bigger than all of charged memory.  That's because the
memory of the test has been uncharged from memory.current, but the
negative vmstat deltas are still sitting in the percpu caches.

The test at this time isn't even counting percpu, pagetables etc.  yet,
which would further contribute to the error.  The last patch in the series
updates the test to include them - as well as reduces the vmstat
tolerances in general to only expect page_counter batching.

With all patches applied, the (now more stringent) test succeeds:

    ok 1 test_kmem_basic
    ok 2 test_kmem_memcg_deletion
    ok 3 test_kmem_proc_kpagecgroup
    ok 4 test_kmem_kernel_stacks
    ok 5 test_kmem_dead_cgroups
    ok 6 test_percpu_basic

===

A kernel build test confirms that overhead is comparable.  Two kernels are
built simultaneously in a nested tree with several idle siblings:

root - kernelbuild - one - two - three - four - build-a (defconfig, make -j16)
                                             `- build-b (defconfig, make -j16)
                                             `- idle-1
                                             `- ...
                                             `- idle-9

During the builds, kernelbuild/memory.stat is read once a second.

A perf diff shows that the changes in cycle distribution is
minimal. Top 10 kernel symbols:

     0.09%     +0.08%  [kernel.kallsyms]                       [k] __mod_memcg_lruvec_state
     0.00%     +0.06%  [kernel.kallsyms]                       [k] cgroup_rstat_updated
     0.08%     -0.05%  [kernel.kallsyms]                       [k] __mod_memcg_state.part.0
     0.16%     -0.04%  [kernel.kallsyms]                       [k] release_pages
     0.00%     +0.03%  [kernel.kallsyms]                       [k] __count_memcg_events
     0.01%     +0.03%  [kernel.kallsyms]                       [k] mem_cgroup_charge_statistics.constprop.0
     0.10%     -0.02%  [kernel.kallsyms]                       [k] get_mem_cgroup_from_mm
     0.05%     -0.02%  [kernel.kallsyms]                       [k] mem_cgroup_update_lru_size
     0.57%     +0.01%  [kernel.kallsyms]                       [k] asm_exc_page_fault

===

The on-demand aggregated stats are now fully accurate:

$ grep -e nr_inactive_file /proc/vmstat | awk '{print($1,$2*4096)}'; \
  grep -e inactive_file /sys/fs/cgroup/memory.stat

vanilla:                              patched:
nr_inactive_file 1574105088           nr_inactive_file 1027801088
   inactive_file 1577410560              inactive_file 1027801088

===

This patch (of 8):

The memcg hotunplug callback erroneously flushes counts on the local CPU,
not the counts of the CPU going away; those counts will be lost.

Flush the CPU that is actually going away.

Also simplify the code a bit by using mod_memcg_state() and
count_memcg_events() instead of open-coding the upward flush - this is
comparable to how vmstat.c handles hotunplug flushing.

Link: https://lkml.kernel.org/r/20210209163304.77088-1-hannes@cmpxchg.org
Link: https://lkml.kernel.org/r/20210209163304.77088-2-hannes@cmpxchg.org
Fixes: a983b5eb ("mm: memcontrol: fix excessive complexity in memory.stat reporting")
Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Reviewed-by: NShakeel Butt <shakeelb@google.com>
Reviewed-by: NRoman Gushchin <guro@fb.com>
Reviewed-by: NMichal Koutný <mkoutny@suse.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NLu Jialin <lujialin4@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

1d36d2c1

hugetlbfs: flush TLBs correctly after huge_pmd_unshare · 6a2ec88f

由 Nadav Amit 提交于 11月 30, 2021

stable inclusion
from stable-5.10.82
commit 40bc831ab5f630431010d1ff867390b07418a7ee
category: bugfix
bugzilla: 185820 https://gitee.com/openeuler/kernel/issues/I4DDEL
CVE: CVE-2021-4002

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=40bc831ab5f630431010d1ff867390b07418a7ee

-----------------------------------------------

commit a4a118f2 upstream.

When __unmap_hugepage_range() calls to huge_pmd_unshare() succeed, a TLB
flush is missing.  This TLB flush must be performed before releasing the
i_mmap_rwsem, in order to prevent an unshared PMDs page from being
released and reused before the TLB flush took place.

Arguably, a comprehensive solution would use mmu_gather interface to
batch the TLB flushes and the PMDs page release, however it is not an
easy solution: (1) try_to_unmap_one() and try_to_migrate_one() also call
huge_pmd_unshare() and they cannot use the mmu_gather interface; and (2)
deferring the release of the page reference for the PMDs page until
after i_mmap_rwsem is dropeed can confuse huge_pmd_unshare() into
thinking PMDs are shared when they are not.

Fix __unmap_hugepage_range() by adding the missing TLB flush, and
forcing a flush when unshare is successful.

Fixes: 24669e58 ("hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages)" # 3.6
Signed-off-by: NNadav Amit <namit@vmware.com>
Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NLiu Shixin <liushixin2@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

6a2ec88f

rsi: fix control-message timeout · 99d9b2ed

由 Johan Hovold 提交于 11月 30, 2021

stable inclusion
from stable-5.10.79
commit 62424fe4c2cf38f27c1eef66cbfa226ffe233e90
bugzilla: 185793 https://gitee.com/openeuler/kernel/issues/I4K65C

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=62424fe4c2cf38f27c1eef66cbfa226ffe233e90

--------------------------------

commit 541fd20c upstream.

USB control-message timeouts are specified in milliseconds and should
specifically not vary with CONFIG_HZ.

Use the common control-message timeout define for the five-second
timeout.

Fixes: dad0d04f ("rsi: Add RS9113 wireless driver")
Cc: stable@vger.kernel.org      # 3.15
Signed-off-by: NJohan Hovold <johan@kernel.org>
Signed-off-by: NKalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211025120522.6045-5-johan@kernel.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NBin Li <huawei.libin@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

99d9b2ed

media: staging/intel-ipu3: css: Fix wrong size comparison imgu_css_fw_init · b324e6ef

由 Gustavo A. R. Silva 提交于 11月 30, 2021

stable inclusion
from stable-5.10.79
commit 8971158af1e0138bf0576c16f9aa153f151fb6b4
bugzilla: 185793 https://gitee.com/openeuler/kernel/issues/I4K65C

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=8971158af1e0138bf0576c16f9aa153f151fb6b4

--------------------------------

commit a44f9d6f upstream.

There is a wrong comparison of the total size of the loaded firmware
css->fw->size with the size of a pointer to struct imgu_fw_header.

Turn binary_header into a flexible-array member[1][2], use the
struct_size() helper and fix the wrong size comparison. Notice
that the loaded firmware needs to contain at least one 'struct
imgu_fw_info' item in the binary_header[] array.

It's also worth mentioning that

	"css->fw->size < struct_size(css->fwp, binary_header, 1)"

with binary_header declared as a flexible-array member is equivalent
to

	"css->fw->size < sizeof(struct imgu_fw_header)"

with binary_header declared as a one-element array (as in the original
code).

The replacement of the one-element array with a flexible-array member
also helps with the ongoing efforts to globally enable -Warray-bounds
and get us closer to being able to tighten the FORTIFY_SOURCE routines
on memcpy().

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.10/process/deprecated.html#zero-length-and-one-element-arrays

Link: https://github.com/KSPP/linux/issues/79
Link: https://github.com/KSPP/linux/issues/109

Fixes: 09d290f0 ("media: staging/intel-ipu3: css: Add support for firmware management")
Cc: stable@vger.kernel.org
Signed-off-by: NGustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: NSakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: NMauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NBin Li <huawei.libin@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

b324e6ef

staging: rtl8192u: fix control-message timeouts · f98a9159

由 Johan Hovold 提交于 11月 30, 2021

stable inclusion
from stable-5.10.79
commit 1cf43e928954f5d511d3bb28c045ab430e9440a8
bugzilla: 185793 https://gitee.com/openeuler/kernel/issues/I4K65C

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=1cf43e928954f5d511d3bb28c045ab430e9440a8

--------------------------------

commit 4cfa36d3 upstream.

USB control-message timeouts are specified in milliseconds and should
specifically not vary with CONFIG_HZ.

Fixes: 8fc8598e ("Staging: Added Realtek rtl8192u driver to staging")
Cc: stable@vger.kernel.org      # 2.6.33
Acked-by: NLarry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: NJohan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20211025120910.6339-2-johan@kernel.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NBin Li <huawei.libin@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

f98a9159

staging: r8712u: fix control-message timeout · 13505527

由 Johan Hovold 提交于 11月 30, 2021

stable inclusion
from stable-5.10.79
commit 9963ba5b9d495d05bf32f37dc42d69afed46639b
bugzilla: 185793 https://gitee.com/openeuler/kernel/issues/I4K65C

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=9963ba5b9d495d05bf32f37dc42d69afed46639b

--------------------------------

commit ce494052 upstream.

USB control-message timeouts are specified in milliseconds and should
specifically not vary with CONFIG_HZ.

Fixes: 2865d42c ("staging: r8712u: Add the new driver to the mainline kernel")
Cc: stable@vger.kernel.org      # 2.6.37
Acked-by: NLarry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: NJohan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20211025120910.6339-3-johan@kernel.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NBin Li <huawei.libin@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

13505527

comedi: vmk80xx: fix bulk and interrupt message timeouts · 79d516bd

由 Johan Hovold 提交于 11月 30, 2021

stable inclusion
from stable-5.10.79
commit 844b02496eaca61c3eb2e7430dfc25a06302bff3
bugzilla: 185793 https://gitee.com/openeuler/kernel/issues/I4K65C

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=844b02496eaca61c3eb2e7430dfc25a06302bff3

--------------------------------

commit a56d3e40 upstream.

USB bulk and interrupt message timeouts are specified in milliseconds
and should specifically not vary with CONFIG_HZ.

Note that the bulk-out transfer timeout was set to the endpoint
bInterval value, which should be ignored for bulk endpoints and is
typically set to zero. This meant that a failing bulk-out transfer
would never time out.

Assume that the 10 second timeout used for all other transfers is more
than enough also for the bulk-out endpoint.

Fixes: 985cafcc ("Staging: Comedi: vmk80xx: Add k8061 support")
Fixes: 951348b3 ("staging: comedi: vmk80xx: wait for URBs to complete")
Cc: stable@vger.kernel.org      # 2.6.31
Signed-off-by: NJohan Hovold <johan@kernel.org>
Reviewed-by: NIan Abbott <abbotti@mev.co.uk>
Link: https://lore.kernel.org/r/20211025114532.4599-6-johan@kernel.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NBin Li <huawei.libin@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

79d516bd

comedi: vmk80xx: fix bulk-buffer overflow · 79067df9

由 Johan Hovold 提交于 11月 30, 2021

stable inclusion
from stable-5.10.79
commit b7fd7f3387f070215e6be341e68eb5c087eeecc0
bugzilla: 185793 https://gitee.com/openeuler/kernel/issues/I4K65C

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b7fd7f3387f070215e6be341e68eb5c087eeecc0

--------------------------------

commit 78cdfd62 upstream.

The driver is using endpoint-sized buffers but must not assume that the
tx and rx buffers are of equal size or a malicious device could overflow
the slab-allocated receive buffer when doing bulk transfers.

Fixes: 985cafcc ("Staging: Comedi: vmk80xx: Add k8061 support")
Cc: stable@vger.kernel.org      # 2.6.31
Signed-off-by: NJohan Hovold <johan@kernel.org>
Reviewed-by: NIan Abbott <abbotti@mev.co.uk>
Link: https://lore.kernel.org/r/20211025114532.4599-5-johan@kernel.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NBin Li <huawei.libin@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

79067df9

comedi: vmk80xx: fix transfer-buffer overflows · 87b35b57

由 Johan Hovold 提交于 11月 30, 2021

stable inclusion
from stable-5.10.79
commit 33d7a470730dfe7c9bfc8da84575cf2cedd60d00
bugzilla: 185793 https://gitee.com/openeuler/kernel/issues/I4K65C

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=33d7a470730dfe7c9bfc8da84575cf2cedd60d00

--------------------------------

commit a23461c4 upstream.

The driver uses endpoint-sized USB transfer buffers but up until
recently had no sanity checks on the sizes.

Commit e1f13c87 ("staging: comedi: check validity of wMaxPacketSize
of usb endpoints found") inadvertently fixed NULL-pointer dereferences
when accessing the transfer buffers in case a malicious device has a
zero wMaxPacketSize.

Make sure to allocate buffers large enough to handle also the other
accesses that are done without a size check (e.g. byte 18 in
vmk80xx_cnt_insn_read() for the VMK8061_MODEL) to avoid writing beyond
the buffers, for example, when doing descriptor fuzzing.

The original driver was for a low-speed device with 8-byte buffers.
Support was later added for a device that uses bulk transfers and is
presumably a full-speed device with a maximum 64-byte wMaxPacketSize.

Fixes: 985cafcc ("Staging: Comedi: vmk80xx: Add k8061 support")
Cc: stable@vger.kernel.org      # 2.6.31
Signed-off-by: NJohan Hovold <johan@kernel.org>
Reviewed-by: NIan Abbott <abbotti@mev.co.uk>
Link: https://lore.kernel.org/r/20211025114532.4599-4-johan@kernel.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NBin Li <huawei.libin@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

87b35b57

comedi: ni_usb6501: fix NULL-deref in command paths · 3a8b46c7

由 Johan Hovold 提交于 11月 30, 2021

stable inclusion
from stable-5.10.79
commit ef143dc0c3defe56730ecd3a9de7b3e1d7e557c1
bugzilla: 185793 https://gitee.com/openeuler/kernel/issues/I4K65C

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ef143dc0c3defe56730ecd3a9de7b3e1d7e557c1

--------------------------------

commit 907767da upstream.

The driver uses endpoint-sized USB transfer buffers but had no sanity
checks on the sizes. This can lead to zero-size-pointer dereferences or
overflowed transfer buffers in ni6501_port_command() and
ni6501_counter_command() if a (malicious) device has smaller max-packet
sizes than expected (or when doing descriptor fuzz testing).

Add the missing sanity checks to probe().

Fixes: a03bb00e ("staging: comedi: add NI USB-6501 support")
Cc: stable@vger.kernel.org      # 3.18
Cc: Luca Ellero <luca.ellero@brickedbrain.com>
Reviewed-by: NIan Abbott <abbotti@mev.co.uk>
Signed-off-by: NJohan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20211027093529.30896-2-johan@kernel.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NBin Li <huawei.libin@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

3a8b46c7

comedi: dt9812: fix DMA buffers on stack · 77db7c95

由 Johan Hovold 提交于 11月 30, 2021

stable inclusion
from stable-5.10.79
commit 786f5b03450454557ff858a8bead5d7c0cbf78d6
bugzilla: 185793 https://gitee.com/openeuler/kernel/issues/I4K65C

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=786f5b03450454557ff858a8bead5d7c0cbf78d6

--------------------------------

commit 536de747 upstream.

USB transfer buffers are typically mapped for DMA and must not be
allocated on the stack or transfers will fail.

Allocate proper transfer buffers in the various command helpers and
return an error on short transfers instead of acting on random stack
data.

Note that this also fixes a stack info leak on systems where DMA is not
used as 32 bytes are always sent to the device regardless of how short
the command is.

Fixes: 63274cd7 ("Staging: comedi: add usb dt9812 driver")
Cc: stable@vger.kernel.org      # 2.6.29
Reviewed-by: NIan Abbott <abbotti@mev.co.uk>
Signed-off-by: NJohan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20211027093529.30896-3-johan@kernel.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NBin Li <huawei.libin@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

77db7c95

isofs: Fix out of bound access for corrupted isofs image · b79856ad

由 Jan Kara 提交于 11月 30, 2021

stable inclusion
from stable-5.10.79
commit 86d4aedcbc69c0f84551fb70f953c24e396de2d7
bugzilla: 185793 https://gitee.com/openeuler/kernel/issues/I4K65C

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=86d4aedcbc69c0f84551fb70f953c24e396de2d7

--------------------------------

commit e96a1866 upstream.

When isofs image is suitably corrupted isofs_read_inode() can read data
beyond the end of buffer. Sanity-check the directory entry length before
using it.

Reported-and-tested-by: syzbot+6fc7fb214625d82af7d1@syzkaller.appspotmail.com
CC: stable@vger.kernel.org
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NBin Li <huawei.libin@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

b79856ad

staging: rtl8712: fix use-after-free in rtl8712_dl_fw · 683bd5dd

由 Pavel Skripkin 提交于 11月 30, 2021

stable inclusion
from stable-5.10.79
commit c430094541a80575259a94ff879063ef01473506
bugzilla: 185793 https://gitee.com/openeuler/kernel/issues/I4K65C

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c430094541a80575259a94ff879063ef01473506

--------------------------------

commit c052cc1a upstream.

Syzbot reported use-after-free in rtl8712_dl_fw(). The problem was in
race condition between r871xu_dev_remove() ->ndo_open() callback.

It's easy to see from crash log, that driver accesses released firmware
in ->ndo_open() callback. It may happen, since driver was releasing
firmware _before_ unregistering netdev. Fix it by moving
unregister_netdev() before cleaning up resources.

Call Trace:
...
 rtl871x_open_fw drivers/staging/rtl8712/hal_init.c:83 [inline]
 rtl8712_dl_fw+0xd95/0xe10 drivers/staging/rtl8712/hal_init.c:170
 rtl8712_hal_init drivers/staging/rtl8712/hal_init.c:330 [inline]
 rtl871x_hal_init+0xae/0x180 drivers/staging/rtl8712/hal_init.c:394
 netdev_open+0xe6/0x6c0 drivers/staging/rtl8712/os_intfs.c:380
 __dev_open+0x2bc/0x4d0 net/core/dev.c:1484

Freed by task 1306:
...
 release_firmware+0x1b/0x30 drivers/base/firmware_loader/main.c:1053
 r871xu_dev_remove+0xcc/0x2c0 drivers/staging/rtl8712/usb_intf.c:599
 usb_unbind_interface+0x1d8/0x8d0 drivers/usb/core/driver.c:458

Fixes: 8c213fa5 ("staging: r8712u: Use asynchronous firmware loading")
Cc: stable <stable@vger.kernel.org>
Reported-and-tested-by: syzbot+c55162be492189fb4f51@syzkaller.appspotmail.com
Signed-off-by: NPavel Skripkin <paskripkin@gmail.com>
Link: https://lore.kernel.org/r/20211019211718.26354-1-paskripkin@gmail.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NBin Li <huawei.libin@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

683bd5dd

printk/console: Allow to disable console output by using console="" or console=null · cc09d39b

由 Petr Mladek 提交于 11月 30, 2021

stable inclusion
from stable-5.10.79
commit ab4af56ae2508df3bdabcb192874086f5f07d98c
bugzilla: 185793 https://gitee.com/openeuler/kernel/issues/I4K65C

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ab4af56ae2508df3bdabcb192874086f5f07d98c

--------------------------------

commit 3cffa06a upstream.

The commit 48021f98 ("printk: handle blank console arguments
passed in.") prevented crash caused by empty console= parameter value.

Unfortunately, this value is widely used on Chromebooks to disable
the console output. The above commit caused performance regression
because the messages were pushed on slow console even though nobody
was watching it.

Use ttynull driver explicitly for console="" and console=null
parameters. It has been created for exactly this purpose.

It causes that preferred_console is set. As a result, ttySX and ttyX
are not used as a fallback. And only ttynull console gets registered by
default.

It still allows to register other consoles either by additional console=
parameters or SPCR. It prevents regression because it worked this way even
before. Also it is a sane semantic. Preventing output on all consoles
should be done another way, for example, by introducing mute_console
parameter.

Link: https://lore.kernel.org/r/20201006025935.GA597@jagdpanzerIV.localdomainSuggested-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: NGuenter Roeck <linux@roeck-us.net>
Tested-by: NGuenter Roeck <linux@roeck-us.net>
Acked-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: NPetr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20201111135450.11214-3-pmladek@suse.com
Cc: Yi Fan <yfa@google.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NBin Li <huawei.libin@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

cc09d39b

binder: don't detect sender/target during buffer cleanup · 9a57dc1c

由 Todd Kjos 提交于 11月 30, 2021

stable inclusion
from stable-5.10.79
commit 07d1db141e478917600fa32be11e5b828ff9ed13
bugzilla: 185793 https://gitee.com/openeuler/kernel/issues/I4K65C

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=07d1db141e478917600fa32be11e5b828ff9ed13

--------------------------------

commit 32e9f56a upstream.

When freeing txn buffers, binder_transaction_buffer_release()
attempts to detect whether the current context is the target by
comparing current->group_leader to proc->tsk. This is an unreliable
test. Instead explicitly pass an 'is_failure' boolean.

Detecting the sender was being used as a way to tell if the
transaction failed to be sent.  When cleaning up after
failing to send a transaction, there is no need to close
the fds associated with a BINDER_TYPE_FDA object. Now
'is_failure' can be used to accurately detect this case.

Fixes: 44d8047f ("binder: use standard functions to allocate fds")
Cc: stable <stable@vger.kernel.org>
Acked-by: NChristian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: NTodd Kjos <tkjos@google.com>
Link: https://lore.kernel.org/r/20211015233811.3532235-1-tkjos@google.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NBin Li <huawei.libin@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

9a57dc1c

usb-storage: Add compatibility quirk flags for iODD 2531/2541 · f4a504c6

由 James Buren 提交于 11月 30, 2021

stable inclusion
from stable-5.10.79
commit 42681b90c4db8bf29fc2cc93789ec093bb209296
bugzilla: 185793 https://gitee.com/openeuler/kernel/issues/I4K65C

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=42681b90c4db8bf29fc2cc93789ec093bb209296

--------------------------------

commit 05c8f1b6 upstream.

These drive enclosures have firmware bugs that make it impossible to mount
a new virtual ISO image after Linux ejects the old one if the device is
locked by Linux. Windows bypasses this problem by the fact that they do
not lock the device. Add a quirk to disable device locking for these
drive enclosures.
Acked-by: NAlan Stern <stern@rowland.harvard.edu>
Signed-off-by: NJames Buren <braewoods+lkml@braewoods.net>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20211014015504.2695089-1-braewoods+lkml@braewoods.netSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NBin Li <huawei.libin@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

f4a504c6

usb: musb: Balance list entry in musb_gadget_queue · 8cc54473

由 Viraj Shah 提交于 11月 30, 2021

stable inclusion
from stable-5.10.79
commit 1309753b7841db412063bcce3f2a0eddf1443e9d
bugzilla: 185793 https://gitee.com/openeuler/kernel/issues/I4K65C

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=1309753b7841db412063bcce3f2a0eddf1443e9d

--------------------------------

commit 21b5fcdc upstream.

musb_gadget_queue() adds the passed request to musb_ep::req_list. If the
endpoint is idle and it is the first request then it invokes
musb_queue_resume_work(). If the function returns an error then the
error is passed to the caller without any clean-up and the request
remains enqueued on the list. If the caller enqueues the request again
then the list corrupts.

Remove the request from the list on error.

Fixes: ea2f35c0 ("usb: musb: Fix sleeping function called from invalid context for hdrc glue")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: NViraj Shah <viraj.shah@linutronix.de>
Link: https://lore.kernel.org/r/20211021093644.4734-1-viraj.shah@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NBin Li <huawei.libin@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

8cc54473

usb: gadget: Mark USB_FSL_QE broken on 64-bit · a40c4f80

由 Geert Uytterhoeven 提交于 11月 30, 2021

stable inclusion
from stable-5.10.79
commit 27409143122f9c68f74da42ce987cc6badb40784
bugzilla: 185793 https://gitee.com/openeuler/kernel/issues/I4K65C

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=27409143122f9c68f74da42ce987cc6badb40784

--------------------------------

commit a0548b26 upstream.

On 64-bit:

    drivers/usb/gadget/udc/fsl_qe_udc.c: In function ‘qe_ep0_rx’:
    drivers/usb/gadget/udc/fsl_qe_udc.c:842:13: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
      842 |     vaddr = (u32)phys_to_virt(in_be32(&bd->buf));
	  |             ^
    In file included from drivers/usb/gadget/udc/fsl_qe_udc.c:41:
    drivers/usb/gadget/udc/fsl_qe_udc.c:843:28: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
      843 |     frame_set_data(pframe, (u8 *)vaddr);
	  |                            ^

The driver assumes physical and virtual addresses are 32-bit, hence it
cannot work on 64-bit platforms.
Acked-by: NLi Yang <leoyang.li@nxp.com>
Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/r/20211027080849.3276289-1-geert@linux-m68k.org
Cc: stable <stable@vger.kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NBin Li <huawei.libin@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

a40c4f80

usb: ehci: handshake CMD_RUN instead of STS_HALT · ca668928

由 Neal Liu 提交于 11月 30, 2021

stable inclusion
from stable-5.10.79
commit 94e5305a381658064949e2f427f4a7591f4194aa
bugzilla: 185793 https://gitee.com/openeuler/kernel/issues/I4K65C

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=94e5305a381658064949e2f427f4a7591f4194aa

--------------------------------

commit 7f2d7378 upstream.

For Aspeed, HCHalted status depends on not only Run/Stop but also
ASS/PSS status.
Handshake CMD_RUN on startup instead.
Tested-by: NTao Ren <rentao.bupt@gmail.com>
Reviewed-by: NTao Ren <rentao.bupt@gmail.com>
Acked-by: NAlan Stern <stern@rowland.harvard.edu>
Signed-off-by: NNeal Liu <neal_liu@aspeedtech.com>
Link: https://lore.kernel.org/r/20210910073619.26095-1-neal_liu@aspeedtech.com
Cc: Joel Stanley <joel@jms.id.au>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NBin Li <huawei.libin@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

ca668928

Revert "x86/kvm: fix vcpu-id indexed array sizes" · ec196695

由 Juergen Gross 提交于 11月 30, 2021

stable inclusion
from stable-5.10.79
commit a8db6fd04d58b92d0698ee56e42134f26b236a90
bugzilla: 185793 https://gitee.com/openeuler/kernel/issues/I4K65C

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=a8db6fd04d58b92d0698ee56e42134f26b236a90

--------------------------------

commit 1e254d0d upstream.

This reverts commit 76b4f357.

The commit has the wrong reasoning, as KVM_MAX_VCPU_ID is not defining the
maximum allowed vcpu-id as its name suggests, but the number of vcpu-ids.
So revert this patch again.
Suggested-by: NEduardo Habkost <ehabkost@redhat.com>
Signed-off-by: NJuergen Gross <jgross@suse.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210913135745.13944-2-jgross@suse.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NBin Li <huawei.libin@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

ec196695

KVM: x86: avoid warning with -Wbitwise-instead-of-logical · 970d9071

由 Paolo Bonzini 提交于 11月 30, 2021

stable inclusion
from stable-5.10.79
commit ecf58653f1e4ab88b4eb62db8fe799826d99d5ec
bugzilla: 185793 https://gitee.com/openeuler/kernel/issues/I4K65C

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ecf58653f1e4ab88b4eb62db8fe799826d99d5ec

--------------------------------

commit 3d5e7a28 upstream.

This is a new warning in clang top-of-tree (will be clang 14):

In file included from arch/x86/kvm/mmu/mmu.c:27:
arch/x86/kvm/mmu/spte.h:318:9: error: use of bitwise '|' with boolean operands [-Werror,-Wbitwise-instead-of-logical]
        return __is_bad_mt_xwr(rsvd_check, spte) |
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                 ||
arch/x86/kvm/mmu/spte.h:318:9: note: cast one or both operands to int to silence this warning

The code is fine, but change it anyway to shut up this clever clogs
of a compiler.

Reported-by: torvic9@mailbox.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
[nathan: Backport to 5.10, which does not have 961f8445]
Signed-off-by: NNathan Chancellor <nathan@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NBin Li <huawei.libin@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

970d9071

ovl: warn about orphan metacopy · 0f3d73a1

由 Kevin Locke 提交于 11月 30, 2021

mainline inclusion
from mainline-v5.11-rc1
commit 0a8d0b64
category: bugfix
bugzilla: 185806 https://gitee.com/openeuler/kernel/issues/I4DDEL

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0a8d0b64dd6acfbc9e9b79022654bbe1ade4a29a

-------------------------------------------------

When the lower file of a metacopy is inaccessible, -EIO is returned.  For
users not familiar with overlayfs internals, such as myself, the meaning of
this error may not be apparent or easy to determine, since the (metacopy)
file is present and open/stat succeed when accessed outside of the overlay.

Add a rate-limited warning for orphan metacopy to give users a hint when
investigating such errors.

Link: https://lore.kernel.org/linux-unionfs/CAOQ4uxi23Zsmfb4rCed1n=On0NNA5KZD74jjjeyz+et32sk-gg@mail.gmail.com/Signed-off-by: NKevin Locke <kevin@kevinlocke.name>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
Signed-off-by: NZheng Liang <zhengliang6@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

0f3d73a1

ext4: fix e2fsprogs checksum failure for mounted filesystem · b939c55f

由 Jan Kara 提交于 11月 30, 2021

mainline inclusion
from mainline-v5.15
commit b2bbb92f
category: bugfix
bugzilla: 179956 https://gitee.com/openeuler/kernel/issues/I4DDEL

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b2bbb92f7042e8075fb036bf97043339576330c3

-------------------------------------------------

Commit 81414b4d ("ext4: remove redundant sb checksum
recomputation") removed checksum recalculation after updating
superblock free space / inode counters in ext4_fill_super() based on
the fact that we will recalculate the checksum on superblock
writeout.

That is correct assumption but until the writeout happens (which can
take a long time) the checksum is incorrect in the buffer cache and if
programs such as tune2fs or resize2fs is called shortly after a file
system is mounted can fail.  So return back the checksum recalculation
and add a comment explaining why.

Fixes: 81414b4d ("ext4: remove redundant sb checksum recomputation")
Cc: stable@kernel.org
Reported-by: NBoyang Xue <bxue@redhat.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20210812124737.21981-1-jack@suse.czSigned-off-by: NZheng Liang <zhengliang6@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

b939c55f

ima: Fix warning: no previous prototype for function 'ima_add_kexec_buffer' · e64530ee

由 Lakshmi Ramasubramanian 提交于 11月 30, 2021

mainline inclusion
from mainline-5.14
commit: c6791349
category: bugfix
bugzilla: 182971 https://gitee.com/openeuler/kernel/issues/I4DDEL

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c67913492fec317bc53ffdff496b6ba856d2868c

---------------------------

The function prototype for ima_add_kexec_buffer() is present
in 'linux/ima.h'.  But this header file is not included in
ima_kexec.c where the function is implemented.  This results
in the following compiler warning when "-Wmissing-prototypes" flag
is turned on:

  security/integrity/ima/ima_kexec.c:81:6: warning: no previous prototype
  for function 'ima_add_kexec_buffer' [-Wmissing-prototypes]

Include the header file 'linux/ima.h' in ima_kexec.c to fix
the compiler warning.

Fixes: dce92f6b (arm64: Enable passing IMA log to next kernel on kexec)
Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NLakshmi Ramasubramanian <nramas@linux.microsoft.com>
Acked-by: NRob Herring <robh@kernel.org>
Signed-off-by: NMimi Zohar <zohar@linux.ibm.com>
Signed-off-by: NGuo Zihua <guozihua@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

e64530ee

openeuler / Kernel 大约 2 年 前同步成功

openeuler / Kernel
大约 2 年前同步成功