提交 · ce16dc8c4fc9586c7aa67e39b869f90999f93c1c · openeuler / Kernel

03 7月, 2021 14 次提交

proc: Track /proc/$pid/attr/ opener mm_struct · ce16dc8c

由 Kees Cook 提交于 6月 18, 2021

stable inclusion
from stable-5.10.44
commit f70102cb369cde6ab7551ca58152d00fd3478fec
bugzilla: 109295
CVE: NA

--------------------------------

commit 591a22c1 upstream.

Commit bfb819ea ("proc: Check /proc/$pid/attr/ writes against file opener")
tried to make sure that there could not be a confusion between the opener of
a /proc/$pid/attr/ file and the writer. It used struct cred to make sure
the privileges didn't change. However, there were existing cases where a more
privileged thread was passing the opened fd to a differently privileged thread
(during container setup). Instead, use mm_struct to track whether the opener
and writer are still the same process. (This is what several other proc files
already do, though for different reasons.)
Reported-by: NChristian Brauner <christian.brauner@ubuntu.com>
Reported-by: NAndrea Righi <andrea.righi@canonical.com>
Tested-by: NAndrea Righi <andrea.righi@canonical.com>
Fixes: bfb819ea ("proc: Check /proc/$pid/attr/ writes against file opener")
Cc: stable@vger.kernel.org
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

ce16dc8c

mtd: mtd_blkdevs: Initialize rq.limits.discard_granularity · 6aea4216

由 Zhihao Cheng 提交于 6月 15, 2021

hulk inclusion
category: bugfix
bugzilla: 109283
CVE: NA

-------------------------------------------------

Since commit b35fd742("block: check queue's limits.discard_granularity
in __blkdev_issue_discard()") checks rq.limits.discard_granularity in
__blkdev_issue_discard(), we may get following warnings on formatted ftl:

  WARNING: CPU: 2 PID: 7313 at block/blk-lib.c:51
  __blkdev_issue_discard+0x2a7/0x390

Reproducer:
  1. ftl_format /dev/mtd0
  2. modprobe ftl
  3. mkfs.vfat /dev/ftla
  4. mount -odiscard /dev/ftla temp
  5. dd if=/dev/zero of=temp/tst bs=1M count=10 oflag=direct
  6. dd if=/dev/zero of=temp/tst bs=1M count=10 oflag=direct

Fix it by initializing rq.limits.discard_granularity if device supports
discard operation.
Signed-off-by: NZhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: NTao Hou <houtao1@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

6aea4216

block, bfq: set next_rq to waker_bfqq->next_rq in waker injection · ccc60759

由 Jia Cheng Hu 提交于 6月 15, 2021

mainline inclusion
from mainline-v5.12-rc1
commit d4fc3640
category: bugfix
bugzilla: 107810
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d4fc3640ff361a09e359867e0bca898abd2b7ecb

-----------------------------------------------

Since commit c5089591c3ba ("block, bfq: detect wakers and
unconditionally inject their I/O"), when the in-service bfq_queue, say
Q, is temporarily empty, BFQ checks whether there are I/O requests to
inject (also) from the waker bfq_queue for Q. To this goal, the value
pointed by bfqq->waker_bfqq->next_rq must be controlled. However, the
current implementation mistakenly looks at bfqq->next_rq, which
instead points to the next request of the currently served queue.

This mistake evidently causes losses of throughput in scenarios with
waker bfq_queues.

This commit corrects this mistake.

Fixes: c5089591c3ba ("block, bfq: detect wakers and unconditionally inject their I/O")
Signed-off-by: NJia Cheng Hu <jia.jiachenghu@gmail.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NYufen Yu <yuyufen@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

ccc60759

bdev: Do not return EBUSY if bdev discard races with write · d54cb0be

由 Jan Kara 提交于 6月 10, 2021

mainline inclusion
from mainline-5.12-rc1
commit 	767630c6
category: bugfix
bugzilla: 107770
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=767630c63bb23acf022adb265574996ca39a4645

-------------------------------------------------

blkdev_fallocate() tries to detect whether a discard raced with an
overlapping write by calling invalidate_inode_pages2_range(). However
this check can give both false negatives (when writing using direct IO
or when writeback already writes out the written pagecache range) and
false positives (when write is not actually overlapping but ends in the
same page when blocksize < pagesize). This actually causes issues for
qemu which is getting confused by EBUSY errors.

Fix the problem by removing this conflicting write detection since it is
inherently racy and thus of little use anyway.
Reported-by: NMaxim Levitsky <mlevitsk@redhat.com>
CC: "Darrick J. Wong" <darrick.wong@oracle.com>
Link: https://lore.kernel.org/qemu-devel/20201111153913.41840-1-mlevitsk@redhat.comSigned-off-by: NJan Kara <jack@suse.cz>
Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NBaokun Li <libaokun1@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

d54cb0be

powerpc/perf: Invoke per-CPU variable access with disabled interrupts · 164b8c99

由 Athira Rajeev 提交于 6月 10, 2021

mainline inclusion
from mainline-5.11-rc1
commit f66de7ac
category: bugfix
bugzilla: 108586
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f66de7ac4849eb42a7b18e26b8ee49e08130fd27

---------------------------

The power_pmu_event_init() callback access per-cpu variable
(cpu_hw_events) to check for event constraints and Branch Stack
(BHRB). Current usage is to disable preemption when accessing the
per-cpu variable, but this does not prevent timer callback from
interrupting event_init. Fix this by using 'local_irq_save/restore'
to make sure the code path is invoked with disabled interrupts.

This change is tested in mambo simulator to ensure that, if a timer
interrupt comes in during the per-cpu access in event_init, it will be
soft masked and replayed later. For testing purpose, introduced a
udelay() in power_pmu_event_init() to make sure a timer interrupt arrives
while in per-cpu variable access code between local_irq_save/resore.
As expected the timer interrupt was replayed later during local_irq_restore
called from power_pmu_event_init. This was confirmed by adding
breakpoint in mambo and checking the backtrace when timer_interrupt
was hit.
Reported-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: NAthira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1606814880-1720-1-git-send-email-atrajeev@linux.vnet.ibm.comSigned-off-by: NYang Jihong <yangjihong1@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

164b8c99

perf annotate: Fix jump parsing for C++ code. · 0a32c351

由 Martin Liška 提交于 6月 10, 2021

mainline inclusion
from mainline-5.12-rc1
commit 1f0e6edc
category: bugfix
bugzilla: 107381
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1f0e6edcd968ff19211245f7da6039e983aa51e5

---------------------------

Considering the following testcase:

  int
  foo(int a, int b)
  {
     for (unsigned i = 0; i < 1000000000; i++)
       a += b;
     return a;
  }

  int main()
  {
     foo (3, 4);
     return 0;
  }

'perf annotate' displays:

  86.52 │40055e: → ja   40056c <foo(int, int)+0x26>
  13.37 │400560:   mov  -0x18(%rbp),%eax
        │400563:   add  %eax,-0x14(%rbp)
        │400566:   addl $0x1,-0x4(%rbp)
   0.11 │40056a: → jmp  400557 <foo(int, int)+0x11>
        │40056c:   mov  -0x14(%rbp),%eax
        │40056f:   pop  %rbp

and the 'ja 40056c' does not link to the location in the function.  It's
caused by fact that comma is wrongly parsed, it's part of function
signature.

With my patch I see:

  86.52 │   ┌──ja   26
  13.37 │   │  mov  -0x18(%rbp),%eax
        │   │  add  %eax,-0x14(%rbp)
        │   │  addl $0x1,-0x4(%rbp)
   0.11 │   │↑ jmp  11
        │26:└─→mov  -0x14(%rbp),%eax

and 'o' output prints:

  86.52 │4005┌── ↓ ja   40056c <foo(int, int)+0x26>
  13.37 │4005│0:   mov  -0x18(%rbp),%eax
        │4005│3:   add  %eax,-0x14(%rbp)
        │4005│6:   addl $0x1,-0x4(%rbp)
   0.11 │4005│a: ↑ jmp  400557 <foo(int, int)+0x11>
        │4005└─→   mov  -0x14(%rbp),%eax

On the contrary, compiling the very same file with gcc -x c, the parsing
is fine because function arguments are not displayed:

  jmp  400543 <foo+0x1d>

Committer testing:

Before:

  $ cat cpp_args_annotate.c
  int
  foo(int a, int b)
  {
     for (unsigned i = 0; i < 1000000000; i++)
       a += b;
     return a;
  }

  int main()
  {
     foo (3, 4);
     return 0;
  }
  $ gcc --version |& head -1
  gcc (GCC) 10.2.1 20201125 (Red Hat 10.2.1-9)
  $ gcc -g cpp_args_annotate.c -o cpp_args_annotate
  $ perf record ./cpp_args_annotate
  [ perf record: Woken up 2 times to write data ]
  [ perf record: Captured and wrote 0.275 MB perf.data (7188 samples) ]
  $ perf annotate --stdio2 foo
  Samples: 7K of event 'cycles:u', 4000 Hz, Event count (approx.): 7468429289, [percent: local period]
  foo() /home/acme/c/cpp_args_annotate
  Percent
              0000000000401106 <foo>:
              foo():
              int
              foo(int a, int b)
              {
                push %rbp
                mov  %rsp,%rbp
                mov  %edi,-0x14(%rbp)
                mov  %esi,-0x18(%rbp)
              for (unsigned i = 0; i < 1000000000; i++)
                movl $0x0,-0x4(%rbp)
              ↓ jmp  1d
              a += b;
   13.45  13:   mov  -0x18(%rbp),%eax
                add  %eax,-0x14(%rbp)
              for (unsigned i = 0; i < 1000000000; i++)
                addl $0x1,-0x4(%rbp)
    0.09  1d:   cmpl $0x3b9ac9ff,-0x4(%rbp)
   86.46      ↑ jbe  13
              return a;
                mov  -0x14(%rbp),%eax
              }
                pop  %rbp
              ← retq
  $

I.e. works for C, now lets switch to C++:

  $ g++ -g cpp_args_annotate.c -o cpp_args_annotate
  $ perf record ./cpp_args_annotate
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.268 MB perf.data (6976 samples) ]
  $ perf annotate --stdio2 foo
  Samples: 6K of event 'cycles:u', 4000 Hz, Event count (approx.): 7380681761, [percent: local period]
  foo() /home/acme/c/cpp_args_annotate
  Percent
              0000000000401106 <foo(int, int)>:
              foo(int, int):
              int
              foo(int a, int b)
              {
                push %rbp
                mov  %rsp,%rbp
                mov  %edi,-0x14(%rbp)
                mov  %esi,-0x18(%rbp)
              for (unsigned i = 0; i < 1000000000; i++)
                movl $0x0,-0x4(%rbp)
                cmpl $0x3b9ac9ff,-0x4(%rbp)
   86.53      → ja   40112c <foo(int, int)+0x26>
              a += b;
   13.32        mov  -0x18(%rbp),%eax
    0.00        add  %eax,-0x14(%rbp)
              for (unsigned i = 0; i < 1000000000; i++)
                addl $0x1,-0x4(%rbp)
    0.15      → jmp  401117 <foo(int, int)+0x11>
              return a;
                mov  -0x14(%rbp),%eax
              }
                pop  %rbp
              ← retq
  $

Reproduced.

Now with this patch:

Reusing the C++ built binary, as we can see here:

  $ readelf -wi cpp_args_annotate | grep producer
    <c>   DW_AT_producer    : (indirect string, offset: 0x2e): GNU C++14 10.2.1 20201125 (Red Hat 10.2.1-9) -mtune=generic -march=x86-64 -g
  $

And furthermore:

  $ file cpp_args_annotate
  cpp_args_annotate: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=4fe3cab260204765605ec630d0dc7a7e93c361a9, for GNU/Linux 3.2.0, with debug_info, not stripped
  $ perf buildid-list -i cpp_args_annotate
  4fe3cab260204765605ec630d0dc7a7e93c361a9
  $ perf buildid-list | grep cpp_args_annotate
  4fe3cab260204765605ec630d0dc7a7e93c361a9 /home/acme/c/cpp_args_annotate
  $

It now works:

  $ perf annotate --stdio2 foo
  Samples: 6K of event 'cycles:u', 4000 Hz, Event count (approx.): 7380681761, [percent: local period]
  foo() /home/acme/c/cpp_args_annotate
  Percent
              0000000000401106 <foo(int, int)>:
              foo(int, int):
              int
              foo(int a, int b)
              {
                push %rbp
                mov  %rsp,%rbp
                mov  %edi,-0x14(%rbp)
                mov  %esi,-0x18(%rbp)
              for (unsigned i = 0; i < 1000000000; i++)
                movl $0x0,-0x4(%rbp)
          11:   cmpl $0x3b9ac9ff,-0x4(%rbp)
   86.53      ↓ ja   26
              a += b;
   13.32        mov  -0x18(%rbp),%eax
    0.00        add  %eax,-0x14(%rbp)
              for (unsigned i = 0; i < 1000000000; i++)
                addl $0x1,-0x4(%rbp)
    0.15      ↑ jmp  11
              return a;
          26:   mov  -0x14(%rbp),%eax
              }
                pop  %rbp
              ← retq
  $
Signed-off-by: NMartin Liška <mliska@suse.cz>
Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Link: http://lore.kernel.org/lkml/13e1a405-edf9-e4c2-4327-a9b454353730@suse.czSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NYang Jihong <yangjihong1@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

0a32c351

perf tools: Fix arm64 build error with gcc-11 · e97ad706

由 Jianlin Lv 提交于 6月 10, 2021

mainline inclusion
from mainline-5.12-rc1
commit 06701297
category: bugfix
bugzilla: 107300
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=067012974c8ae31a8886046df082aeba93592972

---------------------------

gcc version: 11.0.0 20210208 (experimental) (GCC)

Following build error on arm64:

.......
In function ‘printf’,
    inlined from ‘regs_dump__printf’ at util/session.c:1141:3,
    inlined from ‘regs__printf’ at util/session.c:1169:2:
/usr/include/aarch64-linux-gnu/bits/stdio2.h:107:10: \
  error: ‘%-5s’ directive argument is null [-Werror=format-overflow=]

107 |   return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, \
                __va_arg_pack ());

......
In function ‘fprintf’,
  inlined from ‘perf_sample__fprintf_regs.isra’ at \
    builtin-script.c:622:14:
/usr/include/aarch64-linux-gnu/bits/stdio2.h:100:10: \
    error: ‘%5s’ directive argument is null [-Werror=format-overflow=]
  100 |   return __fprintf_chk (__stream, __USE_FORTIFY_LEVEL - 1, __fmt,
  101 |                         __va_arg_pack ());

cc1: all warnings being treated as errors
.......

This patch fixes Wformat-overflow warnings. Add helper function to
convert NULL to "unknown".
Signed-off-by: NJianlin Lv <Jianlin.Lv@arm.com>
Reviewed-by: NJohn Garry <john.garry@huawei.com>
Acked-by: NJiri Olsa <jolsa@redhat.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: iecedge@gmail.com
Cc: linux-csky@vger.kernel.org
Cc: linux-riscv@lists.infradead.org
Link: http://lore.kernel.org/lkml/20210218031245.2078492-1-Jianlin.Lv@arm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NYang Jihong <yangjihong1@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

e97ad706

perf record: Fix memory leak in vDSO found using ASAN · 4f4a0593

由 Namhyung Kim 提交于 6月 10, 2021

mainline inclusion
from mainline-5.12-rc5
commit 41d58541
category: bugfix
bugzilla: 107103
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=41d585411311abf187e5f09042978fe7073a9375

---------------------------

I got several memory leak reports from Asan with a simple command.  It
was because VDSO is not released due to the refcount.  Like in
__dsos_addnew_id(), it should put the refcount after adding to the list.

  $ perf record true
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.030 MB perf.data (10 samples) ]

  =================================================================
  ==692599==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 439 byte(s) in 1 object(s) allocated from:
    #0 0x7fea52341037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x559bce4aa8ee in dso__new_id util/dso.c:1256
    #2 0x559bce59245a in __machine__addnew_vdso util/vdso.c:132
    #3 0x559bce59245a in machine__findnew_vdso util/vdso.c:347
    #4 0x559bce50826c in map__new util/map.c:175
    #5 0x559bce503c92 in machine__process_mmap2_event util/machine.c:1787
    #6 0x559bce512f6b in machines__deliver_event util/session.c:1481
    #7 0x559bce515107 in perf_session__deliver_event util/session.c:1551
    #8 0x559bce51d4d2 in do_flush util/ordered-events.c:244
    #9 0x559bce51d4d2 in __ordered_events__flush util/ordered-events.c:323
    #10 0x559bce519bea in __perf_session__process_events util/session.c:2268
    #11 0x559bce519bea in perf_session__process_events util/session.c:2297
    #12 0x559bce2e7a52 in process_buildids /home/namhyung/project/linux/tools/perf/builtin-record.c:1017
    #13 0x559bce2e7a52 in record__finish_output /home/namhyung/project/linux/tools/perf/builtin-record.c:1234
    #14 0x559bce2ed4f6 in __cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2026
    #15 0x559bce2ed4f6 in cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2858
    #16 0x559bce422db4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #17 0x559bce2acac8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #18 0x559bce2acac8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #19 0x559bce2acac8 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #20 0x7fea51e76d09 in __libc_start_main ../csu/libc-start.c:308

  Indirect leak of 32 byte(s) in 1 object(s) allocated from:
    #0 0x7fea52341037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x559bce520907 in nsinfo__copy util/namespaces.c:169
    #2 0x559bce50821b in map__new util/map.c:168
    #3 0x559bce503c92 in machine__process_mmap2_event util/machine.c:1787
    #4 0x559bce512f6b in machines__deliver_event util/session.c:1481
    #5 0x559bce515107 in perf_session__deliver_event util/session.c:1551
    #6 0x559bce51d4d2 in do_flush util/ordered-events.c:244
    #7 0x559bce51d4d2 in __ordered_events__flush util/ordered-events.c:323
    #8 0x559bce519bea in __perf_session__process_events util/session.c:2268
    #9 0x559bce519bea in perf_session__process_events util/session.c:2297
    #10 0x559bce2e7a52 in process_buildids /home/namhyung/project/linux/tools/perf/builtin-record.c:1017
    #11 0x559bce2e7a52 in record__finish_output /home/namhyung/project/linux/tools/perf/builtin-record.c:1234
    #12 0x559bce2ed4f6 in __cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2026
    #13 0x559bce2ed4f6 in cmd_record /home/namhyung/project/linux/tools/perf/builtin-record.c:2858
    #14 0x559bce422db4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:313
    #15 0x559bce2acac8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:365
    #16 0x559bce2acac8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:409
    #17 0x559bce2acac8 in main /home/namhyung/project/linux/tools/perf/perf.c:539
    #18 0x7fea51e76d09 in __libc_start_main ../csu/libc-start.c:308

  SUMMARY: AddressSanitizer: 471 byte(s) leaked in 2 allocation(s).
Signed-off-by: NNamhyung Kim <namhyung@kernel.org>
Acked-by: NJiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210315045641.700430-1-namhyung@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NYang Jihong <yangjihong1@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

4f4a0593

perf parse-events: Check if the software events array slots are populated · 7ec638a2

由 Arnaldo Carvalho de Melo 提交于 6月 10, 2021

mainline inclusion
from mainline-5.13-rc3
commit 3b2f17ad
category: bugfix
bugzilla: 78520
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3b2f17ad1770e51b8b4e68b5069c4f1ee477eff8

---------------------------

To avoid a NULL pointer dereference when the kernel supports the new
feature but the tooling still hasn't an entry for it.

This happened with the recently added PERF_COUNT_SW_CGROUP_SWITCHES
software event.
Reported-by: NThomas Richter <tmricht@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Link: https://lore.kernel.org/linux-perf-users/YKVESEKRjKtILhog@kernel.org/Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NYang Jihong <yangjihong1@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

7ec638a2

perf symbol-elf: Fix memory leak by freeing sdt_note.args · 76421a96

由 Riccardo Mancini 提交于 6月 10, 2021

mainline inclusion
from mainline-5.13-rc4
commit 69c9ffed
category: bugfix
bugzilla: 78200
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=69c9ffed6cede9c11697861f654946e3ae95a930

---------------------------

Reported by ASan.
Signed-off-by: NRiccardo Mancini <rickyman7@gmail.com>
Acked-by: NIan Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Fabian Hemmer <copy@copy.sh>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Remi Bernon <rbernon@codeweavers.com>
Cc: Jiri Slaby <jirislaby@kernel.org>
Link: http://lore.kernel.org/lkml/20210602220833.285226-1-rickyman7@gmail.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NYang Jihong <yangjihong1@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

76421a96

perf env: Fix memory leak of bpf_prog_info_linear member · 245506ad

由 Riccardo Mancini 提交于 6月 10, 2021

mainline inclusion
from mainline-5.13-rc4
commit 67069a1f
category: bugfix
bugzilla: 78194
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=67069a1f0fe5f9eeca86d954fff2087f5542a008

---------------------------

ASan reported a memory leak caused by info_linear not being deallocated.

The info_linear was allocated during in perf_event__synthesize_one_bpf_prog().

This patch adds the corresponding free() when bpf_prog_info_node
is freed in perf_env__purge_bpf().

  $ sudo ./perf record -- sleep 5
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.025 MB perf.data (8 samples) ]

  =================================================================
  ==297735==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 7688 byte(s) in 19 object(s) allocated from:
      #0 0x4f420f in malloc (/home/user/linux/tools/perf/perf+0x4f420f)
      #1 0xc06a74 in bpf_program__get_prog_info_linear /home/user/linux/tools/lib/bpf/libbpf.c:11113:16
      #2 0xb426fe in perf_event__synthesize_one_bpf_prog /home/user/linux/tools/perf/util/bpf-event.c:191:16
      #3 0xb42008 in perf_event__synthesize_bpf_events /home/user/linux/tools/perf/util/bpf-event.c:410:9
      #4 0x594596 in record__synthesize /home/user/linux/tools/perf/builtin-record.c:1490:8
      #5 0x58c9ac in __cmd_record /home/user/linux/tools/perf/builtin-record.c:1798:8
      #6 0x58990b in cmd_record /home/user/linux/tools/perf/builtin-record.c:2901:8
      #7 0x7b2a20 in run_builtin /home/user/linux/tools/perf/perf.c:313:11
      #8 0x7b12ff in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8
      #9 0x7b2583 in run_argv /home/user/linux/tools/perf/perf.c:409:2
      #10 0x7b0d79 in main /home/user/linux/tools/perf/perf.c:539:3
      #11 0x7fa357ef6b74 in __libc_start_main /usr/src/debug/glibc-2.33-8.fc34.x86_64/csu/../csu/libc-start.c:332:16
Signed-off-by: NRiccardo Mancini <rickyman7@gmail.com>
Acked-by: NIan Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Link: http://lore.kernel.org/lkml/20210602224024.300485-1-rickyman7@gmail.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NYang Jihong <yangjihong1@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

245506ad

scsi: iscsi: Fix iSCSI cls conn state · 3e776ff4

由 Mike Christie 提交于 6月 11, 2021

mainline inclusion
from mainline-v5.12-rc8
commit 0dcf8feb
category: bugfix
bugzilla: 107093
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0dcf8febcb7b9d42bec98bc068e01d1a6ea578b8

-----------------------------------------------

In commit 9e67600e ("scsi: iscsi: Fix race condition between login and
sync thread") I missed that libiscsi was now setting the iSCSI class state,
and that patch ended up resetting the state during conn stoppage and using
the wrong state value during ep_disconnect. This patch moves the setting of
the class state to the class module and then fixes the two issues above.

Link: https://lore.kernel.org/r/20210406171746.5016-1-michael.christie@oracle.com
Fixes: 9e67600e ("scsi: iscsi: Fix race condition between login and sync thread")
Cc: Gulam Mohamed <gulam.mohamed@oracle.com>
Signed-off-by: NMike Christie <michael.christie@oracle.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: NYufen Yu <yuyufen@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

3e776ff4

scsi: iscsi: Fix race condition between login and sync thread · 367d3770

由 Gulam Mohamed 提交于 6月 11, 2021

mainline inclusion
from mainline-v5.12-rc6
commit 9e67600e
category: bugfix
bugzilla: 107093
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9e67600ed6b8565da4b85698ec659b5879a6c1c6

-----------------------------------------------

A kernel panic was observed due to a timing issue between the sync thread
and the initiator processing a login response from the target. The session
reopen can be invoked both from the session sync thread when iscsid
restarts and from iscsid through the error handler. Before the initiator
receives the response to a login, another reopen request can be sent from
the error handler/sync session. When the initial login response is
subsequently processed, the connection has been closed and the socket has
been released.

To fix this a new connection state, ISCSI_CONN_BOUND, is added:

 - Set the connection state value to ISCSI_CONN_DOWN upon
   iscsi_if_ep_disconnect() and iscsi_if_stop_conn()

 - Set the connection state to the newly created value ISCSI_CONN_BOUND
   after bind connection (transport->bind_conn())

 - In iscsi_set_param(), return -ENOTCONN if the connection state is not
   either ISCSI_CONN_BOUND or ISCSI_CONN_UP

Link: https://lore.kernel.org/r/20210325093248.284678-1-gulam.mohamed@oracle.comReviewed-by: NMike Christie <michael.christie@oracle.com>
Signed-off-by: NGulam Mohamed <gulam.mohamed@oracle.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

index 91074fd97f64..f4bf62b007a0 100644
Signed-off-by: NYufen Yu <yuyufen@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

367d3770

Revert "perf kmem: Do not pass additional arguments · e441298f

由 Li Huafei 提交于 6月 09, 2021

to 'perf record'"

hulk inclusion
category: bugfix
bugzilla: 51797
CVE: NA

--------------------------------

This reverts commit 5549b4660d62946828db854252e5fb66e6007e88.

There should be no problem with mainline. Follow the prompts given
by 'perf kmem -h' to use it.
Signed-off-by: NLi Huafei <lihuafei1@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

e441298f

15 6月, 2021 26 次提交

neighbour: allow NUD_NOARP entries to be forced GCed · 0a2c3dde

由 David Ahern 提交于 6月 15, 2021

stable inclusion
from stable-5.10.43
commit d17d47da59f726dc4c87caebda3a50333d7e2fd3
bugzilla: 109284
CVE: NA

--------------------------------

commit 7a6b1ab7 upstream.

IFF_POINTOPOINT interfaces use NUD_NOARP entries for IPv6. It's possible to
fill up the neighbour table with enough entries that it will overflow for
valid connections after that.

This behaviour is more prevalent after commit 58956317 ("neighbor:
Improve garbage collection") is applied, as it prevents removal from
entries that are not NUD_FAILED, unless they are more than 5s old.

Fixes: 58956317 (neighbor: Improve garbage collection)
Reported-by: NKasper Dupont <kasperd@gjkwv.06.feb.2021.kasperd.net>
Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

0a2c3dde

xen-netback: take a reference to the RX task thread · 424eedfc

由 Roger Pau Monne 提交于 6月 15, 2021

stable inclusion
from stable-5.10.43
commit 6b53db8c4c14b4e7256f058d202908b54a7b85b4
bugzilla: 109284
CVE: NA

--------------------------------

commit 107866a8 upstream.

Do this in order to prevent the task from being freed if the thread
returns (which can be triggered by the frontend) before the call to
kthread_stop done as part of the backend tear down. Not taking the
reference will lead to a use-after-free in that scenario. Such
reference was taken before but dropped as part of the rework done in
2ac061ce.

Reintroduce the reference taking and add a comment this time
explaining why it's needed.

This is XSA-374 / CVE-2021-28691.

Fixes: 2ac061ce ('xen/netback: cleanup init and deinit code')
Signed-off-by: NRoger Pau Monné <roger.pau@citrix.com>
Cc: stable@vger.kernel.org
Reviewed-by: NJan Beulich <jbeulich@suse.com>
Reviewed-by: NJuergen Gross <jgross@suse.com>
Signed-off-by: NJuergen Gross <jgross@suse.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

424eedfc

netfilter: nf_tables: missing error reporting for not selected expressions · 9ec4c9eb

由 Pablo Neira Ayuso 提交于 6月 15, 2021

stable inclusion
from stable-5.10.43
commit 316de9a88c83c672c18d35bd76034d84e3769fe9
bugzilla: 109284
CVE: NA

--------------------------------

commit c781471d upstream.

Sometimes users forget to turn on nftables extensions from Kconfig that
they need. In such case, the error reporting from userspace is
misleading:

 $ sudo nft add rule x y counter
 Error: Could not process rule: No such file or directory
 add rule x y counter
 ^^^^^^^^^^^^^^^^^^^^

Add missing NL_SET_BAD_ATTR() to provide a hint:

 $ nft add rule x y counter
 Error: Could not process rule: No such file or directory
 add rule x y counter
              ^^^^^^^

Fixes: 83d9dcba ("netfilter: nf_tables: extended netlink error reporting for expressions")
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

9ec4c9eb

i2c: qcom-geni: Suspend and resume the bus during SYSTEM_SLEEP_PM ops · 3c5fe07e

由 Roja Rani Yarubandi 提交于 6月 15, 2021

stable inclusion
from stable-5.10.43
commit eddf2d9f76b01201dd778f2d36d75b8050217cf7
bugzilla: 109284
CVE: NA

--------------------------------

commit 57648e86 upstream.

Mark bus as suspended during system suspend to block the future
transfers. Implement geni_i2c_resume_noirq() to resume the bus.

Fixes: 37692de5 ("i2c: i2c-qcom-geni: Add bus driver for the Qualcomm GENI I2C controller")
Signed-off-by: NRoja Rani Yarubandi <rojay@codeaurora.org>
Reviewed-by: NStephen Boyd <swboyd@chromium.org>
Signed-off-by: NWolfram Sang <wsa@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

3c5fe07e

lib/lz4: explicitly support in-place decompression · 49ac5146

由 Gao Xiang 提交于 6月 15, 2021

stable inclusion
from stable-5.10.43
commit f20eef4d068637dc48ed24887ebc7b1faa860ae5
bugzilla: 109284
CVE: NA

--------------------------------

commit 89b15863 upstream.

LZ4 final literal copy could be overlapped when doing
in-place decompression, so it's unsafe to just use memcpy()
on an optimized memcpy approach but memmove() instead.

Upstream LZ4 has updated this years ago [1] (and the impact
is non-sensible [2] plus only a few bytes remain), this commit
just synchronizes LZ4 upstream code to the kernel side as well.

It can be observed as EROFS in-place decompression failure
on specific files when X86_FEATURE_ERMS is unsupported,
memcpy() optimization of commit 59daa706 ("x86, mem:
Optimize memcpy by avoiding memory false dependece") will
be enabled then.

Currently most modern x86-CPUs support ERMS, these CPUs just
use "rep movsb" approach so no problem at all. However, it can
still be verified with forcely disabling ERMS feature...

arch/x86/lib/memcpy_64.S:
        ALTERNATIVE_2 "jmp memcpy_orig", "", X86_FEATURE_REP_GOOD, \
-                     "jmp memcpy_erms", X86_FEATURE_ERMS
+                     "jmp memcpy_orig", X86_FEATURE_ERMS

We didn't observe any strange on arm64/arm/x86 platform before
since most memcpy() would behave in an increasing address order
("copy upwards" [3]) and it's the correct order of in-place
decompression but it really needs an update to memmove() for sure
considering it's an undefined behavior according to the standard
and some unique optimization already exists in the kernel.

[1] https://github.com/lz4/lz4/commit/33cb8518ac385835cc17be9a770b27b40cd0e15b
[2] https://github.com/lz4/lz4/pull/717#issuecomment-497818921
[3] https://sourceware.org/bugzilla/show_bug.cgi?id=12518

Link: https://lkml.kernel.org/r/20201122030749.2698994-1-hsiangkao@redhat.comSigned-off-by: NGao Xiang <hsiangkao@redhat.com>
Reviewed-by: NNick Terrell <terrelln@fb.com>
Cc: Yann Collet <yann.collet.73@gmail.com>
Cc: Miao Xie <miaoxie@huawei.com>
Cc: Chao Yu <yuchao0@huawei.com>
Cc: Li Guifu <bluce.liguifu@huawei.com>
Cc: Guo Xuenan <guoxuenan@huawei.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NGao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

49ac5146

x86/kvm: Disable all PV features on crash · 1218cfc2

由 Vitaly Kuznetsov 提交于 6月 15, 2021

stable inclusion
from stable-5.10.43
commit 334c59d58de5faf449d9c9feaa8c50dd8b4046a7
bugzilla: 109284
CVE: NA

--------------------------------

commit 3d6b8413 upstream.

Crash shutdown handler only disables kvmclock and steal time, other PV
features remain active so we risk corrupting memory or getting some
side-effects in kdump kernel. Move crash handler to kvm.c and unify
with CPU offline.
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210414123544.1060604-5-vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NKrzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

1218cfc2

x86/kvm: Disable kvmclock on all CPUs on shutdown · 44b95584

由 Vitaly Kuznetsov 提交于 6月 15, 2021

stable inclusion
from stable-5.10.43
commit 3b0becf8b1ecf642a9edaf4c9628ffc641e490d6
bugzilla: 109284
CVE: NA

--------------------------------

commit c02027b5 upstream.

Currenly, we disable kvmclock from machine_shutdown() hook and this
only happens for boot CPU. We need to disable it for all CPUs to
guard against memory corruption e.g. on restore from hibernate.

Note, writing '0' to kvmclock MSR doesn't clear memory location, it
just prevents hypervisor from updating the location so for the short
while after write and while CPU is still alive, the clock remains usable
and correct so we don't need to switch to some other clocksource.
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210414123544.1060604-4-vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NAndrea Righi <andrea.righi@canonical.com>
Signed-off-by: NKrzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

44b95584

x86/kvm: Teardown PV features on boot CPU as well · a44ed230

由 Vitaly Kuznetsov 提交于 6月 15, 2021

stable inclusion
from stable-5.10.43
commit 38b858da1c58ad46519a257764e059e663b59ff2
bugzilla: 109284
CVE: NA

--------------------------------

commit 8b79feff upstream.

Various PV features (Async PF, PV EOI, steal time) work through memory
shared with hypervisor and when we restore from hibernation we must
properly teardown all these features to make sure hypervisor doesn't
write to stale locations after we jump to the previously hibernated kernel
(which can try to place anything there). For secondary CPUs the job is
already done by kvm_cpu_down_prepare(), register syscore ops to do
the same for boot CPU.

Krzysztof:
This fixes memory corruption visible after second resume from
hibernation:

  BUG: Bad page state in process dbus-daemon  pfn:18b01
  page:ffffea000062c040 refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 compound_mapcount: -30591
  flags: 0xfffffc0078141(locked|error|workingset|writeback|head|mappedtodisk|reclaim)
  raw: 000fffffc0078141 dead0000000002d0 dead000000000100 0000000000000000
  raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
  page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set
  bad because of flags: 0x78141(locked|error|workingset|writeback|head|mappedtodisk|reclaim)
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210414123544.1060604-3-vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NAndrea Righi <andrea.righi@canonical.com>
[krzysztof: Extend the commit message, adjust for v5.10 context]
Signed-off-by: NKrzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

a44ed230

KVM: arm64: Fix debug register indexing · 81baccea

由 Marc Zyngier 提交于 6月 15, 2021

stable inclusion
from stable-5.10.43
commit b327c97747595b462a003a11e6728ebd860cd285
bugzilla: 109284
CVE: NA

--------------------------------

commit cb853ded upstream.

Commit 03fdfb26 ("KVM: arm64: Don't write junk to sysregs on
reset") flipped the register number to 0 for all the debug registers
in the sysreg table, hereby indicating that these registers live
in a separate shadow structure.

However, the author of this patch failed to realise that all the
accessors are using that particular index instead of the register
encoding, resulting in all the registers hitting index 0. Not quite
a valid implementation of the architecture...

Address the issue by fixing all the accessors to use the CRm field
of the encoding, which contains the debug register index.

Fixes: 03fdfb26 ("KVM: arm64: Don't write junk to sysregs on reset")
Reported-by: NRicardo Koller <ricarkol@google.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

81baccea

KVM: SVM: Truncate GPR value for DR and CR accesses in !64-bit mode · 03033fda

由 Sean Christopherson 提交于 6月 15, 2021

stable inclusion
from stable-5.10.43
commit b3ee3f50ab1bf7b60ba4a8346dca05ba3412fead
bugzilla: 109284
CVE: NA

--------------------------------

commit 0884335a upstream.

Drop bits 63:32 on loads/stores to/from DRs and CRs when the vCPU is not
in 64-bit mode.  The APM states bits 63:32 are dropped for both DRs and
CRs:

  In 64-bit mode, the operand size is fixed at 64 bits without the need
  for a REX prefix. In non-64-bit mode, the operand size is fixed at 32
  bits and the upper 32 bits of the destination are forced to 0.

Fixes: 7ff76d58 ("KVM: SVM: enhance MOV CR intercept handler")
Fixes: cae3797a ("KVM: SVM: enhance mov DR intercept handler")
Cc: stable@vger.kernel.org
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210422022128.3464144-4-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

03033fda

btrfs: fix unmountable seed device after fstrim · 3ea08020

由 Anand Jain 提交于 6月 15, 2021

stable inclusion
from stable-5.10.43
commit fe910d20e2d8e0736bbea9c1efe6a49535e807ea
bugzilla: 109284
CVE: NA

--------------------------------

commit 5e753a81 upstream.

The following test case reproduces an issue of wrongly freeing in-use
blocks on the readonly seed device when fstrim is called on the rw sprout
device. As shown below.

Create a seed device and add a sprout device to it:

  $ mkfs.btrfs -fq -dsingle -msingle /dev/loop0
  $ btrfstune -S 1 /dev/loop0
  $ mount /dev/loop0 /btrfs
  $ btrfs dev add -f /dev/loop1 /btrfs
  BTRFS info (device loop0): relocating block group 290455552 flags system
  BTRFS info (device loop0): relocating block group 1048576 flags system
  BTRFS info (device loop0): disk added /dev/loop1
  $ umount /btrfs

Mount the sprout device and run fstrim:

  $ mount /dev/loop1 /btrfs
  $ fstrim /btrfs
  $ umount /btrfs

Now try to mount the seed device, and it fails:

  $ mount /dev/loop0 /btrfs
  mount: /btrfs: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error.

Block 5292032 is missing on the readonly seed device:

 $ dmesg -kt | tail
 <snip>
 BTRFS error (device loop0): bad tree block start, want 5292032 have 0
 BTRFS warning (device loop0): couldn't read-tree root
 BTRFS error (device loop0): open_ctree failed

>From the dump-tree of the seed device (taken before the fstrim). Block
5292032 belonged to the block group starting at 5242880:

  $ btrfs inspect dump-tree -e /dev/loop0 | grep -A1 BLOCK_GROUP
  <snip>
  item 3 key (5242880 BLOCK_GROUP_ITEM 8388608) itemoff 16169 itemsize 24
  	block group used 114688 chunk_objectid 256 flags METADATA
  <snip>

>From the dump-tree of the sprout device (taken before the fstrim).
fstrim used block-group 5242880 to find the related free space to free:

  $ btrfs inspect dump-tree -e /dev/loop1 | grep -A1 BLOCK_GROUP
  <snip>
  item 1 key (5242880 BLOCK_GROUP_ITEM 8388608) itemoff 16226 itemsize 24
  	block group used 32768 chunk_objectid 256 flags METADATA
  <snip>

BPF kernel tracing the fstrim command finds the missing block 5292032
within the range of the discarded blocks as below:

  kprobe:btrfs_discard_extent {
  	printf("freeing start %llu end %llu num_bytes %llu:\n",
  		arg1, arg1+arg2, arg2);
  }

  freeing start 5259264 end 5406720 num_bytes 147456
  <snip>

Fix this by avoiding the discard command to the readonly seed device.
Reported-by: NChris Murphy <lists@colorremedies.com>
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: NFilipe Manana <fdmanana@suse.com>
Signed-off-by: NAnand Jain <anand.jain@oracle.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NSudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

3ea08020

drm/msm/dpu: always use mdp device to scale bandwidth · 8e6eb409

由 Dmitry Baryshkov 提交于 6月 15, 2021

stable inclusion
from stable-5.10.43
commit 05e41f6f1c4e8c42edb9715b6629d9ab2af61064
bugzilla: 109284
CVE: NA

--------------------------------

commit a670ff57 upstream.

Currently DPU driver scales bandwidth and core clock for sc7180 only,
while the rest of chips get static bandwidth votes. Make all chipsets
scale bandwidth and clock per composition requirements like sc7180 does.
Drop old voting path completely.

Tested on RB3 (SDM845) and RB5 (SM8250).
Signed-off-by: NDmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://lore.kernel.org/r/20210401020533.3956787-2-dmitry.baryshkov@linaro.orgSigned-off-by: NRob Clark <robdclark@chromium.org>
Signed-off-by: NAmit Pundir <amit.pundir@linaro.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

8e6eb409

mm, hugetlb: fix simple resv_huge_pages underflow on UFFDIO_COPY · 82ac4173

由 Mina Almasry 提交于 6月 15, 2021

stable inclusion
from stable-5.10.43
commit 2eb4ec9c2c3535b9755c484183cc5c4d90fd37ff
bugzilla: 109284
CVE: NA

--------------------------------

[ Upstream commit d84cf06e ]

The userfaultfd hugetlb tests cause a resv_huge_pages underflow.  This
happens when hugetlb_mcopy_atomic_pte() is called with !is_continue on
an index for which we already have a page in the cache.  When this
happens, we allocate a second page, double consuming the reservation,
and then fail to insert the page into the cache and return -EEXIST.

To fix this, we first check if there is a page in the cache which
already consumed the reservation, and return -EEXIST immediately if so.

There is still a rare condition where we fail to copy the page contents
AND race with a call for hugetlb_no_page() for this index and again we
will underflow resv_huge_pages.  That is fixed in a more complicated
patch not targeted for -stable.

Test:

  Hacked the code locally such that resv_huge_pages underflows produce a
  warning, then:

  ./tools/testing/selftests/vm/userfaultfd hugetlb_shared 10
	2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success
  ./tools/testing/selftests/vm/userfaultfd hugetlb 10
	2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success

Both tests succeed and produce no warnings.  After the test runs number
of free/resv hugepages is correct.

[mike.kravetz@oracle.com: changelog fixes]

Link: https://lkml.kernel.org/r/20210528004649.85298-1-almasrymina@google.com
Fixes: 8fb5debc ("userfaultfd: hugetlbfs: add hugetlb_mcopy_atomic_pte for userfaultfd support")
Signed-off-by: NMina Almasry <almasrymina@google.com>
Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

82ac4173

btrfs: fix deadlock when cloning inline extents and low on available space · 7abe8761

由 Filipe Manana 提交于 6月 15, 2021

stable inclusion
from stable-5.10.43
commit baa6763123e2b63b8289943c7211ba0e3220432f
bugzilla: 109284
CVE: NA

--------------------------------

commit 76a6d5cd upstream.

There are a few cases where cloning an inline extent requires copying data
into a page of the destination inode. For these cases we are allocating
the required data and metadata space while holding a leaf locked. This can
result in a deadlock when we are low on available space because allocating
the space may flush delalloc and two deadlock scenarios can happen:

1) When starting writeback for an inode with a very small dirty range that
   fits in an inline extent, we deadlock during the writeback when trying
   to insert the inline extent, at cow_file_range_inline(), if the extent
   is going to be located in the leaf for which we are already holding a
   read lock;

2) After successfully starting writeback, for non-inline extent cases,
   the async reclaim thread will hang waiting for an ordered extent to
   complete if the ordered extent completion needs to modify the leaf
   for which the clone task is holding a read lock (for adding or
   replacing file extent items). So the cloning task will wait forever
   on the async reclaim thread to make progress, which in turn is
   waiting for the ordered extent completion which in turn is waiting
   to acquire a write lock on the same leaf.

So fix this by making sure we release the path (and therefore the leaf)
every time we need to copy the inline extent's data into a page of the
destination inode, as by that time we do not need to have the leaf locked.

Fixes: 05a5a762 ("Btrfs: implement full reflink support for inline extents")
CC: stable@vger.kernel.org # 5.10+
Signed-off-by: NFilipe Manana <fdmanana@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

7abe8761

btrfs: abort in rename_exchange if we fail to insert the second ref · 21714f16

由 Josef Bacik 提交于 6月 15, 2021

stable inclusion
from stable-5.10.43
commit 0df50d47d17401f9f140dfbe752a65e5d72f9932
bugzilla: 109284
CVE: NA

--------------------------------

commit dc09ef35 upstream.

Error injection stress uncovered a problem where we'd leave a dangling
inode ref if we failed during a rename_exchange.  This happens because
we insert the inode ref for one side of the rename, and then for the
other side.  If this second inode ref insert fails we'll leave the first
one dangling and leave a corrupt file system behind.  Fix this by
aborting if we did the insert for the first inode ref.

CC: stable@vger.kernel.org # 4.9+
Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

21714f16

btrfs: fixup error handling in fixup_inode_link_counts · f7a793f2

由 Josef Bacik 提交于 6月 15, 2021

stable inclusion
from stable-5.10.43
commit 48568f3944ee7357e8fed394804745bd981e978a
bugzilla: 109284
CVE: NA

--------------------------------

commit 011b28ac upstream.

This function has the following pattern

	while (1) {
		ret = whatever();
		if (ret)
			goto out;
	}
	ret = 0
out:
	return ret;

However several places in this while loop we simply break; when there's
a problem, thus clearing the return value, and in one case we do a
return -EIO, and leak the memory for the path.

Fix this by re-arranging the loop to deal with ret == 1 coming from
btrfs_search_slot, and then simply delete the

	ret = 0;
out:

bit so everybody can break if there is an error, which will allow for
proper error handling to occur.

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

f7a793f2

btrfs: return errors from btrfs_del_csums in cleanup_ref_head · 7ef76414

由 Josef Bacik 提交于 6月 15, 2021

stable inclusion
from stable-5.10.43
commit 466d83fdbbe345f3cfd5f7b2633f740ecad67853
bugzilla: 109284
CVE: NA

--------------------------------

commit 856bd270 upstream.

We are unconditionally returning 0 in cleanup_ref_head, despite the fact
that btrfs_del_csums could fail.  We need to return the error so the
transaction gets aborted properly, fix this by returning ret from
btrfs_del_csums in cleanup_ref_head.
Reviewed-by: NQu Wenruo <wqu@suse.com>
CC: stable@vger.kernel.org # 4.19+
Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

7ef76414

btrfs: fix error handling in btrfs_del_csums · b71bc6fb

由 Josef Bacik 提交于 6月 15, 2021

stable inclusion
from stable-5.10.43
commit 5a89982fa2bba459b82323655df986945a853bbe
bugzilla: 109284
CVE: NA

--------------------------------

commit b86652be upstream.

Error injection stress would sometimes fail with checksums on disk that
did not have a corresponding extent.  This occurred because the pattern
in btrfs_del_csums was

	while (1) {
		ret = btrfs_search_slot();
		if (ret < 0)
			break;
	}
	ret = 0;
out:
	btrfs_free_path(path);
	return ret;

If we got an error from btrfs_search_slot we'd clear the error because
we were breaking instead of goto out.  Instead of using goto out, simply
handle the cases where we may leave a random value in ret, and get rid
of the

	ret = 0;
out:

pattern and simply allow break to have the proper error reporting.  With
this fix we properly abort the transaction and do not commit thinking we
successfully deleted the csum.
Reviewed-by: NQu Wenruo <wqu@suse.com>
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

b71bc6fb

btrfs: mark ordered extent and inode with error if we fail to finish · d005f5ff

由 Josef Bacik 提交于 6月 15, 2021

stable inclusion
from stable-5.10.43
commit b547a16b24918edd63042f9d81c0d310212d2e94
bugzilla: 109284
CVE: NA

--------------------------------

commit d61bec08 upstream.

While doing error injection testing I saw that sometimes we'd get an
abort that wouldn't stop the current transaction commit from completing.
This abort was coming from finish ordered IO, but at this point in the
transaction commit we should have gotten an error and stopped.

It turns out the abort came from finish ordered io while trying to write
out the free space cache.  It occurred to me that any failure inside of
finish_ordered_io isn't actually raised to the person doing the writing,
so we could have any number of failures in this path and think the
ordered extent completed successfully and the inode was fine.

Fix this by marking the ordered extent with BTRFS_ORDERED_IOERR, and
marking the mapping of the inode with mapping_set_error, so any callers
that simply call fdatawait will also get the error.

With this we're seeing the IO error on the free space inode when we fail
to do the finish_ordered_io.

CC: stable@vger.kernel.org # 4.19+
Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

d005f5ff

powerpc/kprobes: Fix validation of prefixed instructions across page boundary · 72c88ac7

由 Naveen N. Rao 提交于 6月 15, 2021

stable inclusion
from stable-5.10.43
commit 5e5e63bacbe8f1ef9688e7804275eb88cf0be51a
bugzilla: 109284
CVE: NA

--------------------------------

commit 82123a3d upstream.

When checking if the probed instruction is the suffix of a prefixed
instruction, we access the instruction at the previous word. If the
probed instruction is the very first word of a module, we can end up
trying to access an invalid page.

Fix this by skipping the check for all instructions at the beginning of
a page. Prefixed instructions cannot cross a 64-byte boundary and as
such, we don't expect to encounter a suffix as the very first word in a
page for kernel text. Even if there are prefixed instructions crossing
a page boundary (from a module, for instance), the instruction will be
illegal, so preventing probing on the suffix of such prefix instructions
isn't worthwhile.

Fixes: b4657f76 ("powerpc/kprobes: Don't allow breakpoints on suffixes")
Cc: stable@vger.kernel.org # v5.8+
Reported-by: NChristophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0df9a032a05576a2fa8e97d1b769af2ff0eafbd6.1621416666.git.naveen.n.rao@linux.vnet.ibm.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

72c88ac7

x86/apic: Mark _all_ legacy interrupts when IO/APIC is missing · 2c048ec2

由 Thomas Gleixner 提交于 6月 15, 2021

stable inclusion
from stable-5.10.43
commit 42f75a4381a4ffb1b7488f90c657ea0b5461d3b7
bugzilla: 109284
CVE: NA

--------------------------------

commit 7d65f9e8 upstream.

PIC interrupts do not support affinity setting and they can end up on
any online CPU. Therefore, it's required to mark the associated vectors
as system-wide reserved. Otherwise, the corresponding irq descriptors
are copied to the secondary CPUs but the vectors are not marked as
assigned or reserved. This works correctly for the IO/APIC case.

When the IO/APIC is disabled via config, kernel command line or lack of
enumeration then all legacy interrupts are routed through the PIC, but
nothing marks them as system-wide reserved vectors.

As a consequence, a subsequent allocation on a secondary CPU can result in
allocating one of these vectors, which triggers the BUG() in
apic_update_vector() because the interrupt descriptor slot is not empty.

Imran tried to work around that by marking those interrupts as allocated
when a CPU comes online. But that's wrong in case that the IO/APIC is
available and one of the legacy interrupts, e.g. IRQ0, has been switched to
PIC mode because then marking them as allocated will fail as they are
already marked as system vectors.

Stay consistent and update the legacy vectors after attempting IO/APIC
initialization and mark them as system vectors in case that no IO/APIC is
available.

Fixes: 69cde000 ("x86/vector: Use matrix allocator for vector assignment")
Reported-by: NImran Khan <imran.f.khan@oracle.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210519233928.2157496-1-imran.f.khan@oracle.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

2c048ec2

drm/amdgpu: make sure we unpin the UVD BO · 328724d3

由 Nirmoy Das 提交于 6月 15, 2021

stable inclusion
from stable-5.10.43
commit 3a6b69221f96f87c680bbc9fba01db3415b18f27
bugzilla: 109284
CVE: NA

--------------------------------

commit 07438603 upstream.

Releasing pinned BOs is illegal now. UVD 6 was missing from:
commit 2f40801d ("drm/amdgpu: make sure we unpin the UVD BO")

Fixes: 2f40801d ("drm/amdgpu: make sure we unpin the UVD BO")
Cc: stable@vger.kernel.org
Signed-off-by: NNirmoy Das <nirmoy.das@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

328724d3

drm/amdgpu: Don't query CE and UE errors · 11bdbef4

由 Luben Tuikov 提交于 6月 15, 2021

stable inclusion
from stable-5.10.43
commit 58da0b509e4b8f4a3a4b1b2e23871d108f81338a
bugzilla: 109284
CVE: NA

--------------------------------

commit dce3d8e1 upstream.

On QUERY2 IOCTL don't query counts of correctable
and uncorrectable errors, since when RAS is
enabled and supported on Vega20 server boards,
this takes insurmountably long time, in O(n^3),
which slows the system down to the point of it
being unusable when we have GUI up.

Fixes: ae363a21 ("drm/amdgpu: Add a new flag to AMDGPU_CTX_OP_QUERY_STATE2")
Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Reviewed-by: NAlexander Deucher <Alexander.Deucher@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

11bdbef4

nfc: fix NULL ptr dereference in llcp_sock_getname() after failed connect · 31c01c41

由 Krzysztof Kozlowski 提交于 6月 15, 2021

stable inclusion
from stable-5.10.43
commit 48ee0db61c8299022ec88c79ad137f290196cac2
bugzilla: 109284
CVE: NA

--------------------------------

commit 4ac06a1e upstream.

It's possible to trigger NULL pointer dereference by local unprivileged
user, when calling getsockname() after failed bind() (e.g. the bind
fails because LLCP_SAP_MAX used as SAP):

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  CPU: 1 PID: 426 Comm: llcp_sock_getna Not tainted 5.13.0-rc2-next-20210521+ #9
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1 04/01/2014
  Call Trace:
   llcp_sock_getname+0xb1/0xe0
   __sys_getpeername+0x95/0xc0
   ? lockdep_hardirqs_on_prepare+0xd5/0x180
   ? syscall_enter_from_user_mode+0x1c/0x40
   __x64_sys_getpeername+0x11/0x20
   do_syscall_64+0x36/0x70
   entry_SYSCALL_64_after_hwframe+0x44/0xae

This can be reproduced with Syzkaller C repro (bind followed by
getpeername):
https://syzkaller.appspot.com/x/repro.c?x=14def446e00000

Cc: <stable@vger.kernel.org>
Fixes: d646960f ("NFC: Initial LLCP support")
Reported-by: syzbot+80fb126e7f7d8b1a5914@syzkaller.appspotmail.com
Reported-by: Nbutt3rflyh4ck <butterflyhuangxx@gmail.com>
Signed-off-by: NKrzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Link: https://lore.kernel.org/r/20210531072138.5219-1-krzysztof.kozlowski@canonical.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

31c01c41

x86/sev: Check SME/SEV support in CPUID first · bbcd90c0

由 Pu Wen 提交于 6月 15, 2021

stable inclusion
from stable-5.10.43
commit 445477e9274efd08459b7ccf19578a63c3596082
bugzilla: 109284
CVE: NA

--------------------------------

commit 009767db upstream.

The first two bits of the CPUID leaf 0x8000001F EAX indicate whether SEV
or SME is supported, respectively. It's better to check whether SEV or
SME is actually supported before accessing the MSR_AMD64_SEV to check
whether SEV or SME is enabled.

This is both a bare-metal issue and a guest/VM issue. Since the first
generation Hygon Dhyana CPU doesn't support the MSR_AMD64_SEV, reading that
MSR results in a #GP - either directly from hardware in the bare-metal
case or via the hypervisor (because the RDMSR is actually intercepted)
in the guest/VM case, resulting in a failed boot. And since this is very
early in the boot phase, rdmsrl_safe()/native_read_msr_safe() can't be
used.

So check the CPUID bits first, before accessing the MSR.

 [ tlendacky: Expand and improve commit message. ]
 [ bp: Massage commit message. ]

Fixes: eab696d8 ("x86/sev: Do not require Hypervisor CPUID bit for SEV guests")
Signed-off-by: NPu Wen <puwen@hygon.cn>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Acked-by: NTom Lendacky <thomas.lendacky@amd.com>
Cc: <stable@vger.kernel.org> # v5.10+
Link: https://lkml.kernel.org/r/20210602070207.2480-1-puwen@hygon.cnSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

bbcd90c0

x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid() · f227b84b

由 Thomas Gleixner 提交于 6月 15, 2021

stable inclusion
from stable-5.10.43
commit 942c5864de85dc14602ec875e88e0337896db6d9
bugzilla: 109284
CVE: NA

--------------------------------

commit 9bfecd05 upstream.

While digesting the XSAVE-related horrors which got introduced with
the supervisor/user split, the recent addition of ENQCMD-related
functionality got on the radar and turned out to be similarly broken.

update_pasid(), which is only required when X86_FEATURE_ENQCMD is
available, is invoked from two places:

 1) From switch_to() for the incoming task

 2) Via a SMP function call from the IOMMU/SMV code

#1 is half-ways correct as it hacks around the brokenness of get_xsave_addr()
   by enforcing the state to be 'present', but all the conditionals in that
   code are completely pointless for that.

   Also the invocation is just useless overhead because at that point
   it's guaranteed that TIF_NEED_FPU_LOAD is set on the incoming task
   and all of this can be handled at return to user space.

#2 is broken beyond repair. The comment in the code claims that it is safe
   to invoke this in an IPI, but that's just wishful thinking.

   FPU state of a running task is protected by fregs_lock() which is
   nothing else than a local_bh_disable(). As BH-disabled regions run
   usually with interrupts enabled the IPI can hit a code section which
   modifies FPU state and there is absolutely no guarantee that any of the
   assumptions which are made for the IPI case is true.

   Also the IPI is sent to all CPUs in mm_cpumask(mm), but the IPI is
   invoked with a NULL pointer argument, so it can hit a completely
   unrelated task and unconditionally force an update for nothing.
   Worse, it can hit a kernel thread which operates on a user space
   address space and set a random PASID for it.

The offending commit does not cleanly revert, but it's sufficient to
force disable X86_FEATURE_ENQCMD and to remove the broken update_pasid()
code to make this dysfunctional all over the place. Anything more
complex would require more surgery and none of the related functions
outside of the x86 core code are blatantly wrong, so removing those
would be overkill.

As nothing enables the PASID bit in the IA32_XSS MSR yet, which is
required to make this actually work, this cannot result in a regression
except for related out of tree train-wrecks, but they are broken already
today.

Fixes: 20f0afd1 ("x86/mmu: Allocate/free a PASID")
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Acked-by: NAndy Lutomirski <luto@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/87mtsd6gr9.ffs@nanos.tec.linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

f227b84b

openeuler / Kernel 大约 2 年 前同步成功

openeuler / Kernel
大约 2 年前同步成功