提交 · d16f801d191621e93ba1555fe0183f0904f4e9a4 · openeuler / Kernel

19 10月, 2021 40 次提交

switch file_open_root() to struct path · d16f801d

由 Al Viro 提交于 10月 19, 2021

mainline inclusion
from mainline-5.14-rc1
commit ffb37ca3
category: bugfix
bugzilla: 181657 https://gitee.com/openeuler/kernel/issues/I4DDEL
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ffb37ca3bd16ce6ea2df2f87fde9a31e94ebb54b

---------------------------

... and provide file_open_root_mnt(), using the root of given mount.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

Conflicts:
	Documentation/filesystems/porting.rst
	[ Non-bugfix 14e43bf4("vfs: don't unnecessarily clone
	  write access for writable fd") is not applied. ]
Signed-off-by: NZhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
[Roberto Sassu: Adjust file_open_root() called by load_digest_list()]
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

d16f801d

kyber: introduce kyber_depth_updated() · fddff7d8

由 Yang Yang 提交于 10月 19, 2021

mainline inclusion
from mainline-5.12-rc1
commit ffa772cf
category: bugfix
bugzilla: 182133 https://gitee.com/openeuler/kernel/issues/I4DDEL

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ffa772cfe9356ce94d3061335c2681f60e7c1c5b

-------------------------------------------------

Hang occurs when user changes the scheduler queue depth, by writing to
the 'nr_requests' sysfs file of that device.

The details of the environment that we found the problem are as follows:
  an eMMC block device
  total driver tags: 16
  default queue_depth: 32
  kqd->async_depth initialized in kyber_init_sched() with queue_depth=32

Then we change queue_depth to 256, by writing to the 'nr_requests' sysfs
file. But kqd->async_depth don't be updated after queue_depth changes.
Now the value of async depth is too small for queue_depth=256, this may
cause hang.

This patch introduces kyber_depth_updated(), so that kyber can update
async depth when queue depth changes.
Signed-off-by: NYang Yang <yang.yang@vivo.com>
Reviewed-by: NOmar Sandoval <osandov@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NHou Tao <houtao1@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

fddff7d8

perf annotate: Add itrace options support · 0be5dff4

由 Yang Jihong 提交于 10月 19, 2021

mainline inclusion
from mainline-v5.14-rc1
commit 4bcbe438
category: feature
bugzilla: 182229 https://gitee.com/openeuler/kernel/issues/I4DDEL

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4bcbe438b3baaeb532dd50a5f002aed56c197e2a

-------------------------------------------------

The "auxtrace_info" and "auxtrace" functions are not set in "tool" member of
"annotate". As a result, perf annotate does not support parsing itrace data.

Before:

  # perf record -e arm_spe_0/branch_filter=1/ -a sleep 1
  [ perf record: Woken up 9 times to write data ]
  [ perf record: Captured and wrote 20.874 MB perf.data ]
  # perf annotate --stdio
  Error:
  The perf.data data has no samples!

Solution:

1. Add itrace options in help,
2. Set hook functions of "id_index", "auxtrace_info" and "auxtrace" in perf_tool.

After:

  # perf record --all-user -e arm_spe_0/branch_filter=1/ ls
  Couldn't synthesize bpf events.
  perf.data
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.010 MB perf.data ]
  # perf annotate --stdio
   Percent |      Source code & Disassembly of libc-2.28.so for branch-miss (1 samples, percent: local period)
  ------------------------------------------------------------------------------------------------------------
           :
           :
           :
           :           Disassembly of section .text:
           :
           :           0000000000066180 <__getdelim@@GLIBC_2.17>:
      0.00 :   66180:  stp     x29, x30, [sp, #-96]!
      0.00 :   66184:  cmp     x0, #0x0
      0.00 :   66188:  ccmp    x1, #0x0, #0x4, ne  // ne = any
      0.00 :   6618c:  mov     x29, sp
      0.00 :   66190:  stp     x24, x25, [sp, #56]
      0.00 :   66194:  stp     x26, x27, [sp, #72]
      0.00 :   66198:  str     x28, [sp, #88]
      0.00 :   6619c:  b.eq    66450 <__getdelim@@GLIBC_2.17+0x2d0>  // b.none
      0.00 :   661a0:  stp     x22, x23, [x29, #40]
      0.00 :   661a4:  mov     x22, x1
      0.00 :   661a8:  ldr     w1, [x3]
      0.00 :   661ac:  mov     w23, w2
      0.00 :   661b0:  stp     x20, x21, [x29, #24]
      0.00 :   661b4:  mov     x20, x3
      0.00 :   661b8:  mov     x21, x0
      0.00 :   661bc:  tbnz    w1, #15, 66360 <__getdelim@@GLIBC_2.17+0x1e0>
      0.00 :   661c0:  ldr     x0, [x3, #136]
      0.00 :   661c4:  ldr     x2, [x0, #8]
      0.00 :   661c8:  str     x19, [x29, #16]
      0.00 :   661cc:  mrs     x19, tpidr_el0
      0.00 :   661d0:  sub     x19, x19, #0x700
      0.00 :   661d4:  cmp     x2, x19
      0.00 :   661d8:  b.eq    663f0 <__getdelim@@GLIBC_2.17+0x270>  // b.none
      0.00 :   661dc:  mov     w1, #0x1                        // #1
      0.00 :   661e0:  ldaxr   w2, [x0]
      0.00 :   661e4:  cmp     w2, #0x0
      0.00 :   661e8:  b.ne    661f4 <__getdelim@@GLIBC_2.17+0x74>  // b.any
      0.00 :   661ec:  stxr    w3, w1, [x0]
      0.00 :   661f0:  cbnz    w3, 661e0 <__getdelim@@GLIBC_2.17+0x60>
      0.00 :   661f4:  b.ne    66448 <__getdelim@@GLIBC_2.17+0x2c8>  // b.any
      0.00 :   661f8:  ldr     x0, [x20, #136]
      0.00 :   661fc:  ldr     w1, [x20]
      0.00 :   66200:  ldr     w2, [x0, #4]
      0.00 :   66204:  str     x19, [x0, #8]
      0.00 :   66208:  add     w2, w2, #0x1
      0.00 :   6620c:  str     w2, [x0, #4]
      0.00 :   66210:  tbnz    w1, #5, 66388 <__getdelim@@GLIBC_2.17+0x208>
      0.00 :   66214:  ldr     x19, [x29, #16]
      0.00 :   66218:  ldr     x0, [x21]
      0.00 :   6621c:  cbz     x0, 66228 <__getdelim@@GLIBC_2.17+0xa8>
      0.00 :   66220:  ldr     x0, [x22]
      0.00 :   66224:  cbnz    x0, 6623c <__getdelim@@GLIBC_2.17+0xbc>
      0.00 :   66228:  mov     x0, #0x78                       // #120
      0.00 :   6622c:  str     x0, [x22]
      0.00 :   66230:  bl      20710 <malloc@plt>
      0.00 :   66234:  str     x0, [x21]
      0.00 :   66238:  cbz     x0, 66428 <__getdelim@@GLIBC_2.17+0x2a8>
      0.00 :   6623c:  ldr     x27, [x20, #8]
      0.00 :   66240:  str     x19, [x29, #16]
      0.00 :   66244:  ldr     x19, [x20, #16]
      0.00 :   66248:  sub     x19, x19, x27
      0.00 :   6624c:  cmp     x19, #0x0
      0.00 :   66250:  b.le    66398 <__getdelim@@GLIBC_2.17+0x218>
      0.00 :   66254:  mov     x25, #0x0                       // #0
      0.00 :   66258:  b       662d8 <__getdelim@@GLIBC_2.17+0x158>
      0.00 :   6625c:  nop
      0.00 :   66260:  add     x24, x19, x25
      0.00 :   66264:  ldr     x3, [x22]
      0.00 :   66268:  add     x26, x24, #0x1
      0.00 :   6626c:  ldr     x0, [x21]
      0.00 :   66270:  cmp     x3, x26
      0.00 :   66274:  b.cs    6629c <__getdelim@@GLIBC_2.17+0x11c>  // b.hs, b.nlast
      0.00 :   66278:  lsl     x3, x3, #1
      0.00 :   6627c:  cmp     x3, x26
      0.00 :   66280:  csel    x26, x3, x26, cs  // cs = hs, nlast
      0.00 :   66284:  mov     x1, x26
      0.00 :   66288:  bl      206f0 <realloc@plt>
      0.00 :   6628c:  cbz     x0, 66438 <__getdelim@@GLIBC_2.17+0x2b8>
      0.00 :   66290:  str     x0, [x21]
      0.00 :   66294:  ldr     x27, [x20, #8]
      0.00 :   66298:  str     x26, [x22]
      0.00 :   6629c:  mov     x2, x19
      0.00 :   662a0:  mov     x1, x27
      0.00 :   662a4:  add     x0, x0, x25
      0.00 :   662a8:  bl      87390 <explicit_bzero@@GLIBC_2.25+0x50>
      0.00 :   662ac:  ldr     x0, [x20, #8]
      0.00 :   662b0:  add     x19, x0, x19
      0.00 :   662b4:  str     x19, [x20, #8]
      0.00 :   662b8:  cbnz    x28, 66410 <__getdelim@@GLIBC_2.17+0x290>
      0.00 :   662bc:  mov     x0, x20
      0.00 :   662c0:  bl      73b80 <__underflow@@GLIBC_2.17>
      0.00 :   662c4:  cmn     w0, #0x1
      0.00 :   662c8:  b.eq    66410 <__getdelim@@GLIBC_2.17+0x290>  // b.none
      0.00 :   662cc:  ldp     x27, x19, [x20, #8]
      0.00 :   662d0:  mov     x25, x24
      0.00 :   662d4:  sub     x19, x19, x27
      0.00 :   662d8:  mov     x2, x19
      0.00 :   662dc:  mov     w1, w23
      0.00 :   662e0:  mov     x0, x27
      0.00 :   662e4:  bl      807b0 <memchr@@GLIBC_2.17>
      0.00 :   662e8:  cmp     x0, #0x0
      0.00 :   662ec:  mov     x28, x0
      0.00 :   662f0:  sub     x0, x0, x27
      0.00 :   662f4:  csinc   x19, x19, x0, eq  // eq = none
      0.00 :   662f8:  mov     x0, #0x7fffffffffffffff         // #9223372036854775807
      0.00 :   662fc:  sub     x0, x0, x25
      0.00 :   66300:  cmp     x19, x0
      0.00 :   66304:  b.lt    66260 <__getdelim@@GLIBC_2.17+0xe0>  // b.tstop
      0.00 :   66308:  adrp    x0, 17f000 <sys_sigabbrev@@GLIBC_2.17+0x320>
      0.00 :   6630c:  ldr     x0, [x0, #3624]
      0.00 :   66310:  mrs     x2, tpidr_el0
      0.00 :   66314:  ldr     x19, [x29, #16]
      0.00 :   66318:  mov     w3, #0x4b                       // #75
      0.00 :   6631c:  ldr     w1, [x20]
      0.00 :   66320:  mov     x24, #0xffffffffffffffff        // #-1
      0.00 :   66324:  str     w3, [x2, x0]
      0.00 :   66328:  tbnz    w1, #15, 66340 <__getdelim@@GLIBC_2.17+0x1c0>
      0.00 :   6632c:  ldr     x0, [x20, #136]
      0.00 :   66330:  ldr     w1, [x0, #4]
      0.00 :   66334:  sub     w1, w1, #0x1
      0.00 :   66338:  str     w1, [x0, #4]
      0.00 :   6633c:  cbz     w1, 663b8 <__getdelim@@GLIBC_2.17+0x238>
      0.00 :   66340:  mov     x0, x24
      0.00 :   66344:  ldr     x28, [sp, #88]
      0.00 :   66348:  ldp     x20, x21, [x29, #24]
      0.00 :   6634c:  ldp     x22, x23, [x29, #40]
      0.00 :   66350:  ldp     x24, x25, [sp, #56]
      0.00 :   66354:  ldp     x26, x27, [sp, #72]
      0.00 :   66358:  ldp     x29, x30, [sp], #96
      0.00 :   6635c:  ret
    100.00 :   66360:  tbz     w1, #5, 66218 <__getdelim@@GLIBC_2.17+0x98>
      0.00 :   66364:  ldp     x20, x21, [x29, #24]
      0.00 :   66368:  mov     x24, #0xffffffffffffffff        // #-1
      0.00 :   6636c:  ldp     x22, x23, [x29, #40]
      0.00 :   66370:  mov     x0, x24
      0.00 :   66374:  ldp     x24, x25, [sp, #56]
      0.00 :   66378:  ldp     x26, x27, [sp, #72]
      0.00 :   6637c:  ldr     x28, [sp, #88]
      0.00 :   66380:  ldp     x29, x30, [sp], #96
      0.00 :   66384:  ret
      0.00 :   66388:  mov     x24, #0xffffffffffffffff        // #-1
      0.00 :   6638c:  ldr     x19, [x29, #16]
      0.00 :   66390:  b       66328 <__getdelim@@GLIBC_2.17+0x1a8>
      0.00 :   66394:  nop
      0.00 :   66398:  mov     x0, x20
      0.00 :   6639c:  bl      73b80 <__underflow@@GLIBC_2.17>
      0.00 :   663a0:  cmn     w0, #0x1
      0.00 :   663a4:  b.eq    66438 <__getdelim@@GLIBC_2.17+0x2b8>  // b.none
      0.00 :   663a8:  ldp     x27, x19, [x20, #8]
      0.00 :   663ac:  sub     x19, x19, x27
      0.00 :   663b0:  b       66254 <__getdelim@@GLIBC_2.17+0xd4>
      0.00 :   663b4:  nop
      0.00 :   663b8:  str     xzr, [x0, #8]
      0.00 :   663bc:  ldxr    w2, [x0]
      0.00 :   663c0:  stlxr   w3, w1, [x0]
      0.00 :   663c4:  cbnz    w3, 663bc <__getdelim@@GLIBC_2.17+0x23c>
      0.00 :   663c8:  cmp     w2, #0x1
      0.00 :   663cc:  b.le    66340 <__getdelim@@GLIBC_2.17+0x1c0>
      0.00 :   663d0:  mov     x1, #0x81                       // #129
      0.00 :   663d4:  mov     x2, #0x1                        // #1
      0.00 :   663d8:  mov     x3, #0x0                        // #0
      0.00 :   663dc:  mov     x8, #0x62                       // #98
      0.00 :   663e0:  svc     #0x0
      0.00 :   663e4:  ldp     x20, x21, [x29, #24]
      0.00 :   663e8:  ldp     x22, x23, [x29, #40]
      0.00 :   663ec:  b       66370 <__getdelim@@GLIBC_2.17+0x1f0>
      0.00 :   663f0:  ldr     w2, [x0, #4]
      0.00 :   663f4:  add     w2, w2, #0x1
      0.00 :   663f8:  str     w2, [x0, #4]
      0.00 :   663fc:  tbz     w1, #5, 66214 <__getdelim@@GLIBC_2.17+0x94>
      0.00 :   66400:  mov     x24, #0xffffffffffffffff        // #-1
      0.00 :   66404:  ldr     x19, [x29, #16]
      0.00 :   66408:  b       66330 <__getdelim@@GLIBC_2.17+0x1b0>
      0.00 :   6640c:  nop
      0.00 :   66410:  ldr     x0, [x21]
      0.00 :   66414:  strb    wzr, [x0, x24]
      0.00 :   66418:  ldr     w1, [x20]
      0.00 :   6641c:  ldr     x19, [x29, #16]
      0.00 :   66420:  b       66328 <__getdelim@@GLIBC_2.17+0x1a8>
      0.00 :   66424:  nop
      0.00 :   66428:  mov     x24, #0xffffffffffffffff        // #-1
      0.00 :   6642c:  ldr     w1, [x20]
      0.00 :   66430:  b       66328 <__getdelim@@GLIBC_2.17+0x1a8>
      0.00 :   66434:  nop
      0.00 :   66438:  mov     x24, #0xffffffffffffffff        // #-1
      0.00 :   6643c:  ldr     w1, [x20]
      0.00 :   66440:  ldr     x19, [x29, #16]
      0.00 :   66444:  b       66328 <__getdelim@@GLIBC_2.17+0x1a8>
      0.00 :   66448:  bl      e3ba0 <pthread_setcanceltype@@GLIBC_2.17+0x30>
      0.00 :   6644c:  b       661f8 <__getdelim@@GLIBC_2.17+0x78>
      0.00 :   66450:  adrp    x0, 17f000 <sys_sigabbrev@@GLIBC_2.17+0x320>
      0.00 :   66454:  ldr     x0, [x0, #3624]
      0.00 :   66458:  mrs     x1, tpidr_el0
      0.00 :   6645c:  mov     w2, #0x16                       // #22
      0.00 :   66460:  mov     x24, #0xffffffffffffffff        // #-1
      0.00 :   66464:  str     w2, [x1, x0]
      0.00 :   66468:  b       66370 <__getdelim@@GLIBC_2.17+0x1f0>
      0.00 :   6646c:  ldr     w1, [x20]
      0.00 :   66470:  mov     x4, x0
      0.00 :   66474:  tbnz    w1, #15, 6648c <__getdelim@@GLIBC_2.17+0x30c>
      0.00 :   66478:  ldr     x0, [x20, #136]
      0.00 :   6647c:  ldr     w1, [x0, #4]
      0.00 :   66480:  sub     w1, w1, #0x1
      0.00 :   66484:  str     w1, [x0, #4]
      0.00 :   66488:  cbz     w1, 66494 <__getdelim@@GLIBC_2.17+0x314>
      0.00 :   6648c:  mov     x0, x4
      0.00 :   66490:  bl      20e40 <gnu_get_libc_version@@GLIBC_2.17+0x130>
      0.00 :   66494:  str     xzr, [x0, #8]
      0.00 :   66498:  ldxr    w2, [x0]
      0.00 :   6649c:  stlxr   w3, w1, [x0]
      0.00 :   664a0:  cbnz    w3, 66498 <__getdelim@@GLIBC_2.17+0x318>
      0.00 :   664a4:  cmp     w2, #0x1
      0.00 :   664a8:  b.le    6648c <__getdelim@@GLIBC_2.17+0x30c>
      0.00 :   664ac:  mov     x1, #0x81                       // #129
      0.00 :   664b0:  mov     x2, #0x1                        // #1
      0.00 :   664b4:  mov     x3, #0x0                        // #0
      0.00 :   664b8:  mov     x8, #0x62                       // #98
      0.00 :   664bc:  svc     #0x0
      0.00 :   664c0:  b       6648c <__getdelim@@GLIBC_2.17+0x30c>
Signed-off-by: NYang Jihong <yangjihong1@huawei.com>
Tested-by: NLeo Yan <leo.yan@linaro.org>
Acked-by: NAdrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210615091704.259202-1-yangjihong1@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NLi Huafei <lihuafei1@huawei.com>
Reviewed-by: NYang Jihong <yangjihong1@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

0be5dff4

mm: Fix the uninitialized use in overcommit_policy_handler · 33e71cb7

由 Chen Jun 提交于 10月 19, 2021

maillist inclusion
category: bugfix
bugzilla: 182215 https://gitee.com/openeuler/kernel/issues/I4DDEL

Reference: https://lore.kernel.org/linux-mm/20210922014122.47219-1-chenjun102@huawei.com/T/#u

-------------------------------------------------

An unexpected value of /proc/sys/vm/overcommit_memory we will get,
after running the following program.

int main()
{
    int fd = open("/proc/sys/vm/overcommit_memory", O_RDWR)
    write(fd, "1", 1);
    write(fd, "2", 1);
    close(fd);
}

write(fd, "2", 1) will pass *ppos = 1 to proc_dointvec_minmax.
proc_dointvec_minmax will return 0 without setting new_policy.

t.data = &new_policy;
ret = proc_dointvec_minmax(&t, write, buffer, lenp, ppos)
      -->do_proc_dointvec
         -->__do_proc_dointvec
              if (write) {
                if (proc_first_pos_non_zero_ignore(ppos, table))
                  goto out;

sysctl_overcommit_memory = new_policy;

so sysctl_overcommit_memory will be set to an uninitialized value.

Check whether new_policy has been changed by proc_dointvec_minmax.

Fixes: 56f3547b ("mm: adjust vm_committed_as_batch according to vm overcommit policy"
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

33e71cb7

memcg: enable accounting for ldt_struct objects · 5a15bdbc

由 Vasily Averin 提交于 10月 19, 2021

mainline inclusion
from mainline-v5.15-rc1
commit ec403e2a
bugzilla: 181858 https://gitee.com/openeuler/kernel/issues/I4DDEL

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ec403e2ae0dfc85996aad6e944a98a16e6dfcc6d

--------------------------------

Each task can request own LDT and force the kernel to allocate up to 64Kb
memory per-mm.

There are legitimate workloads with hundreds of processes and there can be
hundreds of workloads running on large machines.  The unaccounted memory
can cause isolation issues between the workloads particularly on highly
utilized machines.

It makes sense to account for this objects to restrict the host's memory
consumption from inside the memcg-limited container.

Link: https://lkml.kernel.org/r/38010594-50fe-c06d-7cb0-d1f77ca422f3@virtuozzo.comSigned-off-by: NVasily Averin <vvs@virtuozzo.com>
Acked-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NShakeel Butt <shakeelb@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Yutian Yang <nglaive@gmail.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NLi Ming <limingming.li@huawei.com>
Signed-off-by: NLu Jialin <lujialin4@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

5a15bdbc

memcg: enable accounting for posix_timers_cache slab · 9f73f243

由 Vasily Averin 提交于 10月 19, 2021

mainline inclusion
from mainline-v5.15-rc1
commit c509723e
bugzilla: 181858 https://gitee.com/openeuler/kernel/issues/I4DDEL

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c509723ec27e925bb91a20682c448e95d4bc8c9f

--------------------------------

A program may create multiple interval timers using timer_create().  For
each timer the kernel preallocates a "queued real-time signal",
Consequently, the number of timers is limited by the RLIMIT_SIGPENDING
resource limit.  The allocated object is quite small, ~250 bytes, but even
the default signal limits allow to consume up to 100 megabytes per user.

It makes sense to account for them to limit the host's memory consumption
from inside the memcg-limited container.

Link: https://lkml.kernel.org/r/57795560-025c-267c-6b1a-dea852d95530@virtuozzo.comSigned-off-by: NVasily Averin <vvs@virtuozzo.com>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NShakeel Butt <shakeelb@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Yutian Yang <nglaive@gmail.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NLi Ming <limingming.li@huawei.com>
Signed-off-by: NLu Jialin <lujialin4@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

9f73f243

memcg: enable accounting for signals · e8cd2ee2

由 Vasily Averin 提交于 10月 19, 2021

mainline inclusion
from mainline-v5.15-rc1
commit 5f58c398
bugzilla: 181858 https://gitee.com/openeuler/kernel/issues/I4DDEL

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5f58c39819ff78ca5ddbba2b3cd8ff4779b19bb5

--------------------------------

When a user send a signal to any another processes it forces the kernel to
allocate memory for 'struct sigqueue' objects.  The number of signals is
limited by RLIMIT_SIGPENDING resource limit, but even the default settings
allow each user to consume up to several megabytes of memory.

It makes sense to account for these allocations to restrict the host's
memory consumption from inside the memcg-limited container.

Link: https://lkml.kernel.org/r/e34e958c-e785-712e-a62a-2c7b66c646c7@virtuozzo.comSigned-off-by: NVasily Averin <vvs@virtuozzo.com>
Reviewed-by: NShakeel Butt <shakeelb@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Yutian Yang <nglaive@gmail.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NLi Ming <limingming.li@huawei.com>
Signed-off-by: NLu Jialin <lujialin4@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

e8cd2ee2

memcg: enable accounting for new namesapces and struct nsproxy · befac4c1

由 Vasily Averin 提交于 10月 19, 2021

mainline inclusion
from mainline-v5.15-rc1
commit 30acd0bd
bugzilla: 181858 https://gitee.com/openeuler/kernel/issues/I4DDEL

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=30acd0bdfb86548172168a0cc71d455944de0683

--------------------------------

Container admin can create new namespaces and force kernel to allocate up
to several pages of memory for the namespaces and its associated
structures.

Net and uts namespaces have enabled accounting for such allocations.  It
makes sense to account for rest ones to restrict the host's memory
consumption from inside the memcg-limited container.

Link: https://lkml.kernel.org/r/5525bcbf-533e-da27-79b7-158686c64e13@virtuozzo.comSigned-off-by: NVasily Averin <vvs@virtuozzo.com>
Acked-by: NSerge Hallyn <serge@hallyn.com>
Acked-by: NChristian Brauner <christian.brauner@ubuntu.com>
Acked-by: NKirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: NShakeel Butt <shakeelb@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Yutian Yang <nglaive@gmail.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

Conflicts:
	kernel/time/namespace.c
Signed-off-by: NLi Ming <limingming.li@huawei.com>
Signed-off-by: NLu Jialin <lujialin4@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

befac4c1

memcg: enable accounting for fasync_cache · b4e5feb6

由 Vasily Averin 提交于 10月 19, 2021

mainline inclusion
from mainline-v5.15-rc1
commit 839d6820
bugzilla: 181858 https://gitee.com/openeuler/kernel/issues/I4DDEL

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=839d68206de869b8cb4272c5ea10da2ef7ec34cb

--------------------------------

fasync_struct is used by almost all character device drivers to set up the
fasync queue, and for regular files by the file lease code.  This
structure is quite small but long-living and it can be assigned for any
open file.

It makes sense to account for its allocations to restrict the host's
memory consumption from inside the memcg-limited container.

Link: https://lkml.kernel.org/r/1b408625-d71c-0b26-b0b6-9baf00f93e69@virtuozzo.comSigned-off-by: NVasily Averin <vvs@virtuozzo.com>
Reviewed-by: NShakeel Butt <shakeelb@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Yutian Yang <nglaive@gmail.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NLi Ming <limingming.li@huawei.com>
Signed-off-by: NLu Jialin <lujialin4@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

b4e5feb6

memcg: enable accounting for mnt_cache entries · b2f25c6a

由 Vasily Averin 提交于 10月 19, 2021

mainline inclusion
from mainline-v5.15-rc1
commit 79f6540b
bugzilla: 181858 https://gitee.com/openeuler/kernel/issues/I4DDEL

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=79f6540ba88dfb383ecf057a3425e668105ca774

--------------------------------

Patch series "memcg accounting from OpenVZ", v7.

OpenVZ uses memory accounting 20+ years since v2.2.x linux kernels.
Initially we used our own accounting subsystem, then partially committed
it to upstream, and a few years ago switched to cgroups v1.  Now we're
rebasing again, revising our old patches and trying to push them upstream.

We try to protect the host system from any misuse of kernel memory
allocation triggered by untrusted users inside the containers.

Patch-set is addressed mostly to cgroups maintainers and cgroups@ mailing
list, though I would be very grateful for any comments from maintainersi
of affected subsystems or other people added in cc:

Compared to the upstream, we additionally account the following kernel objects:
- network devices and its Tx/Rx queues
- ipv4/v6 addresses and routing-related objects
- inet_bind_bucket cache objects
- VLAN group arrays
- ipv6/sit: ip_tunnel_prl
- scm_fp_list objects used by SCM_RIGHTS messages of Unix sockets
- nsproxy and namespace objects itself
- IPC objects: semaphores, message queues and share memory segments
- mounts
- pollfd and select bits arrays
- signals and posix timers
- file lock
- fasync_struct used by the file lease code and driver's fasync queues
- tty objects
- per-mm LDT

We have an incorrect/incomplete/obsoleted accounting for few other kernel
objects: sk_filter, af_packets, netlink and xt_counters for iptables.
They require rework and probably will be dropped at all.

Also we're going to add an accounting for nft, however it is not ready
yet.

We have not tested performance on upstream, however, our performance team
compares our current RHEL7-based production kernel and reports that they
are at least not worse as the according original RHEL7 kernel.

This patch (of 10):

The kernel allocates ~400 bytes of 'struct mount' for any new mount.
Creating a new mount namespace clones most of the parent mounts, and this
can be repeated many times.  Additionally, each mount allocates up to
PATH_MAX=4096 bytes for mnt->mnt_devname.

It makes sense to account for these allocations to restrict the host's
memory consumption from inside the memcg-limited container.

Link: https://lkml.kernel.org/r/045db11f-4a45-7c9b-2664-5b32c2b44943@virtuozzo.comSigned-off-by: NVasily Averin <vvs@virtuozzo.com>
Reviewed-by: NShakeel Butt <shakeelb@google.com>
Acked-by: NChristian Brauner <christian.brauner@ubuntu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Yutian Yang <nglaive@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Zefan Li <lizefan.x@bytedance.com>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NLi Ming <limingming.li@huawei.com>
Signed-off-by: NLu Jialin <lujialin4@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

b2f25c6a

memcg: charge fs_context and legacy_fs_context · 1f91195e

由 Yutian Yang 提交于 10月 19, 2021

mainline inclusion
from mainline-v5.15-rc1
commit bb902cb4
bugzilla: 181858 https://gitee.com/openeuler/kernel/issues/I4DDEL

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bb902cb47cf93b33cd92b3b7a4019330a03ef57f

--------------------------------

This patch adds accounting flags to fs_context and legacy_fs_context
allocation sites so that kernel could correctly charge these objects.

We have written a PoC to demonstrate the effect of the missing-charging
bugs.  The PoC takes around 1,200MB unaccounted memory, while it is
charged for only 362MB memory usage.  We evaluate the PoC on QEMU x86_64
v5.2.90 + Linux kernel v5.10.19 + Debian buster.  All the limitations
including ulimits and sysctl variables are set as default.  Specifically,
the hard NOFILE limit and nr_open in sysctl are both 1,048,576.

/*------------------------- POC code ----------------------------*/

                        } while (0)

static inline int fsopen(const char *fs_name, unsigned int flags)
{
        return syscall(__NR_fsopen, fs_name, flags);
}

static char thread_stack[512][STACK_SIZE];

int thread_fn(void* arg)
{
  for (int i = 0; i< 800000; ++i) {
    int fsfd = fsopen("nfs", FSOPEN_CLOEXEC);
    if (fsfd == -1) {
      errExit("fsopen");
    }
  }
  while(1);
  return 0;
}

int main(int argc, char *argv[]) {
  int thread_pid;
  for (int i = 0; i < 1; ++i) {
    thread_pid = clone(thread_fn, thread_stack[i] + STACK_SIZE, \
      SIGCHLD, NULL);
  }
  while(1);
  return 0;
}

/*-------------------------- end --------------------------------*/

Link: https://lkml.kernel.org/r/1626517201-24086-1-git-send-email-nglaive@gmail.comSigned-off-by: NYutian Yang <nglaive@gmail.com>
Reviewed-by: NShakeel Butt <shakeelb@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <shenwenbo@zju.edu.cn>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NLi Ming <limingming.li@huawei.com>
Signed-off-by: NLu Jialin <lujialin4@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

1f91195e

memcg: enable accounting for pids in nested pid namespaces · 0823d756

由 Vasily Averin 提交于 10月 19, 2021

mainline inclusion
from mainline-v5.15-rc1
commit fab827db
bugzilla: 181858 https://gitee.com/openeuler/kernel/issues/I4DDEL

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fab827dbee8c2e06ca4ba000fa6c48bcf9054aba

--------------------------------

Commit 5d097056 ("kmemcg: account certain kmem allocations to memcg")
enabled memcg accounting for pids allocated from init_pid_ns.pid_cachep,
but forgot to adjust the setting for nested pid namespaces.  As a result,
pid memory is not accounted exactly where it is really needed, inside
memcg-limited containers with their own pid namespaces.

Pid was one the first kernel objects enabled for memcg accounting.
init_pid_ns.pid_cachep marked by SLAB_ACCOUNT and we can expect that any
new pids in the system are memcg-accounted.

Though recently I've noticed that it is wrong.  nested pid namespaces
creates own slab caches for pid objects, nested pids have increased size
because contain id both for all parent and for own pid namespaces.  The
problem is that these slab caches are _NOT_ marked by SLAB_ACCOUNT, as a
result any pids allocated in nested pid namespaces are not
memcg-accounted.

Pid struct in nested pid namespace consumes up to 500 bytes memory, 100000
such objects gives us up to ~50Mb unaccounted memory, this allow container
to exceed assigned memcg limits.

Link: https://lkml.kernel.org/r/8b6de616-fd1a-02c6-cbdb-976ecdcfa604@virtuozzo.com
Fixes: 5d097056 ("kmemcg: account certain kmem allocations to memcg")
Cc: stable@vger.kernel.org
Signed-off-by: NVasily Averin <vvs@virtuozzo.com>
Reviewed-by: NMichal Koutný <mkoutny@suse.com>
Reviewed-by: NShakeel Butt <shakeelb@google.com>
Acked-by: NChristian Brauner <christian.brauner@ubuntu.com>
Acked-by: NRoman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NLi Ming <limingming.li@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: NLu Jialin <lujialin4@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

0823d756

blk-mq: fix divide by zero crash in tg_may_dispatch() · 21d91581

由 Yu Kuai 提交于 10月 19, 2021

hulk inclusion
category: bugfix
bugzilla: 177149 https://gitee.com/openeuler/kernel/issues/I4DDEL

-----------------------------------------------

If blk-throttle is enabled and io is issued before
blk_throtl_register_queue() is done. Divide by zero crash will be
triggered in tg_may_dispatch() because 'throtl_slice' is uninitialized.

Thus introduce a new flag QUEUE_FLAG_THROTL_INIT_DONE. It will be set
after blk_throtl_register_queue() is done, and will be checked before
applying any config.
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NHou Tao <houtao1@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

21d91581

ext4: prevent getting empty inode buffer · a3e3ff2d

由 Zhang Yi 提交于 10月 19, 2021

hulk inclusion
category: bugfix
bugzilla: 174653 https://gitee.com/openeuler/kernel/issues/I4DDEL
---------------------------

In ext4_get_inode_loc(), we may skip IO and get an zero && uptodate
inode buffer when the inode monopolize an inode block for performance
reason. For most cases, ext4_mark_iloc_dirty() will fill the inode
buffer to make it fine, but we could miss this call if something bad
happened. Finally, __ext4_get_inode_loc_noinmem() may probably get an
empty inode buffer and trigger ext4 error.

For example, if we remove a nonexistent xattr on inode A,
ext4_xattr_set_handle() will return ENODATA before invoking
ext4_mark_iloc_dirty(), it will left an uptodate but zero buffer. We
will get checksum error message in ext4_iget() when getting inode again.

  EXT4-fs error (device sda): ext4_lookup:1784: inode #131074: comm cat: iget: checksum invalid

Even worse, if we allocate another inode B at the same inode block, it
will corrupt the inode A on disk when write back inode B.

So this patch initialize the inode buffer by filling the in-mem inode
contents if we skip read I/O, ensure that the buffer is really uptodate.
Signed-off-by: NZhang Yi <yi.zhang@huawei.com>
Reviewed-by: NYang Erkun <yangerkun@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

a3e3ff2d

ext4: move ext4_fill_raw_inode() related functions · fce5c640

由 Zhang Yi 提交于 10月 19, 2021

hulk inclusion
category: bugfix
bugzilla: 174653 https://gitee.com/openeuler/kernel/issues/I4DDEL
---------------------------

In preparation for calling ext4_fill_raw_inode() in
__ext4_get_inode_loc(), move three related functions before
__ext4_get_inode_loc(), no logical change.
Signed-off-by: NZhang Yi <yi.zhang@huawei.com>
Reviewed-by: NYang Erkun <yangerkun@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

fce5c640

ext4: factor out ext4_fill_raw_inode() · 40b43297

由 Zhang Yi 提交于 10月 19, 2021

hulk inclusion
category: bugfix
bugzilla: 174653 https://gitee.com/openeuler/kernel/issues/I4DDEL
---------------------------

Factor out ext4_fill_raw_inode() from ext4_do_update_inode(), which is
use to fill the in-mem inode contents into the inode table buffer, in
preparation for initializing the exclusive inode buffer without reading
the block in __ext4_get_inode_loc().
Signed-off-by: NZhang Yi <yi.zhang@huawei.com>
Reviewed-by: NYang Erkun <yangerkun@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

40b43297

ext4: make the updating inode data procedure atomic · 77c57232

由 Zhang Yi 提交于 10月 19, 2021

mainline inclusion
from mainline-5.15-rc1
commit baaae979
category: bugfix
bugzilla: 174653 https://gitee.com/openeuler/kernel/issues/I4DDEL
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=baaae979b112642a41b71c71c599d875c067d257

---------------------------

Now that ext4_do_update_inode() return error before filling the whole
inode data if we fail to set inode blocks in ext4_inode_blocks_set().
This error should never happen in theory since sb->s_maxbytes should not
have allowed this, we have already init sb->s_maxbytes according to this
feature in ext4_fill_super(). So even through that could only happen due
to the filesystem corruption, we'd better to return after we finish
updating the inode because it may left an uninitialized buffer and we
could read this buffer later in "errors=continue" mode.

This patch make the updating inode data procedure atomic, call
EXT4_ERROR_INODE() after we dropping i_raw_lock after something bad
happened, make sure that the inode is integrated, and also drop a BUG_ON
and do some small cleanups.
Signed-off-by: NZhang Yi <yi.zhang@huawei.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210826130412.3921207-4-yi.zhang@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Reviewed-by: NYang Erkun <yangerkun@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

77c57232

ext4: move inode eio simulation behind io completeion · cc1cdb17

由 Zhang Yi 提交于 10月 19, 2021

mainline inclusion
from mainline-5.15-rc1
commit 0904c9ae
category: bugfix
bugzilla: 174653 https://gitee.com/openeuler/kernel/issues/I4DDEL
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0904c9ae3465c7acc066a564a76b75c0af83e6c7

---------------------------

No EIO simulation is required if the buffer is uptodate, so move the
simulation behind read bio completeion just like inode/block bitmap
simulation does.
Signed-off-by: NZhang Yi <yi.zhang@huawei.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210826130412.3921207-2-yi.zhang@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Reviewed-by: NYang Erkun <yangerkun@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

cc1cdb17

sched: Aware multi-core system for optimize loadtracking · 17b31a72

由 Yu Jiahua 提交于 10月 19, 2021

hulk inclusion
category: feature
bugzilla: 181656 https://gitee.com/openeuler/kernel/issues/I4DDEL

-------------------------------------------------

Optimize load tracking feature have uncertain impact
on scheduler in multi-core system. Therefore, an aware
switch is needed to percept the number of cpus on system,
if one than one cpu is detected, optimize load tracking
feature will be disable.
Signed-off-by: NYu Jiahua <yujiahua1@huawei.com>
Reviewed-by: NChen Hui <judy.chenhui@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

17b31a72

livepatch: Fix compile warnning · 25b368f4

由 Ye Weihua 提交于 10月 19, 2021

hulk inclusion
category: bugfix
bugzilla: 181325 https://gitee.com/openeuler/kernel/issues/I4DDEL

--------------------------------

An error is reported during version building: error: ISO C90 forbids mixed
declarations and code.

Fix it by moving the variable definition forward.
Signed-off-by: NYe Weihua <yeweihua4@huawei.com>
Reviewed-by: NYang Jihong <yangjihong1@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

25b368f4

md: revert io stats accounting · b84a2bb4

由 Guoqing Jiang 提交于 10月 19, 2021

mainline inclusion
from mainline-v5.14-rc1
commit ad3fc798
category: bugfix
bugzilla: 169402 https://gitee.com/openeuler/kernel/issues/I4DDEL

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ad3fc798800fb7ca04c1dfc439dba946818048d8

-------------------------------------------------

The commit 41d2d848 ("md: improve io stats accounting") could cause
double fault problem per the report [1], and also it is not correct to
change ->bi_end_io if md don't own it, so let's revert it.

And io stats accounting will be replemented in later commits.

[1]. https://lore.kernel.org/linux-raid/3bf04253-3fad-434a-63a7-20214e38cf26@gmail.com/T/#t

Fixes: 41d2d848 ("md: improve io stats accounting")
Signed-off-by: NGuoqing Jiang <jiangguoqing@kylinos.cn>
Signed-off-by: NSong Liu <song@kernel.org>
Signed-off-by: NLuo Meng <luomeng12@huawei.com>

Conflicts:
	drivers/md/md.c
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

b84a2bb4

sched/idle: Reported an error when an illegal negative value is passed · fe426fa7

由 Li Hua 提交于 10月 19, 2021

hulk inclusion
category: bugfix
bugzilla: 180842 https://gitee.com/openeuler/kernel/issues/I4DDEL

Reported an error when an illegal negative value is passed. The allowed value is
0 to INT_MAX.
Signed-off-by: NLi Hua <hucool.lihua@huawei.com>
Reviewed-by: NChen Hui <judy.chenhui@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

fe426fa7

sched/idle: Optimize the loop time algorithm to reduce multicore disturb · 247b8dd6

由 Li Hua 提交于 10月 19, 2021

hulk inclusion
category: bugfix
bugzilla: 180841 https://gitee.com/openeuler/kernel/issues/I4DDEL

The idle cpu will call ktime_get frequently. The smp_rmb() will influence the
share cache, especially arm a15 share L2 cache. The performance drop will enlarge
when the loop time is millisecond level.
Signed-off-by: NLi Hua <hucool.lihua@huawei.com>
Reviewed-by: NChen Hui <judy.chenhui@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

247b8dd6

serial: 8250: 8250_omap: Fix possible array out of bounds access · fd954cee

由 Vignesh Raghavendra 提交于 10月 19, 2021

mainline inclusion
from mainline v5.11-rc1
commit d4548b14
category: bugfix
bugzilla: 176665 https://gitee.com/openeuler/kernel/issues/I4DDEL

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v5.10.59&id=d4548b14dd7e5c698f81ce23ce7b69a896373b45

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d4548b14dd7e5c698f81ce23ce7b69a896373b45

--------------------------------

k3_soc_devices array is missing a sentinel entry which may result in out
of bounds access as reported by kernel KASAN.

Fix this by adding a sentinel entry.

Fixes: 439c7183 ("serial: 8250: 8250_omap: Disable RX interrupt after DMA enable")
Reported-by: NNaresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: NVignesh Raghavendra <vigneshr@ti.com>
Link: https://lore.kernel.org/r/20201111112653.2710-1-vigneshr@ti.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NYi Yang <yiyang13@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

fd954cee

once: Fix panic when module unload · 4cb945e8

由 Kefeng Wang 提交于 10月 19, 2021

mainline inclusion
from mainline-5.14-rc6
commit 1027b96e
category: bugfix
bugzilla: 176712 https://gitee.com/openeuler/kernel/issues/I4DDEL

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1027b96ec9d34f9abab69bc1a4dc5b1ad8ab1349

-------------------------------------------------
DO_ONCE
DEFINE_STATIC_KEY_TRUE(___once_key);
__do_once_done
  once_disable_jump(once_key);
    INIT_WORK(&w->work, once_deferred);
    struct once_work *w;
    w->key = key;
    schedule_work(&w->work);                     module unload
                                                   //*the key is
destroy*
process_one_work
  once_deferred
    BUG_ON(!static_key_enabled(work->key));
       static_key_count((struct static_key *)x)    //*access key, crash*

When module uses DO_ONCE mechanism, it could crash due to the above
concurrency problem, we could reproduce it with link[1].

Fix it by add/put module refcount in the once work process.

[1] https://lore.kernel.org/netdev/eaa6c371-465e-57eb-6be9-f4b16b9d7cbf@huawei.com/

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Reported-by: NMinmin chen <chenmingmin@huawei.com>
Signed-off-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
(cherry picked from commit 1027b96e)
Signed-off-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: NXie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

4cb945e8

ext4: wipe ext4_dir_entry2 upon file deletion · cbf92ac9

由 Leah Rumancik 提交于 10月 19, 2021

mainline inclusion
from mainline-5.13-rc1
commit 6c091273
category: bugfix
bugzilla: 176545 https://gitee.com/openeuler/kernel/issues/I4DDEL

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6c0912739699d8e4b6a87086401bf3ad3c59502d

-------------------------------------------------

Upon file deletion, zero out all fields in ext4_dir_entry2 besides rec_len.
In case sensitive data is stored in filenames, this ensures no potentially
sensitive data is left in the directory entry upon deletion. Also, wipe
these fields upon moving a directory entry during the conversion to an
htree and when splitting htree nodes.

The data wiped may still exist in the journal, but there are future
commits planned to address this.
Signed-off-by: NLeah Rumancik <leah.rumancik@gmail.com>
Link: https://lore.kernel.org/r/20210422180834.2242353-1-leah.rumancik@gmail.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NBaokun Li <libaokun1@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

cbf92ac9

livepatch: move arch_klp_mem_recycle after the return value judgment · 0a6cfcc1

由 Ye Weihua 提交于 10月 19, 2021

hulk inclusion
category: bugfix
bugzilla: 176976 https://gitee.com/openeuler/kernel/issues/I4DDEL

--------------------------------

Before enable a livepatch, we apply for a piece of memory for func_node to
store function information and release it after disable this livepatch.

However, in some special cases, for example, the livepatch code is running,
disable fails. In these cases, the applied memory should not be released.
Otherwise, the livepatch cannot be disabled.

So, we move arch_klp_mem_recycle after the return value judgment to solve
this problem.

Fixes: ec7ce700674f ("livepatch: put memory alloc and free out stop machine")
Signed-off-by: NYe Weihua <yeweihua4@huawei.com>
Reviewed-by: NYang Jihong <yangjihong1@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

0a6cfcc1

livepatch/x86: only check stack top · 71d8599d

由 Ye Weihua 提交于 10月 19, 2021

hulk inclusion
category: feature
bugzilla: 119440 https://gitee.com/openeuler/kernel/issues/I4DDEL

--------------------------------

Enable stack optimize on x86.
Signed-off-by: NYe Weihua <yeweihua4@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

71d8599d

livepatch/ppc64: only check stack top · ec2244b5

由 Ye Weihua 提交于 10月 19, 2021

hulk inclusion
category: feature
bugzilla: 119440 https://gitee.com/openeuler/kernel/issues/I4DDEL

--------------------------------

Enable stack optimize on ppc64.
Signed-off-by: NYe Weihua <yeweihua4@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

ec2244b5

livepatch/ppc32: only check stack top · 5989ac1f

由 Ye Weihua 提交于 10月 19, 2021

hulk inclusion
category: feature
bugzilla: 119440 https://gitee.com/openeuler/kernel/issues/I4DDEL

--------------------------------

Enable stack optimize on ppc32.
Signed-off-by: NYe Weihua <yeweihua4@huawei.com>
Reviewed-by: NYang Jihong <yangjihong1@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

5989ac1f

livepatch/arm: only check stack top · e582ce90

由 Ye Weihua 提交于 10月 19, 2021

hulk inclusion
category: feature
bugzilla: 119440 https://gitee.com/openeuler/kernel/issues/I4DDEL

--------------------------------

Enable stack optimize on arm.
Signed-off-by: NYe Weihua <yeweihua4@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

e582ce90

livepatch/arm64: only check stack top · 3c4a37c0

由 Ye Weihua 提交于 10月 19, 2021

hulk inclusion
category: feature
bugzilla: 119440 https://gitee.com/openeuler/kernel/issues/I4DDEL

--------------------------------

Based on the commit 'livepatch: checks only of the replaced instruction
is on the stack', the livepatch only needs to check the replaced
instructions during stack check.

If the instructions to be replaced do not contain a jump instruction,
the instructions may only appear at the top of the stack. Thus, after
confirming that the instructions to be replaced do not contain a jump
instruction, only the top of the stack instead of entire stack may be
checked.

Each function in livepatch has a force tag. When the value is
KLP_STACK_OPTIMIZE, the function of checking only the top of the stack
is enabled to speed up the check.
Signed-off-by: NYe Weihua <yeweihua4@huawei.com>
Reviewed-by: NYang Jihong <yangjihong1@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

3c4a37c0

livepatch: checks only if the replaced instruction is on the stack · 86e35fae

由 Ye Weihua 提交于 10月 19, 2021

hulk inclusion
category: feature
bugzilla: 119440 https://gitee.com/openeuler/kernel/issues/I4DDEL

--------------------------------

When the CONFIG_LIVEPATCH_STOP_MACHINE_CONSISTENCY macro is turned
on, the system checks whether the function to patch is on the stack
under the stop_machine. If the function is on the stack, the livepatch
cannot be patched and returns a busy signal.

Hotspot functions are easily on the stack under the stop_machine
condition. As a result, the livpatch success rate is low when the
patch includes a hot function.

For the repalced function, only the first seceral instructions are
rewritten, and the rest of the instructions are the same as the
original ones. Therefore, if the force flag is KLP_STACK_OPTIMIZE,
only need to check whether the replaced instructions are on the
stack.
Signed-off-by: NYe Weihua <yeweihua4@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

86e35fae

livepatch: Add state describe for force · a9fdad35

由 Ye Weihua 提交于 10月 19, 2021

hulk inclusion
category: feature
bugzilla: 119440 https://gitee.com/openeuler/kernel/issues/I4DDEL

--------------------------------

The force field is divided into three states. KLP_NORMAL_FORCE
indicates that a hot patch is installed according to the initial
rule. KLP_ENFORCMENT indicates that the hot patch of the function
must be installed. KLP_STACK_OPTIMIZE is prepared for stack
optimization policy.
Signed-off-by: NYe Weihua <yeweihua4@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

a9fdad35

blk-mq: clear active_queues before clearing BLK_MQ_F_TAG_QUEUE_SHARED · 17ca3935

由 Yu Kuai 提交于 10月 19, 2021

mainline inclusion
from mainline-5.14-rc6
commit 454bb677
category: bugfix
bugzilla: 175277 https://gitee.com/openeuler/kernel/issues/I4DDEL

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=454bb6775202d94f0f489c4632efecdb62d3c904

-------------------------------------------------

We run a test that delete and recover devcies frequently(two devices on
the same host), and we found that 'active_queues' is super big after a
period of time.

If device a and device b share a tag set, and a is deleted, then
blk_mq_exit_queue() will clear BLK_MQ_F_TAG_QUEUE_SHARED because there
is only one queue that are using the tag set. However, if b is still
active, the active_queues of b might never be cleared even if b is
deleted.

Thus clear active_queues before BLK_MQ_F_TAG_QUEUE_SHARED is cleared.
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210731062130.1533893-1-yukuai3@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

17ca3935

sysctl: Refactor IAS framework · 189fa7a4

由 Zheng Zucheng 提交于 10月 19, 2021

hulk inclusion
category: feature
bugzilla: 177206 https://gitee.com/openeuler/kernel/issues/I4DDEL

--------------------------------

Refactor intelligent aware scheduler framework
Signed-off-by: NZheng Zucheng <zhengzucheng@huawei.com>
Reviewed-by: NChen Hui <judy.chenhui@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

189fa7a4

io_uring: ensure symmetry in handling iter types in loop_rw_iter() · f4f98f3b

由 Jens Axboe 提交于 10月 19, 2021

stable inclusion
from stable-5.10.68
commit ce8f81b76d3bef7b9fe6c8f84d029ab898b19469
bugzilla: 182253 https://gitee.com/openeuler/kernel/issues/I4DDEL
CVE: CVE-2021-41073

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ce8f81b76d3bef7b9fe6c8f84d029ab898b19469

--------------------------------

commit 16c8d2df upstream.

When setting up the next segment, we check what type the iter is and
handle it accordingly. However, when incrementing and processed amount
we do not, and both iter advance and addr/len are adjusted, regardless
of type. Split the increment side just like we do on the setup side.

Fixes: 4017eb91 ("io_uring: make loop_rw_iter() use original user supplied pointers")
Cc: stable@vger.kernel.org
Reported-by: NValentina Palmiotti <vpalmiotti@gmail.com>
Reviewed-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Reviewed-by: NYang Erkun <yangerkun@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

f4f98f3b

ext4: fix race writing to an inline_data file while its xattrs are changing · 0458938a

由 Theodore Ts'o 提交于 10月 19, 2021

mainline inclusion
from mainline
commit a54c4613
bugzilla: 181006 https://gitee.com/openeuler/kernel/issues/I4DDEL
CVE: CVE-2021-40490

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a54c4613dac1500b40e4ab55199f7c51f028e848

--------------------------------

The location of the system.data extended attribute can change whenever
xattr_sem is not taken.  So we need to recalculate the i_inline_off
field since it mgiht have changed between ext4_write_begin() and
ext4_write_end().

This means that caching i_inline_off is probably not helpful, so in
the long run we should probably get rid of it and shrink the in-memory
ext4 inode slightly, but let's fix the race the simple way for now.

Cc: stable@kernel.org
Fixes: f19d5870 ("ext4: add normal write support for inline data")
Reported-by: syzbot+13146364637c7363a7de@syzkaller.appspotmail.com
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

0458938a

memcg: enable accounting of ipc resources · 40076619

由 Vasily Averin 提交于 10月 19, 2021

mainline inclusion
from mainline
commit 18319498
bugzilla: 181329 https://gitee.com/openeuler/kernel/issues/I4DDEL
CVE: CVE-2021-3759

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=18319498fdd4cdf8c1c2c48cd432863b1f915d6f

--------------------------------

When user creates IPC objects it forces kernel to allocate memory for
these long-living objects.

It makes sense to account them to restrict the host's memory consumption
from inside the memcg-limited container.

This patch enables accounting for IPC shared memory segments, messages
semaphores and semaphore's undo lists.

Link: https://lkml.kernel.org/r/d6507b06-4df6-78f8-6c54-3ae86e3b5339@virtuozzo.comSigned-off-by: NVasily Averin <vvs@virtuozzo.com>
Reviewed-by: NShakeel Butt <shakeelb@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Yutian Yang <nglaive@gmail.com>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>

Conflicts:
	ipc/msg.c
	ipc/sem.c
	ipc/shm.c
Reviewed-by: NWang Hui <john.wanghui@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

40076619

vt_kdsetmode: extend console locking · 9d897316

由 Linus Torvalds 提交于 10月 19, 2021

mainline inclusion
from mainline
commit 2287a51b
bugzilla: 181008 https://gitee.com/openeuler/kernel/issues/I4DDEL
CVE: CVE-2021-3753

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2287a51ba822384834dafc1c798453375d1107c7

--------------------------------

As per the long-suffering comment.
Reported-by: NMinh Yuan <yuanmingbuaa@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jirislaby@kernel.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

9d897316

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功