Commits · 6555852b976cedf9b3c726fa352fc1d0c572915e · openeuler / Kernel

03 Jul, 2021 40 commits

dm btree remove: assign new_root only when removal succeeds · 6555852b

Committed by Hou Tao 3 years ago

mainline inclusion
from mainline-next
commit b8e0c7f90e6f99ee64ea60e39253d5fcfb445f9e
category: bugfix
bugzilla: 167383
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b8e0c7f90e6f99ee64ea60e39253d5fcfb445f9e

--------------------------------

remove_raw() in dm_btree_remove() may fail due to an IO read error
(e.g. reading the content of the origin block fails during shadowing),
leaving the value of shadow_spine::root uninitialized. That
uninitialized value is nevertheless assigned to new_root at the
end of dm_btree_remove().

For dm-thin, the value of pmd->details_root or pmd->root then becomes
an uninitialized value, so a later attempt to read the details_info tree
may access out-of-bounds memory, as shown below:

  general protection fault, probably for non-canonical address 0x3fdcb14c8d7520
  CPU: 4 PID: 515 Comm: dmsetup Not tainted 5.13.0-rc6
  Hardware name: QEMU Standard PC
  RIP: 0010:metadata_ll_load_ie+0x14/0x30
  Call Trace:
   sm_metadata_count_is_more_than_one+0xb9/0xe0
   dm_tm_shadow_block+0x52/0x1c0
   shadow_step+0x59/0xf0
   remove_raw+0xb2/0x170
   dm_btree_remove+0xf4/0x1c0
   dm_pool_delete_thin_device+0xc3/0x140
   pool_message+0x218/0x2b0
   target_message+0x251/0x290
   ctl_ioctl+0x1c4/0x4d0
   dm_ctl_ioctl+0xe/0x20
   __x64_sys_ioctl+0x7b/0xb0
   do_syscall_64+0x40/0xb0
   entry_SYSCALL_64_after_hwframe+0x44/0xae

Fix it by assigning new_root only when removal succeeds.
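
The pattern behind this fix can be sketched in plain userspace C. This is
only an illustration under stated assumptions, not the kernel code:
struct spine, do_remove() and remove_entry() are hypothetical stand-ins
for shadow_spine, remove_raw() and dm_btree_remove(), and the
simulate_io_error parameter exists only to force the failure path.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the shadow spine: root is only meaningful
 * after the removal step has succeeded. */
struct spine {
	uint64_t root;
};

/* Hypothetical stand-in for remove_raw(): on failure it returns a
 * negative errno and never writes s->root. */
static int do_remove(struct spine *s, uint64_t old_root, int simulate_io_error)
{
	if (simulate_io_error)
		return -EIO;
	s->root = old_root + 1;	/* pretend a new root block was shadowed */
	return 0;
}

/* Publish the new root only when the removal succeeded, so the caller's
 * root is left untouched on error. */
static int remove_entry(uint64_t root, uint64_t *new_root, int simulate_io_error)
{
	struct spine s;	/* deliberately left uninitialized */
	int r;

	r = do_remove(&s, root, simulate_io_error);
	if (!r)
		*new_root = s.root;

	return r;
}
```

With this shape, a caller that passes a pointer to its current root keeps
the old value whenever remove_entry() returns an error.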

Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Luo Meng <luomeng12@huawei.com>
Reviewed-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

6555852b

scsi: libiscsi: Reset max/exp cmdsn during recovery · 07ae5ac2

Committed by Mike Christie 3 years ago

mainline inclusion
from mainline-v5.12-rc1
commit c8447e4c
category: bugfix
bugzilla: 107448
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c8447e4c2eb77dbb96012ae96e7c83179cecf880

-----------------------------------------------

If we lose the session then relogin, but the new cmdsn window has shrunk
(due to something like an admin changing a setting) we will have the old
exp/max_cmdsn values and will never be able to update them. For example,
max_cmdsn would be 64, but if on the target the user set the window to be
smaller then the target could try to return the max_cmdsn as 32. We will
see that new max_cmdsn in the rsp but because it's lower than the old
max_cmdsn when the window was larger we will not update it.

So this patch has us reset the window values during session cleanup so they
can be updated after a new login.
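
The effect of a stale window can be illustrated with a small userspace
model. This is a sketch, not the libiscsi code: struct cmd_window,
window_update() and window_reset() are hypothetical names, and the value
the window is reset to is an assumption made only for this illustration.

```c
#include <stdint.h>

/* Hypothetical model of the per-session command window. */
struct cmd_window {
	uint32_t cmdsn;		/* next command sequence number to send */
	uint32_t exp_cmdsn;	/* last ExpCmdSN accepted from the target */
	uint32_t max_cmdsn;	/* last MaxCmdSN accepted from the target */
};

/* Serial-number "less than": true if a precedes b modulo 2^32. */
static int sna_lt(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Accept values from a response only if they do not move backwards. */
static void window_update(struct cmd_window *w, uint32_t exp, uint32_t max)
{
	if (exp != w->exp_cmdsn && !sna_lt(exp, w->exp_cmdsn))
		w->exp_cmdsn = exp;
	if (max != w->max_cmdsn && !sna_lt(max, w->max_cmdsn))
		w->max_cmdsn = max;
}

/* Assumed cleanup step: forget the old window so that the values from
 * the next login are accepted even if the window has shrunk. */
static void window_reset(struct cmd_window *w)
{
	w->exp_cmdsn = w->cmdsn;
	w->max_cmdsn = w->cmdsn;
}
```

In this model a max_cmdsn of 32 arriving while the stored value is 64 is
ignored by window_update(); after window_reset() the same response is
accepted.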

Link: https://lore.kernel.org/r/20210207044608.27585-8-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

07ae5ac2

scsi: iscsi_tcp: Fix shost can_queue initialization · 73db803f

Committed by Mike Christie 3 years ago

mainline inclusion
from mainline-v5.12-rc1
commit 25c400db
category: bugfix
bugzilla: 107448
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=25c400db2083732a5fbdd72f0d3a0337119b2fa5

-----------------------------------------------

We are setting the shost's can_queue after we add the host which is too
late, because the SCSI midlayer will have allocated the tag set based on
the can_queue value at that time. This patch has us use the
iscsi_host_get_max_scsi_cmds() helper to figure out the number of SCSI
cmds.

It also fixes up the template can_queue so it reflects the max SCSI cmds we
can support like how other drivers work.

Link: https://lore.kernel.org/r/20210207044608.27585-7-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

73db803f

scsi: libiscsi: Add helper to calculate max SCSI cmds per session · 809e429c

Committed by Mike Christie 3 years ago

mainline inclusion
from mainline-v5.12-rc1
commit b4046922
category: bugfix
bugzilla: 107448
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b4046922b3c0740ad50a6e9c59e12f4dc43946d4

-----------------------------------------------

This patch just breaks out the code that calculates the number of SCSI cmds
that will be used for a SCSI session. It also adds a check that we don't go
over the host's can_queue value.

Link: https://lore.kernel.org/r/20210207044608.27585-6-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

809e429c

scsi: libiscsi: Fix iSCSI host workq destruction · 020462b1

Committed by Mike Christie 3 years ago

mainline inclusion
from mainline-v5.12-rc1
commit c435f0a9
category: bugfix
bugzilla: 107448
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c435f0a9ecb7435e70f447b7231ca52de589b252

-----------------------------------------------

We allocate the iSCSI host workq in iscsi_host_alloc() so iscsi_host_free()
should do the destruction. Drivers can then do their error/goto handling
and call iscsi_host_free() to clean up what has been allocated in
iscsi_host_alloc().

Link: https://lore.kernel.org/r/20210207044608.27585-5-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

020462b1

scsi: libiscsi: Fix iscsi_task use after free() · b53d1beb

Committed by Mike Christie 3 years ago

mainline inclusion
from mainline-v5.12-rc1
commit 14936b1e
category: bugfix
bugzilla: 107448
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=14936b1ed249916c28642d0db47a51b085ce13b4

-----------------------------------------------

The following bug was reported and debugged by wubo40@huawei.com:

When testing a 4.18 kernel, a NULL pointer dereference occurs
in iscsi_eh_cmd_timed_out().

I think this bug still exists upstream.

The analysis is as follows:

1) For some reason, an I/O command does not complete within the timeout
   period. The block layer timer fires and calls scsi_times_out() to
   handle the I/O timeout. At the same time the command completes.

2) scsi_times_out() calls iscsi_eh_cmd_timed_out() to process the timeout.
   Although there is a NULL check for the task, the task has not been
   released yet at this point.

3) iscsi_complete_task() calls __iscsi_put_task(). The task reference count
   reaches zero, so the condition for freeing the task is met:
   iscsi_free_task() frees the task and sets sc->SCp.ptr = NULL. After
   iscsi_eh_cmd_timed_out() has passed the NULL check, a NULL dereference
   is therefore still possible.

   CPU0                                                CPU3

    |- scsi_times_out()                                 |- iscsi_complete_task()
    |                                                   |
    |- iscsi_eh_cmd_timed_out()                         |- __iscsi_put_task()
    |                                                   |
    |- task=sc->SCp.ptr, task is not NULL, check passed |- iscsi_free_task(task)
    |                                                   |
    |                                                   |-> sc->SCp.ptr = NULL
    |                                                   |
    |- task is NULL now, NULL pointer dereference       |
    |                                                   |
   \|/                                                 \|/

Calltrace:
[380751.840862] BUG: unable to handle kernel NULL pointer dereference at 0000000000000138
[380751.843709] PGD 0 P4D 0
[380751.844770] Oops: 0000 [#1] SMP PTI
[380751.846283] CPU: 0 PID: 403 Comm: kworker/0:1H Kdump: loaded Tainted: G
[380751.851467] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
[380751.856521] Workqueue: kblockd blk_mq_timeout_work
[380751.858527] RIP: 0010:iscsi_eh_cmd_timed_out+0x15e/0x2e0 [libiscsi]
[380751.861129] Code: 83 ea 01 48 8d 74 d0 08 48 8b 10 48 8b 4a 50 48 85 c9 74 2c 48 39 d5 74
[380751.868811] RSP: 0018:ffffc1e280a5fd58 EFLAGS: 00010246
[380751.870978] RAX: ffff9fd1e84e15e0 RBX: ffff9fd1e84e6dd0 RCX: 0000000116acc580
[380751.873791] RDX: ffff9fd1f97a9400 RSI: ffff9fd1e84e1800 RDI: ffff9fd1e4d6d420
[380751.876059] RBP: ffff9fd1e4d49000 R08: 0000000116acc580 R09: 0000000116acc580
[380751.878284] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9fd1e6e931e8
[380751.880500] R13: ffff9fd1e84e6ee0 R14: 0000000000000010 R15: 0000000000000003
[380751.882687] FS:  0000000000000000(0000) GS:ffff9fd1fac00000(0000) knlGS:0000000000000000
[380751.885236] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[380751.887059] CR2: 0000000000000138 CR3: 000000011860a001 CR4: 00000000003606f0
[380751.889308] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[380751.891523] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[380751.893738] Call Trace:
[380751.894639]  scsi_times_out+0x60/0x1c0
[380751.895861]  blk_mq_check_expired+0x144/0x200
[380751.897302]  ? __switch_to_asm+0x35/0x70
[380751.898551]  blk_mq_queue_tag_busy_iter+0x195/0x2e0
[380751.900091]  ? __blk_mq_requeue_request+0x100/0x100
[380751.901611]  ? __switch_to_asm+0x41/0x70
[380751.902853]  ? __blk_mq_requeue_request+0x100/0x100
[380751.904398]  blk_mq_timeout_work+0x54/0x130
[380751.905740]  process_one_work+0x195/0x390
[380751.907228]  worker_thread+0x30/0x390
[380751.908713]  ? process_one_work+0x390/0x390
[380751.910350]  kthread+0x10d/0x130
[380751.911470]  ? kthread_flush_work_fn+0x10/0x10
[380751.913007]  ret_from_fork+0x35/0x40

crash> dis -l iscsi_eh_cmd_timed_out+0x15e
xxxxx/drivers/scsi/libiscsi.c: 2062

1970 enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc)
{
...
1984         spin_lock_bh(&session->frwd_lock);
1985         task = (struct iscsi_task *)sc->SCp.ptr;
1986         if (!task) {
1987                 /*
1988                  * Raced with completion. Blk layer has taken ownership
1989                  * so let timeout code complete it now.
1990                  */
1991                 rc = BLK_EH_DONE;
1992                 goto done;
1993         }

...

2052         for (i = 0; i < conn->session->cmds_max; i++) {
2053                 running_task = conn->session->cmds[i];
2054                 if (!running_task->sc || running_task == task ||
2055                      running_task->state != ISCSI_TASK_RUNNING)
2056                         continue;
2057
2058                 /*
2059                  * Only check if cmds started before this one have made
2060                  * progress, or this could never fail
2061                  */
2062                 if (time_after(running_task->sc->jiffies_at_alloc,
2063                                task->sc->jiffies_at_alloc))    <---
2064                         continue;
2065
...
}

crash> struct scsi_cmnd ffff9fd1e6e931e8
struct scsi_cmnd {
  ...
  SCp = {
    ptr = 0x0,   <--- iscsi_task
    this_residual = 0,
    ...
  },
}

To prevent this, we take a ref to the cmd under the back (completion) lock
so if the completion side were to call iscsi_complete_task() on the task
while the timer/eh paths are not holding the back_lock it will not be freed
from under us.
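
The idea of pinning the task before using it can be sketched as a
single-threaded userspace model. Everything here is hypothetical:
struct task, struct cmd, task_get_from_cmd() and task_put() only mimic
the roles of the iSCSI task, sc->SCp.ptr and the get/put helpers, and the
locking that the real code relies on is reduced to a comment.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a refcounted task. */
struct task {
	int refcount;
	bool freed;	/* set instead of really freeing, so it can be inspected */
};

/* Hypothetical model of the command that points at its task. */
struct cmd {
	struct task *ptr;
};

/* Timeout side: look the task up and pin it in one step. In real code
 * this step would have to run under the completion-side lock. */
static struct task *task_get_from_cmd(struct cmd *c)
{
	struct task *t = c->ptr;

	if (t)
		t->refcount++;
	return t;
}

/* Drop one reference; the last put "frees" the task and clears the
 * command's pointer. */
static void task_put(struct cmd *c, struct task *t)
{
	if (--t->refcount == 0) {
		t->freed = true;
		c->ptr = NULL;
	}
}
```

If the completion side drops its reference while the timeout side still
holds the one taken by task_get_from_cmd(), the task stays valid until
the timeout side calls task_put() as well.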

Note that this requires the previous patch, "scsi: libiscsi: Drop
taskqueuelock" because bnx2i sleeps in its cleanup_task callout if the cmd
is aborted. If the EH/timer and completion path are racing we don't know
which path will do the last put. The previous patch moved the operations we
needed to do under the forward lock to cleanup_queued_task.  Once that has
run we can drop the forward lock for the cmd and bnx2i no longer has to
worry about if the EH, timer or completion path did the last put and if the
forward lock is held or not since it won't be.

Link: https://lore.kernel.org/r/20210207044608.27585-4-michael.christie@oracle.com
Reported-by: Wu Bo <wubo40@huawei.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

b53d1beb

scsi: libiscsi: Drop taskqueuelock · 41b8b943

Committed by Mike Christie 3 years ago

mainline inclusion
from mainline-v5.12-rc1
commit 5923d64b
category: bugfix
bugzilla: 107448
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5923d64b7ab63dcc6f0df946098f50902f9540d1

-----------------------------------------------

The purpose of the taskqueuelock was to handle the issue where a bad target
decides to send an R2T and before its data has been sent decides to send a
cmd response to complete the cmd. The following patches fix up the
frwd/back locks so they are taken from the queue/xmit (frwd) and completion
(back) paths again. To get there this patch removes the taskqueuelock which
for iSCSI xmit wq based drivers was taken in the queue, xmit and completion
paths.

Instead of the lock, we just make sure we have a ref to the task when we
queue an R2T, and then we always remove the task from the requeue list in
the xmit path or the forced cleanup paths.

Link: https://lore.kernel.org/r/20210207044608.27585-3-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

41b8b943

ext4: stop return ENOSPC from ext4_issue_zeroout · 8119d09e

Committed by yangerkun 3 years ago

hulk inclusion
category: bugfix
bugzilla: 167373
CVE: NA

---------------------------

Our testcase (fsstress on a dm thin-provisioning volume that ext4 sees
as 100G but that is actually backed by only 10G) triggers a hung task
because ext4_writepages falls into an infinite loop:

static int ext4_writepages(xxx)
{
    ...
   while (!done && mpd.first_page <= mpd.last_page) {
       ...
       ret = mpage_prepare_extent_to_map(&mpd);
       if (!ret) {
           ...
           ret = mpage_map_and_submit_extent(handle, &mpd,
                                             &give_up_on_write);
           <----- will return -ENOSPC
           ...
       }
       ...
       if (ret == -ENOSPC && sbi->s_journal) {
           <------ we cannot break since we will get ENOSPC forever
           jbd2_journal_force_commit_nested(sbi->s_journal);
           ret = 0;
           continue;
       }
       ...
   }
}

ENOSPC is returned through the following call stack:
...
ext4_ext_map_blocks
  ext4_ext_convert_to_initialized
    ext4_ext_zeroout
      ext4_issue_zeroout
        ...
        submit_bio_wait <-- bio to thinpool will return ENOSPC

The ENOSPC returned by thin-provisioning actually represents an EIO from
the block device. Convert the error to EIO so that ext4 is not confused.
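
Why the conversion breaks the loop can be shown with a small userspace
model. This is a sketch under assumptions, not the ext4 code:
zeroout_errno() and writeback_rounds() are hypothetical names, the lower
layer is reduced to a fixed error value, and max_rounds exists only so
that the model terminates.

```c
#include <errno.h>

/* Hypothetical mapping applied to the result of the zeroout request:
 * an -ENOSPC coming from the block device is reported as -EIO. */
static int zeroout_errno(int err)
{
	return err == -ENOSPC ? -EIO : err;
}

/* Model of a writeback loop that retries for as long as it sees -ENOSPC.
 * Returns how many rounds were executed, bounded by max_rounds. */
static int writeback_rounds(int lower_err, int convert, int max_rounds)
{
	int rounds = 0;
	int ret;

	do {
		ret = convert ? zeroout_errno(lower_err) : lower_err;
		rounds++;
	} while (ret == -ENOSPC && rounds < max_rounds);

	return rounds;
}
```

Without the conversion the model retries until the artificial bound is
reached; with it, the loop stops after the first round because the error
is no longer -ENOSPC.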

Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

8119d09e

scsi: sd: Call sd_revalidate_disk() for ioctl(BLKRRPART) · 83327791

Committed by Christoph Hellwig 3 years ago

mainline inclusion
from mainline-5.13
commit d1b7f920
category: bugfix
bugzilla: 167359
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d1b7f92035c6fb42529ada531e2cbf3534544c82

---------------------------

While the disk state has nothing to do with partitions, BLKRRPART is used
to force a full revalidate after things like a disk format for historical
reasons. Restore that behavior.

Link: https://lore.kernel.org/r/20210617115504.1732350-1-hch@lst.de
Fixes: 471bd0af ("sd: use bdev_check_media_change")
Reported-by: Xiang Chen <chenxiang66@hisilicon.com>
Tested-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

83327791

powerpc/fsl_booke/kaslr: rename kaslr-booke32.rst to kaslr-booke.rst and add 64bit part · ac6d1d39

Committed by Jason Yan 3 years ago

maillist inclusion
category: feature
feature: PowerPC64 kaslr support
bugzilla: 109306
CVE: NA

Reference: https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20200330022023.3691-7-yanaijie@huawei.com/

-------------------------------------------------

Now we support both 32-bit and 64-bit KASLR for fsl booke. Add
documentation for the 64-bit part and rename kaslr-booke32.rst to
kaslr-booke.rst.

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Cc: Scott Wood <oss@buserror.net>
Cc: Diana Craciun <diana.craciun@nxp.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

ac6d1d39

powerpc/fsl_booke/64: clear the original kernel if randomized · 4d82e78e

Committed by Jason Yan 3 years ago

maillist inclusion
category: feature
feature: PowerPC64 kaslr support
bugzilla: 109306
CVE: NA

Reference: https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20200330022023.3691-6-yanaijie@huawei.com/

-------------------------------------------------

The original kernel still exists in memory; clear it now.

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Cc: Scott Wood <oss@buserror.net>
Cc: Diana Craciun <diana.craciun@nxp.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

4d82e78e

powerpc/fsl_booke/64: do not clear the BSS for the second pass · 26fde390

Committed by Jason Yan 3 years ago

maillist inclusion
category: feature
feature: PowerPC64 kaslr support
bugzilla: 109306
CVE: NA

Reference: https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20200306064033.3398-5-yanaijie@huawei.com/

-------------------------------------------------

The BSS section has already been cleared in the first pass. No need to
clear it again. This saves some time when booting with KASLR
enabled.

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Cc: Scott Wood <oss@buserror.net>
Cc: Diana Craciun <diana.craciun@nxp.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

26fde390

powerpc/fsl_booke/64: implement KASLR for fsl_booke64 · c2c939a1

Committed by Jason Yan 3 years ago

maillist inclusion
category: feature
feature: PowerPC64 kaslr support
bugzilla: 109306
CVE: NA

Reference: https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20200330022023.3691-4-yanaijie@huawei.com/

-------------------------------------------------

The implementation for Freescale BookE64 is similar to BookE32. One
difference is that Freescale BookE64 sets up a TLB mapping of 1G during
booting. Another difference is that ppc64 needs the kernel to be
64K-aligned. So we can randomize the kernel in this 1G mapping and make
it 64K-aligned. This saves the code needed to create another TLB mapping
at early boot. The disadvantage is that we only have about 1G/64K = 16384
slots to put the kernel in.

To support secondary CPU boot-up, a variable __kaslr_offset was added in
the first_256B section. This helps secondary CPUs get the KASLR offset
before the 1:1 mapping has been set up.
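
The slot arithmetic above can be checked with a few lines of C. This is a
minimal sketch, not the kernel implementation: kaslr_slot_count() and
kaslr_offset_for() are hypothetical helpers, the random value is passed in
by the caller, and the size of the kernel image (which is why the message
says "about") is ignored.

```c
#include <stdint.h>

#define SZ_64K (UINT64_C(64) << 10)	/* required kernel alignment */
#define SZ_1G  (UINT64_C(1) << 30)	/* size of the early TLB mapping */

/* Number of 64K-aligned positions inside the 1G mapping. */
static uint64_t kaslr_slot_count(void)
{
	return SZ_1G / SZ_64K;
}

/* Turn an arbitrary random value into a 64K-aligned offset that stays
 * inside the 1G mapping. */
static uint64_t kaslr_offset_for(uint64_t random)
{
	return (random % kaslr_slot_count()) * SZ_64K;
}
```

Any value returned by kaslr_offset_for() is a multiple of 64K and smaller
than 1G, which matches the two constraints described in the message.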

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Cc: Scott Wood <oss@buserror.net>
Cc: Diana Craciun <diana.craciun@nxp.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

c2c939a1

powerpc/fsl_booke/64: introduce reloc_kernel_entry() helper · d357bda8

Committed by Jason Yan 3 years ago

maillist inclusion
category: feature
feature: PowerPC64 kaslr support
bugzilla: 109306
CVE: NA

Reference: https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20200330022023.3691-3-yanaijie@huawei.com/

-------------------------------------------------

Like the 32bit code, we introduce reloc_kernel_entry() helper to prepare
for the KASLR 64bit version. And move the C declaration of this function
out of CONFIG_PPC32 and use long instead of int for the parameter 'addr'.

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Cc: Scott Wood <oss@buserror.net>
Cc: Diana Craciun <diana.craciun@nxp.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

d357bda8

powerpc/fsl_booke/kaslr: refactor kaslr_legal_offset() and kaslr_early_init() · d1e9c99a

Committed by Jason Yan 3 years ago

maillist inclusion
category: feature
feature: PowerPC64 kaslr support
bugzilla: 109306
CVE: NA

Reference: https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20200330022023.3691-2-yanaijie@huawei.com/

-------------------------------------------------

Refactor kaslr_legal_offset() and kaslr_early_init(). No
functional change. This prepares for KASLR on fsl_booke64.

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Cc: Scott Wood <oss@buserror.net>
Cc: Diana Craciun <diana.craciun@nxp.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

d1e9c99a

arm64: Force NO_BLOCK_MAPPINGS if crashkernel reservation is required · 255cd474

Committed by Catalin Marinas 3 years ago

mainline inclusion
from mainline-5.11-rc1
commit 2687275a
category: bugfix
bugzilla: 115452
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2687275a5843d1089687f08fc64eb3f3b026a169

---------------------------

mem_init() currently relies on knowing the boundaries of the crashkernel
reservation to map such region with page granularity for later
unmapping via set_memory_valid(..., 0). If the crashkernel reservation
is deferred, such boundaries are not known when the linear mapping is
created. Simply parse the command line for "crashkernel" and, if found,
create the linear map with NO_BLOCK_MAPPINGS.
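
The decision described above can be sketched in userspace C. This is an
illustration under assumptions, not the arm64 code: the kernel has its own
command-line parsing infrastructure, whereas this sketch uses a plain
strstr() search, and cmdline_wants_crashkernel(), linear_map_flags() and
the flag value are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NO_BLOCK_MAPPINGS 0x1u	/* flag value chosen for this sketch only */

/* Simplified check: a plain substring search, so it would also match a
 * longer option name that merely ends in "crashkernel=". */
static bool cmdline_wants_crashkernel(const char *cmdline)
{
	return cmdline != NULL && strstr(cmdline, "crashkernel=") != NULL;
}

/* Pick the flags for the linear map before the reservation boundaries
 * are known: fall back to page granularity if a crash kernel is requested. */
static unsigned int linear_map_flags(const char *cmdline)
{
	unsigned int flags = 0;

	if (cmdline_wants_crashkernel(cmdline))
		flags |= NO_BLOCK_MAPPINGS;
	return flags;
}
```

The point of the sketch is the ordering: the flags depend only on whether
the option is present, not on the reservation boundaries, so they can be
chosen before the reservation itself has been made.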

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Reviewed-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Acked-by: James Morse <james.morse@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Link: https://lore.kernel.org/r/20201119175556.18681-1-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

Conflicts:
	arch/arm64/mm/mmu.c
Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

255cd474

exec: Move unshare_files to fix posix file locking during exec · cb3a25fd

Committed by Eric W. Biederman 3 years ago

mainline inclusion
from mainline-5.11-rc1
commit b6043501
category: bugfix
bugzilla: 108432
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b6043501289ebf169ae19b810a882d517377302f

-------------------------------------------------

Many moons ago the binfmts were doing some very questionable things
with file descriptors and an unsharing of the file descriptor table
was added to make things better[1][2].  The helper steal_locks was
added to avoid breaking the userspace programs[3][4][6].

Unfortunately it turned out that steal_locks did not work for network
file systems[5], so it was removed to see if anyone would
complain[7][8].  It was thought at the time that NPTL would not be
affected as the unshare_files happened after the other threads were
killed[8].  Unfortunately because there was an unshare_files in
binfmt_elf.c before the threads were killed this analysis was
incorrect.

This unshare_files in binfmt_elf.c resulted in the unshare_files
happening whenever threads were present.  Which led to unshare_files
being moved to the start of do_execve[9].

Later the problems were rediscovered and the suggested approach was to
re-add steal_locks under a different name[10].  I happened to be
reviewing patches and I noticed that this approach was a step
backwards[11].

I proposed simply moving unshare_files[12] and it was pointed
out that moving unshare_files without auditing the code was
also unsafe[13].

There were then several attempts to solve this[14][15][16] and I even
posted this set of changes[17].  Unfortunately because auditing all of
execve is time consuming this change did not make it in at the time.

Well now that I am cleaning up exec I have made the time to read
through all of the binfmts and the only playing with file descriptors
is either the security modules closing them in
security_bprm_committing_creds or is in the generic code in fs/exec.c.
None of it happens before begin_new_exec is called.

So move unshare_files into begin_new_exec, after the point of no
return.  If memory is very very very low and the application calling
exec is sharing file descriptor tables between processes we might fail
past the point of no return.  Which is unfortunate but no different
than any of the other places where we allocate memory after the point
of no return.

This movement allows another process that shares the file table, or
another thread of the same process, that closes files or changes
their close-on-exec behavior and races with execve to cause some
unexpected things to happen.  There is only one time-of-check to
time-of-use race and it is just there so that execve fails instead of
an interpreter failing when it tries to open the file it is supposed
to be interpreting.  Failing later if userspace is being silly is
not a problem.

With this change the following description from the removal
of steal_locks[8] finally becomes true.

    Apps using NPTL are not affected, since all other threads are killed before
    execve.

    Apps using LinuxThreads are only affected if they

      - have multiple threads during exec (LinuxThreads doesn't kill other
        threads, the app may do it with pthread_kill_other_threads_np())
      - rely on POSIX locks being inherited across exec

    Both conditions are documented, but not their interaction.

    Apps using clone() natively are affected if they

      - use clone(CLONE_FILES)
      - rely on POSIX locks being inherited across exec

I have investigated some paths to make it possible to solve this
without moving unshare_files but they all look more complicated[18].

Reported-by: Daniel P. Berrangé <berrange@redhat.com>
Reported-by: Jeff Layton <jlayton@redhat.com>
History-tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
[1] 02cda956de0b ("[PATCH] unshare_files")
[2] 04e9bcb4d106 ("[PATCH] use new unshare_files helper")
[3] 088f5d7244de ("[PATCH] add steal_locks helper")
[4] 02c541ec8ffa ("[PATCH] use new steal_locks helper")
[5] https://lkml.kernel.org/r/E1FLIlF-0007zR-00@dorka.pomaz.szeredi.hu
[6] https://lkml.kernel.org/r/0060321191605.GB15997@sorel.sous-sol.org
[7] https://lkml.kernel.org/r/E1FLwjC-0000kJ-00@dorka.pomaz.szeredi.hu
[8] c89681ed ("[PATCH] remove steal_locks()")
[9] fd8328be ("[PATCH] sanitize handling of shared descriptor tables in failing execve()")
[10] https://lkml.kernel.org/r/20180317142520.30520-1-jlayton@kernel.org
[11] https://lkml.kernel.org/r/87r2nwqk73.fsf@xmission.com
[12] https://lkml.kernel.org/r/87bmfgvg8w.fsf@xmission.com
[13] https://lkml.kernel.org/r/20180322111424.GE30522@ZenIV.linux.org.uk
[14] https://lkml.kernel.org/r/20180827174722.3723-1-jlayton@kernel.org
[15] https://lkml.kernel.org/r/20180830172423.21964-1-jlayton@kernel.org
[16] https://lkml.kernel.org/r/20180914105310.6454-1-jlayton@kernel.org
[17] https://lkml.kernel.org/r/87a7ohs5ow.fsf@xmission.com
[18] https://lkml.kernel.org/r/87pn8c1uj6.fsf_-_@x220.int.ebiederm.org
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
v1: https://lkml.kernel.org/r/20200817220425.9389-1-ebiederm@xmission.com
Link: https://lkml.kernel.org/r/20201120231441.29911-1-ebiederm@xmission.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

cb3a25fd

exec: Don't open code get_close_on_exec · ae75d1e3

Committed by Eric W. Biederman, 3 years ago

mainline inclusion
from mainline-5.11-rc1
commit 	878f12db
category: bugfix
bugzilla: 108432
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=878f12dbb8f514799d126544d59be4d2675caac3

-------------------------------------------------

Al Viro pointed out that using the phrase "close_on_exec(fd,
rcu_dereference_raw(current->files->fdt))" instead of wrapping it in
rcu_read_lock(), rcu_read_unlock() is a very questionable
optimization[1].

Once wrapped with rcu_read_lock()/rcu_read_unlock(), that phrase
becomes equivalent to the helper function get_close_on_exec(), so
simplify the code and make it more robust by simply using
get_close_on_exec().

[1] https://lkml.kernel.org/r/20201207222214.GA4115853@ZenIV.linux.org.uk
Suggested-by: Al Viro <viro@ftp.linux.org.uk>
Link: https://lkml.kernel.org/r/87k0tqr6zi.fsf_-_@x220.int.ebiederm.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

ae75d1e3

ARM: mm: Fix PXN process with LPAE feature · 5ee71dc0

Committed by Kefeng Wang, 3 years ago

hulk inclusion
category: bugfix
bugzilla: 167379
CVE: NA

Reference: https://lore.kernel.org/linux-arm-kernel/20210610123556.171328-1-wangkefeng.wang@huawei.com/

-------------------------------------------------

When user code is executed in privileged mode, the page fault handler
enters an infinite loop if ARM_LPAE is enabled.

The issue could be reproduced with
  "echo EXEC_USERSPACE > /sys/kernel/debug/provoke-crash/DIRECT"

As the ARM specification shows, permission faults are encoded as follows:
  IFSR format when using the Short-descriptor translation table format
    Permission fault:       01101 First level      01111 Second level
  IFSR format when using the Long-descriptor translation table format
    Permission fault:       0011LL LL bits indicate level.

Add is_permission_fault() function to check permission fault and die
if permission fault occurred under instruction fault in do_page_fault().
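
The encodings quoted above can be checked with a small user-space sketch.
It assumes the fault-status (FS) field has already been extracted from IFSR
into a plain integer. The macro and function names are illustrative and are
not taken from the patch itself.

```c
#include <stdbool.h>

/* Short-descriptor format: permission fault codes quoted in the commit message. */
#define FS_SHORT_L1_PERM  0x0Du  /* 01101: first level  */
#define FS_SHORT_L2_PERM  0x0Fu  /* 01111: second level */

/* Long-descriptor (LPAE) format: 0011LL, where LL encodes the level. */
#define FS_LONG_PERM      0x0Cu  /* 001100 */
#define FS_LONG_PERM_MASK 0x3Cu  /* ignore the two LL bits */

/* fs is the fault-status field, already extracted from IFSR (assumption). */
static bool is_permission_fault_sketch(unsigned int fs, bool lpae)
{
    if (lpae)
        return (fs & FS_LONG_PERM_MASK) == FS_LONG_PERM;
    return fs == FS_SHORT_L1_PERM || fs == FS_SHORT_L2_PERM;
}
```

With LPAE the two low bits are masked off, so any level of permission fault
matches; without LPAE only the two exact codes match.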

Fixes: 1d4d3715 ("ARM: 8235/1: Support for the PXN CPU feature on ARMv7")
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

5ee71dc0

ARM: mm: Provide die_kernel_fault() helper · aa6bc375

Committed by Kefeng Wang, 3 years ago

hulk inclusion
category: bugfix
bugzilla: 167379
CVE: NA

Reference: https://lore.kernel.org/linux-arm-kernel/20210610123556.171328-1-wangkefeng.wang@huawei.com/

-------------------------------------------------

Provide a die_kernel_fault() helper to do the kernel fault reporting.
With its msg argument, it can report a different message in each
scenario. The later patch "ARM: mm: Fix PXN process with LPAE feature"
will use it.
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

aa6bc375

ARM: mm: Kill page table base print in show_pte() · 6fb8f8ec

Committed by Kefeng Wang, 3 years ago

hulk inclusion
category: bugfix
bugzilla: 167379
CVE: NA

Reference: https://lore.kernel.org/linux-arm-kernel/20210610123556.171328-1-wangkefeng.wang@huawei.com/

-------------------------------------------------

Currently show_pte() dumps the virtual (hashed) address of the page
table base, which is useless, so remove it.
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

6fb8f8ec

ARM: mm: Cleanup access_error() · ff099425

Committed by Kefeng Wang, 3 years ago

hulk inclusion
category: bugfix
bugzilla: 167379
CVE: NA

Reference: https://lore.kernel.org/linux-arm-kernel/20210610123556.171328-1-wangkefeng.wang@huawei.com/

-------------------------------------------------

Currently the write fault is checked twice, in do_page_fault() and in
access_error(). Clean up access_error() by moving the fault check and the
vma flags setup into do_page_fault() directly, then pass the vma flags to
__do_page_fault().

No functional change.
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

ff099425

ARM: mm: Kill task_struct argument for __do_page_fault() · 50f7f12b

Committed by Kefeng Wang, 3 years ago

hulk inclusion
category: bugfix
bugzilla: 167379
CVE: NA

Reference: https://lore.kernel.org/linux-arm-kernel/20210610123556.171328-1-wangkefeng.wang@huawei.com/

-------------------------------------------------

__do_page_fault() does not use its task_struct argument, so remove it
and use current->mm directly in do_page_fault().

No functional change.
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

50f7f12b

ARM: mm: Refactor the __do_page_fault() · 536ae3c5

Committed by Kefeng Wang, 3 years ago

hulk inclusion
category: bugfix
bugzilla: 167379
CVE: NA

Reference: https://lore.kernel.org/linux-arm-kernel/20210610123556.171328-1-wangkefeng.wang@huawei.com/

-------------------------------------------------

Clean up the multiple goto statements and drop the local variable
vm_fault_t fault, which makes __do_page_fault() much more
readable.

No functional change.
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

536ae3c5

fanotify: fix copy_event_to_user() fid error clean up · a2ce8bd3

Committed by Matthew Bobrowski, 3 years ago

mainline inclusion
from mainline-5.14
commit	f644bc44
category: bugfix
bugzilla: 110075
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f644bc449b37cc32d3ce7b36a88073873aa21bd5

-------------------------------------------------

Ensure that clean up is performed on the allocated file descriptor and
struct file object in the event that an error is encountered while copying
fid info objects. Currently, we return directly to the caller when an error
is experienced in the fid info copying helper, which isn't ideal given that
the listener process could be left with a dangling file descriptor in their
fdtable.

Fixes: 5e469c83 ("fanotify: copy event fid info to user")
Fixes: 44d705b0 ("fanotify: report name info for FAN_DIR_MODIFY event")
Link: https://lore.kernel.org/linux-fsdevel/YMKv1U7tNPK955ho@google.com/T/#m15361cd6399dad4396aad650de25dbf6b312288e
Link: https://lore.kernel.org/r/1ef8ae9100101eb1a91763c516c2e9a3a3b112bd.1623376346.git.repnop@google.com
Signed-off-by: Matthew Bobrowski <repnop@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

a2ce8bd3

block: fix inflight statistics of part0 · 377f3132

Committed by Jeffle Xu, 3 years ago

mainline inclusion
from mainline-5.11-rc1
commit b0d97557
category: bugfix
bugzilla: 108592
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b0d97557ebfc9d5ba5f2939339a9fdd267abafeb

---------------------------

The inflight count of partition 0 doesn't include inflight IOs to its
sub-partitions, since mq currently calculates the inflight count of a
specific partition by simply comparing the value of the partition pointer.

Thus the following case is possible:

$ cat /sys/block/vda/inflight
       0        0
$ cat /sys/block/vda/vda1/inflight
       0      128

A single-queue device (on an earlier version, e.g. v3.10) does not have
this issue:

$ cat /sys/block/sda/sda3/inflight
       0       33
$ cat /sys/block/sda/inflight
       0       33
       0       33

Partition 0 should be handled specially since it represents the whole
disk. This issue was introduced by commit bf0ddaba ("blk-mq: fix
sysfs inflight counter").
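
The special handling of partition 0 can be illustrated with a simplified
user-space sketch. The request structure and function below are hypothetical
stand-ins and do not reproduce the blk-mq code; they only show the counting
rule that partition 0 matches every request on the disk.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for an in-flight request. */
struct req_sketch {
    int partno;   /* partition the request was issued to; 0 = whole disk */
    int is_write; /* 0 = read, 1 = write */
};

/*
 * Count in-flight reads (inflight[0]) and writes (inflight[1]) for
 * partition `partno`. Partition 0 represents the whole disk, so it
 * matches every request.
 */
static void count_inflight_sketch(const struct req_sketch *reqs, size_t n,
                                  int partno, unsigned int inflight[2])
{
    inflight[0] = inflight[1] = 0;
    for (size_t i = 0; i < n; i++) {
        if (partno == 0 || reqs[i].partno == partno)
            inflight[reqs[i].is_write ? 1 : 0]++;
    }
}
```

Comparing only `reqs[i].partno == partno` would reproduce the bug described
above: requests issued to sub-partitions would never be counted for the
whole disk.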

Besides, this patch also fixes the inflight statistics of part 0 in
/proc/diskstats. Before this patch, the inflight statistics of part 0
don't include those of sub-partitions. (I have marked the 'inflight'
field with asterisks.)

$ cat /proc/diskstats
 259       0 nvme0n1 45974469 0 367814768 6445794 1 0 1 0 *0* 111062 6445794 0 0 0 0 0 0
 259       2 nvme0n1p1 45974058 0 367797952 6445727 0 0 0 0 *33* 111001 6445727 0 0 0 0 0 0

This was introduced by commit f299b7c7 ("blk-mq: provide internal
in-flight variant").

Fixes: bf0ddaba ("blk-mq: fix sysfs inflight counter")
Fixes: f299b7c7 ("blk-mq: provide internal in-flight variant")
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[axboe: adapt for 5.11 partition change]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
	block/blk-mq.c
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

377f3132

debugfs: fix security_locked_down() call for SELinux · c2180fdf

Committed by Ondrej Mosnacek, 3 years ago

mainline inclusion
from mainline-5.13-rc4
commit	5881fa8d
category: bugfix
bugzilla: 78400
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5881fa8dc2de9697a89451f6518e8b3a796c09c6

-------------------------------------------------

When (ia->ia_valid & (ATTR_MODE | ATTR_UID | ATTR_GID)) is zero, then
the SELinux implementation of the locked_down hook might report a denial
even though the operation would actually be allowed.

To fix this, make sure that security_locked_down() is called only when
the return value will be taken into account (i.e. when changing one of
the problematic attributes).
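
A minimal sketch of this ordering, written as a user-space model: the hook
is consulted only when one of the restricted attributes is being changed.
The flag values, the stub hook, and the function names are assumptions made
for illustration; they are not the kernel's definitions.

```c
#include <stdbool.h>

/* Illustrative flag values; the kernel defines its own ATTR_* constants. */
#define SK_ATTR_MODE (1u << 0)
#define SK_ATTR_UID  (1u << 1)
#define SK_ATTR_GID  (1u << 2)
#define SK_ATTR_SIZE (1u << 3)

static int hook_calls; /* counts how often the stubbed hook ran */

/* Stub for a lockdown hook: returns nonzero when the system is locked down. */
static int locked_down_stub(bool locked)
{
    hook_calls++;
    return locked ? -1 : 0;
}

/* Returns 0 if the attribute change is allowed, -1 if it is denied. */
static int setattr_sketch(unsigned int ia_valid, bool locked)
{
    /* Consult the hook only when its result will actually be used. */
    if (ia_valid & (SK_ATTR_MODE | SK_ATTR_UID | SK_ATTR_GID)) {
        if (locked_down_stub(locked))
            return -1;
    }
    return 0;
}
```

Because the hook is never reached for other attributes, a hook with side
effects (such as logging a denial) cannot report a denial for an operation
that is in fact allowed.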

Note: this was introduced by commit 5496197f ("debugfs: Restrict
debugfs when the kernel is locked down"), but it didn't matter at that
time, as the SELinux support came in later.

Fixes: 59438b46 ("security,lockdown,selinux: implement SELinux lockdown")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Link: https://lore.kernel.org/r/20210507125304.144394-1-omosnace@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Conflicts:
        fs/debugfs/inode.c
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

c2180fdf

vti6: fix ipv4 pmtu check to honor ip header df · f747fc0e

Committed by Eyal Birger, 3 years ago

mainline inclusion
from mainline-5.12-rc7
commit 4c382558
category: bugfix
bugzilla: 107190
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4c38255892c06b9de2fb3bf6aea63f4ebdff3d11

-------------------------------------------------

Frag needed should only be sent if the header enables DF.

This fix allows IPv4 packets larger than MTU to pass the vti6 interface
and be fragmented after encapsulation, aligning behavior with
non-vti6 xfrm.
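
The rule can be sketched as a small predicate. This is a user-space
illustration, not the vti6 code: the function name is hypothetical, and it
assumes the IPv4 flags/fragment-offset field has already been converted to
host byte order.

```c
#include <stdbool.h>
#include <stdint.h>

/* "Don't fragment" bit of the IPv4 flags/fragment-offset field (host order). */
#define IP_DF_FLAG 0x4000u

/*
 * Decide whether an oversized IPv4 packet should trigger an ICMP
 * "fragmentation needed" reply. frag_off_host is the flags/fragment-offset
 * field in host byte order (an assumption of this sketch).
 */
static bool should_send_frag_needed(uint16_t frag_off_host,
                                    unsigned int pkt_len, unsigned int mtu)
{
    if (pkt_len <= mtu)
        return false;                         /* fits: nothing to report */
    return (frag_off_host & IP_DF_FLAG) != 0; /* too big: report only if DF is set */
}
```

An oversized packet without DF yields false, so it continues through the
interface and can be fragmented after encapsulation.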

Fixes: ccd740cb ("vti6: Add pmtu handling to vti6_xmit.")
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

Conflicts:
	net/ipv6/ip6_vti.c
Signed-off-by: Ziyang Xuan <xuanziyang2@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

f747fc0e

vti: fix ipv4 pmtu check to honor ip header df · 55935973

Committed by Eyal Birger, 3 years ago

mainline inclusion
from mainline-5.12-rc7
commit c7c1abfd
category: bugfix
bugzilla: 107191
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c7c1abfd6d42be8f09d390ab912cd84983000fa2

-------------------------------------------------

Frag needed should only be sent if the header enables DF.

This fix allows packets larger than MTU to pass the vti interface
and be fragmented after encapsulation, aligning behavior with
non-vti xfrm.

Fixes: d6af1a31 ("vti: Add pmtu handling to vti_xmit.")
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

Conflicts:
	net/ipv4/ip_vti.c
Signed-off-by: Ziyang Xuan <xuanziyang2@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

55935973

alinux: random: speed up the initialization of module · f153e320

Committed by Xingjun Liu, 3 years ago

maillist inclusion
category: performance
bugzilla: 109294
CVE: NA

Reference: https://gitee.com/openeuler/kernel/commit/ae624897ce3846524a3b0d3e525d8f8a8f80f326

---------------------------

alinux: random: speed up the initialization of module

During the module initialization phase, entropy will be added
to the entropy pool for every interrupt; the change should speed up
initialization of the random module.

Before optimization:
[   22.180236] random: crng init done

After optimization:
[    1.474832] random: crng init done
Signed-off-by: Xingjun Liu <xingjun.lxj@alibaba-inc.com>
Reviewed-by: Liu Jiang <gerry@linux.alibaba.com>
Reviewed-by: Caspar Zhang <caspar@linux.alibaba.com>
Reviewed-by: Jia Zhang <zhang.jia@linux.alibaba.com>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Chen Jialong <chenjialong@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Ziyuan Hu <huziyuan@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

f153e320

mm: set the sleep_mapped to true for zbud and z3fold · c83d856c

Committed by Tian Tao, 3 years ago

mainline inclusion
from mainline-5.12-rc1
commit e818e820
category: bugfix
bugzilla: 107221
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e818e820c6a0e819d239264fc863531bbcd72c30

----------------------------------------------------------------------

zpool driver adds a flag to indicate whether the zpool driver can enter an
atomic context after mapping.  This patch sets it true for z3fold and
zbud.

Link: https://lkml.kernel.org/r/1611035683-12732-3-git-send-email-tiantao6@hisilicon.com
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reported-by: Mike Galbraith <efault@gmx.de>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chen Huang <chenhuang5@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

c83d856c

mm/zswap: add the flag can_sleep_mapped · 1a1d0a85

Committed by Tian Tao, 3 years ago

mainline inclusion
from mainline-5.12-rc1
commit fc6697a8
category: bugfix
bugzilla: 107205
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fc6697a89f56d9773b2fbff718d4cf2a6d63379d

-------------------------------------------------

Patch series "Fix the compatibility of zsmalloc and zswap".

Patch #1 adds a flag to zpool, which zswap uses to determine whether zpool
drivers such as zbud/z3fold/zsmalloc will enter an atomic context after
mapping.

The difference between zbud/z3fold and zsmalloc is that zsmalloc requires
an atomic context, since its map function disables preemption, whereas
zbud/z3fold don't require an atomic context.  So patch #2 sets the flag
sleep_mapped to true, indicating that zbud/z3fold can sleep after mapping.
zsmalloc doesn't support sleeping after mapping, so the flag is not set
for it.

This patch (of 2):

Add a flag to zpool named "can_sleep_mapped" and set it to true for
zbud/z3fold. The flag is not set for zsmalloc, so its default value there
is false.  zswap can then take the current path if the flag is true; if
it's false, it copies data from src to a temporary buffer, unmaps the
handle, takes the mutex, and processes the buffer instead of src, to avoid
calling a sleeping function from atomic context.
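
The two paths can be modelled in user space as follows. Everything in this
sketch is a hypothetical stand-in (the pool structure, the map/unmap
helpers, and the "sleeping lock" that fails in atomic context); it only
mirrors the ordering described above and is not the zswap implementation.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pool model: mapping enters an "atomic" section unless the
 * backend advertises can_sleep_mapped. */
struct pool_sketch {
    bool can_sleep_mapped;
    bool in_atomic;     /* true while a mapping keeps us in atomic context */
    const char *data;   /* stored object */
    size_t len;         /* object size in bytes */
};

static const char *pool_map(struct pool_sketch *p)
{
    if (!p->can_sleep_mapped)
        p->in_atomic = true;
    return p->data;
}

static void pool_unmap(struct pool_sketch *p)
{
    p->in_atomic = false;
}

/* Stand-in for taking a sleeping lock: fails if called in atomic context. */
static int sleeping_lock(const struct pool_sketch *p)
{
    return p->in_atomic ? -1 : 0;
}

/* Copy the stored object into dst (at least p->len bytes); 0 on success. */
static int load_sketch(struct pool_sketch *p, char *dst)
{
    const char *src = pool_map(p);
    char *tmp = NULL;
    int ret;

    if (!p->can_sleep_mapped) {
        /* Copy out and unmap first, so the lock is taken outside atomic context. */
        tmp = malloc(p->len);
        if (!tmp) {
            pool_unmap(p);
            return -1;
        }
        memcpy(tmp, src, p->len);
        pool_unmap(p);
        src = tmp;
    }

    ret = sleeping_lock(p);
    if (ret == 0)
        memcpy(dst, src, p->len); /* "process" the buffer */

    if (p->can_sleep_mapped)
        pool_unmap(p);
    free(tmp);
    return ret;
}
```

The temporary buffer costs one extra copy, which is the price paid for
releasing the mapping before any operation that may sleep.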

[natechancellor@gmail.com: add return value in zswap_frontswap_load]
  Link: https://lkml.kernel.org/r/20210121214804.926843-1-natechancellor@gmail.com
[tiantao6@hisilicon.com: fix potential memory leak]
  Link: https://lkml.kernel.org/r/1611538365-51811-1-git-send-email-tiantao6@hisilicon.com
[colin.king@canonical.com: fix potential uninitialized pointer read on tmp]
  Link: https://lkml.kernel.org/r/20210128141728.639030-1-colin.king@canonical.com
[tiantao6@hisilicon.com: fix variable 'entry' is uninitialized when used]
  Link: https://lkml.kernel.org/r/1611223030-58346-1-git-send-email-tiantao6@hisilicon.com

Link: https://lkml.kernel.org/r/1611035683-12732-1-git-send-email-tiantao6@hisilicon.com
Link: https://lkml.kernel.org/r/1611035683-12732-2-git-send-email-tiantao6@hisilicon.com
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reported-by: Mike Galbraith <efault@gmx.de>
Cc: Barry Song <song.bao.hua@hisilicon.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chen Huang <chenhuang5@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

1a1d0a85

kasan: fix null pointer dereference in kasan_record_aux_stack · 59b53f7d

Committed by Walter Wu, 3 years ago

mainline inclusion
from mainline-5.11-rc2
commit 13384f61
category: panic
bugzilla: 108166
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=13384f6125ad7ebdcc8914fe1e03ded48ce76581

---------------------------

Syzbot reported the following [1]:

  BUG: kernel NULL pointer dereference, address: 0000000000000008
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 2d993067 P4D 2d993067 PUD 19a3c067 PMD 0
  Oops: 0000 [#1] PREEMPT SMP KASAN
  CPU: 1 PID: 3852 Comm: kworker/1:2 Not tainted 5.10.0-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Workqueue: events free_ipc
  RIP: 0010:kasan_record_aux_stack+0x77/0xb0

Add a NULL check on the value returned by kasan_get_alloc_meta() for the
slab object, in order to avoid the NULL pointer dereference.

[1] https://syzkaller.appspot.com/x/log.txt?x=10a82a50d00000

Link: https://lkml.kernel.org/r/20201228080018.23041-1-walter-zh.wu@mediatek.com
Signed-off-by: Walter Wu <walter-zh.wu@mediatek.com>
Suggested-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chengyang Fan <cy.fan@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

59b53f7d

bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper · a9c81664

Committed by Yonghong Song, 3 years ago

mainline inclusion
from mainline-v5.13-rc1
commit b910eaaa
category: bugfix
bugzilla: 106537
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b910eaaaa4b89976ef02e5d6448f3f73dc671d91

-------------------------------------------------

Jiri Olsa reported a bug ([1]) in kernel where cgroup local
storage pointer may be NULL in bpf_get_local_storage() helper.
There are two issues uncovered by this bug:
  (1). kprobe or tracepoint prog incorrectly sets cgroup local storage
       before prog run,
  (2). due to change from preempt_disable to migrate_disable,
       preemption is possible and percpu storage might be overwritten
       by other tasks.

This issue (1) is fixed in [2]. This patch tried to address issue (2).
The following shows how things can go wrong:
  task 1:   bpf_cgroup_storage_set() for percpu local storage
         preemption happens
  task 2:   bpf_cgroup_storage_set() for percpu local storage
         preemption happens
  task 1:   run bpf program

task 1 will effectively use the percpu local storage setting by task 2
which will be either NULL or incorrect ones.

Instead of just one common local storage per cpu, this patch fixed
the issue by permitting 8 local storages per cpu and each local
storage is identified by a task_struct pointer. This way, we
allow at most 8 nested preemption between bpf_cgroup_storage_set()
and bpf_cgroup_storage_unset(). The percpu local storage slot
is released (calling bpf_cgroup_storage_unset()) by the same task
after bpf program finished running.
bpf_test_run() is also fixed to use the new bpf_cgroup_storage_set()
interface.
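
The slot scheme described above can be sketched in user space as a small
table keyed by an owner pointer. The structure and function names are
hypothetical, a single global array stands in for one CPU's slots, and the
sketch ignores the preemption and migration details of the kernel code.

```c
#include <stddef.h>

#define NEST_MAX 8 /* number of slots per CPU, as stated in the commit message */

/* Hypothetical slot table; `task` identifies the owner of a slot. */
struct slot_sketch {
    const void *task; /* NULL means the slot is free */
    void *storage;
};

static struct slot_sketch slots[NEST_MAX];

/* Claim a free slot for `task`; returns 0 on success, -1 if all are in use. */
static int storage_set_sketch(const void *task, void *storage)
{
    for (int i = 0; i < NEST_MAX; i++) {
        if (slots[i].task == NULL) {
            slots[i].task = task;
            slots[i].storage = storage;
            return 0;
        }
    }
    return -1;
}

/* Look up the storage that `task` registered, or NULL if there is none. */
static void *storage_get_sketch(const void *task)
{
    for (int i = 0; i < NEST_MAX; i++) {
        if (task != NULL && slots[i].task == task)
            return slots[i].storage;
    }
    return NULL;
}

/* Release the slot owned by `task`, if any. */
static void storage_unset_sketch(const void *task)
{
    for (int i = 0; i < NEST_MAX; i++) {
        if (slots[i].task == task) {
            slots[i].task = NULL;
            slots[i].storage = NULL;
            return;
        }
    }
}
```

Because each lookup is keyed by the owner, a second task that sets its
storage between the first task's set and run steps no longer overwrites the
first task's entry.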

The patch is tested on top of [2] with reproducer in [1].
Without this patch, kernel will emit error in 2-3 minutes.
With this patch, after one hour, still no error.

 [1] https://lore.kernel.org/bpf/CAKH8qBuXCfUz=w8L+Fj74OaUpbosO29niYwTki7e3Ag044_aww@mail.gmail.com/T
 [2] https://lore.kernel.org/bpf/20210309185028.3763817-1-yhs@fb.com
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Roman Gushchin <guro@fb.com>
Link: https://lore.kernel.org/bpf/20210323055146.3334476-1-yhs@fb.com

Conflicts:
    include/linux/bpf.h
    net/bpf/test_run.c
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Reviewed-by: Kuohai Xu <xukuohai@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

a9c81664

fib: Return the correct errno code · bde24837

Committed by Zheng Yongjun, 3 years ago

stable inclusion
from stable-5.10.45
commit 808fcc1e707c21a2a6492c8bec65a7cc6eb8b94e
bugzilla: 109305
CVE: NA

--------------------------------

[ Upstream commit 59607863 ]

When kalloc or kmemdup fails, the code should return ENOMEM rather than ENOBUFS.
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

bde24837

net: Return the correct errno code · 56e33cc2

Committed by Zheng Yongjun, 3 years ago

stable inclusion
from stable-5.10.45
commit d8b2e3e17c33ab4874a7431d6c314c4939145160
bugzilla: 109305
CVE: NA

--------------------------------

[ Upstream commit 49251cd0 ]

When kalloc or kmemdup fails, the code should return ENOMEM rather than ENOBUFS.
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

56e33cc2

net/x25: Return the correct errno code · 747913db

Committed by Zheng Yongjun, 3 years ago

stable inclusion
from stable-5.10.45
commit 04c1556bfc79734ae91af632aff2f754a501c36c
bugzilla: 109305
CVE: NA

--------------------------------

[ Upstream commit d7736958 ]

When kalloc or kmemdup fails, the code should return ENOMEM rather than ENOBUFS.
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

747913db

rtnetlink: Fix missing error code in rtnl_bridge_notify() · 30bc7758

Committed by Jiapeng Chong, 3 years ago

stable inclusion
from stable-5.10.45
commit 0aa356950800e18a96c78633cadaf1d1c6c33d7d
bugzilla: 109305
CVE: NA

--------------------------------

[ Upstream commit a8db57c1 ]

The error code is missing in this code scenario, add the error code
'-EINVAL' to the return value 'err'.

Eliminate the following smatch warning:

net/core/rtnetlink.c:4834 rtnl_bridge_notify() warn: missing error code
'err'.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

30bc7758

drm/amd/amdgpu:save psp ring wptr to avoid attack · ddad2159

Committed by Victor Zhao, 3 years ago

stable inclusion
from stable-5.10.45
commit 9250f97fd59416448299f923fba2c69c1a308a07
bugzilla: 109305
CVE: NA

--------------------------------

[ Upstream commit 2370eba9 ]

[Why]
When some tools perform a psp mailbox attack, the readback value
of the register can be a random value, which may break psp.

[How]
Use a psp wptr cache mechanism to avoid the change made by the attack.

v2: unify change and add detailed reason
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

ddad2159

drm/amd/display: Fix potential memory leak in DMUB hw_init · 6477d0c6

Committed by Roman Li, 3 years ago

stable inclusion
from stable-5.10.45
commit 9e8c2af010463197315fa54a6c17e74988b5259c
bugzilla: 109305
CVE: NA

--------------------------------

[ Upstream commit c5699e2d ]

[Why]
On resume we perform DMUB hw_init which allocates memory:
dm_resume->dm_dmub_hw_init->dc_dmub_srv_create->kzalloc
That results in memory leak in suspend/resume scenarios.

[How]
Allocate memory for the DC wrapper to DMUB only if it was not
allocated before.
No need to reallocate it on suspend/resume.
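
The allocate-once pattern can be sketched in user space as follows. The
structures and the function are hypothetical stand-ins for the DC context
and its DMUB wrapper; calloc() takes the place of the kernel allocator.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the DC context and its DMUB service wrapper. */
struct dmub_srv_sketch { int initialized; };
struct dc_sketch { struct dmub_srv_sketch *dmub_srv; };

static int alloc_count; /* how many times the wrapper was allocated */

/* Called on first init and on every resume; 0 on success, -1 on failure. */
static int hw_init_sketch(struct dc_sketch *dc)
{
    if (!dc->dmub_srv) {
        /* Allocate only once; a resume reuses the existing wrapper. */
        dc->dmub_srv = calloc(1, sizeof(*dc->dmub_srv));
        if (!dc->dmub_srv)
            return -1;
        alloc_count++;
    }
    dc->dmub_srv->initialized = 1;
    return 0;
}
```

Without the NULL check, each resume would overwrite the pointer with a fresh
allocation and the previous one would be leaked.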
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

6477d0c6
