- 18 May 2020, 4 commits
-
-
Submitted by Dust Li

to #27793353

The following configs are set to 'm' to make x86 the same as aarch64:

CONFIG_NET_EMATCH_CMP=m
CONFIG_NET_EMATCH_NBYTE=m
CONFIG_NET_EMATCH_U32=m
CONFIG_NET_EMATCH_META=m
CONFIG_NET_EMATCH_TEXT=m

Some commonly used cases need CONFIG_NET_EMATCH_CMP, for example:

tc qdisc add dev eth0 root handle 1: prio bands 4
tc qdisc add dev eth0 parent 1:4 handle 40: netem delay 20ms 2ms
tc filter add dev eth0 parent 1: protocol ip prio 4 basic match "cmp(u16 at 2 layer transport eq 3306) and cmp(u8 at 16 layer network eq 10) and cmp(u8 at 17 layer network eq 0) and cmp(u8 at 18 layer network eq 200) and cmp(u8 at 19 layer network eq 45)" flowid 1:4

Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Submitted by Dust Li

to #27778669

Aligned NET_SCHED configs with aarch64, except:
1. CONFIG_NET_SCH_ATM is not enabled since we don't use ATM on cloud, and CONFIG_ATM is not enabled.
2. CONFIG_NET_SCH_DEFAULT is not set since we still use pfifo_fast as the default scheduler.
3. CONFIG_NET_SCH_FQ_CODEL is set to 'm' since we don't use fq_codel as the default qdisc.

Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Submitted by Yihao Wu

fix #27497611

The sched SLI feature relies heavily on CFS group scheduling, so we add "depends on FAIR_GROUP_SCHED" in Kconfig to avoid build issues when FAIR_GROUP_SCHED is not turned on.

Suggested-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
Acked-by: Shanpei Chen <shanpeic@linux.alibaba.com>
-
Submitted by Shile Zhang

to #27182371

1. Build the mouse driver as a module.
2. Disable RT_GROUP_SCHED.
3. Set HZ=250.

Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Suggested-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
- 15 May 2020, 3 commits
-
-
Submitted by Wen Yang

fix #27563995

commit 32830a0534700f86366f371b150b17f0f0d140d7 upstream.

The wait_event() function is used to detect command completion. When send_guid_cmd() returns an error, smi_send() has not been called to send data. Therefore, wait_event() should not be used on the error path, otherwise it will cause the following warning:

[ 1361.588808] systemd-udevd   D    0  1501   1436 0x00000004
[ 1361.588813]  ffff883f4b1298c0 0000000000000000 ffff883f4b188000 ffff887f7e3d9f40
[ 1361.677952]  ffff887f64bd4280 ffffc90037297a68 ffffffff8173ca3b ffffc90000000010
[ 1361.767077]  00ffc90037297ad0 ffff887f7e3d9f40 0000000000000286 ffff883f4b188000
[ 1361.856199] Call Trace:
[ 1361.885578]  [<ffffffff8173ca3b>] ? __schedule+0x23b/0x780
[ 1361.951406]  [<ffffffff8173cfb6>] schedule+0x36/0x80
[ 1362.010979]  [<ffffffffa071f178>] get_guid+0x118/0x150 [ipmi_msghandler]
[ 1362.091281]  [<ffffffff810d5350>] ? prepare_to_wait_event+0x100/0x100
[ 1362.168533]  [<ffffffffa071f755>] ipmi_register_smi+0x405/0x940 [ipmi_msghandler]
[ 1362.258337]  [<ffffffffa0230ae9>] try_smi_init+0x529/0x950 [ipmi_si]
[ 1362.334521]  [<ffffffffa022f350>] ? std_irq_setup+0xd0/0xd0 [ipmi_si]
[ 1362.411701]  [<ffffffffa0232bd2>] init_ipmi_si+0x492/0x9e0 [ipmi_si]
[ 1362.487917]  [<ffffffffa0232740>] ? ipmi_pci_probe+0x280/0x280 [ipmi_si]
[ 1362.568219]  [<ffffffff810021a0>] do_one_initcall+0x50/0x180
[ 1362.636109]  [<ffffffff812231b2>] ? kmem_cache_alloc_trace+0x142/0x190
[ 1362.714330]  [<ffffffff811b2ae1>] do_init_module+0x5f/0x200
[ 1362.781208]  [<ffffffff81123ca8>] load_module+0x1898/0x1de0
[ 1362.848069]  [<ffffffff811202e0>] ? __symbol_put+0x60/0x60
[ 1362.913886]  [<ffffffff8130696b>] ? security_kernel_post_read_file+0x6b/0x80
[ 1362.998514]  [<ffffffff81124465>] SYSC_finit_module+0xe5/0x120
[ 1363.068463]  [<ffffffff81124465>] ? SYSC_finit_module+0xe5/0x120
[ 1363.140513]  [<ffffffff811244be>] SyS_finit_module+0xe/0x10
[ 1363.207364]  [<ffffffff81003c04>] do_syscall_64+0x74/0x180

Fixes: 50c812b2 ("[PATCH] ipmi: add full sysfs support")
Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
Cc: Corey Minyard <minyard@acm.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: openipmi-developer@lists.sourceforge.net
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org # 2.6.17-
Message-Id: <20200403090408.58745-1-wenyang@linux.alibaba.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
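A minimal sketch of the corrected flow (not the literal upstream diff; send_guid_cmd() and the waitq are internal to ipmi_msghandler.c, and guid_ready() below is a hypothetical stand-in for the real completion condition): the error path returns immediately instead of sleeping on an event that will never fire.

	/* drivers/char/ipmi/ipmi_msghandler.c, get_guid()-style logic (sketch) */
	static int example_get_guid(struct ipmi_smi *intf)
	{
		int rv;

		rv = send_guid_cmd(intf, 0 /* channel */);
		if (rv)
			return rv;	/* smi_send() never ran, so do NOT wait_event() here */

		/* only reached when the command really went out */
		wait_event(intf->waitq, guid_ready(intf));
		return 0;
	}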
-
Submitted by xuanzhuo

to #26353046

Acked-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: xuanzhuo <xuanzhuo@linux.alibaba.com>
-
Submitted by xuanzhuo

to #26353046

TcpRT: Instrument and Diagnostic Analysis System for Service Quality of Cloud Databases at Massive Scale in Real-time. It can also provide information for all request/response services, such as HTTP requests.

This is the kernel framework for tcprt; further work needs tcprt module support. A TcpRT module should call tcp_unregister_rt before rmmod. TcpRT hooks are called when a sock is initialized, data is received, data is sent, a packet is acked, and when the socket is destroyed. The private data is saved to icsk->icsk_tcp_rt_priv.

Reviewed-by: Cambda Zhu <cambda@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: xuanzhuo <xuanzhuo@linux.alibaba.com>
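As a rough illustration of how a module might plug into this framework, here is a hedged sketch. The hook-table layout and the registration signature are assumptions based only on the description above (sock init / recv / send / ack / destroy hooks, tcp_unregister_rt before rmmod, private data in icsk->icsk_tcp_rt_priv); consult the actual tcp_rt headers for the real interface.

	/* hypothetical tcprt module skeleton (struct and register-call names assumed) */
	#include <linux/module.h>
	#include <linux/slab.h>
	#include <net/inet_connection_sock.h>

	static void my_rt_init(struct sock *sk)
	{
		/* allocate per-connection state and stash it in the icsk */
		inet_csk(sk)->icsk_tcp_rt_priv = kzalloc(sizeof(u64), GFP_ATOMIC);
	}

	static void my_rt_release(struct sock *sk)
	{
		kfree(inet_csk(sk)->icsk_tcp_rt_priv);	/* socket is being destroyed */
	}

	static struct tcp_rt_ops my_rt_ops = {		/* struct name is an assumption */
		.init    = my_rt_init,
		.release = my_rt_release,
		/* .recv_data / .send_data / .pkt_acked hooks would go here */
	};

	static int __init my_rt_module_init(void)
	{
		return tcp_register_rt(&my_rt_ops);	/* assumed counterpart of tcp_unregister_rt */
	}

	static void __exit my_rt_module_exit(void)
	{
		tcp_unregister_rt(&my_rt_ops);		/* must run before rmmod completes */
	}

	module_init(my_rt_module_init);
	module_exit(my_rt_module_exit);
	MODULE_LICENSE("GPL");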
-
- 14 May 2020, 3 commits
-
-
Submitted by Jeffle Xu

fix #27211210

Fix the compile warning caused by the unused label 'out' since commit ec6880e8 ("new helper: lookup_positive_unlocked()").

Fixes: ec6880e8 ("new helper: lookup_positive_unlocked()")
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Submitted by Xu Yu

fix #27508738

The variable printk_ratelimit_state is not defined if CONFIG_PRINTK is not set, but it is directly accessed in mm/oom_kill.c without considering the config. Consider CONFIG_PRINTK when accessing printk_ratelimit_state in mm/oom_kill.c.

Fixes: 41a1a935 ("alinux: oom: add ratelimit printk to prevent softlockup")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
Reviewed-by: zhongjiang-ali <zhongjiang-ali@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
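One way to make such an access config-safe, as a hedged sketch (the actual fix in mm/oom_kill.c may differ in detail): compile the ratelimited path only when CONFIG_PRINTK provides printk_ratelimit_state, and skip the message otherwise.

	#include <linux/ratelimit.h>
	#include <linux/printk.h>

	static void example_oom_note(const char *msg)
	{
	#ifdef CONFIG_PRINTK
		/* printk_ratelimit_state only exists with CONFIG_PRINTK=y */
		if (__ratelimit(&printk_ratelimit_state))
			pr_info("%s\n", msg);
	#else
		(void)msg;	/* nothing to print without CONFIG_PRINTK */
	#endif
	}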
-
Submitted by Xu Yu

fix #27508674

The function mlock_fixup is not defined if CONFIG_MMU is not set, but it is directly invoked by mm/unevictable.c without considering the config. Make unevictable.o depend on mmu-$(CONFIG_MMU), where the definition of mlock_fixup is located.

Fixes: 7d6cb94f ("alinux: mm: Pin code section of process in memory")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
Acked-by: Xunlei Pang <xlpang@linux.alibaba.com>
-
- 11 May 2020, 1 commit
-
-
Submitted by Joseph Qi

fix #27497636

Enable IP route multipath and dm multipath, for consistency with the arm and physical kconfigs.

Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
-
- 08 May 2020, 2 commits
-
-
Submitted by Pavel Tatashin

to #26809468

commit ec393a0f014eaf688a3dbe8c8a4cbb52d7f535f9 upstream.

When checking for valid pfns in zero_resv_unavail(), it is not necessary to verify that pfns within pageblock_nr_pages ranges are valid, only the first one needs to be checked. This is because memory for pages is allocated in contiguous chunks that contain pageblock_nr_pages struct pages.

Link: http://lkml.kernel.org/r/20181002143821.5112-3-msys.mizuma@gmail.com
Signed-off-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Shile Zhang <shile.zhang@linux.alibaba.com>
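A hedged sketch of the loop shape this optimization describes (simplified from the zero_resv_unavail() logic, not the literal upstream code): validate only the pfn that starts each pageblock, then zero every struct page in that block.

	#include <linux/kernel.h>
	#include <linux/mm.h>

	/* zero struct pages in [start_pfn, end_pfn) with one pfn_valid() per pageblock */
	static void example_zero_unavail_range(unsigned long start_pfn, unsigned long end_pfn)
	{
		unsigned long pfn;

		for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
			unsigned long block_end = min(pfn + pageblock_nr_pages, end_pfn);
			unsigned long p;

			/* struct pages are allocated pageblock_nr_pages at a time,
			 * so validating the first pfn of the block is enough */
			if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages)))
				continue;

			for (p = pfn; p < block_end; p++)
				mm_zero_struct_page(pfn_to_page(p));
		}
	}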
-
Submitted by Naoya Horiguchi

to #26809468

commit 907ec5fca3dc38d37737de826f06f25b063aa08e upstream.

Patch series "mm: Fix for movable_node boot option", v3.

This patch series contains a fix for the movable_node boot option issue which was introduced by commit 124049de ("x86/e820: put !E820_TYPE_RAM regions into memblock.reserved"). The commit breaks the option because it changed the memory gap range to reserved memblock. So, the node is marked as Normal zone even if the SRAT has Hot pluggable affinity. First and second patch fix the original issue which the commit tried to fix, then revert the commit.

This patch (of 3):

There is a kernel panic that is triggered when reading /proc/kpageflags on the kernel booted with kernel parameter 'memmap=nn[KMG]!ss[KMG]':

  BUG: unable to handle kernel paging request at fffffffffffffffe
  PGD 9b20e067 P4D 9b20e067 PUD 9b210067 PMD 0
  Oops: 0000 [#1] SMP PTI
  CPU: 2 PID: 1728 Comm: page-types Not tainted 4.17.0-rc6-mm1-v4.17-rc6-180605-0816-00236-g2dfb086ef02c+ #160
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 04/01/2014
  RIP: 0010:stable_page_flags+0x27/0x3c0
  Code: 00 00 00 0f 1f 44 00 00 48 85 ff 0f 84 a0 03 00 00 41 54 55 49 89 fc 53 48 8b 57 08 48 8b 2f 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b 00 f6 c4 01 0f 84 10 03 00 00 31 db 49 8b 54 24 08 4c 89 e7
  RSP: 0018:ffffbbd44111fde0 EFLAGS: 00010202
  RAX: fffffffffffffffe RBX: 00007fffffffeff9 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: 0000000000000202 RDI: ffffed1182fff5c0
  RBP: ffffffffffffffff R08: 0000000000000001 R09: 0000000000000001
  R10: ffffbbd44111fed8 R11: 0000000000000000 R12: ffffed1182fff5c0
  R13: 00000000000bffd7 R14: 0000000002fff5c0 R15: ffffbbd44111ff10
  FS:  00007efc4335a500(0000) GS:ffff93a5bfc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: fffffffffffffffe CR3: 00000000b2a58000 CR4: 00000000001406e0
  Call Trace:
   kpageflags_read+0xc7/0x120
   proc_reg_read+0x3c/0x60
   __vfs_read+0x36/0x170
   vfs_read+0x89/0x130
   ksys_pread64+0x71/0x90
   do_syscall_64+0x5b/0x160
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7efc42e75e23
  Code: 09 00 ba 9f 01 00 00 e8 ab 81 f4 ff 66 2e 0f 1f 84 00 00 00 00 00 90 83 3d 29 0a 2d 00 00 75 13 49 89 ca b8 11 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 db d3 01 00 48 89 04 24

According to kernel bisection, this problem became visible due to commit f7f99100 which changes how struct pages are initialized.

Memblock layout affects the pfn ranges covered by node/zone. Consider that we have a VM with 2 NUMA nodes and each node has 4GB memory, and the default (no memmap= given) memblock layout is like below:

  MEMBLOCK configuration:
   memory size = 0x00000001fff75c00 reserved size = 0x000000000300c000
   memory.cnt  = 0x4
   memory[0x0]     [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0
   memory[0x1]     [0x0000000000100000-0x00000000bffd6fff], 0x00000000bfed7000 bytes on node 0 flags: 0x0
   memory[0x2]     [0x0000000100000000-0x000000013fffffff], 0x0000000040000000 bytes on node 0 flags: 0x0
   memory[0x3]     [0x0000000140000000-0x000000023fffffff], 0x0000000100000000 bytes on node 1 flags: 0x0
   ...

If you give memmap=1G!4G (so it just covers memory[0x2]), the range [0x100000000-0x13fffffff] is gone:

  MEMBLOCK configuration:
   memory size = 0x00000001bff75c00 reserved size = 0x000000000300c000
   memory.cnt  = 0x3
   memory[0x0]     [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0
   memory[0x1]     [0x0000000000100000-0x00000000bffd6fff], 0x00000000bfed7000 bytes on node 0 flags: 0x0
   memory[0x2]     [0x0000000140000000-0x000000023fffffff], 0x0000000100000000 bytes on node 1 flags: 0x0
   ...

This causes shrinking node 0's pfn range because it is calculated by the address range of memblock.memory. So some of struct pages in the gap range are left uninitialized.

We have a function zero_resv_unavail() which does zeroing the struct pages outside memblock.memory, but currently it covers only the reserved unavailable range (i.e. memblock.memory && !memblock.reserved). This patch extends it to cover all unavailable range, which fixes the reported issue.

Link: http://lkml.kernel.org/r/20181002143821.5112-2-msys.mizuma@gmail.com
Fixes: f7f99100 ("mm: stop zeroing memory during allocation in vmemmap")
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Tested-by: Oscar Salvador <osalvador@suse.de>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Shile Zhang <shile.zhang@linux.alibaba.com>
-
- 07 May 2020, 2 commits
-
-
Submitted by Yihao Wu

to #27372989

CONFIG_THREAD_INFO_IN_TASK is not set for aarch64, so task_struct has no cpu member. We should use the helper function task_cpu instead.

Fixes: 9e7b35d6 ("alinux: sched: Introduce per-cgroup iowait accounting")
Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
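A short illustration of the portable accessor (a sketch; the cgroup iowait-accounting context around it is assumed): task_cpu() works whether or not CONFIG_THREAD_INFO_IN_TASK is set, whereas p->cpu only exists when it is.

	#include <linux/sched.h>

	static unsigned int example_task_cpu(struct task_struct *p)
	{
		/* task_cpu() hides the THREAD_INFO_IN_TASK difference:
		 * it reads p->cpu or task_thread_info(p)->cpu as appropriate */
		return task_cpu(p);
	}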
-
Submitted by Ming Lei

fix #27417914

commit 556f36e90dbe7dded81f4fac084d2bc8a2458330 upstream

Spread queues among present CPUs first, then build the mapping on other non-present CPUs. This way we can minimize the count of dead queues which are mapped only by non-present CPUs, and bad IO performance caused by an unbalanced mapping between present CPUs and queues can be avoided. A similar policy has been applied for managed IRQ affinity.

Cc: Yi Zhang <yi.zhang@redhat.com>
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
[jeffle: remove code supporting multiple queue maps, which is merged since v5.0]
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
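A hedged sketch of the two-pass idea (not the literal blk_mq_map_queues() implementation; the mq_map array and queue count are passed in directly to keep the sketch version-neutral): present CPUs get queues first so every queue has at least one live CPU, then the remaining possible CPUs are filled in.

	#include <linux/cpumask.h>

	/* mq_map[cpu] = hardware queue index; nr_queues = number of hw queues */
	static void example_map_queues(unsigned int *mq_map, unsigned int nr_queues)
	{
		unsigned int cpu, queue = 0;

		/* pass 1: spread queues among present CPUs first */
		for_each_present_cpu(cpu)
			mq_map[cpu] = queue++ % nr_queues;

		/* pass 2: map the remaining possible (non-present) CPUs */
		for_each_possible_cpu(cpu) {
			if (cpu_present(cpu))
				continue;
			mq_map[cpu] = queue++ % nr_queues;
		}
	}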
-
- 06 May 2020, 7 commits
-
-
Submitted by Shile Zhang

to #26809468, #25931767

Disable deferred struct page init by default since some stability issues were observed. A new kernel parameter is planned to enable it on demand for large-size instances, to speed up memory init.

Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Submitted by Piotr Krysiuk

to #24913189

commit f511dc75d22e0c000fc70b54f670c2c17f5fba9a stable-4.19.

A race condition between threads updating the mountpoint reference counter affects longterm releases 4.4.220, 4.9.220, 4.14.177 and 4.19.118.

The mountpoint reference counter corruption may occur when:
* one thread increments m_count member of struct mountpoint
  [under namespace_sem, but not holding mount_lock]
    pivot_root()
* another thread simultaneously decrements the same m_count
  [under mount_lock, but not holding namespace_sem]
    put_mountpoint()
      unhash_mnt()
        umount_mnt()
          mntput_no_expire()

To fix this race condition, grab mount_lock before updating m_count in pivot_root().

Reference: CVE-2020-12114
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Piotr Krysiuk <piotras@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
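A hedged sketch of the locking change described above (the pivot_root() bookkeeping is elided and the helper below is illustrative only; lock_mount_hash()/unlock_mount_hash() are the fs-internal wrappers around mount_lock in fs/mount.h): the m_count update moves inside the hash lock so it can no longer race with put_mountpoint().

	/* fs/namespace.c-style context (sketch) */
	static void example_bump_mountpoint(struct mountpoint *mp)
	{
		lock_mount_hash();	/* take mount_lock before touching m_count */
		mp->m_count++;		/* now serialized against put_mountpoint() */
		unlock_mount_hash();
	}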
-
Submitted by Cengiz Can

to #24913189

commit 153031a301bb07194e9c37466cfce8eacb977621 upstream.

There was a recent change in blktrace.c that added a RCU protection to `q->blk_trace` in order to fix a use-after-free issue during access.

However the change missed an edge case that can lead to dereferencing of `bt` pointer even when it's NULL. Coverity static analyzer marked this as a FORWARD_NULL issue with CID 1460458.

```
/kernel/trace/blktrace.c: 1904 in sysfs_blk_trace_attr_store()
1898        ret = 0;
1899        if (bt == NULL)
1900                ret = blk_trace_setup_queue(q, bdev);
1901
1902        if (ret == 0) {
1903                if (attr == &dev_attr_act_mask)
>>>     CID 1460458:  Null pointer dereferences  (FORWARD_NULL)
>>>     Dereferencing null pointer "bt".
1904                        bt->act_mask = value;
1905                else if (attr == &dev_attr_pid)
1906                        bt->pid = value;
1907                else if (attr == &dev_attr_start_lba)
1908                        bt->start_lba = value;
1909                else if (attr == &dev_attr_end_lba)
```

Added a reassignment with RCU annotation to fix the issue.

Fixes: c780e86dd48 ("blktrace: Protect q->blk_trace with RCU")
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Cengiz Can <cengiz@kernel.wtf>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
References: CVE-2019-19768
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
-
Submitted by Jan Kara

to #24913189

commit c780e86dd48ef6467a1146cf7d0fe1e05a635039 upstream.

KASAN is reporting that __blk_add_trace() has a use-after-free issue when accessing q->blk_trace. Indeed the switching of block tracing (and thus eventual freeing of q->blk_trace) is completely unsynchronized with the currently running tracing, and thus it can happen that the blk_trace structure is being freed just while __blk_add_trace() works on it. Protect accesses to q->blk_trace by RCU during tracing and make sure we wait for the end of the RCU grace period when shutting down tracing. Luckily that is a rare enough event that we can afford it. Note that postponing the freeing of blk_trace to an RCU callback should better be avoided, as it could have unexpected user-visible side effects: the debugfs files would still exist for a short while after block tracing has been shut down.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=205711
CC: stable@vger.kernel.org
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reported-by: Tristan Madani <tristmd@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
[bwh: Backported to 4.19: adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
References: CVE-2019-19768
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
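The access pattern this introduces looks roughly like the following hedged sketch (simplified; the real patch touches several call sites and uses xchg() on teardown): dereference q->blk_trace only under rcu_read_lock(), and free the structure only after synchronize_rcu().

	#include <linux/blkdev.h>
	#include <linux/blktrace_api.h>
	#include <linux/rcupdate.h>
	#include <linux/slab.h>

	/* reader side, e.g. inside a tracing hook */
	static void example_trace_access(struct request_queue *q)
	{
		struct blk_trace *bt;

		rcu_read_lock();
		bt = rcu_dereference(q->blk_trace);
		if (bt) {
			/* ... record the trace event using bt ... */
		}
		rcu_read_unlock();
	}

	/* teardown side */
	static void example_trace_shutdown(struct request_queue *q, struct blk_trace *bt)
	{
		rcu_assign_pointer(q->blk_trace, NULL);
		synchronize_rcu();	/* wait for in-flight readers before freeing */
		kfree(bt);
	}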
-
Submitted by Sabrina Dubroca

to #24913189

commit 6c8991f41546c3c472503dff1ea9daaddf9331c2 upstream.

ipv6_stub uses the ip6_dst_lookup function to allow other modules to perform IPv6 lookups. However, this function skips the XFRM layer entirely.

All users of ipv6_stub->ip6_dst_lookup use ip_route_output_flow (via the ip_route_output_key and ip_route_output helpers) for their IPv4 lookups, which calls xfrm_lookup_route(). This patch fixes this inconsistent behavior by switching the stub to ip6_dst_lookup_flow, which also calls xfrm_lookup_route().

This requires some changes in all the callers, as these two functions take different arguments and have different return types.

Fixes: 5f81bd2e ("ipv6: export a stub for IPv6 symbols used by vxlan")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 4.19:
 - Drop change in lwt_bpf.c
 - Delete now-unused "ret" in mlx5e_route_lookup_ipv6()
 - Initialise "out_dev" in mlx5e_create_encap_header_ipv6() to avoid introducing a spurious "may be used uninitialised" warning
 - Adjust filenames, context, indentation]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
References: CVE-2020-1749
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
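For callers, the conversion reads roughly like the following hedged before/after sketch (error handling and the final-destination argument vary per caller; on 4.19 the stub lives in net/addrconf.h):

	#include <net/addrconf.h>	/* ipv6_stub on 4.19 */
	#include <net/dst.h>
	#include <linux/err.h>

	static struct dst_entry *example_lookup(struct net *net, struct sock *sk,
						struct flowi6 *fl6)
	{
		struct dst_entry *dst;

		/* old pattern:
		 *   int err = ipv6_stub->ipv6_dst_lookup(net, sk, &dst, fl6);
		 *   if (err) return NULL;
		 */

		/* new pattern: returns the dst (or ERR_PTR) and goes through xfrm_lookup_route() */
		dst = ipv6_stub->ipv6_dst_lookup_flow(net, sk, fl6, NULL);
		if (IS_ERR(dst))
			return NULL;
		return dst;
	}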
-
Submitted by Sabrina Dubroca

to #24913189

commit c4e85f73afb6384123e5ef1bba3315b2e3ad031e upstream.

This will be used in the conversion of ipv6_stub to ip6_dst_lookup_flow, as some modules currently pass a net argument without a socket to ip6_dst_lookup. This is equivalent to commit 343d60aa ("ipv6: change ipv6_stub_impl.ipv6_dst_lookup to take net argument").

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 4.19: adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
References: CVE-2020-1749
[zsl: fixes conflicts in net/sctp/ipv6.c]
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
-
Submitted by Eugenio Pérez

fix #27354984

commit 42d84c8490f9f0931786f1623191fcab397c3d64 upstream.

Doing so, we save one call to get data we already have in the struct. Also, since there is no guarantee that getname uses the sockaddr_ll parameter beyond its size, we add a little bit of security here. It should not write beyond MAX_ADDR_LEN, but syzbot found that ax25_getname writes more (72 bytes, the size of full_sockaddr_ax25, versus 20 + 32 bytes of sockaddr_ll + MAX_ADDR_LEN in the syzbot repro).

Fixes: 3a4d5c94 ("vhost_net: a kernel-level virtio server")
Reported-by: syzbot+f2a62d07a5198c819c7b@syzkaller.appspotmail.com
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
References: CVE-2020-10942
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
-
- 05 May 2020, 1 commit
-
-
Submitted by Joseph Qi

to #27362006

In particular, we enable the following for daishu, consistent with CentOS 8:
CONFIG_SQUASHFS_FILE_DIRECT
CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU

Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
-
- 01 May 2020, 1 commit
-
-
Submitted by Yihao Wu

to #27363370

When CONFIG_SCHED_SLI is not set, the compiler gives errors about the redefinition of task_ca_increase_nr_migrations. It should have been an empty implementation in this case.

Fixes: 965d75d3 ("alinux: cpuacct: make cpuacct record nr_migrations")
Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
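The usual shape of such a fix, as a hedged sketch (the header layout and argument list are assumptions; only the function name comes from the commit text): one real definition when CONFIG_SCHED_SLI is on, and a static-inline empty stub otherwise, so every configuration sees exactly one definition.

	/* in a header shared by the callers (sketch) */
	#ifdef CONFIG_SCHED_SLI
	void task_ca_increase_nr_migrations(struct task_struct *tsk);
	#else
	static inline void task_ca_increase_nr_migrations(struct task_struct *tsk)
	{
		/* accounting disabled: intentionally empty */
	}
	#endif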
-
- 30 April 2020, 9 commits
-
-
Submitted by Shile Zhang

fix #27138800

Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Submitted by Aneesh Kumar K.V

fix #27138800

commit 4c806b897d6075bfa5067e524fb058c57ab64e7b upstream.

Some environments want to use a host tmpfs/ramdisk to back guest pmem. While the data is not persisted relative to the host, it *is* persisted relative to guest crashes / reboots. The guest is free to use dax and MAP_SYNC to keep filesystem metadata consistent with dax accesses without requiring guest fsync(). The guest can also observe that the region is volatile and skip cache flushing, as global visibility is enough to "persist" data relative to the host staying alive over guest reset events.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
Link: https://lore.kernel.org/r/20190924114327.14700-1-aneesh.kumar@linux.ibm.com
[djbw: reword the changelog]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
-
Submitted by Pankaj Gupta

fix #27138800

commit 8c2e408e73f735d2e6e8b43f9b038c9abb082939 upstream.

This patch fixes the below sparse warnings related to the __virtio type in the virtio pmem driver, reported by the Intel test bot on the linux-next tree.

nd_virtio.c:56:28: warning: incorrect type in assignment (different base types)
nd_virtio.c:56:28:    expected unsigned int [unsigned] [usertype] type
nd_virtio.c:56:28:    got restricted __virtio32
nd_virtio.c:93:59: warning: incorrect type in argument 2 (different base types)
nd_virtio.c:93:59:    expected restricted __virtio32 [usertype] val
nd_virtio.c:93:59:    got unsigned int [unsigned] [usertype] ret

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
-
Submitted by Pankaj Gupta

fix #27138800

commit b21fec414095d966789581c1466fb2f55de33bfe upstream.

Don't support 'MAP_SYNC' with non-DAX files and DAX files with an asynchronous dax_device. Virtio pmem provides an asynchronous host page cache flush mechanism. We don't support 'MAP_SYNC' with virtio pmem and xfs.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
-
Submitted by Pankaj Gupta

fix #27138800

commit e46bfc3f03d7894c0eb47c7d754c38bafe39e197 upstream.

Don't support 'MAP_SYNC' with non-DAX files and DAX files with an asynchronous dax_device. Virtio pmem provides an asynchronous host page cache flush mechanism. We don't support 'MAP_SYNC' with virtio pmem and ext4.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
-
Submitted by Pankaj Gupta

fix #27138800

commit 32de1484648a837db5dea0a7007fe7136804e392 upstream.

This patch introduces the 'daxdev_mapping_supported' helper, which checks if 'MAP_SYNC' is supported with the filesystem mapping. It also checks if the corresponding dax_device is synchronous. The virtio pmem device is asynchronous and does not support VM_SYNC.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
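A hedged sketch of how a filesystem's mmap path typically uses this helper (the surrounding ext4/xfs setup is summarized, not quoted): reject VM_SYNC mappings when the backing dax_device cannot honour synchronous-fault semantics.

	#include <linux/dax.h>
	#include <linux/fs.h>
	#include <linux/mm.h>

	static int example_fs_mmap(struct file *file, struct vm_area_struct *vma,
				   struct dax_device *dax_dev)
	{
		/* MAP_SYNC only makes sense on a synchronous DAX device */
		if (!daxdev_mapping_supported(vma, dax_dev))
			return -EOPNOTSUPP;

		/* ... set vma->vm_ops and do the usual mmap setup ... */
		return 0;
	}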
-
Submitted by Pankaj Gupta

fix #27138800

commit fefc1d97fa4b5e016bbe15447dc3edcd9e1bcb9f upstream.

This patch adds the 'DAXDEV_SYNC' flag, which is set for an nd_region doing synchronous flush. This is later used to disable MAP_SYNC functionality in the ext4 and xfs filesystems for devices that don't support synchronous flush.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
-
Submitted by Pankaj Gupta

fix #27138800

commit 6e84200c0a2994b991259d19450eee561029bf70 upstream.

This patch adds the virtio-pmem driver for KVM guests.

The guest reads the persistent memory range information from Qemu over VIRTIO and registers it on nvdimm_bus. It also creates an nd_region object with the persistent memory range information so that the existing 'nvdimm/pmem' driver can reserve this into the system memory map. This way the 'virtio-pmem' driver uses the existing functionality of the pmem driver to register persistent memory compatible with DAX-capable filesystems.

This also provides a function to perform a guest flush over VIRTIO from the 'pmem' driver when userspace performs a flush on a DAX memory range.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jakub Staron <jstaron@google.com>
Tested-by: Jakub Staron <jstaron@google.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
-
Submitted by Pankaj Gupta

fix #27138800

commit c5d4355d10d414a96ca870b731756b89d068d57a upstream.

This patch adds functionality to perform a flush from guest to host over VIRTIO. We register a callback based on the 'nd_region' type: the virtio_pmem driver requires this special flush function, while for the rest of the region types we register the existing flush function. Errors returned by host fsync failures are reported to userspace.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
-
- 29 April 2020, 6 commits
-
-
Submitted by Al Viro

fix #27211210

commit 508c8772760d4ef9c1a044519b564710c3684fc5 upstream.

We need to reload ->d_flags after the call of ->d_manage() - the thing might've been called with the dentry still negative and have the damn thing turned positive while we'd waited.

Fixes: d41efb522e90 "fs/namei.c: pull positivity check into follow_managed()"
Reported-by: Ian Kent <raven@themaw.net>
Tested-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Submitted by Al Viro

fix #27211210

commit 2fa6b1e01a9b1a54769c394f06cd72c3d12a2d48 upstream.

Pinned negative dentries can, generally, be made positive by another thread. Conditions that prevent that are
 * ->d_lock on dentry in question
 * parent directory held at least shared
 * nobody else could have observed the address of dentry
Most of the places working with those fall into one of those categories; however, d_lookup() and friends need to be used with some care. Fortunately, there's not a lot of call sites, and with few exceptions all of those fall under one of the cases above.

Exceptions are all in fs/namei.c - in lookup_fast(), lookup_dcache() and mountpoint_last(). Another one is lookup_slow() - there dcache lookup is done with parent held shared, but the result is used after we'd drop the lock. The same happens in do_last() - the lookup (in lookup_one()) is done with parent locked, but the result is used after unlocking.

lookup_fast(), do_last() and mountpoint_last() flat-out reject negatives.

Most of lookup_dcache() calls are made with parent locked at least shared; the only exception is lookup_one_len_unlocked(). It might return a pinned negative and needs serious care from callers. Fortunately, almost nobody calls it directly anymore; all but two callers have converted to lookup_positive_unlocked(), which rejects negatives.

lookup_slow() is called by the same lookup_one_len_unlocked() (see above), mountpoint_last() and walk_component(). In those two, negatives are rejected.

In other words, there is a small set of places where we need to check carefully if a pinned, potentially negative dentry is, in fact, positive. After that check we want to be sure that both ->d_inode and type bits in ->d_flags are stable and observed. The set consists of follow_managed() (where the rejection happens for lookup_fast(), walk_component() and do_last()), last_mountpoint() and lookup_positive_unlocked().

Solution:
1) transition from negative to positive (in __d_set_inode_and_type()) stores ->d_inode, then uses smp_store_release() to set ->d_flags type bits.
2) the aforementioned 3 places in fs/namei.c fetch ->d_flags with smp_load_acquire() and bugger off if its type bits say "negative".

That way anyone downstream of those checks has the dentry known positive and pinned, with ->d_inode and the type bits of ->d_flags stable and observed.

I considered splitting off d_lookup_positive(), so that the checks could be done right there, under ->d_lock. However, that leads to massive duplication of rather subtle code in fs/namei.c and fs/dcache.c. It's worse than it might seem, thanks to autofs ->d_manage() getting involved ;-/ No matter what, autofs_d_manage()/autofs_d_automount() must live with the possibility of a pinned negative dentry passed their way, becoming positive under them - that's the intended behaviour when lookup comes in the middle of an automount in progress, so we can't keep them out of the area that has to deal with those, more's the pity...

Reported-by: Ritesh Harjani <riteshh@linux.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Submitted by Al Viro

fix #27211210

commit e84009336711d2bba885fc9cea66348ddfce3758 upstream.

We are overoptimistic about taking the fast path there; seeing the same value in ->d_parent after having grabbed a reference to that parent does *not* mean that it has remained our parent all along.

That wouldn't be a big deal (in the end it is our parent and we have grabbed the reference we are about to return), but... the situation with barriers is messed up.

We might have hit the following sequence, where d is a dentry of /tmp/a/b:

CPU1: parent = d->d_parent (i.e. dentry of /tmp/a)
CPU2: rename /tmp/a/b to /tmp/b
CPU2: rmdir /tmp/a, making its dentry negative
CPU1: grab reference to parent, end up with cached parent->d_inode (NULL)
CPU2: mkdir /tmp/a, rename /tmp/b to /tmp/a/b
CPU1: recheck d->d_parent, which is back to original
CPU1: decide that everything's fine and return the reference we'd got.

The trouble is, the caller (on CPU1) will observe dget_parent() returning an apparently negative dentry. It actually is positive, but CPU1 has a stale ->d_inode cached.

Use d->d_seq to see if it has been moved instead of rechecking ->d_parent. NOTE: we are *NOT* going to retry on any kind of ->d_seq mismatch; we just go into the slow path in such a case. We don't wait for ->d_seq to become even either - again, if we are racing with renames, we can bloody well go to the slow path anyway.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Submitted by Al Viro

fix #27211210

commit 6c2d4798a8d16cf4f3a28c3cd4af4f1dcbbb4d04 upstream.

Most of the callers of lookup_one_len_unlocked() treat negatives as ERR_PTR(-ENOENT). Provide a helper that would do just that. Note that a pinned positive dentry remains positive - its ->d_inode is stable, etc.; a pinned _negative_ dentry can become positive at any point as long as you are not holding its parent at least shared. So using lookup_one_len_unlocked() needs to be careful; lookup_positive_unlocked() is safer and that's what the callers end up open-coding anyway.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
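The helper reads roughly like the following hedged sketch (condensed from the fs/namei.c version; the smp_load_acquire() pairing on ->d_flags is the point made in commit 2fa6b1e0 above):

	#include <linux/dcache.h>
	#include <linux/err.h>
	#include <linux/namei.h>

	/* sketch of lookup_positive_unlocked(): negatives become -ENOENT */
	static struct dentry *example_lookup_positive_unlocked(const char *name,
						struct dentry *base, int len)
	{
		struct dentry *ret = lookup_one_len_unlocked(name, base, len);

		if (!IS_ERR(ret) && d_is_negative(ret)) {
			/* upstream checks d_flags_negative(smp_load_acquire(&ret->d_flags))
			 * so the type bits are observed after ->d_inode */
			dput(ret);
			ret = ERR_PTR(-ENOENT);
		}
		return ret;
	}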
-
Submitted by Al Viro

fix #27211210

commit d41efb522e902364ab09c782d511c1bedc388ddd upstream.

There are 4 callers; two proceed to check if the result is positive and fail with ENOENT if it isn't; one (in handle_lookup_down()) is guaranteed to yield a positive and one (in lookup_fast()) is _preceded_ by a positivity check.

However, follow_managed() on a negative dentry is a (fairly cheap) no-op on anything other than autofs. And negative autofs dentries are never hashed, so lookup_fast() is not going to run into one of those. Moreover, a successful follow_managed() on a _positive_ dentry never yields a negative one (and we significantly rely upon that in callers of lookup_fast()).

In other words, we can easily transpose the positivity check and the call of follow_managed() in lookup_fast(). And that allows us to fold the positivity check *into* follow_managed(), simplifying life for the code downstream of its calls.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Submitted by Jeffle Xu

to #23113286

Since the stacking of regular file operations [1], the overlayfs edition of write_iter() is called when writing regular files.

Since then, an xattr lookup is needed on every write because file_remove_privs() is called from ovl_write_iter(), which becomes the performance bottleneck when writing small chunks of data. In my test case, file_remove_privs() would consume ~15% CPU when running fstime of unixbench (the workload is repeatedly writing 1 KB to the same file) [2].

Inherit the SB_NOSEC flag from upperdir. With that, the xattr lookup is done only once on the first write. Unixbench fstime gets a ~20% performance gain with this patch.

[1] https://lore.kernel.org/lkml/20180606150905.GC9426@magnolia/T/
[2] https://www.spinics.net/lists/linux-unionfs/msg07153.html

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git/commit/?h=overlayfs-next&id=b6dee44c57c785a59ef5f1f71588d13ebd89d395
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
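A hedged sketch of the inheritance step (the real ovl_fill_super() context is trimmed, and the way the upper super block is reached is an assumption): copy SB_NOSEC from the upper filesystem so file_remove_privs() can short-circuit instead of doing an xattr lookup on every write.

	#include <linux/fs.h>

	static void example_inherit_nosec(struct super_block *ovl_sb,
					  struct super_block *upper_sb)
	{
		/* file_remove_privs() skips the security xattr check on SB_NOSEC */
		if (upper_sb->s_flags & SB_NOSEC)
			ovl_sb->s_flags |= SB_NOSEC;
	}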
-
- 28 April 2020, 1 commit
-
-
Submitted by Rongwei Wang

to #26730415

commit ff98e20ef2081b8620dada28fc2d4fb24ca0abf2 upstream.

The upcoming GCC 9 release extends the -Wmissing-attributes warnings (enabled by -Wall) to C and aliases: it warns when particular function attributes are missing in the aliases but not in their target.

In particular, it triggers here because crc32_le_base/__crc32c_le_base aren't __pure while their target crc32_le/__crc32c_le are.

These aliases are used by architectures as a fallback in accelerated versions of CRC32. See commit 9784d82db3eb ("lib/crc32: make core crc32() routines weak so they can be overridden").

Therefore, being fallbacks, it is likely that even if the aliases were called from C, there wouldn't be any optimizations possible. Currently, the only user is arm64, which calls this from asm.

Still, marking the aliases as __pure makes sense and is a good idea for documentation purposes and possible future optimizations, and it also silences the warning.

Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Acked-by: Zou Cao <zoucao@linux.alibaba.com>
-