1. 17 4月, 2019 25 次提交
    • Y
      net/mlx5e: Add a lock on tir list · c297e881
      Yuval Avnery 提交于
      [ Upstream commit 80a2a9026b24c6bd34b8d58256973e22270bedec ]
      
      Refresh tirs is looping over a global list of tirs while netdevs are
      adding and removing tirs from that list. That is why a lock is
      required.
      
      Fixes: 724b2aa1 ("net/mlx5e: TIRs management refactoring")
      Signed-off-by: NYuval Avnery <yuvalav@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      c297e881
    • G
      net/mlx5e: Fix error handling when refreshing TIRs · 94413175
      Gavi Teitz 提交于
      [ Upstream commit bc87a0036826a37b43489b029af8143bd07c6cca ]
      
      Previously, a false positive would be caught if the TIRs list is
      empty, since the err value was initialized to -ENOMEM, and was only
      updated if a TIR is refreshed. This is resolved by initializing the
      err value to zero.
      
      Fixes: b676f653 ("net/mlx5e: Refactor refresh TIRs")
      Signed-off-by: NGavi Teitz <gavi@mellanox.com>
      Reviewed-by: NRoi Dayan <roid@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      94413175
    • S
      vrf: check accept_source_route on the original netdevice · 0516ef27
      Stephen Suryaputra 提交于
      [ Upstream commit 8c83f2df9c6578ea4c5b940d8238ad8a41b87e9e ]
      
      Configuration check to accept source route IP options should be made on
      the incoming netdevice when the skb->dev is an l3mdev master. The route
      lookup for the source route next hop also needs the incoming netdev.
      
      v2->v3:
      - Simplify by passing the original netdevice down the stack (per David
        Ahern).
      Signed-off-by: NStephen Suryaputra <ssuryaextr@gmail.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      0516ef27
    • D
      tcp: fix a potential NULL pointer dereference in tcp_sk_exit · 7243e352
      Dust Li 提交于
      [ Upstream commit b506bc975f60f06e13e74adb35e708a23dc4e87c ]
      
       When tcp_sk_init() failed in inet_ctl_sock_create(),
       'net->ipv4.tcp_congestion_control' will be left
       uninitialized, but tcp_sk_exit() hasn't check for
       that.
      
       This patch add checking on 'net->ipv4.tcp_congestion_control'
       in tcp_sk_exit() to prevent NULL-ptr dereference.
      
      Fixes: 6670e152 ("tcp: Namespace-ify sysctl_tcp_default_congestion_control")
      Signed-off-by: NDust Li <dust.li@linux.alibaba.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      7243e352
    • K
      tcp: Ensure DCTCP reacts to losses · 0e0afb06
      Koen De Schepper 提交于
      [ Upstream commit aecfde23108b8e637d9f5c5e523b24fb97035dc3 ]
      
      RFC8257 §3.5 explicitly states that "A DCTCP sender MUST react to
      loss episodes in the same way as conventional TCP".
      
      Currently, Linux DCTCP performs no cwnd reduction when losses
      are encountered. Optionally, the dctcp_clamp_alpha_on_loss resets
      alpha to its maximal value if a RTO happens. This behavior
      is sub-optimal for at least two reasons: i) it ignores losses
      triggering fast retransmissions; and ii) it causes unnecessary large
      cwnd reduction in the future if the loss was isolated as it resets
      the historical term of DCTCP's alpha EWMA to its maximal value (i.e.,
      denoting a total congestion). The second reason has an especially
      noticeable effect when using DCTCP in high BDP environments, where
      alpha normally stays at low values.
      
      This patch replace the clamping of alpha by setting ssthresh to
      half of cwnd for both fast retransmissions and RTOs, at most once
      per RTT. Consequently, the dctcp_clamp_alpha_on_loss module parameter
      has been removed.
      
      The table below shows experimental results where we measured the
      drop probability of a PIE AQM (not applying ECN marks) at a
      bottleneck in the presence of a single TCP flow with either the
      alpha-clamping option enabled or the cwnd halving proposed by this
      patch. Results using reno or cubic are given for comparison.
      
                                |  Link   |   RTT    |    Drop
                       TCP CC   |  speed  | base+AQM | probability
              ==================|=========|==========|============
                          CUBIC |  40Mbps |  7+20ms  |    0.21%
                           RENO |         |          |    0.19%
              DCTCP-CLAMP-ALPHA |         |          |   25.80%
               DCTCP-HALVE-CWND |         |          |    0.22%
              ------------------|---------|----------|------------
                          CUBIC | 100Mbps |  7+20ms  |    0.03%
                           RENO |         |          |    0.02%
              DCTCP-CLAMP-ALPHA |         |          |   23.30%
               DCTCP-HALVE-CWND |         |          |    0.04%
              ------------------|---------|----------|------------
                          CUBIC | 800Mbps |   1+1ms  |    0.04%
                           RENO |         |          |    0.05%
              DCTCP-CLAMP-ALPHA |         |          |   18.70%
               DCTCP-HALVE-CWND |         |          |    0.06%
      
      We see that, without halving its cwnd for all source of losses,
      DCTCP drives the AQM to large drop probabilities in order to keep
      the queue length under control (i.e., it repeatedly faces RTOs).
      Instead, if DCTCP reacts to all source of losses, it can then be
      controlled by the AQM using similar drop levels than cubic or reno.
      Signed-off-by: NKoen De Schepper <koen.de_schepper@nokia-bell-labs.com>
      Signed-off-by: NOlivier Tilmans <olivier.tilmans@nokia-bell-labs.com>
      Cc: Bob Briscoe <research@bobbriscoe.net>
      Cc: Lawrence Brakmo <brakmo@fb.com>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Daniel Borkmann <borkmann@iogearbox.net>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Andrew Shewmaker <agshew@gmail.com>
      Cc: Glenn Judd <glenn.judd@morganstanley.com>
      Acked-by: NFlorian Westphal <fw@strlen.de>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      0e0afb06
    • X
      sctp: initialize _pad of sockaddr_in before copying to user memory · 87349583
      Xin Long 提交于
      [ Upstream commit 09279e615c81ce55e04835970601ae286e3facbe ]
      
      Syzbot report a kernel-infoleak:
      
        BUG: KMSAN: kernel-infoleak in _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32
        Call Trace:
          _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32
          copy_to_user include/linux/uaccess.h:174 [inline]
          sctp_getsockopt_peer_addrs net/sctp/socket.c:5911 [inline]
          sctp_getsockopt+0x1668e/0x17f70 net/sctp/socket.c:7562
          ...
        Uninit was stored to memory at:
          sctp_transport_init net/sctp/transport.c:61 [inline]
          sctp_transport_new+0x16d/0x9a0 net/sctp/transport.c:115
          sctp_assoc_add_peer+0x532/0x1f70 net/sctp/associola.c:637
          sctp_process_param net/sctp/sm_make_chunk.c:2548 [inline]
          sctp_process_init+0x1a1b/0x3ed0 net/sctp/sm_make_chunk.c:2361
          ...
        Bytes 8-15 of 16 are uninitialized
      
      It was caused by that th _pad field (the 8-15 bytes) of a v4 addr (saved in
      struct sockaddr_in) wasn't initialized, but directly copied to user memory
      in sctp_getsockopt_peer_addrs().
      
      So fix it by calling memset(addr->v4.sin_zero, 0, 8) to initialize _pad of
      sockaddr_in before copying it to user memory in sctp_v4_addr_to_user(), as
      sctp_v6_addr_to_user() does.
      
      Reported-by: syzbot+86b5c7c236a22616a72f@syzkaller.appspotmail.com
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Tested-by: NAlexander Potapenko <glider@google.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      87349583
    • H
      r8169: disable ASPM again · 1e4a7e78
      Heiner Kallweit 提交于
      [ Upstream commit b75bb8a5b755d0c7bf1ac071e4df2349a7644a1e ]
      
      There's a significant number of reports that re-enabling ASPM causes
      different issues, ranging from decreased performance to system not
      booting at all. This affects only a minority of users, but the number
      of affected users is big enough that we better switch off ASPM again.
      
      This will hurt notebook users who are not affected by the issues, they
      may see decreased battery runtime w/o ASPM. With the PCI core folks is
      being discussed to add generic sysfs attributes to control ASPM.
      Once this is in place brave enough users can re-enable ASPM on their
      system.
      
      Fixes: a99790bf ("r8169: Reinstate ASPM Support")
      Signed-off-by: NHeiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      1e4a7e78
    • B
      qmi_wwan: add Olicard 600 · 84dc2f87
      Bjørn Mork 提交于
      [ Upstream commit 6289d0facd9ebce4cc83e5da39e15643ee998dc5 ]
      
      This is a Qualcomm based device with a QMI function on interface 4.
      It is mode switched from 2020:2030 using a standard eject message.
      
      T:  Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#=  6 Spd=480  MxCh= 0
      D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
      P:  Vendor=2020 ProdID=2031 Rev= 2.32
      S:  Manufacturer=Mobile Connect
      S:  Product=Mobile Connect
      S:  SerialNumber=0123456789ABCDEF
      C:* #Ifs= 6 Cfg#= 1 Atr=80 MxPwr=500mA
      I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
      E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none)
      E:  Ad=83(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
      E:  Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none)
      E:  Ad=85(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
      E:  Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none)
      E:  Ad=87(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
      E:  Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
      E:  Ad=89(I) Atr=03(Int.) MxPS=   8 Ivl=32ms
      E:  Ad=88(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      I:* If#= 5 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=(none)
      E:  Ad=8a(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
      E:  Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=125us
      Signed-off-by: NBjørn Mork <bjorn@mork.no>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      84dc2f87
    • A
      openvswitch: fix flow actions reallocation · ec0e32da
      Andrea Righi 提交于
      [ Upstream commit f28cd2af22a0c134e4aa1c64a70f70d815d473fb ]
      
      The flow action buffer can be resized if it's not big enough to contain
      all the requested flow actions. However, this resize doesn't take into
      account the new requested size, the buffer is only increased by a factor
      of 2x. This might be not enough to contain the new data, causing a
      buffer overflow, for example:
      
      [   42.044472] =============================================================================
      [   42.045608] BUG kmalloc-96 (Not tainted): Redzone overwritten
      [   42.046415] -----------------------------------------------------------------------------
      
      [   42.047715] Disabling lock debugging due to kernel taint
      [   42.047716] INFO: 0x8bf2c4a5-0x720c0928. First byte 0x0 instead of 0xcc
      [   42.048677] INFO: Slab 0xbc6d2040 objects=29 used=18 fp=0xdc07dec4 flags=0x2808101
      [   42.049743] INFO: Object 0xd53a3464 @offset=2528 fp=0xccdcdebb
      
      [   42.050747] Redzone 76f1b237: cc cc cc cc cc cc cc cc                          ........
      [   42.051839] Object d53a3464: 6b 6b 6b 6b 6b 6b 6b 6b 0c 00 00 00 6c 00 00 00  kkkkkkkk....l...
      [   42.053015] Object f49a30cc: 6c 00 0c 00 00 00 00 00 00 00 00 03 78 a3 15 f6  l...........x...
      [   42.054203] Object acfe4220: 20 00 02 00 ff ff ff ff 00 00 00 00 00 00 00 00   ...............
      [   42.055370] Object 21024e91: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      [   42.056541] Object 070e04c3: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      [   42.057797] Object 948a777a: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      [   42.059061] Redzone 8bf2c4a5: 00 00 00 00                                      ....
      [   42.060189] Padding a681b46e: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
      
      Fix by making sure the new buffer is properly resized to contain all the
      requested data.
      
      BugLink: https://bugs.launchpad.net/bugs/1813244Signed-off-by: NAndrea Righi <andrea.righi@canonical.com>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      ec0e32da
    • N
      net/sched: fix ->get helper of the matchall cls · eeedfa94
      Nicolas Dichtel 提交于
      [ Upstream commit 0db6f8befc32c68bb13d7ffbb2e563c79e913e13 ]
      
      It returned always NULL, thus it was never possible to get the filter.
      
      Example:
      $ ip link add foo type dummy
      $ ip link add bar type dummy
      $ tc qdisc add dev foo clsact
      $ tc filter add dev foo protocol all pref 1 ingress handle 1234 \
      	matchall action mirred ingress mirror dev bar
      
      Before the patch:
      $ tc filter get dev foo protocol all pref 1 ingress handle 1234 matchall
      Error: Specified filter handle not found.
      We have an error talking to the kernel
      
      After:
      $ tc filter get dev foo protocol all pref 1 ingress handle 1234 matchall
      filter ingress protocol all pref 1 matchall chain 0 handle 0x4d2
        not_in_hw
              action order 1: mirred (Ingress Mirror to device bar) pipe
              index 1 ref 1 bind 1
      
      CC: Yotam Gigi <yotamg@mellanox.com>
      CC: Jiri Pirko <jiri@mellanox.com>
      Fixes: fd62d9f5 ("net/sched: matchall: Fix configuration race")
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      eeedfa94
    • D
      net/sched: act_sample: fix divide by zero in the traffic path · 15c0770e
      Davide Caratti 提交于
      [ Upstream commit fae2708174ae95d98d19f194e03d6e8f688ae195 ]
      
      the control path of 'sample' action does not validate the value of 'rate'
      provided by the user, but then it uses it as divisor in the traffic path.
      Validate it in tcf_sample_init(), and return -EINVAL with a proper extack
      message in case that value is zero, to fix a splat with the script below:
      
       # tc f a dev test0 egress matchall action sample rate 0 group 1 index 2
       # tc -s a s action sample
       total acts 1
      
               action order 0: sample rate 1/0 group 1 pipe
                index 2 ref 1 bind 1 installed 19 sec used 19 sec
               Action statistics:
               Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
               backlog 0b 0p requeues 0
       # ping 192.0.2.1 -I test0 -c1 -q
      
       divide error: 0000 [#1] SMP PTI
       CPU: 1 PID: 6192 Comm: ping Not tainted 5.1.0-rc2.diag2+ #591
       Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
       RIP: 0010:tcf_sample_act+0x9e/0x1e0 [act_sample]
       Code: 6a f1 85 c0 74 0d 80 3d 83 1a 00 00 00 0f 84 9c 00 00 00 4d 85 e4 0f 84 85 00 00 00 e8 9b d7 9c f1 44 8b 8b e0 00 00 00 31 d2 <41> f7 f1 85 d2 75 70 f6 85 83 00 00 00 10 48 8b 45 10 8b 88 08 01
       RSP: 0018:ffffae320190ba30 EFLAGS: 00010246
       RAX: 00000000b0677d21 RBX: ffff8af1ed9ec000 RCX: 0000000059a9fe49
       RDX: 0000000000000000 RSI: 000000000c7e33b7 RDI: ffff8af23daa0af0
       RBP: ffff8af1ee11b200 R08: 0000000074fcaf7e R09: 0000000000000000
       R10: 0000000000000050 R11: ffffffffb3088680 R12: ffff8af232307f80
       R13: 0000000000000003 R14: ffff8af1ed9ec000 R15: 0000000000000000
       FS:  00007fe9c6d2f740(0000) GS:ffff8af23da80000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007fff6772f000 CR3: 00000000746a2004 CR4: 00000000001606e0
       Call Trace:
        tcf_action_exec+0x7c/0x1c0
        tcf_classify+0x57/0x160
        __dev_queue_xmit+0x3dc/0xd10
        ip_finish_output2+0x257/0x6d0
        ip_output+0x75/0x280
        ip_send_skb+0x15/0x40
        raw_sendmsg+0xae3/0x1410
        sock_sendmsg+0x36/0x40
        __sys_sendto+0x10e/0x140
        __x64_sys_sendto+0x24/0x30
        do_syscall_64+0x60/0x210
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [...]
        Kernel panic - not syncing: Fatal exception in interrupt
      
      Add a TDC selftest to document that 'rate' is now being validated.
      Reported-by: NMatteo Croce <mcroce@redhat.com>
      Fixes: 5c5670fa ("net/sched: Introduce sample tc action")
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Acked-by: NYotam Gigi <yotam.gi@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      15c0770e
    • M
      net: rds: force to destroy connection if t_sock is NULL in rds_tcp_kill_sock(). · 78b4bf26
      Mao Wenan 提交于
      [ Upstream commit cb66ddd156203daefb8d71158036b27b0e2caf63 ]
      
      When it is to cleanup net namespace, rds_tcp_exit_net() will call
      rds_tcp_kill_sock(), if t_sock is NULL, it will not call
      rds_conn_destroy(), rds_conn_path_destroy() and rds_tcp_conn_free() to free
      connection, and the worker cp_conn_w is not stopped, afterwards the net is freed in
      net_drop_ns(); While cp_conn_w rds_connect_worker() will call rds_tcp_conn_path_connect()
      and reference 'net' which has already been freed.
      
      In rds_tcp_conn_path_connect(), rds_tcp_set_callbacks() will set t_sock = sock before
      sock->ops->connect, but if connect() is failed, it will call
      rds_tcp_restore_callbacks() and set t_sock = NULL, if connect is always
      failed, rds_connect_worker() will try to reconnect all the time, so
      rds_tcp_kill_sock() will never to cancel worker cp_conn_w and free the
      connections.
      
      Therefore, the condition !tc->t_sock is not needed if it is going to do
      cleanup_net->rds_tcp_exit_net->rds_tcp_kill_sock, because tc->t_sock is always
      NULL, and there is on other path to cancel cp_conn_w and free
      connection. So this patch is to fix this.
      
      rds_tcp_kill_sock():
      ...
      if (net != c_net || !tc->t_sock)
      ...
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      
      ==================================================================
      BUG: KASAN: use-after-free in inet_create+0xbcc/0xd28
      net/ipv4/af_inet.c:340
      Read of size 4 at addr ffff8003496a4684 by task kworker/u8:4/3721
      
      CPU: 3 PID: 3721 Comm: kworker/u8:4 Not tainted 5.1.0 #11
      Hardware name: linux,dummy-virt (DT)
      Workqueue: krdsd rds_connect_worker
      Call trace:
       dump_backtrace+0x0/0x3c0 arch/arm64/kernel/time.c:53
       show_stack+0x28/0x38 arch/arm64/kernel/traps.c:152
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x120/0x188 lib/dump_stack.c:113
       print_address_description+0x68/0x278 mm/kasan/report.c:253
       kasan_report_error mm/kasan/report.c:351 [inline]
       kasan_report+0x21c/0x348 mm/kasan/report.c:409
       __asan_report_load4_noabort+0x30/0x40 mm/kasan/report.c:429
       inet_create+0xbcc/0xd28 net/ipv4/af_inet.c:340
       __sock_create+0x4f8/0x770 net/socket.c:1276
       sock_create_kern+0x50/0x68 net/socket.c:1322
       rds_tcp_conn_path_connect+0x2b4/0x690 net/rds/tcp_connect.c:114
       rds_connect_worker+0x108/0x1d0 net/rds/threads.c:175
       process_one_work+0x6e8/0x1700 kernel/workqueue.c:2153
       worker_thread+0x3b0/0xdd0 kernel/workqueue.c:2296
       kthread+0x2f0/0x378 kernel/kthread.c:255
       ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1117
      
      Allocated by task 687:
       save_stack mm/kasan/kasan.c:448 [inline]
       set_track mm/kasan/kasan.c:460 [inline]
       kasan_kmalloc+0xd4/0x180 mm/kasan/kasan.c:553
       kasan_slab_alloc+0x14/0x20 mm/kasan/kasan.c:490
       slab_post_alloc_hook mm/slab.h:444 [inline]
       slab_alloc_node mm/slub.c:2705 [inline]
       slab_alloc mm/slub.c:2713 [inline]
       kmem_cache_alloc+0x14c/0x388 mm/slub.c:2718
       kmem_cache_zalloc include/linux/slab.h:697 [inline]
       net_alloc net/core/net_namespace.c:384 [inline]
       copy_net_ns+0xc4/0x2d0 net/core/net_namespace.c:424
       create_new_namespaces+0x300/0x658 kernel/nsproxy.c:107
       unshare_nsproxy_namespaces+0xa0/0x198 kernel/nsproxy.c:206
       ksys_unshare+0x340/0x628 kernel/fork.c:2577
       __do_sys_unshare kernel/fork.c:2645 [inline]
       __se_sys_unshare kernel/fork.c:2643 [inline]
       __arm64_sys_unshare+0x38/0x58 kernel/fork.c:2643
       __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
       invoke_syscall arch/arm64/kernel/syscall.c:47 [inline]
       el0_svc_common+0x168/0x390 arch/arm64/kernel/syscall.c:83
       el0_svc_handler+0x60/0xd0 arch/arm64/kernel/syscall.c:129
       el0_svc+0x8/0xc arch/arm64/kernel/entry.S:960
      
      Freed by task 264:
       save_stack mm/kasan/kasan.c:448 [inline]
       set_track mm/kasan/kasan.c:460 [inline]
       __kasan_slab_free+0x114/0x220 mm/kasan/kasan.c:521
       kasan_slab_free+0x10/0x18 mm/kasan/kasan.c:528
       slab_free_hook mm/slub.c:1370 [inline]
       slab_free_freelist_hook mm/slub.c:1397 [inline]
       slab_free mm/slub.c:2952 [inline]
       kmem_cache_free+0xb8/0x3a8 mm/slub.c:2968
       net_free net/core/net_namespace.c:400 [inline]
       net_drop_ns.part.6+0x78/0x90 net/core/net_namespace.c:407
       net_drop_ns net/core/net_namespace.c:406 [inline]
       cleanup_net+0x53c/0x6d8 net/core/net_namespace.c:569
       process_one_work+0x6e8/0x1700 kernel/workqueue.c:2153
       worker_thread+0x3b0/0xdd0 kernel/workqueue.c:2296
       kthread+0x2f0/0x378 kernel/kthread.c:255
       ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1117
      
      The buggy address belongs to the object at ffff8003496a3f80
       which belongs to the cache net_namespace of size 7872
      The buggy address is located 1796 bytes inside of
       7872-byte region [ffff8003496a3f80, ffff8003496a5e40)
      The buggy address belongs to the page:
      page:ffff7e000d25a800 count:1 mapcount:0 mapping:ffff80036ce4b000
      index:0x0 compound_mapcount: 0
      flags: 0xffffe0000008100(slab|head)
      raw: 0ffffe0000008100 dead000000000100 dead000000000200 ffff80036ce4b000
      raw: 0000000000000000 0000000080040004 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8003496a4580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8003496a4600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff8003496a4680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                         ^
       ffff8003496a4700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8003496a4780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      ==================================================================
      
      Fixes: 467fa153("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.")
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Signed-off-by: NMao Wenan <maowenan@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      78b4bf26
    • E
      netns: provide pure entropy for net_hash_mix() · a1c2f322
      Eric Dumazet 提交于
      [ Upstream commit 355b98553789b646ed97ad801a619ff898471b92 ]
      
      net_hash_mix() currently uses kernel address of a struct net,
      and is used in many places that could be used to reveal this
      address to a patient attacker, thus defeating KASLR, for
      the typical case (initial net namespace, &init_net is
      not dynamically allocated)
      
      I believe the original implementation tried to avoid spending
      too many cycles in this function, but security comes first.
      
      Also provide entropy regardless of CONFIG_NET_NS.
      
      Fixes: 0b441916 ("netns: introduce the net_hash_mix "salt" for hashes")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NAmit Klein <aksecurity@gmail.com>
      Reported-by: NBenny Pinkas <benny@pinkas.net>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      a1c2f322
    • A
      net/mlx5: Decrease default mr cache size · 53a19068
      Artemy Kovalyov 提交于
      [ Upstream commit e8b26b2135dedc0284490bfeac06dfc4418d0105 ]
      
      Delete initialization of high order entries in mr cache to decrease initial
      memory footprint. When required, the administrator can populate the
      entries with memory keys via the /sys interface.
      
      This approach is very helpful to significantly reduce the per HW function
      memory footprint in virtualization environments such as SRIOV.
      
      Fixes: 9603b61d ("mlx5: Move pci device handling from mlx5_ib to mlx5_core")
      Signed-off-by: NArtemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: NMoni Shoua <monis@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Reported-by: NShalom Toledo <shalomt@mellanox.com>
      Acked-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      53a19068
    • S
      net-gro: Fix GRO flush when receiving a GSO packet. · b87ec813
      Steffen Klassert 提交于
      [ Upstream commit 0ab03f353d3613ea49d1f924faf98559003670a8 ]
      
      Currently we may merge incorrectly a received GSO packet
      or a packet with frag_list into a packet sitting in the
      gro_hash list. skb_segment() may crash case because
      the assumptions on the skb layout are not met.
      The correct behaviour would be to flush the packet in the
      gro_hash list and send the received GSO packet directly
      afterwards. Commit d61d072e ("net-gro: avoid reorders")
      sets NAPI_GRO_CB(skb)->flush in this case, but this is not
      checked before merging. This patch makes sure to check this
      flag and to not merge in that case.
      
      Fixes: d61d072e ("net-gro: avoid reorders")
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      b87ec813
    • L
      net: ethtool: not call vzalloc for zero sized memory request · 80c20581
      Li RongQing 提交于
      [ Upstream commit 3d8830266ffc28c16032b859e38a0252e014b631 ]
      
      NULL or ZERO_SIZE_PTR will be returned for zero sized memory
      request, and derefencing them will lead to a segfault
      
      so it is unnecessory to call vzalloc for zero sized memory
      request and not call functions which maybe derefence the
      NULL allocated memory
      
      this also fixes a possible memory leak if phy_ethtool_get_stats
      returns error, memory should be freed before exit
      Signed-off-by: NLi RongQing <lirongqing@baidu.com>
      Reviewed-by: NWang Li <wangli39@baidu.com>
      Reviewed-by: NMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      80c20581
    • J
      kcm: switch order of device registration to fix a crash · b7b05831
      Jiri Slaby 提交于
      [ Upstream commit 3c446e6f96997f2a95bf0037ef463802162d2323 ]
      
      When kcm is loaded while many processes try to create a KCM socket, a
      crash occurs:
       BUG: unable to handle kernel NULL pointer dereference at 000000000000000e
       IP: mutex_lock+0x27/0x40 kernel/locking/mutex.c:240
       PGD 8000000016ef2067 P4D 8000000016ef2067 PUD 3d6e9067 PMD 0
       Oops: 0002 [#1] SMP KASAN PTI
       CPU: 0 PID: 7005 Comm: syz-executor.5 Not tainted 4.12.14-396-default #1 SLE15-SP1 (unreleased)
       RIP: 0010:mutex_lock+0x27/0x40 kernel/locking/mutex.c:240
       RSP: 0018:ffff88000d487a00 EFLAGS: 00010246
       RAX: 0000000000000000 RBX: 000000000000000e RCX: 1ffff100082b0719
       ...
       CR2: 000000000000000e CR3: 000000004b1bc003 CR4: 0000000000060ef0
       Call Trace:
        kcm_create+0x600/0xbf0 [kcm]
        __sock_create+0x324/0x750 net/socket.c:1272
       ...
      
      This is due to race between sock_create and unfinished
      register_pernet_device. kcm_create tries to do "net_generic(net,
      kcm_net_id)". but kcm_net_id is not initialized yet.
      
      So switch the order of the two to close the race.
      
      This can be reproduced with mutiple processes doing socket(PF_KCM, ...)
      and one process doing module removal.
      
      Fixes: ab7ac4eb ("kcm: Kernel Connection Multiplexor module")
      Reviewed-by: NMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: NJiri Slaby <jslaby@suse.cz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      b7b05831
    • L
      ipv6: sit: reset ip header pointer in ipip6_rcv · 42f1fa0f
      Lorenzo Bianconi 提交于
      [ Upstream commit bb9bd814ebf04f579be466ba61fc922625508807 ]
      
      ipip6 tunnels run iptunnel_pull_header on received skbs. This can
      determine the following use-after-free accessing iph pointer since
      the packet will be 'uncloned' running pskb_expand_head if it is a
      cloned gso skb (e.g if the packet has been sent though a veth device)
      
      [  706.369655] BUG: KASAN: use-after-free in ipip6_rcv+0x1678/0x16e0 [sit]
      [  706.449056] Read of size 1 at addr ffffe01b6bd855f5 by task ksoftirqd/1/=
      [  706.669494] Hardware name: HPE ProLiant m400 Server/ProLiant m400 Server, BIOS U02 08/19/2016
      [  706.771839] Call trace:
      [  706.801159]  dump_backtrace+0x0/0x2f8
      [  706.845079]  show_stack+0x24/0x30
      [  706.884833]  dump_stack+0xe0/0x11c
      [  706.925629]  print_address_description+0x68/0x260
      [  706.982070]  kasan_report+0x178/0x340
      [  707.025995]  __asan_report_load1_noabort+0x30/0x40
      [  707.083481]  ipip6_rcv+0x1678/0x16e0 [sit]
      [  707.132623]  tunnel64_rcv+0xd4/0x200 [tunnel4]
      [  707.185940]  ip_local_deliver_finish+0x3b8/0x988
      [  707.241338]  ip_local_deliver+0x144/0x470
      [  707.289436]  ip_rcv_finish+0x43c/0x14b0
      [  707.335447]  ip_rcv+0x628/0x1138
      [  707.374151]  __netif_receive_skb_core+0x1670/0x2600
      [  707.432680]  __netif_receive_skb+0x28/0x190
      [  707.482859]  process_backlog+0x1d0/0x610
      [  707.529913]  net_rx_action+0x37c/0xf68
      [  707.574882]  __do_softirq+0x288/0x1018
      [  707.619852]  run_ksoftirqd+0x70/0xa8
      [  707.662734]  smpboot_thread_fn+0x3a4/0x9e8
      [  707.711875]  kthread+0x2c8/0x350
      [  707.750583]  ret_from_fork+0x10/0x18
      
      [  707.811302] Allocated by task 16982:
      [  707.854182]  kasan_kmalloc.part.1+0x40/0x108
      [  707.905405]  kasan_kmalloc+0xb4/0xc8
      [  707.948291]  kasan_slab_alloc+0x14/0x20
      [  707.994309]  __kmalloc_node_track_caller+0x158/0x5e0
      [  708.053902]  __kmalloc_reserve.isra.8+0x54/0xe0
      [  708.108280]  __alloc_skb+0xd8/0x400
      [  708.150139]  sk_stream_alloc_skb+0xa4/0x638
      [  708.200346]  tcp_sendmsg_locked+0x818/0x2b90
      [  708.251581]  tcp_sendmsg+0x40/0x60
      [  708.292376]  inet_sendmsg+0xf0/0x520
      [  708.335259]  sock_sendmsg+0xac/0xf8
      [  708.377096]  sock_write_iter+0x1c0/0x2c0
      [  708.424154]  new_sync_write+0x358/0x4a8
      [  708.470162]  __vfs_write+0xc4/0xf8
      [  708.510950]  vfs_write+0x12c/0x3d0
      [  708.551739]  ksys_write+0xcc/0x178
      [  708.592533]  __arm64_sys_write+0x70/0xa0
      [  708.639593]  el0_svc_handler+0x13c/0x298
      [  708.686646]  el0_svc+0x8/0xc
      
      [  708.739019] Freed by task 17:
      [  708.774597]  __kasan_slab_free+0x114/0x228
      [  708.823736]  kasan_slab_free+0x10/0x18
      [  708.868703]  kfree+0x100/0x3d8
      [  708.905320]  skb_free_head+0x7c/0x98
      [  708.948204]  skb_release_data+0x320/0x490
      [  708.996301]  pskb_expand_head+0x60c/0x970
      [  709.044399]  __iptunnel_pull_header+0x3b8/0x5d0
      [  709.098770]  ipip6_rcv+0x41c/0x16e0 [sit]
      [  709.146873]  tunnel64_rcv+0xd4/0x200 [tunnel4]
      [  709.200195]  ip_local_deliver_finish+0x3b8/0x988
      [  709.255596]  ip_local_deliver+0x144/0x470
      [  709.303692]  ip_rcv_finish+0x43c/0x14b0
      [  709.349705]  ip_rcv+0x628/0x1138
      [  709.388413]  __netif_receive_skb_core+0x1670/0x2600
      [  709.446943]  __netif_receive_skb+0x28/0x190
      [  709.497120]  process_backlog+0x1d0/0x610
      [  709.544169]  net_rx_action+0x37c/0xf68
      [  709.589131]  __do_softirq+0x288/0x1018
      
      [  709.651938] The buggy address belongs to the object at ffffe01b6bd85580
                      which belongs to the cache kmalloc-1024 of size 1024
      [  709.804356] The buggy address is located 117 bytes inside of
                      1024-byte region [ffffe01b6bd85580, ffffe01b6bd85980)
      [  709.946340] The buggy address belongs to the page:
      [  710.003824] page:ffff7ff806daf600 count:1 mapcount:0 mapping:ffffe01c4001f600 index:0x0
      [  710.099914] flags: 0xfffff8000000100(slab)
      [  710.149059] raw: 0fffff8000000100 dead000000000100 dead000000000200 ffffe01c4001f600
      [  710.242011] raw: 0000000000000000 0000000000380038 00000001ffffffff 0000000000000000
      [  710.334966] page dumped because: kasan: bad access detected
      
      Fix it resetting iph pointer after iptunnel_pull_header
      
      Fixes: a09a4c8d ("tunnels: Remove encapsulation offloads on decap")
      Tested-by: NJianlin Shi <jishi@redhat.com>
      Signed-off-by: NLorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      42f1fa0f
    • J
      ipv6: Fix dangling pointer when ipv6 fragment · ea06796f
      Junwei Hu 提交于
      [ Upstream commit ef0efcd3bd3fd0589732b67fb586ffd3c8705806 ]
      
      At the beginning of ip6_fragment func, the prevhdr pointer is
      obtained in the ip6_find_1stfragopt func.
      However, all the pointers pointing into skb header may change
      when calling skb_checksum_help func with
      skb->ip_summed = CHECKSUM_PARTIAL condition.
      The prevhdr pointe will be dangling if it is not reloaded after
      calling __skb_linearize func in skb_checksum_help func.
      
      Here, I add a variable, nexthdr_offset, to evaluate the offset,
      which does not changes even after calling __skb_linearize func.
      
      Fixes: 405c92f7 ("ipv6: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment")
      Signed-off-by: NJunwei Hu <hujunwei4@huawei.com>
      Reported-by: NWenhao Zhang <zhangwenhao8@huawei.com>
      Reported-by: syzbot+e8ce541d095e486074fc@syzkaller.appspotmail.com
      Reviewed-by: NZhiqiang Liu <liuzhiqiang26@huawei.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      ea06796f
    • S
      ip6_tunnel: Match to ARPHRD_TUNNEL6 for dev type · 8e4b4da3
      Sheena Mira-ato 提交于
      [ Upstream commit b2e54b09a3d29c4db883b920274ca8dca4d9f04d ]
      
      The device type for ip6 tunnels is set to
      ARPHRD_TUNNEL6. However, the ip4ip6_err function
      is expecting the device type of the tunnel to be
      ARPHRD_TUNNEL.  Since the device types do not
      match, the function exits and the ICMP error
      packet is not sent to the originating host. Note
      that the device type for IPv4 tunnels is set to
      ARPHRD_TUNNEL.
      
      Fix is to expect a tunnel device type of
      ARPHRD_TUNNEL6 instead.  Now the tunnel device
      type matches and the ICMP error packet is sent
      to the originating host.
      Signed-off-by: NSheena Mira-ato <sheena.mira-ato@alliedtelesis.co.nz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      8e4b4da3
    • T
      ibmvnic: Fix completion structure initialization · 16701957
      Thomas Falcon 提交于
      [ Upstream commit bbd669a868bba591ffd38b7bc75a7b361bb54b04 ]
      
      Fix device initialization completion handling for vNIC adapters.
      Initialize the completion structure on probe and reinitialize when needed.
      This also fixes a race condition during kdump where the driver can attempt
      to access the completion struct before it is initialized:
      
      Unable to handle kernel paging request for data at address 0x00000000
      Faulting instruction address: 0xc0000000081acbe0
      Oops: Kernel access of bad area, sig: 11 [#1]
      LE SMP NR_CPUS=2048 NUMA pSeries
      Modules linked in: ibmvnic(+) ibmveth sunrpc overlay squashfs loop
      CPU: 19 PID: 301 Comm: systemd-udevd Not tainted 4.18.0-64.el8.ppc64le #1
      NIP:  c0000000081acbe0 LR: c0000000081ad964 CTR: c0000000081ad900
      REGS: c000000027f3f990 TRAP: 0300   Not tainted  (4.18.0-64.el8.ppc64le)
      MSR:  800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 28228288  XER: 00000006
      CFAR: c000000008008934 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 1
      GPR00: c0000000081ad964 c000000027f3fc10 c0000000095b5800 c0000000221b4e58
      GPR04: 0000000000000003 0000000000000001 000049a086918581 00000000000000d4
      GPR08: 0000000000000007 0000000000000000 ffffffffffffffe8 d0000000014dde28
      GPR12: c0000000081ad900 c000000009a00c00 0000000000000001 0000000000000100
      GPR16: 0000000000000038 0000000000000007 c0000000095e2230 0000000000000006
      GPR20: 0000000000400140 0000000000000001 c00000000910c880 0000000000000000
      GPR24: 0000000000000000 0000000000000006 0000000000000000 0000000000000003
      GPR28: 0000000000000001 0000000000000001 c0000000221b4e60 c0000000221b4e58
      NIP [c0000000081acbe0] __wake_up_locked+0x50/0x100
      LR [c0000000081ad964] complete+0x64/0xa0
      Call Trace:
      [c000000027f3fc10] [c000000027f3fc60] 0xc000000027f3fc60 (unreliable)
      [c000000027f3fc60] [c0000000081ad964] complete+0x64/0xa0
      [c000000027f3fca0] [d0000000014dad58] ibmvnic_handle_crq+0xce0/0x1160 [ibmvnic]
      [c000000027f3fd50] [d0000000014db270] ibmvnic_tasklet+0x98/0x130 [ibmvnic]
      [c000000027f3fda0] [c00000000813f334] tasklet_action_common.isra.3+0xc4/0x1a0
      [c000000027f3fe00] [c000000008cd13f4] __do_softirq+0x164/0x400
      [c000000027f3fef0] [c00000000813ed64] irq_exit+0x184/0x1c0
      [c000000027f3ff20] [c0000000080188e8] __do_irq+0xb8/0x210
      [c000000027f3ff90] [c00000000802d0a4] call_do_irq+0x14/0x24
      [c000000026a5b010] [c000000008018adc] do_IRQ+0x9c/0x130
      [c000000026a5b060] [c000000008008ce4] hardware_interrupt_common+0x114/0x120
      Signed-off-by: NThomas Falcon <tlfalcon@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      16701957
    • H
      hv_netvsc: Fix unwanted wakeup after tx_disable · 9a7c4f5a
      Haiyang Zhang 提交于
      [ Upstream commit 1b704c4a1ba95574832e730f23817b651db2aa59 ]
      
      After queue stopped, the wakeup mechanism may wake it up again
      when ring buffer usage is lower than a threshold. This may cause
      send path panic on NULL pointer when we stopped all tx queues in
      netvsc_detach and start removing the netvsc device.
      
      This patch fix it by adding a tx_disable flag to prevent unwanted
      queue wakeup.
      
      Fixes: 7b2ee50c ("hv_netvsc: common detach logic")
      Reported-by: NMohammed Gamal <mgamal@redhat.com>
      Signed-off-by: NHaiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      9a7c4f5a
    • B
      powerpc/tm: Limit TM code inside PPC_TRANSACTIONAL_MEM · 902eca1a
      Breno Leitao 提交于
      commit 897bc3df8c5aebb54c32d831f917592e873d0559 upstream.
      
      Commit e1c3743e1a20 ("powerpc/tm: Set MSR[TS] just prior to recheckpoint")
      moved a code block around and this block uses a 'msr' variable outside of
      the CONFIG_PPC_TRANSACTIONAL_MEM, however the 'msr' variable is declared
      inside a CONFIG_PPC_TRANSACTIONAL_MEM block, causing a possible error when
      CONFIG_PPC_TRANSACTION_MEM is not defined.
      
      	error: 'msr' undeclared (first use in this function)
      
      This is not causing a compilation error in the mainline kernel, because
      'msr' is being used as an argument of MSR_TM_ACTIVE(), which is defined as
      the following when CONFIG_PPC_TRANSACTIONAL_MEM is *not* set:
      
      	#define MSR_TM_ACTIVE(x) 0
      
      This patch just fixes this issue avoiding the 'msr' variable usage outside
      the CONFIG_PPC_TRANSACTIONAL_MEM block, avoiding trusting in the
      MSR_TM_ACTIVE() definition.
      
      Cc: stable@vger.kernel.org
      Reported-by: NChristoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
      Fixes: e1c3743e1a20 ("powerpc/tm: Set MSR[TS] just prior to recheckpoint")
      Signed-off-by: NBreno Leitao <leitao@debian.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      902eca1a
    • Y
      drm/i915/gvt: do not let pin count of shadow mm go negative · 8ad895bf
      Yan Zhao 提交于
      [ Upstream commit 663a50ceac75c2208d2ad95365bc8382fd42f44d ]
      
      shadow mm's pin count got increased in workload preparation phase, which
      is after workload scanning.
      it will get decreased in complete_current_workload() anyway after
      workload completion.
      Sometimes, if a workload meets a scanning error, its shadow mm pin count
      will not get increased but will get decreased in the end.
      This patch lets shadow mm's pin count not go below 0.
      
      Fixes: 2707e444 ("drm/i915/gvt: vGPU graphics memory virtualization")
      Cc: zhenyuw@linux.intel.com
      Cc: stable@vger.kernel.org #4.14+
      Signed-off-by: NYan Zhao <yan.y.zhao@intel.com>
      Signed-off-by: NZhenyu Wang <zhenyuw@linux.intel.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      8ad895bf
    • J
      kvm: nVMX: NMI-window and interrupt-window exiting should wake L2 from HLT · 646f8e01
      Jim Mattson 提交于
      [ Upstream commit 9ebdfe5230f2e50e3ba05c57723a06e90946815a ]
      
      According to the SDM, "NMI-window exiting" VM-exits wake a logical
      processor from the same inactive states as would an NMI and
      "interrupt-window exiting" VM-exits wake a logical processor from the
      same inactive states as would an external interrupt. Specifically, they
      wake a logical processor from the shutdown state and from the states
      entered using the HLT and MWAIT instructions.
      
      Fixes: 6dfacadd ("KVM: nVMX: Add support for activity state HLT")
      Signed-off-by: NJim Mattson <jmattson@google.com>
      Reviewed-by: NPeter Shier <pshier@google.com>
      Suggested-by: NSean Christopherson <sean.j.christopherson@intel.com>
      [Squashed comments of two Jim's patches and used the simplified code
       hunk provided by Sean. - Radim]
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      646f8e01
  2. 06 4月, 2019 15 次提交
    • G
      Linux 4.19.34 · 4d552acf
      Greg Kroah-Hartman 提交于
      4d552acf
    • A
      kprobes/x86: Blacklist non-attachable interrupt functions · d5813e77
      Andrea Righi 提交于
      [ Upstream commit a50480cb6d61d5c5fc13308479407b628b6bc1c5 ]
      
      These interrupt functions are already non-attachable by kprobes.
      Blacklist them explicitly so that they can show up in
      /sys/kernel/debug/kprobes/blacklist and tools like BCC can use this
      additional information.
      Signed-off-by: NAndrea Righi <righi.andrea@gmail.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yonghong Song <yhs@fb.com>
      Link: http://lkml.kernel.org/r/20181206095648.GA8249@DellSigned-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      d5813e77
    • C
      bcache: fix potential div-zero error of writeback_rate_p_term_inverse · e7d26616
      Coly Li 提交于
      [ Upstream commit 5b5fd3c94eef69dcfaa8648198e54c92e5687d6d ]
      
      Current code already uses d_strtoul_nonzero() to convert input string
      to an unsigned integer, to make sure writeback_rate_p_term_inverse
      won't be zero value. But overflow may happen when converting input
      string to an unsigned integer value by d_strtoul_nonzero(), then
      dc->writeback_rate_p_term_inverse can still be set to 0 even if the
      sysfs file input value is not zero, e.g. 4294967296 (a.k.a UINT_MAX+1).
      
      If dc->writeback_rate_p_term_inverse is set to 0, it might cause a
      dev-zero error in following code from __update_writeback_rate(),
      	int64_t proportional_scaled =
      		div_s64(error, dc->writeback_rate_p_term_inverse);
      
      This patch replaces d_strtoul_nonzero() by sysfs_strtoul_clamp() and
      limit the value range in [1, UINT_MAX]. Then the unsigned integer
      overflow and dev-zero error can be avoided.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      e7d26616
    • H
      ACPI / video: Extend chassis-type detection with a "Lunch Box" check · 09abe130
      Hans de Goede 提交于
      [ Upstream commit d693c008e3ca04db5916ff72e68ce661888a913b ]
      
      Commit 53fa1f6e ("ACPI / video: Only default only_lcd to true on
      Win8-ready _desktops_") introduced chassis type detection, limiting the
      lcd_only check for the backlight to devices where the chassis-type
      indicates their is no builtin LCD panel.
      
      The purpose of the lcd_only check is to avoid advertising a backlight
      interface on desktops, since skylake and newer machines seem to always
      have a backlight interface even if there is no LCD panel. The limiting
      of this check to desktops only was done to avoid breaking backlight
      support on some laptops which do not have the lcd flag set.
      
      The Fujitsu ESPRIMO Q910 which is a compact (NUC like) desktop machine
      has a chassis type of 0x10 aka "Lunch Box". Without the lcd_only check
      we end up falsely advertising backlight/brightness control on this
      device. This commit extend the dmi_is_desktop check to return true
      for type 0x10 to fix this.
      
      Fixes: 53fa1f6e ("ACPI / video: Only default only_lcd to true ...")
      Signed-off-by: NHans de Goede <hdegoede@redhat.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      09abe130
    • N
      net: stmmac: Avoid one more sometimes uninitialized Clang warning · d1d2ca98
      Nathan Chancellor 提交于
      [ Upstream commit 1f5d861f7fefa971b2c6e766f77932c86419a319 ]
      
      When building with -Wsometimes-uninitialized, Clang warns:
      
      drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c:111:2: error: variable
      'ns' is used uninitialized whenever 'if' condition is false
      [-Werror,-Wsometimes-uninitialized]
      drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c:111:2: error: variable
      'ns' is used uninitialized whenever '&&' condition is false
      [-Werror,-Wsometimes-uninitialized]
      
      Clang is concerned with the use of stmmac_do_void_callback (which
      stmmac_get_systime wraps), as it may fail to initialize these values if
      the if condition was ever false (meaning the callback doesn't exist).
      It's not wrong because the callback is what initializes ns. While it's
      unlikely that the callback is going to disappear at some point and make
      that condition false, we can easily avoid this warning by zero
      initializing the variable.
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/384
      Fixes: df103170854e ("net: stmmac: Avoid sometimes uninitialized Clang warnings")
      Suggested-by: NNick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: NNathan Chancellor <natechancellor@gmail.com>
      Reviewed-by: NNick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      d1d2ca98
    • V
      drm/dp/mst: Configure no_stop_bit correctly for remote i2c xfers · 972e31ba
      Ville Syrjälä 提交于
      [ Upstream commit c978ae9bde582e82a04c63a4071701691dd8b35c ]
      
      We aren't supposed to force a stop+start between every i2c msg
      when performing multi message transfers. This should eg. cause
      the DDC segment address to be reset back to 0 between writing
      the segment address and reading the actual EDID extension block.
      
      To quote the E-DDC spec:
      "... this standard requires that the segment pointer be
       reset to 00h when a NO ACK or a STOP condition is received."
      
      Since we're going to touch this might as well consult the
      I2C_M_STOP flag to determine whether we want to force the stop
      or not.
      
      Cc: Brian Vincent <brainn@gmail.com>
      References: https://bugs.freedesktop.org/show_bug.cgi?id=108081Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180928180403.22499-1-ville.syrjala@linux.intel.comReviewed-by: NDhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      972e31ba
    • H
      Input: soc_button_array - fix mapping of the 5th GPIO in a PNP0C40 device · 986a2bb5
      Hans de Goede 提交于
      [ Upstream commit e9eb788f9442d1b5d93efdb30c3be071ce8a22b1 ]
      
      The Microsoft documenation for the PNP0C40 device aka the
      "Windows-compatible button array" describes the 5th GpioInt listed in
      the resources as: '5. Interrupt corresponding to the "Rotation Lock"
      button, if supported'.
      
      Notice this describes the 5th entry as a button while we sofar have been
      mapping it to EV_SW, SW_ROTATE_LOCK. On my Point of View TAB P1006W-232
      which actually comes with a rotation-lock button, the button indeed is a
      button and not a slider/switch. An image search for other Windows tablets
      has found 2 more models with a rotation-lock button and on both of those
      it too is a push-button and not a slider/switch.
      
      Further evidence can be found in the HUT extension HUTRR52 from Microsoft
      which adds rotation lock support to the HUT, which describes 2 different
      usages: "0xC9 System Display Rotation Lock Button" and
      "0xCA System Display Rotation Lock Slider Switch" note that switch is seen
      as a separate thing here and the non switch wording is an exact match for
      the "Windows-compatible button array" spec wording.
      
      TL;DR: our current mapping of the 5th GPIO to SW_ROTATE_LOCK is wrong
      because the 5th GPIO is for a push-button not a switch.
      
      This commit fixes this by maping the 5th GPIO to KEY_ROTATE_LOCK_TOGGLE.
      Signed-off-by: NHans de Goede <hdegoede@redhat.com>
      Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      986a2bb5
    • B
      dmaengine: tegra: avoid overflow of byte tracking · 6d2817e2
      Ben Dooks 提交于
      [ Upstream commit e486df39305864604b7e25f2a95d51039517ac57 ]
      
      The dma_desc->bytes_transferred counter tracks the number of bytes
      moved by the DMA channel. This is then used to calculate the information
      passed back in the in the tegra_dma_tx_status callback, which is usually
      fine.
      
      When the DMA channel is configured as continous, then the bytes_transferred
      counter will increase over time and eventually overflow to become negative
      so the residue count will become invalid and the ALSA sound-dma code will
      report invalid hardware pointer values to the application. This results in
      some users becoming confused about the playout position and putting audio
      data in the wrong place.
      
      To fix this issue, always ensure the bytes_transferred field is modulo the
      size of the request. We only do this for the case of the cyclic transfer
      done ISR as anyone attempting to move 2GiB of DMA data in one transfer
      is unlikely.
      
      Note, we don't fix the issue that we should /never/ transfer a negative
      number of bytes so we could make those fields unsigned.
      Reviewed-by: NDmitry Osipenko <digetx@gmail.com>
      Signed-off-by: NBen Dooks <ben.dooks@codethink.co.uk>
      Acked-by: NJon Hunter <jonathanh@nvidia.com>
      Signed-off-by: NVinod Koul <vkoul@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      6d2817e2
    • K
      clk: rockchip: fix frac settings of GPLL clock for rk3328 · 7386f095
      Katsuhiro Suzuki 提交于
      [ Upstream commit a0e447b0c50240a90ab84b7126b3c06b0bab4adc ]
      
      This patch fixes settings of GPLL frequency in fractional mode for
      rk3328. In this mode, FOUTVCO is calcurated by following formula:
        FOUTVCO = FREF * FBDIV / REFDIV + ((FREF * FRAC / REFDIV) >> 24)
      
      The problem is in FREF * FRAC >> 24 term. This result always lacks
      one from target value is specified by rate member. For example first
      itme of rk3328_pll_frac_rate originally has
        - rate  : 1016064000
        - refdiv: 3
        - fbdiv : 127
        - frac  : 134217
        - FREF * FBDIV / REFDIV        = 1016000000
        - (FREF * FRAC / REFDIV) >> 24 = 63999
      Thus calculated rate is 1016063999. It seems wrong.
      
      If frac has 134218 (it is increased 1 from original value), second
      term is 64000. All other items have same situation. So this patch
      adds 1 to frac member in all items of rk3328_pll_frac_rate.
      Signed-off-by: NKatsuhiro Suzuki <katsuhiro@katsuster.net>
      Acked-by: NElaine Zhang <zhangqing@rock-chips.com>
      Signed-off-by: NHeiko Stuebner <heiko@sntech.de>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      7386f095
    • J
      clk: meson: clean-up clock registration · c8e4f840
      Jerome Brunet 提交于
      [ Upstream commit 8d9981efbcab066d17af4d3c85c169200f6f78df ]
      
      Order, ids and size  between the table of regmap clocks and the onecell
      data table could be different.
      
      Set regmap pointer in all the regmap clocks before starting the
      registration using the onecell data, to make sure we don't
      get into an incoherent situation.
      Signed-off-by: NJerome Brunet <jbrunet@baylibre.com>
      Acked-by: NNeil Armstrong <narmstrong@baylibre.com>
      Signed-off-by: NNeil Armstrong <narmstrong@baylibre.com>
      Link: https://lkml.kernel.org/r/20181221160239.26265-3-jbrunet@baylibre.comSigned-off-by: NSasha Levin <sashal@kernel.org>
      c8e4f840
    • P
      drm/fb-helper: fix leaks in error path of drm_fb_helper_fbdev_setup · 6251c1db
      Peter Wu 提交于
      [ Upstream commit 00eb5b0da8d27b3c944bfc959c3344d665caae26 ]
      
      After drm_fb_helper_fbdev_setup calls drm_fb_helper_init,
      "dev->fb_helper" will be initialized (and thus drm_fb_helper_fini will
      have some effect). After that, drm_fb_helper_initial_config is called
      which may call the "fb_probe" driver callback.
      
      This driver callback may call drm_fb_helper_defio_init (as is done by
      drm_fb_helper_generic_probe) or set a framebuffer (as is done by bochs)
      as documented. These are normally cleaned up on exit by
      drm_fb_helper_fbdev_teardown which also calls drm_fb_helper_fini.
      
      If an error occurs after "fb_probe", but before setup is complete, then
      calling just drm_fb_helper_fini will leak resources. This was triggered
      by df2052cc922 ("bochs: convert to drm_fb_helper_fbdev_setup/teardown"):
      
          [   50.008030] bochsdrmfb: enable CONFIG_FB_LITTLE_ENDIAN to support this framebuffer
          [   50.009436] bochs-drm 0000:00:02.0: [drm:drm_fb_helper_fbdev_setup] *ERROR* fbdev: Failed to set configuration (ret=-38)
          [   50.011456] [drm] Initialized bochs-drm 1.0.0 20130925 for 0000:00:02.0 on minor 2
          [   50.013604] WARNING: CPU: 1 PID: 1 at drivers/gpu/drm/drm_mode_config.c:477 drm_mode_config_cleanup+0x280/0x2a0
          [   50.016175] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G                T 4.20.0-rc7 #1
          [   50.017732] EIP: drm_mode_config_cleanup+0x280/0x2a0
          ...
          [   50.023155] Call Trace:
          [   50.023155]  ? bochs_kms_fini+0x1e/0x30
          [   50.023155]  ? bochs_unload+0x18/0x40
      
      This can be reproduced with QEMU and CONFIG_FB_LITTLE_ENDIAN=n.
      
      Link: https://lkml.kernel.org/r/20181221083226.GI23332@shao2-debian
      Link: https://lkml.kernel.org/r/20181223004315.GA11455@al
      Fixes: 87412163 ("drm/fb-helper: Add drm_fb_helper_fbdev_setup/teardown()")
      Reported-by: Nkernel test robot <rong.a.chen@intel.com>
      Cc: Noralf Trønnes <noralf@tronnes.org>
      Signed-off-by: NPeter Wu <peter@lekensteyn.nl>
      Reviewed-by: NNoralf Trønnes <noralf@tronnes.org>
      Signed-off-by: NNoralf Trønnes <noralf@tronnes.org>
      Link: https://patchwork.freedesktop.org/patch/msgid/20181223005507.28328-1-peter@lekensteyn.nlSigned-off-by: NSasha Levin <sashal@kernel.org>
      6251c1db
    • R
      x86/build: Mark per-CPU symbols as absolute explicitly for LLD · 648b949b
      Rafael Ávila de Espíndola 提交于
      [ Upstream commit d071ae09a4a1414c1433d5ae9908959a7325b0ad ]
      
      Accessing per-CPU variables is done by finding the offset of the
      variable in the per-CPU block and adding it to the address of the
      respective CPU's block.
      
      Section 3.10.8 of ld.bfd's documentation states:
      
        For expressions involving numbers, relative addresses and absolute
        addresses, ld follows these rules to evaluate terms:
      
        Other binary operations, that is, between two relative addresses
        not in the same section, or between a relative address and an
        absolute address, first convert any non-absolute term to an
        absolute address before applying the operator."
      
      Note that LLVM's linker does not adhere to the GNU ld's implementation
      and as such requires implicitly-absolute terms to be explicitly marked
      as absolute in the linker script. If not, it fails currently with:
      
        ld.lld: error: ./arch/x86/kernel/vmlinux.lds:153: at least one side of the expression must be absolute
        ld.lld: error: ./arch/x86/kernel/vmlinux.lds:154: at least one side of the expression must be absolute
        Makefile:1040: recipe for target 'vmlinux' failed
      
      This is not a functional change for ld.bfd which converts the term to an
      absolute symbol anyways as specified above.
      
      Based on a previous submission by Tri Vo <trong@android.com>.
      Reported-by: NDmitry Golovin <dima@golovin.in>
      Signed-off-by: NRafael Ávila de Espíndola <rafael@espindo.la>
      [ Update commit message per Boris' and Michael's suggestions. ]
      Signed-off-by: NNick Desaulniers <ndesaulniers@google.com>
      [ Massage commit message more, fix typos. ]
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Tested-by: NDmitry Golovin <dima@golovin.in>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Cao Jin <caoj.fnst@cn.fujitsu.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tri Vo <trong@android.com>
      Cc: dima@golovin.in
      Cc: morbo@google.com
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20181219190145.252035-1-ndesaulniers@google.comSigned-off-by: NSasha Levin <sashal@kernel.org>
      648b949b
    • Z
      wlcore: Fix memory leak in case wl12xx_fetch_firmware failure · 52cd9e0e
      Zumeng Chen 提交于
      [ Upstream commit ba2ffc96321c8433606ceeb85c9e722b8113e5a7 ]
      
      Release fw_status, raw_fw_status, and tx_res_if when wl12xx_fetch_firmware
      failed instead of meaningless goto out to avoid the following memory leak
      reports(Only the last one listed):
      
      unreferenced object 0xc28a9a00 (size 512):
        comm "kworker/0:4", pid 31298, jiffies 2783204 (age 203.290s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
        backtrace:
          [<6624adab>] kmemleak_alloc+0x40/0x74
          [<500ddb31>] kmem_cache_alloc_trace+0x1ac/0x270
          [<db4d731d>] wl12xx_chip_wakeup+0xc4/0x1fc [wlcore]
          [<76c5db53>] wl1271_op_add_interface+0x4a4/0x8f4 [wlcore]
          [<cbf30777>] drv_add_interface+0xa4/0x1a0 [mac80211]
          [<65bac325>] ieee80211_reconfig+0x9c0/0x1644 [mac80211]
          [<2817c80e>] ieee80211_restart_work+0x90/0xc8 [mac80211]
          [<7e1d425a>] process_one_work+0x284/0x42c
          [<55f9432e>] worker_thread+0x2fc/0x48c
          [<abb582c6>] kthread+0x148/0x160
          [<63144b13>] ret_from_fork+0x14/0x2c
          [< (null)>] (null)
          [<1f6e7715>] 0xffffffff
      Signed-off-by: NZumeng Chen <zumeng.chen@gmail.com>
      Signed-off-by: NKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      52cd9e0e
    • H
      brcmfmac: Use firmware_request_nowarn for the clm_blob · 05b23c66
      Hans de Goede 提交于
      [ Upstream commit 4ad0be160544ffbdafb7cec39bb8e6dd0a97317a ]
      
      The linux-firmware brcmfmac firmware files contain an embedded table with
      per country allowed channels and strength info.
      
      For recent hardware these versions of the firmware are specially build for
      linux-firmware, the firmware files directly available from Cypress rely on
      a separate clm_blob file for this info.
      
      For some unknown reason Cypress refuses to provide the standard firmware
      files + clm_blob files it uses elsewhere for inclusion into linux-firmware,
      instead relying on these special builds with the clm_blob info embedded.
      This means that the linux-firmware firmware versions often lag behind,
      but I digress.
      
      The brcmfmac driver does support the separate clm_blob file and always
      tries to load this. Currently we use request_firmware for this. This means
      that on any standard install, using the standard combo of linux-kernel +
      linux-firmware, we will get a warning:
      "Direct firmware load for ... failed with error -2"
      
      On top of this, brcmfmac itself prints: "no clm_blob available (err=-2),
      device may have limited channels available".
      
      This commit switches to firmware_request_nowarn, fixing almost any brcmfmac
      device logging the warning (it leaves the brcmfmac info message in place).
      Signed-off-by: NHans de Goede <hdegoede@redhat.com>
      Signed-off-by: NKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      05b23c66
    • O
      selinux: do not override context on context mounts · e30e0b09
      Ondrej Mosnacek 提交于
      [ Upstream commit 53e0c2aa9a59a48e3798ef193d573ade85aa80f5 ]
      
      Ignore all selinux_inode_notifysecctx() calls on mounts with SBLABEL_MNT
      flag unset. This is achived by returning -EOPNOTSUPP for this case in
      selinux_inode_setsecurtity() (because that function should not be called
      in such case anyway) and translating this error to 0 in
      selinux_inode_notifysecctx().
      
      This fixes behavior of kernfs-based filesystems when mounted with the
      'context=' option. Before this patch, if a node's context had been
      explicitly set to a non-default value and later the filesystem has been
      remounted with the 'context=' option, then this node would show up as
      having the manually-set context and not the mount-specified one.
      
      Steps to reproduce:
          # mount -t cgroup2 cgroup2 /sys/fs/cgroup/unified
          # chcon unconfined_u:object_r:user_home_t:s0 /sys/fs/cgroup/unified/cgroup.stat
          # ls -lZ /sys/fs/cgroup/unified
          total 0
          -r--r--r--. 1 root root system_u:object_r:cgroup_t:s0        0 Dec 13 10:41 cgroup.controllers
          -rw-r--r--. 1 root root system_u:object_r:cgroup_t:s0        0 Dec 13 10:41 cgroup.max.depth
          -rw-r--r--. 1 root root system_u:object_r:cgroup_t:s0        0 Dec 13 10:41 cgroup.max.descendants
          -rw-r--r--. 1 root root system_u:object_r:cgroup_t:s0        0 Dec 13 10:41 cgroup.procs
          -r--r--r--. 1 root root unconfined_u:object_r:user_home_t:s0 0 Dec 13 10:41 cgroup.stat
          -rw-r--r--. 1 root root system_u:object_r:cgroup_t:s0        0 Dec 13 10:41 cgroup.subtree_control
          -rw-r--r--. 1 root root system_u:object_r:cgroup_t:s0        0 Dec 13 10:41 cgroup.threads
          # umount /sys/fs/cgroup/unified
          # mount -o context=system_u:object_r:tmpfs_t:s0 -t cgroup2 cgroup2 /sys/fs/cgroup/unified
      
      Result before:
          # ls -lZ /sys/fs/cgroup/unified
          total 0
          -r--r--r--. 1 root root system_u:object_r:tmpfs_t:s0         0 Dec 13 10:41 cgroup.controllers
          -rw-r--r--. 1 root root system_u:object_r:tmpfs_t:s0         0 Dec 13 10:41 cgroup.max.depth
          -rw-r--r--. 1 root root system_u:object_r:tmpfs_t:s0         0 Dec 13 10:41 cgroup.max.descendants
          -rw-r--r--. 1 root root system_u:object_r:tmpfs_t:s0         0 Dec 13 10:41 cgroup.procs
          -r--r--r--. 1 root root unconfined_u:object_r:user_home_t:s0 0 Dec 13 10:41 cgroup.stat
          -rw-r--r--. 1 root root system_u:object_r:tmpfs_t:s0         0 Dec 13 10:41 cgroup.subtree_control
          -rw-r--r--. 1 root root system_u:object_r:tmpfs_t:s0         0 Dec 13 10:41 cgroup.threads
      
      Result after:
          # ls -lZ /sys/fs/cgroup/unified
          total 0
          -r--r--r--. 1 root root system_u:object_r:tmpfs_t:s0 0 Dec 13 10:41 cgroup.controllers
          -rw-r--r--. 1 root root system_u:object_r:tmpfs_t:s0 0 Dec 13 10:41 cgroup.max.depth
          -rw-r--r--. 1 root root system_u:object_r:tmpfs_t:s0 0 Dec 13 10:41 cgroup.max.descendants
          -rw-r--r--. 1 root root system_u:object_r:tmpfs_t:s0 0 Dec 13 10:41 cgroup.procs
          -r--r--r--. 1 root root system_u:object_r:tmpfs_t:s0 0 Dec 13 10:41 cgroup.stat
          -rw-r--r--. 1 root root system_u:object_r:tmpfs_t:s0 0 Dec 13 10:41 cgroup.subtree_control
          -rw-r--r--. 1 root root system_u:object_r:tmpfs_t:s0 0 Dec 13 10:41 cgroup.threads
      Signed-off-by: NOndrej Mosnacek <omosnace@redhat.com>
      Reviewed-by: NStephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: NPaul Moore <paul@paul-moore.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      e30e0b09