1. 20 May, 2020 1 commit
  2. 15 May, 2020 4 commits
    • B
      x86: Fix early boot crash on gcc-10, third try · a9a3ed1e
      Borislav Petkov authored
      ... or the odyssey of trying to disable the stack protector for the
      function which generates the stack canary value.
      
      The whole story started with Sergei reporting a boot crash with a kernel
      built with gcc-10:
      
        Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_secondary
        CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.6.0-rc5-00235-gfffb08b3 #139
        Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H77M-D3H, BIOS F12 11/14/2013
        Call Trace:
          dump_stack
          panic
          ? start_secondary
          __stack_chk_fail
          start_secondary
          secondary_startup_64
        ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_secondary
      
      This happens because gcc-10 tail-call optimizes the last function call
      in start_secondary() - cpu_startup_entry() - and thus emits a stack
      canary check which fails because the canary value changes after the
      boot_init_stack_canary() call.
      
      To fix that, the initial attempt was to mark the one function which
      generates the stack canary with:
      
        __attribute__((optimize("-fno-stack-protector"))) ... start_secondary(void *unused)
      
      however, using the optimize attribute doesn't work cumulatively
      as the attribute does not add to but rather replaces previously
      supplied optimization options - roughly all -fxxx options.
      
      Key among them is -fno-omit-frame-pointer; dropping it leaves the
      function without a frame pointer, which the kernel needs.
      
      The next attempt to prevent compilers from tail-call optimizing
      the last function call cpu_startup_entry(), shy of carving out
      start_secondary() into a separate compilation unit and building it with
      -fno-stack-protector, was to add an empty asm("").
      
      This current solution was short and sweet, and reportedly, is supported
      by both compilers but we didn't get very far this time: future (LTO?)
      optimization passes could potentially eliminate this, which leads us
      to the third attempt: having an actual memory barrier there which the
      compiler cannot ignore or move around etc.
      
      That should hold for a long time, but hey we said that about the other
      two solutions too so...
      Reported-by: Sergei Trofimovich <slyfox@gentoo.org>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Tested-by: Kalle Valo <kvalo@codeaurora.org>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20200314164451.346497-1-slyfox@gentoo.org
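The third attempt described above can be illustrated with a minimal userspace sketch. It is not the kernel's code: `start_secondary_sketch()`, `cpu_startup_entry_sketch()` and `prevent_tail_call_sketch()` are hypothetical names, and the compiler builtin `__sync_synchronize()` stands in for whatever memory barrier the kernel uses. The point is only that a real barrier placed after the final call keeps that call out of tail position.

```c
/* Hypothetical counter standing in for "the idle loop was entered". */
int idle_entries;

/* Stand-in for cpu_startup_entry(); in the kernel it never returns. */
void cpu_startup_entry_sketch(void)
{
	idle_entries++;
}

/*
 * Full memory barrier issued after the last call. GCC and Clang both
 * provide __sync_synchronize(); it is used here only as an illustration.
 */
#define prevent_tail_call_sketch() __sync_synchronize()

/* Stand-in for start_secondary(). */
void start_secondary_sketch(void)
{
	cpu_startup_entry_sketch();
	/* Code follows the call, so it cannot be turned into a plain jump. */
	prevent_tail_call_sketch();
}
```

Because the barrier is something the compiler may not drop or reorder, it is a sturdier anchor than the empty asm("") of the second attempt.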
    • K
      net: phy: broadcom: fix BCM54XX_SHD_SCR3_TRDDAPD value for BCM54810 · cc8a677a
      Kevin Lo authored
      Set the correct bit when checking for PHY_BRCM_DIS_TXCRXC_NOENRGY on the
      BCM54810 PHY.
      
      Fixes: 0ececcfc ("net: phy: broadcom: Allow BCM54810 to use bcm54xx_adjust_rxrefclk()")
      Signed-off-by: Kevin Lo <kevlo@kevlo.org>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • A
      security: Fix the default value of secid_to_secctx hook · 625236ba
      Anders Roxell authored
      security_secid_to_secctx is called by the bpf_lsm hook and a successful
      return value (i.e. 0) implies that the parameter will be consumed by the
      LSM framework. With CONFIG_BPF_LSM enabled, the current behaviour
      returns success even though the pointer isn't initialized, because of
      the default return value from kernel/bpf/bpf_lsm.c.
      
      This is the internal error:
      
      [ 1229.341488][ T2659] usercopy: Kernel memory exposure attempt detected from null address (offset 0, size 280)!
      [ 1229.374977][ T2659] ------------[ cut here ]------------
      [ 1229.376813][ T2659] kernel BUG at mm/usercopy.c:99!
      [ 1229.378398][ T2659] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
      [ 1229.380348][ T2659] Modules linked in:
      [ 1229.381654][ T2659] CPU: 0 PID: 2659 Comm: systemd-journal Tainted: G    B   W         5.7.0-rc5-next-20200511-00019-g864e0c6319b8-dirty #13
      [ 1229.385429][ T2659] Hardware name: linux,dummy-virt (DT)
      [ 1229.387143][ T2659] pstate: 80400005 (Nzcv daif +PAN -UAO BTYPE=--)
      [ 1229.389165][ T2659] pc : usercopy_abort+0xc8/0xcc
      [ 1229.390705][ T2659] lr : usercopy_abort+0xc8/0xcc
      [ 1229.392225][ T2659] sp : ffff000064247450
      [ 1229.393533][ T2659] x29: ffff000064247460 x28: 0000000000000000
      [ 1229.395449][ T2659] x27: 0000000000000118 x26: 0000000000000000
      [ 1229.397384][ T2659] x25: ffffa000127049e0 x24: ffffa000127049e0
      [ 1229.399306][ T2659] x23: ffffa000127048e0 x22: ffffa000127048a0
      [ 1229.401241][ T2659] x21: ffffa00012704b80 x20: ffffa000127049e0
      [ 1229.403163][ T2659] x19: ffffa00012704820 x18: 0000000000000000
      [ 1229.405094][ T2659] x17: 0000000000000000 x16: 0000000000000000
      [ 1229.407008][ T2659] x15: 0000000000000000 x14: 003d090000000000
      [ 1229.408942][ T2659] x13: ffff80000d5b25b2 x12: 1fffe0000d5b25b1
      [ 1229.410859][ T2659] x11: 1fffe0000d5b25b1 x10: ffff80000d5b25b1
      [ 1229.412791][ T2659] x9 : ffffa0001034bee0 x8 : ffff00006ad92d8f
      [ 1229.414707][ T2659] x7 : 0000000000000000 x6 : ffffa00015eacb20
      [ 1229.416642][ T2659] x5 : ffff0000693c8040 x4 : 0000000000000000
      [ 1229.418558][ T2659] x3 : ffffa0001034befc x2 : d57a7483a01c6300
      [ 1229.420610][ T2659] x1 : 0000000000000000 x0 : 0000000000000059
      [ 1229.422526][ T2659] Call trace:
      [ 1229.423631][ T2659]  usercopy_abort+0xc8/0xcc
      [ 1229.425091][ T2659]  __check_object_size+0xdc/0x7d4
      [ 1229.426729][ T2659]  put_cmsg+0xa30/0xa90
      [ 1229.428132][ T2659]  unix_dgram_recvmsg+0x80c/0x930
      [ 1229.429731][ T2659]  sock_recvmsg+0x9c/0xc0
      [ 1229.431123][ T2659]  ____sys_recvmsg+0x1cc/0x5f8
      [ 1229.432663][ T2659]  ___sys_recvmsg+0x100/0x160
      [ 1229.434151][ T2659]  __sys_recvmsg+0x110/0x1a8
      [ 1229.435623][ T2659]  __arm64_sys_recvmsg+0x58/0x70
      [ 1229.437218][ T2659]  el0_svc_common.constprop.1+0x29c/0x340
      [ 1229.438994][ T2659]  do_el0_svc+0xe8/0x108
      [ 1229.440587][ T2659]  el0_svc+0x74/0x88
      [ 1229.441917][ T2659]  el0_sync_handler+0xe4/0x8b4
      [ 1229.443464][ T2659]  el0_sync+0x17c/0x180
      [ 1229.444920][ T2659] Code: aa1703e2 aa1603e1 910a8260 97ecc860 (d4210000)
      [ 1229.447070][ T2659] ---[ end trace 400497d91baeaf51 ]---
      [ 1229.448791][ T2659] Kernel panic - not syncing: Fatal exception
      [ 1229.450692][ T2659] Kernel Offset: disabled
      [ 1229.452061][ T2659] CPU features: 0x240002,20002004
      [ 1229.453647][ T2659] Memory Limit: none
      [ 1229.455015][ T2659] ---[ end Kernel panic - not syncing: Fatal exception ]---
      
      Rework the hook so that the default return value is -EOPNOTSUPP.
      
      There are likely other callbacks such as security_inode_getsecctx() that
      may have the same problem, and someone who understands the code
      better needs to audit them.
      
      Thank you Arnd for helping me figure out what went wrong.
      
      Fixes: 98e828a0 ("security: Refactor declaration of LSM hooks")
      Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: James Morris <jamorris@linux.microsoft.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Link: https://lore.kernel.org/bpf/20200512174607.9630-1-anders.roxell@linaro.org
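The effect of the default return value can be sketched with a small model of hook dispatch. Everything here is hypothetical (`secctx_hook_fn`, `call_secctx_hooks()`, `secdata_would_be_copied()`), and the case of "only the default hook is present" is simplified to an empty hook list. It shows why a default of 0 makes the caller use a pointer nobody filled in, while -EOPNOTSUPP does not.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical hook signature modelled on secid_to_secctx. */
typedef int (*secctx_hook_fn)(unsigned int secid, char **secdata,
			      unsigned int *seclen);

/*
 * Walk the registered hooks; the first one that returns something other
 * than the default decides the result. With no hooks, the caller sees
 * default_ret and the output pointers are left untouched.
 */
int call_secctx_hooks(const secctx_hook_fn *hooks, size_t nhooks,
		      int default_ret, unsigned int secid,
		      char **secdata, unsigned int *seclen)
{
	for (size_t i = 0; i < nhooks; i++) {
		int rc = hooks[i](secid, secdata, seclen);

		if (rc != default_ret)
			return rc;
	}
	return default_ret;
}

/* Returns 1 if a caller would go on to copy the (still NULL) secdata. */
int secdata_would_be_copied(int default_ret)
{
	char *secdata = NULL;
	unsigned int seclen = 0;
	int rc = call_secctx_hooks(NULL, 0, default_ret, 1, &secdata, &seclen);

	return rc == 0;
}
```

With `default_ret` set to 0 the helper reports that the NULL pointer would be consumed; with `-EOPNOTSUPP` the caller skips it.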
    • Y
      mm, memcg: fix inconsistent oom event behavior · 04fd61a4
      Yafang Shao authored
      A recent commit 9852ae3f ("mm, memcg: consider subtrees in
      memory.events") changed the behavior of memcg events, which will now
      consider subtrees in memory.events.
      
      But the oom_kill event is a special one as it is used in both cgroup1
      and cgroup2.  In cgroup1, it is displayed in memory.oom_control.  The
      file memory.oom_control exists in both the root memcg and non-root
      memcgs, unlike memory.events, which exists only in non-root memcgs.
      That commit is okay for cgroup2, but it is not okay for cgroup1 as it
      will cause inconsistent behavior between root memcg and non-root memcg.
      
      Here's an example on why this behavior is inconsistent in cgroup1.
      
             root memcg
             /
          memcg foo
           /
        memcg bar
      
      Suppose there's an oom_kill in memcg bar, then the oom_kill counts will be:
      
             root memcg : memory.oom_control(oom_kill)  0
             /
          memcg foo : memory.oom_control(oom_kill)  1
           /
        memcg bar : memory.oom_control(oom_kill)  1
      
      For the non-root memcg, its memory.oom_control(oom_kill) includes its
      descendants' oom_kill, but for root memcg, it doesn't include its
      descendants' oom_kill.  That means, memory.oom_control(oom_kill) has
      different meanings in different memcgs.  That is inconsistent.  Then the
      user has to know whether the memcg is root or not.
      
      If we can't fully support it in cgroup1, for example by adding
      memory.events.local into cgroup1 as well, then let's not touch its
      original behavior.
      
      Fixes: 9852ae3f ("mm, memcg: consider subtrees in memory.events")
      Reported-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Chris Down <chris@chrisdown.name>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20200502141055.7378-1-laoar.shao@gmail.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
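The root/foo/bar example can be modelled in a few lines. This is a simplified sketch, not the kernel's event code: `struct memcg_sketch` and `record_oom_kill()` are hypothetical, and the `hierarchical` flag stands for "running on cgroup2". It contrasts subtree propagation that stops before the root with purely local counting.

```c
#include <stdbool.h>
#include <stddef.h>

struct memcg_sketch {
	struct memcg_sketch *parent;	/* NULL for the root memcg */
	unsigned long oom_kill;
};

/*
 * Record one oom_kill in `memcg`. With hierarchical == true the event is
 * also counted in every non-root ancestor; otherwise only the memcg where
 * the kill happened is touched.
 */
void record_oom_kill(struct memcg_sketch *memcg, bool hierarchical)
{
	do {
		memcg->oom_kill++;
		if (!hierarchical)
			break;
		memcg = memcg->parent;
	} while (memcg && memcg->parent);	/* stop before the root */
}
```

With local counting, a kill in bar leaves foo and root at 0, so the counter means the same thing in every memcg.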
  3. 13 May, 2020 2 commits
    • S
      x86/ftrace: Have ftrace trampolines turn read-only at the end of system boot up · 59566b0b
      Steven Rostedt (VMware) authored
      Booting one of my machines, it triggered the following crash:
      
       Kernel/User page tables isolation: enabled
       ftrace: allocating 36577 entries in 143 pages
       Starting tracer 'function'
       BUG: unable to handle page fault for address: ffffffffa000005c
       #PF: supervisor write access in kernel mode
       #PF: error_code(0x0003) - permissions violation
       PGD 2014067 P4D 2014067 PUD 2015063 PMD 7b253067 PTE 7b252061
       Oops: 0003 [#1] PREEMPT SMP PTI
       CPU: 0 PID: 0 Comm: swapper Not tainted 5.4.0-test+ #24
       Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
       RIP: 0010:text_poke_early+0x4a/0x58
       Code: 34 24 48 89 54 24 08 e8 bf 72 0b 00 48 8b 34 24 48 8b 4c 24 08 84 c0 74 0b 48 89 df f3 a4 48 83 c4 10 5b c3 9c 58 fa 48 89 df <f3> a4 50 9d 48 83 c4 10 5b e9 d6 f9 ff ff
      0 41 57 49
       RSP: 0000:ffffffff82003d38 EFLAGS: 00010046
       RAX: 0000000000000046 RBX: ffffffffa000005c RCX: 0000000000000005
       RDX: 0000000000000005 RSI: ffffffff825b9a90 RDI: ffffffffa000005c
       RBP: ffffffffa000005c R08: 0000000000000000 R09: ffffffff8206e6e0
       R10: ffff88807b01f4c0 R11: ffffffff8176c106 R12: ffffffff8206e6e0
       R13: ffffffff824f2440 R14: 0000000000000000 R15: ffffffff8206eac0
       FS:  0000000000000000(0000) GS:ffff88807d400000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: ffffffffa000005c CR3: 0000000002012000 CR4: 00000000000006b0
       Call Trace:
        text_poke_bp+0x27/0x64
        ? mutex_lock+0x36/0x5d
        arch_ftrace_update_trampoline+0x287/0x2d5
        ? ftrace_replace_code+0x14b/0x160
        ? ftrace_update_ftrace_func+0x65/0x6c
        __register_ftrace_function+0x6d/0x81
        ftrace_startup+0x23/0xc1
        register_ftrace_function+0x20/0x37
        func_set_flag+0x59/0x77
        __set_tracer_option.isra.19+0x20/0x3e
        trace_set_options+0xd6/0x13e
        apply_trace_boot_options+0x44/0x6d
        register_tracer+0x19e/0x1ac
        early_trace_init+0x21b/0x2c9
        start_kernel+0x241/0x518
        ? load_ucode_intel_bsp+0x21/0x52
        secondary_startup_64+0xa4/0xb0
      
      I was able to trigger it on other machines when I added both
      "ftrace=function" and "trace_options=func_stack_trace" to the kernel
      command line.
      
      The cause is that "ftrace=function" registers the function tracer
      and creates a trampoline, which it sets as executable and read-only.
      Then "trace_options=func_stack_trace" updates the same trampoline to
      include the stack tracer version of the function tracer. But since
      the trampoline already exists, it updates it with text_poke_bp(). The
      problem is that when text_poke_bp() is called while
      system_state == SYSTEM_BOOTING, it simply does a memcpy() and skips
      the page mapping, as it assumes that the text is still read-write.
      But in this case it is not, and we take a fault and crash.
      
      Instead, let's keep the ftrace trampolines read-write during boot up,
      and then when the kernel executable text is set to read-only, the
      ftrace trampolines get set to read-only as well.
      
      Link: https://lkml.kernel.org/r/20200430202147.4dc6e2de@oasis.local.home
      
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: stable@vger.kernel.org
      Fixes: 768ae440 ("x86/ftrace: Use text_poke()")
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
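The ordering problem and the fix can be modelled without touching real page tables. The sketch below is a toy state machine under stated assumptions: `trampoline_sketch`, `trampoline_create()`, `trampoline_poke()` and `mark_text_ro()` are hypothetical names, and page permissions are reduced to a `writable` flag. Trampolines stay writable while booting and are flipped to read-only together with the rest of the text.

```c
#include <stdbool.h>
#include <string.h>

enum boot_state { STATE_BOOTING, STATE_RUNNING };

struct trampoline_sketch {
	unsigned char text[16];
	bool writable;		/* models the page permission */
};

enum boot_state system_state_sketch = STATE_BOOTING;

/* Creation: stay read-write while booting, read-only afterwards. */
void trampoline_create(struct trampoline_sketch *t)
{
	memset(t->text, 0x90, sizeof(t->text));
	t->writable = (system_state_sketch == STATE_BOOTING);
}

/*
 * Patch: during boot a plain memcpy() is modelled, which only works on a
 * writable mapping. Returns 0 on success, -1 where a fault would occur.
 * After boot a temporary writable alias is assumed, so it succeeds.
 */
int trampoline_poke(struct trampoline_sketch *t, const void *src, size_t len)
{
	if (len > sizeof(t->text))
		return -1;
	if (system_state_sketch == STATE_BOOTING && !t->writable)
		return -1;
	memcpy(t->text, src, len);
	return 0;
}

/* End of boot: the trampolines become read-only along with the text. */
void mark_text_ro(struct trampoline_sketch *list, size_t n)
{
	for (size_t i = 0; i < n; i++)
		list[i].writable = false;
	system_state_sketch = STATE_RUNNING;
}
```

In this model a second update during boot succeeds because the trampoline was never made read-only early, which mirrors the behaviour the commit message asks for.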
    • J
      ptp: fix struct member comment for do_aux_work · 2c864c78
      Jacob Keller authored
      The do_aux_work callback had documentation in the structure comment
      which referred to it as "do_work".
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 10 May, 2020 1 commit
  5. 08 May, 2020 1 commit
  6. 07 May, 2020 2 commits
  7. 06 May, 2020 1 commit
    • J
      bpf, sockmap: bpf_tcp_ingress needs to subtract bytes from sg.size · 81aabbb9
      John Fastabend authored
      In bpf_tcp_ingress we used apply_bytes to subtract bytes from sg.size
      which is used to track total bytes in a message. But this is not
      correct because apply_bytes is itself modified in the main loop doing
      the mem_charge.
      
      Then at the end of this we have sg.size incorrectly set and out of
      sync with actual sk values. Then we can get a splat if we try to
      cork the data later and again try to redirect the msg to ingress. To
      fix this, instead of trying to track msg.size, do the easy thing and
      include it as part of the sk_msg_xfer logic so that when the msg is
      moved the sg.size is always correct.
      
      To reproduce the splat below, users will need ingress + cork and must
      hit an error path that will then try to 'free' the skmsg.
      
      [  173.699981] BUG: KASAN: null-ptr-deref in sk_msg_free_elem+0xdd/0x120
      [  173.699987] Read of size 8 at addr 0000000000000008 by task test_sockmap/5317
      
      [  173.700000] CPU: 2 PID: 5317 Comm: test_sockmap Tainted: G          I       5.7.0-rc1+ #43
      [  173.700005] Hardware name: Dell Inc. Precision 5820 Tower/002KVM, BIOS 1.9.2 01/24/2019
      [  173.700009] Call Trace:
      [  173.700021]  dump_stack+0x8e/0xcb
      [  173.700029]  ? sk_msg_free_elem+0xdd/0x120
      [  173.700034]  ? sk_msg_free_elem+0xdd/0x120
      [  173.700042]  __kasan_report+0x102/0x15f
      [  173.700052]  ? sk_msg_free_elem+0xdd/0x120
      [  173.700060]  kasan_report+0x32/0x50
      [  173.700070]  sk_msg_free_elem+0xdd/0x120
      [  173.700080]  __sk_msg_free+0x87/0x150
      [  173.700094]  tcp_bpf_send_verdict+0x179/0x4f0
      [  173.700109]  tcp_bpf_sendpage+0x3ce/0x5d0
      
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/158861290407.14306.5327773422227552482.stgit@john-Precision-5820-Tower
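Folding the size accounting into the transfer helper can be sketched as follows. The structures are heavily simplified stand-ins (`sk_msg_sketch`, `sg_elem_sketch`) and `sk_msg_xfer_sketch()` is a hypothetical name; only the idea is taken from the message above: the source total is reduced at the moment the bytes move, so it cannot drift from the element lengths.

```c
#include <stdint.h>

#define SKETCH_MAX_FRAGS 4

struct sg_elem_sketch {
	uint32_t offset;
	uint32_t length;
};

struct sk_msg_sketch {
	uint32_t size;	/* total bytes tracked by this message */
	struct sg_elem_sketch data[SKETCH_MAX_FRAGS];
};

/* Move `size` bytes of element `which` from src to dst. */
void sk_msg_xfer_sketch(struct sk_msg_sketch *dst, struct sk_msg_sketch *src,
			int which, uint32_t size)
{
	dst->data[which] = src->data[which];
	dst->data[which].length = size;
	dst->size += size;
	src->size -= size;	/* accounting done as part of the transfer */
	src->data[which].length -= size;
	src->data[which].offset += size;
}
```

After any sequence of transfers the sum of `src->size` and `dst->size` stays constant, which is the invariant a separate counter such as apply_bytes could not guarantee.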
  8. 05 May, 2020 5 commits
  9. 01 May, 2020 2 commits
    • K
      security: Fix the default value of fs_context_parse_param hook · 54261af4
      KP Singh authored
      security_fs_context_parse_param is called by vfs_parse_fs_param and
      a successful return value (i.e. 0) implies that a parameter will be
      consumed by the LSM framework. This stops all further parsing of the
      parameter by VFS. Furthermore, if an LSM hook returns success, the
      remaining LSM hooks are not invoked for the parameter.
      
      The current default behavior of returning success means that all the
      parameters are expected to be parsed by the LSM hook and none of them
      end up being populated by vfs in fs_context.
      
      This was noticed when lsm=bpf is supplied on the command line before any
      other LSM. As the bpf lsm uses this default value to implement a default
      hook, this resulted in a failure to parse any fs_context parameters and
      a failure to mount the root filesystem.
      
      Fixes: 98e828a0 ("security: Refactor declaration of LSM hooks")
      Reported-by: Mikko Ylinen <mikko.ylinen@linux.intel.com>
      Signed-off-by: KP Singh <kpsingh@google.com>
      Signed-off-by: James Morris <jmorris@namei.org>
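The fall-through from the LSM layer to the filesystem's own parser can be sketched like this. The message does not state the new default, so this sketch assumes a "parameter not handled here" code in the style of the kernel-internal ENOPARAM; `ENOPARAM_SKETCH`, `lsm_parse_param_sketch()` and `vfs_sees_param()` are hypothetical names.

```c
#include <string.h>

/* Assumed value for a "not handled here" code; illustration only. */
#define ENOPARAM_SKETCH 519

/* Hypothetical LSM hook: claims only the "context" key, declines the rest. */
int lsm_parse_param_sketch(const char *key, int default_ret)
{
	if (strcmp(key, "context") == 0)
		return 0;
	return default_ret;
}

/*
 * Returns 1 if the filesystem's own parser gets to see the key,
 * 0 if the LSM layer consumed it (or failed on it).
 */
int vfs_sees_param(const char *key, int lsm_default_ret)
{
	int ret = lsm_parse_param_sketch(key, lsm_default_ret);

	if (ret != -ENOPARAM_SKETCH)
		return 0;
	return 1;
}
```

With a default of 0, an ordinary key such as "ro" never reaches the filesystem, which matches the failure to mount the root filesystem described above.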
    • P
      mptcp: move option parsing into mptcp_incoming_options() · cfde141e
      Paolo Abeni authored
      The mptcp_options_received structure carries several per-packet
      flags (mp_capable, mp_join, etc.). Such fields must be cleared on
      each packet, even on dropped ones or packets not carrying any MPTCP
      options, but the current mptcp code clears them only on TCP option
      reset.
      
      On several races/corner cases we end up with stray bits in
      incoming options, leading to WARN_ON splats. e.g.:
      
      [  171.164906] Bad mapping: ssn=32714 map_seq=1 map_data_len=32713
      [  171.165006] WARNING: CPU: 1 PID: 5026 at net/mptcp/subflow.c:533 warn_bad_map (linux-mptcp/net/mptcp/subflow.c:533 linux-mptcp/net/mptcp/subflow.c:531)
      [  171.167632] Modules linked in: ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel geneve ip6_udp_tunnel udp_tunnel macsec macvtap tap ipvlan macvlan 8021q garp mrp xfrm_interface veth netdevsim nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun binfmt_misc intel_rapl_msr intel_rapl_common rfkill kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev virtio_balloon pcspkr i2c_piix4 sunrpc ip_tables xfs libcrc32c crc32c_intel serio_raw virtio_console ata_generic virtio_blk virtio_net net_failover failover ata_piix libata
      [  171.199464] CPU: 1 PID: 5026 Comm: repro Not tainted 5.7.0-rc1.mptcp_f227fdf5d388+ #95
      [  171.200886] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
      [  171.202546] RIP: 0010:warn_bad_map (linux-mptcp/net/mptcp/subflow.c:533 linux-mptcp/net/mptcp/subflow.c:531)
      [  171.206537] Code: c1 ea 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 1d 8b 55 3c 44 89 e6 48 c7 c7 20 51 13 95 e8 37 8b 22 fe <0f> 0b 48 83 c4 08 5b 5d 41 5c c3 89 4c 24 04 e8 db d6 94 fe 8b 4c
      [  171.220473] RSP: 0018:ffffc90000150560 EFLAGS: 00010282
      [  171.221639] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      [  171.223108] RDX: 0000000000000000 RSI: 0000000000000008 RDI: fffff5200002a09e
      [  171.224388] RBP: ffff8880aa6e3c00 R08: 0000000000000001 R09: fffffbfff2ec9955
      [  171.225706] R10: ffffffff9764caa7 R11: fffffbfff2ec9954 R12: 0000000000007fca
      [  171.227211] R13: ffff8881066f4a7f R14: ffff8880aa6e3c00 R15: 0000000000000020
      [  171.228460] FS:  00007f8623719740(0000) GS:ffff88810be00000(0000) knlGS:0000000000000000
      [  171.230065] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  171.231303] CR2: 00007ffdab190a50 CR3: 00000001038ea006 CR4: 0000000000160ee0
      [  171.232586] Call Trace:
      [  171.233109]  <IRQ>
      [  171.233531] get_mapping_status (linux-mptcp/net/mptcp/subflow.c:691)
      [  171.234371] mptcp_subflow_data_available (linux-mptcp/net/mptcp/subflow.c:736 linux-mptcp/net/mptcp/subflow.c:832)
      [  171.238181] subflow_state_change (linux-mptcp/net/mptcp/subflow.c:1085 (discriminator 1))
      [  171.239066] tcp_fin (linux-mptcp/net/ipv4/tcp_input.c:4217)
      [  171.240123] tcp_data_queue (linux-mptcp/./include/linux/compiler.h:199 linux-mptcp/net/ipv4/tcp_input.c:4822)
      [  171.245083] tcp_rcv_established (linux-mptcp/./include/linux/skbuff.h:1785 linux-mptcp/./include/net/tcp.h:1774 linux-mptcp/./include/net/tcp.h:1847 linux-mptcp/net/ipv4/tcp_input.c:5238 linux-mptcp/net/ipv4/tcp_input.c:5730)
      [  171.254089] tcp_v4_rcv (linux-mptcp/./include/linux/spinlock.h:393 linux-mptcp/net/ipv4/tcp_ipv4.c:2009)
      [  171.258969] ip_protocol_deliver_rcu (linux-mptcp/net/ipv4/ip_input.c:204 (discriminator 1))
      [  171.260214] ip_local_deliver_finish (linux-mptcp/./include/linux/rcupdate.h:651 linux-mptcp/net/ipv4/ip_input.c:232)
      [  171.261389] ip_local_deliver (linux-mptcp/./include/linux/netfilter.h:307 linux-mptcp/./include/linux/netfilter.h:301 linux-mptcp/net/ipv4/ip_input.c:252)
      [  171.265884] ip_rcv (linux-mptcp/./include/linux/netfilter.h:307 linux-mptcp/./include/linux/netfilter.h:301 linux-mptcp/net/ipv4/ip_input.c:539)
      [  171.273666] process_backlog (linux-mptcp/./include/linux/rcupdate.h:651 linux-mptcp/net/core/dev.c:6135)
      [  171.275328] net_rx_action (linux-mptcp/net/core/dev.c:6572 linux-mptcp/net/core/dev.c:6640)
      [  171.280472] __do_softirq (linux-mptcp/./arch/x86/include/asm/jump_label.h:25 linux-mptcp/./include/linux/jump_label.h:200 linux-mptcp/./include/trace/events/irq.h:142 linux-mptcp/kernel/softirq.c:293)
      [  171.281379] do_softirq_own_stack (linux-mptcp/arch/x86/entry/entry_64.S:1083)
      [  171.282358]  </IRQ>
      
      We could address the issue by explicitly clearing the relevant fields
      in several places - tcp_parse_option, tcp_fast_parse_options,
      possibly others.
      
      Instead we move the MPTCP option parsing into the already existing
      mptcp ingress hook, so that we need to clear the fields in a single
      place.
      
      This allows us to drop an MPTCP hook from the TCP code and to
      remove the quite large mptcp_options_received from the tcp_sock
      struct. On the flip side, the MPTCP sockets will traverse the
      option space twice (in tcp_parse_option() and in
      mptcp_incoming_options()). That looks acceptable: we already
      do that for syn and 3rd ack packets, plain TCP sockets will
      benefit from it, and even MPTCP sockets will experience better
      code locality, reducing the jumps between TCP and MPTCP code.
      
      v1 -> v2:
       - rebased on current '-net' tree
      
      Fixes: 648ef4b8 ("mptcp: Implement MPTCP receive path")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
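The "clear in a single place" idea can be sketched with a toy parser. The option codes and the names `mptcp_opts_sketch` and `mptcp_incoming_options_sketch()` are hypothetical and do not follow the real TCP option encoding; the sketch only shows that resetting the flags at the top of the one ingress hook leaves no stray bits from an earlier packet.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct mptcp_opts_sketch {
	bool mp_capable;
	bool mp_join;
	bool dss;
};

/* Hypothetical one-byte option codes, not the real wire format. */
enum { OPT_MP_CAPABLE = 1, OPT_MP_JOIN = 2, OPT_DSS = 3 };

/*
 * Single ingress hook: reset every per-packet flag first, then parse.
 * A packet with no options (len == 0) leaves all flags cleared.
 */
void mptcp_incoming_options_sketch(struct mptcp_opts_sketch *o,
				   const unsigned char *opts, size_t len)
{
	memset(o, 0, sizeof(*o));
	for (size_t i = 0; i < len; i++) {
		switch (opts[i]) {
		case OPT_MP_CAPABLE:
			o->mp_capable = true;
			break;
		case OPT_MP_JOIN:
			o->mp_join = true;
			break;
		case OPT_DSS:
			o->dss = true;
			break;
		default:
			break;
		}
	}
}
```

Feeding a packet with options followed by one without them ends with every flag false, whichever path the second packet took to reach the hook.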
  10. 30 April, 2020 2 commits
  11. 29 April, 2020 2 commits
    • O
      NFSv4.1: fix handling of backchannel binding in BIND_CONN_TO_SESSION · dff58530
      Olga Kornievskaia authored
      Currently, if the client sends BIND_CONN_TO_SESSION with
      NFS4_CDFC4_FORE_OR_BOTH but only gets NFS4_CDFS4_FORE back, it ignores
      that it wasn't able to enable a backchannel.
      
      To make sure, the client sends BIND_CONN_TO_SESSION as the first
      operation on the connections (i.e., no other session compounds have
      been sent before), and if the client's request to bind the backchannel
      is not satisfied, then it resets the connection and retries.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    • N
      SUNRPC: defer slow parts of rpc_free_client() to a workqueue. · 7c4310ff
      NeilBrown authored
      The rpciod workqueue is on the write-out path for freeing dirty memory,
      so it is important that it never block waiting for memory to be
      allocated - this can lead to a deadlock.
      
      rpc_execute() - which is often called by an rpciod work item - calls
      rpc_task_release_client() which can lead to rpc_free_client().
      
      rpc_free_client() makes two calls which could potentially block waiting
      for memory allocation.
      
      rpc_clnt_debugfs_unregister() calls into debugfs and will block while
      any of the debugfs files are being accessed.  In particular it can block
      while any of the 'open' methods are being called and all of these use
      malloc for one thing or another.  So this can deadlock if the memory
      allocation waits for NFS to complete some writes via rpciod.
      
      rpc_clnt_remove_pipedir() can take the inode_lock() and while it isn't
      obvious that memory allocations can happen while the lock is held, it is
      safer to assume they might and to not let rpciod call
      rpc_clnt_remove_pipedir().
      
      So this patch moves these two calls (together with the final kfree() and
      rpciod_down()) into a work-item to be run from the system work-queue.
      rpciod can continue its important work, and the final stages of the free
      can happen whenever they happen.
      
      I have seen this deadlock on a 4.12 based kernel where debugfs used
      synchronize_srcu() when removing objects.  synchronize_srcu() requires a
      workqueue and there were no free worker threads and none could be
      allocated.  While debugfs no longer uses SRCU, I believe the deadlock
      is still possible.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
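Deferring the slow teardown can be sketched with a toy work queue. This is a single-threaded model, not the kernel workqueue API: `work_sketch`, `schedule_work_sketch()`, `run_pending_work_sketch()` and the client helpers are hypothetical. The caller that must not block only queues an item; the blocking part runs later from another context.

```c
#include <stddef.h>

struct work_sketch {
	void (*fn)(struct work_sketch *);
	struct work_sketch *next;
};

/* Hypothetical single global queue standing in for a system workqueue. */
struct work_sketch *pending_head;

void schedule_work_sketch(struct work_sketch *w)
{
	w->next = pending_head;
	pending_head = w;
}

/* In a kernel this runs on a worker thread; here it is called explicitly. */
void run_pending_work_sketch(void)
{
	while (pending_head) {
		struct work_sketch *w = pending_head;

		pending_head = w->next;
		w->fn(w);
	}
}

struct client_sketch {
	struct work_sketch work;	/* first member, so the cast below is valid */
	int slow_teardown_done;
};

/* The potentially blocking part of the teardown. */
void client_free_work(struct work_sketch *w)
{
	struct client_sketch *c = (struct client_sketch *)w;

	c->slow_teardown_done = 1;
}

/* Fast path for the caller that must not block: queue and return. */
void client_free_sketch(struct client_sketch *c)
{
	c->work.fn = client_free_work;
	schedule_work_sketch(&c->work);
}
```

`client_free_sketch()` returns before the slow part has run, so the caller can carry on with write-out work; the teardown completes whenever the queue is drained.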
  12. 28 April, 2020 4 commits
  13. 27 April, 2020 4 commits
    • D
      dmaengine: fix channel index enumeration · 08210094
      Dave Jiang authored
      When the channel register code was changed to allow hotplug operations,
      dynamic indexing wasn't taken into account. When channels are randomly
      plugged and unplugged out of order, the serial indexing breaks. Convert
      channel indexing to using IDA tracking in order to allow dynamic
      assignment. The previous code does not cause any regression bug for
      existing channel allocation besides idxd driver since the hotplug usage
      case is only used by idxd at this point.
      
      With this change, the chan->idr_ref is also not needed any longer. We can
      have a device with no channels registered due to hot plug. The channel
      device release code no longer should attempt to free the dma device id on
      the last channel release.
      
      Fixes: e81274cd ("dmaengine: add support to dynamic register/unregister of channels")
      Reported-by: Yixin Zhang <yixin.zhang@intel.com>
      Signed-off-by: Dave Jiang <dave.jiang@intel.com>
      Tested-by: Yixin Zhang <yixin.zhang@intel.com>
      Link: https://lore.kernel.org/r/158679961260.7674.8485924270472851852.stgit@djiang5-desk3.ch.intel.com
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
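Why dynamic id tracking survives out-of-order unplugging can be shown with a tiny bitmap allocator. It is a stand-in for an IDA-style allocator, not the kernel's implementation: `id_pool_sketch`, `id_alloc_sketch()` and `id_free_sketch()` are hypothetical, and the pool is capped at 32 ids for brevity.

```c
#include <stdint.h>

#define SKETCH_MAX_CHANS 32

/* One bit per channel id; a set bit means the id is in use. */
struct id_pool_sketch {
	uint32_t used;
};

/* Return the lowest free id, or -1 if the pool is exhausted. */
int id_alloc_sketch(struct id_pool_sketch *p)
{
	for (int id = 0; id < SKETCH_MAX_CHANS; id++) {
		if (!(p->used & (UINT32_C(1) << id))) {
			p->used |= UINT32_C(1) << id;
			return id;
		}
	}
	return -1;
}

/* Release an id so a later allocation can reuse it. */
void id_free_sketch(struct id_pool_sketch *p, int id)
{
	if (id >= 0 && id < SKETCH_MAX_CHANS)
		p->used &= ~(UINT32_C(1) << id);
}
```

A serial counter would hand out a duplicate or leave a permanent hole once channel 1 is unplugged while 0 and 2 remain; the bitmap simply reuses id 1 for the next channel.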
    • C
      SUNRPC: Revert 241b1f41 ("SUNRPC: Remove xdr_buf_trim()") · 0a8e7b7d
      Chuck Lever authored
      I've noticed that when krb5i or krb5p security is in use,
      retransmitted requests are missing the server's duplicate reply
      cache. The computed checksum on the retransmitted request does not
      match the cached checksum, resulting in the server performing the
      retransmitted request again instead of returning the cached reply.
      
      The assumptions made when removing xdr_buf_trim() were not correct.
      In the send paths, the upper layer has already set the segment
      lengths correctly, and shortening the buffer's content is simply a
      matter of reducing buf->len.
      
      xdr_buf_trim() is the right answer in the receive/unwrap path on
      both the client and the server. The buffer segment lengths have to
      be shortened one-by-one.
      
      On the server side in particular, head.iov_len needs to be updated
      correctly to enable nfsd_cache_csum() to work correctly. The simple
      buf->len computation doesn't do that, and that results in
      checksumming stale data in the buffer.
      
      The problem isn't noticed until there's significant instability of
      the RPC transport. At that point, the reliability of retransmit
      detection on the server becomes crucial.
      
      Fixes: 241b1f41 ("SUNRPC: Remove xdr_buf_trim()")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
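Shortening the segments one by one, rather than only reducing the total, can be sketched on a simplified buffer. `xdr_buf_sketch` flattens the head, page and tail segments into three lengths and `xdr_buf_trim_sketch()` is a hypothetical name; the order (tail, then pages, then head) is an assumption of this sketch.

```c
#include <stddef.h>

struct xdr_buf_sketch {
	size_t head_len;
	size_t page_len;
	size_t tail_len;
	size_t len;	/* head_len + page_len + tail_len */
};

static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/* Remove up to `len` bytes from the end: tail first, then pages, then head. */
void xdr_buf_trim_sketch(struct xdr_buf_sketch *buf, size_t len)
{
	size_t trim = len;
	size_t cur;

	cur = min_size(buf->tail_len, trim);
	buf->tail_len -= cur;
	trim -= cur;

	cur = min_size(buf->page_len, trim);
	buf->page_len -= cur;
	trim -= cur;

	cur = min_size(buf->head_len, trim);
	buf->head_len -= cur;
	trim -= cur;

	buf->len -= len - trim;	/* only what was actually removed */
}
```

Because every segment length is updated, code that later checksums the head by its own length sees the shortened data, which a bare adjustment of the total would not achieve.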
    • C
      SUNRPC: Fix GSS privacy computation of auth->au_ralign · a7e429a6
      Chuck Lever authored
      When the au_ralign field was added to gss_unwrap_resp_priv, the
      wrong calculation was used. Setting au_rslack == au_ralign is
      probably correct for kerberos_v1 privacy, but kerberos_v2 privacy
      adds additional GSS data after the clear text RPC message.
      au_ralign needs to be smaller than au_rslack in that fairly common
      case.
      
      When xdr_buf_trim() is restored to gss_unwrap_kerberos_v2(), it does
      exactly what I feared it would: it trims off part of the clear text
      RPC message. However, that's because rpc_prepare_reply_pages() does
      not set up the rq_rcv_buf's tail correctly because au_ralign is too
      large.
      
      Fixing the au_ralign computation also corrects the alignment of
      rq_rcv_buf->pages so that the client does not have to shift reply
      data payloads after they are received.
      
      Fixes: 35e77d21 ("SUNRPC: Add rpc_auth::au_ralign field")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • C
      SUNRPC: Add "@len" parameter to gss_unwrap() · 31c9590a
      Chuck Lever authored
      Refactor: This is a pre-requisite to fixing the client-side ralign
      computation in gss_unwrap_resp_priv().
      
      The length value is passed in explicitly rather than as the value
      of buf->len. This will subsequently allow gss_unwrap_kerberos_v1()
      to compute a slack and align value, instead of computing it in
      gss_unwrap_resp_priv().
      
      Fixes: 35e77d21 ("SUNRPC: Add rpc_auth::au_ralign field")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
  14. 23 April, 2020 2 commits
    • N
      tracing: Remove DECLARE_TRACE_NOARGS · a2806ef7
      Nikolay Borisov authored
      This macro was intentionally broken so that the kernel code is not
      polluted with such noargs macros used simply as markers. This use case
      can be satisfied by using dummy noinline functions. Just remove it.
      
      Link: http://lkml.kernel.org/r/20200413153246.8511-1-nborisov@suse.com
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • M
      arch: split MODULE_ARCH_VERMAGIC definitions out to <asm/vermagic.h> · 62d0fd59
      Masahiro Yamada committed
      As the bug report [1] pointed out, <linux/vermagic.h> must be included
      after <linux/module.h>.
      
      I believe we should not impose any include order restriction. We often
      sort include directives alphabetically, but that is just a coding style
      convention. Technically, we can include header files in any order by
      making every header self-contained.
      
      Currently, arch-specific MODULE_ARCH_VERMAGIC is defined in
      <asm/module.h>, which is not included from <linux/vermagic.h>.
      
      Hence, the straight-forward fix-up would be as follows:
      
      |--- a/include/linux/vermagic.h
      |+++ b/include/linux/vermagic.h
      |@@ -1,5 +1,6 @@
      | /* SPDX-License-Identifier: GPL-2.0 */
      | #include <generated/utsrelease.h>
      |+#include <linux/module.h>
      |
      | /* Simply sanity version stamp for modules. */
      | #ifdef CONFIG_SMP
      
      This works well enough, but for further cleanups, I split the
      MODULE_ARCH_VERMAGIC definitions into <asm/vermagic.h>.
      
      With this, <linux/module.h> and <linux/vermagic.h> will be orthogonal,
      and the location of MODULE_ARCH_VERMAGIC definitions will be consistent.
      
      For arc and ia64, MODULE_PROC_FAMILY is only used for defining
      MODULE_ARCH_VERMAGIC. I squashed it.
      
      For hexagon, nds32, and xtensa, I removed <asm/module.h> entirely
      because they contained nothing but the MODULE_ARCH_VERMAGIC definition.
      Kbuild will automatically generate <asm/module.h> at build-time,
      wrapping <asm-generic/module.h>.
      
      [1] https://lore.kernel.org/lkml/20200411155623.GA22175@zn.tnic
      Reported-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Acked-by: Jessica Yu <jeyu@kernel.org>
      62d0fd59
  15. 22 Apr, 2020 3 commits
    • J
      pnp: Use list_for_each_entry() instead of open coding · 01b2bafe
      Jason Gunthorpe committed
      Aside from good practice, this avoids a warning from gcc 10:
      
      ./include/linux/kernel.h:997:3: warning: array subscript -31 is outside array bounds of ‘struct list_head[1]’ [-Warray-bounds]
        997 |  ((type *)(__mptr - offsetof(type, member))); })
            |  ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      ./include/linux/list.h:493:2: note: in expansion of macro ‘container_of’
        493 |  container_of(ptr, type, member)
            |  ^~~~~~~~~~~~
      ./include/linux/pnp.h:275:30: note: in expansion of macro ‘list_entry’
        275 | #define global_to_pnp_dev(n) list_entry(n, struct pnp_dev, global_list)
            |                              ^~~~~~~~~~
      ./include/linux/pnp.h:281:11: note: in expansion of macro ‘global_to_pnp_dev’
        281 |  (dev) != global_to_pnp_dev(&pnp_global); \
            |           ^~~~~~~~~~~~~~~~~
      arch/x86/kernel/rtc.c:189:2: note: in expansion of macro ‘pnp_for_each_dev’
        189 |  pnp_for_each_dev(dev) {
      
      The warning goes away because the common code doesn't cast the
      starting list_head to the containing struct.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      [ rjw: Whitespace adjustments ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      01b2bafe
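
For illustration, here is a minimal user-space sketch of the iteration pattern this commit switches to. The macros are simplified stand-ins for the kernel's container_of() and list_for_each_entry(), and struct demo_dev, list_init(), list_add_tail() and sum_ids() are hypothetical names, not the pnp code. It assumes a compiler that supports `__typeof__` (GCC or Clang). The loop's termination test compares the address of the embedded node with the list head, instead of applying the container conversion to the head object itself as the old pnp_for_each_dev() did.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Simplified stand-in for the kernel macro of the same name. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Walk every entry on the list. The loop ends when the embedded node
 * of the current entry is the head, so the head is compared as a
 * plain list_head and never named as a containing struct.
 */
#define list_for_each_entry(pos, head, member)				\
	for (pos = container_of((head)->next, __typeof__(*pos), member); \
	     &pos->member != (head);					\
	     pos = container_of(pos->member.next, __typeof__(*pos), member))

/* Hypothetical device type with an embedded list node. */
struct demo_dev {
	int id;
	struct list_head global_list;
};

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	assert(node && head);
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Sum the ids of all devices on the list. */
static int sum_ids(struct list_head *head)
{
	struct demo_dev *dev;
	int sum = 0;

	list_for_each_entry(dev, head, global_list)
		sum += dev->id;
	return sum;
}
```

A caller would initialize a head with list_init(), link devices with list_add_tail(), and then iterate with list_for_each_entry() as sum_ids() does.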
    • V
      net: stmmac: Enable SERDES power up/down sequence · b9663b7c
      Voon Weifeng committed
      This patch enables the Intel SERDES power up/down sequence. The SERDES
      converts 8/10-bit data to an SGMII signal. Below is an example of the
      HW configuration for SGMII mode. The SERDES is located in the PHY IF
      in the diagram below.
      
      <-----------------GBE Controller---------->|<--External PHY chip-->
      +----------+         +----+            +---+           +----------+
      |   EQoS   | <-GMII->| DW | < ------ > |PHY| <-SGMII-> | External |
      |   MAC    |         |xPCS|            |IF |           | PHY      |
      +----------+         +----+            +---+           +----------+
             ^               ^                 ^                ^
             |               |                 |                |
             +---------------------MDIO-------------------------+
      
      PHY IF configuration and status registers are accessible through
      mdio address 0x15 which is defined as mdio_adhoc_addr. During D0,
      the driver will need to power up PHY IF by changing the power state
      to P0. Likewise, for D3, the driver sets PHY IF power state to P3.
      Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
      Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b9663b7c
    • J
      vmalloc: fix remap_vmalloc_range() bounds checks · bdebd6a2
      Jann Horn committed
      remap_vmalloc_range() has had various issues with the bounds checks it
      promises to perform ("This function checks that addr is a valid
      vmalloc'ed area, and that it is big enough to cover the vma") over time,
      e.g.:
      
       - not detecting pgoff<<PAGE_SHIFT overflow
      
       - not detecting (pgoff<<PAGE_SHIFT)+usize overflow
      
       - not checking whether addr and addr+(pgoff<<PAGE_SHIFT) are the same
         vmalloc allocation
      
       - comparing a potentially wildly out-of-bounds pointer with the end of
         the vmalloc region
      
      In particular, since commit fc970227 ("bpf: Add mmap() support for
      BPF_MAP_TYPE_ARRAY"), unprivileged users can cause kernel null pointer
      dereferences by calling mmap() on a BPF map with a size that is bigger
      than the distance from the start of the BPF map to the end of the
      address space.
      
      This could theoretically be used as a kernel ASLR bypass, by using
      whether mmap() with a given offset oopses or returns an error code to
      perform a binary search over the possible address range.
      
      To allow remap_vmalloc_range_partial() to verify that addr and
      addr+(pgoff<<PAGE_SHIFT) are in the same vmalloc region, pass the offset
      to remap_vmalloc_range_partial() instead of adding it to the pointer in
      remap_vmalloc_range().
      
      In remap_vmalloc_range_partial(), fix the check against
      get_vm_area_size() by using size comparisons instead of pointer
      comparisons, and add checks for pgoff.
      
      Fixes: 83342314 ("[PATCH] mm: introduce remap_vmalloc_range()")
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: stable@vger.kernel.org
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Yonghong Song <yhs@fb.com>
      Cc: Andrii Nakryiko <andriin@fb.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: KP Singh <kpsingh@chromium.org>
      Link: http://lkml.kernel.org/r/20200415222312.236431-1-jannh@google.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bdebd6a2
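
The kind of check described in this commit can be sketched in user space as follows. This is not the kernel patch: DEMO_PAGE_SHIFT and range_fits() are hypothetical names, and the area is represented only by its size. The point it illustrates is that the check rejects a page offset whose byte value would overflow, and then compares sizes rather than pointers, so no addition in the check can wrap around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages. */
#define DEMO_PAGE_SHIFT 12

/*
 * Return true if a mapping of `size` bytes that starts `pgoff` pages
 * into an area of `area_size` bytes stays inside that area.
 */
static bool range_fits(size_t area_size, size_t pgoff, size_t size)
{
	size_t off;

	/* Reject pgoff values whose byte offset would overflow. */
	if (pgoff > (SIZE_MAX >> DEMO_PAGE_SHIFT))
		return false;
	off = pgoff << DEMO_PAGE_SHIFT;

	/* Compare sizes, not pointers: the offset must lie in the area. */
	if (off > area_size)
		return false;

	/* The subtraction cannot underflow because off <= area_size. */
	return size <= area_size - off;
}
```

Writing the last test as `size <= area_size - off` instead of `off + size <= area_size` is what keeps a huge `size` from wrapping the sum back into range.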
  16. 20 Apr, 2020 1 commit
  17. 19 Apr, 2020 3 commits
    • G
      xattr.h: Replace zero-length array with flexible-array member · 43951585
      Gustavo A. R. Silva committed
      The current codebase makes use of the zero-length array language
      extension to the C90 standard, but the preferred mechanism to declare
      variable-length types such as these ones is a flexible array member[1][2],
      introduced in C99:
      
      struct foo {
              int stuff;
              struct boo array[];
      };
      
      By making use of the mechanism above, we will get a compiler warning
      in case the flexible array does not occur last in the structure, which
      will help us prevent some kind of undefined behavior bugs from being
      inadvertently introduced[3] to the codebase from now on.
      
      Also, notice that dynamic memory allocations won't be affected by
      this change:
      
      "Flexible array members have incomplete type, and so the sizeof operator
      may not be applied. As a quirk of the original implementation of
      zero-length arrays, sizeof evaluates to zero."[1]
      
      This issue was found with the help of Coccinelle.
      
      [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      [2] https://github.com/KSPP/linux/issues/21
      [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      43951585
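
As a small illustration of why allocations are unaffected, the sketch below allocates a structure that ends in a flexible array member. struct demo_msg and demo_msg_new() are hypothetical names, not taken from xattr.h. sizeof on the structure does not count the flexible member, so the usual "header plus payload" size computation works the same way it did with a zero-length array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical message type; the flexible array member must come last. */
struct demo_msg {
	size_t len;
	unsigned char data[];
};

/* Allocate a message holding a copy of `len` bytes from `src`. */
static struct demo_msg *demo_msg_new(const void *src, size_t len)
{
	struct demo_msg *m;

	/* Guard the size computation against overflow. */
	if (len > SIZE_MAX - sizeof(*m))
		return NULL;

	/* sizeof(*m) covers only the fixed header, not data[]. */
	m = malloc(sizeof(*m) + len);
	if (!m)
		return NULL;

	m->len = len;
	if (len)
		memcpy(m->data, src, len);
	return m;
}
```

The caller owns the returned pointer and releases it with free().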
    • G
      tpm_eventlog.h: Replace zero-length array with flexible-array member · 06ccf63d
      Gustavo A. R. Silva committed
      The current codebase makes use of the zero-length array language
      extension to the C90 standard, but the preferred mechanism to declare
      variable-length types such as these ones is a flexible array member[1][2],
      introduced in C99:
      
      struct foo {
              int stuff;
              struct boo array[];
      };
      
      By making use of the mechanism above, we will get a compiler warning
      in case the flexible array does not occur last in the structure, which
      will help us prevent some kind of undefined behavior bugs from being
      inadvertently introduced[3] to the codebase from now on.
      
      Also, notice that dynamic memory allocations won't be affected by
      this change:
      
      "Flexible array members have incomplete type, and so the sizeof operator
      may not be applied. As a quirk of the original implementation of
      zero-length arrays, sizeof evaluates to zero."[1]
      
      This issue was found with the help of Coccinelle.
      
      [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      [2] https://github.com/KSPP/linux/issues/21
      [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      06ccf63d
    • G
      ti_wilink_st.h: Replace zero-length array with flexible-array member · 4ea19ecf
      Gustavo A. R. Silva committed
      The current codebase makes use of the zero-length array language
      extension to the C90 standard, but the preferred mechanism to declare
      variable-length types such as these ones is a flexible array member[1][2],
      introduced in C99:
      
      struct foo {
              int stuff;
              struct boo array[];
      };
      
      By making use of the mechanism above, we will get a compiler warning
      in case the flexible array does not occur last in the structure, which
      will help us prevent some kind of undefined behavior bugs from being
      inadvertently introduced[3] to the codebase from now on.
      
      Also, notice that dynamic memory allocations won't be affected by
      this change:
      
      "Flexible array members have incomplete type, and so the sizeof operator
      may not be applied. As a quirk of the original implementation of
      zero-length arrays, sizeof evaluates to zero."[1]
      
      This issue was found with the help of Coccinelle.
      
      [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      [2] https://github.com/KSPP/linux/issues/21
      [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      4ea19ecf