1. 27 12月, 2019 4 次提交
  2. 21 12月, 2019 8 次提交
    • T
      tipc: fix ordering of tipc module init and exit routine · 9430afbc
      Taehee Yoo 提交于
      [ Upstream commit 9cf1cd8ee3ee09ef2859017df2058e2f53c5347f ]
      
      In order to set/get/dump, the tipc uses the generic netlink
      infrastructure. So, when tipc module is inserted, init function
      calls genl_register_family().
      After genl_register_family(), set/get/dump commands are immediately
      allowed and these callbacks internally use the net_generic.
      net_generic is allocated by register_pernet_device() but this
      is called after genl_register_family() in the __init function.
      So, these callbacks would use un-initialized net_generic.
      
      Test commands:
          #SHELL1
          while :
          do
              modprobe tipc
              modprobe -rv tipc
          done
      
          #SHELL2
          while :
          do
              tipc link list
          done
      
      Splat looks like:
      [   59.616322][ T2788] kasan: CONFIG_KASAN_INLINE enabled
      [   59.617234][ T2788] kasan: GPF could be caused by NULL-ptr deref or user memory access
      [   59.618398][ T2788] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [   59.619389][ T2788] CPU: 3 PID: 2788 Comm: tipc Not tainted 5.4.0+ #194
      [   59.620231][ T2788] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [   59.621428][ T2788] RIP: 0010:tipc_bcast_get_broadcast_mode+0x131/0x310 [tipc]
      [   59.622379][ T2788] Code: c7 c6 ef 8b 38 c0 65 ff 0d 84 83 c9 3f e8 d7 a5 f2 e3 48 8d bb 38 11 00 00 48 b8 00 00 00 00
      [   59.622550][ T2780] NET: Registered protocol family 30
      [   59.624627][ T2788] RSP: 0018:ffff88804b09f578 EFLAGS: 00010202
      [   59.624630][ T2788] RAX: dffffc0000000000 RBX: 0000000000000011 RCX: 000000008bc66907
      [   59.624631][ T2788] RDX: 0000000000000229 RSI: 000000004b3cf4cc RDI: 0000000000001149
      [   59.624633][ T2788] RBP: ffff88804b09f588 R08: 0000000000000003 R09: fffffbfff4fb3df1
      [   59.624635][ T2788] R10: fffffbfff50318f8 R11: ffff888066cadc18 R12: ffffffffa6cc2f40
      [   59.624637][ T2788] R13: 1ffff11009613eba R14: ffff8880662e9328 R15: ffff8880662e9328
      [   59.624639][ T2788] FS:  00007f57d8f7b740(0000) GS:ffff88806cc00000(0000) knlGS:0000000000000000
      [   59.624645][ T2788] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   59.625875][ T2780] tipc: Started in single node mode
      [   59.626128][ T2788] CR2: 00007f57d887a8c0 CR3: 000000004b140002 CR4: 00000000000606e0
      [   59.633991][ T2788] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   59.635195][ T2788] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   59.636478][ T2788] Call Trace:
      [   59.637025][ T2788]  tipc_nl_add_bc_link+0x179/0x1470 [tipc]
      [   59.638219][ T2788]  ? lock_downgrade+0x6e0/0x6e0
      [   59.638923][ T2788]  ? __tipc_nl_add_link+0xf90/0xf90 [tipc]
      [   59.639533][ T2788]  ? tipc_nl_node_dump_link+0x318/0xa50 [tipc]
      [   59.640160][ T2788]  ? mutex_lock_io_nested+0x1380/0x1380
      [   59.640746][ T2788]  tipc_nl_node_dump_link+0x4fd/0xa50 [tipc]
      [   59.641356][ T2788]  ? tipc_nl_node_reset_link_stats+0x340/0x340 [tipc]
      [   59.642088][ T2788]  ? __skb_ext_del+0x270/0x270
      [   59.642594][ T2788]  genl_lock_dumpit+0x85/0xb0
      [   59.643050][ T2788]  netlink_dump+0x49c/0xed0
      [   59.643529][ T2788]  ? __netlink_sendskb+0xc0/0xc0
      [   59.644044][ T2788]  ? __netlink_dump_start+0x190/0x800
      [   59.644617][ T2788]  ? __mutex_unlock_slowpath+0xd0/0x670
      [   59.645177][ T2788]  __netlink_dump_start+0x5a0/0x800
      [   59.645692][ T2788]  genl_rcv_msg+0xa75/0xe90
      [   59.646144][ T2788]  ? __lock_acquire+0xdfe/0x3de0
      [   59.646692][ T2788]  ? genl_family_rcv_msg_attrs_parse+0x320/0x320
      [   59.647340][ T2788]  ? genl_lock_dumpit+0xb0/0xb0
      [   59.647821][ T2788]  ? genl_unlock+0x20/0x20
      [   59.648290][ T2788]  ? genl_parallel_done+0xe0/0xe0
      [   59.648787][ T2788]  ? find_held_lock+0x39/0x1d0
      [   59.649276][ T2788]  ? genl_rcv+0x15/0x40
      [   59.649722][ T2788]  ? lock_contended+0xcd0/0xcd0
      [   59.650296][ T2788]  netlink_rcv_skb+0x121/0x350
      [   59.650828][ T2788]  ? genl_family_rcv_msg_attrs_parse+0x320/0x320
      [   59.651491][ T2788]  ? netlink_ack+0x940/0x940
      [   59.651953][ T2788]  ? lock_acquire+0x164/0x3b0
      [   59.652449][ T2788]  genl_rcv+0x24/0x40
      [   59.652841][ T2788]  netlink_unicast+0x421/0x600
      [ ... ]
      
      Fixes: 7e436905 ("tipc: fix a slab object leak")
      Fixes: a62fbcce ("tipc: make subscriber server support net namespace")
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9430afbc
    • E
      tcp: md5: fix potential overestimation of TCP option space · a148815a
      Eric Dumazet 提交于
      [ Upstream commit 9424e2e7ad93ffffa88f882c9bc5023570904b55 ]
      
      Back in 2008, Adam Langley fixed the corner case of packets for flows
      having all of the following options : MD5 TS SACK
      
      Since MD5 needs 20 bytes, and TS needs 12 bytes, no sack block
      can be cooked from the remaining 8 bytes.
      
      tcp_established_options() correctly sets opts->num_sack_blocks
      to zero, but returns 36 instead of 32.
      
      This means TCP cooks packets with 4 extra bytes at the end
      of options, containing unitialized bytes.
      
      Fixes: 33ad798c ("tcp: options clean up")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a148815a
    • A
      openvswitch: support asymmetric conntrack · 6f99afcc
      Aaron Conole 提交于
      [ Upstream commit 5d50aa83e2c8e91ced2cca77c198b468ca9210f4 ]
      
      The openvswitch module shares a common conntrack and NAT infrastructure
      exposed via netfilter.  It's possible that a packet needs both SNAT and
      DNAT manipulation, due to e.g. tuple collision.  Netfilter can support
      this because it runs through the NAT table twice - once on ingress and
      again after egress.  The openvswitch module doesn't have such capability.
      
      Like netfilter hook infrastructure, we should run through NAT twice to
      keep the symmetry.
      
      Fixes: 05752523 ("openvswitch: Interface with NAT.")
      Signed-off-by: NAaron Conole <aconole@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6f99afcc
    • D
      net: sched: fix dump qlen for sch_mq/sch_mqprio with NOLOCK subqueues · 0c5a4dd6
      Dust Li 提交于
      [ Upstream commit 2f23cd42e19c22c24ff0e221089b7b6123b117c5 ]
      
      sch->q.len hasn't been set if the subqueue is a NOLOCK qdisc
       in mq_dump() and mqprio_dump().
      
      Fixes: ce679e8d ("net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio")
      Signed-off-by: NDust Li <dust.li@linux.alibaba.com>
      Signed-off-by: NTony Lu <tonylu@linux.alibaba.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0c5a4dd6
    • A
      net: dsa: fix flow dissection on Tx path · a7d80e75
      Alexander Lobakin 提交于
      [ Upstream commit 8bef0af09a5415df761b04fa487a6c34acae74bc ]
      
      Commit 43e66528 ("net-next: dsa: fix flow dissection") added an
      ability to override protocol and network offset during flow dissection
      for DSA-enabled devices (i.e. controllers shipped as switch CPU ports)
      in order to fix skb hashing for RPS on Rx path.
      
      However, skb_hash() and added part of code can be invoked not only on
      Rx, but also on Tx path if we have a multi-queued device and:
       - kernel is running on UP system or
       - XPS is not configured.
      
      The call stack in this two cases will be like: dev_queue_xmit() ->
      __dev_queue_xmit() -> netdev_core_pick_tx() -> netdev_pick_tx() ->
      skb_tx_hash() -> skb_get_hash().
      
      The problem is that skbs queued for Tx have both network offset and
      correct protocol already set up even after inserting a CPU tag by DSA
      tagger, so calling tag_ops->flow_dissect() on this path actually only
      breaks flow dissection and hashing.
      
      This can be observed by adding debug prints just before and right after
      tag_ops->flow_dissect() call to the related block of code:
      
      Before the patch:
      
      Rx path (RPS):
      
      [   19.240001] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   19.244271] tag_ops->flow_dissect()
      [   19.247811] Rx: proto: 0x0800, nhoff: 8	/* ETH_P_IP */
      
      [   19.215435] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   19.219746] tag_ops->flow_dissect()
      [   19.223241] Rx: proto: 0x0806, nhoff: 8	/* ETH_P_ARP */
      
      [   18.654057] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   18.658332] tag_ops->flow_dissect()
      [   18.661826] Rx: proto: 0x8100, nhoff: 8	/* ETH_P_8021Q */
      
      Tx path (UP system):
      
      [   18.759560] Tx: proto: 0x0800, nhoff: 26	/* ETH_P_IP */
      [   18.763933] tag_ops->flow_dissect()
      [   18.767485] Tx: proto: 0x920b, nhoff: 34	/* junk */
      
      [   22.800020] Tx: proto: 0x0806, nhoff: 26	/* ETH_P_ARP */
      [   22.804392] tag_ops->flow_dissect()
      [   22.807921] Tx: proto: 0x920b, nhoff: 34	/* junk */
      
      [   16.898342] Tx: proto: 0x86dd, nhoff: 26	/* ETH_P_IPV6 */
      [   16.902705] tag_ops->flow_dissect()
      [   16.906227] Tx: proto: 0x920b, nhoff: 34	/* junk */
      
      After:
      
      Rx path (RPS):
      
      [   16.520993] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   16.525260] tag_ops->flow_dissect()
      [   16.528808] Rx: proto: 0x0800, nhoff: 8	/* ETH_P_IP */
      
      [   15.484807] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   15.490417] tag_ops->flow_dissect()
      [   15.495223] Rx: proto: 0x0806, nhoff: 8	/* ETH_P_ARP */
      
      [   17.134621] Rx: proto: 0x00f8, nhoff: 0	/* ETH_P_XDSA */
      [   17.138895] tag_ops->flow_dissect()
      [   17.142388] Rx: proto: 0x8100, nhoff: 8	/* ETH_P_8021Q */
      
      Tx path (UP system):
      
      [   15.499558] Tx: proto: 0x0800, nhoff: 26	/* ETH_P_IP */
      
      [   20.664689] Tx: proto: 0x0806, nhoff: 26	/* ETH_P_ARP */
      
      [   18.565782] Tx: proto: 0x86dd, nhoff: 26	/* ETH_P_IPV6 */
      
      In order to fix that we can add the check 'proto == htons(ETH_P_XDSA)'
      to prevent code from calling tag_ops->flow_dissect() on Tx.
      I also decided to initialize 'offset' variable so tagger callbacks can
      now safely leave it untouched without provoking a chaos.
      
      Fixes: 43e66528 ("net-next: dsa: fix flow dissection")
      Signed-off-by: NAlexander Lobakin <alobakin@dlink.ru>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a7d80e75
    • N
      net: bridge: deny dev_set_mac_address() when unregistering · bb168ebe
      Nikolay Aleksandrov 提交于
      [ Upstream commit c4b4c421857dc7b1cf0dccbd738472360ff2cd70 ]
      
      We have an interesting memory leak in the bridge when it is being
      unregistered and is a slave to a master device which would change the
      mac of its slaves on unregister (e.g. bond, team). This is a very
      unusual setup but we do end up leaking 1 fdb entry because
      dev_set_mac_address() would cause the bridge to insert the new mac address
      into its table after all fdbs are flushed, i.e. after dellink() on the
      bridge has finished and we call NETDEV_UNREGISTER the bond/team would
      release it and will call dev_set_mac_address() to restore its original
      address and that in turn will add an fdb in the bridge.
      One fix is to check for the bridge dev's reg_state in its
      ndo_set_mac_address callback and return an error if the bridge is not in
      NETREG_REGISTERED.
      
      Easy steps to reproduce:
       1. add bond in mode != A/B
       2. add any slave to the bond
       3. add bridge dev as a slave to the bond
       4. destroy the bridge device
      
      Trace:
       unreferenced object 0xffff888035c4d080 (size 128):
         comm "ip", pid 4068, jiffies 4296209429 (age 1413.753s)
         hex dump (first 32 bytes):
           41 1d c9 36 80 88 ff ff 00 00 00 00 00 00 00 00  A..6............
           d2 19 c9 5e 3f d7 00 00 00 00 00 00 00 00 00 00  ...^?...........
         backtrace:
           [<00000000ddb525dc>] kmem_cache_alloc+0x155/0x26f
           [<00000000633ff1e0>] fdb_create+0x21/0x486 [bridge]
           [<0000000092b17e9c>] fdb_insert+0x91/0xdc [bridge]
           [<00000000f2a0f0ff>] br_fdb_change_mac_address+0xb3/0x175 [bridge]
           [<000000001de02dbd>] br_stp_change_bridge_id+0xf/0xff [bridge]
           [<00000000ac0e32b1>] br_set_mac_address+0x76/0x99 [bridge]
           [<000000006846a77f>] dev_set_mac_address+0x63/0x9b
           [<00000000d30738fc>] __bond_release_one+0x3f6/0x455 [bonding]
           [<00000000fc7ec01d>] bond_netdev_event+0x2f2/0x400 [bonding]
           [<00000000305d7795>] notifier_call_chain+0x38/0x56
           [<0000000028885d4a>] call_netdevice_notifiers+0x1e/0x23
           [<000000008279477b>] rollback_registered_many+0x353/0x6a4
           [<0000000018ef753a>] unregister_netdevice_many+0x17/0x6f
           [<00000000ba854b7a>] rtnl_delete_link+0x3c/0x43
           [<00000000adf8618d>] rtnl_dellink+0x1dc/0x20a
           [<000000009b6395fd>] rtnetlink_rcv_msg+0x23d/0x268
      
      Fixes: 43598813 ("bridge: add local MAC address to forwarding table (v2)")
      Reported-by: syzbot+2add91c08eb181fea1bf@syzkaller.appspotmail.com
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bb168ebe
    • V
      mqprio: Fix out-of-bounds access in mqprio_dump · 588fac83
      Vladyslav Tarasiuk 提交于
      [ Upstream commit 9f104c7736904ac72385bbb48669e0c923ca879b ]
      
      When user runs a command like
      tc qdisc add dev eth1 root mqprio
      KASAN stack-out-of-bounds warning is emitted.
      Currently, NLA_ALIGN macro used in mqprio_dump provides too large
      buffer size as argument for nla_put and memcpy down the call stack.
      The flow looks like this:
      1. nla_put expects exact object size as an argument;
      2. Later it provides this size to memcpy;
      3. To calculate correct padding for SKB, nla_put applies NLA_ALIGN
         macro itself.
      
      Therefore, NLA_ALIGN should not be applied to the nla_put parameter.
      Otherwise it will lead to out-of-bounds memory access in memcpy.
      
      Fixes: 4e8b86c0 ("mqprio: Introduce new hardware offload mode and shaper in mqprio")
      Signed-off-by: NVladyslav Tarasiuk <vladyslavt@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      588fac83
    • E
      inet: protect against too small mtu values. · d80d67cd
      Eric Dumazet 提交于
      [ Upstream commit 501a90c945103e8627406763dac418f20f3837b2 ]
      
      syzbot was once again able to crash a host by setting a very small mtu
      on loopback device.
      
      Let's make inetdev_valid_mtu() available in include/net/ip.h,
      and use it in ip_setup_cork(), so that we protect both ip_append_page()
      and __ip_append_data()
      
      Also add a READ_ONCE() when the device mtu is read.
      
      Pairs this lockless read with one WRITE_ONCE() in __dev_set_mtu(),
      even if other code paths might write over this field.
      
      Add a big comment in include/linux/netdevice.h about dev->mtu
      needing READ_ONCE()/WRITE_ONCE() annotations.
      
      Hopefully we will add the missing ones in followup patches.
      
      [1]
      
      refcount_t: saturated; leaking memory.
      WARNING: CPU: 0 PID: 9464 at lib/refcount.c:22 refcount_warn_saturate+0x138/0x1f0 lib/refcount.c:22
      Kernel panic - not syncing: panic_on_warn set ...
      CPU: 0 PID: 9464 Comm: syz-executor850 Not tainted 5.4.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x197/0x210 lib/dump_stack.c:118
       panic+0x2e3/0x75c kernel/panic.c:221
       __warn.cold+0x2f/0x3e kernel/panic.c:582
       report_bug+0x289/0x300 lib/bug.c:195
       fixup_bug arch/x86/kernel/traps.c:174 [inline]
       fixup_bug arch/x86/kernel/traps.c:169 [inline]
       do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:267
       do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:286
       invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027
      RIP: 0010:refcount_warn_saturate+0x138/0x1f0 lib/refcount.c:22
      Code: 06 31 ff 89 de e8 c8 f5 e6 fd 84 db 0f 85 6f ff ff ff e8 7b f4 e6 fd 48 c7 c7 e0 71 4f 88 c6 05 56 a6 a4 06 01 e8 c7 a8 b7 fd <0f> 0b e9 50 ff ff ff e8 5c f4 e6 fd 0f b6 1d 3d a6 a4 06 31 ff 89
      RSP: 0018:ffff88809689f550 EFLAGS: 00010286
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffffffff815e4336 RDI: ffffed1012d13e9c
      RBP: ffff88809689f560 R08: ffff88809c50a3c0 R09: fffffbfff15d31b1
      R10: fffffbfff15d31b0 R11: ffffffff8ae98d87 R12: 0000000000000001
      R13: 0000000000040100 R14: ffff888099041104 R15: ffff888218d96e40
       refcount_add include/linux/refcount.h:193 [inline]
       skb_set_owner_w+0x2b6/0x410 net/core/sock.c:1999
       sock_wmalloc+0xf1/0x120 net/core/sock.c:2096
       ip_append_page+0x7ef/0x1190 net/ipv4/ip_output.c:1383
       udp_sendpage+0x1c7/0x480 net/ipv4/udp.c:1276
       inet_sendpage+0xdb/0x150 net/ipv4/af_inet.c:821
       kernel_sendpage+0x92/0xf0 net/socket.c:3794
       sock_sendpage+0x8b/0xc0 net/socket.c:936
       pipe_to_sendpage+0x2da/0x3c0 fs/splice.c:458
       splice_from_pipe_feed fs/splice.c:512 [inline]
       __splice_from_pipe+0x3ee/0x7c0 fs/splice.c:636
       splice_from_pipe+0x108/0x170 fs/splice.c:671
       generic_splice_sendpage+0x3c/0x50 fs/splice.c:842
       do_splice_from fs/splice.c:861 [inline]
       direct_splice_actor+0x123/0x190 fs/splice.c:1035
       splice_direct_to_actor+0x3b4/0xa30 fs/splice.c:990
       do_splice_direct+0x1da/0x2a0 fs/splice.c:1078
       do_sendfile+0x597/0xd00 fs/read_write.c:1464
       __do_sys_sendfile64 fs/read_write.c:1525 [inline]
       __se_sys_sendfile64 fs/read_write.c:1511 [inline]
       __x64_sys_sendfile64+0x1dd/0x220 fs/read_write.c:1511
       do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x441409
      Code: e8 ac e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fffb64c4f78 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441409
      RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000005
      RBP: 0000000000073b8a R08: 0000000000000010 R09: 0000000000000010
      R10: 0000000000010001 R11: 0000000000000246 R12: 0000000000402180
      R13: 0000000000402210 R14: 0000000000000000 R15: 0000000000000000
      Kernel Offset: disabled
      Rebooting in 86400 seconds..
      
      Fixes: 1470ddf7 ("inet: Remove explicit write references to sk/inet in ip_append_data")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d80d67cd
  3. 18 12月, 2019 4 次提交
    • P
      sunrpc: fix crash when cache_head become valid before update · 67256225
      Pavel Tikhomirov 提交于
      [ Upstream commit 5fcaf6982d1167f1cd9b264704f6d1ef4c505d54 ]
      
      I was investigating a crash in our Virtuozzo7 kernel which happened in
      in svcauth_unix_set_client. I found out that we access m_client field
      in ip_map structure, which was received from sunrpc_cache_lookup (we
      have a bit older kernel, now the code is in sunrpc_cache_add_entry), and
      these field looks uninitialized (m_client == 0x74 don't look like a
      pointer) but in the cache_head in flags we see 0x1 which is CACHE_VALID.
      
      It looks like the problem appeared from our previous fix to sunrpc (1):
      commit 4ecd55ea0742 ("sunrpc: fix cache_head leak due to queued
      request")
      
      And we've also found a patch already fixing our patch (2):
      commit d58431eacb22 ("sunrpc: don't mark uninitialised items as VALID.")
      
      Though the crash is eliminated, I think the core of the problem is not
      completely fixed:
      
      Neil in the patch (2) makes cache_head CACHE_NEGATIVE, before
      cache_fresh_locked which was added in (1) to fix crash. These way
      cache_is_valid won't say the cache is valid anymore and in
      svcauth_unix_set_client the function cache_check will return error
      instead of 0, and we don't count entry as initialized.
      
      But it looks like we need to remove cache_fresh_locked completely in
      sunrpc_cache_lookup:
      
      In (1) we've only wanted to make cache_fresh_unlocked->cache_dequeue so
      that cache_requests with no readers also release corresponding
      cache_head, to fix their leak.  We with Vasily were not sure if
      cache_fresh_locked and cache_fresh_unlocked should be used in pair or
      not, so we've guessed to use them in pair.
      
      Now we see that we don't want the CACHE_VALID bit set here by
      cache_fresh_locked, as "valid" means "initialized" and there is no
      initialization in sunrpc_cache_add_entry. Both expiry_time and
      last_refresh are not used in cache_fresh_unlocked code-path and also not
      required for the initial fix.
      
      So to conclude cache_fresh_locked was called by mistake, and we can just
      safely remove it instead of crutching it with CACHE_NEGATIVE. It looks
      ideologically better for me. Hope I don't miss something here.
      
      Here is our crash backtrace:
      [13108726.326291] BUG: unable to handle kernel NULL pointer dereference at 0000000000000074
      [13108726.326365] IP: [<ffffffffc01f79eb>] svcauth_unix_set_client+0x2ab/0x520 [sunrpc]
      [13108726.326448] PGD 0
      [13108726.326468] Oops: 0002 [#1] SMP
      [13108726.326497] Modules linked in: nbd isofs xfs loop kpatch_cumulative_81_0_r1(O) xt_physdev nfnetlink_queue bluetooth rfkill ip6table_nat nf_nat_ipv6 ip_vs_wrr ip_vs_wlc ip_vs_sh nf_conntrack_netlink ip_vs_sed ip_vs_pe_sip nf_conntrack_sip ip_vs_nq ip_vs_lc ip_vs_lblcr ip_vs_lblc ip_vs_ftp ip_vs_dh nf_nat_ftp nf_conntrack_ftp iptable_raw xt_recent nf_log_ipv6 xt_hl ip6t_rt nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_TCPMSS xt_tcpmss vxlan ip6_udp_tunnel udp_tunnel xt_statistic xt_NFLOG nfnetlink_log dummy xt_mark xt_REDIRECT nf_nat_redirect raw_diag udp_diag tcp_diag inet_diag netlink_diag af_packet_diag unix_diag rpcsec_gss_krb5 xt_addrtype ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 ebtable_nat ebtable_broute nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_mangle ip6table_raw nfsv4
      [13108726.327173]  dns_resolver cls_u32 binfmt_misc arptable_filter arp_tables ip6table_filter ip6_tables devlink fuse_kio_pcs ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat iptable_nat nf_nat_ipv4 xt_comment nf_conntrack_ipv4 nf_defrag_ipv4 xt_wdog_tmo xt_multiport bonding xt_set xt_conntrack iptable_filter iptable_mangle kpatch(O) ebtable_filter ebt_among ebtables ip_set_hash_ip ip_set nfnetlink vfat fat skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass fuse pcspkr ses enclosure joydev sg mei_me hpwdt hpilo lpc_ich mei ipmi_si shpchp ipmi_devintf ipmi_msghandler xt_ipvs acpi_power_meter ip_vs_rr nfsv3 nfsd auth_rpcgss nfs_acl nfs lockd grace fscache nf_nat cls_fw sch_htb sch_cbq sch_sfq ip_vs em_u32 nf_conntrack tun br_netfilter veth overlay ip6_vzprivnet ip6_vznetstat ip_vznetstat
      [13108726.327817]  ip_vzprivnet vziolimit vzevent vzlist vzstat vznetstat vznetdev vzmon vzdev bridge pio_kaio pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper scsi_transport_iscsi 8021q syscopyarea sysfillrect garp sysimgblt fb_sys_fops mrp stp ttm llc bnx2x crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel drm dm_multipath ghash_clmulni_intel uas aesni_intel lrw gf128mul glue_helper ablk_helper cryptd tg3 smartpqi scsi_transport_sas mdio libcrc32c i2c_core usb_storage ptp pps_core wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod [last unloaded: kpatch_cumulative_82_0_r1]
      [13108726.328403] CPU: 35 PID: 63742 Comm: nfsd ve: 51332 Kdump: loaded Tainted: G        W  O   ------------   3.10.0-862.20.2.vz7.73.29 #1 73.29
      [13108726.328491] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 10/02/2018
      [13108726.328554] task: ffffa0a6a41b1160 ti: ffffa0c2a74bc000 task.ti: ffffa0c2a74bc000
      [13108726.328610] RIP: 0010:[<ffffffffc01f79eb>]  [<ffffffffc01f79eb>] svcauth_unix_set_client+0x2ab/0x520 [sunrpc]
      [13108726.328706] RSP: 0018:ffffa0c2a74bfd80  EFLAGS: 00010246
      [13108726.328750] RAX: 0000000000000001 RBX: ffffa0a6183ae000 RCX: 0000000000000000
      [13108726.328811] RDX: 0000000000000074 RSI: 0000000000000286 RDI: ffffa0c2a74bfcf0
      [13108726.328864] RBP: ffffa0c2a74bfe00 R08: ffffa0bab8c22960 R09: 0000000000000001
      [13108726.328916] R10: 0000000000000001 R11: 0000000000000001 R12: ffffa0a32aa7f000
      [13108726.328969] R13: ffffa0a6183afac0 R14: ffffa0c233d88d00 R15: ffffa0c2a74bfdb4
      [13108726.329022] FS:  0000000000000000(0000) GS:ffffa0e17f9c0000(0000) knlGS:0000000000000000
      [13108726.329081] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [13108726.332311] CR2: 0000000000000074 CR3: 00000026a1b28000 CR4: 00000000007607e0
      [13108726.334606] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [13108726.336754] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [13108726.338908] PKRU: 00000000
      [13108726.341047] Call Trace:
      [13108726.343074]  [<ffffffff8a2c78b4>] ? groups_alloc+0x34/0x110
      [13108726.344837]  [<ffffffffc01f5eb4>] svc_set_client+0x24/0x30 [sunrpc]
      [13108726.346631]  [<ffffffffc01f2ac1>] svc_process_common+0x241/0x710 [sunrpc]
      [13108726.348332]  [<ffffffffc01f3093>] svc_process+0x103/0x190 [sunrpc]
      [13108726.350016]  [<ffffffffc07d605f>] nfsd+0xdf/0x150 [nfsd]
      [13108726.351735]  [<ffffffffc07d5f80>] ? nfsd_destroy+0x80/0x80 [nfsd]
      [13108726.353459]  [<ffffffff8a2bf741>] kthread+0xd1/0xe0
      [13108726.355195]  [<ffffffff8a2bf670>] ? create_kthread+0x60/0x60
      [13108726.356896]  [<ffffffff8a9556dd>] ret_from_fork_nospec_begin+0x7/0x21
      [13108726.358577]  [<ffffffff8a2bf670>] ? create_kthread+0x60/0x60
      [13108726.360240] Code: 4c 8b 45 98 0f 8e 2e 01 00 00 83 f8 fe 0f 84 76 fe ff ff 85 c0 0f 85 2b 01 00 00 49 8b 50 40 b8 01 00 00 00 48 89 93 d0 1a 00 00 <f0> 0f c1 02 83 c0 01 83 f8 01 0f 8e 53 02 00 00 49 8b 44 24 38
      [13108726.363769] RIP  [<ffffffffc01f79eb>] svcauth_unix_set_client+0x2ab/0x520 [sunrpc]
      [13108726.365530]  RSP <ffffa0c2a74bfd80>
      [13108726.367179] CR2: 0000000000000074
      
      Fixes: d58431eacb22 ("sunrpc: don't mark uninitialised items as VALID.")
      Signed-off-by: NPavel Tikhomirov <ptikhomirov@virtuozzo.com>
      Acked-by: NNeilBrown <neilb@suse.de>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      67256225
    • C
      gre: refetch erspan header from skb->data after pskb_may_pull() · f7654ebe
      Cong Wang 提交于
      [ Upstream commit 0e4940928c26527ce8f97237fef4c8a91cd34207 ]
      
      After pskb_may_pull() we should always refetch the header
      pointers from the skb->data in case it got reallocated.
      
      In gre_parse_header(), the erspan header is still fetched
      from the 'options' pointer which is fetched before
      pskb_may_pull().
      
      Found this during code review of a KMSAN bug report.
      
      Fixes: cb73ee40b1b3 ("net: ip_gre: use erspan key field for tunnel lookup")
      Cc: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: NLorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Acked-by: NWilliam Tu <u9012063@gmail.com>
      Reviewed-by: NSimon Horman <simon.horman@netronome.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      f7654ebe
    • K
      net/smc: do not wait under send_lock · 82060d93
      Karsten Graul 提交于
      [ Upstream commit 33f3fcc290671590821ff3c0c9396db1ec9b7d4c ]
      
      smc_cdc_get_free_slot() might wait for free transfer buffers when using
      SMC-R. This wait should not be done under the send_lock, which is a
      spin_lock. This fixes a cpu loop in parallel threads waiting for the
      send_lock.
      Signed-off-by: NKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: NUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      82060d93
    • T
      sch_cake: Correctly update parent qlen when splitting GSO packets · 146f563f
      Toke Høiland-Jørgensen 提交于
      [ Upstream commit 8c6c37fdc20ec9ffaa342f827a8e20afe736fb0c ]
      
      To ensure parent qdiscs have the same notion of the number of enqueued
      packets even after splitting a GSO packet, update the qdisc tree with the
      number of packets that was added due to the split.
      Reported-by: NPete Heist <pete@heistp.net>
      Tested-by: NPete Heist <pete@heistp.net>
      Signed-off-by: NToke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      146f563f
  4. 13 12月, 2019 19 次提交
  5. 05 12月, 2019 5 次提交