1. 15 Jan 2020: 11 commits
    • xprtrdma: Eliminate per-transport "max pages" · 18d065a5
      Committed by Chuck Lever
      To support device hotplug and migrating a connection between devices
      of different capabilities, we have to guarantee that all in-kernel
      devices can support the same max NFS payload size (1 megabyte).
      
      This means that possibly one or two in-tree devices are no longer
      supported for NFS/RDMA because they cannot support 1MB rsize/wsize.
      The only one I confirmed was cxgb3, but it has already been removed
      from the kernel.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Refactor initialization of ep->rep_max_requests · 7581d901
      Committed by Chuck Lever
      Clean up: there is no need to keep two copies of the same value.
      Also, in subsequent patches, rpcrdma_ep_create() will be called in
      the connect worker rather than at set-up time.
      
      Minor fix: Initialize the transport's sendctx to the value based on
      the capabilities of the underlying device, not the maximum setting.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Make sendctx queue lifetime the same as connection lifetime · cb586dec
      Committed by Chuck Lever
      The size of the sendctx queue depends on the value stored in
      ia->ri_max_send_sges. This value is determined by querying the
      underlying device.
      
      Eventually, rpcrdma_ia_open() and rpcrdma_ep_create() will be called
      in the connect worker rather than at transport set-up time. The
      underlying device will not yet have been chosen at set-up time.
      
      The sendctx queue will thus have to be created after the underlying
      device has been chosen via address and route resolution; in other
      words, in the connect worker.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Eliminate ri_max_send_sges · 2e870368
      Committed by Chuck Lever
      Clean-up. The max_send_sge value also happens to be stored in
      ep->rep_attr. Let's keep just a single copy.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • SUNRPC: constify copied structure · c2bd2c0a
      Committed by Julia Lawall
      The empty_iov structure is only copied into another structure,
      so make it const.
      
      The opportunity for this change was found using Coccinelle.
      Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • SUNRPC: call_connect_status should handle -EPROTO · b8457606
      Committed by Chuck Lever
      The xprtrdma connect logic can return -EPROTO if the underlying
      device or network path does not support RDMA. This can happen
      after a device removal/insertion.
      
      - When SOFTCONN is set, EPROTO is a permanent error.
      
      - When SOFTCONN is not set, EPROTO is treated as a temporary error.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
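      The following is a minimal C sketch of the decision described above. It
      uses hypothetical names (enum connect_outcome, classify_connect_status())
      and models only the -EPROTO case. It is not the actual
      call_connect_status() code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcomes; this is not the real RPC client state machine. */
enum connect_outcome {
	CONNECT_OK,
	CONNECT_RETRY,     /* temporary error: try to connect again */
	CONNECT_FAIL_TASK, /* permanent error: fail the RPC task */
	CONNECT_OTHER,     /* any other status: not modeled by this sketch */
};

/* Classify a connect status the way the commit message describes for -EPROTO. */
static enum connect_outcome classify_connect_status(int status, bool softconn)
{
	if (status == 0)
		return CONNECT_OK;
	if (status == -EPROTO)
		/* SOFTCONN tasks fail rather than keep retrying the connection. */
		return softconn ? CONNECT_FAIL_TASK : CONNECT_RETRY;
	return CONNECT_OTHER;
}
```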
    • SUNRPC: Capture signalled RPC tasks · abf8af78
      Committed by Chuck Lever
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • sunrpc: convert to time64_t for expiry · 52879b46
      Committed by Arnd Bergmann
      Using signed 32-bit types for UTC time leads to the y2038 overflow,
      which is what happens in the sunrpc code at the moment.
      
      This changes the sunrpc code over to use time64_t where possible.
      The one exception is the gss_import_v{1,2}_context() function for
      kerberos5, which uses 32-bit timestamps in the protocol. Here,
      we can at least treat the numbers as 'unsigned', which extends the
      range from 2038 to 2106.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
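      The 2038 and 2106 limits can be checked with a small userspace C sketch.
      utc_year() is a hypothetical helper. The sketch assumes POSIX gmtime_r()
      and a 64-bit time_t on the host.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Return the UTC calendar year for a count of seconds since the Unix epoch,
 * or -1 if the value cannot be converted. */
static int utc_year(int64_t seconds)
{
	time_t t = (time_t)seconds; /* assumes a 64-bit time_t */
	struct tm tm;

	if (gmtime_r(&t, &tm) == NULL)
		return -1;
	return tm.tm_year + 1900;
}
```

      utc_year(INT32_MAX) falls in 2038, while utc_year(UINT32_MAX) falls in
      2106. The next signed value after INT32_MAX wraps to INT32_MIN, which
      maps back to 1901.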
    • xprtrdma: Fix oops in Receive handler after device removal · 671c450b
      Committed by Chuck Lever
      Since v5.4, a device removal occasionally triggered this oops:
      
      Dec  2 17:13:53 manet kernel: BUG: unable to handle page fault for address: 0000000c00000219
      Dec  2 17:13:53 manet kernel: #PF: supervisor read access in kernel mode
      Dec  2 17:13:53 manet kernel: #PF: error_code(0x0000) - not-present page
      Dec  2 17:13:53 manet kernel: PGD 0 P4D 0
      Dec  2 17:13:53 manet kernel: Oops: 0000 [#1] SMP
      Dec  2 17:13:53 manet kernel: CPU: 2 PID: 468 Comm: kworker/2:1H Tainted: G        W         5.4.0-00050-g53717e43af61 #883
      Dec  2 17:13:53 manet kernel: Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015
      Dec  2 17:13:53 manet kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
      Dec  2 17:13:53 manet kernel: RIP: 0010:rpcrdma_wc_receive+0x7c/0xf6 [rpcrdma]
      Dec  2 17:13:53 manet kernel: Code: 6d 8b 43 14 89 c1 89 45 78 48 89 4d 40 8b 43 2c 89 45 14 8b 43 20 89 45 18 48 8b 45 20 8b 53 14 48 8b 30 48 8b 40 10 48 8b 38 <48> 8b 87 18 02 00 00 48 85 c0 75 18 48 8b 05 1e 24 c4 e1 48 85 c0
      Dec  2 17:13:53 manet kernel: RSP: 0018:ffffc900035dfe00 EFLAGS: 00010246
      Dec  2 17:13:53 manet kernel: RAX: ffff888467290000 RBX: ffff88846c638400 RCX: 0000000000000048
      Dec  2 17:13:53 manet kernel: RDX: 0000000000000048 RSI: 00000000f942e000 RDI: 0000000c00000001
      Dec  2 17:13:53 manet kernel: RBP: ffff888467611b00 R08: ffff888464e4a3c4 R09: 0000000000000000
      Dec  2 17:13:53 manet kernel: R10: ffffc900035dfc88 R11: fefefefefefefeff R12: ffff888865af4428
      Dec  2 17:13:53 manet kernel: R13: ffff888466023000 R14: ffff88846c63f000 R15: 0000000000000010
      Dec  2 17:13:53 manet kernel: FS:  0000000000000000(0000) GS:ffff88846fa80000(0000) knlGS:0000000000000000
      Dec  2 17:13:53 manet kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      Dec  2 17:13:53 manet kernel: CR2: 0000000c00000219 CR3: 0000000002009002 CR4: 00000000001606e0
      Dec  2 17:13:53 manet kernel: Call Trace:
      Dec  2 17:13:53 manet kernel: __ib_process_cq+0x5c/0x14e [ib_core]
      Dec  2 17:13:53 manet kernel: ib_cq_poll_work+0x26/0x70 [ib_core]
      Dec  2 17:13:53 manet kernel: process_one_work+0x19d/0x2cd
      Dec  2 17:13:53 manet kernel: ? cancel_delayed_work_sync+0xf/0xf
      Dec  2 17:13:53 manet kernel: worker_thread+0x1a6/0x25a
      Dec  2 17:13:53 manet kernel: ? cancel_delayed_work_sync+0xf/0xf
      Dec  2 17:13:53 manet kernel: kthread+0xf4/0xf9
      Dec  2 17:13:53 manet kernel: ? kthread_queue_delayed_work+0x74/0x74
      Dec  2 17:13:53 manet kernel: ret_from_fork+0x24/0x30
      
      The proximal cause is that this rpcrdma_rep has a rr_rdmabuf that
      is still pointing to the old ib_device, which has been freed. The
      only way that is possible is if this rpcrdma_rep was not destroyed
      by rpcrdma_ia_remove.
      
      Debugging showed that was indeed the case: this rpcrdma_rep was
      still in use by a completing RPC at the time of the device removal,
      and thus wasn't on the rep free list. So, it was not found by
      rpcrdma_reps_destroy().
      
      The fix is to introduce a list of all rpcrdma_reps so that they all
      can be found when a device is removed. That list is used to perform
      only regbuf DMA unmapping, replacing that call to
      rpcrdma_reps_destroy().
      
      Meanwhile, to prevent corruption of this list, I've moved the
      destruction of temp rpcrdma_rep objects to rpcrdma_post_recvs().
      rpcrdma_xprt_drain() ensures that post_recvs (and thus rep_destroy) is
      not invoked while rpcrdma_reps_unmap is walking rb_all_reps, thus
      protecting the rb_all_reps list.
      
      Fixes: b0b227f0 ("xprtrdma: Use an llist to manage free rpcrdma_reps")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
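      A toy C model shows the difference between walking only the free list
      and walking a list of all reps. The types and functions below are
      hypothetical stand-ins for rpcrdma_rep, rb_all_reps and
      rpcrdma_reps_unmap(). They are not the xprtrdma code, and they ignore
      locking and llist details.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct rpcrdma_rep; field names are illustrative only. */
struct rep {
	struct rep *all_next;  /* link on the list of every rep ever created */
	struct rep *free_next; /* link on the free list (meaningful only while idle) */
	bool mapped;           /* stands in for "regbuf is DMA-mapped" */
};

struct buf {
	struct rep *all_reps;  /* analogous to rb_all_reps */
	struct rep *free_reps; /* analogous to the free rep list */
};

static void rep_create(struct buf *b, struct rep *r)
{
	r->mapped = true;
	r->all_next = b->all_reps;   /* every rep goes on the all-reps list... */
	b->all_reps = r;
	r->free_next = b->free_reps; /* ...and starts out idle on the free list */
	b->free_reps = r;
}

/* Take a rep off the free list, e.g. while a completing RPC is using it. */
static struct rep *rep_get(struct buf *b)
{
	struct rep *r = b->free_reps;

	if (r)
		b->free_reps = r->free_next;
	return r;
}

/* Old approach: only idle reps are visited, so busy ones stay mapped. */
static void unmap_free_reps(struct buf *b)
{
	for (struct rep *r = b->free_reps; r; r = r->free_next)
		r->mapped = false;
}

/* New approach: the all-reps list also reaches reps that are still in use. */
static void unmap_all_reps(struct buf *b)
{
	for (struct rep *r = b->all_reps; r; r = r->all_next)
		r->mapped = false;
}

static int count_mapped(const struct buf *b)
{
	int n = 0;

	for (const struct rep *r = b->all_reps; r; r = r->all_next)
		n += r->mapped;
	return n;
}
```

      Suppose one rep is taken with rep_get() before the "device removal".
      unmap_free_reps() then leaves that busy rep mapped, which is the case the
      commit describes. unmap_all_reps() reaches it.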
    • xprtrdma: Fix completion wait during device removal · 13cb886c
      Committed by Chuck Lever
      I've found that on occasion, "rmmod <dev>" will hang while an NFS
      mount is under load.
      
      Ensure that ri_remove_done is initialized only just before the
      transport is woken up to force a close. This avoids the completion
      possibly getting initialized again while the CM event handler is
      waiting for a wake-up.
      
      Fixes: bebd0318 ("xprtrdma: Support unplugging an HCA from under an NFS mount")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Fix create_qp crash on device unload · b32b9ed4
      Committed by Chuck Lever
      On device re-insertion, the RDMA device driver crashes trying to set
      up a new QP:
      
      Nov 27 16:32:06 manet kernel: BUG: kernel NULL pointer dereference, address: 00000000000001c0
      Nov 27 16:32:06 manet kernel: #PF: supervisor write access in kernel mode
      Nov 27 16:32:06 manet kernel: #PF: error_code(0x0002) - not-present page
      Nov 27 16:32:06 manet kernel: PGD 0 P4D 0
      Nov 27 16:32:06 manet kernel: Oops: 0002 [#1] SMP
      Nov 27 16:32:06 manet kernel: CPU: 1 PID: 345 Comm: kworker/u28:0 Tainted: G        W         5.4.0 #852
      Nov 27 16:32:06 manet kernel: Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015
      Nov 27 16:32:06 manet kernel: Workqueue: xprtiod xprt_rdma_connect_worker [rpcrdma]
      Nov 27 16:32:06 manet kernel: RIP: 0010:atomic_try_cmpxchg+0x2/0x12
      Nov 27 16:32:06 manet kernel: Code: ff ff 48 8b 04 24 5a c3 c6 07 00 0f 1f 40 00 c3 31 c0 48 81 ff 08 09 68 81 72 0c 31 c0 48 81 ff 83 0c 68 81 0f 92 c0 c3 8b 06 <f0> 0f b1 17 0f 94 c2 84 d2 75 02 89 06 88 d0 c3 53 ba 01 00 00 00
      Nov 27 16:32:06 manet kernel: RSP: 0018:ffffc900035abbf0 EFLAGS: 00010046
      Nov 27 16:32:06 manet kernel: RAX: 0000000000000000 RBX: 00000000000001c0 RCX: 0000000000000000
      Nov 27 16:32:06 manet kernel: RDX: 0000000000000001 RSI: ffffc900035abbfc RDI: 00000000000001c0
      Nov 27 16:32:06 manet kernel: RBP: ffffc900035abde0 R08: 000000000000000e R09: ffffffffffffc000
      Nov 27 16:32:06 manet kernel: R10: 0000000000000000 R11: 000000000002e800 R12: ffff88886169d9f8
      Nov 27 16:32:06 manet kernel: R13: ffff88886169d9f4 R14: 0000000000000246 R15: 0000000000000000
      Nov 27 16:32:06 manet kernel: FS:  0000000000000000(0000) GS:ffff88846fa40000(0000) knlGS:0000000000000000
      Nov 27 16:32:06 manet kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      Nov 27 16:32:06 manet kernel: CR2: 00000000000001c0 CR3: 0000000002009006 CR4: 00000000001606e0
      Nov 27 16:32:06 manet kernel: Call Trace:
      Nov 27 16:32:06 manet kernel: do_raw_spin_lock+0x2f/0x5a
      Nov 27 16:32:06 manet kernel: create_qp_common.isra.47+0x856/0xadf [mlx4_ib]
      Nov 27 16:32:06 manet kernel: ? slab_post_alloc_hook.isra.60+0xa/0x1a
      Nov 27 16:32:06 manet kernel: ? __kmalloc+0x125/0x139
      Nov 27 16:32:06 manet kernel: mlx4_ib_create_qp+0x57f/0x972 [mlx4_ib]
      
      The fix is to copy the qp_init_attr struct that was just created by
      rpcrdma_ep_create() instead of using the one from the previous
      connection instance.
      
      Fixes: 98ef77d1 ("xprtrdma: Send Queue size grows after a reconnect")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
  2. 09 Jan 2020: 8 commits
    • tipc: fix wrong connect() return code · 9546a0b7
      Committed by Tuong Lien
      The current 'tipc_wait_for_connect()' function does a wait-loop for the
      condition 'sk->sk_state != TIPC_CONNECTING' to determine whether the
      socket connection attempt has completed. However, when the condition is
      met, it returns '0' even if the connection attempt actually failed and
      the socket state was set to 'TIPC_DISCONNECTING' (e.g. when the server
      socket has closed). This results in a wrong return code for the user's
      'connect()' call. The user then believes that the connection is
      established and goes ahead with building and sending messages, which
      finally fail with e.g. '-EPIPE'.
      
      This commit fixes the issue by changing the wait condition to the
      'tipc_sk_connected(sk)', so the function will return '0' only when the
      connection is really established. Otherwise, either the socket 'sk_err'
      if any or '-ETIMEDOUT'/'-EINTR' will be returned correspondingly.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
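      A toy C model of the two wait conditions follows. The state names echo
      TIPC's, but the struct and functions are hypothetical. The model covers
      only the result computed after the wait ends, not the waiting itself.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy socket states; this is not the kernel's TIPC code. */
enum toy_state { TOY_CONNECTING, TOY_ESTABLISHED, TOY_DISCONNECTING };

struct toy_sock {
	enum toy_state state;
	int sk_err; /* pending socket error (positive errno), 0 if none */
};

/* Buggy condition: "no longer connecting" is reported as success. */
static int wait_result_old(const struct toy_sock *sk)
{
	return sk->state != TOY_CONNECTING ? 0 : -ETIMEDOUT;
}

/* Fixed condition: succeed only when actually connected; otherwise report the
 * pending socket error if there is one, else the timeout/interrupt code. */
static int wait_result_new(const struct toy_sock *sk, bool interrupted)
{
	if (sk->state == TOY_ESTABLISHED)
		return 0;
	if (sk->sk_err)
		return -sk->sk_err;
	return interrupted ? -EINTR : -ETIMEDOUT;
}
```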
    • tipc: fix link overflow issue at socket shutdown · 49afb806
      Committed by Tuong Lien
      When a socket is suddenly shut down or released, it rejects all the
      unreceived messages in its receive queue. This applies to a connected
      socket too, even though only one 'FIN' message needs to be sent
      back to its peer in this case.
      
      If there are many messages in the queue and/or several connections
      with such messages are shut down at the same time, the link layer can
      easily overflow at the 'TIPC_SYSTEM_IMPORTANCE' backlog level
      because of the message rejections. As a result, the link is taken
      down. Moreover, as soon as the link is re-established, the socket
      layer can continue to reject the messages and the same issue recurs.
      
      This commit refactors the '__tipc_shutdown()' function to send only one
      'FIN' in the situation described above. For the connectionless case this
      is unavoidable, but usually there are no rejections for such socket
      messages because they are 'dest-droppable' by default.
      
      In addition, the new code handles the other socket states explicitly
      (e.g. 'TIPC_LISTEN') as separate cases to avoid misbehavior.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: ipset: avoid null deref when IPSET_ATTR_LINENO is present · 22dad713
      Committed by Florian Westphal
      The set uadt functions assume lineno is never NULL, but it is NULL
      when they are called from ip_set_utest().
      
      syzkaller managed to generate a netlink message that calls this with
      LINENO attr present:
      
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      RIP: 0010:hash_mac4_uadt+0x1bc/0x470 net/netfilter/ipset/ip_set_hash_mac.c:104
      Call Trace:
       ip_set_utest+0x55b/0x890 net/netfilter/ipset/ip_set_core.c:1867
       nfnetlink_rcv_msg+0xcf2/0xfb0 net/netfilter/nfnetlink.c:229
       netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
       nfnetlink_rcv+0x1ba/0x460 net/netfilter/nfnetlink.c:563
      
      Pass dummy lineno storage instead; it's easier than patching all set
      implementations.
      
      This seems to be a day-0 bug.
      
      Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Reported-by: syzbot+34bd2369d38707f3f4a7@syzkaller.appspotmail.com
      Fixes: a7b4f989 ("netfilter: ipset: IP set core support")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
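      The "dummy storage" approach can be sketched as follows. toy_uadt() and
      toy_utest() are hypothetical stand-ins for a set type's uadt handler and
      for ip_set_utest(). They do not reproduce the real ipset signatures.

```c
#include <stdint.h>

/* Like the real handlers, this assumes lineno is a valid pointer whenever
 * the LINENO attribute is present. */
static int toy_uadt(int has_lineno_attr, uint32_t attr_value, uint32_t *lineno)
{
	if (has_lineno_attr)
		*lineno = attr_value; /* would crash if lineno were NULL */
	return 0;
}

/* Test path: hand the handler dummy storage instead of NULL, rather than
 * adding NULL checks to every set implementation. */
static int toy_utest(int has_lineno_attr, uint32_t attr_value)
{
	uint32_t dummy_lineno = 0; /* written but never reported back */

	return toy_uadt(has_lineno_attr, attr_value, &dummy_lineno);
}
```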
    • netfilter: conntrack: dccp, sctp: handle null timeout argument · 1d9a7acd
      Committed by Florian Westphal
      The timeout pointer can be NULL, which means we should modify the
      per-netns timeouts instead.
      
      All protocols do this except sctp and dccp, which instead give:
      
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      net/netfilter/nf_conntrack_proto_dccp.c:682
       ctnl_timeout_parse_policy+0x150/0x1d0 net/netfilter/nfnetlink_cttimeout.c:67
       cttimeout_default_set+0x150/0x1c0 net/netfilter/nfnetlink_cttimeout.c:368
       nfnetlink_rcv_msg+0xcf2/0xfb0 net/netfilter/nfnetlink.c:229
       netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
      
      Reported-by: syzbot+46a4ad33f345d1dd346e@syzkaller.appspotmail.com
      Fixes: c779e849 ("netfilter: conntrack: remove get_timeout() indirection")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
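      The following is a minimal sketch of the NULL handling that the other
      protocols already do. struct toy_net and toy_timeout_set() are
      hypothetical and do not reflect the real nf_conntrack data layout.

```c
#include <stddef.h>

#define TOY_NUM_STATES 3

/* Hypothetical per-netns default timeouts. */
struct toy_net {
	unsigned int timeouts[TOY_NUM_STATES];
};

/* When no explicit timeout array is supplied (NULL), update the per-netns
 * defaults instead of dereferencing the NULL pointer. */
static void toy_timeout_set(struct toy_net *net, unsigned int *timeouts,
			    unsigned int state, unsigned int value)
{
	if (timeouts == NULL)
		timeouts = net->timeouts;
	if (state < TOY_NUM_STATES)
		timeouts[state] = value;
}
```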
    • net: sch_prio: When ungrafting, replace with FIFO · 240ce7f6
      Committed by Petr Machata
      When a child Qdisc is removed from one of the PRIO Qdisc's bands, it is
      replaced unconditionally by a NOOP qdisc. As a result, any traffic hitting
      that band gets dropped. That is incorrect--no Qdisc was explicitly added
      when PRIO was created, and after removal, none should have to be added
      either.
      
      Fix PRIO by first attempting to create a default Qdisc and only falling
      back to noop when that fails. This pattern of attempting to create an
      invisible FIFO, using NOOP only as a fallback, is also seen in other
      Qdiscs.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
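      The create-then-fall-back pattern can be sketched as follows. The toy
      qdisc type and allocator are hypothetical. The real code works with
      struct Qdisc and the kernel's qdisc helpers, which are not modeled here.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_qdisc { const char *kind; };

static struct toy_qdisc toy_fifo = { "pfifo" };
static struct toy_qdisc toy_noop = { "noop" };

/* Hypothetical allocator that can fail (e.g. out of memory). */
static struct toy_qdisc *toy_create_default(bool fail)
{
	return fail ? NULL : &toy_fifo;
}

/* Ungraft path: prefer an invisible default FIFO so the band keeps passing
 * traffic; fall back to noop (which drops everything) only on failure. */
static struct toy_qdisc *toy_ungraft_replacement(bool alloc_fails)
{
	struct toy_qdisc *q = toy_create_default(alloc_fails);

	return q ? q : &toy_noop;
}
```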
    • pkt_sched: fq: do not accept silly TCA_FQ_QUANTUM · d9e15a27
      Committed by Eric Dumazet
      As diagnosed by Florian:
      
      If TCA_FQ_QUANTUM is set to 0x80000000, fq_dequeue()
      can loop forever in:
      
      if (f->credit <= 0) {
        f->credit += q->quantum;
        goto begin;
      }
      
      ... because f->credit is either 0 or -2147483648.
      
      Let's limit TCA_FQ_QUANTUM to no more than 1 << 20:
      This max value should limit risks of breaking user setups
      while fixing this bug.
      
      Fixes: afe4fd06 ("pkt_sched: fq: Fair Queue packet scheduler")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Diagnosed-by: Florian Westphal <fw@strlen.de>
      Reported-by: syzbot+dc9071cc5a85950bdfce@syzkaller.appspotmail.com
      Signed-off-by: David S. Miller <davem@davemloft.net>
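      The endless refill loop and the proposed bound can be reproduced in
      userspace C. This models only the credit arithmetic (credit as int,
      quantum as u32), and the helper names are hypothetical. Rejecting 0 is
      an assumption of this sketch; the commit message does not state it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns how many refills were needed before credit became positive, or -1
 * if it was still not positive after max_iter refills. */
static int refills_until_positive(uint32_t quantum, int max_iter)
{
	int credit = 0;

	for (int i = 1; i <= max_iter; i++) {
		/* Same arithmetic as "f->credit += q->quantum": the sum is computed
		 * in unsigned 32-bit math and converted back to int, which wraps
		 * modulo 2^32 with GCC and Clang. */
		credit = (int)((uint32_t)credit + quantum);
		if (credit > 0)
			return i;
	}
	return -1;
}

/* Validation in the spirit of the fix: accept at most 1 << 20. */
static bool quantum_is_valid(uint32_t quantum)
{
	return quantum > 0 && quantum <= (1U << 20);
}
```

      With a quantum of 0x80000000, credit alternates between -2147483648 and
      0, so it never becomes positive. A typical quantum such as 1514 gets
      there after one refill.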
    • tipc: remove meaningless assignment in Makefile · b969fee1
      Committed by Masahiro Yamada
      There is no module named tipc_diag.
      
      The assignment to tipc_diag-y has no effect.
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: do not add socket.o to tipc-y twice · ea04b445
      Committed by Masahiro Yamada
      net/tipc/Makefile adds socket.o twice.
      
      tipc-y	+= addr.o bcast.o bearer.o \
                 core.o link.o discover.o msg.o  \
                 name_distr.o  subscr.o monitor.o name_table.o net.o  \
                 netlink.o netlink_compat.o node.o socket.o eth_media.o \
                                                   ^^^^^^^^
                 topsrv.o socket.o group.o trace.o
                          ^^^^^^^^
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 08 Jan 2020: 2 commits
    • vlan: vlan_changelink() should propagate errors · eb8ef2a3
      Committed by Eric Dumazet
      Both vlan_dev_change_flags() and vlan_dev_set_egress_priority()
      can return an error. vlan_changelink() should not ignore them.
      
      Fixes: 07b5b17e ("[VLAN]: Use rtnl_link API")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
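      The following sketches the error propagation the commit asks for. It
      uses hypothetical stand-ins for vlan_dev_change_flags() and
      vlan_dev_set_egress_priority(); each returns 0 or a negative errno.

```c
#include <errno.h>

/* Hypothetical helpers that can fail on request. */
static int toy_change_flags(int fail)    { return fail ? -EINVAL : 0; }
static int toy_set_egress_prio(int fail) { return fail ? -ENOMEM : 0; }

/* A changelink that returns the first error instead of ignoring it. */
static int toy_changelink(int flags_fail, int prio_fail)
{
	int err;

	err = toy_change_flags(flags_fail);
	if (err)
		return err;

	err = toy_set_egress_prio(prio_fail);
	if (err)
		return err;

	return 0;
}
```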
    • vlan: fix memory leak in vlan_dev_set_egress_priority · 9bbd917e
      Committed by Eric Dumazet
      There are a few cases where the ndo_uninit() handler might not be
      called if an error happens while the device is being initialized.
      
      Since vlan_newlink() calls vlan_changelink() before
      trying to register the netdevice, we need to make sure
      vlan_dev_uninit() has been called at least once,
      or we might leak allocated memory.
      
      BUG: memory leak
      unreferenced object 0xffff888122a206c0 (size 32):
        comm "syz-executor511", pid 7124, jiffies 4294950399 (age 32.240s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 61 73 00 00 00 00 00 00 00 00  ......as........
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<000000000eb3bb85>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline]
          [<000000000eb3bb85>] slab_post_alloc_hook mm/slab.h:586 [inline]
          [<000000000eb3bb85>] slab_alloc mm/slab.c:3320 [inline]
          [<000000000eb3bb85>] kmem_cache_alloc_trace+0x145/0x2c0 mm/slab.c:3549
          [<000000007b99f620>] kmalloc include/linux/slab.h:556 [inline]
          [<000000007b99f620>] vlan_dev_set_egress_priority+0xcc/0x150 net/8021q/vlan_dev.c:194
          [<000000007b0cb745>] vlan_changelink+0xd6/0x140 net/8021q/vlan_netlink.c:126
          [<0000000065aba83a>] vlan_newlink+0x135/0x200 net/8021q/vlan_netlink.c:181
          [<00000000fb5dd7a2>] __rtnl_newlink+0x89a/0xb80 net/core/rtnetlink.c:3305
          [<00000000ae4273a1>] rtnl_newlink+0x4e/0x80 net/core/rtnetlink.c:3363
          [<00000000decab39f>] rtnetlink_rcv_msg+0x178/0x4b0 net/core/rtnetlink.c:5424
          [<00000000accba4ee>] netlink_rcv_skb+0x61/0x170 net/netlink/af_netlink.c:2477
          [<00000000319fe20f>] rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
          [<00000000d51938dc>] netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
          [<00000000d51938dc>] netlink_unicast+0x223/0x310 net/netlink/af_netlink.c:1328
          [<00000000e539ac79>] netlink_sendmsg+0x2c0/0x570 net/netlink/af_netlink.c:1917
          [<000000006250c27e>] sock_sendmsg_nosec net/socket.c:639 [inline]
          [<000000006250c27e>] sock_sendmsg+0x54/0x70 net/socket.c:659
          [<00000000e2a156d1>] ____sys_sendmsg+0x2d0/0x300 net/socket.c:2330
          [<000000008c87466e>] ___sys_sendmsg+0x8a/0xd0 net/socket.c:2384
          [<00000000110e3054>] __sys_sendmsg+0x80/0xf0 net/socket.c:2417
          [<00000000d71077c8>] __do_sys_sendmsg net/socket.c:2426 [inline]
          [<00000000d71077c8>] __se_sys_sendmsg net/socket.c:2424 [inline]
          [<00000000d71077c8>] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2424
      
      Fixes: 07b5b17e ("[VLAN]: Use rtnl_link API")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 07 Jan 2020: 2 commits
    • sctp: free cmd->obj.chunk for the unprocessed SCTP_CMD_REPLY · be7a7729
      Committed by Xin Long
      This patch fixes a memory leak caused by cmd->obj.chunk never being
      freed for an unprocessed SCTP_CMD_REPLY. The issue occurs when
      processing a cmd fails while there are still SCTP_CMD_REPLY cmds on
      the cmd seq with an allocated chunk in cmd->obj.chunk.
      
      Fix it by freeing cmd->obj.chunk for each SCTP_CMD_REPLY cmd left on
      the cmd seq when any cmd returns an error. While at it, also remove the
      'nomem' label.
      
      Reported-by: syzbot+107c4aff5f392bf1517f@syzkaller.appspotmail.com
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
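      The cleanup can be sketched with a toy command array. The types and
      helper below are hypothetical and do not model SCTP's real command
      sequence. The sketch assumes the commands "left on the cmd seq" are the
      ones after the failing command.

```c
#include <stdlib.h>

enum toy_verb { TOY_CMD_REPLY, TOY_CMD_OTHER };

struct toy_cmd {
	enum toy_verb verb;
	void *chunk; /* allocated reply chunk for TOY_CMD_REPLY */
};

/* After command 'failed_at' returns an error, free the chunk of every REPLY
 * command that was never processed so it does not leak. Returns the count. */
static int toy_free_unprocessed_replies(struct toy_cmd *cmds, int n, int failed_at)
{
	int freed = 0;

	for (int i = failed_at + 1; i < n; i++) {
		if (cmds[i].verb == TOY_CMD_REPLY && cmds[i].chunk) {
			free(cmds[i].chunk);
			cmds[i].chunk = NULL;
			freed++;
		}
	}
	return freed;
}
```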
    • tipc: eliminate KMSAN: uninit-value in __tipc_nl_compat_dumpit error · a7869e5f
      Committed by Ying Xue
      syzbot found the following crash on:
      =====================================================
      BUG: KMSAN: uninit-value in __nlmsg_parse include/net/netlink.h:661 [inline]
      BUG: KMSAN: uninit-value in nlmsg_parse_deprecated
      include/net/netlink.h:706 [inline]
      BUG: KMSAN: uninit-value in __tipc_nl_compat_dumpit+0x553/0x11e0
      net/tipc/netlink_compat.c:215
      CPU: 0 PID: 12425 Comm: syz-executor062 Not tainted 5.5.0-rc1-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      Call Trace:
        __dump_stack lib/dump_stack.c:77 [inline]
        dump_stack+0x1c9/0x220 lib/dump_stack.c:118
        kmsan_report+0x128/0x220 mm/kmsan/kmsan_report.c:108
        __msan_warning+0x57/0xa0 mm/kmsan/kmsan_instr.c:245
        __nlmsg_parse include/net/netlink.h:661 [inline]
        nlmsg_parse_deprecated include/net/netlink.h:706 [inline]
        __tipc_nl_compat_dumpit+0x553/0x11e0 net/tipc/netlink_compat.c:215
        tipc_nl_compat_dumpit+0x761/0x910 net/tipc/netlink_compat.c:308
        tipc_nl_compat_handle net/tipc/netlink_compat.c:1252 [inline]
        tipc_nl_compat_recv+0x12e9/0x2870 net/tipc/netlink_compat.c:1311
        genl_family_rcv_msg_doit net/netlink/genetlink.c:672 [inline]
        genl_family_rcv_msg net/netlink/genetlink.c:717 [inline]
        genl_rcv_msg+0x1dd0/0x23a0 net/netlink/genetlink.c:734
        netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477
        genl_rcv+0x63/0x80 net/netlink/genetlink.c:745
        netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
        netlink_unicast+0xfa0/0x1100 net/netlink/af_netlink.c:1328
        netlink_sendmsg+0x11f0/0x1480 net/netlink/af_netlink.c:1917
        sock_sendmsg_nosec net/socket.c:639 [inline]
        sock_sendmsg net/socket.c:659 [inline]
        ____sys_sendmsg+0x1362/0x13f0 net/socket.c:2330
        ___sys_sendmsg net/socket.c:2384 [inline]
        __sys_sendmsg+0x4f0/0x5e0 net/socket.c:2417
        __do_sys_sendmsg net/socket.c:2426 [inline]
        __se_sys_sendmsg+0x97/0xb0 net/socket.c:2424
        __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424
        do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:295
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x444179
      Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7
      48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
      ff 0f 83 1b d8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffd2d6409c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00000000004002e0 RCX: 0000000000444179
      RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000003
      RBP: 00000000006ce018 R08: 0000000000000000 R09: 00000000004002e0
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000401e20
      R13: 0000000000401eb0 R14: 0000000000000000 R15: 0000000000000000
      
      Uninit was created at:
        kmsan_save_stack_with_flags mm/kmsan/kmsan.c:149 [inline]
        kmsan_internal_poison_shadow+0x5c/0x110 mm/kmsan/kmsan.c:132
        kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:86
        slab_alloc_node mm/slub.c:2774 [inline]
        __kmalloc_node_track_caller+0xe47/0x11f0 mm/slub.c:4382
        __kmalloc_reserve net/core/skbuff.c:141 [inline]
        __alloc_skb+0x309/0xa50 net/core/skbuff.c:209
        alloc_skb include/linux/skbuff.h:1049 [inline]
        nlmsg_new include/net/netlink.h:888 [inline]
        tipc_nl_compat_dumpit+0x6e4/0x910 net/tipc/netlink_compat.c:301
        tipc_nl_compat_handle net/tipc/netlink_compat.c:1252 [inline]
        tipc_nl_compat_recv+0x12e9/0x2870 net/tipc/netlink_compat.c:1311
        genl_family_rcv_msg_doit net/netlink/genetlink.c:672 [inline]
        genl_family_rcv_msg net/netlink/genetlink.c:717 [inline]
        genl_rcv_msg+0x1dd0/0x23a0 net/netlink/genetlink.c:734
        netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477
        genl_rcv+0x63/0x80 net/netlink/genetlink.c:745
        netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
        netlink_unicast+0xfa0/0x1100 net/netlink/af_netlink.c:1328
        netlink_sendmsg+0x11f0/0x1480 net/netlink/af_netlink.c:1917
        sock_sendmsg_nosec net/socket.c:639 [inline]
        sock_sendmsg net/socket.c:659 [inline]
        ____sys_sendmsg+0x1362/0x13f0 net/socket.c:2330
        ___sys_sendmsg net/socket.c:2384 [inline]
        __sys_sendmsg+0x4f0/0x5e0 net/socket.c:2417
        __do_sys_sendmsg net/socket.c:2426 [inline]
        __se_sys_sendmsg+0x97/0xb0 net/socket.c:2424
        __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424
        do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:295
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      =====================================================
      
      The complaint above occurred because the memory region pointed to by the
      attrbuf variable was not initialized. To eliminate this warning, we use
      kcalloc() rather than kmalloc_array() to allocate memory for attrbuf.
      
      Reported-by: syzbot+b1fd2bf2c89d8407e15f@syzkaller.appspotmail.com
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 06 Jan 2020: 2 commits
    • netfilter: flowtable: add nf_flowtable_time_stamp · fb46f1b7
      Committed by Pablo Neira Ayuso
      This patch adds nf_flowtable_time_stamp and updates the existing code to
      use it.
      
      This patch also implicitly fixes hardware statistics fetching via
      nf_flow_offload_stats(), where the cast to u32 is missing. Use
      nf_flow_timeout_delta() to fix this.
      
      Fixes: c29f74e0 ("netfilter: nf_flow_table: hardware offload support")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: wenxu <wenxu@ucloud.cn>
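      A wrap-safe delta helper shows why the u32 cast matters. This is a
      sketch in the spirit of nf_flow_timeout_delta(), not its actual
      definition. It assumes 32-bit tick counters and relies on GCC/Clang
      converting out-of-range values to int32_t by wrapping.

```c
#include <stdint.h>

/* Remaining time until 'timeout', computed in 32-bit unsigned math and then
 * reinterpreted as signed, so a counter wraparound is handled. */
static int32_t timeout_delta(uint32_t timeout, uint32_t now)
{
	return (int32_t)(timeout - now);
}

/* Naive version without the 32-bit cast: wrong once the counter wraps. */
static int64_t naive_delta(uint32_t timeout, uint32_t now)
{
	return (int64_t)timeout - (int64_t)now;
}
```

      Suppose "now" is just before the 32-bit wrap and the timeout is just
      after it. timeout_delta() still reports a small positive value, while
      naive_delta() reports a huge negative one, as if the flow had already
      expired.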
    • net: qrtr: fix len of skb_put_padto in qrtr_node_enqueue · ce57785b
      Committed by Carl Huang
      The len used for skb_put_padto() is wrong; it needs to include the
      length of hdr.
      
      In qrtr_node_enqueue(), the local variable size_t len is assigned
      skb->len. Then skb_push(skb, sizeof(*hdr)) increases skb->len by
      sizeof(*hdr), so after the push the local len no longer equals
      skb->len.
      
      The purpose of skb_put_padto(skb, ALIGN(len, 4)) is to add padding
      to the end of the skb's data if skb->len is not aligned to 4, but
      it uses len instead of skb->len. At this point skb->len is 32 bytes
      (sizeof(*hdr)) larger than len. For example, if len is 3 bytes,
      then skb->len is 35 bytes (3 + 32) and ALIGN(len, 4) is 4 bytes, so
      __skb_put_padto() does nothing because the check size(35) < len(4)
      fails. The correct target is 36 (sizeof(*hdr) + ALIGN(len, 4) =
      32 + 4). With that target, __skb_put_padto() passes the check
      size(35) < len(36) and adds 1 byte to the end of the skb's data,
      which is the intended behavior.
      
      skb_push():
      void *skb_push(struct sk_buff *skb, unsigned int len)
      {
      	skb->data -= len;
      	skb->len  += len;
      	if (unlikely(skb->data < skb->head))
      		skb_under_panic(skb, len, __builtin_return_address(0));
      	return skb->data;
      }
      
      skb_put_padto():
      static inline int skb_put_padto(struct sk_buff *skb, unsigned int len)
      {
      	return __skb_put_padto(skb, len, true);
      }
      
      __skb_put_padto():
      static inline int __skb_put_padto(struct sk_buff *skb, unsigned int len,
      				  bool free_on_error)
      {
      	unsigned int size = skb->len;
      
      	if (unlikely(size < len)) {
      		len -= size;
      		if (__skb_pad(skb, len, free_on_error))
      			return -ENOMEM;
      		__skb_put(skb, len);
      	}
      	return 0;
      }
      Signed-off-by: Carl Huang <cjhuang@codeaurora.org>
      Signed-off-by: Wen Gong <wgong@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ce57785b
  6. 05 Jan, 2020 5 commits
  7. 03 Jan, 2020 2 commits
    • W
      sch_cake: avoid possible divide by zero in cake_enqueue() · 68aab823
      Wen Yang committed
      The variable 'window_interval' is u64, and do_div()
      truncates it to 32 bits, which means it can test
      non-zero and still be truncated to zero for the division.
      The unit of window_interval is nanoseconds,
      so its value can easily exceed 32 bits.
      Fix this issue by using div64_u64() instead.
      
      Fixes: 7298de9c ("sch_cake: Add ingress mode")
      Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
      Cc: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
      Cc: Toke Høiland-Jørgensen <toke@redhat.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: cake@lists.bufferbloat.net
      Cc: netdev@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      68aab823
    • P
      tcp: fix "old stuff" D-SACK causing SACK to be treated as D-SACK · c9655008
      Pengcheng Yang committed
      When we receive a D-SACK, where the sequence numbers satisfy:
      	undo_marker <= start_seq < end_seq <= prior_snd_una
      we consider it a valid D-SACK and tcp_is_sackblock_valid()
      returns true; this D-SACK is then discarded as "old stuff",
      but the variable first_sack_index is not set to a negative value
      in tcp_sacktag_write_queue().
      
      If this D-SACK also carries a SACK that needs to be processed
      (for example, the previous SACK segment was lost), this SACK
      will be treated as a D-SACK in the following processing of
      tcp_sacktag_write_queue(), which will eventually lead to
      incorrect updates of undo_retrans and reordering.
      
      Fixes: fd6dad61 ("[TCP]: Earlier SACK block verification & simplify access to them")
      Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c9655008
  8. 31 Dec, 2019 3 commits
    • T
      hsr: fix slab-out-of-bounds Read in hsr_debugfs_rename() · 04b69426
      Taehee Yoo committed
      hsr slave interfaces don't have a debugfs directory.
      So, hsr_debugfs_rename() shouldn't be called when an hsr slave interface's
      name is changed.
      
      Test commands:
          ip link add dummy0 type dummy
          ip link add dummy1 type dummy
          ip link add hsr0 type hsr slave1 dummy0 slave2 dummy1
          ip link set dummy0 name ap
      
      Splat looks like:
      [21071.899367][T22666] ap: renamed from dummy0
      [21071.914005][T22666] ==================================================================
      [21071.919008][T22666] BUG: KASAN: slab-out-of-bounds in hsr_debugfs_rename+0xaa/0xb0 [hsr]
      [21071.923640][T22666] Read of size 8 at addr ffff88805febcd98 by task ip/22666
      [21071.926941][T22666]
      [21071.927750][T22666] CPU: 0 PID: 22666 Comm: ip Not tainted 5.5.0-rc2+ #240
      [21071.929919][T22666] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [21071.935094][T22666] Call Trace:
      [21071.935867][T22666]  dump_stack+0x96/0xdb
      [21071.936687][T22666]  ? hsr_debugfs_rename+0xaa/0xb0 [hsr]
      [21071.937774][T22666]  print_address_description.constprop.5+0x1be/0x360
      [21071.939019][T22666]  ? hsr_debugfs_rename+0xaa/0xb0 [hsr]
      [21071.940081][T22666]  ? hsr_debugfs_rename+0xaa/0xb0 [hsr]
      [21071.940949][T22666]  __kasan_report+0x12a/0x16f
      [21071.941758][T22666]  ? hsr_debugfs_rename+0xaa/0xb0 [hsr]
      [21071.942674][T22666]  kasan_report+0xe/0x20
      [21071.943325][T22666]  hsr_debugfs_rename+0xaa/0xb0 [hsr]
      [21071.944187][T22666]  hsr_netdev_notify+0x1fe/0x9b0 [hsr]
      [21071.945052][T22666]  ? __module_text_address+0x13/0x140
      [21071.945897][T22666]  notifier_call_chain+0x90/0x160
      [21071.946743][T22666]  dev_change_name+0x419/0x840
      [21071.947496][T22666]  ? __read_once_size_nocheck.constprop.6+0x10/0x10
      [21071.948600][T22666]  ? netdev_adjacent_rename_links+0x280/0x280
      [21071.949577][T22666]  ? __read_once_size_nocheck.constprop.6+0x10/0x10
      [21071.950672][T22666]  ? lock_downgrade+0x6e0/0x6e0
      [21071.951345][T22666]  ? do_setlink+0x811/0x2ef0
      [21071.951991][T22666]  do_setlink+0x811/0x2ef0
      [21071.952613][T22666]  ? is_bpf_text_address+0x81/0xe0
      [ ... ]
      
      Reported-by: syzbot+9328206518f08318a5fd@syzkaller.appspotmail.com
      Fixes: 4c2d5e33 ("hsr: rename debugfs file when interface name is changed")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      04b69426
    • D
      net/sched: add delete_empty() to filters and use it in cls_flower · a5b72a08
      Davide Caratti committed
      Revert "net/sched: cls_u32: fix refcount leak in the error path of
      u32_change()", and fix the u32 refcount leak in a more generic way that
      preserves the semantics of rule dumping.
      On tc filters that don't support lockless insertion/removal, there is no
      need to guard against concurrent insertion when a removal is in progress.
      Therefore, for most of them we can avoid a full walk() when deleting, and
      just decrease the refcount, as was done on older Linux kernels.
      This fixes situations where walk() wrongly detected a non-empty
      filter, as happened with cls_u32 in the error path of change(), thus
      leading to failures in the following tdc selftests:
      
       6aa7: (filter, u32) Add/Replace u32 with source match and invalid indev
       6658: (filter, u32) Add/Replace u32 with custom hash table and invalid handle
       74c2: (filter, u32) Add/Replace u32 filter with invalid hash table id
      
      On cls_flower, and on (future) lockless filters, this check is necessary:
      move all the check_empty() logic into a callback so that each filter
      can have its own implementation. For cls_flower, it's sufficient to check
      that no IDRs have been allocated.
      
      This reverts commit 275c44aa.
      
      Changes since v1:
       - document the need for delete_empty() when TCF_PROTO_OPS_DOIT_UNLOCKED
         is used, thanks to Vlad Buslov
       - implement delete_empty() without doing fl_walk(), thanks to Vlad Buslov
       - squash revert and new fix in a single patch, to be nice with bisect
         tests that run tdc on u32 filter, thanks to Dave Miller
      
      Fixes: 275c44aa ("net/sched: cls_u32: fix refcount leak in the error path of u32_change()")
      Fixes: 6676d5e4 ("net: sched: set dedicated tcf_walker flag when tp is empty")
      Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Suggested-by: Vlad Buslov <vladbu@mellanox.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
      Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a5b72a08
    • C
      tcp: Fix highest_sack and highest_sack_seq · 85369750
      Cambda Zhu committed
      Since commit 50895b9d ("tcp: highest_sack fix"), the logic that set
      tp->highest_sack to the head of the send queue has been removed.
      Admittedly that logic was error-prone, but it was logically sound. Before
      we remove the pointer to the highest SACKed skb and use the seq instead,
      we need to set tp->highest_sack to NULL when there is no skb after
      the last SACK, and then replace NULL with the real skb when a new skb
      is inserted into the rtx queue, because NULL means the highest SACK
      seq is tp->snd_nxt. If tp->highest_sack is NULL and new data is sent,
      the next ACK with a SACK option will increase tp->reordering unexpectedly.
      
      This patch sets tp->highest_sack to the tail of the rtx queue if
      it's NULL and new data is sent. The patch keeps the rule that
      highest_sack can only be maintained by SACK processing, except for
      this one case.
      
      Fixes: 50895b9d ("tcp: highest_sack fix")
      Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      85369750
  9. 30 Dec, 2019 1 commit
    • F
      netfilter: arp_tables: init netns pointer in xt_tgchk_param struct · 1b789577
      Florian Westphal committed
      We get a crash when a target's checkentry function tries to make
      use of the network namespace pointer for arptables.
      
      When the net pointer was added back in 2010, only ip/ip6/ebtables were
      changed to initialize it, so arptables has this set to NULL.
      
      This isn't a problem for normal arptables because no existing
      arptables target has a checkentry function that makes use of par->net.
      
      However, direct users of the setsockopt interface can provide any
      target they want as long as it's registered for the ARP or UNSPEC protocols.
      
      syzkaller managed to send a semi-valid arptables rule for the RATEEST
      target, which is enough to trigger a NULL deref:
      
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      RIP: xt_rateest_tg_checkentry+0x11d/0xb40 net/netfilter/xt_RATEEST.c:109
      [..]
       xt_check_target+0x283/0x690 net/netfilter/x_tables.c:1019
       check_target net/ipv4/netfilter/arp_tables.c:399 [inline]
       find_check_entry net/ipv4/netfilter/arp_tables.c:422 [inline]
       translate_table+0x1005/0x1d70 net/ipv4/netfilter/arp_tables.c:572
       do_replace net/ipv4/netfilter/arp_tables.c:977 [inline]
       do_arpt_set_ctl+0x310/0x640 net/ipv4/netfilter/arp_tables.c:1456
      
      Fixes: add67461 ("netfilter: add struct net * to target parameters")
      Reported-by: syzbot+d7358a458d8a81aee898@syzkaller.appspotmail.com
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      1b789577
  10. 28 Dec, 2019 1 commit
    • S
      net/sched: act_mirred: Pull mac prior redir to non mac_header_xmit device · 70cf3dc7
      Shmulik Ladkani committed
      There's no skb_pull performed when a mirred action is set at egress of a
      mac device, with a target device/action that expects skb->data to point
      at the network header.
      
      As a result, either the target device is erroneously given an skb with
      data pointing to the mac (egress case), or the net stack receives the
      skb with data pointing to the mac (ingress case).
      
      E.g.:
       # tc qdisc add dev eth9 root handle 1: prio
       # tc filter add dev eth9 parent 1: prio 9 protocol ip handle 9 basic \
         action mirred egress redirect dev tun0
      
       (tun0 is a tun device. result: tun0 erroneously gets the eth header
        instead of the iph)
      
      Revise the push/pull logic of tcf_mirred_act() to not rely on the
      skb_at_tc_ingress() vs tcf_mirred_act_wants_ingress() comparison, as it
      does not cover all "pull" cases.
      
      Instead, calculate whether the required action on the target device
      requires the data to point at the network header, and compare this to
      whether skb->data points to network header - and make the push/pull
      adjustments as necessary.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Shmulik Ladkani <sladkani@proofpoint.com>
      Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      70cf3dc7
  11. 27 Dec, 2019 1 commit
  12. 26 Dec, 2019 2 commits
    • T
      hsr: reset network header when supervision frame is created · 3ed0a1d5
      Taehee Yoo committed
      The supervision frame is an L2 frame.
      When a supervision frame is created, the hsr module doesn't set the
      network header. If the tap routine is enabled, dev_queue_xmit_nit() is
      called and it checks network_header. If the network_header pointer
      wasn't set (or is invalid), it resets network_header and warns.
      To avoid this unnecessary warning message, network_header needs to be
      reset when the frame is created.
      
      Test commands:
          ip netns add nst
          ip link add veth0 type veth peer name veth1
          ip link add veth2 type veth peer name veth3
          ip link set veth1 netns nst
          ip link set veth3 netns nst
          ip link set veth0 up
          ip link set veth2 up
          ip link add hsr0 type hsr slave1 veth0 slave2 veth2
          ip a a 192.168.100.1/24 dev hsr0
          ip link set hsr0 up
          ip netns exec nst ip link set veth1 up
          ip netns exec nst ip link set veth3 up
          ip netns exec nst ip link add hsr1 type hsr slave1 veth1 slave2 veth3
          ip netns exec nst ip a a 192.168.100.2/24 dev hsr1
          ip netns exec nst ip link set hsr1 up
          tcpdump -nei veth0
      
      Splat looks like:
      [  175.852292][    C3] protocol 88fb is buggy, dev veth0
      
      Fixes: f421436a ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3ed0a1d5
    • T
      hsr: fix a race condition in node list insertion and deletion · 92a35678
      Taehee Yoo committed
      hsr nodes are protected by RCU, but there is no write-side lock,
      and node insertions and deletions can run concurrently.
      So write-side locking is needed.
      
      Test commands:
          ip netns add nst
          ip link add veth0 type veth peer name veth1
          ip link add veth2 type veth peer name veth3
          ip link set veth1 netns nst
          ip link set veth3 netns nst
          ip link set veth0 up
          ip link set veth2 up
          ip link add hsr0 type hsr slave1 veth0 slave2 veth2
          ip a a 192.168.100.1/24 dev hsr0
          ip link set hsr0 up
          ip netns exec nst ip link set veth1 up
          ip netns exec nst ip link set veth3 up
          ip netns exec nst ip link add hsr1 type hsr slave1 veth1 slave2 veth3
          ip netns exec nst ip a a 192.168.100.2/24 dev hsr1
          ip netns exec nst ip link set hsr1 up
      
          for i in {0..9}
          do
              for j in {0..9}
              do
                  for k in {0..9}
                  do
                      for l in {0..9}
                      do
                          arping 192.168.100.2 -I hsr0 -s 00:01:3$i:4$j:5$k:6$l -c1 &
                      done
                  done
              done
          done
      
      Splat looks like:
      [  236.066091][ T3286] list_add corruption. next->prev should be prev (ffff8880a5940300), but was ffff8880a5940d0.
      [  236.069617][ T3286] ------------[ cut here ]------------
      [  236.070545][ T3286] kernel BUG at lib/list_debug.c:25!
      [  236.071391][ T3286] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [  236.072343][ T3286] CPU: 0 PID: 3286 Comm: arping Tainted: G        W         5.5.0-rc1+ #209
      [  236.073463][ T3286] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [  236.074695][ T3286] RIP: 0010:__list_add_valid+0x74/0xd0
      [  236.075499][ T3286] Code: 48 39 da 75 27 48 39 f5 74 36 48 39 dd 74 31 48 83 c4 08 b8 01 00 00 00 5b 5d c3 48 b
      [  236.078277][ T3286] RSP: 0018:ffff8880aaa97648 EFLAGS: 00010286
      [  236.086991][ T3286] RAX: 0000000000000075 RBX: ffff8880d4624c20 RCX: 0000000000000000
      [  236.088000][ T3286] RDX: 0000000000000075 RSI: 0000000000000008 RDI: ffffed1015552ebf
      [  236.098897][ T3286] RBP: ffff88809b53d200 R08: ffffed101b3c04f9 R09: ffffed101b3c04f9
      [  236.099960][ T3286] R10: 00000000308769a1 R11: ffffed101b3c04f8 R12: ffff8880d4624c28
      [  236.100974][ T3286] R13: ffff8880d4624c20 R14: 0000000040310100 R15: ffff8880ce17ee02
      [  236.138967][ T3286] FS:  00007f23479fa680(0000) GS:ffff8880d9c00000(0000) knlGS:0000000000000000
      [  236.144852][ T3286] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  236.145720][ T3286] CR2: 00007f4a14bab210 CR3: 00000000a61c6001 CR4: 00000000000606f0
      [  236.146776][ T3286] Call Trace:
      [  236.147222][ T3286]  hsr_add_node+0x314/0x490 [hsr]
      [  236.153633][ T3286]  hsr_forward_skb+0x2b6/0x1bc0 [hsr]
      [  236.154362][ T3286]  ? rcu_read_lock_sched_held+0x90/0xc0
      [  236.155091][ T3286]  ? rcu_read_lock_bh_held+0xa0/0xa0
      [  236.156607][ T3286]  hsr_dev_xmit+0x70/0xd0 [hsr]
      [  236.157254][ T3286]  dev_hard_start_xmit+0x160/0x740
      [  236.157941][ T3286]  __dev_queue_xmit+0x1961/0x2e10
      [  236.158565][ T3286]  ? netdev_core_pick_tx+0x2e0/0x2e0
      [ ... ]
      
      Reported-by: syzbot+3924327f9ad5f4d2b343@syzkaller.appspotmail.com
      Fixes: f421436a ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      92a35678