1. 06 8月, 2016 2 次提交
    • D
      rxrpc: Fix races between skb free, ACK generation and replying · 372ee163
      David Howells 提交于
      Inside the kafs filesystem it is possible to occasionally have a call
      processed and terminated before we've had a chance to check whether we need
      to clean up the rx queue for that call because afs_send_simple_reply() ends
      the call when it is done, but this is done in a workqueue item that might
      happen to run to completion before afs_deliver_to_call() completes.
      
      Further, it is possible for rxrpc_kernel_send_data() to be called to send a
      reply before the last request-phase data skb is released.  The rxrpc skb
      destructor is where the ACK processing is done and the call state is
      advanced upon release of the last skb.  ACK generation is also deferred to
      a work item because it's possible that the skb destructor is not called in
      a context where kernel_sendmsg() can be invoked.
      
      To this end, the following changes are made:
      
       (1) kernel_rxrpc_data_consumed() is added.  This should be called whenever
           an skb is emptied so as to crank the ACK and call states.  This does
           not release the skb, however.  kernel_rxrpc_free_skb() must now be
           called to achieve that.  These together replace
           rxrpc_kernel_data_delivered().
      
       (2) kernel_rxrpc_data_consumed() is wrapped by afs_data_consumed().
      
           This makes afs_deliver_to_call() easier to work as the skb can simply
           be discarded unconditionally here without trying to work out what the
           return value of the ->deliver() function means.
      
           The ->deliver() functions can, via afs_data_complete(),
           afs_transfer_reply() and afs_extract_data() mark that an skb has been
           consumed (thereby cranking the state) without the need to
           conditionally free the skb to make sure the state is correct on an
           incoming call for when the call processor tries to send the reply.
      
       (3) rxrpc_recvmsg() now has to call kernel_rxrpc_data_consumed() when it
           has finished with a packet and MSG_PEEK isn't set.
      
       (4) rxrpc_packet_destructor() no longer calls rxrpc_hard_ACK_data().
      
           Because of this, we no longer need to clear the destructor and put the
           call before we free the skb in cases where we don't want the ACK/call
           state to be cranked.
      
       (5) The ->deliver() call-type callbacks are made to return -EAGAIN rather
           than 0 if they expect more data (afs_extract_data() returns -EAGAIN to
           the delivery function already), and the caller is now responsible for
           producing an abort if that was the last packet.
      
       (6) There are many bits of unmarshalling code where:
      
       		ret = afs_extract_data(call, skb, last, ...);
      		switch (ret) {
      		case 0:		break;
      		case -EAGAIN:	return 0;
      		default:	return ret;
      		}
      
           is to be found.  As -EAGAIN can now be passed back to the caller, we
           now just return if ret < 0:
      
       		ret = afs_extract_data(call, skb, last, ...);
      		if (ret < 0)
      			return ret;
      
       (7) Checks for trailing data and empty final data packets has been
           consolidated as afs_data_complete().  So:
      
      		if (skb->len > 0)
      			return -EBADMSG;
      		if (!last)
      			return 0;
      
           becomes:
      
      		ret = afs_data_complete(call, skb, last);
      		if (ret < 0)
      			return ret;
      
       (8) afs_transfer_reply() now checks the amount of data it has against the
           amount of data desired and the amount of data in the skb and returns
           an error to induce an abort if we don't get exactly what we want.
      
      Without these changes, the following oops can occasionally be observed,
      particularly if some printks are inserted into the delivery path:
      
      general protection fault: 0000 [#1] SMP
      Modules linked in: kafs(E) af_rxrpc(E) [last unloaded: af_rxrpc]
      CPU: 0 PID: 1305 Comm: kworker/u8:3 Tainted: G            E   4.7.0-fsdevel+ #1303
      Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
      Workqueue: kafsd afs_async_workfn [kafs]
      task: ffff88040be041c0 ti: ffff88040c070000 task.ti: ffff88040c070000
      RIP: 0010:[<ffffffff8108fd3c>]  [<ffffffff8108fd3c>] __lock_acquire+0xcf/0x15a1
      RSP: 0018:ffff88040c073bc0  EFLAGS: 00010002
      RAX: 6b6b6b6b6b6b6b6b RBX: 0000000000000000 RCX: ffff88040d29a710
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88040d29a710
      RBP: ffff88040c073c70 R08: 0000000000000001 R09: 0000000000000001
      R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
      R13: 0000000000000000 R14: ffff88040be041c0 R15: ffffffff814c928f
      FS:  0000000000000000(0000) GS:ffff88041fa00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fa4595f4750 CR3: 0000000001c14000 CR4: 00000000001406f0
      Stack:
       0000000000000006 000000000be04930 0000000000000000 ffff880400000000
       ffff880400000000 ffffffff8108f847 ffff88040be041c0 ffffffff81050446
       ffff8803fc08a920 ffff8803fc08a958 ffff88040be041c0 ffff88040c073c38
      Call Trace:
       [<ffffffff8108f847>] ? mark_held_locks+0x5e/0x74
       [<ffffffff81050446>] ? __local_bh_enable_ip+0x9b/0xa1
       [<ffffffff8108f9ca>] ? trace_hardirqs_on_caller+0x16d/0x189
       [<ffffffff810915f4>] lock_acquire+0x122/0x1b6
       [<ffffffff810915f4>] ? lock_acquire+0x122/0x1b6
       [<ffffffff814c928f>] ? skb_dequeue+0x18/0x61
       [<ffffffff81609dbf>] _raw_spin_lock_irqsave+0x35/0x49
       [<ffffffff814c928f>] ? skb_dequeue+0x18/0x61
       [<ffffffff814c928f>] skb_dequeue+0x18/0x61
       [<ffffffffa009aa92>] afs_deliver_to_call+0x344/0x39d [kafs]
       [<ffffffffa009ab37>] afs_process_async_call+0x4c/0xd5 [kafs]
       [<ffffffffa0099e9c>] afs_async_workfn+0xe/0x10 [kafs]
       [<ffffffff81063a3a>] process_one_work+0x29d/0x57c
       [<ffffffff81064ac2>] worker_thread+0x24a/0x385
       [<ffffffff81064878>] ? rescuer_thread+0x2d0/0x2d0
       [<ffffffff810696f5>] kthread+0xf3/0xfb
       [<ffffffff8160a6ff>] ret_from_fork+0x1f/0x40
       [<ffffffff81069602>] ? kthread_create_on_node+0x1cf/0x1cf
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      372ee163
    • I
      OVS: Ignore negative headroom value · 5ef9f289
      Ian Wienand 提交于
      net_device->ndo_set_rx_headroom (introduced in
      871b642a) says
      
        "Setting a negtaive value reset the rx headroom
         to the default value".
      
      It seems that the OVS implementation in
      3a927bc7 overlooked this and sets
      dev->needed_headroom unconditionally.
      
      This doesn't have an immediate effect, but can mess up later
      LL_RESERVED_SPACE calculations, such as done in
      net/ipv6/mcast.c:mld_newpack.  For reference, this issue was found
      from a skb_panic raised there after the length calculations had given
      the wrong result.
      
      Note the other current users of this interface
      (drivers/net/tun.c:tun_set_headroom and
      drivers/net/veth.c:veth_set_rx_headroom) are both checking this
      correctly thus need no modification.
      
      Thanks to Ben for some pointers from the crash dumps!
      
      Cc: Benjamin Poirier <bpoirier@suse.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1361414Signed-off-by: NIan Wienand <iwienand@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5ef9f289
  2. 04 8月, 2016 1 次提交
  3. 31 7月, 2016 6 次提交
  4. 28 7月, 2016 6 次提交
  5. 27 7月, 2016 9 次提交
  6. 26 7月, 2016 16 次提交
    • W
      net_sched: get rid of struct tcf_common · ec0595cc
      WANG Cong 提交于
      After the previous patch, struct tc_action should be enough
      to represent the generic tc action, tcf_common is not necessary
      any more. This patch gets rid of it to make tc action code
      more readable.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ec0595cc
    • W
      net_sched: move tc_action into tcf_common · a85a970a
      WANG Cong 提交于
      struct tc_action is confusing, currently we use it for two purposes:
      1) Pass in arguments and carry out results from helper functions
      2) A generic representation for tc actions
      
      The first one is error-prone, since we need to make sure we don't
      miss anything. This patch aims to get rid of this use, by moving
      tc_action into tcf_common, so that they are allocated together
      in hashtable and can be cast'ed easily.
      
      And together with the following patch, we could really make
      tc_action a generic representation for all tc actions and each
      type of action can inherit from it.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a85a970a
    • D
      udp: use sk_filter_trim_cap for udp{,6}_queue_rcv_skb · ba66bbe5
      Daniel Borkmann 提交于
      After a6127697 ("udp: prevent bugcheck if filter truncates packet
      too much"), there followed various other fixes for similar cases such
      as f4979fce ("rose: limit sk_filter trim to payload").
      
      Latter introduced a new helper sk_filter_trim_cap(), where we can pass
      the trim limit directly to the socket filter handling. Make use of it
      here as well with sizeof(struct udphdr) as lower cap limit and drop the
      extra skb->len test in UDP's input path.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Cc: Willem de Bruijn <willemb@google.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ba66bbe5
    • V
      net/sctp: terminate rhashtable walk correctly · 5fc382d8
      Vegard Nossum 提交于
      I was seeing a lot of these:
      
          BUG: sleeping function called from invalid context at mm/slab.h:388
          in_atomic(): 0, irqs_disabled(): 0, pid: 14971, name: trinity-c2
          Preemption disabled at:[<ffffffff819bcd46>] rhashtable_walk_start+0x46/0x150
      
           [<ffffffff81149abb>] preempt_count_add+0x1fb/0x280
           [<ffffffff83295722>] _raw_spin_lock+0x12/0x40
           [<ffffffff811aac87>] console_unlock+0x2f7/0x930
           [<ffffffff811ab5bb>] vprintk_emit+0x2fb/0x520
           [<ffffffff811aba6a>] vprintk_default+0x1a/0x20
           [<ffffffff812c171a>] printk+0x94/0xb0
           [<ffffffff811d6ed0>] print_stack_trace+0xe0/0x170
           [<ffffffff8115835e>] ___might_sleep+0x3be/0x460
           [<ffffffff81158490>] __might_sleep+0x90/0x1a0
           [<ffffffff8139b823>] kmem_cache_alloc+0x153/0x1e0
           [<ffffffff819bca1e>] rhashtable_walk_init+0xfe/0x2d0
           [<ffffffff82ec64de>] sctp_transport_walk_start+0x1e/0x60
           [<ffffffff82edd8ad>] sctp_transport_seq_start+0x4d/0x150
           [<ffffffff8143a82b>] seq_read+0x27b/0x1180
           [<ffffffff814f97fc>] proc_reg_read+0xbc/0x180
           [<ffffffff813d471b>] __vfs_read+0xdb/0x610
           [<ffffffff813d4d3a>] vfs_read+0xea/0x2d0
           [<ffffffff813d615b>] SyS_pread64+0x11b/0x150
           [<ffffffff8100334c>] do_syscall_64+0x19c/0x410
           [<ffffffff832960a5>] return_from_SYSCALL_64+0x0/0x6a
           [<ffffffffffffffff>] 0xffffffffffffffff
      
      Apparently we always need to call rhashtable_walk_stop(), even when
      rhashtable_walk_start() fails:
      
       * rhashtable_walk_start - Start a hash table walk
       * @iter:       Hash table iterator
       *
       * Start a hash table walk.  Note that we take the RCU lock in all
       * cases including when we return an error.  So you must always call
       * rhashtable_walk_stop to clean up.
      
      otherwise we never call rcu_read_unlock() and we get the splat above.
      
      Fixes: 53fa1036 ("sctp: fix some rhashtable functions using in sctp proc/diag")
      See-also: 53fa1036 ("sctp: fix some rhashtable functions using in sctp proc/diag")
      See-also: f2dba9c6 ("rhashtable: Introduce rhashtable_walk_*")
      Cc: Xin Long <lucien.xin@gmail.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: stable@vger.kernel.org
      Signed-off-by: NVegard Nossum <vegard.nossum@oracle.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5fc382d8
    • V
      net/irda: fix NULL pointer dereference on memory allocation failure · d3e6952c
      Vegard Nossum 提交于
      I ran into this:
      
          kasan: CONFIG_KASAN_INLINE enabled
          kasan: GPF could be caused by NULL-ptr deref or user memory access
          general protection fault: 0000 [#1] PREEMPT SMP KASAN
          CPU: 2 PID: 2012 Comm: trinity-c3 Not tainted 4.7.0-rc7+ #19
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
          task: ffff8800b745f2c0 ti: ffff880111740000 task.ti: ffff880111740000
          RIP: 0010:[<ffffffff82bbf066>]  [<ffffffff82bbf066>] irttp_connect_request+0x36/0x710
          RSP: 0018:ffff880111747bb8  EFLAGS: 00010286
          RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000069dd8358
          RDX: 0000000000000009 RSI: 0000000000000027 RDI: 0000000000000048
          RBP: ffff880111747c00 R08: 0000000000000000 R09: 0000000000000000
          R10: 0000000069dd8358 R11: 1ffffffff0759723 R12: 0000000000000000
          R13: ffff88011a7e4780 R14: 0000000000000027 R15: 0000000000000000
          FS:  00007fc738404700(0000) GS:ffff88011af00000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: 00007fc737fdfb10 CR3: 0000000118087000 CR4: 00000000000006e0
          Stack:
           0000000000000200 ffff880111747bd8 ffffffff810ee611 ffff880119f1f220
           ffff880119f1f4f8 ffff880119f1f4f0 ffff88011a7e4780 ffff880119f1f232
           ffff880119f1f220 ffff880111747d58 ffffffff82bca542 0000000000000000
          Call Trace:
           [<ffffffff82bca542>] irda_connect+0x562/0x1190
           [<ffffffff825ae582>] SYSC_connect+0x202/0x2a0
           [<ffffffff825b4489>] SyS_connect+0x9/0x10
           [<ffffffff8100334c>] do_syscall_64+0x19c/0x410
           [<ffffffff83295ca5>] entry_SYSCALL64_slow_path+0x25/0x25
          Code: 41 89 ca 48 89 e5 41 57 41 56 41 55 41 54 41 89 d7 53 48 89 fb 48 83 c7 48 48 89 fa 41 89 f6 48 c1 ea 03 48 83 ec 20 4c 8b 65 10 <0f> b6 04 02 84 c0 74 08 84 c0 0f 8e 4c 04 00 00 80 7b 48 00 74
          RIP  [<ffffffff82bbf066>] irttp_connect_request+0x36/0x710
           RSP <ffff880111747bb8>
          ---[ end trace 4cda2588bc055b30 ]---
      
      The problem is that irda_open_tsap() can fail and leave self->tsap = NULL,
      and then irttp_connect_request() almost immediately dereferences it.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d3e6952c
    • M
      sctp: also point GSO head_skb to the sk when it's available · 52253db9
      Marcelo Ricardo Leitner 提交于
      The head skb for GSO packets won't travel through the inner depths of
      SCTP stack as it doesn't contain any chunks on it. That means skb->sk
      doesn't get set and then when sctp_recvmsg() calls
      sctp_inet6_skb_msgname() on the head_skb it panics, as this last needs
      to check flags at the socket (sp->v4mapped).
      
      The fix is to initialize skb->sk for th head skb once we are able to do
      it. That is, when the first chunk is processed.
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      52253db9
    • M
      sctp: fix BH handling on socket backlog · eefc1b1d
      Marcelo Ricardo Leitner 提交于
      Now that the backlog processing is called with BH enabled, we have to
      disable BH before taking the socket lock via bh_lock_sock() otherwise
      it may dead lock:
      
      sctp_backlog_rcv()
                      bh_lock_sock(sk);
      
                      if (sock_owned_by_user(sk)) {
                              if (sk_add_backlog(sk, skb, sk->sk_rcvbuf))
                                      sctp_chunk_free(chunk);
                              else
                                      backloged = 1;
                      } else
                              sctp_inq_push(inqueue, chunk);
      
                      bh_unlock_sock(sk);
      
      while sctp_inq_push() was disabling/enabling BH, but enabling BH
      triggers pending softirq, which then may try to re-lock the socket in
      sctp_rcv().
      
      [  219.187215]  <IRQ>
      [  219.187217]  [<ffffffff817ca3e0>] _raw_spin_lock+0x20/0x30
      [  219.187223]  [<ffffffffa041888c>] sctp_rcv+0x48c/0xba0 [sctp]
      [  219.187225]  [<ffffffff816e7db2>] ? nf_iterate+0x62/0x80
      [  219.187226]  [<ffffffff816f1b14>] ip_local_deliver_finish+0x94/0x1e0
      [  219.187228]  [<ffffffff816f1e1f>] ip_local_deliver+0x6f/0xf0
      [  219.187229]  [<ffffffff816f1a80>] ? ip_rcv_finish+0x3b0/0x3b0
      [  219.187230]  [<ffffffff816f17a8>] ip_rcv_finish+0xd8/0x3b0
      [  219.187232]  [<ffffffff816f2122>] ip_rcv+0x282/0x3a0
      [  219.187233]  [<ffffffff810d8bb6>] ? update_curr+0x66/0x180
      [  219.187235]  [<ffffffff816abac4>] __netif_receive_skb_core+0x524/0xa90
      [  219.187236]  [<ffffffff810d8e00>] ? update_cfs_shares+0x30/0xf0
      [  219.187237]  [<ffffffff810d557c>] ? __enqueue_entity+0x6c/0x70
      [  219.187239]  [<ffffffff810dc454>] ? enqueue_entity+0x204/0xdf0
      [  219.187240]  [<ffffffff816ac048>] __netif_receive_skb+0x18/0x60
      [  219.187242]  [<ffffffff816ad1ce>] process_backlog+0x9e/0x140
      [  219.187243]  [<ffffffff816ac8ec>] net_rx_action+0x22c/0x370
      [  219.187245]  [<ffffffff817cd352>] __do_softirq+0x112/0x2e7
      [  219.187247]  [<ffffffff817cc3bc>] do_softirq_own_stack+0x1c/0x30
      [  219.187247]  <EOI>
      [  219.187248]  [<ffffffff810aa1c8>] do_softirq.part.14+0x38/0x40
      [  219.187249]  [<ffffffff810aa24d>] __local_bh_enable_ip+0x7d/0x80
      [  219.187254]  [<ffffffffa0408428>] sctp_inq_push+0x68/0x80 [sctp]
      [  219.187258]  [<ffffffffa04190f1>] sctp_backlog_rcv+0x151/0x1c0 [sctp]
      [  219.187260]  [<ffffffff81692b07>] __release_sock+0x87/0xf0
      [  219.187261]  [<ffffffff81692ba0>] release_sock+0x30/0xa0
      [  219.187265]  [<ffffffffa040e46d>] sctp_accept+0x17d/0x210 [sctp]
      [  219.187266]  [<ffffffff810e7510>] ? prepare_to_wait_event+0xf0/0xf0
      [  219.187268]  [<ffffffff8172d52c>] inet_accept+0x3c/0x130
      [  219.187269]  [<ffffffff8168d7a3>] SYSC_accept4+0x103/0x210
      [  219.187271]  [<ffffffff817ca2ba>] ? _raw_spin_unlock_bh+0x1a/0x20
      [  219.187272]  [<ffffffff81692bfc>] ? release_sock+0x8c/0xa0
      [  219.187276]  [<ffffffffa0413e22>] ? sctp_inet_listen+0x62/0x1b0 [sctp]
      [  219.187277]  [<ffffffff8168f2d0>] SyS_accept+0x10/0x20
      
      Fixes: 860fbbc3 ("sctp: prepare for socket backlog behavior change")
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eefc1b1d
    • C
      kcm: remove redundant -ve error check and return path · 0a58f474
      Colin Ian King 提交于
      The check for a -ve error is redundant, remove it and just
      immediately return the return value from the call to
      seq_open_net.
      Signed-off-by: NColin Ian King <colin.king@canonical.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0a58f474
    • M
      net: ipv6: Always leave anycast and multicast groups on link down · ea06f717
      Mike Manning 提交于
      Default kernel behavior is to delete IPv6 addresses on link
      down, which entails deletion of the multicast and the
      subnet-router anycast addresses. These deletions do not
      happen with sysctl setting to keep global IPv6 addresses on
      link down, so every link down/up causes an increment of the
      anycast and multicast refcounts. These bogus refcounts may
      stop these addrs from being removed on subsequent calls to
      delete them. The solution is to leave the groups for the
      multicast and subnet anycast on link down for the callflow
      when global IPv6 addresses are kept.
      
      Fixes: f1705ec1 ("net: ipv6: Make address flushing on ifdown optional")
      Signed-off-by: NMike Manning <mmanning@brocade.com>
      Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ea06f717
    • X
      sctp: use inet_recvmsg to support sctp RFS well · fd2d180a
      Xin Long 提交于
      Commit 486bdee0 ("sctp: add support for RPS and RFS")
      saves skb->hash into sk->sk_rxhash so that the inet_* can
      record it to flow table.
      
      But sctp uses sock_common_recvmsg as .recvmsg instead
      of inet_recvmsg, sock_common_recvmsg doesn't invoke
      sock_rps_record_flow to record the flow. It may cause
      that the receiver has no chances to record the flow if
      it doesn't send msg or poll the socket.
      
      So this patch fixes it by using inet_recvmsg as .recvmsg
      in sctp.
      
      Fixes: 486bdee0 ("sctp: add support for RPS and RFS")
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fd2d180a
    • I
      bridge: Fix incorrect re-injection of LLDP packets · baedbe55
      Ido Schimmel 提交于
      Commit 8626c56c ("bridge: fix potential use-after-free when hook
      returns QUEUE or STOLEN verdict") caused LLDP packets arriving through a
      bridge port to be re-injected to the Rx path with skb->dev set to the
      bridge device, but this breaks the lldpad daemon.
      
      The lldpad daemon opens a packet socket with protocol set to ETH_P_LLDP
      for any valid device on the system, which doesn't not include soft
      devices such as bridge and VLAN.
      
      Since packet sockets (ptype_base) are processed in the Rx path after the
      Rx handler, LLDP packets with skb->dev set to the bridge device never
      reach the lldpad daemon.
      
      Fix this by making the bridge's Rx handler re-inject LLDP packets with
      RX_HANDLER_PASS, which effectively restores the behaviour prior to the
      mentioned commit.
      
      This means netfilter will never receive LLDP packets coming through a
      bridge port, as I don't see a way in which we can have okfn() consume
      the packet without breaking existing behaviour. I've already carried out
      a similar fix for STP packets in commit 56fae404 ("bridge: Fix
      incorrect re-injection of STP packets").
      
      Fixes: 8626c56c ("bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Reviewed-by: NJiri Pirko <jiri@mellanox.com>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      baedbe55
    • X
      sctp: support ipv6 nonlocal bind · 9b974202
      Xin Long 提交于
      This patch makes sctp support ipv6 nonlocal bind by adding
      sp->inet.freebind and net->ipv6.sysctl.ip_nonlocal_bind
      check in sctp_v6_available as what sctp did to support
      ipv4 nonlocal bind (commit cdac4e07).
      Reported-by: NShijoe George <spanjikk@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9b974202
    • D
      bpf, events: fix offset in skb copy handler · aa7145c1
      Daniel Borkmann 提交于
      This patch fixes the __output_custom() routine we currently use with
      bpf_skb_copy(). I missed that when len is larger than the size of the
      current handle, we can issue multiple invocations of copy_func, and
      __output_custom() advances destination but also source buffer by the
      written amount of bytes. When we have __output_custom(), this is actually
      wrong since in that case the source buffer points to a non-linear object,
      in our case an skb, which the copy_func helper is supposed to walk.
      Therefore, since this is non-linear we thus need to pass the offset into
      the helper, so that copy_func can use it for extracting the data from
      the source object.
      
      Therefore, adjust the callback signatures properly and pass offset
      into the skb_header_pointer() invoked from bpf_skb_copy() callback. The
      __DEFINE_OUTPUT_COPY_BODY() is adjusted to accommodate for two things:
      i) to pass in whether we should advance source buffer or not; this is
      a compile-time constant condition, ii) to pass in the offset for
      __output_custom(), which we do with help of __VA_ARGS__, so everything
      can stay inlined as is currently. Both changes allow for adapting the
      __output_* fast-path helpers w/o extra overhead.
      
      Fixes: 555c8a86 ("bpf: avoid stack copy and use skb ctx for event output")
      Fixes: 7e3f977e ("perf, events: add non-linear data support for raw records")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aa7145c1
    • A
      net/ncsi: avoid maybe-uninitialized warning · a1b43edd
      Arnd Bergmann 提交于
      gcc-4.9 and higher warn about the newly added NSCI code:
      
      net/ncsi/ncsi-manage.c: In function 'ncsi_process_next_channel':
      net/ncsi/ncsi-manage.c:1003:2: error: 'old_state' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      
      The warning is a false positive and therefore harmless, but it would be good to
      avoid it anyway. I have determined that the barrier in the spin_unlock_irqsave()
      is what confuses gcc to the point that it cannot track whether the variable
      was unused or not.
      
      This rearranges the code in a way that makes it obvious to gcc that old_state
      is always initialized at the time of use, functionally this should not
      change anything.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a1b43edd
    • V
      net: bridge: br_set_ageing_time takes a clock_t · 9e0b27fe
      Vivien Didelot 提交于
      Change the ageing_time type in br_set_ageing_time() from u32 to what it
      is expected to be, i.e. a clock_t.
      Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9e0b27fe
    • V
      net: bridge: fix br_stp_enable_bridge comment · dba479f3
      Vivien Didelot 提交于
      br_stp_enable_bridge() does take the br->lock spinlock. Fix its wrongly
      pasted comment and use the same as br_stp_disable_bridge().
      Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dba479f3