1. 17 June 2020, 2 commits
    • tipc: update a binding service via broadcast · cad2929d
      Committed by Hoang Huu Le
      Currently, a binding table update (adding a service binding to the
      name table or withdrawing one) is sent over replicast. However, when
      scaling clusters up to > 100 nodes/containers, this method becomes
      inefficient because it loops through the nodes in a cluster one
      by one.
      
      It is worthwhile to use broadcast to update a binding service. This
      way, the binding table can be updated on all peer nodes in one shot.
      
      Broadcast is used when all peer nodes, as indicated by a new capability
      flag TIPC_NAMED_BCAST, support reception of this message type.
      
      Four problems need to be considered when introducing this feature;
      a receive-side sketch of the resulting rules follows the list.
      1) When establishing a link to a new peer node we still update it by a
      unicast 'bulk' update. This may lead to race conditions, where a later
      broadcast publication/withdrawal bypasses the 'bulk', resulting in
      disordered publications, or even in a withdrawal arriving before the
      corresponding publication. We solve this by adding an 'is_last_bulk' bit
      in the last bulk message so that it can be distinguished from all other
      messages. Only when this message has arrived do we open up for reception
      of broadcast publications/withdrawals.
      
      2) When a first legacy node is added to the cluster all distribution
      will switch over to use the legacy 'replicast' method, while the
      opposite happens when the last legacy node leaves the cluster. This
      entails another risk of message disordering that has to be handled. We
      solve this by adding a sequence number to the broadcast/replicast
      messages, so that disordering can be discovered and corrected. Note
      however that we don't need to consider potential message loss or
      duplication at this protocol level.
      
      3) Bulk messages don't contain any sequence numbers, and will always
      arrive in order. Hence we must exempt those from the sequence number
      control and deliver them unconditionally. We solve this by adding a new
      'is_bulk' bit in those messages so that they can be recognized.
      
      4) Legacy messages, which contain no new bits or sequence numbers but
      cannot arrive out of order either, also need to be exempted from the
      initial synchronization and sequence number check, and delivered
      unconditionally. Therefore, we add another 'is_not_legacy' bit to all
      new messages so that those can be distinguished from legacy messages
      and the latter delivered directly.
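
      To make the interplay of these four rules concrete, here is a minimal
      userspace C sketch of the receive-side decision; all struct, field and
      function names are illustrative assumptions, not the actual TIPC code:

          #include <stdbool.h>
          #include <stdint.h>

          /* Illustrative message and per-peer state, not TIPC's structs. */
          struct named_msg {
              bool is_not_legacy;   /* rule 4: absent on legacy messages */
              bool is_bulk;         /* rule 3: bulk updates carry no seqno */
              bool is_last_bulk;    /* rule 1: marks the end of the bulk */
              uint16_t seqno;       /* rule 2: orders bcast/replicast msgs */
          };

          struct peer_state {
              bool bulk_done;       /* set once 'is_last_bulk' has arrived */
              uint16_t rx_seqno;    /* next broadcast/replicast seqno expected */
          };

          /* Returns true if the message may be delivered now; false means
           * it must be deferred (or dropped, per the synch point rule). */
          static bool named_rcv_ok(struct peer_state *p, const struct named_msg *m)
          {
              if (!m->is_not_legacy)        /* rule 4: legacy goes straight through */
                  return true;
              if (m->is_bulk) {             /* rule 3: bulk always arrives in order */
                  if (m->is_last_bulk)      /* rule 1: open up for broadcasts */
                      p->bulk_done = true;
                  return true;
              }
              if (!p->bulk_done)            /* rule 1: hold broadcasts until bulk ends */
                  return false;
              if (m->seqno != p->rx_seqno)  /* rule 2: disordering detected */
                  return false;
              p->rx_seqno++;
              return true;
          }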
      
      v1->v2:
       - fix a warning reported by kbuild test robot <lkp@intel.com>
       - add a sanity check to drop any publication message with a sequence
      number lower than the agreed synch point
      Signed-off-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Hoang Huu Le <hoang.h.le@dektech.com.au>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: grow window for OOO packets only for SACK flows · 66205121
      Committed by Eric Dumazet
      Back in 2013, we made a change that broke fast retransmit
      for non-SACK flows.
      
      Indeed, for these flows, a sender needs to receive three duplicate
      ACKs before starting fast retransmit. ACKs sent with a different
      receive window do not count as duplicates.
      
      Even if enabling SACK is strongly recommended these days,
      there are still some cases where it has to be disabled.
      
      Not increasing the window seems better than having to
      rely on RTO.
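
      A minimal userspace C model of the idea (not the kernel diff itself;
      names are illustrative): on an out-of-order segment, only let the
      advertised window grow when the flow negotiated SACK, so that for
      non-SACK flows every DUPACK keeps the same window and still counts:

          #include <stdbool.h>
          #include <stdint.h>

          struct flow {
              bool sack_ok;           /* negotiated in SYN/SYN-ACK */
              uint32_t rcv_ssthresh;  /* current cap on the advertised window */
              uint32_t window_clamp;  /* absolute maximum */
          };

          static void ofo_segment_arrived(struct flow *f, uint32_t truesize)
          {
              if (!f->sack_ok)
                  return;  /* window stays constant: a DUPACK stays a DUPACK */
              f->rcv_ssthresh += truesize;           /* simplified growth rule */
              if (f->rcv_ssthresh > f->window_clamp)
                  f->rcv_ssthresh = f->window_clamp;
          }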
      
      After the fix, the following packetdrill test gives:
      
      // Initialize connection
          0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
         +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
         +0 bind(3, ..., ...) = 0
         +0 listen(3, 1) = 0
      
         +0 < S 0:0(0) win 32792 <mss 1000,nop,wscale 7>
         +0 > S. 0:0(0) ack 1 <mss 1460,nop,wscale 8>
         +0 < . 1:1(0) ack 1 win 514
      
         +0 accept(3, ..., ...) = 4
      
         +0 < . 1:1001(1000) ack 1 win 514
      // Quick ack
         +0 > . 1:1(0) ack 1001 win 264
      
         +0 < . 2001:3001(1000) ack 1 win 514
      // DUPACK : Normally we should not change the window
         +0 > . 1:1(0) ack 1001 win 264
      
         +0 < . 3001:4001(1000) ack 1 win 514
      // DUPACK : Normally we should not change the window
         +0 > . 1:1(0) ack 1001 win 264
      
         +0 < . 4001:5001(1000) ack 1 win 514
      // DUPACK : Normally we should not change the window
          +0 > . 1:1(0) ack 1001 win 264
      
         +0 < . 1001:2001(1000) ack 1 win 514
      // Hole is repaired.
         +0 > . 1:1(0) ack 5001 win 272
      
      Fixes: 4e4f1fc2 ("tcp: properly increase rcv_ssthresh for ofo packets")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 16 June 2020, 7 commits
  3. 14 June 2020, 2 commits
  4. 13 June 2020, 4 commits
  5. 12 June 2020, 16 commits
  6. 11 June 2020, 5 commits
  7. 10 June 2020, 4 commits
    • dccp: Fix possible memleak in dccp_init and dccp_fini · c96b6acc
      Committed by Wang Hai
      There are some memory leaks in dccp_init() and dccp_fini().
      
      In dccp_fini() and in the error handling path of dccp_init(), freeing
      lhash2 is missing. Add inet_hashinfo2_free_mod() to do it.
      
      If inet_hashinfo2_init_mod() fails in dccp_init(),
      percpu_counter_destroy() should be called to destroy dccp_orphan_count,
      so we need to goto out_free_percpu in that case.
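
      An illustrative userspace C sketch of the goto-unwind pattern the fix
      restores; init_counter/init_hash/free_counter are stand-ins, not the
      kernel APIs named above:

          #include <stdio.h>

          static int init_counter(void) { return 0; }    /* e.g. orphan counter */
          static int init_hash(void)    { return -1; }   /* e.g. lhash2; may fail */
          static void free_counter(void){ puts("counter destroyed"); }

          static int module_init_sketch(void)
          {
              int rc = init_counter();
              if (rc)
                  goto out;
              rc = init_hash();
              if (rc)
                  goto out_free_counter;   /* the unwind step the commit adds */
              return 0;

          out_free_counter:
              free_counter();              /* undo everything set up before */
          out:
              return rc;
          }

          int main(void) { return module_init_sketch() ? 1 : 0; }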
      
      Fixes: c92c81df ("net: dccp: fix kernel crash on module load")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Wang Hai <wanghai38@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: export __netdev_watchdog_up() · 1a3db27a
      Committed by Valentin Longchamp
      Since the quiesce/activate rework, __netdev_watchdog_up() is directly
      called in the ucc_geth driver.
      
      Unfortunately, this function is not available for modules and thus
      ucc_geth cannot be built as a module anymore. Fix it by exporting
      __netdev_watchdog_up().
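
      The substance of the fix is a one-line symbol export next to the
      function's definition, roughly as below; whether the plain or _GPL
      export variant is used is an assumption here:

          /* in net/sched/sch_generic.c, after __netdev_watchdog_up() */
          EXPORT_SYMBOL_GPL(__netdev_watchdog_up);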
      
      Since the commit introducing the regression was backported to stable
      branches, this one should ideally be as well.
      
      Fixes: 79dde73c ("net/ethernet/freescale: rework quiesce/activate for ucc_geth")
      Signed-off-by: Valentin Longchamp <valentin@longchamp.me>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: change addr_list_lock back to static key · 845e0ebb
      Committed by Cong Wang
      The dynamic key update for addr_list_lock still causes troubles,
      for example the following race condition still exists:
      
      CPU 0:				CPU 1:
      (RCU read lock)			(RTNL lock)
      dev_mc_seq_show()		netdev_update_lockdep_key()
      				  -> lockdep_unregister_key()
       -> netif_addr_lock_bh()
      
      because lockdep doesn't provide an API to update it atomically.
      Therefore, we have to move it back to static keys and use subclasses
      for nested locking like before.
      
      In commit 1a33e10e ("net: partially revert dynamic lockdep key
      changes"), I already reverted most parts of commit ab92d68f
      ("net: core: add generic lockdep keys").
      
      This patch reverts the rest and also part of commit f3b0a18b
      ("net: remove unnecessary variables and callback"). After this
      patch, addr_list_lock changes back to using static keys and
      subclasses to satisfy lockdep. Thanks to dev->lower_level, we do
      not have to change back to ->ndo_get_lock_subclass().
      
      And hopefully this reduces some of the syzbot lockdep noise too.
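
      A kernel-idiom sketch of what static keys plus subclasses look like
      here (not the verbatim patch, and the helper name is an assumption):
      one static lockdep class is shared by all devices, and the nesting
      subclass is derived from dev->lower_level so stacked devices can
      take the lock recursively without lockdep complaints:

          /* one static class for every netdev, instead of per-device keys */
          static struct lock_class_key dev_addr_list_lock_key;

          static inline void netif_addr_lock_nested_sketch(struct net_device *dev)
          {
              /* subclass grows as we descend the device stack */
              spin_lock_nested(&dev->addr_list_lock, dev->lower_level);
          }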
      
      Reported-by: syzbot+f3a0e80c34b3fc28ac5e@syzkaller.appspotmail.com
      Cc: Taehee Yoo <ap420073@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf, sockhash: Synchronize delete from bucket list on map free · 75e68e5b
      Committed by Jakub Sitnicki
      We can end up modifying the sockhash bucket list from two CPUs when a
      sockhash is being destroyed (sock_hash_free) on one CPU, while a socket
      that is in the sockhash is unlinking itself from it on another CPU
      (sock_hash_delete_from_link).
      
      This results in accessing a list element that is in an undefined state as
      reported by KASAN:
      
      | ==================================================================
      | BUG: KASAN: wild-memory-access in sock_hash_free+0x13c/0x280
      | Write of size 8 at addr dead000000000122 by task kworker/2:1/95
      |
      | CPU: 2 PID: 95 Comm: kworker/2:1 Not tainted 5.7.0-rc7-02961-ge22c35ab0038-dirty #691
      | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
      | Workqueue: events bpf_map_free_deferred
      | Call Trace:
      |  dump_stack+0x97/0xe0
      |  ? sock_hash_free+0x13c/0x280
      |  __kasan_report.cold+0x5/0x40
      |  ? mark_lock+0xbc1/0xc00
      |  ? sock_hash_free+0x13c/0x280
      |  kasan_report+0x38/0x50
      |  ? sock_hash_free+0x152/0x280
      |  sock_hash_free+0x13c/0x280
      |  bpf_map_free_deferred+0xb2/0xd0
      |  ? bpf_map_charge_finish+0x50/0x50
      |  ? rcu_read_lock_sched_held+0x81/0xb0
      |  ? rcu_read_lock_bh_held+0x90/0x90
      |  process_one_work+0x59a/0xac0
      |  ? lock_release+0x3b0/0x3b0
      |  ? pwq_dec_nr_in_flight+0x110/0x110
      |  ? rwlock_bug.part.0+0x60/0x60
      |  worker_thread+0x7a/0x680
      |  ? _raw_spin_unlock_irqrestore+0x4c/0x60
      |  kthread+0x1cc/0x220
      |  ? process_one_work+0xac0/0xac0
      |  ? kthread_create_on_node+0xa0/0xa0
      |  ret_from_fork+0x24/0x30
      | ==================================================================
      
      Fix it by reintroducing a spin-lock protected critical section around
      the code that removes the elements from the bucket on sockhash free.
      
      To do that, we also need to defer processing of the removed elements
      until we are out of atomic context, so that we can unlink the socket
      from the map while holding the sock lock.
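
      A runnable userspace C model of that shape (all names illustrative,
      not the kernel's): unlink the nodes from the shared bucket while
      holding its lock, park them on a private list, then do the
      per-element work only after the lock is dropped:

          #include <pthread.h>
          #include <stdio.h>

          struct elem { struct elem *next; int id; };

          static pthread_mutex_t bucket_lock = PTHREAD_MUTEX_INITIALIZER;
          static struct elem *bucket_head;

          static void free_bucket(void)
          {
              struct elem *local = NULL, *e;

              pthread_mutex_lock(&bucket_lock);    /* critical section */
              while ((e = bucket_head)) {
                  bucket_head = e->next;           /* unlink from bucket */
                  e->next = local;                 /* park on a local list */
                  local = e;
              }
              pthread_mutex_unlock(&bucket_lock);

              while ((e = local)) {                /* deferred processing */
                  local = e->next;
                  /* here the kernel can take the sock lock safely */
                  printf("unlinking socket %d\n", e->id);
              }
          }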
      
      Fixes: 90db6d77 ("bpf, sockmap: Remove bucket->lock from sock_{hash|map}_free")
      Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200607205229.2389672-3-jakub@cloudflare.com