1. 20 Feb, 2016 1 commit
  2. 19 Feb, 2016 2 commits
    • gre: clear IFF_TX_SKB_SHARING · d13b161c
      Committed by Jiri Benc
      ether_setup sets IFF_TX_SKB_SHARING but this is not supported by gre
      as it modifies the skb on xmit.
      
      Also, clean up whitespace in ipgre_tap_setup when we're already touching it.
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp/dccp: fix another race at listener dismantle · 7716682c
      Committed by Eric Dumazet
      Ilya reported the following lockdep splat:
      
      kernel: =========================
      kernel: [ BUG: held lock freed! ]
      kernel: 4.5.0-rc1-ceph-00026-g5e0a311 #1 Not tainted
      kernel: -------------------------
      kernel: swapper/5/0 is freeing memory
      ffff880035c9d200-ffff880035c9dbff, with a lock still held there!
      kernel: (&(&queue->rskq_lock)->rlock){+.-...}, at:
      [<ffffffff816f6a88>] inet_csk_reqsk_queue_add+0x28/0xa0
      kernel: 4 locks held by swapper/5/0:
      kernel: #0:  (rcu_read_lock){......}, at: [<ffffffff8169ef6b>]
      netif_receive_skb_internal+0x4b/0x1f0
      kernel: #1:  (rcu_read_lock){......}, at: [<ffffffff816e977f>]
      ip_local_deliver_finish+0x3f/0x380
      kernel: #2:  (slock-AF_INET){+.-...}, at: [<ffffffff81685ffb>]
      sk_clone_lock+0x19b/0x440
      kernel: #3:  (&(&queue->rskq_lock)->rlock){+.-...}, at:
      [<ffffffff816f6a88>] inet_csk_reqsk_queue_add+0x28/0xa0
      
      To properly fix this issue, inet_csk_reqsk_queue_add() needs
      to tell its callers whether the child has been queued
      into the accept queue.
      
      We also need to make sure the listener is still there before
      calling sk->sk_data_ready(), by holding a reference on it,
      since the reference carried by the child can disappear as
      soon as the child is put on the accept queue.
      Reported-by: Ilya Dryomov <idryomov@gmail.com>
      Fixes: ebb516af ("tcp/dccp: fix race at listener dismantle phase")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 09 Feb, 2016 1 commit
  4. 08 Feb, 2016 1 commit
  5. 06 Feb, 2016 1 commit
    • ipv6: addrconf: Fix recursive spin lock call · 16186a82
      Committed by subashab@codeaurora.org
      An RCU stall with the following backtrace was seen on a system with
      forwarding, optimistic_dad and use_optimistic set. To reproduce,
      set these flags and allow ipv6 autoconf.
      
      This occurs because the device write_lock is acquired while already
      holding the read_lock. Backtrace below:
      
      INFO: rcu_preempt self-detected stall on CPU { 1}  (t=2100 jiffies
       g=3992 c=3991 q=4471)
      <6> Task dump for CPU 1:
      <2> kworker/1:0     R  running task    12168    15   2 0x00000002
      <2> Workqueue: ipv6_addrconf addrconf_dad_work
      <6> Call trace:
      <2> [<ffffffc000084da8>] el1_irq+0x68/0xdc
      <2> [<ffffffc000cc4e0c>] _raw_write_lock_bh+0x20/0x30
      <2> [<ffffffc000bc5dd8>] __ipv6_dev_ac_inc+0x64/0x1b4
      <2> [<ffffffc000bcbd2c>] addrconf_join_anycast+0x9c/0xc4
      <2> [<ffffffc000bcf9f0>] __ipv6_ifa_notify+0x160/0x29c
      <2> [<ffffffc000bcfb7c>] ipv6_ifa_notify+0x50/0x70
      <2> [<ffffffc000bd035c>] addrconf_dad_work+0x314/0x334
      <2> [<ffffffc0000b64c8>] process_one_work+0x244/0x3fc
      <2> [<ffffffc0000b7324>] worker_thread+0x2f8/0x418
      <2> [<ffffffc0000bb40c>] kthread+0xe0/0xec
      
      v2: do addrconf_dad_kick inside read lock and then acquire write
      lock for ipv6_ifa_notify as suggested by Eric
      
      Fixes: 7fd2561e ("net: ipv6: Add a sysctl to make optimistic
      addresses useful candidates")
      
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Erik Kline <ek@google.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 01 Feb, 2016 1 commit
    • netfilter: conntrack: resched in nf_ct_iterate_cleanup · d93c6258
      Committed by Florian Westphal
      Ulrich reports a soft lockup with the following (shortened) call chain:
      
      NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s!
      __netif_receive_skb_core+0x6e4/0x774
      process_backlog+0x94/0x160
      net_rx_action+0x88/0x178
      call_do_softirq+0x24/0x3c
      do_softirq+0x54/0x6c
      __local_bh_enable_ip+0x7c/0xbc
      nf_ct_iterate_cleanup+0x11c/0x22c [nf_conntrack]
      masq_inet_event+0x20/0x30 [nf_nat_masquerade_ipv6]
      atomic_notifier_call_chain+0x1c/0x2c
      ipv6_del_addr+0x1bc/0x220 [ipv6]
      
      The problem is that nf_ct_iterate_cleanup can run for a very long time
      since it can be interrupted by softirq processing.
      Moreover, atomic_notifier_call_chain runs with the RCU read lock held.
      
      So let's call cond_resched() in nf_ct_iterate_cleanup and defer
      the call to a work queue for the atomic_notifier_call_chain case.
      
      We also need another cond_resched in get_next_corpse, since we
      have to deal with iter() always returning false; in that case
      get_next_corpse will walk the entire conntrack table.
      Reported-by: Ulrich Weber <uw@ocedo.com>
      Tested-by: Ulrich Weber <uw@ocedo.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  7. 30 Jan, 2016 2 commits
  8. 26 Jan, 2016 2 commits
  9. 21 Jan, 2016 1 commit
  10. 20 Jan, 2016 1 commit
    • udp: fix potential infinite loop in SO_REUSEPORT logic · ed0dfffd
      Committed by Eric Dumazet
      Using a combination of connected and un-connected sockets, Dmitry
      was able to trigger soft lockups with his fuzzer.
      
      The problem is that sockets in the SO_REUSEPORT array might have
      different scores.
      
      Right after sk2=socket(), setsockopt(sk2,...,SO_REUSEPORT, on) and
      bind(sk2, ...), but _before_ the connect(sk2) is done, sk2 is added into
      the soreuseport array, with a score which is smaller than the score of
      first socket sk1 found in hash table (I am speaking of the regular UDP
      hash table), if sk1 had the connect() done, giving a +8 to its score.
      
      hash bucket [X] -> sk1 -> sk2 -> NULL
      
      sk1 score = 14  (because it did a connect())
      sk2 score = 6
      
      SO_REUSEPORT fast selection is an optimization. If it turns out the
      score of the selected socket does not match the score of the first socket,
      just fall back to the old SO_REUSEPORT logic instead of trying to be too smart.
      
      Normal SO_REUSEPORT users do not mix different kinds of sockets, as this
      mechanism is used to load-balance traffic.
      
      Fixes: e32ea7e7 ("soreuseport: fast reuseport UDP socket selection")
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Craig Gallek <kraigatgoog@gmail.com>
      Acked-by: Craig Gallek <kraig@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
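The fallback described in this commit message can be pictured with a small user-space model. This is only a sketch under stated assumptions: `toy_sock`, `fast_select`, `slow_select` and `lookup` are hypothetical names, the modulo-based fast pick merely stands in for the kernel's real selection, and the slow path is simplified to "highest score wins". It is not the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a socket: only the lookup score matters here. */
struct toy_sock {
    int score;
};

/* Hypothetical fast path: pick a member from the flow hash alone. */
static const struct toy_sock *fast_select(const struct toy_sock *const *bucket,
                                          size_t n, uint32_t hash)
{
    return bucket[hash % n];
}

/* Simplified slow path: walk the bucket once and keep the best score. */
static const struct toy_sock *slow_select(const struct toy_sock *const *bucket,
                                          size_t n)
{
    const struct toy_sock *best = NULL;

    for (size_t i = 0; i < n; i++)
        if (best == NULL || bucket[i]->score > best->score)
            best = bucket[i];
    return best;
}

/*
 * Trust the fast pick only if its score matches the score of the first
 * socket in the bucket; otherwise fall back to the slow path. There is
 * no retry loop, so the lookup always terminates.
 */
static const struct toy_sock *lookup(const struct toy_sock *const *bucket,
                                     size_t n, uint32_t hash)
{
    const struct toy_sock *pick;

    assert(bucket != NULL || n == 0);
    if (n == 0)
        return NULL;
    pick = fast_select(bucket, n, hash);
    if (pick->score == bucket[0]->score)
        return pick;
    return slow_select(bucket, n);
}
```

With the scores from the message above (sk1 = 14, sk2 = 6), a hash that lands on sk2 fails the score comparison, so the model falls back and returns sk1.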
  11. 16 Jan, 2016 1 commit
  12. 15 Jan, 2016 1 commit
  13. 12 Jan, 2016 1 commit
  14. 11 Jan, 2016 2 commits
  15. 06 Jan, 2016 2 commits
  16. 05 Jan, 2016 3 commits
    • soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF · 538950a1
      Committed by Craig Gallek
      Expose socket options for setting a classic or extended BPF program
      for use when selecting sockets in an SO_REUSEPORT group.  These options
      can be used on the first socket to belong to a group before bind or
      on any socket in the group after bind.
      
      This change includes refactoring of the existing sk_filter code to
      allow reuse of the existing BPF filter validation checks.
      Signed-off-by: Craig Gallek <kraig@google.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • soreuseport: fast reuseport UDP socket selection · e32ea7e7
      Committed by Craig Gallek
      Include a struct sock_reuseport instance when a UDP socket binds to
      a specific address for the first time with the reuseport flag set.
      When selecting a socket for an incoming UDP packet, use the information
      available in sock_reuseport if present.
      
      This required adding an additional field to the UDP source address
      equality function to differentiate between exact and wildcard matches.
      The original use case allowed wildcard matches when checking for
      existing port uses during bind.  The new use case of adding a socket
      to a reuseport group requires exact address matching.
      
      Performance test (using a machine with 2 CPU sockets and a total of
      48 cores):  Create reuseport groups of varying size.  Use one socket
      from this group per user thread (pinning each thread to a different
      core) calling recvmmsg in a tight loop.  Record number of messages
      received per second while saturating a 10G link.
        10 sockets: 18% increase (~2.8M -> 3.3M pkts/s)
        20 sockets: 14% increase (~2.9M -> 3.3M pkts/s)
        40 sockets: 13% increase (~3.0M -> 3.4M pkts/s)
      
      This work is based off a similar implementation written by
      Ying Cai <ycai@google.com> for implementing policy-based reuseport
      selection.
      Signed-off-by: Craig Gallek <kraig@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
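One way to picture the fast selection is as a direct mapping from a packet's flow hash to a slot in the group's socket array, with no list walk. The sketch below assumes a multiply-and-shift mapping in the style of the kernel's `reciprocal_scale()` helper; `scale_hash` is a hypothetical name, and whether this commit uses exactly this mapping is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a 32-bit flow hash onto [0, num_socks) without a division:
 * treat hash / 2^32 as a fraction in [0, 1) and scale it by num_socks.
 */
static uint32_t scale_hash(uint32_t hash, uint32_t num_socks)
{
    assert(num_socks > 0);
    return (uint32_t)(((uint64_t)hash * num_socks) >> 32);
}
```

Because the result is always below `num_socks`, it can index the group array directly, so selection cost does not grow with the number of sockets in the group.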
    • udp: properly support MSG_PEEK with truncated buffers · 197c949e
      Committed by Eric Dumazet
      The backport of this upstream commit into stable kernels:
      89c22d8c ("net: Fix skb csum races when peeking")
      exposed a bug in the UDP stack's MSG_PEEK support, when the user provides
      a buffer smaller than the skb payload.
      
      In this case,
      skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr),
                                       msg->msg_iov);
      returns -EFAULT.
      
      This bug does not happen in upstream kernels since Al Viro did a great
      job of replacing this with:
      skb_copy_and_csum_datagram_msg(skb, sizeof(struct udphdr), msg);
      This variant is safe vs short buffers.
      
      For the time being, instead of reverting Herbert Xu's patch and adding back
      the invalid skb->ip_summed changes, simply store the result of
      udp_lib_checksum_complete() so that we avoid computing the checksum a
      second time, and avoid the problematic
      skb_copy_and_csum_datagram_iovec() call.
      
      This patch can be applied on recent kernels as it avoids a double
      checksumming, then backported to stable kernels as a bug fix.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
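The user-visible behaviour at stake can be exercised with a short helper. This is a sketch, not a reproducer for the stable-kernel bug: `peek_datagram` is a hypothetical name, and it works on any datagram socket. On a correctly behaving kernel, peeking with a buffer smaller than the datagram is not an error: `recvmsg()` copies what fits and reports `MSG_TRUNC` in `msg_flags`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Peek at the next datagram without dequeuing it.
 * Returns the number of bytes copied into buf, or -1 on error.
 * *truncated is set to 1 if the datagram was larger than len.
 */
static ssize_t peek_datagram(int fd, void *buf, size_t len, int *truncated)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg;
    ssize_t n;

    assert(truncated != NULL);
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    n = recvmsg(fd, &msg, MSG_PEEK);
    if (n < 0)
        return -1;
    *truncated = (msg.msg_flags & MSG_TRUNC) != 0;
    return n;
}
```

Because `MSG_PEEK` leaves the datagram queued, a later `recv()` with a large enough buffer still returns the full payload.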
  17. 29 Dec, 2015 1 commit
  18. 26 Dec, 2015 1 commit
  19. 24 Dec, 2015 1 commit
  20. 23 Dec, 2015 4 commits
  21. 19 Dec, 2015 3 commits
    • net: Allow accepted sockets to be bound to l3mdev domain · 6dd9a14e
      Committed by David Ahern
      Allow accepted sockets to derive their sk_bound_dev_if setting from the
      l3mdev domain in which the packets originated. A sysctl setting is added
      to control the behavior which is similar to sk_mark and
      sysctl_tcp_fwmark_accept.
      
      This effectively allows a process to have a "VRF-global" listen socket,
      with child sockets bound to the VRF device in which the packet originated.
      A similar behavior can be achieved using sk_mark, but a solution using marks
      is incomplete as it does not handle duplicate addresses in different L3
      domains/VRFs. Allowing sockets to inherit the sk_bound_dev_if from l3mdev
      domain provides a complete solution.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: addrconf: use stable address generator for ARPHRD_NONE · cc9da6cc
      Committed by Bjørn Mork
      Add a new address generator mode, using the stable address generator
      with an automatically generated secret. This is intended as a default
      address generator mode for device types with no EUI64 implementation.
      The new generator is used for ARPHRD_NONE interfaces initially, adding
      default IPv6 autoconf support to e.g. tun interfaces.
      
      If the addrgenmode is set to 'random', either by default or manually,
      and no stable secret is available, then a random secret is used as
      input for the stable-privacy address generator.  The secret can be
      read and modified like manually configured secrets, using the proc
      interface.  Modifying the secret will change the addrgen mode to
      'stable-privacy' to indicate that it operates on a known secret.
      
      Existing behaviour of the 'stable-privacy' mode is kept unchanged. If
      a known secret is available when the device is created, then the mode
      will default to 'stable-privacy' as before.  The mode can be manually
      set to 'random' but it will behave exactly like 'stable-privacy' in
      this case. The secret will not change.
      
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: 吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
      Signed-off-by: Bjørn Mork <bjorn@mork.no>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ila: add NETFILTER dependency · 8cb964da
      Committed by Arnd Bergmann
      The recently added generic ILA translation facility fails to
      build when CONFIG_NETFILTER is disabled:
      
      net/ipv6/ila/ila_xlat.c:229:20: warning: 'struct nf_hook_state' declared inside parameter list
      net/ipv6/ila/ila_xlat.c:235:27: error: array type has incomplete element type 'struct nf_hook_ops'
       static struct nf_hook_ops ila_nf_hook_ops[] __read_mostly = {
      
      This adds an explicit Kconfig dependency to avoid that case.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Fixes: 7f00feaf ("ila: Add generic ILA translation facility")
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. 18 Dec, 2015 2 commits
  23. 16 Dec, 2015 5 commits