1. 02 4月, 2016 1 次提交
    • D
      tun, bpf: fix suspicious RCU usage in tun_{attach, detach}_filter · 5a5abb1f
      Daniel Borkmann 提交于
      Sasha Levin reported a suspicious rcu_dereference_protected() warning
      found while fuzzing with trinity that is similar to this one:
      
        [   52.765684] net/core/filter.c:2262 suspicious rcu_dereference_protected() usage!
        [   52.765688] other info that might help us debug this:
        [   52.765695] rcu_scheduler_active = 1, debug_locks = 1
        [   52.765701] 1 lock held by a.out/1525:
        [   52.765704]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff816a64b7>] rtnl_lock+0x17/0x20
        [   52.765721] stack backtrace:
        [   52.765728] CPU: 1 PID: 1525 Comm: a.out Not tainted 4.5.0+ #264
        [...]
        [   52.765768] Call Trace:
        [   52.765775]  [<ffffffff813e488d>] dump_stack+0x85/0xc8
        [   52.765784]  [<ffffffff810f2fa5>] lockdep_rcu_suspicious+0xd5/0x110
        [   52.765792]  [<ffffffff816afdc2>] sk_detach_filter+0x82/0x90
        [   52.765801]  [<ffffffffa0883425>] tun_detach_filter+0x35/0x90 [tun]
        [   52.765810]  [<ffffffffa0884ed4>] __tun_chr_ioctl+0x354/0x1130 [tun]
        [   52.765818]  [<ffffffff8136fed0>] ? selinux_file_ioctl+0x130/0x210
        [   52.765827]  [<ffffffffa0885ce3>] tun_chr_ioctl+0x13/0x20 [tun]
        [   52.765834]  [<ffffffff81260ea6>] do_vfs_ioctl+0x96/0x690
        [   52.765843]  [<ffffffff81364af3>] ? security_file_ioctl+0x43/0x60
        [   52.765850]  [<ffffffff81261519>] SyS_ioctl+0x79/0x90
        [   52.765858]  [<ffffffff81003ba2>] do_syscall_64+0x62/0x140
        [   52.765866]  [<ffffffff817d563f>] entry_SYSCALL64_slow_path+0x25/0x25
      
      Same can be triggered with PROVE_RCU (+ PROVE_RCU_REPEATEDLY) enabled
      from tun_attach_filter() when user space calls ioctl(tun_fd, TUN{ATTACH,
      DETACH}FILTER, ...) for adding/removing a BPF filter on tap devices.
      
      Since the fix in f91ff5b9 ("net: sk_{detach|attach}_filter() rcu
      fixes") sk_attach_filter()/sk_detach_filter() now dereferences the
      filter with rcu_dereference_protected(), checking whether socket lock
      is held in control path.
      
      Since its introduction in 99405162 ("tun: socket filter support"),
      tap filters are managed under RTNL lock from __tun_chr_ioctl(). Thus the
      sock_owned_by_user(sk) doesn't apply in this specific case and therefore
      triggers the false positive.
      
      Extend the BPF API with __sk_attach_filter()/__sk_detach_filter() pair
      that is used by tap filters and pass in lockdep_rtnl_is_held() for the
      rcu_dereference_protected() checks instead.
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5a5abb1f
  2. 02 3月, 2016 1 次提交
    • P
      net/tun: implement ndo_set_rx_headroom · eaea34b2
      Paolo Abeni 提交于
      ndo_set_rx_headroom controls the align value used by tun devices to
      allocate skbs on frame reception.
      When the xmit device adds a large encapsulation, this avoids an skb
      head reallocation on forwarding.
      
      The measured improvement when forwarding towards a vxlan dev with
      frame size below the egress device MTU is as follow:
      
      vxlan over ipv6, bridged: +6%
      vxlan over ipv6, ovs: +7%
      
      In case of ipv4 tunnels there is no improvement, since the tun
      device default alignment provides enough headroom to avoid the skb
      head reallocation.
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eaea34b2
  3. 18 12月, 2015 1 次提交
  4. 02 12月, 2015 1 次提交
    • E
      net: rename SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA · 9cd3e072
      Eric Dumazet 提交于
      This patch is a cleanup to make following patch easier to
      review.
      
      Goal is to move SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA
      from (struct socket)->flags to a (struct socket_wq)->flags
      to benefit from RCU protection in sock_wake_async()
      
      To ease backports, we rename both constants.
      
      Two new helpers, sk_set_bit(int nr, struct sock *sk)
      and sk_clear_bit(int net, struct sock *sk) are added so that
      following patch can change their implementation.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9cd3e072
  5. 13 10月, 2015 1 次提交
  6. 04 8月, 2015 1 次提交
  7. 01 6月, 2015 3 次提交
  8. 11 5月, 2015 2 次提交
  9. 12 4月, 2015 1 次提交
  10. 03 3月, 2015 1 次提交
  11. 09 2月, 2015 1 次提交
    • E
      net: rfs: add hash collision detection · 567e4b79
      Eric Dumazet 提交于
      Receive Flow Steering is a nice solution but suffers from
      hash collisions when a mix of connected and unconnected traffic
      is received on the host, when flow hash table is populated.
      
      Also, clearing flow in inet_release() makes RFS not very good
      for short lived flows, as many packets can follow close().
      (FIN , ACK packets, ...)
      
      This patch extends the information stored into global hash table
      to not only include cpu number, but upper part of the hash value.
      
      I use a 32bit value, and dynamically split it in two parts.
      
      For host with less than 64 possible cpus, this gives 6 bits for the
      cpu number, and 26 (32-6) bits for the upper part of the hash.
      
      Since hash bucket selection use low order bits of the hash, we have
      a full hash match, if /proc/sys/net/core/rps_sock_flow_entries is big
      enough.
      
      If the hash found in flow table does not match, we fallback to RPS (if
      it is enabled for the rxqueue).
      
      This means that a packet for an non connected flow can avoid the
      IPI through a unrelated/victim CPU.
      
      This also means we no longer have to clear the table at socket
      close time, and this helps short lived flows performance.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      567e4b79
  12. 05 2月, 2015 1 次提交
  13. 04 2月, 2015 2 次提交
  14. 14 1月, 2015 1 次提交
  15. 13 1月, 2015 1 次提交
    • P
      tuntap: Increase the number of queues in tun. · baf71c5c
      Pankaj Gupta 提交于
      Networking under kvm works best if we allocate a per-vCPU RX and TX
      queue in a virtual NIC. This requires a per-vCPU queue on the host side.
      
      It is now safe to increase the maximum number of queues.
      Preceding patch: 'net: allow large number of rx queues'
      made sure this won't cause failures due to high order memory
      allocations. Increase it to 256: this is the max number of vCPUs
      KVM supports.
      
      Size of tun_struct changes from 8512 to 10496 after this patch. This keeps
      pages allocated for tun_struct before and after the patch to 3.
      Signed-off-by: NPankaj Gupta <pagupta@redhat.com>
      Reviewed-by: NDavid Gibson <dgibson@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      baf71c5c
  16. 01 1月, 2015 2 次提交
  17. 17 12月, 2014 1 次提交
  18. 10 12月, 2014 1 次提交
    • A
      put iov_iter into msghdr · c0371da6
      Al Viro 提交于
      Note that the code _using_ ->msg_iter at that point will be very
      unhappy with anything other than unshifted iovec-backed iov_iter.
      We still need to convert users to proper primitives.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      c0371da6
  19. 09 12月, 2014 3 次提交
  20. 06 12月, 2014 1 次提交
  21. 03 12月, 2014 1 次提交
  22. 24 11月, 2014 2 次提交
  23. 20 11月, 2014 1 次提交
  24. 14 11月, 2014 1 次提交
  25. 08 11月, 2014 1 次提交
  26. 06 11月, 2014 1 次提交
  27. 04 11月, 2014 2 次提交
    • H
      tun: Fix TUN_PKT_STRIP setting · 2eb783c4
      Herbert Xu 提交于
      We set the flag TUN_PKT_STRIP if the user buffer provided is too
      small to contain the entire packet plus meta-data.  However, this
      has been broken ever since we added GSO meta-data.  VLAN acceleration
      also has the same problem.
      
      This patch fixes this by taking both into account when setting the
      TUN_PKT_STRIP flag.
      
      The fact that this has been broken for six years without anyone
      realising means that nobody actually uses this flag.
      
      Fixes: f43798c2 ("tun: Allow GSO using virtio_net_hdr")
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2eb783c4
    • H
      tun: Fix csum_start with VLAN acceleration · a8f9bfdf
      Herbert Xu 提交于
      When VLAN acceleration is in use on the xmit path, we end up
      setting csum_start to the wrong place.  The result is that the
      whoever ends up doing the checksum setting will corrupt the packet
      instead of writing the checksum to the expected location, usually
      this means writing the checksum with an offset of -4.
      
      This patch fixes this by adjusting csum_start when VLAN acceleration
      is detected.
      
      Fixes: 6680ec68 ("tuntap: hardware vlan tx support")
      Cc: stable@vger.kernel.org
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a8f9bfdf
  28. 31 10月, 2014 2 次提交
    • B
      drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets · 5188cd44
      Ben Hutchings 提交于
      UFO is now disabled on all drivers that work with virtio net headers,
      but userland may try to send UFO/IPv6 packets anyway.  Instead of
      sending with ID=0, we should select identifiers on their behalf (as we
      used to).
      Signed-off-by: NBen Hutchings <ben@decadent.org.uk>
      Fixes: 916e4cf4 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data")
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5188cd44
    • B
      drivers/net: Disable UFO through virtio · 3d0ad094
      Ben Hutchings 提交于
      IPv6 does not allow fragmentation by routers, so there is no
      fragmentation ID in the fixed header.  UFO for IPv6 requires the ID to
      be passed separately, but there is no provision for this in the virtio
      net protocol.
      
      Until recently our software implementation of UFO/IPv6 generated a new
      ID, but this was a bug.  Now we will use ID=0 for any UFO/IPv6 packet
      passed through a tap, which is even worse.
      
      Unfortunately there is no distinction between UFO/IPv4 and v6
      features, so disable UFO on taps and virtio_net completely until we
      have a proper solution.
      
      We cannot depend on VM managers respecting the tap feature flags, so
      keep accepting UFO packets but log a warning the first time we do
      this.
      Signed-off-by: NBen Hutchings <ben@decadent.org.uk>
      Fixes: 916e4cf4 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data")
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3d0ad094
  29. 10 9月, 2014 1 次提交
  30. 16 7月, 2014 1 次提交
    • T
      net: set name_assign_type in alloc_netdev() · c835a677
      Tom Gundersen 提交于
      Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert
      all users to pass NET_NAME_UNKNOWN.
      
      Coccinelle patch:
      
      @@
      expression sizeof_priv, name, setup, txqs, rxqs, count;
      @@
      
      (
      -alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs)
      +alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs)
      |
      -alloc_netdev_mq(sizeof_priv, name, setup, count)
      +alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count)
      |
      -alloc_netdev(sizeof_priv, name, setup)
      +alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup)
      )
      
      v9: move comments here from the wrong commit
      Signed-off-by: NTom Gundersen <teg@jklm.no>
      Reviewed-by: NDavid Herrmann <dh.herrmann@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c835a677