1. 04 7月, 2018 2 次提交
    • E
      net: ipv4: listified version of ip_rcv · 17266ee9
      Edward Cree 提交于
      Also involved adding a way to run a netfilter hook over a list of packets.
       Rather than attempting to make netfilter know about lists (which would be
       a major project in itself) we just let it call the regular okfn (in this
       case ip_rcv_finish()) for any packets it steals, and have it give us back
       a list of packets it's synchronously accepted (which normally NF_HOOK
       would automatically call okfn() on, but we want to be able to potentially
       pass the list to a listified version of okfn().)
      The netfilter hooks themselves are indirect calls that still happen per-
       packet (see nf_hook_entry_hookfn()), but again, changing that can be left
       for future work.
      
      There is potential for out-of-order receives if the netfilter hook ends up
       synchronously stealing packets, as they will be processed before any
       accepts earlier in the list.  However, it was already possible for an
       asynchronous accept to cause out-of-order receives, so presumably this is
       considered OK.
      Signed-off-by: NEdward Cree <ecree@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      17266ee9
    • E
      net: core: trivial netif_receive_skb_list() entry point · f6ad8c1b
      Edward Cree 提交于
      Just calls netif_receive_skb() in a loop.
      Signed-off-by: NEdward Cree <ecree@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f6ad8c1b
  2. 02 7月, 2018 2 次提交
  3. 26 6月, 2018 2 次提交
  4. 05 6月, 2018 3 次提交
  5. 03 6月, 2018 1 次提交
  6. 29 5月, 2018 2 次提交
  7. 25 5月, 2018 2 次提交
    • J
      net: include hash policy in LAG changeupper info · f44aa9ef
      John Hurley 提交于
      LAG upper event notifiers contain the tx type used by the LAG device.
      Extend this to also include the hash policy used for tx types that
      utilize hashing.
      Signed-off-by: NJohn Hurley <john.hurley@netronome.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f44aa9ef
    • J
      xdp: change ndo_xdp_xmit API to support bulking · 735fc405
      Jesper Dangaard Brouer 提交于
      This patch change the API for ndo_xdp_xmit to support bulking
      xdp_frames.
      
      When kernel is compiled with CONFIG_RETPOLINE, XDP sees a huge slowdown.
      Most of the slowdown is caused by DMA API indirect function calls, but
      also the net_device->ndo_xdp_xmit() call.
      
      Benchmarked patch with CONFIG_RETPOLINE, using xdp_redirect_map with
      single flow/core test (CPU E5-1650 v4 @ 3.60GHz), showed
      performance improved:
       for driver ixgbe: 6,042,682 pps -> 6,853,768 pps = +811,086 pps
       for driver i40e : 6,187,169 pps -> 6,724,519 pps = +537,350 pps
      
      With frames avail as a bulk inside the driver ndo_xdp_xmit call,
      further optimizations are possible, like bulk DMA-mapping for TX.
      
      Testing without CONFIG_RETPOLINE show the same performance for
      physical NIC drivers.
      
      The virtual NIC driver tun sees a huge performance boost, as it can
      avoid doing per frame producer locking, but instead amortize the
      locking cost over the bulk.
      
      V2: Fix compile errors reported by kbuild test robot <lkp@intel.com>
      V4: Isolated ndo, driver changes and callers.
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      735fc405
  8. 04 5月, 2018 1 次提交
  9. 02 5月, 2018 1 次提交
  10. 01 5月, 2018 1 次提交
  11. 30 4月, 2018 2 次提交
  12. 27 4月, 2018 1 次提交
  13. 17 4月, 2018 1 次提交
  14. 30 3月, 2018 2 次提交
  15. 26 3月, 2018 2 次提交
    • K
      net: Drop NETDEV_UNREGISTER_FINAL · 070f2d7e
      Kirill Tkhai 提交于
      Last user is gone after bdf5bd7f "rds: tcp: remove
      register_netdevice_notifier infrastructure.", so we can
      remove this netdevice command. This allows to delete
      rtnl_lock() in netdev_run_todo(), which is hot path for
      net namespace unregistration.
      
      dev_change_net_namespace() and netdev_wait_allrefs()
      have rcu_barrier() before NETDEV_UNREGISTER_FINAL call,
      and the source commits say they were introduced to
      delemit the call with NETDEV_UNREGISTER, but this patch
      leaves them on the places, since they require additional
      analysis, whether we need in them for something else.
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      070f2d7e
    • K
      net: Make NETDEV_XXX commands enum { } · ede2762d
      Kirill Tkhai 提交于
      This patch is preparation to drop NETDEV_UNREGISTER_FINAL.
      Since the cmd is used in usnic_ib_netdev_event_to_string()
      to get cmd name, after plain removing NETDEV_UNREGISTER_FINAL
      from everywhere, we'd have holes in event2str[] in this
      function.
      
      Instead of that, let's make NETDEV_XXX commands names
      available for everyone, and to define netdev_cmd_to_name()
      in the way we won't have to shaffle names after their
      numbers are changed.
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ede2762d
  16. 14 3月, 2018 1 次提交
    • A
      net: fix sysctl_fb_tunnels_only_for_init_net link error · be9fc097
      Arnd Bergmann 提交于
      The new variable is only available when CONFIG_SYSCTL is enabled,
      otherwise we get a link error:
      
      net/ipv4/ip_tunnel.o: In function `ip_tunnel_init_net':
      ip_tunnel.c:(.text+0x278b): undefined reference to `sysctl_fb_tunnels_only_for_init_net'
      net/ipv6/sit.o: In function `sit_init_net':
      sit.c:(.init.text+0x4c): undefined reference to `sysctl_fb_tunnels_only_for_init_net'
      net/ipv6/ip6_tunnel.o: In function `ip6_tnl_init_net':
      ip6_tunnel.c:(.init.text+0x39): undefined reference to `sysctl_fb_tunnels_only_for_init_net'
      
      This adds an extra condition, keeping the traditional behavior when
      CONFIG_SYSCTL is disabled.
      
      Fixes: 79134e6c ("net: do not create fallback tunnels for non-default namespaces")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      be9fc097
  17. 10 3月, 2018 2 次提交
    • P
      net: introduce IFF_NO_RX_HANDLER · f5426250
      Paolo Abeni 提交于
      Some network devices - notably ipvlan slave - are not compatible with
      any kind of rx_handler. Currently the hook can be installed but any
      configuration (bridge, bond, macsec, ...) is nonfunctional.
      
      This change allocates a priv_flag bit to mark such devices and explicitly
      forbid installing a rx_handler if such bit is set. The new bit is used
      by ipvlan slave device.
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f5426250
    • E
      net: do not create fallback tunnels for non-default namespaces · 79134e6c
      Eric Dumazet 提交于
      fallback tunnels (like tunl0, gre0, gretap0, erspan0, sit0,
      ip6tnl0, ip6gre0) are automatically created when the corresponding
      module is loaded.
      
      These tunnels are also automatically created when a new network
      namespace is created, at a great cost.
      
      In many cases, netns are used for isolation purposes, and these
      extra network devices are a waste of resources. We are using
      thousands of netns per host, and hit the netns creation/delete
      bottleneck a lot. (Many thanks to Kirill for recent work on this)
      
      Add a new sysctl so that we can opt-out from this automatic creation.
      
      Note that these tunnels are still created for the initial namespace,
      to be the least intrusive for typical setups.
      
      Tested:
      lpk43:~# cat add_del_unshare.sh
      for i in `seq 1 40`
      do
       (for j in `seq 1 100` ; do  unshare -n /bin/true >/dev/null ; done) &
      done
      wait
      
      lpk43:~# echo 0 >/proc/sys/net/core/fb_tunnels_only_for_init_net
      lpk43:~# time ./add_del_unshare.sh
      
      real	0m37.521s
      user	0m0.886s
      sys	7m7.084s
      lpk43:~# echo 1 >/proc/sys/net/core/fb_tunnels_only_for_init_net
      lpk43:~# time ./add_del_unshare.sh
      
      real	0m4.761s
      user	0m0.851s
      sys	1m8.343s
      lpk43:~#
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      79134e6c
  18. 08 3月, 2018 1 次提交
    • P
      net: unpollute priv_flags space · 1ec54cb4
      Paolo Abeni 提交于
      the ipvlan device driver defines and uses 2 bits inside the priv_flags
      net_device field. Such bits and the related helper are used only
      inside the ipvlan device driver, and the core networking does not
      need to be aware of them.
      
      This change moves netif_is_ipvlan* helper in the ipvlan driver and
      re-implement them looking for ipvlan specific symbols instead of
      using priv_flags.
      
      Overall this frees two bits inside priv_flags - and move the following
      ones to avoid gaps - without any intended functional change.
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1ec54cb4
  19. 15 2月, 2018 3 次提交
  20. 01 2月, 2018 1 次提交
    • D
      bpf: fix null pointer deref in bpf_prog_test_run_xdp · 65073a67
      Daniel Borkmann 提交于
      syzkaller was able to generate the following XDP program ...
      
        (18) r0 = 0x0
        (61) r5 = *(u32 *)(r1 +12)
        (04) (u32) r0 += (u32) 0
        (95) exit
      
      ... and trigger a NULL pointer dereference in ___bpf_prog_run()
      via bpf_prog_test_run_xdp() where this was attempted to run.
      
      Reason is that recent xdp_rxq_info addition to XDP programs
      updated all drivers, but not bpf_prog_test_run_xdp(), where
      xdp_buff is set up. Thus when context rewriter does the deref
      on the netdev it's NULL at runtime. Fix it by using xdp_rxq
      from loopback dev. __netif_get_rx_queue() helper can also be
      reused in various other locations later on.
      
      Fixes: 02dd3291 ("bpf: finally expose xdp_rxq_info to XDP bpf-programs")
      Reported-by: syzbot+1eb094057b338eb1fc00@syzkaller.appspotmail.com
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      65073a67
  21. 30 1月, 2018 1 次提交
  22. 25 1月, 2018 2 次提交
  23. 24 1月, 2018 1 次提交
  24. 23 1月, 2018 1 次提交
  25. 18 1月, 2018 1 次提交
  26. 15 1月, 2018 1 次提交
    • J
      bpf: offload: add map offload infrastructure · a3884572
      Jakub Kicinski 提交于
      BPF map offload follow similar path to program offload.  At creation
      time users may specify ifindex of the device on which they want to
      create the map.  Map will be validated by the kernel's
      .map_alloc_check callback and device driver will be called for the
      actual allocation.  Map will have an empty set of operations
      associated with it (save for alloc and free callbacks).  The real
      device callbacks are kept in map->offload->dev_ops because they
      have slightly different signatures.  Map operations are called in
      process context so the driver may communicate with HW freely,
      msleep(), wait() etc.
      
      Map alloc and free callbacks are muxed via existing .ndo_bpf, and
      are always called with rtnl lock held.  Maps and programs are
      guaranteed to be destroyed before .ndo_uninit (i.e. before
      unregister_netdev() returns).  Map callbacks are invoked with
      bpf_devs_lock *read* locked, drivers must take care of exclusive
      locking if necessary.
      
      All offload-specific branches are marked with unlikely() (through
      bpf_map_is_dev_bound()), given that branch penalty will be
      negligible compared to IO anyway, and we don't want to penalize
      SW path unnecessarily.
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      a3884572