1. 03 Dec, 2017 1 commit
    • P
      openvswitch: do not propagate headroom updates to internal port · 183dea58
      Paolo Abeni committed
      After commit 3a927bc7 ("ovs: propagate per dp max headroom to
      all vports") the needed_headroom for the internal vport is updated
      according to the max needed headroom in its datapath.
      
      That avoids the pskb_expand_head() costs when sending/forwarding
      packets towards tunnel devices, at least for some scenarios.
      
      We still require such copy when using the ovs-preferred configuration
      for vxlan tunnels:
      
          br_int
        /       \
      tap      vxlan
                 (remote_ip:X)
      
      br_phy
           \
          NIC
      
      where the route towards the IP 'X' is via 'br_phy'.
      
      When forwarding traffic from the tap towards the vxlan device, we
      will call pskb_expand_head() in vxlan_build_skb() because
      br_phy->needed_headroom is equal to tun->needed_headroom.
      
      With this change we avoid updating the internal vport needed_headroom,
      so that in the above scenario no head copy is needed, giving a 5%
      performance improvement in a UDP throughput test.
      
      As a trade-off, packets sent from the internal port towards a tunnel
      device will now experience the head copy overhead. The rationale is
      that the latter use-case is less relevant performance-wise.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      183dea58
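The trade-off described in this commit can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: `model_dev`, `required_headroom()` and `needs_head_copy()` are hypothetical names, the 50-byte encapsulation length used in the usage note is an arbitrary illustrative value, and the model assumes the tunnel transmit path wants the underlay device's needed_headroom plus the encapsulation headers.

```c
#include <stdbool.h>

/* Hypothetical, simplified model; these are not kernel structures. */
struct model_dev {
    unsigned int needed_headroom; /* bytes the device asks senders to reserve */
};

/* Assumption: the tunnel transmit path wants room for the underlay
 * device plus the encapsulation headers it is about to push. */
static unsigned int required_headroom(const struct model_dev *underlay,
                                      unsigned int encap_len)
{
    return underlay->needed_headroom + encap_len;
}

/* A head copy (pskb_expand_head() in the kernel) is modeled as needed
 * only when the packet arrives with less headroom than required. */
static bool needs_head_copy(unsigned int skb_headroom,
                            const struct model_dev *underlay,
                            unsigned int encap_len)
{
    return skb_headroom < required_headroom(underlay, encap_len);
}
```

In this model, a packet arriving from the tap with 50 bytes of headroom needs a copy when br_phy advertises the tunnel's 50 bytes as its own needed_headroom (50 < 50 + 50), and needs none when br_phy's needed_headroom is left at 0 (50 >= 0 + 50).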
  2. 08 Jun, 2017 1 commit
    • D
      net: Fix inconsistent teardown and release of private netdev state. · cf124db5
      David S. Miller committed
      Network devices can allocate resources and private memory using
      netdev_ops->ndo_init().  However, the release of these resources
      can occur in one of two different places.
      
      Either netdev_ops->ndo_uninit() or netdev->destructor().
      
      The decision of which operation frees the resources depends upon
      whether it is necessary for all netdev refs to be released before it
      is safe to perform the freeing.
      
      netdev_ops->ndo_uninit() presumably can occur right after the
      NETDEV_UNREGISTER notifier completes and the unicast and multicast
      address lists are flushed.
      
      netdev->destructor(), on the other hand, does not run until the
      netdev references all go away.
      
      Further complicating the situation is that netdev->destructor()
      almost universally also does a free_netdev().
      
      This creates a problem for the logic in register_netdevice().
      Because all callers of register_netdevice() manage the freeing
      of the netdev, and invoke free_netdev(dev) if register_netdevice()
      fails.
      
      If netdev_ops->ndo_init() succeeds, but something else fails inside
      of register_netdevice(), it does call netdev_ops->ndo_uninit().  But
      it is not able to invoke netdev->destructor().
      
      This is because netdev->destructor() will do a free_netdev() and
      then the caller of register_netdevice() will do the same.
      
      However, this means that the resources that would normally be released
      by netdev->destructor() will not be.
      
      Over the years drivers have added local hacks to deal with this, by
      invoking their destructor parts by hand when register_netdevice()
      fails.
      
      Many drivers do not try to deal with this, and instead we have leaks.
      
      Let's close this hole by formalizing the distinction between what
      private things need to be freed up by netdev->destructor() and whether
      the driver needs unregister_netdevice() to perform the free_netdev().
      
      netdev->priv_destructor() performs all actions to free up the private
      resources that used to be freed by netdev->destructor(), except for
      free_netdev().
      
      netdev->needs_free_netdev is a boolean that indicates whether
      free_netdev() should be done at the end of unregister_netdevice().
      
      Now, register_netdevice() can sanely release all resources after
      netdev_ops->ndo_init() succeeds, by invoking both netdev_ops->ndo_uninit()
      and netdev->priv_destructor().
      
      And at the end of unregister_netdevice(), we invoke
      netdev->priv_destructor() and optionally call free_netdev().
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cf124db5
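The ownership split introduced by this commit can be sketched as a userspace model. Everything prefixed with `model_` is hypothetical and only mirrors the roles named in the commit message (priv_destructor, needs_free_netdev); the `fail_late` flag and the two counters exist purely for illustration, and the ndo_init()/ndo_uninit() steps are reduced to comments.

```c
#include <stdbool.h>
#include <stdlib.h>

struct model_netdev {
    bool needs_free_netdev;                            /* free dev on unregister? */
    void (*priv_destructor)(struct model_netdev *dev); /* frees private state only */
    void *priv;                                        /* stands in for private resources */
};

static int priv_released; /* illustration: number of priv_destructor runs */
static int netdev_freed;  /* illustration: number of netdevs freed */

static void model_free_netdev(struct model_netdev *dev)
{
    free(dev);
    netdev_freed++;
}

/* Releases private resources but never frees the netdev itself. */
static void model_priv_destructor(struct model_netdev *dev)
{
    free(dev->priv);
    dev->priv = NULL;
    priv_released++;
}

static struct model_netdev *model_alloc_netdev(bool needs_free)
{
    struct model_netdev *dev = calloc(1, sizeof(*dev));

    if (!dev)
        return NULL;
    dev->needs_free_netdev = needs_free;
    dev->priv_destructor = model_priv_destructor;
    return dev;
}

/* On a late failure the private state is released here, while the
 * netdev stays owned by the caller, who frees it exactly once. */
static int model_register_netdevice(struct model_netdev *dev, bool fail_late)
{
    dev->priv = malloc(16); /* stands in for ndo_init() */
    if (!dev->priv)
        return -1;
    if (fail_late) {
        /* ndo_uninit() would run here in the kernel */
        if (dev->priv_destructor)
            dev->priv_destructor(dev);
        return -1;
    }
    return 0;
}

static void model_unregister_netdevice(struct model_netdev *dev)
{
    if (dev->priv_destructor)
        dev->priv_destructor(dev);
    if (dev->needs_free_netdev)
        model_free_netdev(dev);
}
```

The point of the split is that the failure path in registration can release private state without touching the netdev, so the caller's own free can no longer turn into a double free.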
  3. 16 Feb, 2017 1 commit
  4. 09 Jan, 2017 1 commit
  5. 21 Oct, 2016 1 commit
    • J
      net: use core MTU range checking in core net infra · 91572088
      Jarod Wilson committed
      geneve:
      - Merge __geneve_change_mtu back into geneve_change_mtu, set max_mtu
      - This one isn't quite as straight-forward as others, could use some
        closer inspection and testing
      
      macvlan:
      - set min/max_mtu
      
      tun:
      - set min/max_mtu, remove tun_net_change_mtu
      
      vxlan:
      - Merge __vxlan_change_mtu back into vxlan_change_mtu
      - Set max_mtu to IP_MAX_MTU and retain dynamic MTU range checks in
        change_mtu function
      - This one is also not as straight-forward and could use closer inspection
        and testing from vxlan folks
      
      bridge:
      - set max_mtu of IP_MAX_MTU and retain dynamic MTU range checks in
        change_mtu function
      
      openvswitch:
      - set min/max_mtu, remove internal_dev_change_mtu
      - note: max_mtu wasn't checked previously, it's been set to 65535, which
        is the largest possible size supported
      
      sch_teql:
      - set min/max_mtu (note: max_mtu previously unchecked, used max of 65535)
      
      macsec:
      - min_mtu = 0, max_mtu = 65535
      
      macvlan:
      - min_mtu = 0, max_mtu = 65535
      
      ntb_netdev:
      - min_mtu = 0, max_mtu = 65535
      
      veth:
      - min_mtu = 68, max_mtu = 65535
      
      8021q:
      - min_mtu = 0, max_mtu = 65535
      
      CC: netdev@vger.kernel.org
      CC: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
      CC: Tom Herbert <tom@herbertland.com>
      CC: Daniel Borkmann <daniel@iogearbox.net>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      CC: Paolo Abeni <pabeni@redhat.com>
      CC: Jiri Benc <jbenc@redhat.com>
      CC: WANG Cong <xiyou.wangcong@gmail.com>
      CC: Roopa Prabhu <roopa@cumulusnetworks.com>
      CC: Pravin B Shelar <pshelar@ovn.org>
      CC: Sabrina Dubroca <sd@queasysnail.net>
      CC: Patrick McHardy <kaber@trash.net>
      CC: Stephen Hemminger <stephen@networkplumber.org>
      CC: Pravin Shelar <pshelar@nicira.com>
      CC: Maxim Krasnyansky <maxk@qti.qualcomm.com>
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      91572088
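A core range check of the kind this commit relies on might look like the following userspace sketch. The names `model_netdev` and `model_set_mtu()` are hypothetical, and treating `max_mtu == 0` as "no upper bound" is an assumption of this model rather than something stated in the commit message.

```c
#include <errno.h>

/* Hypothetical model of the per-device MTU limits set by drivers. */
struct model_netdev {
    unsigned int mtu;
    unsigned int min_mtu;
    unsigned int max_mtu; /* assumption: 0 means no upper bound */
};

/* Central check: drivers only fill in min_mtu/max_mtu and no longer
 * need a change_mtu callback just to validate the range. */
static int model_set_mtu(struct model_netdev *dev, int new_mtu)
{
    if (new_mtu < 0 || (unsigned int)new_mtu < dev->min_mtu)
        return -EINVAL;
    if (dev->max_mtu > 0 && (unsigned int)new_mtu > dev->max_mtu)
        return -EINVAL;
    dev->mtu = (unsigned int)new_mtu;
    return 0;
}
```

With the veth limits listed above (min_mtu = 68, max_mtu = 65535), this model rejects 67 and 65536 and accepts 9000.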
  6. 13 Oct, 2016 1 commit
  7. 06 Aug, 2016 1 commit
    • I
      OVS: Ignore negative headroom value · 5ef9f289
      Ian Wienand committed
      net_device->ndo_set_rx_headroom (introduced in
      871b642a) says
      
        "Setting a negtaive value reset the rx headroom
         to the default value".
      
      It seems that the OVS implementation in
      3a927bc7 overlooked this and sets
      dev->needed_headroom unconditionally.
      
      This doesn't have an immediate effect, but can mess up later
      LL_RESERVED_SPACE calculations, such as done in
      net/ipv6/mcast.c:mld_newpack.  For reference, this issue was found
      from a skb_panic raised there after the length calculations had given
      the wrong result.
      
      Note the other current users of this interface
      (drivers/net/tun.c:tun_set_headroom and
      drivers/net/veth.c:veth_set_rx_headroom) are both checking this
      correctly thus need no modification.
      
      Thanks to Ben for some pointers from the crash dumps!
      
      Cc: Benjamin Poirier <bpoirier@suse.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1361414
      Signed-off-by: Ian Wienand <iwienand@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5ef9f289
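The effect of storing a negative value unconditionally can be shown with a small userspace model. The names are hypothetical, the field is modeled as a 16-bit unsigned integer, and the default headroom is assumed to be 0 for the purposes of this sketch.

```c
#include <stdint.h>

/* Hypothetical model: needed_headroom is an unsigned 16-bit field. */
struct model_netdev {
    uint16_t needed_headroom;
};

/* Unchecked variant: a negative request wraps to a huge headroom,
 * which then skews any later length calculation based on it. */
static void model_set_rx_headroom_unchecked(struct model_netdev *dev, int new_hr)
{
    dev->needed_headroom = (uint16_t)new_hr;
}

/* Checked variant: a negative request resets to the default,
 * assumed to be 0 in this model. */
static void model_set_rx_headroom(struct model_netdev *dev, int new_hr)
{
    dev->needed_headroom = (uint16_t)(new_hr < 0 ? 0 : new_hr);
}
```

Passing -1 to the unchecked variant leaves 65535 in the field, while the checked variant leaves 0.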
  8. 03 Jun, 2016 1 commit
  9. 17 Apr, 2016 1 commit
  10. 19 Mar, 2016 1 commit
  11. 02 Mar, 2016 1 commit
  12. 22 Oct, 2015 2 commits
    • P
      openvswitch: Use dev_queue_xmit for vport send. · aec15924
      Pravin B Shelar committed
      With the use of lwtunnel, we can directly call dev_queue_xmit()
      rather than calling the netdev vport send operation.
      This change makes the tunnel vport code a bit cleaner.
      Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Acked-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      aec15924
    • J
      openvswitch: Allocate memory for ovs internal device stats. · 1241365f
      James Morse committed
      "openvswitch: Remove vport stats" removed the per-vport statistics, in
      order to use the netdev's statistics fields.
      "openvswitch: Fix ovs_vport_get_stats()" fixed the export of these stats
      to user-space, by using the provided netdev_ops to collate them - but ovs
      internal devices still use an unallocated dev->tstats field to count
      packets, which are no longer exported by this api.
      
      Allocate the dev->tstats field for ovs internal devices, and wire up
      ndo_get_stats64 with the original implementation of
      ovs_vport_get_stats().
      
      On its own, "openvswitch: Fix ovs_vport_get_stats()" fixes the OOPs,
      unmasking a full-on panic on arm64:
      
      =============%<==============
      [<ffffffbffc00ce4c>] internal_dev_recv+0xa8/0x170 [openvswitch]
      [<ffffffbffc0008b4>] do_output.isra.31+0x60/0x19c [openvswitch]
      [<ffffffbffc000bf8>] do_execute_actions+0x208/0x11c0 [openvswitch]
      [<ffffffbffc001c78>] ovs_execute_actions+0xc8/0x238 [openvswitch]
      [<ffffffbffc003dfc>] ovs_packet_cmd_execute+0x21c/0x288 [openvswitch]
      [<ffffffc0005e8c5c>] genl_family_rcv_msg+0x1b0/0x310
      [<ffffffc0005e8e60>] genl_rcv_msg+0xa4/0xe4
      [<ffffffc0005e7ddc>] netlink_rcv_skb+0xb0/0xdc
      [<ffffffc0005e8a94>] genl_rcv+0x38/0x50
      [<ffffffc0005e76c0>] netlink_unicast+0x164/0x210
      [<ffffffc0005e7b70>] netlink_sendmsg+0x304/0x368
      [<ffffffc0005a21c0>] sock_sendmsg+0x30/0x4c
      [SNIP]
      Kernel panic - not syncing: Fatal exception in interrupt
      =============%<==============
      
      Fixes: 8c876639 ("openvswitch: Remove vport stats.")
      Signed-off-by: James Morse <james.morse@arm.com>
      Acked-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1241365f
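The allocate-then-collate pattern this commit describes can be sketched in userspace as follows. This is an illustrative model only: the `model_` names and the fixed CPU count are hypothetical, and a plain array stands in for the kernel's per-CPU allocation.

```c
#include <stdint.h>
#include <stdlib.h>

#define MODEL_NR_CPUS 4 /* hypothetical fixed CPU count */

struct model_pcpu_stats {
    uint64_t rx_packets;
    uint64_t rx_bytes;
};

struct model_internal_dev {
    struct model_pcpu_stats *tstats; /* must be allocated before any receive */
};

/* Allocate one zeroed counter slot per CPU at device setup time. */
static int model_dev_init(struct model_internal_dev *dev)
{
    dev->tstats = calloc(MODEL_NR_CPUS, sizeof(*dev->tstats));
    return dev->tstats ? 0 : -1;
}

/* Receive path: touches only the slot of the current CPU. With an
 * unallocated tstats this access is what would crash. */
static void model_dev_recv(struct model_internal_dev *dev,
                           unsigned int cpu, uint64_t len)
{
    struct model_pcpu_stats *s = &dev->tstats[cpu % MODEL_NR_CPUS];

    s->rx_packets++;
    s->rx_bytes += len;
}

/* Stats query: sum the per-CPU slots into one total. */
static struct model_pcpu_stats
model_get_stats64(const struct model_internal_dev *dev)
{
    struct model_pcpu_stats total = { 0, 0 };

    for (unsigned int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        total.rx_packets += dev->tstats[cpu].rx_packets;
        total.rx_bytes += dev->tstats[cpu].rx_bytes;
    }
    return total;
}

static void model_dev_destroy(struct model_internal_dev *dev)
{
    free(dev->tstats);
    dev->tstats = NULL;
}
```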
  13. 30 Aug, 2015 1 commit
  14. 28 Aug, 2015 1 commit
  15. 22 Jul, 2015 2 commits
  16. 06 Nov, 2014 1 commit
  17. 29 Oct, 2014 1 commit
  18. 24 Jul, 2014 1 commit
  19. 16 Jul, 2014 1 commit
    • T
      net: set name_assign_type in alloc_netdev() · c835a677
      Tom Gundersen committed
      Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert
      all users to pass NET_NAME_UNKNOWN.
      
      Coccinelle patch:
      
      @@
      expression sizeof_priv, name, setup, txqs, rxqs, count;
      @@
      
      (
      -alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs)
      +alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs)
      |
      -alloc_netdev_mq(sizeof_priv, name, setup, count)
      +alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count)
      |
      -alloc_netdev(sizeof_priv, name, setup)
      +alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup)
      )
      
      v9: move comments here from the wrong commit
      Signed-off-by: Tom Gundersen <teg@jklm.no>
      Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c835a677
  20. 02 Jul, 2014 1 commit
  21. 14 May, 2014 1 commit
  22. 02 Nov, 2013 1 commit
  23. 20 Jun, 2013 1 commit
  24. 15 Jun, 2013 1 commit
  25. 30 Apr, 2013 1 commit
  26. 20 Apr, 2013 1 commit
  27. 16 Apr, 2013 1 commit
  28. 07 Jan, 2013 1 commit
  29. 04 Jan, 2013 1 commit
  30. 05 Dec, 2012 1 commit
  31. 23 Aug, 2012 1 commit
  32. 26 May, 2012 1 commit
    • J
      openvswitch: Reset upper layer protocol info on internal devices. · 7fe99e2d
      Jesse Gross committed
      It's possible that packets that are sent on internal devices (from
      the OVS perspective) have already traversed the local IP stack.
      After they go through the internal device, they will again travel
      through the IP stack which may get confused by the presence of
      existing information in the skb. The problem can be observed
      when switching between namespaces. This clears out that information
      to avoid problems but deliberately leaves other metadata alone.
      This is to provide maximum flexibility in chaining together OVS
      and other Linux components.
      Signed-off-by: Jesse Gross <jesse@nicira.com>
      7fe99e2d
  33. 04 May, 2012 1 commit
  34. 16 Feb, 2012 1 commit
  35. 17 Jan, 2012 1 commit
  36. 04 Dec, 2011 1 commit
    • J
      net: Add Open vSwitch kernel components. · ccb1352e
      Jesse Gross committed
      Open vSwitch is a multilayer Ethernet switch targeted at virtualized
      environments.  In addition to supporting a variety of features
      expected in a traditional hardware switch, it enables fine-grained
      programmatic extension and flow-based control of the network.
      This control is useful in a wide variety of applications but is
      particularly important in multi-server virtualization deployments,
      which are often characterized by highly dynamic endpoints and the need
      to maintain logical abstractions for multiple tenants.
      
      The Open vSwitch datapath provides an in-kernel fast path for packet
      forwarding.  It is complemented by a userspace daemon, ovs-vswitchd,
      which is able to accept configuration from a variety of sources and
      translate it into packet processing rules.
      
      See http://openvswitch.org for more information and userspace
      utilities.
      Signed-off-by: Jesse Gross <jesse@nicira.com>
      ccb1352e