1. 01 May, 2014 (1 commit)
    • Revert "macvlan : fix checksums error when we are in bridge mode" · f114890c
      Vlad Yasevich committed
      This reverts commit 12a2856b.
      The commit above doesn't appear to be necessary any more as the
      checksums appear to be correctly computed/validated.
      
      Additionally, the above commit breaks KVM configurations where
      one VM is using a device that supports checksum offload (virtio) and
      the other VM does not.
      In this case, packets leaving the virtio device will have CHECKSUM_PARTIAL
      set.  The packet is forwarded to a macvtap that has offload features
      turned off.  Since we use CHECKSUM_UNNECESSARY, the host does not
      update the checksum and thus a bad checksum is passed up to
      the guest.
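The checksum failure described above can be modeled in user space. This is a simplified, hypothetical model, not the macvlan/macvtap code: the enum values only mirror the kernel's `ip_summed` states, and `struct toy_pkt`, `forward_to_tap()` and `guest_csum_ok()` are illustrative names. The model assumes a CHECKSUM_PARTIAL packet carries only the pseudo-header sum in its checksum field, so the payload still has to be folded in with the RFC 1071 one's-complement sum before the checksum is valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel's skb->ip_summed states. */
enum toy_summed { TOY_CHECKSUM_NONE, TOY_CHECKSUM_PARTIAL, TOY_CHECKSUM_UNNECESSARY };

/* Hypothetical packet: payload words plus the checksum field it carries. */
struct toy_pkt {
    uint16_t words[4];       /* payload covered by the checksum */
    uint16_t csum;           /* PARTIAL: pseudo-header sum only (model assumption) */
    enum toy_summed summed;
};

/* RFC 1071 one's-complement sum with end-around carry. */
static uint16_t ones_sum(const uint16_t *w, size_t n, uint32_t seed)
{
    uint32_t sum = seed;
    for (size_t i = 0; i < n; i++)
        sum += w[i];
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Complete a PARTIAL checksum: fold the payload into the stored seed. */
static void finish_csum(struct toy_pkt *p)
{
    p->csum = (uint16_t)~ones_sum(p->words, 4, p->csum);
    p->summed = TOY_CHECKSUM_NONE;
}

/* Host forwarding path: only a PARTIAL packet gets its checksum finished.
 * Relabelling it UNNECESSARY first (the reverted behaviour) skips that. */
static void forward_to_tap(struct toy_pkt *p, int mark_unnecessary)
{
    if (mark_unnecessary)
        p->summed = TOY_CHECKSUM_UNNECESSARY;
    if (p->summed == TOY_CHECKSUM_PARTIAL)
        finish_csum(p);
}

/* A guest without offload verifies in software; pseudo = pseudo-header sum. */
static int guest_csum_ok(const struct toy_pkt *p, uint16_t pseudo)
{
    return ones_sum(p->words, 4, (uint32_t)pseudo + p->csum) == 0xffff;
}
```

In this model, marking the packet as already verified skips the completion step, so the guest's software check fails, which is the situation the revert addresses.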
      
      CC: Daniel Lezcano <daniel.lezcano@free.fr>
      CC: Patrick McHardy <kaber@trash.net>
      CC: Andrian Nord <nightnord@gmail.com>
      CC: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Michael S. Tsirkin <mst@redhat.com>
      CC: Jason Wang <jasowang@redhat.com>
      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 15 Mar, 2014 (1 commit)
  3. 04 Mar, 2014 (1 commit)
    • macvlan: Add support for 'always_on' offload features · 8b4703e9
      Vlad Yasevich committed
      Macvlan currently inherits all of its features from the lower
      device.  When the lower device disables offload support, this causes
      macvlan to disable offload support as well.  This causes a
      performance regression when using macvlan/macvtap in bridge
      mode.
      
      It can be easily demonstrated by creating 2 namespaces using
      macvlan in bridge mode and running netperf between them:
      
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.1 () port 0 AF_INET
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    20.00    1204.61
      
      To restore the performance, we add software offload features
      to the list of "always_on" features for macvlan.  This way
      when a namespace or a guest using macvtap initially sends a
      packet, this packet will not be segmented at macvlan level.
      It will only be segmented when macvlan sends the packet
      to the lower device.
      
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.1 () port 0 AF_INET
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    20.00    5507.35
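The feature computation described above can be sketched as a bitmask operation. The bit values, the `INHERITABLE` and `ALWAYS_ON` masks and `macvlan_features()` are hypothetical; the sketch assumes software GSO and LLTX belong to the always-on set, whereas the real driver works on `netdev_features_t` with its own masks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define F_SG      (1u << 0)  /* scatter-gather */
#define F_HW_CSUM (1u << 1)  /* checksum offload */
#define F_TSO     (1u << 2)  /* hardware segmentation offload */
#define F_GSO_SW  (1u << 3)  /* software GSO: segmentation done by the stack */
#define F_LLTX    (1u << 4)  /* lockless tx */

/* Features the macvlan may inherit from the lower device. */
#define INHERITABLE (F_SG | F_HW_CSUM | F_TSO | F_GSO_SW)
/* Software-only features kept regardless of the lower device (assumed set). */
#define ALWAYS_ON   (F_GSO_SW | F_LLTX)

/* Upper-device features: what the lower device allows, plus always-on ones. */
static uint32_t macvlan_features(uint32_t lower)
{
    return (lower & INHERITABLE) | ALWAYS_ON;
}
```

Even when the lower device advertises nothing, the macvlan keeps software GSO, so large packets travel through macvlan unsegmented and are only split when handed to the lower device.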
      
      Fixes: 6acf54f1 (macvtap: Add support of packet capture on macvtap device.)
      Fixes: 797f87f8 (macvlan: fix netdev feature propagation from lower device)
      CC: Florian Westphal <fw@strlen.de>
      CC: Christian Borntraeger <borntraeger@de.ibm.com>
      CC: Jason Wang <jasowang@redhat.com>
      CC: Michael S. Tsirkin <mst@redhat.com>
      Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 15 Feb, 2014 (1 commit)
  5. 14 Feb, 2014 (1 commit)
  6. 11 Jan, 2014 (2 commits)
    • net: core: explicitly select a txq before doing l2 forwarding · f663dd9a
      Jason Wang committed
      Currently, the tx queue is selected implicitly in ndo_dfwd_start_xmit(). This
      causes several issues:
      
      - NETIF_F_LLTX was removed for macvlan, so the txq lock was taken on macvlan
        instead of the lower device, which misses the txq synchronization the
        lower device needs, such as txq stopping or freezing required by the dev
        watchdog or control path.
      - dev_hard_start_xmit() was called with a NULL txq, which bypasses the net
        device watchdog.
      - dev_hard_start_xmit() does not check txq everywhere, which leads to a crash
        when tso is disabled on the lower device.
      
      Fix this by explicitly introducing a new parameter to .ndo_select_queue() that
      is used only for selecting queues in the l2 forwarding offload case.
      netdev_pick_tx() is also extended to accept this parameter, and
      dev_queue_xmit_accel() is used to do l2 forwarding transmission.
      
      With these fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need
      to check txq against NULL in dev_hard_start_xmit(). Also there's no need to keep
      a dedicated ndo_dfwd_start_xmit() and we can just reuse the code of
      dev_queue_xmit() to do the transmission.
      
      This will also be required in the future for macvtap l2 forwarding support,
      since it provides the necessary synchronization method.
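The core idea, an explicit queue choice made up front instead of an implicit one, can be sketched with hypothetical types. `struct toy_dev`, `struct fwd_priv` and `pick_tx()` are illustrative only, and the forwarding context is named `accel_priv` here by assumption; the real `.ndo_select_queue()` and `netdev_pick_tx()` signatures live in the kernel headers and are not reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lower device with a block of tx queues. */
struct toy_dev {
    unsigned int real_num_tx_queues;
};

/* Hypothetical l2 forwarding context: queues reserved for one macvlan. */
struct fwd_priv {
    unsigned int queue_base;   /* first reserved queue on the lower device */
    unsigned int queue_count;  /* number of reserved queues */
};

/* Pick a tx queue on the lower device. With a forwarding context the choice
 * is restricted to the reserved range; otherwise hash over all queues.
 * Either way the caller gets a real queue index, never "no queue". */
static unsigned int pick_tx(const struct toy_dev *dev, uint32_t hash,
                            const struct fwd_priv *accel_priv)
{
    if (accel_priv)
        return accel_priv->queue_base + hash % accel_priv->queue_count;
    return hash % dev->real_num_tx_queues;
}
```

Because the queue is always known before transmission, the caller can take that queue's lock, which is the synchronization the commit restores.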
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: e1000-devel@lists.sourceforge.net
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • macvlan: forbid L2 fowarding offload for macvtap · b13ba1b8
      Jason Wang committed
      L2 forwarding offload bypasses the rx handler of the real device, so the
      packet cannot be forwarded to the macvtap device. Another problem is that the
      dev_hard_start_xmit() called for macvtap does not have any synchronization.
      
      Fix this by forbidding L2 forwarding for macvtap.
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: John Fastabend <john.r.fastabend@intel.com.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 05 Jan, 2014 (1 commit)
  8. 28 Dec, 2013 (1 commit)
  9. 27 Dec, 2013 (1 commit)
    • macvlan: fix netdev feature propagation from lower device · 797f87f8
      Florian Westphal committed
      There are inconsistencies wrt. feature propagation/inheritance between
      macvlan and the underlying interface.
      
      When a feature is turned off on the real device before a macvlan is
      created on top, the feature remains enabled on the macvlan device, whereas
      if the feature is turned off on the lower device after macvlan creation,
      the kernel propagates the change to the macvlan.
      
      The second issue is that, when propagating changes from the underlying device
      to the macvlan interface, macvlan can erroneously lose its NETIF_F_LLTX flag,
      as features are ANDed with those of the underlying device.
      
      However, LLTX should be kept since it has no dependencies on physical
      hardware (LLTX is set on macvlan creation regardless of the lower
      device properties; see 8ffab51b
      (macvlan: lockless tx path)).
      
      The LLTX flag is now forced regardless of user settings in the absence of
      layer2 hw acceleration (a6cc0cfa,
      net: Add layer 2 hardware acceleration operations for macvlan devices).
      
      Use netdev_increment_features to rebuild the feature set on capability
      changes on either the lower device or on the macvlan interface.
      
      As pointed out by Ben Hutchings, use netdev_update_features on
      NETDEV_FEAT_CHANGE event (it calls macvlan_fix_features/netdev_features_change
      if needed).
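Both inconsistencies can be shown with a toy model. The bit values and the helpers `old_create()`, `old_on_lower_change()` and `rebuild()` are hypothetical and only loosely mirror the effect of rebuilding features with `netdev_increment_features()`; they are not that function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits for this sketch only. */
#define F_TSO  (1u << 0)   /* hardware-dependent offload */
#define F_LLTX (1u << 1)   /* lockless tx: a purely software property */

/* Old behaviour (modelled): a fixed set at creation time, later ANDed
 * with the lower device whenever the lower device changes. */
static uint32_t old_create(void) { return F_TSO | F_LLTX; }
static uint32_t old_on_lower_change(uint32_t cur, uint32_t lower) { return cur & lower; }

/* Fixed behaviour (modelled): always rebuild from the lower device's
 * current features and force LLTX back on. */
static uint32_t rebuild(uint32_t lower) { return (lower & F_TSO) | F_LLTX; }
```

With the rebuild approach the result depends only on the lower device's current state, not on whether a feature was disabled before or after the macvlan was created.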
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 13 Dec, 2013 (1 commit)
  11. 06 Dec, 2013 (1 commit)
    • macvlan: Support creating macvtaps from macvlans · d70f2cf5
      Kevin Wallace committed
      When running in a network namespace whose only link to the outside
      world is a macvlan device, not being able to create a macvtap off of
      it is a real pain.
      
      So modify macvtap creation to automatically forward a creation of a
      macvtap on a macvlan to become a creation of a macvtap on the
      underlying network device, just as is currently done with
      macvlan-on-macvlan devices.
      
      v2: Use netif_is_macvlan and macvlan_dev_real_dev helpers to make it
          more clear what we're doing.
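The redirection can be sketched as follows. `struct toy_netdev`, its `is_macvlan` and `lower` fields, and `tap_parent()` are hypothetical stand-ins for what `netif_is_macvlan()` and `macvlan_dev_real_dev()` provide in the kernel. The sketch assumes a macvlan always sits directly on a real device, since macvlan-on-macvlan creation is already redirected the same way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device record. */
struct toy_netdev {
    const char *name;
    bool is_macvlan;            /* stand-in for netif_is_macvlan() */
    struct toy_netdev *lower;   /* stand-in for macvlan_dev_real_dev() */
};

/* Return the device a new macvtap should actually be attached to:
 * a requested macvlan is replaced by its underlying real device. */
static struct toy_netdev *tap_parent(struct toy_netdev *requested)
{
    if (requested->is_macvlan)
        return requested->lower;
    return requested;
}
```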
      Signed-off-by: Kevin Wallace <kevin@pentabarf.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 08 Nov, 2013 (1 commit)
  13. 06 Nov, 2013 (1 commit)
    • net: Explicitly initialize u64_stats_sync structures for lockdep · 827da44c
      John Stultz committed
      In order to enable lockdep on seqcount/seqlock structures, we
      must explicitly initialize any locks.
      
      The u64_stats_sync structure uses a seqcount, and thus we need
      to introduce a u64_stats_init() function and use it to initialize
      the structure.
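A sequence counter lets a reader detect that a writer was active and retry. The following single-threaded, user-space sketch shows that read/retry protocol and the explicit initialization step; `struct toy_seqcount`, `toy_stats_init()` and the other `toy_` helpers are hypothetical and omit the memory barriers and lockdep class setup the kernel needs, which is what `u64_stats_init()` takes care of.

```c
#include <assert.h>
#include <stdint.h>

struct toy_seqcount { unsigned int sequence; };

struct toy_stats {
    uint64_t rx_packets;
    uint64_t rx_bytes;
    struct toy_seqcount syncp;
};

/* Explicit initialisation, analogous in spirit to u64_stats_init(). */
static void toy_stats_init(struct toy_stats *s)
{
    s->rx_packets = 0;
    s->rx_bytes = 0;
    s->syncp.sequence = 0;
}

/* Writer side: an odd sequence means an update is in progress. */
static void toy_update_begin(struct toy_seqcount *sc) { sc->sequence++; }
static void toy_update_end(struct toy_seqcount *sc)   { sc->sequence++; }

/* Reader side: snapshot the sequence before reading the counters... */
static unsigned int toy_fetch_begin(const struct toy_seqcount *sc)
{
    return sc->sequence;
}

/* ...and retry if a write was in progress or happened in the meantime. */
static int toy_fetch_retry(const struct toy_seqcount *sc, unsigned int start)
{
    return (start & 1u) || sc->sequence != start;
}
```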
      
      This unfortunately adds a lot of fairly trivial initialization code
      to a number of drivers. But the benefit of ensuring correctness makes
      this worthwhile.
      
      Because these changes are required for lockdep to be enabled, and the
      changes are quite trivial, I've not yet split this patch out into 30-some
      separate patches, as I figured it would be better to get the various
      maintainers' thoughts on how to best merge this change along with
      the seqcount lockdep enablement.
      
      Feedback would be appreciated!
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Acked-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: James Morris <jmorris@namei.org>
      Cc: Jesse Gross <jesse@nicira.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Mirko Lindner <mlindner@marvell.com>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Roger Luethi <rl@hellgate.ch>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Simon Horman <horms@verge.net.au>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: Wensong Zhang <wensong@linux-vs.org>
      Cc: netdev@vger.kernel.org
      Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  14. 23 Oct, 2013 (1 commit)
    • macvlan: resolve ENOENT errors on creation · 47d4ab91
      John Fastabend committed
      After the commit below, attempting to create macvlan devices
      resulted in ENOENT errors:
      
      # ip link add link p3p2 type macvlan
      RTNETLINK answers: Invalid argument
      
      This happens because netdev_upper_dev_link() is called before
      register_netdevice() in the macvlan code. Through a call chain
      this results in a call to __netdev_adjacent_dev_insert() and
      finally a sysfs_create_link(). This requires the kobject of
      the macvlan to be registered which is done in register_netdevice().
      If there is no kobject, which is the case here, the ENOENT error
      is seen on the command line.
      
      To resolve this, move the netdev_upper_dev_link() call below
      the register_netdevice() call. This aligns with the vlan driver
      flow.
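The ordering constraint can be modeled with hypothetical helpers: `toy_upper_dev_link()` fails with `-ENOENT` unless the device's kobject exists, mimicking the `sysfs_create_link()` failure described above, and `toy_register()` creates it. None of these are kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device state. */
struct toy_dev {
    bool kobj_registered;
    bool linked;
};

/* Stand-in for register_netdevice(): creates the kobject. */
static int toy_register(struct toy_dev *d)
{
    d->kobj_registered = true;
    return 0;
}

/* Stand-in for netdev_upper_dev_link(): the sysfs link needs the kobject. */
static int toy_upper_dev_link(struct toy_dev *d)
{
    if (!d->kobj_registered)
        return -ENOENT;
    d->linked = true;
    return 0;
}

/* Broken order: link first, then register. */
static int create_broken(struct toy_dev *d)
{
    int err = toy_upper_dev_link(d);
    if (err)
        return err;
    return toy_register(d);
}

/* Fixed order: register first, then link. Real code would also
 * unregister the device if linking failed. */
static int create_fixed(struct toy_dev *d)
{
    int err = toy_register(d);
    if (err)
        return err;
    return toy_upper_dev_link(d);
}
```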
      
      Regression introduced here:
      
      commit 5831d66e
      Author: Veaceslav Falico <vfalico@redhat.com>
      Date:   Wed Sep 25 09:20:32 2013 +0200
      
          net: create sysfs symlinks for neighbour devices
      
      CC: Veaceslav Falico <vfalico@redhat.com>
      CC: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Acked-by: Veaceslav Falico <vfalico@redhat.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 12 Sep, 2013 (1 commit)
  16. 04 Sep, 2013 (1 commit)
  17. 31 Aug, 2013 (1 commit)
  18. 06 Aug, 2013 (1 commit)
    • macvlan: validate flags · 15127478
      Michael S. Tsirkin committed
      commit df8ef8f3
          macvlan: add FDB bridge ops and macvlan flags
      added a flags field to macvlan, which can be
      controlled from userspace.
      The idea is to make the interface future-proof
      so we can add flags and not new fields.
      
      However, the flags value isn't validated; as a result,
      userspace can't detect which flags are supported.
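A minimal sketch of such validation: reject any bit outside the supported mask so that userspace can probe for support. `TOY_FLAG_NOPROMISC`, `TOY_SUPPORTED_FLAGS` and `toy_validate_flags()` are hypothetical; the kernel defines its own `MACVLAN_FLAG_NOPROMISC` in the uapi headers, and the value here is local to the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define TOY_FLAG_NOPROMISC  (1u << 0)           /* the only flag this sketch knows */
#define TOY_SUPPORTED_FLAGS (TOY_FLAG_NOPROMISC)

/* Reject unknown bits: an -EINVAL reply tells userspace that the running
 * kernel does not understand one of the flags it asked for. */
static int toy_validate_flags(uint16_t flags)
{
    if (flags & ~TOY_SUPPORTED_FLAGS)
        return -EINVAL;
    return 0;
}
```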
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  19. 02 Aug, 2013 (2 commits)
  20. 24 Jul, 2013 (1 commit)
  21. 26 Jun, 2013 (1 commit)
  22. 13 Jun, 2013 (1 commit)
    • macvlan: don't touch promisc without passthrough · 99ffc3e7
      Michael S. Tsirkin committed
      commit df8ef8f3
      "macvlan: add FDB bridge ops and macvlan flags"
      added a way to control NOPROMISC macvlan flag through netlink.
      
      However, with a non-passthrough device we never set promisc on open,
      even if NOPROMISC is off.  As a result:
      
      If userspace clears NOPROMISC on open, then does not clear it on a
      netlink command, promisc counter is not decremented on stop and there
      will be no way to clear it once macvlan is detached.
      
      If userspace does not clear NOPROMISC on open, then sets NOPROMISC on a
      netlink command, promisc counter will be decremented from 0 and overflow
      to fffffffff with no way to clear promisc.
      
      To fix, simply ignore NOPROMISC flag in a netlink command for
      non-passthrough devices, same as we do at open/close.
      
      Since we are touching this code anyway, check the dev_set_promiscuity()
      return code and pass it to users (though an error here is unlikely).
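The counter problem and the fix can be modeled with hypothetical types: `struct toy_lower` holds an unsigned promiscuity count that wraps on underflow, and `toy_change_flags()` only adjusts it for passthrough devices, matching open/close, and propagates any error. This is an illustrative model, not the macvlan code.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NOPROMISC 1u

struct toy_lower { unsigned int promiscuity; };
struct toy_vlan  { bool passthru; unsigned int flags; struct toy_lower *lower; };

/* Stand-in for dev_set_promiscuity(): an unsigned counter that wraps. */
static int toy_set_promiscuity(struct toy_lower *l, int inc)
{
    l->promiscuity += (unsigned int)inc;
    return 0;
}

/* Apply new flags. Non-passthrough devices never took a promisc reference
 * on open, so NOPROMISC changes must not touch the counter for them. */
static int toy_change_flags(struct toy_vlan *v, unsigned int newflags)
{
    unsigned int changed = v->flags ^ newflags;
    int err = 0;

    if (v->passthru && (changed & TOY_NOPROMISC))
        err = toy_set_promiscuity(v->lower, (newflags & TOY_NOPROMISC) ? -1 : 1);
    if (err)
        return err;            /* pass the error back to the caller */
    v->flags = newflags;
    return 0;
}
```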
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Reviewed-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  23. 29 May, 2013 (1 commit)
  24. 12 May, 2013 (1 commit)
    • macvlan: fix passthru mode race between dev removal and rx path · 233c7df0
      Jiri Pirko committed
      Currently, if a macvlan in passthru mode is created, data is received on it,
      and the device is then removed, the following panic happens:
      
      NULL pointer dereference at 0000000000000198
      IP: [<ffffffffa0196058>] macvlan_handle_frame+0x153/0x1f7 [macvlan]
      
      I'm using the following script to trigger this:
      <script>
      while [ 1 ]
      do
      	ip link add link e1 name macvtap0 type macvtap mode passthru
      	ip link set e1 up
      	ip link set macvtap0 up
      	IFINDEX=`ip link |grep macvtap0 | cut -f 1 -d ':'`
      	cat /dev/tap$IFINDEX  >/dev/null &
      	ip link del dev macvtap0
      done
      </script>
      
      I run this script while "ping -f" is running on another machine to send
      packets to e1 rx.
      
      The reason for the panic is that list_first_entry() is blindly called in
      macvlan_handle_frame() even if the list is empty. vlan is set to an
      incorrect pointer, which leads to the crash.
      
      I'm fixing this by protecting the port->vlans list with RCU and by
      preventing an incorrect pointer from being obtained when the list is empty.
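Why `list_first_entry()` on an empty list is dangerous can be shown with a minimal circular list in the style of `<linux/list.h>`: on an empty list `head->next` points back at the head itself, so `container_of()` yields a bogus pointer. The `toy_` helpers below are hypothetical re-implementations without RCU; for the RCU-protected case the kernel provides `list_first_or_null_rcu()`.

```c
#include <assert.h>
#include <stddef.h>

struct toy_list { struct toy_list *next, *prev; };

#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void toy_list_init(struct toy_list *h) { h->next = h->prev = h; }

/* Insert n right after the head h. */
static void toy_list_add(struct toy_list *n, struct toy_list *h)
{
    n->next = h->next;
    n->prev = h;
    h->next->prev = n;
    h->next = n;
}

struct toy_vlan { int id; struct toy_list list; };

/* Unsafe: on an empty list this "finds" the head itself. */
#define toy_first_entry(head, type, member) \
    toy_container_of((head)->next, type, member)

/* Safe: NULL when the list is empty. */
#define toy_first_or_null(head, type, member) \
    ((head)->next != (head) ? toy_first_entry(head, type, member) : NULL)
```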
      
      Introduced by: commit eb06acdc "macvlan: Introduce 'passthru' mode to takeover the underlying device"
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  25. 20 Apr, 2013 (3 commits)
  26. 31 Mar, 2013 (1 commit)
    • macvlan: use the right RCU api · e052f7e6
      Eric Dumazet committed
      Make sure we use proper API to fetch dev->rx_handler_data,
      instead of ugly casts.
      
      Rename macvlan_port_get() to macvlan_port_get_rtnl() to document the fact
      that we hold RTNL when needed, with lockdep support.
      
      This removes sparse warnings as well (CONFIG_SPARSE_RCU_POINTER=y):
      
      CHECK   drivers/net/macvlan.c
      drivers/net/macvlan.c:706:37: warning: cast removes address space of expression
      drivers/net/macvlan.c:775:16: warning: cast removes address space of expression
      drivers/net/macvlan.c:924:16: warning: cast removes address space of expression
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  27. 08 Mar, 2013 (1 commit)
  28. 28 Feb, 2013 (1 commit)
    • hlist: drop the node parameter from iterators · b67bfe0d
      Sasha Levin committed
      I'm not sure why, but while the list for-each-entry iterators were conceived as
      
              list_for_each_entry(pos, head, member)
      
      The hlist ones were greedy and wanted an extra parameter:
      
              hlist_for_each_entry(tpos, pos, head, member)
      
      Why did they need an extra pos parameter? I'm not quite sure. Not only do
      they not really need it, it also prevents the iterator from looking
      exactly like the list iterator, which is unfortunate.
      
      Besides the semantic patch, there was some manual work required:
      
       - Fix up the actual hlist iterators in linux/list.h
       - Fix up the declaration of other iterators based on the hlist ones.
       - A very small number of places were using the 'node' parameter; these
       were modified to use 'obj->member' instead.
       - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
       properly, so those had to be fixed up manually.
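To illustrate the new three-argument form, here is a self-contained user-space imitation of `hlist_for_each_entry(pos, head, member)`. It is a simplified re-implementation for illustration only (no RCU variants), with hypothetical `toy_hlist_*` names to make clear it is not `<linux/list.h>`; like the kernel macros, it relies on the GCC/Clang `__typeof__` extension to derive the entry type from the cursor.

```c
#include <assert.h>
#include <stddef.h>

struct toy_hlist_node { struct toy_hlist_node *next, **pprev; };
struct toy_hlist_head { struct toy_hlist_node *first; };

#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Like hlist_entry_safe(): a NULL node maps to a NULL entry. */
#define toy_hlist_entry_safe(ptr, type, member) \
    ((ptr) ? toy_container_of(ptr, type, member) : NULL)

static void toy_hlist_add_head(struct toy_hlist_node *n, struct toy_hlist_head *h)
{
    n->next = h->first;
    if (h->first)
        h->first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* New style: only the typed cursor 'pos' is needed, no separate node cursor. */
#define toy_hlist_for_each_entry(pos, head, member)                              \
    for (pos = toy_hlist_entry_safe((head)->first, __typeof__(*(pos)), member); \
         pos;                                                                    \
         pos = toy_hlist_entry_safe((pos)->member.next, __typeof__(*(pos)), member))

struct item { int val; struct toy_hlist_node node; };

/* Sum all item values in the hash chain. */
static int sum_items(struct toy_hlist_head *h)
{
    struct item *it;
    int total = 0;

    toy_hlist_for_each_entry(it, h, node)
        total += it->val;
    return total;
}
```

The call site now reads exactly like `list_for_each_entry(pos, head, member)`, which is the point of the change.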
      
      The semantic patch which is mostly the work of Peter Senna Tschudin is here:
      
      @@
      iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
      
      type T;
      expression a,c,d,e;
      identifier b;
      statement S;
      @@
      
      -T b;
          <+... when != b
      (
      hlist_for_each_entry(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue(a,
      - b,
      c) S
      |
      hlist_for_each_entry_from(a,
      - b,
      c) S
      |
      hlist_for_each_entry_rcu(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_rcu_bh(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue_rcu_bh(a,
      - b,
      c) S
      |
      for_each_busy_worker(a, c,
      - b,
      d) S
      |
      ax25_uid_for_each(a,
      - b,
      c) S
      |
      ax25_for_each(a,
      - b,
      c) S
      |
      inet_bind_bucket_for_each(a,
      - b,
      c) S
      |
      sctp_for_each_hentry(a,
      - b,
      c) S
      |
      sk_for_each(a,
      - b,
      c) S
      |
      sk_for_each_rcu(a,
      - b,
      c) S
      |
      sk_for_each_from
      -(a, b)
      +(a)
      S
      + sk_for_each_from(a) S
      |
      sk_for_each_safe(a,
      - b,
      c, d) S
      |
      sk_for_each_bound(a,
      - b,
      c) S
      |
      hlist_for_each_entry_safe(a,
      - b,
      c, d, e) S
      |
      hlist_for_each_entry_continue_rcu(a,
      - b,
      c) S
      |
      nr_neigh_for_each(a,
      - b,
      c) S
      |
      nr_neigh_for_each_safe(a,
      - b,
      c, d) S
      |
      nr_node_for_each(a,
      - b,
      c) S
      |
      nr_node_for_each_safe(a,
      - b,
      c, d) S
      |
      - for_each_gfn_sp(a, c, d, b) S
      + for_each_gfn_sp(a, c, d) S
      |
      - for_each_gfn_indirect_valid_sp(a, c, d, b) S
      + for_each_gfn_indirect_valid_sp(a, c, d) S
      |
      for_each_host(a,
      - b,
      c) S
      |
      for_each_host_safe(a,
      - b,
      c, d) S
      |
      for_each_mesh_entry(a,
      - b,
      c, d) S
      )
          ...+>
      
      [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
      [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
      [akpm@linux-foundation.org: checkpatch fixes]
      [akpm@linux-foundation.org: fix warnings]
      [akpm@linux-foudnation.org: redo intrusive kvm changes]
      Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  29. 14 Feb, 2013 (1 commit)
  30. 09 Feb, 2013 (2 commits)
  31. 07 Feb, 2013 (1 commit)
  32. 18 Jan, 2013 (1 commit)
  33. 07 Jan, 2013 (1 commit)
  34. 05 Jan, 2013 (1 commit)
  35. 04 Jan, 2013 (1 commit)