1. 05 Jun, 2014 23 commits
    • vxlan: Add support for UDP checksums (v4 sending, v6 zero csums) · 359a0ea9
      Tom Herbert committed
      Added VXLAN link configuration for sending UDP checksums and for
      allowing TX and RX of UDP6 checksums.
      
      Also calls the common iptunnel_handle_offloads and adds GSO support for
      checksums.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gre: Call gso_make_checksum · 4749c09c
      Tom Herbert committed
      Call gso_make_checksum. This should have the benefit of using a
      checksum that may have been previously computed for the packet.
      
      This also adds NETIF_F_GSO_GRE_CSUM to differentiate devices that
      offload GRE GSO with and without the GRE checksum offloaded.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Add GSO support for UDP tunnels with checksum · 0f4f4ffa
      Tom Herbert committed
      Added a new netif feature for GSO_UDP_TUNNEL_CSUM. This indicates
      that a device is capable of computing the UDP checksum in the
      encapsulating header of a UDP tunnel.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: Call gso_make_checksum · e9c3a24b
      Tom Herbert committed
      Call common gso_make_checksum when calculating checksum for a
      TCP GSO segment.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Support for multiple checksums with gso · 7e2b10c1
      Tom Herbert committed
      When creating a GSO packet segment we may need to set more than
      one checksum in the packet (for instance a TCP checksum and
      UDP checksum for VXLAN encapsulation). To be efficient, we want
      to do checksum calculation for any part of the packet at most once.
      
      This patch adds csum_start offset to skb_gso_cb. This tracks the
      starting offset for skb->csum which is initially set in skb_segment.
      When a protocol needs to compute a transport checksum it calls
      gso_make_checksum which computes the checksum value from the start
      of transport header to csum_start and then adds in skb->csum to get
      the full checksum. skb->csum and csum_start are then updated to reflect
      the checksum of the resultant packet starting from the transport header.
      
      This patch also adds a flag to skbuff, encap_hdr_csum, which is set
      in *gso_segment functions to indicate that a tunnel protocol needs
      checksum calculation.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
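The incremental scheme described in 7e2b10c1 can be illustrated with a small user-space sketch. This is not the kernel implementation: the names `sum16`, `csum_finish`, and `gso_checksum_sketch` are hypothetical, the packet is a plain byte array rather than an skb, pseudo-headers and the writing of checksum fields are ignored, and the code assumes the distance from the transport header to csum_start is even so that 16-bit words stay aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones-complement sum of big-endian 16-bit words, folded to 16 bits.
 * 'sum' seeds the accumulator with a previously computed partial sum. */
static uint16_t sum16(const uint8_t *data, size_t len, uint16_t sum)
{
    uint64_t acc = sum;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        acc += ((uint32_t)data[i] << 8) | data[i + 1];
    if (i < len)                      /* odd trailing byte, zero padded */
        acc += (uint32_t)data[i] << 8;
    while (acc >> 16)                 /* fold carries back into the low 16 bits */
        acc = (acc & 0xffff) + (acc >> 16);
    return (uint16_t)acc;
}

/* The value placed on the wire is the complement of the folded sum. */
static uint16_t csum_finish(uint16_t sum)
{
    return (uint16_t)~sum;
}

/* Hypothetical analogue of the step described above: only the bytes from
 * the transport header (th_off) up to *csum_start are summed, and the
 * cached sum of everything after *csum_start is added in. The cache and
 * *csum_start are then moved back so an outer protocol can repeat the step
 * without touching the payload again. */
static uint16_t gso_checksum_sketch(const uint8_t *pkt, size_t th_off,
                                    size_t *csum_start, uint16_t *cached)
{
    size_t plen;

    assert(*csum_start >= th_off);
    plen = *csum_start - th_off;
    assert(plen % 2 == 0);            /* assumption: headers keep word alignment */

    *cached = sum16(pkt + th_off, plen, *cached);
    *csum_start = th_off;
    return csum_finish(*cached);
}
```

Calling `gso_checksum_sketch` once for the inner transport header and once for the outer one sums each byte of the packet a single time, which is the efficiency goal stated in the commit message.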
    • l2tp: call udp{6}_set_csum · 77157e19
      Tom Herbert committed
      Call common functions to set checksum for UDP tunnel.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: Generic functions to set checksum · af5fcba7
      Tom Herbert committed
      Added udp_set_csum and udp6_set_csum functions to set UDP checksums
      in packets. These are for simple UDP packets such as those that might
      be created in UDP tunnels.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bonding-macvlan' · 6579867c
      David S. Miller committed
      Vlad Yasevich says:
      
      ====================
      Fix support for macvlan devices on top of bonding
      
      Currently, macvlan devices do not work well over bond interfaces.
      Everything works well until a failover is triggered in the bond
      device; the macvlan then becomes unreachable until ARP entries
      are flushed.  This series adds the functionality needed to
      handle notifications correctly and to update switches with the MAC
      addresses assigned to macvlans.
      
      The first patch simply adds the IFF_UNICAST_FLT flag to bonds since they
      already correctly manage the unicast filter list of the slaves, so
      we might as well prevent the bond from needlessly going into promiscuous
      mode.
      
      The second patch adds notifier handler to macvlan to trigger correct
      ARP notifications.
      
      The third patch adds handling for TLB and RLB modes that use special
      ETH_P_LOOPBACK type packets to teach the switch about MAC addresses.
      It also allows ARPs for the macvlan MAC addresses to be handled by
      RLB mode.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: Support macvlans on top of tlb/rlb mode bonds · 14af9963
      Vlad Yasevich committed
      To make TLB mode work, the patch allows learning packets
      to be sent using MAC addresses assigned to macvlan devices,
      also taking into account VLANs that may be between the
      bond and macvlan device.
      
      To make RLB work, all we have to do is accept ARP packets
      for addresses added to the bond dev->uc list.  Since RLB
      mode will take care to update the peers directly with
      correct MAC addresses, learning packets for these addresses
      do not have to be sent to the switch.
      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • macvlan: Support bonding events · 4c991255
      Vlad Yasevich committed
      Bonding and team drivers generate specific events during failover
      that trigger switch updates.  When a macvlan device is configured
      on top of bonding, we want switches to learn about the macvlan
      devices as well.  This patch adds a handler to the macvlan driver to
      propagate these events to all macvlan devices.  We let the generic
      inetdev event handler do the work.
      
      This allows macvlan to operate correctly over an active-backup
      mode bond.
      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: Turn on IFF_UNICAST_FLT on bond devices · c565b488
      Vlad Yasevich committed
      Bonding devices manage the unicast filters of the underlying
      interfaces, but do not turn on the IFF_UNICAST_FLT flag.  Thus
      any time a unicast address is added to the bond, the bond is
      placed in promiscuous mode.
      
      Turn on IFF_UNICAST_FLT on the bond device so that the bond does
      not go into promiscuous mode needlessly.  If an underlying device
      does not support unicast filtering, that device will automatically
      enter promiscuous mode already.
      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Revert "fib_trie: use seq_file_net rather than seq->private" · f830b022
      Sasha Levin committed
      This reverts commit 30f38d2f.
      
      fib_triestat is surrounded by a big lie: while it claims that it's a
      seq_file (fib_triestat_seq_open, fib_triestat_seq_show), it isn't:
      
      	static const struct file_operations fib_triestat_fops = {
      	        .owner  = THIS_MODULE,
      	        .open   = fib_triestat_seq_open,
      	        .read   = seq_read,
      	        .llseek = seq_lseek,
      	        .release = single_release_net,
      	};
      
      Yes, fib_triestat is just a regular file.
      
      A small detail (assuming CONFIG_NET_NS=y) is that while for seq_files
      you could do seq_file_net() to get the net ptr, doing so for a regular
      file would be wrong and would dereference an invalid pointer.
      
      The fib_triestat lie claimed a victim, and trying to show the file would
      be bad for the kernel. This patch just reverts the issue and fixes
      fib_triestat, which still needs a rewrite to either be a seq_file or
      stop claiming it is.
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • trivial: drivers/net/ethernet/nvidia/forcedeth.c: fix typo s/SUBSTRACT1/SUBTRACT1/ · cef33c81
      Antonio Ospite committed
      Signed-off-by: Antonio Ospite <ao2@ao2.it>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Alexander Gordeev <agordeev@redhat.com>
      Cc: netdev@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gianfar: Fix the section mismatch warnings. · 898157ed
      Xiubo Li committed
      When building with CONFIG_DEBUG_SECTION_MISMATCH enabled, the following
      warning occurs:
      
        LD      drivers/net/built-in.o
      WARNING: drivers/net/built-in.o(.text+0xcd4c): Section mismatch in
      reference from the function gfar_probe() to the function
      .init.text:gfar_init_addr_hash_table()
      The function gfar_probe() references
      the function __init gfar_init_addr_hash_table().
      This is often because gfar_probe lacks a __init
      annotation or the annotation of gfar_init_addr_hash_table is wrong.
      Signed-off-by: Xiubo Li <Li.Xiubo@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'xen-netback-netfront-multiqueue' · 9ab89acc
      David S. Miller committed
      Wei Liu says:
      
      ====================
      This is a rebased version of Andrew's V8 patch series. The original cover letter:
      
      --------------------
      xen-net{back,front}: Multiple transmit and receive queues
      
      This patch series implements multiple transmit and receive queues (i.e.
      multiple shared rings) for the xen virtual network interfaces.
      
      The series is split up as follows:
      - Patch 1 brings the 'grant_copy_op' array back into struct xenvif, in
        preparation for multi-queue support. See the patch itself for more details.
      - Patches 2 and 4 factor out the queue-specific data for netback and
        netfront respectively, and modify the rest of the code to use these
        as appropriate.
      - Patches 3 and 5 introduce new XenStore keys to negotiate and use
        multiple shared rings and event channels, and code to connect these
        as appropriate.
      - Patch 6 documents the XenStore keys required for the new feature
        in include/xen/interface/io/netif.h
      
      All other transmit and receive processing remains unchanged, i.e. there
      is a kthread per queue and a NAPI context per queue.
      
      The performance of these patches has been analysed in detail, with
      results available at:
      
      http://wiki.xenproject.org/wiki/Xen-netback_and_xen-netfront_multi-queue_performance_testing
      
      To summarise:
        * Using multiple queues allows a VM to transmit at line rate on a 10
          Gbit/s NIC, compared with a maximum aggregate throughput of 6 Gbit/s
          with a single queue.
        * For intra-host VM--VM traffic, eight queues provide 171% of the
          throughput of a single queue; almost 12 Gbit/s instead of 6 Gbit/s.
        * There is a corresponding increase in total CPU usage, i.e. this is a
          scaling out over available resources, not an efficiency improvement.
        * Results depend on the availability of sufficient CPUs, as well as the
          distribution of interrupts and the distribution of TCP streams across
          the queues.
      
      Queue selection is currently achieved via an L4 hash on the packet (i.e.
      TCP src/dst port, IP src/dst address) and is not negotiated between the
      frontend and backend, since only one option exists. Future patches to
      support other frontends (particularly Windows) will need to add some
      capability to negotiate not only the hash algorithm selection, but also
      allow the frontend to specify some parameters to this.
      
      Note that queue selection is a decision by the transmitting system about
      which queue to use for a particular packet. In general, the algorithm
      may differ between the frontend and the backend with no adverse effects.
      
      Queue-specific XenStore entries for ring references and event channels
      are stored hierarchically, i.e. under .../queue-N/... where N varies
      from 0 to one less than the requested number of queues (inclusive). If
      only one queue is requested, it falls back to the flat structure where
      the ring references and event channels are written at the same level as
      other vif information.
      
      V8:
      - Squash the queue error handling code into patch 3.
      - Update the documentation (patch 6) according to comments on the
        equivalent patch to Xen.
      
      V7:
      - Rebase on latest net-next, which includes the netback grant mapping
        patch series from Zoltan Kiss
      - Reduce QUEUE_NAME_SIZE by 1 to avoid double-counting the trailing '\0'
      - Simplify the queue hashing by using (hash % num_queues) instead of
        multiply & shift.
      - Add ratelimited warning for invalid queue selection.
      - Fix error handling to correctly tear down already setup queues.
      - Use dev->real_num_tx_queues instead of separately maintaining a
        count of the number of queues.
      
      V6:
      - Use 'max_queues' as the module param. name for both netback and netfront.
      
      V5:
      - Fix bug in xenvif_free() that could lead to an attempt to transmit an
        skb after the queue structures had been freed.
      - Improve the XenStore protocol documentation in netif.h.
      - Fix IRQ_NAME_SIZE double-accounting for null terminator.
      - Move rx_gso_checksum_fixup stat into struct xenvif_stats (per-queue).
      - Don't initialise a local variable that is set in both branches (xspath).
      
      V4:
      - Add MODULE_PARM_DESC() for the multi-queue parameters for netback
        and netfront modules.
      - Move del_timer_sync() in netfront to after unregister_netdev, which
        restores the order in which these functions were called before applying
        these patches.
      
      V3:
      - Further indentation and style fixups.
      
      V2:
      - Rebase onto net-next.
      - Change queue->number to queue->id.
      - Add atomic operations around the small number of stats variables that
        are not queue-specific or per-cpu.
      - Fixup formatting and style issues.
      - XenStore protocol changes documented in netif.h.
      - Default max. number of queues to num_online_cpus().
      - Check requested number of queues does not exceed maximum.
      --------------------
      
      I rebased this on top of net-next. No functional change is introduced.  The
      patch that needed some extra care was "xen-netback: Factor queue-specific data
      into queue struct" because it clashed with a fix introduced in net. A simple
      test of creating guest, iperf, then shutting down guest worked as expected.
      
      The last patch fixes a minor problem: the queue name is not initialised in
      xen-netfront, resulting in names like "-tx" and "-rx" in /proc/interrupts.
      
      Changes since v9 (no functional change introduced):
      * include commit summary in the commit message of first patch
      * fold David Vrabel's Reviewed-by into last patch
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
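The V7 note about replacing "multiply & shift" with (hash % num_queues) can be made concrete with a minimal sketch. Both functions below are illustrative only: the names are hypothetical, and the multiply-and-shift form shown is one common way to scale a 32-bit hash into a range, assumed here rather than taken from the earlier revisions of the series.

```c
#include <assert.h>
#include <stdint.h>

/* Modulo form named in the V7 changelog: the remainder picks the queue. */
static unsigned int select_queue_mod(uint32_t hash, unsigned int num_queues)
{
    assert(num_queues > 0);
    return hash % num_queues;
}

/* Assumed multiply-and-shift alternative: scales the 32-bit hash into
 * [0, num_queues) by using the high bits of the 64-bit product. */
static unsigned int select_queue_mulshift(uint32_t hash, unsigned int num_queues)
{
    assert(num_queues > 0);
    return (unsigned int)(((uint64_t)hash * num_queues) >> 32);
}
```

Both mappings return an index below num_queues. The modulo form depends on the low bits of the hash and the scaled form on the high bits, so the two generally assign the same flow to different queues. As the cover letter notes, the choice is local to the transmitting side.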
    • xen-net{back, front}: Document multi-queue feature in netif.h · a2deb8b1
      Andrew J. Bennieston committed
      Document the multi-queue feature in terms of XenStore keys to be written
      by the backend and by the frontend.
      Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
      Acked-by: Wei Liu <wei.liu2@citrix.com>
      Acked-by: Ian Campbell <ian.campbell@citrix.com>
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xen-netfront: Add support for multiple queues · 50ee6061
      Andrew J. Bennieston committed
      Build on the refactoring of the previous patch to implement multiple
      queues between xen-netfront and xen-netback.
      
      Check XenStore for multi-queue support, and set up the rings and event
      channels accordingly.
      
      Write ring references and event channels to XenStore in a queue
      hierarchy if appropriate, or flat when using only one queue.
      
      Update the xennet_select_queue() function to choose the queue on which
      to transmit a packet based on the skb hash result.
      Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
      Acked-by: Wei Liu <wei.liu2@citrix.com>
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
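The "queue hierarchy if appropriate, or flat when using only one queue" rule can be sketched as a path-building helper. This is a user-space illustration, not the driver code: `queue_xs_dir` is a hypothetical name, the base path passed in is whatever directory holds the vif information, and only the directory is built because the individual key names are not given in this log.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds the XenStore directory for one queue's entries.
 * With a single queue the flat layout is used (the base directory itself);
 * otherwise entries live under <base>/queue-N, N in [0, num_queues).
 * Returns 0 on success, -1 on bad arguments or if buf is too small. */
static int queue_xs_dir(char *buf, size_t len, const char *base,
                        unsigned int num_queues, unsigned int id)
{
    int n;

    assert(buf != NULL && base != NULL);
    if (strlen(base) == 0 || id >= num_queues)
        return -1;

    if (num_queues == 1)
        n = snprintf(buf, len, "%s", base);
    else
        n = snprintf(buf, len, "%s/queue-%u", base, id);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For example, with an assumed base of "device/vif/0" and four queues, queue 2 maps to "device/vif/0/queue-2", while a single-queue setup keeps everything directly under "device/vif/0".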
    • xen-netfront: Factor queue-specific data into queue struct. · 2688fcb7
      Andrew J. Bennieston committed
      In preparation for multi-queue support in xen-netfront, move the
      queue-specific data from struct netfront_info to struct netfront_queue,
      and update the rest of the code to use this.
      
      Also adds loops over queues where appropriate, even though only one is
      configured at this point, and uses alloc_etherdev_mq() and the
      corresponding multi-queue netif wake/start/stop functions in preparation
      for multiple active queues.
      
      Finally, implements a trivial queue selection function suitable for
      ndo_select_queue, which simply returns 0, selecting the first (and
      only) queue.
      Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
      Acked-by: Wei Liu <wei.liu2@citrix.com>
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xen-netback: Add support for multiple queues · 8d3d53b3
      Andrew J. Bennieston committed
      Builds on the refactoring of the previous patch to implement multiple
      queues between xen-netfront and xen-netback.
      
      Writes the maximum supported number of queues into XenStore, and reads
      the values written by the frontend to determine how many queues to use.
      
      Ring references and event channels are read from XenStore on a per-queue
      basis and rings are connected accordingly.
      
      Also adds code to handle the cleanup of any already initialised queues
      if the initialisation of a subsequent queue fails.
      Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
      Acked-by: Wei Liu <wei.liu2@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
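The cleanup behaviour mentioned at the end of 8d3d53b3 (tearing down already initialised queues when a later one fails) follows a common rollback pattern, sketched below under simplifying assumptions. The struct and function names are hypothetical, and the `fail_at` parameter exists only to simulate a failing queue; nothing here is taken from the netback source.

```c
#include <assert.h>
#include <stddef.h>

struct queue_sketch {
    int initialised;
};

/* Hypothetical per-queue setup: fails for the queue whose index is fail_at. */
static int queue_init(struct queue_sketch *q, unsigned int id,
                      unsigned int fail_at)
{
    if (id == fail_at)
        return -1;
    q->initialised = 1;
    return 0;
}

static void queue_teardown(struct queue_sketch *q)
{
    q->initialised = 0;
}

/* Sets up queues 0..n-1 in order. If queue i fails, queues i-1..0 are
 * torn down in reverse order so no partially connected state remains. */
static int connect_queues(struct queue_sketch *qs, unsigned int n,
                          unsigned int fail_at)
{
    unsigned int i;

    assert(qs != NULL);
    for (i = 0; i < n; i++) {
        if (queue_init(&qs[i], i, fail_at) != 0) {
            while (i-- > 0)
                queue_teardown(&qs[i]);
            return -1;
        }
    }
    return 0;
}
```

Unwinding in reverse order mirrors the setup order, which keeps the teardown valid even if a later queue's setup were to depend on an earlier one.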
    • xen-netback: Factor queue-specific data into queue struct · e9ce7cb6
      Wei Liu committed
      In preparation for multi-queue support in xen-netback, move the
      queue-specific data from struct xenvif into struct xenvif_queue, and
      update the rest of the code to use this.
      
      Also adds loops over queues where appropriate, even though only one is
      configured at this point, and uses alloc_netdev_mq() and the
      corresponding multi-queue netif wake/start/stop functions in preparation
      for multiple active queues.
      
      Finally, implements a trivial queue selection function suitable for
      ndo_select_queue, which simply returns 0 for a single queue and uses
      skb_get_hash() to compute the queue index otherwise.
      Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
      Signed-off-by: Wei Liu <wei.liu2@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xen-netback: Move grant_copy_op array back into struct xenvif. · a55d9766
      Andrew J. Bennieston committed
      This array was allocated separately in commit ac3d5ac2 ("xen-netback:
      fix guest-receive-side array sizes") due to it being very large, and a
      struct xenvif is allocated as the netdev_priv part of a struct
      net_device, i.e. via kmalloc() but falling back to vmalloc() if the
      initial alloc. fails.
      
      In preparation for the multi-queue patches, where this array becomes
      part of struct xenvif_queue and is always allocated through vzalloc(),
      move this back into the struct xenvif.
      Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
      Acked-by: Wei Liu <wei.liu2@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next · 9bcc14d2
      David S. Miller committed
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates
      
      This series contains updates to e1000, igb and ixgbe.
      
      Emil provides version 2 of his fix for the detection of SFP+ capable interfaces.
      In cases where the driver is loaded while there are no SFP+ modules in the cage,
      the interface was not being detected as SFP capable.  Resolve the issue by
      identifying interfaces with no PHY type set as SFP capable, which allows the
      driver to detect the SFP module when the interface is brought up.  In this
      version 2 of the patch, the 82599 specific check was removed since we only
      have 82598 devices that are SFP capable.
      
      Jacob removes the inclusion of the export header in the ixgbe PTP core, since
      it is not needed.  He also renames igb_ptp_enable() to igb_ptp_feature_enable()
      to better reflect the function's actual purpose.
      
      Todd fixes the ethtool loopback test for i354 backplane devices: since we
      do not know which PHY is to be used for these devices, MAC loopback is used
      for ethtool tests.  Todd also sets the packet buffer size register defaults
      for i210 devices.
      
      Yongjian Xu removes the check for skb->len being negative or zero since there
      is never a case where it would be zero or negative for e1000.
      
      Manuel Schölling updates e1000 to use the time_after() helper function.
      
      v2: Fix indentation on wrapped line in patch 3 of the series based on
          feedback from David Miller
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
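The value of a time_after()-style helper, mentioned in the e1000 update above, is that it stays correct when a free-running tick counter wraps around. The sketch below is a user-space illustration with hypothetical names (`time_after_sketch`, `deadline_passed`). It uses the usual signed-difference idiom and assumes that converting an out-of-range unsigned value to `long` wraps in two's complement, which holds on common platforms but is implementation-defined in C.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if tick value a is after b, tolerating counter wraparound.
 * Only meaningful while the two values are less than LONG_MAX ticks apart. */
static int time_after_sketch(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Hypothetical usage: has 'timeout' ticks elapsed since 'start'?
 * start + timeout may wrap, which the signed comparison handles. */
static int deadline_passed(unsigned long now, unsigned long start,
                           unsigned long timeout)
{
    assert(timeout <= (unsigned long)LONG_MAX);
    return time_after_sketch(now, start + timeout);
}
```

A plain `now > start + timeout` comparison would report the deadline as passed as soon as the sum wraps to a small number, while the signed difference gives the right answer on both sides of the wrap.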
  2. 04 Jun, 2014 17 commits