1. 19 Aug, 2015 18 commits
  2. 18 Aug, 2015 22 commits
    • Merge branch 'Identifier-Locator-Addressing' · 0b233dc7
      David S. Miller committed
      Tom Herbert says:
      
      ====================
      net: Identifier Locator Addressing - Part I
      
      This patch set provides rudimentary support for Identifier Locator
      Addressing or ILA. The basic concept of ILA is that we split an IPv6
      address into a 64 bit locator and 64 bit identifier. The identifier is
      the identity of an entity in communication ("who"), and the locator
      expresses the location of the entity ("where"). Applications
      use an externally visible address that contains the identifier.
      When a packet is actually sent, a translation is done that
      overwrites the first 64 bits of the address with a locator.
      The packet can then be forwarded over the network to the host where
      the addressed entity is located. At the receiver, the reverse
      translation is done so that the application sees the original,
      untranslated address. Presumably an external control plane will
      provide identifier->locator mappings.
      
      v2:
        - Fix compilation errors when LWT not configured
        - Consolidate ILA into a single ila.c
      
      v3:
        - Change pseudohdr argument of inet_proto_csum_replace functions to
          be a bool
      
      v4:
        - In ila_build_state check locator being in netlink params before
          allocating tunnel state
      
      The data path for ILA is a simple NAT translation that only operates
      on the upper 64 bits of a destination address in IPv6 packets. The
      basic process is:
      
         1) Lookup 64 bit identifier (lower 64 bits of destination)
         2) If a match is found
            a) Overwrite locator (upper 64 bits of destination) with
               the new locator
            b) Adjust any checksum that has destination address included in
               pseudo header
         3) Send or receive packet
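The lookup and overwrite steps (1, 2a, and 3) can be sketched in Python as follows. This is a minimal illustration, not the kernel data path: `split_address`, `ila_translate`, and the `mappings` dictionary are hypothetical names, the dictionary stands in for whatever table an external control plane would provide, and the checksum adjustment of step 2b is not modeled.

```python
import ipaddress

MASK64 = (1 << 64) - 1


def split_address(addr):
    """Return (locator, identifier): the upper and lower 64 bits of an IPv6 address."""
    value = int(ipaddress.IPv6Address(addr))
    return value >> 64, value & MASK64


def ila_translate(dst, mappings):
    """Look up the identifier and, on a match, overwrite the locator.

    `mappings` is a hypothetical identifier -> locator table (64-bit ints).
    Returns the destination to use on the wire as an IPv6Address.
    """
    _, identifier = split_address(dst)
    new_locator = mappings.get(identifier)
    if new_locator is None:
        # No match: the destination is left untouched.
        return ipaddress.IPv6Address(dst)
    return ipaddress.IPv6Address((new_locator << 64) | identifier)


# One mapping: identifier 5555:0:2:0 is reachable at locator 2001:0:0:2.
table = {split_address("::5555:0:2:0")[1]: split_address("2001:0:0:2::")[0]}
wire_dst = ila_translate("3333:0:0:1:5555:0:2:0", table)
```

Here `wire_dst` equals 2001:0:0:2:5555:0:2:0, while a destination whose identifier is not in the table passes through unchanged.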
      
      ILA is a means to implement tunnels or network virtualization without
      encapsulation. Since there is no encapsulation involved, we assume that
      stateless support in the network for IPv6 (e.g. RSS, ECMP, TSO, etc.)
      just works. Also, since we're minimally changing the packet many of
      the worries about encapsulation (MTU, checksum, fragmentation) are
      not relevant. The downside is that ILA is not extensible like other
      encapsulations (GUE for instance) so it might not be appropriate for
      all use cases. Also, this only makes sense to do in IPv6!
      
      A key aspect of ILA is performance. The intent is that ILA would be
      used in data centers in virtualizing tasks or jobs. In the fullest
      incarnation all intra data center communications might be targeted to
      virtual ILA addresses. This is basically adding a new virtualization
      capability to the existing services in a datacenter, so there is a
      strong expectation that this does not degrade performance for
      existing applications.
      
      Performance seems to depend on how ILA is hooked into the kernel.
      ILA can be implemented under several different models:
      
        - Mechanically, it is a form of stateless DNAT
        - It can be thought of as a type of (source) routing
        - It can serve as a functional replacement for encapsulation
      
      In this patch set we hook into the data path using Light Weight
      Tunnels (LWT) infrastructure. As part of that, we add support in LWT
      to redirect dst input. iproute will be modified to take a new ila encap
      type. ILA can be configured like:
      
      ip route add 3333:0:0:1:5555:0:2:0/128 \
         encap ila 2001:0:0:2 via 2401:db00:20:911a:face:0:27:0
      
      ip -6 addr add 3333:0:0:1:5555:0:1:0/128 dev eth0
      
      ip route add table local local 2001:0:0:1:5555:0:1:0/128 \
         encap ila 3333:0:0:1 dev lo
      
      So sending to destination 3333:0:0:1:5555:0:2:0 will have destination
      of 2001:0:0:2:5555:0:2:0 on the wire.
      
      Performance results are below. With ILA we see about a 10% drop in
      pps compared to non-ILA. Much of this drop can be attributed to the
      loss of early demux on input (translation occurs after it is attempted).
      We will address this in the next patch set. Also, IPvlan input path
      does not work with ILA since the routing is bypassed; this will
      be addressed in a future patch.
      
      Performance testing:
      
      Performing netperf TCP_RR with 200 clients:
      
      Non-ILA baseline
        84.92% CPU utilization
        1861922.9 tps
        93/163/330 50/90/99% latencies
      
      ILA single destination
        83.16% CPU utilization
        1679683.4 tps
        105/180/332 50/90/99% latencies
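As a quick check of the "about a 10% drop" figure, the relative change can be computed directly from the two netperf results above. This is plain arithmetic on the reported numbers; the variable names are only illustrative.

```python
# Transactions per second reported for the two netperf TCP_RR runs.
baseline_tps = 1861922.9
ila_tps = 1679683.4

# Median (50th percentile) latencies reported for the same runs.
baseline_p50 = 93
ila_p50 = 105

drop_pct = (baseline_tps - ila_tps) / baseline_tps * 100
p50_increase_pct = (ila_p50 - baseline_p50) / baseline_p50 * 100

print(f"throughput drop: {drop_pct:.1f}%")
print(f"median latency increase: {p50_increase_pct:.1f}%")
```

The throughput drop works out to roughly 9.8%, consistent with the stated figure, and the median latency rises by roughly 12.9%.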
      
      References:
      
      Slides from netconf:
      http://vger.kernel.org/netconf2015Herbert-ILA.pdf
      
      Slides from presentation at IETF:
      https://www.ietf.org/proceedings/92/slides/slides-92-nvo3-1.pdf
      
      I-D:
      https://tools.ietf.org/html/draft-herbert-nvo3-ila-00
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Identifier Locator Addressing module · 65d7ab8d
      Tom Herbert committed
      Add a new module named ila. This implements ILA translation. Light
      weight tunnel redirection is used to perform the translation in
      the data path. This is configured by the "ip -6 route" command
      using the "encap ila <locator>" option, where <locator> is the
      value to set in destination locator of the packet. e.g.
      
      ip -6 route add 3333:0:0:1:5555:0:1:0/128 \
            encap ila 2001:0:0:1 via 2401:db00:20:911a:face:0:25:0
      
      Sets a route where 3333:0:0:1 will be overwritten by
      2001:0:0:1 on output.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Add inet_proto_csum_replace_by_diff utility function · abc5d1ff
      Tom Herbert committed
      This function updates a checksum field value and skb->csum based on
      a value which is the difference between the old and new checksum.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
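The idea of updating a checksum from a difference, rather than recomputing it, can be modeled in a few lines of Python. This is only a sketch of the one's-complement arithmetic, assuming the standard Internet checksum over 16-bit big-endian words; `fold`, `ones_sum`, `csum_diff`, and `replace_by_diff` are hypothetical helper names, not the kernel functions, and the skb->csum update mentioned above is not modeled.

```python
import ipaddress


def fold(total):
    """Fold carries back in until the value fits in 16 bits."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ones_sum(data):
    """One's-complement sum of the 16-bit big-endian words in data (even length assumed)."""
    return fold(sum(int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)))


def checksum(data):
    """Checksum field value: the complement of the one's-complement sum."""
    return ~ones_sum(data) & 0xFFFF


def csum_diff(old, new):
    """Difference between the sums of the new and old bytes (new minus old)."""
    return fold(ones_sum(new) + (~ones_sum(old) & 0xFFFF))


def replace_by_diff(check, diff):
    """Apply a precomputed difference to an existing checksum field."""
    return ~fold((~check & 0xFFFF) + diff) & 0xFFFF


# Rewrite the upper 64 bits of a destination address and patch the checksum.
old_dst = ipaddress.IPv6Address("3333:0:0:1:5555:0:2:0").packed
new_dst = ipaddress.IPv6Address("2001:0:0:2:5555:0:2:0").packed
payload = b"ILA payload!"  # arbitrary even-length stand-in for the rest of the data

old_check = checksum(old_dst + payload)
new_check = replace_by_diff(old_check, csum_diff(old_dst[:8], new_dst[:8]))
```

The patched value matches a full recomputation over the rewritten bytes, which is why only the eight changed bytes need to be summed.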
    • net: Change pseudohdr argument of inet_proto_csum_replace* to be a bool · 4b048d6d
      Tom Herbert committed
      The inet_proto_csum_replace{4,2,16} functions take a pseudohdr argument that
      indicates whether the checksum field carries a pseudo header. This argument
      should be a bool instead of an int.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • lwt: Add support to redirect dst.input · 25368623
      Tom Herbert committed
      This patch adds the capability to redirect dst input in the same way
      that dst output is redirected by LWT.
      
      Also, save the original dst.input and dst.output when setting up
      lwtunnel redirection. These can be called by the client as a
      pass-through.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
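The redirection pattern described in this commit, where handlers are replaced while the originals are kept for pass-through, can be illustrated with a toy Python model. Nothing here mirrors the kernel structures: `Dst`, `redirect`, and the hook signature are assumptions made purely for illustration.

```python
class Dst:
    """Toy stand-in for a route entry with input and output handlers."""

    def __init__(self, input_fn, output_fn):
        self.input = input_fn
        self.output = output_fn


def redirect(dst, input_hook=None, output_hook=None):
    """Install hooks on dst and hand each hook the original handler.

    Each hook receives (packet, original_handler) so it can do its own work
    and then call the saved original as a pass-through.
    """
    orig_input, orig_output = dst.input, dst.output
    if input_hook is not None:
        dst.input = lambda pkt: input_hook(pkt, orig_input)
    if output_hook is not None:
        dst.output = lambda pkt: output_hook(pkt, orig_output)
    return orig_input, orig_output


trace = []
dst = Dst(lambda pkt: trace.append(("input", pkt)),
          lambda pkt: trace.append(("output", pkt)))


def rewrite_then_pass(pkt, original):
    # Stand-in for a translation step followed by the original handler.
    return original(pkt.upper())


redirect(dst, input_hook=rewrite_then_pass, output_hook=rewrite_then_pass)
dst.input("rx")
dst.output("tx")
```

After the two calls, `trace` shows that both directions went through the hook first and then reached the saved original handlers.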
    • enic: Fix sparse warning in vnic_devcmd_init(). · f376d4ad
      David S. Miller committed
      >> drivers/net/ethernet/cisco/enic/vnic_dev.c:1095:13: sparse: incorrect type in assignment (different address spaces)
         drivers/net/ethernet/cisco/enic/vnic_dev.c:1095:13:    expected void *res
         drivers/net/ethernet/cisco/enic/vnic_dev.c:1095:13:    got void [noderef] <asn:2>*
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlx5e: Fix sparse warnings in mlx5e_handle_csum(). · ecf842f6
      David S. Miller committed
      >> drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:173:44: sparse: incorrect type in argument 1 (different base types)
         drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:173:44:    expected restricted __sum16 [usertype] n
         drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:173:44:    got restricted __be16 [usertype] check_sum
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: Move VRF table lookup to inlined function · dc028da5
      David Ahern committed
      Table lookup compiles out when VRF is not enabled.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Fix docbook warning for IFF_VRF_MASTER enum · 808d28c4
      David Ahern committed
      kbuild test robot reported:
      tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
      head:   d52736e2
      commit: 4e3c8992 [751/762] net: Introduce VRF related flags and helpers
      reproduce: make htmldocs
      
      >> Warning(include/linux/netdevice.h:1293): Enum value 'IFF_VRF_MASTER' not described in enum 'netdev_priv_flags'
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Updates to netif_index_is_vrf · 2f52bdcf
      David Ahern committed
      As Eric noted, netif_index_is_vrf is not called with rcu_read_lock held,
      so wrap the dev_get_by_index_rcu in rcu_read_lock and unlock.
      
      If VRF is not enabled or oif is 0, skip the device lookup. In both cases
      the index cannot be the VRF master.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlx5e-next' · 9cd3778c
      David S. Miller committed
      Achiad Shochat says:
      
      ====================
      Driver updates 16-Aug-2015
      
      This patchset contains bug fixes, new RSS and pause parameters ethtool
      options, and support for RX CHECKSUM_COMPLETE.
      
      Patchset was applied and tested over commit adc6310c ("Merge branch
      'mv88e6xxx-switchdev-fdb'").
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Support RX CHECKSUM_COMPLETE · bbceefce
      Achiad Shochat committed
      Only for packets with first ethertype set to IPv4/6 for now.
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Support ethtool get/set_pauseparam · 3c2d18ef
      Achiad Shochat committed
      Only rx/tx pause settings.
      Autoneg setting is currently not supported.
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Ethtool link speed setting fixes · 6fa1bcab
      Achiad Shochat committed
      - Port speed settings are applied by the device only upon
        port admin status transition from DOWN to UP.
        So we enforce this transition regardless of the port's
        current operational state (which may occasionally be DOWN if,
        for example, the network cable is disconnected).
      - Fix the PORT_UP/DOWN device interface enum
      - Set the local_port bit in the device PAOS register
      - EXPORT the PAOS (Port Administrative and Operational Status)
        register set/query access functions.
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: HW LRO changes/fixes · d9a40271
      Achiad Shochat committed
      - Change the maximum LRO session size from 16KB to 64KB
      - Reduce the LRO session timeout from 512us to 32us in
        order to reduce the TCP latency of non-LRO'ed flows.
      - Fix skb_shinfo(skb)->gso_size and set skb_shinfo(skb)->gso_type.
      - Fix a bug accessing un-initialized mdev pointer.
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Support smaller RX/TX ring sizes · e842b100
      Achiad Shochat committed
      We unintentionally limited the minimum ring size too much.
      
      TX minimum ring size reduced from 128 to 64.
      RX minimum ring size reduced from 128 to 2.
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Add ethtool RSS configuration options · 2d75b2bc
      Achiad Shochat committed
      - get_rxfh_key_size
      - get_rxfh_indir_size
      - get/set_rxfh indirection table and RSS Toeplitz hash key
      - get_rxnfc
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Make RSS indirection table size a constant · 936896e9
      Achiad Shochat committed
      The indirection table size was defined by a variable that
      was actually assigned a constant value.
      Since we have no foreseeable intention to make it configurable,
      we simply made it a constant.
      
      We also limit the number of channels such that the RSS indirection
      table could always populate all RX rings.
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
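The relationship between a fixed-size indirection table and the channel limit can be sketched as below. The table size of 128 and the round-robin fill are assumptions chosen for illustration, since the commit text states neither; `clamp_channels` and `build_indirection_table` are hypothetical names.

```python
INDIR_TABLE_SIZE = 128  # hypothetical constant; the actual value is not given here


def clamp_channels(requested):
    """Cap the channel count so every RX ring can appear in the table."""
    if requested < 1:
        raise ValueError("at least one channel is required")
    return min(requested, INDIR_TABLE_SIZE)


def build_indirection_table(requested_channels):
    """Fill the fixed-size table with ring indices in round-robin order."""
    channels = clamp_channels(requested_channels)
    return [slot % channels for slot in range(INDIR_TABLE_SIZE)]


table_8 = build_indirection_table(8)       # 8 rings, each used 16 times
table_big = build_indirection_table(500)   # request exceeds the table size
```

With the cap in place, every ring index that exists is referenced by at least one table slot, so no RX ring is left without traffic.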
    • net/mlx5e: Have a single RSS Toeplitz hash key · 57afead5
      Achiad Shochat committed
      There is no need to generate a unique key per TIR.
      Generate a single key per netdev and copy it to all
      its TIRs.
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'for-upstream' of... · 0aa65cc0
      David S. Miller committed
      Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
      
      Johan Hedberg says:
      
      ====================
      pull request: bluetooth-next 2015-08-16
      
      Here's what's likely the last bluetooth-next pull request for 4.3:
      
       - 6lowpan/802.15.4 refactoring, cleanups & fixes
       - Document 6lowpan netdev usage in Documentation/networking/6lowpan.txt
       - Support for UART based QCA Bluetooth controllers
       - Power management support for Broadcom Bluetooth controllers
       - Change LE connection initiation to always use passive scanning first
       - Support for new Silicon Wave USB ID
      
      Please let me know if there are any issues pulling. Thanks.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'enic-devcmd2' · 863960b4
      David S. Miller committed
      Govindarajulu Varadarajan says:
      
      ====================
      enic: add devcmd2
      
      This series adds new devcmd2 support. The first two patches are code
      refactoring.
      
      devcmd is an interface for the driver to communicate with the fw/adaptor. It
      involves writing data to hardware registers and waiting for the result.
      This mechanism does not scale well. The queuing of "no wait" devcmds is
      done in firmware memory rather than on the host. Firmware memory is a
      rather more scarce and valuable resource than host memory. A devcmd storm
      from one vf can disrupt the service on other pf/vf. The lack of flow
      control allows for possible denial of service from one VM to another.
      Devcmd2 uses work queue to post the devcmds, just like tx work queue. This
      allows better flow control.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • enic: add devcmd2 · 373fb087
      Govindarajulu Varadarajan committed
      devcmd is an interface for the driver to communicate with the fw/adaptor. It
      involves writing data to hardware registers and waiting for the result.
      This mechanism does not scale well. The queuing of "no wait" devcmds is
      done in firmware memory rather than on the host. Firmware memory is a
      rather more scarce and valuable resource than host memory. A devcmd storm
      from one vf can disrupt the service on other pf/vf. The lack of flow
      control allows for possible denial of service from one VM to another.
      
      Devcmd2 uses work queue to post the devcmds, just like tx work queue. This
      allows better flow control.
      
      Initialize devcmd2; if that fails, we fall back to devcmd1.
      
      Also change the driver version.
      Signed-off-by: N V V Satyanarayana Reddy <nalreddy@cisco.com>
      Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
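The flow-control benefit of posting commands through a bounded work queue can be illustrated with a toy Python model. This is not the enic implementation: `DevcmdWorkQueue`, its depth, and the `post`/`complete` methods are hypothetical, and the point is only that a full queue pushes back on the poster instead of consuming unbounded memory on the other side.

```python
from collections import deque


class DevcmdWorkQueue:
    """Toy bounded queue: posting fails once `depth` commands are outstanding."""

    def __init__(self, depth):
        self.depth = depth
        self.pending = deque()

    def post(self, cmd):
        """Queue a command; return False if the queue is full (caller must back off)."""
        if len(self.pending) >= self.depth:
            return False
        self.pending.append(cmd)
        return True

    def complete(self):
        """Retire the oldest outstanding command, freeing one slot."""
        return self.pending.popleft() if self.pending else None


wq = DevcmdWorkQueue(depth=2)
results = [wq.post("cmd-a"), wq.post("cmd-b"), wq.post("cmd-c")]
finished = wq.complete()
retry = wq.post("cmd-c")
```

The third post is refused until a completion frees a slot, so a burst from one poster is bounded by its own queue depth.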