1. 02 Mar, 2016 23 commits
    • J
      Introduce devlink infrastructure · bfcd3a46
      Jiri Pirko committed
      Introduce devlink infrastructure for drivers to register and expose to
      userspace via generic Netlink interface.
      
      There are two basic objects defined:
      devlink - one instance for every "parent device", for example switch ASIC
      devlink port - one instance for every physical port of the device.
      
      This initial portion implements basic get/dump of objects to userspace.
      Also, port splitter and port type setting are implemented.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      Merge branch 'tc-sw-only' · bd070e21
      David S. Miller committed
      John Fastabend says:
      
      ====================
      tc software only
      
      This adds a software only flag to tc but incorporates a bunch of comments
      from the original attempt at this.
      
      First, instead of having the offload decision logic be embedded in cls_u32,
      I lifted it into pkt_cls.h so it can be used anywhere and named the flag
      TCA_CLS_FLAGS_SKIP_HW (Thanks Jiri ;)
      
      In order to do this I put the flag defines in pkt_cls.h as well. However
      it was suggested that perhaps these flags could be lifted into the
      upper layer of TCA_ as well but I'm afraid this can not be done with
      existing tc design as far as I can tell. The problem is the filters are
      packed and unpacked in the classifier specific code and pushing the flags
      through the high level doesn't seem easily doable. And we already have
      this design where classifiers handle generic options such as actions and
      policers. So I think adding one more thing here is OK as 'tc', et al.
      already know how to handle this type of thing.
      ====================
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      net: sched: cls_u32 add bit to specify software only rules · 9e8ce79c
      John Fastabend committed
      In the initial implementation the only way to stop a rule from being
      inserted into the hardware table was via the device feature flag.
      However, this doesn't work well when working on an end host system
      where packets are expected to hit both the hardware and software
      datapaths.
      
      For example we can imagine a rule that will match an IP address and
      increment a field. If we install this rule in both hardware and
      software we may increment the field twice. To date we have only
      added support for the drop action so we have been able to ignore
      these cases. But as we extend the action support we will hit this
      example plus more such cases. Arguably these are not even corner
      cases; in many working systems they will be common.
      
      To avoid forcing the driver to always abort (i.e. the above example)
      this patch adds a flag to add a rule in software only. A careful
      user can use this flag to build software and hardware datapaths
      that work together. One example we have found particularly useful
      is to use hardware resources to set the skb->mark on the skb when
      the match may be expensive to run in software but a mark lookup
      in a hash table is cheap. The idea here is hardware can do in one
      lookup what the u32 classifier may need to traverse multiple lists
      and hash tables to compute. The flag is only passed down on inserts.
      On deletion to avoid stale references in hardware we always try
      to remove a rule if it exists.
      
      The flags field is part of the classifier specific options. Although
      it is tempting to lift this into the generic structure, doing this
      proves difficult due to how the tc netlink attributes are implemented
      along with how the dump/change routines are called. There is also
      precedent for putting seemingly generic pieces in the specific
      classifier options such as TCA_U32_POLICE, TCA_U32_ACT, etc. So
      although not ideal I've left FLAGS in the u32 options as well, as it
      simplifies the code greatly and user space has already learned how
      to manage these bits a la the 'tc' tool.
      
      Another thing: when updating a rule we require the flags to
      be unchanged. This is to force user space, software u32 and
      the hardware u32 to keep in sync. Thanks to Simon Horman for
      catching this case.
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      net: cls_u32: move TC offload feature bit into cls_u32 offload logic · 2b6ab0d3
      John Fastabend committed
      In the original series drivers would get offload requests for cls_u32
      rules even if the feature bit is disabled. This meant the driver had
      to do a boilerplate check on the feature bit before adding/deleting
      the rule.
      
      This patch lifts the check into the core code and removes it from the
      driver specific case.
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      net: sched: consolidate offload decision in cls_u32 · 6843e7a2
      John Fastabend committed
      The offload decision was originally very basic and tied to whether the dev
      implemented the appropriate ndo op hook. The next step is to allow
      the user to more flexibly define whether any particular rule should be
      offloaded or not. In order to have this logic in one function, lift
      the current check into a helper routine tc_should_offload().
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
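The three cls_u32 commits above describe one decision point: a rule goes to hardware only if the device feature bit is set, the rule is not marked software only, and the driver provides the offload hook. A minimal userspace sketch of that logic follows. The struct, the `sketch_` names and the bit values are stand-ins invented for illustration and are not the kernel's definitions; only the idea of a single helper and the "flags must not change on update" rule come from the commit messages.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for kernel definitions; the bit values are illustrative only. */
#define SKETCH_FEATURE_HW_TC (1u << 0) /* models the TC offload feature bit */
#define SKETCH_FLAG_SKIP_HW  (1u << 0) /* models TCA_CLS_FLAGS_SKIP_HW */

/* Hypothetical, simplified view of a network device. */
struct sketch_dev {
    uint32_t features;      /* enabled feature bits */
    bool has_setup_tc_hook; /* driver implements the offload hook */
};

/* Single helper that decides whether a rule is pushed to hardware. */
static bool sketch_should_offload(const struct sketch_dev *dev,
                                  uint32_t rule_flags)
{
    if (!(dev->features & SKETCH_FEATURE_HW_TC))
        return false; /* feature bit disabled: core skips the driver */
    if (rule_flags & SKETCH_FLAG_SKIP_HW)
        return false; /* user asked for a software only rule */
    return dev->has_setup_tc_hook;
}

/* An update is accepted only if the rule keeps its original flags. */
static bool sketch_update_allowed(uint32_t old_flags, uint32_t new_flags)
{
    return old_flags == new_flags;
}
```

With this shape, a driver never sees a request for a rule it should not install, which is the boilerplate check the second commit removes from driver code.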
    • D
      Merge branch 'ndo_set_rx_headroom' · d2e42a17
      David S. Miller committed
      Paolo Abeni says:
      
      ====================
      bridge/ovs: avoid skb head copy on frame forwarding
      
      Currently, when an OVS or Linux bridge is used to forward frames towards
      some tunnel device, a skb_head_copy() may occur if the ingress device does not
      provide enough headroom for the tx encapsulation.
      
      This patch series tries to address the issue by implementing a new ndo operation to
      allow the master device to control the headroom used when allocating the skb on
      frame reception.
      
      Said operation is used by the Linux bridge to notify the bridged ports of
      needed_headroom changes, and similar bookkeeping and behaviour is also added to
      openvswitch, on a per datapath basis.
      
      Finally, the operation is implemented for veth and tun devices, which gives a
      performance improvement in the 6-12% range when forwarding frames from said
      devices towards a vxlan tunnel.
      
      v2:
      - fix netdev_get_fwd_headroom() behaviour
      - remove some code duplication with the netdev_set_rx_headroom() and
         netdev_reset_rx_headroom() helpers
      - handle headroom reset on [v]port removal/deletion
      - initialize tun align to the old default value
      
      v3:
      - fix a comment typo
      ====================
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • P
      veth: implement ndo_set_rx_headroom · 163e5292
      Paolo Abeni committed
      The rx headroom for veth dev is the peer device needed_headroom.
      Avoid ping-pong updates setting the private flag IFF_PHONY_HEADROOM.
      
      This avoids skb head reallocation when forwarding from a veth dev
      towards a device adding some kind of encapsulation.
      
      When transmitting frames below the MTU size towards a vxlan device,
      this gives about 10% performance speed-up when OVS is used to connect
      the veth and the vxlan device and a little more when using a
      plain Linux bridge.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • P
      net/tun: implement ndo_set_rx_headroom · eaea34b2
      Paolo Abeni committed
      ndo_set_rx_headroom controls the align value used by tun devices to
      allocate skbs on frame reception.
      When the xmit device adds a large encapsulation, this avoids an skb
      head reallocation on forwarding.
      
      The measured improvement when forwarding towards a vxlan dev with
      frame size below the egress device MTU is as follows:
      
      vxlan over ipv6, bridged: +6%
      vxlan over ipv6, ovs: +7%
      
      In case of ipv4 tunnels there is no improvement, since the tun
      device default alignment provides enough headroom to avoid the skb
      head reallocation.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • P
      ovs: propagate per dp max headroom to all vports · 3a927bc7
      Paolo Abeni committed
      This patch implements bookkeeping support to compute the maximum
      headroom for all the devices in each datapath. When said value
      changes, the underlying devs are notified via the
      ndo_set_rx_headroom method.
      
      This also increases the internal vports xmit performance.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • P
      bridge: notify enslaved devices of headroom changes · 45493d47
      Paolo Abeni committed
      On bridge needed_headroom changes, the enslaved devices are
      notified via the ndo_set_rx_headroom method.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • P
      netdev: introduce ndo_set_rx_headroom · 871b642a
      Paolo Abeni committed
      This method allows the controlling device (i.e. the bridge) to specify
      additional headroom to be allocated for skb head on frame reception.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
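The headroom series above relies on simple bookkeeping: the controlling device tracks the largest needed_headroom among its ports and, when that maximum changes, tells every port how much headroom to reserve on receive. The sketch below models that idea in plain C. The `sketch_port` struct and the function names are hypothetical; the real callback is ndo_set_rx_headroom and its exact signature is not shown in this log.

```c
#include <stddef.h>

/* Hypothetical, simplified port attached to a bridge or OVS datapath. */
struct sketch_port {
    unsigned int needed_headroom; /* headroom this port needs on transmit */
    unsigned int rx_headroom;     /* headroom reserved when receiving */
};

/* Models the per-port callback that the controlling device invokes. */
static void sketch_set_rx_headroom(struct sketch_port *port, unsigned int room)
{
    port->rx_headroom = room;
}

/*
 * Recompute the maximum needed headroom over all ports and notify the
 * ports only when the maximum differs from the previous value.
 * Returns the new maximum so the caller can store it.
 */
static unsigned int sketch_update_headroom(struct sketch_port *ports, size_t n,
                                           unsigned int current_max)
{
    unsigned int max = 0;
    size_t i;

    for (i = 0; i < n; i++)
        if (ports[i].needed_headroom > max)
            max = ports[i].needed_headroom;

    if (max != current_max)
        for (i = 0; i < n; i++)
            sketch_set_rx_headroom(&ports[i], max);

    return max;
}
```

Because every receiving port reserves the maximum, a frame forwarded to the port with the largest encapsulation already has room for the extra headers, which is what avoids the skb head reallocation.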
    • D
      Merge branch 'bnxt_en-next' · 46d5efa9
      David S. Miller committed
      Michael Chan says:
      
      ====================
      bnxt_en: updates for net-next.
      
      Miscellaneous updates covering SRIOV, IRQ coalescing, firmware logging and
      package version for net-next.  Thanks.
      
      v2: Updated description and added more comments for patch 1.  Fixed
      function parameters formatting for patch 4.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • M
      bnxt_en: Add hwrm_send_message_silent(). · 90e20921
      Michael Chan committed
      This is used to send NVM_FIND_DIR_ENTRY messages which can return an error
      if the entry is not found.  This is normal and the error message will
      cause unnecessary alarm, so silence it.
      Signed-off-by: Michael Chan <mchan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • M
      bnxt_en: Refactor _hwrm_send_message(). · fbfbc485
      Michael Chan committed
      Add a new function bnxt_do_send_msg() to do essentially the same thing
      with an additional parameter to silence error response messages.  All
      current callers will set silent to false.
      Signed-off-by: Michael Chan <mchan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • R
      bnxt_en: Add installed-package firmware version reporting via Ethtool GDRVINFO · 3ebf6f0a
      Rob Swindell committed
      For everything to fit, we remove the PHY microcode version and replace it
      with the firmware package version in the fw_version string.
      Signed-off-by: Rob Swindell <swindell@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • M
      bnxt_en: Fix dmesg log firmware error messages. · a8643e16
      Michael Chan committed
      Use appropriate firmware request header structure to prepare the
      firmware messages.  This avoids the unnecessary conversion of the
      fields to 32-bit fields.  Add appropriate endian conversion when
      printing out the message fields in dmesg so that they appear correct
      in the log.
      Reported-by: Rob Swindell <swindell@broadcom.com>
      Signed-off-by: Michael Chan <mchan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • M
      bnxt_en: Use firmware provided message timeout value. · ff4fe81d
      Michael Chan committed
      Before this patch, we used a hardcoded value of 500 msec as the default
      value for firmware message response timeout.  For better portability with
      future hardware or debug platforms, use the value provided by firmware in
      the first response and store it for all subsequent messages.  Redefine the
      macro HWRM_CMD_TIMEOUT to the stored value.  Since we don't have the
      value yet in the first message, use the 500 ms default if the stored value
      is zero.
      Signed-off-by: Michael Chan <mchan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
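The timeout change above amounts to a stored value with a fallback: use the firmware-provided timeout once it is known, and the 500 ms default while the stored value is still zero. A minimal sketch of that pattern follows. The struct, field and function names are hypothetical; only the 500 ms default and the zero check come from the commit message.

```c
/* Default used before firmware has reported a timeout (from the commit text). */
#define SKETCH_DFLT_TIMEOUT_MS 500u

/* Hypothetical slice of the driver's private state. */
struct sketch_bnxt {
    unsigned int cmd_timeout_ms; /* 0 until firmware reports a value */
};

/* Models what a redefined HWRM_CMD_TIMEOUT would evaluate to. */
static unsigned int sketch_cmd_timeout(const struct sketch_bnxt *bp)
{
    return bp->cmd_timeout_ms ? bp->cmd_timeout_ms : SKETCH_DFLT_TIMEOUT_MS;
}

/* Called once the first firmware response has been parsed. */
static void sketch_store_fw_timeout(struct sketch_bnxt *bp, unsigned int fw_ms)
{
    bp->cmd_timeout_ms = fw_ms;
}
```

The first message is sent with the default, and every later message picks up whatever the firmware reported.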
    • M
      bnxt_en: Add coalescing support for tx rings. · dfc9c94a
      Michael Chan committed
      When tx and rx rings don't share the same completion ring, tx coalescing
      parameters can be set differently from the rx coalescing parameters.
      Otherwise, use rx coalescing parameters on shared completion rings.
      
      Adjust rx coalescing default values to lower interrupt rate.
      Signed-off-by: Michael Chan <mchan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • M
      bnxt_en: Refactor bnxt_hwrm_set_coal(). · bb053f52
      Michael Chan committed
      Add a function to set all the coalescing parameters.  The function can
      be used later to set both rx and tx coalescing parameters.
      
      v2: Fixed function parameters formatting requested by DaveM.
      Signed-off-by: Michael Chan <mchan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • M
      bnxt_en: Store irq coalescing timer values in micro seconds. · dfb5b894
      Michael Chan committed
      Don't convert these to internal hardware tick values before storing
      them.  This avoids the confusion of ethtool -c returning slightly
      different values than the ones set using ethtool -C when we convert
      hardware tick values back to micro seconds.  Add better comments for
      the hardware settings.
      
      Also, rename the current set of coalescing fields with rx_ prefix.
      The next patch will add support of tx coalescing values.
      Signed-off-by: Michael Chan <mchan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
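The reason for storing microseconds rather than hardware ticks is integer rounding: converting to ticks and back can lose a microsecond, so ethtool -c would report a value slightly different from the one set with ethtool -C. The sketch below illustrates the effect. The 80 ns tick length and all names are assumptions chosen for the example, not values taken from this log.

```c
/* Assumed tick length for illustration only. */
#define SKETCH_TICK_NS 80u

/* Hypothetical coalescing state: the user-visible value is kept as set. */
struct sketch_coal {
    unsigned int rx_coal_usecs;
};

/* Convert microseconds to hardware ticks, truncating like integer math does. */
static unsigned int sketch_usec_to_ticks(unsigned int usec)
{
    return usec * 1000u / SKETCH_TICK_NS;
}

/* Convert hardware ticks back to microseconds, again truncating. */
static unsigned int sketch_ticks_to_usec(unsigned int ticks)
{
    return ticks * SKETCH_TICK_NS / 1000u;
}

/* Set path: remember exactly what the user asked for. */
static void sketch_set_coalesce(struct sketch_coal *c, unsigned int usec)
{
    c->rx_coal_usecs = usec;
}

/* Get path: report the stored value, no conversion involved. */
static unsigned int sketch_get_coalesce(const struct sketch_coal *c)
{
    return c->rx_coal_usecs;
}

/* Conversion happens only when the value is programmed into hardware. */
static unsigned int sketch_hw_ticks(const struct sketch_coal *c)
{
    return sketch_usec_to_ticks(c->rx_coal_usecs);
}
```

Under these assumptions, 25 microseconds becomes 312 ticks, and 312 ticks converts back to 24 microseconds. Keeping the original 25 in the stored field sidesteps that mismatch.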
    • J
      bnxt_en: Send PF driver unload notification to all VFs. · 19241368
      Jeffrey Huang committed
      During remove_one() when SRIOV is enabled, the PF driver
      should broadcast PF driver unload notification to all
      VFs that are attached to VMs. Upon receiving the PF
      driver unload notification, the VF driver should print
      a warning message to the message log.  Certain operations on the
      VF may not succeed after the PF has unloaded.
      Signed-off-by: Jeffrey Huang <huangjw@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      bnxt_en: Improve bnxt_vf_update_mac(). · 3874d6a8
      Jeffrey Huang committed
      Allow the VF to set up its own MAC address if the PF has not administratively
      set it for the VF.  To do that, we should always store the MAC address
      from the firmware.  There are 2 cases:
      
      1. The MAC address is valid.  This MAC address is assigned by the PF and
      it needs to override the current VF MAC address.
      
      2. The MAC address is zero.  The VF will use a random MAC address by default.
      By storing this 0 MAC address in the VF structure, it will allow the VF
      user to change the MAC address later using ndo_set_mac_address() when
      it sees that the stored MAC address is 0.
      
      v2: Expanded descriptions and added more comments.
      Signed-off-by: Jeffrey Huang <huangjw@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
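The two MAC address cases above can be sketched as a small state update: always store what firmware reports, override the VF address when the PF assigned one, and allow a later user change only while the stored firmware address is zero. The code below is a simplified model. All names are hypothetical, and "valid" is reduced to "not all zero", which is weaker than the kernel's real address validity check.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETH_ALEN 6

/* Hypothetical VF state. */
struct sketch_vf {
    uint8_t fw_mac[SKETCH_ETH_ALEN];  /* address reported by firmware, stored as-is */
    uint8_t cur_mac[SKETCH_ETH_ALEN]; /* address the VF currently uses */
};

static bool sketch_mac_is_zero(const uint8_t *mac)
{
    static const uint8_t zero[SKETCH_ETH_ALEN];

    return memcmp(mac, zero, SKETCH_ETH_ALEN) == 0;
}

/* Always store the firmware address; a PF-assigned one overrides the VF's. */
static void sketch_vf_update_mac(struct sketch_vf *vf, const uint8_t *fw_mac)
{
    memcpy(vf->fw_mac, fw_mac, SKETCH_ETH_ALEN);
    if (!sketch_mac_is_zero(fw_mac))
        memcpy(vf->cur_mac, fw_mac, SKETCH_ETH_ALEN);
}

/* Models the set-MAC path: refused when the PF has assigned an address. */
static bool sketch_vf_set_mac(struct sketch_vf *vf, const uint8_t *mac)
{
    if (!sketch_mac_is_zero(vf->fw_mac))
        return false;
    memcpy(vf->cur_mac, mac, SKETCH_ETH_ALEN);
    return true;
}
```

Storing the zero address is what lets the set path distinguish "no administrative assignment" from "PF has assigned an address".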
    • D
      Merge tag 'linux-can-next-for-4.6-20160226' of... · 0c92c949
      David S. Miller committed
      Merge tag 'linux-can-next-for-4.6-20160226' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can-next 2016-02-26
      
      This is a pull request of 3 patches for net-next/master.
      
      There are two patches by Simon Horman, in which the device tree support
      for the rcar_can driver is improved. One patch by me fixes the bad
      coding style of the ems_usb driver which was introduced recently.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 01 Mar, 2016 11 commits
  3. 29 Feb, 2016 1 commit
  4. 28 Feb, 2016 1 commit
  5. 27 Feb, 2016 4 commits
    • D
      Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · e1bae75d
      David S. Miller committed
      Jeff Kirsher says:
      
      ====================
      1GbE Intel Wired LAN Driver Updates 2016-02-24
      
      This series contains updates to e1000e, igb and igbvf.
      
      Raanan provides updates for e1000e, first increases the ULP timer since it
      now takes longer for the ULP exit to complete on Skylake.  Fixes the
      configuration of the internal hardware PHY clock gating mechanism, which was
      causing packet loss due to misconfiguration.  Fixed additional ULP
      configuration settings which were not being properly cleared after cable
      connect in V-Pro capable systems.  Added support for more i219 devices.
      
      Takuma Ueba provides a fix for I210 where the IPv6 autoconf test sometimes
      fails because the DAD NS for link-local is not transmitted.  To avoid this
      issue, we need to wait until the 1000BASE-T status register reports "Remote
      receiver status OK".
      
      Todd provides a patch to override EEPROM WoL settings for specific OEM
      devices. Then renamed igb defines to be more generic, since the define
      E1000_MRQC_ENABLE_RSS_4Q enables 4 and 8 queues depending on the part.
      
      Roland Hii fixes an issue where only the half cycle time of less than or
      equal to 70 millisecond uses the I210 clock output function.  His patch
      adds additional conditions when half cycle time is equal to 125 or 250 or
      500 millisecond to use the clock output function.
      
      Alex Duyck adds support for generic transmit checksums for igb and igbvf.
      
      Jon Maxwell fixes an issue where customer applications are registering
      and un-registering multicast addresses every few seconds which is leading
      to many "Link is up" messages in the logs as a result of the
      netif_carrier_off(netdev) in igbvf_msix_other().  So remove the
      link is up message when registering multicast addresses.
      
      Corinna Vinschen provides a fix for when switching off VLAN offloading on
      i350, the VLAN interface becomes unusable.
      
      Stefan Assmann updates the driver to use ndo_stop() instead of
      dev_close() when running ethtool offline self test, since dev_close()
      causes IFF_UP to be cleared, which will remove the interface's routes
      and some addresses.
      
      v2: Dropped patches 6-10 in the original series.  Patch 6-7 added support
          for character device for AVB and based on community feedback, we do not
          want to do this.  Patches 8-10 provided fixes to the problematic code
          added in patches 6 & 7.  So all of them must go!
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • A
      GSO: Provide software checksum of tunneled UDP fragmentation offload · 22463876
      Alexander Duyck committed
      On reviewing the code I realized that GRE and UDP tunnels could cause a
      kernel panic if we used GSO to segment a large UDP frame that was sent
      through the tunnel with an outer checksum and hardware offloads were not
      available.
      
      In order to correct this we need to update the feature flags that are
      passed to the skb_segment function so that in the event of UDP
      fragmentation being requested for the inner header the segmentation
      function will correctly generate the checksum for the payload if we cannot
      segment the outer header.
      Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      Merge branch 'vrf-saddr-selection' · c3f463ba
      David S. Miller committed
      David Ahern says:
      
      ====================
      net: l3mdev: Fix source address for unnumbered deployments
      
      David Lamparter noted a use case where the source address selection fails
      to pick an address from a VRF interface - unnumbered interfaces. The use
      case has the VRF device as the VRF local loopback with addresses and
      interfaces enslaved without an address themselves, e.g.,
      
          ip addr add 9.9.9.9/32 dev lo
          ip link set lo up
      
          ip link add name vrf0 type vrf table 101
          ip rule add oif vrf0 table 101
          ip rule add iif vrf0 table 101
          ip link set vrf0 up
          ip addr add 10.0.0.3/32 dev vrf0
      
          ip link add name dummy2 type dummy
          ip link set dummy2 master vrf0 up
      
          --> note dummy2 has no address - unnumbered device
      
          ip route add 10.2.2.2/32 dev dummy2 table 101
          ip neigh add 10.2.2.2 dev dummy2 lladdr 02:00:00:00:00:02
      
      A ping to 10.2.2.2 through the L3 domain:
      
          $ ping -I vrf0 -c1 10.2.2.2
          ping: Warning: source address might be selected on device other than vrf0.
          PING 10.2.2.2 (10.2.2.2) from 9.9.9.9 vrf0: 56(84) bytes of data.
      
      picks up the wrong address -- the one from 'lo' not vrf0. And from tcpdump:
          12:57:29.449128 IP 9.9.9.9 > 10.2.2.2: ICMP echo request, id 2491, seq 1, length 64
      
      This patch series changes address selection to only consider devices in
      the same L3 domain and to use the VRF device as the L3 domain's loopback.
      
          $ ping -I vrf0 -c1 10.2.2.2
          PING 10.2.2.2 (10.2.2.2) from 10.0.0.3 vrf0: 56(84) bytes of data.
      
      From tcpdump:
          12:59:25.096426 IP 10.0.0.3 > 10.2.2.2: ICMP echo request, id 2113, seq 1, length 64
      
      Now the source address comes from vrf0.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      net: l3mdev: prefer VRF master for source address selection · 17b693cd
      David Lamparter committed
      When selecting an address in context of a VRF, the vrf master should be
      preferred for address selection.  If it isn't, the user has a hard time
      getting the system to select to their preference - the code will pick
      the address off the first in-VRF interface it can find, which on a
      router could well be a non-routable address.
      Signed-off-by: David Lamparter <equinox@diac24.net>
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      [dsa: Fixed comment style and removed extra blank line ]
      Signed-off-by: David S. Miller <davem@davemloft.net>
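The preference described above, and in the vrf-saddr-selection cover letter, can be modelled as a two-pass search: first look at the VRF master itself, then fall back to interfaces enslaved to it, and never consider devices outside the L3 domain. The sketch below is a userspace illustration with hypothetical types and names. It does not reproduce the kernel's address selection code, and it ignores scopes and secondary addresses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified interface record. */
struct sketch_if {
    int ifindex;   /* interface index */
    int master;    /* ifindex of the VRF master, 0 if not enslaved */
    uint32_t addr; /* IPv4 address in host byte order, 0 if unnumbered */
};

/*
 * Pick a source address for traffic in the VRF identified by vrf_ifindex.
 * Returns 0 when no device in the L3 domain has an address.
 */
static uint32_t sketch_select_saddr(const struct sketch_if *ifs, size_t n,
                                    int vrf_ifindex)
{
    size_t i;

    /* First preference: an address on the VRF master itself. */
    for (i = 0; i < n; i++)
        if (ifs[i].ifindex == vrf_ifindex && ifs[i].addr)
            return ifs[i].addr;

    /* Fallback: the first numbered interface enslaved to the VRF. */
    for (i = 0; i < n; i++)
        if (ifs[i].master == vrf_ifindex && ifs[i].addr)
            return ifs[i].addr;

    return 0;
}
```

Applied to the example topology (lo with 9.9.9.9, vrf0 with 10.0.0.3, unnumbered dummy2 enslaved to vrf0), the search returns 10.0.0.3 and never looks at lo, since lo is outside the L3 domain.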