1. 08 Jul, 2014 40 commits
    • L
      amd-xgbe: Clear the proper MTL interrupt register · 91f87345
      Lendacky, Thomas committed
      When initializing the MTL interrupts the interrupt status
      register is written to instead of the interrupt enable register.
      Since no MTL interrupts are being enabled and the default state
      is for MTL interrupts to be disabled this did not cause a problem,
      but needs to be fixed to target the correct register.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      91f87345
    • L
      amd-xgbe: Fix debugfs compatibility change with kstrtouint · f3f128d4
      Lendacky, Thomas committed
      The initial change from sscanf to kstrtouint broke backward
      compatibility by using a base of "0" in the kstrtouint call.
      This allowed for entering decimal, hexadecimal or octal as
      input where previously the sscanf always interpreted the input
      as hexadecimal.  Additionally, -EIO was returned on error prior
      to this change and now the error value is whatever kstrtouint
      returns.

      Change the base value of the kstrtouint from 0 to 16 and return
      -EIO on error.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Reported-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f3f128d4
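The effect of the base argument can be sketched in userspace. This is a minimal illustration and not the driver's code: `parse_uint()` is a hypothetical stand-in built on `strtoul()` (the kernel's `kstrtouint()` is not available outside the kernel), and `parse_debugfs_value()` is an assumed helper name.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Userspace stand-in for kstrtouint(); not the kernel implementation. */
static int parse_uint(const char *s, unsigned int base, unsigned int *out)
{
	char *end;
	unsigned long v;

	errno = 0;
	v = strtoul(s, &end, (int)base);
	/* Reject empty input, trailing garbage and out-of-range values. */
	if (errno || end == s || *end != '\0' || v > UINT_MAX)
		return -EINVAL;
	*out = (unsigned int)v;
	return 0;
}

/* Hypothetical debugfs-style helper: input is always hexadecimal and
 * every parse failure is reported as -EIO, as the commit describes. */
static int parse_debugfs_value(const char *s, unsigned int *out)
{
	return parse_uint(s, 16, out) ? -EIO : 0;
}
```

With base 0 the string "10" parses as decimal ten, while with base 16 the same string parses as sixteen, which is the compatibility difference the commit message points out.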
    • R
      net: arcnet: Remove "#define bool int" · db55b62c
      Rasmus Villemoes committed
      The header file include/linux/arcdevice.h #defines bool to int, if
      bool is not already #defined. However, the files which use that header
      file seem to rely on that #define (unconditionally) being in effect:
      the prototypes for the functions arcrimi_reset, com20020_reset,
      com90io_reset, com90xx_reset (whose addresses are assigned to the
      hw.reset member of struct arcnet_local) use int explicitly.
      
      Moreover, that #define is an accident waiting to happen (scenario:
      inclusion of arcdevice.h followed by inclusion of some header which
      declares function prototypes using bool). Also, #include
      <linux/types.h> must appear before #include <linux/arcdevice.h> (the
      compiler wouldn't like "typedef _Bool int").
      
      Since none of the files using arcdevice.h declare variables of type
      "bool", the patch is actually quite simple, unlike the commit message.
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      db55b62c
    • G
      enic: fix return values in enic_set_coalesce · a16a3361
      Govindarajulu Varadarajan committed
      enic_set_coalesce() has two problems.

      * It should return -EINVAL and not -EOPNOTSUPP for invalid coalesce values.

      * In case of MSIX, enic_set_coalesce returns an error after partially applying
        the requested coalescing settings. We should either apply all the requested
        settings and return success, or apply none and return an error.

      * This patch also simplifies the algorithm.
      
      This was introduced by
      '7c2ce6e6 enic: Add support for adaptive interrupt coalescing'
      
      These changes were suggested by Ben Hutchings here
      http://www.spinics.net/lists/netdev/msg283972.html
      
      Also change enic driver version.
      Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a16a3361
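The "apply all or apply none" rule from this commit can be sketched as a validate-then-commit function. The structure, the limit and the function name below are hypothetical and only illustrate the pattern; they are not taken from the enic driver.

```c
#include <errno.h>

#define COALESCE_USECS_MAX 1000u	/* hypothetical hardware limit */

struct coalesce_cfg {
	unsigned int tx_usecs;
	unsigned int rx_usecs;
};

/* Validate every requested value first; touch *cur only if all are valid. */
static int set_coalesce(struct coalesce_cfg *cur, const struct coalesce_cfg *req)
{
	if (req->tx_usecs > COALESCE_USECS_MAX ||
	    req->rx_usecs > COALESCE_USECS_MAX)
		return -EINVAL;	/* invalid value, nothing applied */

	*cur = *req;		/* all values valid: apply them together */
	return 0;
}
```

Because the checks run before any assignment, a rejected request leaves the current settings exactly as they were.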
    • J
      bonding: remove no longer relevant vlan warnings · e721f87d
      Jiri Pirko committed
      These warnings are no longer relevant. Even when last slave is
      removed, there is a valid address assigned to bond (random).
      The correct functionality of vlans is ensured by maintaining unicast
      list in vlan_sync_address().
      Suggested-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Acked-by: Veaceslav Falico <vfalico@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e721f87d
    • D
      Merge branch 'at86rf230-next' · 7cb9e6bf
      David S. Miller committed
      Alexander Aring says:
      
      ====================
      at86rf230: rework driver implementation
      
      This patch series includes a rework of the at86rf230 driver.

      There are several changes:

       - Add regmap support.
       - Merge at86rf212 operations with generic at86rf2xx operations; all chips
         support these operations.
       - Drop the irqworker. This is a workqueue which is scheduled by an irq to
         handle synchronous spi transfers. Use asynchronous spi handling instead,
         so that no scheduler is involved in irq handling.
       - Fix some bugs detected while receiving frames: the CRC can be correct while
         the 802.15.4 frame length is above 127 bytes. This would crash the whole
         kernel (but should be handled by the mac layer). Another bug is the handling
         of RX_SAFE_MODE, which protects the frame buffer until it has been read out.
         This is currently not working because we read out the buffer twice, the first
         time only to get the frame size. The solution is to always read out the whole
         frame buffer.
       - Add some timing-relevant things from the datasheet for state changes and from
         the IEEE 802.15.4 standard, like interframe spacing. Interframe spacing is
         needed to insert some receive time between frame transmissions. This should
         also be handled by the MAC layer, but for now it is added inside the driver
         layer as a workaround.
       - Add callback settings for chip-specific handling, instead of runtime decisions
         such as if (is_chip_type()). Callbacks are set only once at probe time.
       - We don't use a forced state change anymore. A forced state change aborts the
         reception of frames when we want to transmit a new frame. This should
         decrease the drop rate of packets.
       - And many other changes and bug fixes...
      
      changes since v3:
       - fix irq polarity in patch ("at86rf230: rework irq_pol setting").
      
      changes since v2:
       - add check if necessary functions are implemented when hw flags are set in patch
         ("mac802154: at86rf230: add hw flags and merge ops"). I chose the second variant.
       - remove unnecessary includes for workqueue and mutex in patch
         ("at86rf230: rework transmit and receive").
       - remove unnecessary cast in patch ("at86rf230: rework transmit and receive").
       - activate regmap cache with REGCACHE_RBTREE in patch
         ("at86rf230: add regmap support").
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7cb9e6bf
    • A
      at86rf230: add new author · 01ebd60b
      Alexander Aring committed
      Signed-off-by: Alexander Aring <alex.aring@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      01ebd60b
    • A
      at86rf230: add sleep cycle timing · 7a4ef918
      Alexander Aring committed
      Signed-off-by: Alexander Aring <alex.aring@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7a4ef918
    • A
      984e0c68
    • A
      09e536cd
    • A
      at86rf230: rework state change and start/stop · 2e0571c0
      Alexander Aring committed
      This patch removes the current synchronous state change function and adds a
      new function for a state assert. Change the start and stop callbacks to
      use this new synchronous state change behaviour. It's a wrapper around the
      async state change function.
      Signed-off-by: Alexander Aring <alex.aring@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2e0571c0
    • A
      at86rf230: rework irq_pol setting · 1db0558e
      Alexander Aring committed
      This patch reworks the irq_pol register setting to cover rising and falling
      interrupt settings only. The default behaviour should be the rising flag.

      Also use IRQ_TYPE_* defines instead of IRQF_* defines. There is no
      functional change, but irq_get_trigger_type returns IRQ_TYPE_* defines.
      Signed-off-by: Alexander Aring <alex.aring@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1db0558e
    • A
      at86rf230: move RX_SAFE_MODE setting to hw_init · 6bd2b132
      Alexander Aring committed
      There is no need to set this bit in the start callback, which could be
      called more than once.
      Signed-off-by: Alexander Aring <alex.aring@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6bd2b132
    • A
      at86rf230: rework transmit and receive handling · 1d15d6b5
      Alexander Aring committed
      This patch is a complete reimplementation of transmit and receive
      handling for the at86rf230 driver.

      It also solves six bugs:

      First:

      RX_SAFE_MODE is enabled, so the transceiver doesn't leave the
      receive state until the frame buffer has been read by a CMD_FB command.
      This is useful to read out the frame without getting into another receive
      or transmit state, otherwise the frame would be overwritten.
      The current driver issues two CMD_FB calls, and the first one already
      removes this protection.

      Second:

      Sometimes the CRC calculation is correct and the length field is greater
      than 127. The current mac802154 layer and the filter of an at86rf2xx don't
      check for this and the kernel crashes. In this case the frame is corrupted;
      we send the whole receive buffer to the next layer, which can be useful for
      sniffing.

      Third:

      There is an undocumented race condition. When we go into the
      RX_AACK_ON state, the transceiver could change into the RX_AACK_BUSY
      state. This is normal behaviour. In this case the transceiver received
      a SHR while the assert wasn't finished.

      Fourth:

      It also handles some more "correct" state changes. In ARET mode the
      transceiver needs to go to TX_ON before it goes into
      RX_AACK_ON.

      Fifth:

      The programming model [0] also describes error handling in ARET mode
      if the trac status is different from zero. This patch adds support
      for handling this.

      Sixth:

      In receive handling the trac status should also be read
      according to [0]. The driver could use the trac status for error
      statistics, but it doesn't do so currently. There may be some
      timing behaviour, or the read of this register may change some transceiver
      states.

      In addition the irqworker is removed. Instead we do async spi calls and
      no scheduling is involved anymore. The transmit function is also
      asynchronous but with wait_for_completion handling. The mac802154 layer
      doesn't support asynchronous transmit handling right now.
      
      The state change behaviour has now changed. Before, it was:

      1. assert while(!STATE_TRANSITION_IN_PROGRESS)
      2. state change
      3. assert while(!STATE_TRANSITION_IN_PROGRESS)
      4. assert once(wanted state != current state)

      Sometimes an unexpected state change occurred, in which case assert 4 was
      violated. The new state change behaviour is:

      1. assert while(!STATE_TRANSITION_IN_PROGRESS)
      2. state change
      3. wait state change timing according to the datasheet
      4. assert once(wanted state != current state)

      This behaviour is described in the at86rf231 software programming model [0].
      The state change documentation in this programming guide should also be valid
      for the at86rf212 and at86rf233 chips.

      The transceiver doesn't do a FORCE_TX_ON when we want to transmit a PDU.
      The new behaviour is a TX_ON followed by waiting for a receive time
      (tFrame + tPAck). If we are still in RX_AACK_BUSY then we send a FORCE_TX_ON
      as timeout handling. The difference is that FORCE_TX_ON aborts receiving while
      TX_ON waits until RX_AACK_BUSY is finished. This should decrease the drop rate
      of packets.
      
      [0] http://www.atmel.com/Images/AVR2022_swpm231-2.0.zip
      Signed-off-by: Alexander Aring <alex.aring@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1d15d6b5
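The length problem described under "Second" in the commit above can be illustrated with a small bounds check. This is only a sketch under assumptions: `rx_copy_len()` and `MAX_PHY_PACKET_SIZE` are hypothetical names, and clamping to the maximum is one way to "send the whole receive buffer" without reading past it; the driver's actual code may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Maximum PHY payload in IEEE 802.15.4 (aMaxPHYPacketSize). */
#define MAX_PHY_PACKET_SIZE 127u

/* Return how many bytes to hand to the next layer for a frame whose
 * length byte is reported_len. A corrupted length above the maximum is
 * clamped, so the copy never runs past the frame buffer. */
static size_t rx_copy_len(uint8_t reported_len)
{
	if (reported_len > MAX_PHY_PACKET_SIZE)
		return MAX_PHY_PACKET_SIZE;
	return reported_len;
}
```

A length byte is 8 bits wide and can therefore hold values up to 255, which is why a valid CRC alone does not guarantee a usable length.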
    • A
      at86rf230: add support for at86rf23x desense · a7d7eda9
      Alexander Aring committed
      To set the CCA_ED_THRES register the calculation for at86rf23x is
      different than for at86rf212. This patch adds a new callback for this
      calculation in the chip data struct.
      Signed-off-by: Alexander Aring <alex.aring@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a7d7eda9
    • A
      at86rf230: remove is212 and add driver data · a53d1f7c
      Alexander Aring committed
      This patch adds a new at86rf2xx_chip_data structure which holds device
      specific attributes. Instead of runtime decisions "if (is212())" we set
      callbacks/attributes during device detection.
      Signed-off-by: Alexander Aring <alex.aring@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a53d1f7c
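The idea of replacing "if (is212())" checks with per-chip data can be sketched as a table of attributes and callbacks selected once at detection time. All names, part numbers and values below are hypothetical placeholders, not the contents of the real at86rf2xx_chip_data structure.

```c
#include <stddef.h>

struct chip_data {
	const char *name;
	int rssi_base;			/* hypothetical per-chip attribute */
	int (*level_to_reg)(int level);	/* hypothetical per-chip callback */
};

static int level_to_reg_a(int level) { return level / 2; }
static int level_to_reg_b(int level) { return level + 10; }

enum chip_id { CHIP_A = 1, CHIP_B = 2 };	/* made-up part numbers */

static const struct chip_data chip_a = { "chip-a", -91, level_to_reg_a };
static const struct chip_data chip_b = { "chip-b", -100, level_to_reg_b };

/* Called once at probe/detect time; later code only follows the pointer. */
static const struct chip_data *detect_chip(int part_num)
{
	switch (part_num) {
	case CHIP_A:
		return &chip_a;
	case CHIP_B:
		return &chip_b;
	default:
		return NULL;	/* unknown chip */
	}
}
```

After detection, callers use `data->level_to_reg(level)` without branching on the chip type again.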
    • A
      at86rf230: rework detect device handling · c8ee0f56
      Alexander Aring committed
      This patch drops the current lowlevel spi calls for the detect device
      function; instead we handle this via regmap. Also put the detection
      in a separate function and set all device specific attributes during detection.
      Signed-off-by: Alexander Aring <alex.aring@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c8ee0f56
    • A
      at86rf230: add regmap support · f76014f7
      Alexander Aring committed
      This patch adds regmap support for the at86rf230 driver, drops the
      lowlevel spi access functions and uses the regmap access functions instead.
      Signed-off-by: Alexander Aring <alex.aring@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f76014f7
    • A
      mac802154: at86rf230: add hw flags and merge ops · 640985ec
      Alexander Aring committed
      This patch adds new mac802154 hw flags for transmit power, csma and
      listen before transmit (lbt). These flags indicate that the transceiver
      supports these features. If the flags are set and the driver doesn't
      implement the necessary functions, then ieee802154_register_device
      returns -ENOSYS "Function not implemented".

      This patch also merges all at86rf230 operations into one operations structure
      and sets the right hw flags for the at86rf230 transceivers.
      Signed-off-by: Alexander Aring <alex.aring@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      640985ec
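The registration check described in this commit, where a capability flag without its matching callback is rejected with -ENOSYS, might look roughly like the following. The flag values, the `radio_ops` structure and `register_radio()` are hypothetical stand-ins, not the mac802154 definitions.

```c
#include <errno.h>
#include <stddef.h>

#define HW_TXPOWER	(1u << 0)	/* hypothetical flag values */
#define HW_LBT		(1u << 1)
#define HW_CSMA		(1u << 2)

struct radio_ops {
	int (*set_txpower)(int dbm);
	int (*set_lbt)(int on);
	int (*set_csma_params)(int min_be, int max_be, int retries);
};

/* Example callback so that a caller has something to register. */
static int noop_set_txpower(int dbm)
{
	(void)dbm;
	return 0;
}

/* Refuse registration when a flag is advertised without its callback. */
static int register_radio(unsigned int flags, const struct radio_ops *ops)
{
	if ((flags & HW_TXPOWER) && !ops->set_txpower)
		return -ENOSYS;
	if ((flags & HW_LBT) && !ops->set_lbt)
		return -ENOSYS;
	if ((flags & HW_CSMA) && !ops->set_csma_params)
		return -ENOSYS;
	return 0;
}
```

Checking once at registration means later code can call a callback whenever its flag is set, without a NULL test at every call site.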
    • D
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next · 1598c36a
      David S. Miller committed
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2014-07-02
      
      This series contains updates to i40e and i40evf.
      
      Anjali fixes a possible race where we were trying to free the dummy packet
      buffer in the function that created it, so cleanup the dummy packet buffer
      in i40e_clean_tx_ring() instead.  Also fixes an issue where the filter
      program routine was not checking if there were descriptors available for
      programming a filter.
      
      Mitch fixes unnecessary delays when sending the admin queue commands by
      moving a declaration up one level so we do not dereference it out of scope.
      Fixes an issue with the VF where if the admin queue interrupts get lost for
      some reason, the VF communication will stall as the VFs have no way of
      reaching the PF.  To alleviate this condition, go ahead and check the ARQ
      every time we run the service task.  Updates i40evf to allow the watchdog
      to fire vector 0 via software, which makes the driver tolerant of dropped
      interrupts on that vector.
      
      Paul fixes a shifted '1' to be unsigned to avoid shifting a signed integer.
      
      Jesse disables TPH by default since it is currently not enabled in the
      current hardware.  Also finishes the i40e implementation of get_settings
      for ethtool.
      
      Catherine adds a new variable (hw.phy.link_info.an_enabled) to track whether
      auto-negotiation is enabled, along with the functionality to update the
      variable.  Adds the functionality to set the requested flow control mode.
      Adds i40e implementation of setpauseparam and set_settings to ethtool.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1598c36a
    • D
      Merge branch 'fec-next' · dbaaca81
      David S. Miller committed
      Russell King says:
      
      ====================
      Freescale ethernet driver updates
      
      Here's the first batch of patches for the Freescale FEC ethernet driver.
      They require the previously applied "net: fec: Don't clear IPV6 header
      checksum field when IP accelerator enable" patch.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dbaaca81
    • R
      net: fec: fix missing kmalloc() failure check in fec_enet_alloc_buffers() · ffdce2cc
      Russell King committed
      fec_enet_alloc_buffers() assumes that kmalloc() will never fail, which
      is an invalid assumption.  Fix this by implementing a common error
      cleanup path, and use it to also clean up after failed bounce buffer
      allocation.
      Acked-by: Fugang Duan <B38611@freescale.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ffdce2cc
    • R
      net: fec: ensure fec_enet_free_buffers() properly cleans the rings · 8b7c9efa
      Russell King committed
      Ensure that we do not double-free any allocations, and that any transmit
      skbuffs are properly freed when we clean up the rings.
      Acked-by: Fugang Duan <B38611@freescale.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8b7c9efa
    • R
      net: fec: clean up transmit descriptor setup · d6bf3143
      Russell King committed
      Avoid writing any state until we're certain we can proceed with the
      transmission: this avoids writing mapping error address values to the
      descriptors, or setting the skbuff pointer until we have successfully
      mapped the skb.
      Acked-by: Fugang Duan <B38611@freescale.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d6bf3143
    • R
      net: fec: make rx skb handling more robust · 730ee360
      Russell King committed
      Allocate, and then map the receive skb before writing any data to the
      ring descriptor or storing the skb.  When freeing the receive ring
      entries, unmap and free the skb, and then clear the stored skb pointer.
      
      This means we have ring data and skb pointer in one of two states:
      either both fully setup, or nothing setup.
      
      This simplifies the cleanup, as we can use just the skb pointer to
      indicate whether the descriptor is setup, and thus avoids potentially
      calling dma_unmap_single() on a DMA error value.
      Acked-by: Fugang Duan <B38611@freescale.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      730ee360
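The "either both fully setup, or nothing setup" invariant from this commit can be sketched in userspace. Everything here is a stand-in: `malloc()` replaces skb allocation, the `map_fn` hook replaces DMA mapping, and the names are hypothetical rather than taken from the FEC driver.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

struct rx_slot {
	void *buf;	/* non-NULL only when the slot is fully set up */
	uintptr_t addr;	/* stand-in for a DMA address */
};

/* Hypothetical mapping hook; returns 0 on success. */
typedef int (*map_fn)(void *buf, uintptr_t *addr);

static int map_ok(void *buf, uintptr_t *addr)
{
	*addr = (uintptr_t)buf;
	return 0;
}

static int map_fail(void *buf, uintptr_t *addr)
{
	(void)buf;
	(void)addr;
	return -1;
}

/* Allocate and map first; publish into the slot only after both succeed. */
static int slot_setup(struct rx_slot *slot, size_t len, map_fn map)
{
	void *buf = malloc(len);
	uintptr_t addr;

	if (!buf)
		return -ENOMEM;
	if (map(buf, &addr)) {
		free(buf);	/* nothing was stored in the slot yet */
		return -ENOMEM;
	}
	slot->addr = addr;
	slot->buf = buf;
	return 0;
}

/* The buffer pointer alone tells us whether there is anything to undo. */
static void slot_teardown(struct rx_slot *slot)
{
	if (!slot->buf)
		return;
	free(slot->buf);
	slot->buf = NULL;
	slot->addr = 0;
}
```

Because a failed mapping never reaches the slot, teardown cannot end up "unmapping" an error value, which mirrors the reasoning in the commit message.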
    • R
      net: fec: remove useless fep->opened · 5d165c55
      Russell King committed
      napi_disable() waits until the NAPI processing has completed, and then
      prevents any further polls.  At this point, the driver then clears
      fep->opened.  The NAPI poll function uses this to stop processing in
      the receive path.  Hence, it will never see this variable cleared,
      because the NAPI poll has to complete before it will be cleared.
      
      Therefore, this variable serves no purpose, so let's remove it.
      Acked-by: Fugang Duan <B38611@freescale.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5d165c55
    • R
      net: fec: stop the phy before shutting down the MAC · d76cfae9
      Russell King committed
      When the network interface goes down, stop the phy to prevent further
      link up status changes before taking the MAC or netif sections down.
      This prevents further reception of link up events which could
      potentially call fec_restart().
      
      Since phy_stop() takes the mutex which adjust_link() runs under, we
      also ensure that adjust_link() will not already be processing a link
      up event.
      
      We also need to do this when suspending as well - we don't want a
      mis-timed phy state change to restart the MAC after we have stopped
      it for suspend, and thus need to restart the phy when resuming.
      Acked-by: Fugang Duan <B38611@freescale.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d76cfae9
    • R
      net: fec: ensure that a disconnected phy isn't configured · 0b146ca8
      Russell King committed
      When we disconnect from a phy, we should forget our pointer to it so we
      don't accidentally try to configure it.  We handle a NULL phy pointer
      correctly in most places, except fec_enet_set_pauseparam().  Fix this
      too.
      Acked-by: Fugang Duan <B38611@freescale.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0b146ca8
    • R
      net: fec: remove checking for NULL phy_dev in fec_enet_close() · 635cf17c
      Russell King committed
      fep->phy_dev can not be NULL here for two reasons:
      - fec_enet_open() will have successfully connected the phy, or will have
        failed.
      - fec_enet_open() will have called phy_start(fep->phy_dev), which
        unconditionally dereferences this pointer.
      
      If it were to be NULL here, then fec_enet_open() will have already
      oopsed.
      Acked-by: Fugang Duan <B38611@freescale.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      635cf17c
    • R
      net: fec: use netif_tx_disable() rather than netif_stop_queue() · b49cd504
      Russell King committed
      We use netif_stop_queue() in several places where we want to ensure that
      the start_xmit function is not running.  netif_stop_queue() is not
      sufficient to achieve that - it merely sets a flag to indicate that the
      transmit queue(s) should not be run.
      
      netif_tx_disable() gives this guarantee, since it takes the transmit
      queue lock while marking the queue stopped.  This will wait for the
      transmit function to complete before returning.
      Acked-by: Fugang Duan <B38611@freescale.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b49cd504
    • R
      net: fec: fix interrupt handling races · 7a16807c
      Russell King committed
      While running: while :; do iperf -c <HOST> -P 4; done, transmit timeouts
      are regularly reported.  With the tx ring dumping in place, we can see
      that all entries are in use, and the hardware has finished transmitting
      these packets.  However, the driver has not reclaimed these ring
      entries.
      
      This can occur if the interrupt handler is invoked at the wrong moment -
      eg:
      
      	CPU0				CPU1
      	fec_enet_tx()
      					interrupt, IEVENT = FEC_ENET_TXF
      					FEC_ENET_TXF cleared
      					napi_schedule_prep()
      	napi_complete()
      
      The result is that we clear the transmit interrupt, but we don't trigger
      any cleaning of the transmit ring.  Instead, use a different strategy:
      
      - When receiving a transmit or receive interrupt, disable both tx and rx
        interrupts, but do not acknowledge them.  Schedule a napi poll.  Don't
        loop.
      
      - When we are polled, read IEVENT, acknowledging the pending transmit
        and receive interrupts, before then going on to process the
        appropriate rings.
      
      This allows us to avoid the race, and has a number of other advantages:
      - we cut down on the number of transmit interrupts we have to process.
      - we only look at the rings which have pending events.
      - we gain additional throughput: the iperf total bandwidth increases
        from about 180Mbps to 240Mbps:
      
      [  3]  0.0-10.0 sec  68.1 MBytes  57.0 Mbits/sec
      [  5]  0.0-10.0 sec  72.4 MBytes  60.5 Mbits/sec
      [  4]  0.0-10.1 sec  76.1 MBytes  63.5 Mbits/sec
      [  6]  0.0-10.1 sec  71.9 MBytes  59.9 Mbits/sec
      [SUM]  0.0-10.1 sec   288 MBytes   241 Mbits/sec
      Acked-by: Fugang Duan <B38611@freescale.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7a16807c
    • R
      net: fec: fix ethtool set_pauseparam duplex bug · 9671a42e
      Russell King committed
      Setting the pause parameters causes a running network interface to be
      restarted.  However, the restart forces the FEC into half-duplex mode,
      whether or not the remote end is in half-duplex mode.  Misconfigured
      duplex mode is a known source of problems on a link.
      
      Fix this by always preserving the duplex mode on configuration changes.
      Acked-by: Fugang Duan <B38611@freescale.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9671a42e
    • R
      net: fec: iMX6 FEC does not support half-duplex gigabit · b44592ff
      Russell King committed
      The iMX6 gigabit FEC does not support half-duplex gigabit operation.
      Phys attached to the FEC may support this, and we currently do nothing
      to disable this feature.  This may result in an invalid configuration.
      Mask out phy support for gigabit half-duplex operation.
      Acked-by: Fugang Duan <B38611@freescale.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b44592ff
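Masking out one link mode from a phy's supported set is a single bit operation. The sketch below assumes a 32-bit capability bitmap; the two macros are local stand-ins whose bit positions are meant to mirror SUPPORTED_1000baseT_Half and SUPPORTED_1000baseT_Full from <linux/ethtool.h>, and `mask_gigabit_half()` is a hypothetical helper name.

```c
#include <stdint.h>

/* Local stand-ins for the ethtool capability bits (assumed positions). */
#define SUP_1000baseT_Half	(1u << 4)
#define SUP_1000baseT_Full	(1u << 5)

/* Clear gigabit half-duplex while leaving every other mode untouched. */
static uint32_t mask_gigabit_half(uint32_t supported)
{
	return supported & ~SUP_1000baseT_Half;
}
```

Applying the mask to a phy that never advertised the mode is harmless, since clearing an already clear bit changes nothing.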
    • D
      Merge branch 'net-hash-tx' · 6c035ea0
      David S. Miller committed
      Tom Herbert says:
      
      ====================
      net: Improvements and applications of packet flow hash in transmit path
      
      This patch series includes some patches which improve and make use
      of skb->hash in the transmit path.
      
      What is included:
      
      - Infrastructure to save a precomputed hash in the sock structure.
        For connected TCP and UDP sockets we only need to compute the
        flow hash once and not once for every packet.
      - Call skb_get_hash in get_xps_queue and __skb_tx_hash. This eliminates
        the awkward access to skb->sk->sk_hash in the lower transmit path.
      - Move UDP source port generation into a common function in udp.h. This
        implementation is mostly based on vxlan_src_port.
      - Use non-zero IPv6 flow labels in flow_dissector as port information
        for flow hash calculation.
      - Implement automatic flow label generation on transmit (per RFC 6438).
      - Don't repeatedly try to compute an L4 hash in skb_get_hash if we've
        already tried to find one in software stack calculation.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6c035ea0
    • T
      net: Only do flow_dissector hash computation once per packet · a3b18ddb
      Tom Herbert committed
      Add sw_hash flag to skbuff to indicate that skb->hash was computed
      from flow_dissector. This flag is checked in skb_get_hash to avoid
      repeatedly trying to compute the hash (i.e. in the case that no L4 hash
      can be computed).
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a3b18ddb
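The compute-once behaviour of the sw_hash flag can be modelled with a small userspace structure. `struct pkt`, `dissect()` and `pkt_get_hash()` are hypothetical stand-ins for the skbuff, the flow dissector and skb_get_hash; the call counter exists only so the effect is observable.

```c
#include <stdbool.h>
#include <stdint.h>

struct pkt {
	uint32_t hash;
	bool l4_hash;			/* hash covers transport-layer fields */
	bool sw_hash;			/* hash was already computed in software */
	unsigned int dissect_calls;	/* instrumentation for this sketch only */
};

/* Hypothetical dissector that finds no L4 information. */
static void dissect(struct pkt *p)
{
	p->dissect_calls++;
	p->hash = 0x1234u;
	p->sw_hash = true;	/* remember that software already tried */
}

/* Run the dissector only if neither an L4 hash nor a software hash exists. */
static uint32_t pkt_get_hash(struct pkt *p)
{
	if (!p->l4_hash && !p->sw_hash)
		dissect(p);
	return p->hash;
}
```

Without the sw_hash flag, a packet for which no L4 hash can be found would be dissected again on every call.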
    • T
      ipv6: Implement automatic flow label generation on transmit · cb1ce2ef
      Tom Herbert committed
      Automatically generate flow labels for IPv6 packets on transmit.
      The flow label is computed based on skb_get_hash. The flow label will
      only automatically be set when it is zero otherwise (i.e. flow label
      manager hasn't set one). This supports the transmit side functionality
      of RFC 6438.
      
      Added an IPv6 sysctl auto_flowlabels to enable/disable this behavior
      system wide, and added IPV6_AUTOFLOWLABEL socket option to enable this
      functionality per socket.
      
      By default, auto flowlabels are disabled to avoid possible conflicts
      with flow label manager, however if this feature proves useful we
      may want to enable it by default.
      
      It should also be noted that FreeBSD has already implemented automatic
      flow labels (including the sysctl and socket option). In FreeBSD,
      automatic flow labels default to enabled.
      
      Performance impact:
      
      Running super_netperf with 200 flows for TCP_RR and UDP_RR for
      IPv6. Note that in the UDP case, __skb_get_hash will be called for
      every packet, which explains the slight regression. In the TCP case
      the hash is saved in the socket so there is no regression.
      
      Automatic flow labels disabled:
      
        TCP_RR:
          86.53% CPU utilization
          127/195/322 90/95/99% latencies
          1.40498e+06 tps
      
        UDP_RR:
          90.70% CPU utilization
          118/168/243 90/95/99% latencies
          1.50309e+06 tps
      
      Automatic flow labels enabled:
      
        TCP_RR:
          85.90% CPU utilization
          128/199/337 90/95/99% latencies
          1.40051e+06 tps

        UDP_RR:
          92.61% CPU utilization
          115/164/236 90/95/99% latencies
          1.4687e+06 tps
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cb1ce2ef
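The rule "only set the flow label automatically when it is zero" can be sketched as below. This is a simplified host-byte-order illustration: `make_flowlabel()` and `FLOWLABEL_MASK` are hypothetical names, the hash is passed in rather than taken from skb_get_hash, and any extra mixing the kernel may apply to the hash is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLOWLABEL_MASK 0x000FFFFFu	/* the IPv6 flow label is 20 bits wide */

/* Keep an existing label; otherwise derive one from the flow hash when
 * automatic labels are enabled (by sysctl or socket option). */
static uint32_t make_flowlabel(uint32_t flowlabel, uint32_t hash, bool autolabel)
{
	if (flowlabel == 0 && autolabel)
		flowlabel = hash & FLOWLABEL_MASK;
	return flowlabel;
}
```

A label chosen by the flow label manager is therefore never overwritten, which is the conflict the commit message is careful to avoid.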
    • T
      flow_dissector: Use IPv6 flow label in flow_dissector · 19469a87
      Tom Herbert committed
      This patch implements the receive side to support RFC 6438 which is to
      use the flow label as an ECMP hash. If an IPv6 flow label is set
      in a packet we can use this as input for computing an L4-hash. There
      should be no need to parse any transport headers in this case.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      19469a87
    • T
      vxlan: Call udp_flow_src_port · 535fb8d0
      Tom Herbert committed
      In vxlan and OVS vport-vxlan, call the common function to get the source
      port for a UDP tunnel. Removed vxlan_src_port since the functionality is
      now in udp_flow_src_port.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      535fb8d0
    • T
      udp: Add function to make source port for UDP tunnels · b8f1a556
      Tom Herbert committed
      This patch adds the udp_flow_src_port function, which is intended to be
      a common function that UDP tunnel implementations call to set the source
      port. The source port is chosen so that a hash over the outer headers
      (IP addresses and UDP ports) acts as a suitable hash for the flow of the
      encapsulated packet. In this manner, UDP encapsulation works with RSS
      and ECMP with respect to the inner flow.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b8f1a556
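One way to turn a 32-bit flow hash into a source port inside a given range is to scale it with a 64-bit multiply. The sketch below is an assumption about how such a mapping can be written, not a copy of udp_flow_src_port: `flow_src_port()` is a hypothetical name, the result is in host byte order, and the caller must pass min < max.

```c
#include <stdint.h>

/* Map hash uniformly onto [min, max). Requires min < max. */
static uint16_t flow_src_port(uint32_t hash, uint16_t min, uint16_t max)
{
	uint64_t span = (uint64_t)(max - min);

	/* (hash * span) >> 32 scales a 32-bit value into [0, span). */
	return (uint16_t)(((hash * span) >> 32) + min);
}
```

Packets of the same inner flow carry the same hash and so get the same outer source port, which lets RSS and ECMP keep a flow on one path while spreading different flows apart.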
    • T
      net: Call skb_get_hash in get_xps_queue and __skb_tx_hash · 0e001614
      Tom Herbert committed
      Call the standard function to get a packet hash instead of taking this from
      skb->sk->sk_hash or only using skb->protocol.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0e001614