1. 21 Jul, 2015: 24 commits
  2. 16 Jul, 2015: 16 commits
    • Y
    • Merge branch 'protodown' · 0d057881
      Authored by David S. Miller
      Anuradha Karuppiah says:
      
      ====================
      net: Introduce protodown flag.
      
      User space daemons can detect errors in the network that need to be
      notified to the switch device drivers.
      
      Drivers can react to this error state by doing a phy-down on the
      switch port, which results in a carrier-off both locally and on the
      directly connected switch. This prevents loops and black-holes in the
      network.
      
      One such use case is the multi-chassis LAG (MLAG) application:
      
      1. The MLAG application runs on peer switches (say Switch0 and Switch1)
         synchronizing states, forwarding entries etc. between the two
         switches over the peer-link (this is a link directly connecting the
         two switches).
      2. An MLAG election process designates one of the switches as the
         primary (e.g. Switch0 is primary and Switch1 is secondary).
      3. The peer link plays a critical role in allowing Switch0-Switch1 to
         function as a single LAG partner to the downstream dual-connected
         servers. When the peer-link between the switches goes down we have a
         split-brain situation. Switch0 and Switch1 are no longer in sync and
         are acting independently. This can result in traffic loops and
         traffic black-holing in the network.
      4. To prevent these problems the MLAG application on the secondary
         switch phy-downs the MLAG ports on detecting the peer-link down.
         This will be seen as a carrier down on servers that are
         dual-connected to Switch0 and Switch1.
      5. Specifically, a dual-connected server will see a carrier-down on the
         port connected to the MLAG secondary, Switch1, and will stop using
         that port for traffic TX, so traffic black-holing is prevented.
      
      v6 to v7:
         Removed some unnecessary code in response to review comments.
      
      v5 to v6:
         Replaced proto_flags with a simple proto_down boolean attribute in
         response to Dave's comments.
      
      v4 to v5:
         Changed the ip link display format for protodown to match the set
         command syntax, as recommended by Stephen.
      
      v3 to v4:
         I have moved protodown out of IFF_XXX and introduced a separate
         proto_flags field with IF_PROTOF_DOWN bit being used by apps to notify
         switch port errors. This is in response to Stephen's comments that
         adding a new IFF_XXX may break user space.
      
         I have used rocker as the sample switch driver. To test this
         functionality I used the qemu-rocker patch that Scott sent out in
         response to the v3 posting (needed to set link up/down when the phy
         is enabled/disabled).
      
      v1 to v2:
         Based on Dave's suggestion I have moved out aggregating of error bits
         across applications to a user space framework. This patch now simply
         notifies an aggregated error bit to drivers enabling them to handle
         the error gracefully.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
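The MLAG steps listed in the merge message boil down to a small decision rule in the user-space daemon. The sketch below is a minimal, hypothetical model of that rule. The role enum, `struct mlag_port`, `mlag_port_should_protodown()` and `mlag_update_ports()` are illustrative names and do not come from any real MLAG implementation. Only the secondary switch, and only its MLAG ports, go protodown when the peer link fails.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical roles produced by the MLAG election (step 2). */
enum mlag_role { MLAG_PRIMARY, MLAG_SECONDARY };

/* Hypothetical per-port record kept by a user-space MLAG daemon. */
struct mlag_port {
    const char *name;
    bool is_mlag_port; /* member of a bond toward a dual-connected server */
    bool proto_down;   /* value the daemon wants the kernel to apply */
};

/* Step 4: only the secondary reacts to a peer-link failure, and only on
 * its MLAG ports; all other ports keep forwarding. */
static bool mlag_port_should_protodown(enum mlag_role role, bool peer_link_up,
                                       const struct mlag_port *port)
{
    return role == MLAG_SECONDARY && !peer_link_up && port->is_mlag_port;
}

/* Recompute the desired protodown state of every port. Recomputing (rather
 * than only setting) also clears protodown once the peer link recovers. */
static void mlag_update_ports(enum mlag_role role, bool peer_link_up,
                              struct mlag_port *ports, size_t n)
{
    for (size_t i = 0; i < n; i++)
        ports[i].proto_down =
            mlag_port_should_protodown(role, peer_link_up, &ports[i]);
}
```

After `mlag_update_ports()` runs, a daemon would push each changed value to the kernel. For example, it could run `ip link set dev swp1 protodown on`, where `swp1` is a placeholder port name.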
    • rocker: Handle protodown notifications. · c3055246
      Authored by Anuradha Karuppiah
      protodown can be set by user space applications like MLAG on detecting
      errors on a switch port. This patch provides sample switch driver
      changes for handling protodown: rocker physically disables the port in
      response to protodown.
      Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
      Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
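The driver-side reaction can be modeled in plain C. This is a hypothetical sketch and does not reproduce rocker's code: `struct sw_port`, `sw_port_set_enable()`, `sw_port_change_proto_down()` and `sw_port_open()` are illustrative names. The model makes two points. The protodown request is always remembered, but hardware is only touched while the interface is administratively up. A later admin-up must also not re-enable a port that is still protodown.

```c
#include <stdbool.h>

/* Hypothetical model of a switch port (not the real rocker structures). */
struct sw_port {
    bool admin_up;     /* analogous to IFF_UP on the net_device */
    bool proto_down;   /* last value requested by user space */
    bool phys_enabled; /* whether the port is enabled in hardware */
};

/* Hypothetical hardware hook. On a real switch, disabling the port drops
 * carrier locally and on the link partner; here it only records state. */
static void sw_port_set_enable(struct sw_port *p, bool enable)
{
    p->phys_enabled = enable;
}

/* Protodown handler: act on hardware only if the port is admin up, and
 * always remember the requested state for later opens. */
static int sw_port_change_proto_down(struct sw_port *p, bool proto_down)
{
    if (p->admin_up)
        sw_port_set_enable(p, !proto_down);
    p->proto_down = proto_down;
    return 0;
}

/* Open path: bringing the interface up must respect protodown. */
static void sw_port_open(struct sw_port *p)
{
    p->admin_up = true;
    if (!p->proto_down)
        sw_port_set_enable(p, true);
}
```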
    • A
    • net core: Add protodown support. · d746d707
      Authored by Anuradha Karuppiah
      This patch introduces the proto_down flag that can be used by user space
      applications to notify switch drivers that errors have been detected on the
      device.
      
      The switch driver can react to a protodown notification by doing a
      phys down on the associated switch port.
      Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
      Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
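On the core side, the flag travels from user space (as the `IFLA_PROTO_DOWN` netlink attribute, e.g. via `ip link set dev <port> protodown on`) to a driver callback. In kernels of that era the callback was, to my understanding, an `ndo_change_proto_down` member of `net_device_ops`. The user-space model below is a hedged sketch of that dispatch. Every type and function name in it is hypothetical. It rejects devices whose driver has no protodown hook, rejects detached devices, and otherwise delegates to the driver.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct netdev_model;

/* Hypothetical driver ops table; only the protodown hook is modeled. */
struct netdev_ops_model {
    int (*change_proto_down)(struct netdev_model *dev, bool proto_down);
};

/* Hypothetical stand-in for a net_device. */
struct netdev_model {
    const struct netdev_ops_model *ops;
    bool present;    /* device still attached */
    bool proto_down; /* state recorded by the driver */
};

/* Core entry point: validate, then hand the request to the driver. */
static int netdev_change_proto_down(struct netdev_model *dev, bool proto_down)
{
    if (!dev->ops || !dev->ops->change_proto_down)
        return -EOPNOTSUPP; /* driver cannot act on protodown */
    if (!dev->present)
        return -ENODEV;
    return dev->ops->change_proto_down(dev, proto_down);
}

/* Example driver hook that only records the flag. */
static int demo_change_proto_down(struct netdev_model *dev, bool proto_down)
{
    dev->proto_down = proto_down;
    return 0;
}
```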
    • ibmveth: add support for TSO6 · 07e6a97d
      Authored by Thomas Falcon
      This patch adds support for a new method of signalling the firmware
      that TSO packets are being sent. The new method removes the need to
      alter the IP and TCP checksums and allows TSO6 support.
      Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hv_netvsc: Add close of RNDIS filter into change mtu call · 2de8530b
      Authored by Haiyang Zhang
      The current change_mtu call only stops TX before removing the RNDIS
      filter. If the ring buffer is not empty, rndis_filter_device_remove()
      may hang while removing the buffers.

      This patch closes the RNDIS filter before removing it, and adds a
      gradual waiting loop that runs until the ring is empty. This fixes the
      change_mtu hang under heavy traffic.
      Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
      Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
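The commit message does not say how the "gradual waiting loop" paces itself. The sketch below assumes a bounded poll whose delay doubles up to a cap. It simulates time instead of sleeping; a driver would use a real sleep such as `usleep_range()`. `struct ring_model`, `ring_wait()` and `wait_for_ring_empty()` are hypothetical names. The point is the shape of the loop. It polls for an empty ring, waits a little longer each round, and gives up after a fixed number of polls rather than hanging forever.

```c
#include <stdbool.h>

/* Hypothetical model of a send ring being drained by the host. */
struct ring_model {
    unsigned pending;        /* data still queued in the ring */
    unsigned drain_per_tick; /* how much the host consumes per wait */
    long elapsed_us;         /* simulated time spent waiting */
};

static bool ring_is_empty(const struct ring_model *r)
{
    return r->pending == 0;
}

/* Stand-in for a real sleep: advance simulated time and let the host
 * consume some of the pending data. */
static void ring_wait(struct ring_model *r, long us)
{
    r->elapsed_us += us;
    r->pending = r->pending > r->drain_per_tick
                     ? r->pending - r->drain_per_tick : 0;
}

/* Poll until the ring drains, doubling the delay up to cap_us.
 * Returns the number of waits performed, or -1 if max_polls ran out. */
static int wait_for_ring_empty(struct ring_model *r, int max_polls,
                               long start_us, long cap_us)
{
    long delay = start_us;

    for (int i = 0; i < max_polls; i++) {
        if (ring_is_empty(r))
            return i;
        ring_wait(r, delay);
        delay = delay * 2 > cap_us ? cap_us : delay * 2;
    }
    return ring_is_empty(r) ? max_polls : -1;
}
```

In the patch's ordering, such a wait sits between closing the RNDIS filter (so no new traffic is queued) and removing it.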
    • ipv6: Fix finding best source address in ipv6_dev_get_saddr(). · c0b8da1e
      Authored by YOSHIFUJI Hideaki/吉藤英明
      Commit 9131f3de ("ipv6: Do not iterate over all interfaces when
      finding source address on specific interface.") did not properly
      update the best available source address. It also introduced a
      possible NULL pointer dereference.
      
      Bug was reported by Erik Kline <ek@google.com>.
      Based on patch proposed by Hajime Tazaki <thehajime@gmail.com>.
      
      Fixes: 9131f3de ("ipv6: Do not iterate over all interfaces when finding source address on specific interface.")
      Signed-off-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
      Acked-by: Hajime Tazaki <thehajime@gmail.com>
      Acked-by: Erik Kline <ek@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 9243b25b
      Authored by David S. Miller
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2015-07-14
      
      This series contains updates to i40e and i40evf only.
      
      Joe Stringer and Jesse Gross add an ndo_features_check function to
      ensure that the i40e driver does not try to offload packets that exceed
      80 bytes in length.
      
      Anjali adds stats that track the current flow director ATR and SB
      state and the flow director flush count, which reduces the need for
      verbose flow director debug logs. Anjali also refines an error message
      to avoid confusion, so that it indicates what may really have happened
      when the init_shared_code() call fails.
      
      Pawel adds new fields to the capabilities structures to handle Flex-10
      device/function capabilities which is needed to support Flex-10 configs.
      
      Jesse improves transmit performance by adding a prefetch for the
      next transmit descriptor to be used when we know more are coming.
      
      Mitch modifies the i40evf driver to handle an abundance of vectors.
      Currently the driver maps transmit and receive queues to a single
      MSI-X vector per queue only if there are exactly enough vectors for
      this; if there are too many vectors, that check fails and queues are
      allocated to vectors in a suboptimal manner. The condition check is
      changed to allow an excess number of vectors, leaving the extras
      unused. Mitch also updates the driver to just return success if the
      user attempts to set a port VLAN on a VF that already has the same
      port VLAN configured, instead of going through unnecessary filter
      removals and adds; fixes the MAC filters for VFs, which were being
      programmed with 0 for the VLAN value when no VLAN was assigned
      (-1 must be used instead to indicate that no VLAN is in use); fixes
      the VF disable code, which was not properly cleaning up the VF and
      could leave it in an indeterminate state, by notifying the VF and then
      calling the normal VF reset routine; and fixes the driver logic so
      that MAC filters are added and removed correctly, adding a check for
      the driver's hardware MAC address so that this filter does not get
      removed incorrectly.
      
      Carolyn removes incorrect #ifdefs that should not have been added in
      the first place and, with the #ifdefs removed, makes the changes
      needed in the driver to resolve the resulting compile errors.
      
      Greg updates the admin queue command header defines.
      
      v2: fix indentation in patch 12 based on feedback from Sergei Shtylyov
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
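The i40evf vector change described in this merge amounts to relaxing an equality test into "at least as many vectors as queues". The sketch below illustrates the idea. `map_queues_to_vectors()` is a hypothetical name. The round-robin sharing used when vectors are scarce is an assumed policy; the cover letter only says the old fallback was suboptimal.

```c
/* Map num_queues queue pairs onto num_vectors MSI-X vectors; map[q] gets the
 * vector index used by queue q. Returns 0 on success, -1 if there are no
 * vectors at all. */
static int map_queues_to_vectors(int num_queues, int num_vectors, int *map)
{
    if (num_vectors < 1)
        return -1;

    /* The relaxed check: ">=" instead of "==", so surplus vectors no longer
     * push the driver onto the shared-vector path. Extras stay unused. */
    if (num_vectors >= num_queues) {
        for (int q = 0; q < num_queues; q++)
            map[q] = q;
        return 0;
    }

    /* Fewer vectors than queues: share vectors round-robin (assumed policy). */
    for (int q = 0; q < num_queues; q++)
        map[q] = q % num_vectors;
    return 0;
}
```

With 4 queues and 8 vectors, each queue gets its own vector (0 to 3) and vectors 4 to 7 are left idle. With 4 queues and 2 vectors, the queues alternate between vectors 0 and 1.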
    • pkt_sched: sch_qfq: remove unused member of struct qfq_sched · 40bdc536
      Authored by Andrea Parri
      The member (u32) "num_active_agg" of struct qfq_sched has been unused
      since its introduction in 462dbc91
      "pkt_sched: QFQ Plus: fair-queueing service at DRR cost" and (AFAICT)
      there is no active plan to use it; this removes the member.
      Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
      Acked-by: Paolo Valente <paolo.valente@unimore.it>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qlcnic: Deletion of unnecessary memset · e29dd443
      Authored by Christophe Jaillet
      There is no need to memset memory allocated with vzalloc.
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Acked-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'gianfar_rx_sg' · 9061cb02
      Authored by David S. Miller
      Claudiu Manoil says:
      
      ====================
      gianfar: Add Rx S/G
      
      This patch-set introduces scatter/gather support
      on the Rx side, addressing Rx path performance
      issues in the driver.
      Thanks.
      
      As an example, two boards connected back-to-back
      were used to measure the throughput, running the
      same kernel 4.1, before and after applying these
      patches.
      The netperf UDP_STREAM results below show that the
      bottleneck lies on the Rx side BEFORE applying the
      patches, and that the Rx throughput is even lower
      with a larger MTU.  AFTER applying the patches the
      Rx bottleneck is gone (Rx throughput matches the
      Tx one) and the RX throughput is not influenced by
      MTU size any longer (as expected).
      
      BEFORE:
      
      1) MTU 1500 (default)
      
      root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 512
      MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET
      Socket  Message  Elapsed      Messages                   CPU      Service
      Size    Size     Time         Okay Errors   Throughput   Util     Demand
      bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB
      
      163840     512   150.00    20119124      0      549.4     100.00   14.911
      163840           150.00    14057349             383.9     100.00   14.911
      
      root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 64
      MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET
      Socket  Message  Elapsed      Messages                   CPU      Service
      Size    Size     Time         Okay Errors   Throughput   Util     Demand
      bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB
      
      163840      64   150.00    23654013      0       80.7     100.00   101.463
      163840           150.00    15875288              54.2     100.00   101.463
      
      2) MTU 8000
      
      root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 512
      MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET
      Socket  Message  Elapsed      Messages                   CPU      Service
      Size    Size     Time         Okay Errors   Throughput   Util     Demand
      bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB
      
      163840     512   150.00    20067232      0      548.0     100.00   14.950
      163840           150.00    6113498             166.9     99.95    14.942
      
      root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 64
      MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET
      Socket  Message  Elapsed      Messages                   CPU      Service
      Size    Size     Time         Okay Errors   Throughput   Util     Demand
      bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB
      
      163840      64   150.00    23621279      0       80.6     100.00   101.604
      163840           150.00    5868602              20.0     99.96    101.563
      
      AFTER:
      (both MTU 1500 and MTU 8000)
      
      root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 512
      MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET
      Socket  Message  Elapsed      Messages                   CPU      Service
      Size    Size     Time         Okay Errors   Throughput   Util     Demand
      bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB
      
      163840     512   150.00    19914969      0      543.8     100.00   15.064
      163840           150.00    19914969             543.8     99.35    14.966
      
      root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 64
      MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET
      Socket  Message  Elapsed      Messages                   CPU      Service
      Size    Size     Time         Okay Errors   Throughput   Util     Demand
      bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB
      
      163840      64   150.00    23433989      0       80.0     100.00   102.416
      163840           150.00    23433989              80.0     99.62    102.023
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
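The Throughput column in the netperf output can be recomputed from the other columns: messages × message size × 8 bits ÷ elapsed seconds. For the first BEFORE run (MTU 1500, 512-byte messages), the sender line gives 20119124 × 512 × 8 / 150 ≈ 549.4 × 10^6 bits/sec. The receiver line gives 14057349 × 512 × 8 / 150 ≈ 383.9. So only about 70% of the datagrams sent made it through the Rx side. With MTU 8000 and 64-byte messages, 5868602 of 23621279 datagrams arrived, roughly a quarter. In the AFTER runs the two message counts are identical. The helper functions below are illustrative names written for this check; they are not part of netperf.

```c
/* Throughput in 10^6 bits/sec, as netperf reports it, from the message
 * count, the message size in bytes and the elapsed time in seconds. */
static double udp_stream_mbps(unsigned long long messages,
                              unsigned msg_bytes, double secs)
{
    return (double)messages * msg_bytes * 8.0 / secs / 1e6;
}

/* Fraction of the datagrams sent that the receiver reported as Okay. */
static double delivered_fraction(unsigned long long sent,
                                 unsigned long long received)
{
    return sent ? (double)received / (double)sent : 0.0;
}
```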
    • gianfar: Add paged allocation and Rx S/G · 75354148
      Authored by Claudiu Manoil
      The eTSEC h/w is capable of scatter/gather on the receive side
      too when MAXFRM > MRBLR, i.e. when the maximum allowed Rx frame size
      (MAXFRM) is set greater than the maximum Rx buffer size (MRBLR).
      It's about time the driver made use of this h/w capability,
      by supporting fixed buffer sizes and Rx S/G.

      The buffer size given to eTSEC for reception is fixed to
      1536B (must be a multiple of 64), which is the same default
      buffer size as before, used to accommodate standard MTU
      (1500B) size frames.  As before, eTSEC can receive frames of
      up to 9600B.  Individual Rx buffers are mapped to page halves
      (page size for eTSEC systems is 4KB).  The skb is built around
      the first buffer of a frame (using build_skb()).  If the
      frame spans multiple buffers, the trailing buffers are added
      as Rx fragments to the skb.  The last buffer in a frame is marked
      by the L status flag.  A mechanism is in place to reuse the pages
      owned by the driver (for Rx) for subsequent receptions.

      Supporting fixed size buffers allows the implementation of Rx S/G,
      which in turn removes the memory pressure issues the driver had
      before when the MTU was set for jumbo frame reception.
      Also, in most cases, the Rx path becomes faster due to Rx page
      reuse, since the overhead of allocating new Rx buffers is removed
      from the fast path.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
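The buffer arithmetic in this commit can be checked with a few lines of C. Using the numbers from the message (1536-byte buffers, 4KB pages split into halves, frames of up to 9600 bytes), a 9600-byte jumbo frame spans ceil(9600 / 1536) = 7 buffers. A frame of up to 1536 bytes fits in one. The constants mirror the text, while the macro and function names are hypothetical. The "flip to the other page half" reuse step is a simplified model of page reuse, not the driver's exact logic.

```c
#include <stdbool.h>

#define RX_PAGE_SIZE 4096u /* page size on eTSEC systems (from the text) */
#define RX_BUF_ALIGN 64u   /* buffer size must be a multiple of 64 */
#define RX_BUF_SIZE  1536u /* fixed buffer size given to eTSEC (MRBLR) */
#define RX_MAX_FRAME 9600u /* largest frame eTSEC receives (MAXFRM) */
#define RX_HALF_PAGE (RX_PAGE_SIZE / 2)

_Static_assert(RX_BUF_SIZE % RX_BUF_ALIGN == 0, "MRBLR must be a multiple of 64");
_Static_assert(RX_BUF_SIZE <= RX_HALF_PAGE, "a buffer must fit in half a page");

/* Number of Rx buffers (BDs) a frame of len bytes occupies. The first one
 * becomes the skb head, the rest are attached as fragments, and the last
 * one carries the L flag. Returns 0 for frames eTSEC would not accept. */
static unsigned rx_bufs_for_frame(unsigned len)
{
    if (len == 0 || len > RX_MAX_FRAME)
        return 0;
    return (len + RX_BUF_SIZE - 1) / RX_BUF_SIZE;
}

/* Simplified page reuse: each page holds two buffers, so after one half is
 * handed to the stack the driver switches to the other half instead of
 * allocating a fresh page. */
static unsigned rx_next_page_offset(unsigned offset)
{
    return offset ^ RX_HALF_PAGE;
}
```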
    • gianfar: Use ndev, more Rx path cleanup · f23223f1
      Authored by Claudiu Manoil
      Use "ndev" instead of "dev" as the rx queue back pointer
      to a net_device struct, to avoid name clashing with a
      "struct device" reference.  This prepares for the addition of a
      "struct device" back pointer to the rx queue structure.
      
      Remove duplicated rxq registration in the process.
      Move napi_gro_receive() outside gfar_process_frame().
      Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gianfar: Fix and cleanup rxbd status handling · f966082e
      Authored by Claudiu Manoil
      There are several long-standing problems with how the status
      field of the rx buffer descriptor (rxbd) is currently handled on
      the error path:
      - too many unnecessary 16bit reads of the two halves of the rxbd
      status field (32bit), also resulting in overuse of endianness
      conversion macros;
      - "bdp->status = RXBD_LARGE" makes no sense, since the "large"
      flag is read only (only eTSEC can write it), and trying to clear
      the other status bits is also error prone in this context
      (most of the rx status bits are read only anyway).

      This is fixed with a single 32bit read of the "status" field,
      after which the appropriate 16bit shifting is applied to access
      the various status bits or the rx frame length. The use of the
      RXBD_LARGE flag is also corrected.

      Additional fix:
      in case of an RXBD_OVERRUN occurrence, the "rx_over_errors" stat is
      incremented instead of "rx_crc_errors".
      Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
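The "single 32bit read, then shift" approach can be sketched as follows. The layout (status flags in the upper 16 bits, frame length in the lower 16 bits) is an assumption consistent with the description. The flag values are illustrative and should be checked against gianfar.h before relying on them. The counter names are borrowed from `struct net_device_stats`. Apart from the overrun fix stated in the commit, the mapping of flags to counters is an assumption.

```c
#include <stdint.h>

/* Assumed layout: status flags in the upper 16 bits of the 32-bit word,
 * frame length in the lower 16 bits. Flag values are illustrative. */
#define RXBD_LENGTH_MASK 0xFFFFu
#define RXBD_LARGE       0x0020u
#define RXBD_CRCERR      0x0004u
#define RXBD_OVERRUN     0x0002u

/* Counter names borrowed from struct net_device_stats. */
struct rx_err_stats {
    unsigned long rx_length_errors;
    unsigned long rx_crc_errors;
    unsigned long rx_over_errors;
};

static uint16_t rxbd_status(uint32_t lstatus)
{
    return (uint16_t)(lstatus >> 16);
}

static uint16_t rxbd_length(uint32_t lstatus)
{
    return (uint16_t)(lstatus & RXBD_LENGTH_MASK);
}

/* Account errors from one status word that was read once (a real driver
 * would do a single 32-bit, endian-converted load). The BD itself is never
 * written back, since most status bits are read only. */
static void rxbd_count_errors(uint32_t lstatus, struct rx_err_stats *st)
{
    uint16_t status = rxbd_status(lstatus);

    if (status & RXBD_LARGE)
        st->rx_length_errors++;
    if (status & RXBD_CRCERR)
        st->rx_crc_errors++;
    if (status & RXBD_OVERRUN)
        st->rx_over_errors++; /* overruns are not CRC errors */
}
```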
    • gianfar: Bundle Rx allocation, cleanup · 76f31e8b
      Authored by Claudiu Manoil
      Use a more common consumer/producer index design to improve
      rx buffer allocation.  Instead of allocating a single new buffer
      (skb) on each iteration, bundle the allocation of several rx
      buffers at a time.  This also opens the path for further memory
      optimizations.

      Remove the useless check of rxq->rfbptr, since this patch touches
      rx pause frame handling code as well.  rxq->rfbptr is always
      initialized as part of Rx BD ring init.
      Remove the redundant (and misleading) 'amount_pull' parameter.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
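A consumer/producer ring with bundled refill can be sketched as follows. The index names `next_to_clean` (consumer) and `next_to_alloc` (producer), the ring size and the batch threshold are hypothetical choices for illustration. One slot is kept empty so that equal indices always mean "no buffers posted". The key idea from the commit is that the poll loop only advances the consumer index. Buffers are replaced in one batch once enough slots are free, rather than one per received frame.

```c
#include <stdbool.h>

#define RING_SIZE 8u /* hypothetical ring length */

/* Consumer/producer indices over an Rx descriptor ring. */
struct rx_ring_model {
    unsigned next_to_clean; /* consumer: next BD the driver processes */
    unsigned next_to_alloc; /* producer: next BD to receive a new buffer */
    unsigned allocs;        /* buffers allocated so far (for illustration) */
};

/* Slots that currently have no buffer attached. */
static unsigned rx_ring_unused(const struct rx_ring_model *r)
{
    return (r->next_to_clean + RING_SIZE - r->next_to_alloc - 1) % RING_SIZE;
}

/* Refill every unused slot in one go (bundled allocation). */
static unsigned rx_ring_refill(struct rx_ring_model *r)
{
    unsigned n = rx_ring_unused(r);

    for (unsigned i = 0; i < n; i++) {
        /* a real driver would allocate and map a buffer into this BD */
        r->next_to_alloc = (r->next_to_alloc + 1) % RING_SIZE;
        r->allocs++;
    }
    return n;
}

/* Consume up to budget frames (pretending every posted BD already holds a
 * received frame), then refill only if at least batch slots are free. */
static unsigned rx_ring_poll(struct rx_ring_model *r, unsigned budget,
                             unsigned batch)
{
    unsigned done = 0;

    while (done < budget && r->next_to_clean != r->next_to_alloc) {
        r->next_to_clean = (r->next_to_clean + 1) % RING_SIZE;
        done++;
    }
    if (rx_ring_unused(r) >= batch)
        rx_ring_refill(r);
    return done;
}
```

Deferring the refill until `batch` slots are free amortizes allocation cost across several frames. That matches the commit's stated goal of removing per-frame allocation from the hot loop.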