1. 11 November 2014 (27 commits)
    • ixgbe: cleanup ixgbe_ndo_set_vf_vlan · 2b509c0c
      Committed by Don Skidmore
      Clean up functionality in ixgbe_ndo_set_vf_vlan that will simplify later
      patches.
      Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      2b509c0c
    • ixgbe: fix X540 Completion timeout · 71bde601
      Committed by Don Skidmore
      On topologies that include several levels of PCIe switching, the X540 can
      run into an unexpected completion error.  We work around this by waiting,
      after enabling loopback, long enough for the Tx Data Fetch to be sent.  We
      then poll the pending-transaction bit to ensure the completion was
      received.  Only then do we go on to clear the buffers.
      Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
      Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      71bde601
    • i40evf: don't use more queues than CPUs · cc052927
      Committed by Mitch Williams
      It's kind of silly to configure and attempt to use a bunch of queue
      pairs when you're running on a single (virtual) CPU. Instead of
      unconditionally configuring all of the queues that the PF gives us,
      clamp the number of queue pairs to the number of CPUs.
      
      Change-ID: I321714c9e15072ee76de8f95ab9a81f86ed347d1
      Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
      Signed-off-by: Patrick Lu <patrick.lu@intel.com>
      Tested-by: Jim Young <jamesx.m.young@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      cc052927
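      A minimal sketch of the clamping idea described above; the function and
      variable names are illustrative, not the i40evf driver's actual ones:

        /* Cap the number of queue pairs at the number of online CPUs: on a
         * single-(v)CPU guest there is no benefit in configuring more queues
         * than there are CPUs to service them.  Names are hypothetical.
         */
        #include <linux/kernel.h>
        #include <linux/cpumask.h>

        static u32 pick_num_queue_pairs(u32 pairs_offered_by_pf)
        {
                return min_t(u32, pairs_offered_by_pf, num_online_cpus());
        }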
    • i40evf: make early init processing more robust · f8d4db35
      Committed by Mitch Williams
      In early init, if we get an unexpected message from the PF (such as link
      status), we just kick an error back to the init task, causing it to
      restart its state machine and delaying initialization.
      
      Make the early init AQ message receive code more robust by handling
      messages in a loop, and ignoring those that we aren't interested in.
      This also gets rid of some scary log messages that really didn't
      indicate a problem.
      
      Change-ID: I620e8c72e49c49c665ef33eeab2425dd10e721cf
      Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
      Signed-off-by: Patrick Lu <patrick.lu@intel.com>
      Tested-by: Jim Young <jamesx.m.young@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      f8d4db35
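      A rough sketch of the "handle messages in a loop, ignore the rest" shape the
      commit describes; the structure and every helper, type and constant below are
      assumptions, not the actual i40evf code:

        /* Drain admin-queue events during early init: act only on the reply
         * we are waiting for and silently skip anything else (for example an
         * unsolicited link-status message), instead of treating it as an
         * error and restarting the init state machine.
         */
        #include <linux/errno.h>

        static int wait_for_init_reply(struct my_hw *hw)          /* hypothetical type */
        {
                struct my_aq_event event;                         /* hypothetical type */

                for (;;) {
                        if (read_next_aq_event(hw, &event))       /* assumed helper */
                                return -EAGAIN;                   /* queue empty, retry later */
                        if (event.opcode == EXPECTED_INIT_REPLY)  /* assumed constant */
                                return handle_init_reply(&event);
                        /* uninteresting message during init: ignore, keep looping */
                }
        }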
    • i40e: clean up throttle rate code · 79442d38
      Committed by Jesse Brandeburg
      The interrupt throttle rate minimum is actually 2us, so
      fix that define and, while we are there, remove some unused defines.
      
      Change some strings in the function so they wrap less and
      express the correct limits.
      
      Change-ID: I96829bbc77935e0b57c6f0fc1439fb4152b2960a
      Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Signed-off-by: Patrick Lu <patrick.lu@intel.com>
      Tested-by: Jim Young <jamesx.m.young@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      79442d38
    • i40e: don't do link_status or stats collection on every ARQ · 21536717
      Committed by Shannon Nelson
      The ARQ events cause a service_task execution, and we do a link_status
      check and full stats gathering for each service_task.  However, when
      there are a lot of ARQ events, such as when doing an NVM update, we end up
      doing tens if not hundreds of these per second, thereby heavily abusing the
      PCI bus and especially the firmware.  This patch adds a check to keep the
      service_task from running these periodic tasks more than once per second,
      while still allowing quick action to service the events.
      
      Change-ID: Iec7670c37bfae9791c43fec26df48aea7f70b33e
      Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
      Signed-off-by: Patrick Lu <patrick.lu@intel.com>
      Tested-by: Jim Young <jamesx.m.young@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      21536717
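      The "no more than once per second" throttle can be pictured with a standard
      jiffies comparison; a hedged sketch in which the type, field and helper names
      are made up rather than taken from the i40e driver:

        #include <linux/jiffies.h>

        /* Run the heavyweight periodic work (link check, full stats) at most
         * once per second, even if ARQ events schedule the service task far
         * more often than that.
         */
        static void service_task_periodic(struct my_pf *pf)   /* hypothetical type */
        {
                if (time_before(jiffies, pf->next_periodic))
                        return;                         /* ran less than a second ago */
                pf->next_periodic = jiffies + HZ;       /* HZ jiffies == 1 second */

                check_link_status(pf);                  /* assumed helpers */
                gather_stats(pf);
        }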
    • i40e: poll firmware slower · 0db4e162
      Committed by Kamil Krawczyk
      The code was polling the firmware tail register for completion every
      10 microseconds, which is far faster than the firmware can respond.
      This changes the poll interval to 1ms, which reduces polling CPU
      utilization and the number of times we loop.
      
      The maximum delay is still 100ms.
      
      Change-ID: I4bbfa6b66d802890baf8b4154061e55942b90958
      Signed-off-by: Kamil Krawczyk <kamil.krawczyk@intel.com>
      Acked-by: Shannon Nelson <shannon.nelson@intel.com>
      Tested-by: Jim Young <jamesx.m.young@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      0db4e162
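      A hedged sketch of the slower polling loop: wait about 1 ms between reads of
      the completion flag while keeping the same 100 ms overall budget (all names
      and the helper below are invented for illustration):

        #include <linux/types.h>
        #include <linux/delay.h>

        #define FW_POLL_INTERVAL_MS   1
        #define FW_POLL_TIMEOUT_MS    100

        static bool wait_for_fw_completion(struct my_hw *hw)  /* hypothetical type */
        {
                int waited_ms;

                for (waited_ms = 0; waited_ms < FW_POLL_TIMEOUT_MS;
                     waited_ms += FW_POLL_INTERVAL_MS) {
                        if (fw_command_completed(hw))         /* assumed helper */
                                return true;
                        usleep_range(1000, 2000);             /* ~1 ms between reads */
                }
                return false;                                 /* timed out */
        }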
    • mlx4: restore conditional call to napi_complete_done() · 2e1af7d7
      Committed by Eric Dumazet
      After commit 1a288172 ("mlx4: use napi_complete_done()") we ended up
      calling napi_complete_done() even when the NAPI poll consumed all of its
      budget.
      
      This added extra interrupt pressure; this patch restores the proper
      behavior.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Fixes: 1a288172 ("mlx4: use napi_complete_done()")
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2e1af7d7
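      The restored behaviour is the standard NAPI idiom: only complete the poll
      when it did not consume its whole budget. A generic sketch of that idiom,
      not the mlx4 code itself (the rx-processing helper is assumed):

        #include <linux/netdevice.h>

        static int example_napi_poll(struct napi_struct *napi, int budget)
        {
                int done = example_process_rx(napi, budget);  /* assumed helper */

                /* Only when we did less work than allowed may the poll be
                 * completed; otherwise stay in polling mode and get
                 * rescheduled instead of re-enabling interrupts.
                 */
                if (done < budget)
                        napi_complete_done(napi, done);
                return done;
        }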
    • Merge branch 'sunvnet-next' · d21385fa
      Committed by David S. Miller
      Sowmini Varadhan says:
      
      ====================
      sunvnet: edge-case/race-condition bug fixes
      
      This patch series contains fixes for race conditions in sunvnet
      that can be encountered when there is a difference in latency between
      producer and consumer.
      
      Patch 1 addresses a case when the STOPPED LDC ack from a peer is
      processed before vnet_start_xmit can finish updating the dr->prod
      state.
      
      Patch 2 fixes the edge case when outgoing data and an incoming
      stopped-ack cross each other in flight.
      
      Patch 3 adds a missing rcu_read_unlock(), found by code inspection.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d21385fa
    • sunvnet: Add missing rcu_read_unlock() in vnet_start_xmit · df20286a
      Committed by Sowmini Varadhan
      The out_dropped label will only do rcu_read_unlock for non-null port.
      So add the missing rcu_read_unlock() when bailing due to non-null port.
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      df20286a
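      The general shape of such a fix, shown as a generic sketch of keeping
      rcu_read_lock()/rcu_read_unlock() balanced on every exit path (this is not
      the sunvnet code; the port type and lookup helper are hypothetical):

        #include <linux/rcupdate.h>
        #include <linux/skbuff.h>

        static int example_xmit(struct sk_buff *skb)
        {
                struct example_port *port;                /* hypothetical type */

                rcu_read_lock();
                port = example_lookup_port_rcu(skb);      /* assumed helper */
                if (!port)
                        goto out_dropped;
                /* ... hand the skb to the port ... */
                rcu_read_unlock();
                return 0;

        out_dropped:
                rcu_read_unlock();       /* the unlock that is easy to miss */
                kfree_skb(skb);
                return 0;
        }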
    • sunvnet: vnet_ack() should check if !start_cons to send a missed trigger · 777362d7
      Committed by Sowmini Varadhan
      As per the comments in vnet_start_xmit, for the edge case
      when outgoing vnet_start_xmit() data and an incoming STOPPED
      ACK cross each other in flight, we may need to send the missed
      START trigger from maybe_tx_wakeup() after checking for a
      false value of start_cons.
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      777362d7
    • sunvnet: Fix race between vnet_start_xmit() and vnet_ack() · b0cffed5
      Committed by Sowmini Varadhan
      When vnet_start_xmit() is concurrent with vnet_ack(), we may
      have a race that looks like:
      
          thread 1                            thread 2
          vnet_start_xmit                     vnet_event_napi -> vnet_rx
      
          __vnet_tx_trigger for some desc X
          (at this point dr->prod == X)
                                              peer sends back a stopped ack for X;
                                              we process X, but X == dr->prod,
                                              so we bail out in vnet_ack with
                                              !idx_is_pending
          update dr->prod
      
      Because we never processed the stopped ack for X, the Tx path is led to
      incorrectly believe that the peer is still "started" and reading, when in
      fact the peer has stopped reading, which will ultimately end in
      flow-control assertions.
      
      The fix is to synchronize the above two paths on the netif_tx_lock.
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b0cffed5
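      A hedged sketch of the synchronization idea: have the ack-processing path
      take the same netdev Tx lock that the xmit path already runs under, so the
      dr->prod update and the ack handling can no longer interleave (names and the
      helper are illustrative, not the sunvnet code):

        #include <linux/netdevice.h>

        static void example_handle_stopped_ack(struct net_device *dev, u32 acked_idx)
        {
                netif_tx_lock(dev);                   /* xmit path holds this too */
                example_process_ack(dev, acked_idx);  /* assumed helper that looks at dr->prod */
                netif_tx_unlock(dev);
        }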
    • 8139too: Allow using the largest possible MTU · 6f6e741f
      Committed by Alban Bedel
      This driver allows an MTU of up to 1518 bytes, which is not enough to run
      batman-adv. Simply raise the maximum packet size up to the maximum
      allowed by the transmit descriptor, 1792 bytes, giving a maximum MTU
      of 1774 bytes.
      Signed-off-by: Alban Bedel <albeu@free.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6f6e741f
    • 8139too: Allow setting MTU larger than 1500 · ef786f10
      Committed by Alban Bedel
      Replace the default ndo_change_mtu callback with one that allows
      setting any MTU that the driver can handle.
      Signed-off-by: Alban Bedel <albeu@free.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ef786f10
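      A hedged sketch of a driver-specific ndo_change_mtu callback in the spirit of
      the two 8139too commits above: accept anything the hardware descriptor can
      carry. The 1792/1774-byte figures come from the commit messages; everything
      else here is illustrative:

        #include <linux/netdevice.h>
        #include <linux/if_ether.h>
        #include <linux/errno.h>

        #define EXAMPLE_MAX_FRAME_SIZE 1792    /* Tx descriptor limit */
        #define EXAMPLE_MAX_MTU \
                (EXAMPLE_MAX_FRAME_SIZE - ETH_HLEN - ETH_FCS_LEN)   /* 1774 */

        static int example_change_mtu(struct net_device *dev, int new_mtu)
        {
                if (new_mtu < 68 || new_mtu > EXAMPLE_MAX_MTU)
                        return -EINVAL;
                dev->mtu = new_mtu;
                return 0;
        }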
    • Merge tag 'master-2014-11-04' of... · b9217266
      Committed by David S. Miller
      Merge tag 'master-2014-11-04' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
      
      John W. Linville says:
      
      ====================
      pull request: wireless-next 2014-11-07
      
      Please pull this batch of updates intended for the 3.19 stream!
      
      For the mac80211 bits, Johannes says:
      
      "This relatively large batch of changes is comprised of the following:
       * large mac80211-hwsim changes from Ben, Jukka and a bit myself
       * OCB/WAVE/11p support from Rostislav on behalf of the Czech Technical
         University in Prague and Volkswagen Group Research
       * minstrel VHT work from Karl
       * more CSA work from Luca
       * WMM admission control support in mac80211 (myself)
       * various smaller fixes, spelling corrections, and minor API additions"
      
      For the Bluetooth bits, Johan says:
      
      "Here's the first bluetooth-next pull request for 3.19. The vast majority
      of patches are for ieee802154 from Alexander Aring with various fixes
      and cleanups. There are also several LE/SMP fixes as well as improved
      support for handling LE devices that have lost their pairing information
      (the patches from Alfonso). Jukka provides a couple of stability fixes
      for 6lowpan and Szymon conformance fixes for RFCOMM. For the HCI drivers
      we have one new USB ID for an Acer controller as well as a reset
      handling fix for H5."
      
      For the Atheros bits, Kalle says:
      
      "Major changes are:
      
      o ethtool support (Ben)
      
      o print dev string prefix with debug hex buffers dump (Michal)
      
      o debugfs file to read calibration data from the firmware for verification
        purposes (me)
      
      o fix fw_stats debugfs file, now results are more reliable (Michal)
      
      o firmware crash counters via debugfs (Ben&me)
      
      o various tracing points to debug firmware (Rajkumar)
      
      o make it possible to provide firmware calibration data via a file (me)
      
      And we have quite a lot of smaller fixes and clean up."
      
      For the iwlwifi bits, Emmanuel says:
      
      "The big new thing here is netdetect which allows the
      firmware to wake up the platform when a specific network
      is detected. Along with that I have fixes for d3 operation.
      The usual amount of rate scaling stuff - we now support STBC.
      The other commit that stands out is Johannes's work on
      devcoredump. He basically starts to use the standard
      infrastructure he built."
      
      Along with that are the usual sort of updates and such for ath9k,
      brcmfmac, wil6210, and a handful of other bits here and there...
      
      Please let me know if there are problems!
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b9217266
    • Merge branch 'raw_probe_proto_opt' · e344458f
      Committed by David S. Miller
      Herbert Xu says:
      
      ====================
      ipv4: Simplify raw_probe_proto_opt and avoid reading user iov twice
      
      This series rewrites the function raw_probe_proto_opt in a more
      readable fashion, and then fixes the long-standing bug where we
      read the probed bytes twice, which means that what we're using to
      probe may in fact be invalid.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e344458f
    • ipv4: Avoid reading user iov twice after raw_probe_proto_opt · c008ba5b
      Committed by Herbert Xu
      Ever since raw_probe_proto_opt was added it has had the problem of
      causing the user iov to be read twice, once during the probe for
      the protocol header and once again in ip_append_data.
      
      This is a potential security problem since it means that whatever
      we're probing may be invalid.  This patch plugs the hole by
      firstly advancing the iov so we don't read the same spot again,
      and secondly saving what we read the first time around for use
      by ip_append_data.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c008ba5b
    • ipv4: Use standard iovec primitive in raw_probe_proto_opt · 32b5913a
      Committed by Herbert Xu
      The function raw_probe_proto_opt tries to extract the first two
      bytes from the user input in order to seed the IPsec lookup for
      ICMP packets.  In doing so it processes the iovec by hand and
      overcomplicates things.
      
      This patch replaces the manual iovec processing with a call to
      memcpy_fromiovecend.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      32b5913a
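      The conversion the commit describes boils down to "copy the first two bytes
      (ICMP type and code) out of the user iovec with the stock helper". A hedged
      sketch of that shape, written against the msghdr/iovec layout of that era and
      not claiming to be the exact patch:

        #include <linux/socket.h>
        #include <linux/icmp.h>
        #include <net/flow.h>

        static int example_probe_icmp(struct flowi4 *fl4, struct msghdr *msg)
        {
                struct icmphdr icmph;
                int err;

                /* Only the first two bytes (type and code) are needed to seed
                 * the IPsec lookup.
                 */
                err = memcpy_fromiovecend((unsigned char *)&icmph,
                                          msg->msg_iov, 0, 2);
                if (err)
                        return err;

                fl4->fl4_icmp_type = icmph.type;
                fl4->fl4_icmp_code = icmph.code;
                return 0;
        }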
    • net: Move bonding headers under include/net · 1ef8019b
      Committed by David S. Miller
      This way drivers like cxgb4 don't need to do ugly relative includes.
      Reported-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1ef8019b
    • cxgb4: Remove unnecessary struct in6_addr * casts · 4483589f
      Committed by Joe Perches
      Just use the address of the in6_addr.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4483589f
    • Merge branch 'cxgb4-next' · c42e2533
      Committed by David S. Miller
      Hariprasad Shenai says:
      
      ====================
      RDMA/cxgb4,cxgb4vf,cxgb4i,csiostor: Cleanup macros
      
      This series moves the debugfs code to a new file, debugfs.c, and cleans up
      macros/register defines.
      
      Various patches have ended up changing the style of the symbolic
      macros/register defines, and some of them use macros/register defines that
      match the output of the script from the hardware team.
      
      As a result, the current kernel.org files are a mix of different macro
      styles.  Since these macro/register defines are used by five different
      drivers, a few patch series have ended up adding duplicate macro/register
      define entries with different styles.  This makes these register
      define/macro files a complete mess, and we want to make them clean and
      consistent.
      
      We will post a few more series covering the remaining macros so that they
      all follow the same style and are consistent.
      
      The patch series is created against the 'net-next' tree
      and includes patches for the cxgb4, cxgb4vf, iw_cxgb4, csiostor and cxgb4i
      drivers.
      
      We have included all the maintainers of the respective drivers.  Kindly
      review the changes and let us know in case of any review comments.
      
      V3: Use suffix instead of prefix for macros/register defines
      V2: Changed the description and cover-letter content to answer David
      Miller's question
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c42e2533
    • cxgb4: Cleanup macros so they follow the same style and look consistent, part 2 · e2ac9628
      Committed by Hariprasad Shenai
      Various patches have ended up changing the style of the symbolic
      macros/register defines to different styles.
      
      As a result, the current kernel.org files are a mix of different macro
      styles.  Since these macro/register defines are used by different drivers,
      a few patch series have ended up adding duplicate macro/register define
      entries with different styles.  This makes these register define/macro
      files a complete mess, and we want to make them clean and consistent.
      This patch cleans up a part of it.
      Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e2ac9628
    • cxgb4: Cleanup macros so they follow the same style and look consistent · 6559a7e8
      Committed by Hariprasad Shenai
      Various patches have ended up changing the style of the symbolic
      macros/register defines to different styles.
      
      As a result, the current kernel.org files are a mix of different macro
      styles.  Since these macro/register defines are used by different drivers,
      a few patch series have ended up adding duplicate macro/register define
      entries with different styles.  This makes these register define/macro
      files a complete mess, and we want to make them clean and consistent.
      This patch cleans up a part of it.
      Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6559a7e8
    • mlx4: use napi_complete_done() · 1a288172
      Committed by Eric Dumazet
      To enable gro_flush_timeout, a driver has to use napi_complete_done()
      instead of napi_complete().
      
      Tested:
       Ran 200 netperf TCP_STREAM from A to B (10Gbe mlx4 link, 8 RX queues)
      
      Without this feature, we send back about 305,000 ACKs per second.
      
      The GRO aggregation ratio is low (811/305 = 2.65 segments per GRO packet).
      
      Setting a timer of 2000 nsec is enough to increase GRO packet sizes
      and reduce the number of ACK packets (811/19.2 = 42).
      
      The receiver performs fewer calls into the upper stacks and fewer wakeups.
      This also reduces CPU usage on the sender, as it receives fewer ACK
      packets.
      
      Note that reducing the number of wakeups increases CPU efficiency, but can
      decrease QPS, as applications won't have the chance to warm up CPU caches
      by doing a partial read of RPC requests/answers if they fit in one skb.
      
      B:~# sar -n DEV 1 10 | grep eth0 | tail -1
      Average:         eth0 811269.80 305732.30 1199462.57  19705.72      0.00      0.00      0.50
      
      B:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout
      
      B:~# sar -n DEV 1 10 | grep eth0 | tail -1
      Average:         eth0 811577.30  19230.80 1199916.51   1239.80      0.00      0.00      0.50
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1a288172
    • net: gro: add a per device gro flush timer · 3b47d303
      Committed by Eric Dumazet
      Tuning coalescing parameters on a NIC can be really hard.
      
      Servers can handle both bulk and RPC-like traffic, with conflicting
      goals: bulk flows want GRO packets as big as possible, RPC wants minimal
      latencies.
      
      To reach big GRO packets on a 10GbE NIC, one can use:
      
      ethtool -C eth0 rx-usecs 4 rx-frames 44
      
      But this penalizes RPC sessions, with an increase of latencies of up to
      50% in some cases, as NICs generally do not force an interrupt when
      a packet with the TCP Push flag is received.
      
      Some NICs do not have an absolute timer, only a timer rearmed for every
      incoming packet.
      
      This patch uses a different strategy: let the GRO stack decide what to do,
      based on the traffic pattern.
      
      Packets with the Push flag won't be delayed.
      Packets without the Push flag might be held in the GRO engine, if we keep
      receiving data.
      
      This new mechanism is off by default, and shall be enabled by setting
      /sys/class/net/ethX/gro_flush_timeout to a value in nanosecond.
      
      To fully enable this mechanism, drivers should use napi_complete_done()
      instead of napi_complete().
      
      Tested:
       Ran 200 netperf TCP_STREAM from A to B (10Gbe mlx4 link, 8 RX queues)
      
      Without this feature, we send back about 305,000 ACKs per second.
      
      The GRO aggregation ratio is low (811/305 = 2.65 segments per GRO packet).
      
      Setting a timer of 2000 nsec is enough to increase GRO packet sizes
      and reduce the number of ACK packets (811/19.2 = 42).
      
      The receiver performs fewer calls into the upper stacks and fewer wakeups.
      This also reduces CPU usage on the sender, as it receives fewer ACK
      packets.
      
      Note that reducing the number of wakeups increases CPU efficiency, but can
      decrease QPS, as applications won't have the chance to warm up CPU caches
      by doing a partial read of RPC requests/answers if they fit in one skb.
      
      B:~# sar -n DEV 1 10 | grep eth0 | tail -1
      Average:         eth0 811269.80 305732.30 1199462.57  19705.72      0.00      0.00      0.50
      
      B:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout
      
      B:~# sar -n DEV 1 10 | grep eth0 | tail -1
      Average:         eth0 811577.30  19230.80 1199916.51   1239.80      0.00      0.00      0.50
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3b47d303
    • rtnetlink: add babel protocol recognition · be955b29
      Committed by Dave Taht
      Babel uses rt_proto 42. Add it to the userspace-visible header file.
      Signed-off-by: Dave Taht <dave.taht@bufferbloat.net>
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      be955b29
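      Concretely, the change amounts to reserving the value in the
      userspace-visible rtnetlink header; the entry looks roughly like this:

        /* include/uapi/linux/rtnetlink.h: routing protocol identifiers.
         * Babel advertises its routes with rt_proto 42, so tools such as
         * iproute2 can print a protocol name instead of a raw number.
         */
        #define RTPROT_BABEL    42      /* Babel daemon */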
  2. 09 November 2014 (1 commit)
  3. 08 November 2014 (10 commits)
  4. 07 November 2014 (2 commits)
    • 4e84b496
    • vxlan: Fix to enable UDP checksums on interface · 5c91ae08
      Committed by Tom Herbert
      Add definition to vxlan nla_policy for UDP checksum. This is necessary
      to enable UDP checksums on VXLAN.
      
      In some instances, enabling UDP checksums can improve performance on
      receive for devices that return legacy checksum-unnecessary for UDP/IP.
      Also, UDP checksum provides some protection against VNI corruption.
      
      Testing:
      
      Ran 200 instances of TCP_STREAM and TCP_RR on bnx2x.
      
      TCP_STREAM
        IPv4, without UDP checksums
            14.41% TX CPU utilization
            25.71% RX CPU utilization
            9083.4 Mbps
        IPv4, with UDP checksums
            13.99% TX CPU utilization
            13.40% RX CPU utilization
            9095.65 Mbps
      
      TCP_RR
        IPv4, without UDP checksums
            94.08% TX CPU utilization
            156/248/462 90/95/99% latencies
            1.12743e+06 tps
        IPv4, with UDP checksums
            94.43% TX CPU utilization
            158/250/462 90/95/99% latencies
            1.13345e+06 tps
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5c91ae08
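      A hedged sketch of what "add definition to vxlan nla_policy for UDP checksum"
      amounts to: the netlink attribute carrying the UDP-checksum flag gets a
      policy entry so it is validated and honoured when the interface is created
      (the table below is abridged and only indicative of the real one in
      drivers/net/vxlan.c):

        #include <net/netlink.h>
        #include <linux/if_link.h>

        /* Abridged: the real policy covers many more IFLA_VXLAN_* attributes. */
        static const struct nla_policy vxlan_policy_sketch[IFLA_VXLAN_MAX + 1] = {
                [IFLA_VXLAN_UDP_CSUM] = { .type = NLA_U8 },
        };

      With the attribute accepted, a checksummed VXLAN device can then be requested
      from userspace with something like
      "ip link add vxlan0 type vxlan id 42 dev eth0 dstport 4789 udpcsum"
      (the exact iproute2 keyword may vary by version).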