1. 04 Oct, 2014 13 commits
    • D
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next · 579899a9
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2014-10-02
      
      This series contains updates to fm10k, igb, ixgbe and i40e.
      
      Alex provides two updates to the fm10k driver.  First reduces the buffer
      size to 2k for all page sizes, since most frames only have a 1500 MTU
      so supporting a buffer size larger than this is somewhat wasteful.
      Second fixes an issue where the number of transmit queues was not being
      updated, so added the lines necessary to update the number of transmit
      queues.
      
      Rick Jones provides two patches to convert ixgbe, igb and i40e to use
      dev_consume_skb_any().
      
      Emil provides two patches for ixgbe.  The first cleans up a couple of wait
      loops on auto-negotiation that were not needed.  The second fixes an issue
      reported by Fujitsu/Red Hat by consolidating the logic behind
      dynamically setting TXDCTL.WTHRESH depending on the interrupt throttle
      rate (ITR) setting, regardless of BQL.
      
      Ethan Zhao provides a cleanup patch for ixgbe where he noticed a
      duplicate define.
      
      Bernhard Kaindl provides a patch for igb to remove a source of latency
      spikes by not calling code that uses mdelay() for feeding a PHY stat
      while being called with a spinlock held.
      
      Todd bumps the igb version based on the recent changes.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      Merge branch 'mlx5-next' · 48fea861
      David S. Miller authored
      Eli Cohen says:
      
      ====================
      mlx5 update for 3.18
      
      This series integrates a new mechanism for populating and extracting field values
      used in the driver/firmware interaction around command mailboxes.
      
      Changes from V1:
       - Remove unused definition of memcpy_cpu_to_be32()
       - Remove definitions of non_existent_*() and use BUILD_BUG_ON() instead.
       - Added a one line patch to add support for ConnectX-4 devices.
      
      Changes from V0:
       - trimmed the auto-generated file to a minimum, as required by the reviewers.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • E
      net/mlx5_core: Add ConnectX-4 to list of supported devices · f832dc82
      Eli Cohen authored
      Add the upcoming ConnectX-4 device to the list of devices supported by the
      mlx5 driver.
      Signed-off-by: Eli Cohen <eli@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • E
      net/mlx5_core: Identify resources by their type · 5903325a
      Eli Cohen authored
      This patch puts a common part as the first field of mlx5_core_qp. This field is
      used to identify which resource generated an event. This is required since upcoming
      new resource types such as DC targets are allocated from the same numerical space
      as regular QPs and may generate the same events. By searching for the resource in the
      same table we can then look at the common field to identify the resource.
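As a rough illustration of the idea (a Python model rather than kernel C, with every class and field name below hypothetical), each resource carries a small common header as its first member. The event handler looks the resource up by number in one shared table and then branches on the header's type:

```python
from dataclasses import dataclass
from enum import Enum


class ResType(Enum):
    QP = "qp"
    DCT = "dct"  # hypothetical tag for a DC target


@dataclass
class Common:
    """Header shared by every resource kind; always the first field."""
    res_type: ResType


@dataclass
class Qp:
    common: Common
    qpn: int


@dataclass
class DcTarget:
    common: Common
    dctn: int


def handle_event(table, rsn):
    """Look the resource up by number, then branch on the common header."""
    res = table.get(rsn)
    if res is None:
        return "unknown resource"
    if res.common.res_type is ResType.QP:
        return f"QP event for {res.qpn:#x}"
    return f"DC target event for {res.dctn:#x}"


# QPs and DC targets share one numerical space, hence one table.
table = {
    0x10: Qp(Common(ResType.QP), 0x10),
    0x11: DcTarget(Common(ResType.DCT), 0x11),
}
```

Because the header sits at the same position in every resource, the lookup code does not need to know the concrete type before inspecting it.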
      Signed-off-by: Eli Cohen <eli@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • E
      net/mlx5_core: use set/get macros in device caps · b775516b
      Eli Cohen authored
      Transform device capabilities related commands to use set/get macros to
      manipulate command mailboxes.
      Signed-off-by: Eli Cohen <eli@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • E
      net/mlx5_core: Use hardware registers description header file · d29b796a
      Eli Cohen authored
      Add an auto-generated header file that describes hardware registers along with
      a set of macros that set/get values. The macros do static checks to avoid
      overflow, handle endianness, and overall provide a clean way to code commands.
      Currently the header file is small and we will add structs as we make use of
      the macros.
      A few commands were removed from the commands enum since they are not supported
      currently and will be added when support is available.
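The general shape of such set/get helpers can be sketched in Python. This is only a model under stated assumptions: the field names and bit layout are hypothetical (not taken from the real header), and the overflow check runs at call time here, whereas the commit describes static checks.

```python
# Hypothetical layout: field name -> (bit offset from the start, width in bits).
LAYOUT = {
    "opcode": (0, 16),
    "op_mod": (48, 16),
}


def set_field(buf, name, value):
    """Store value into a big-endian mailbox buffer, rejecting overflow."""
    bit_off, width = LAYOUT[name]
    if value < 0 or value >> width:
        raise ValueError(f"{value:#x} does not fit in {width} bits")
    shift = len(buf) * 8 - bit_off - width
    word = int.from_bytes(buf, "big")
    mask = ((1 << width) - 1) << shift
    word = (word & ~mask) | (value << shift)
    buf[:] = word.to_bytes(len(buf), "big")


def get_field(buf, name):
    """Read a field back from the big-endian mailbox buffer."""
    bit_off, width = LAYOUT[name]
    shift = len(buf) * 8 - bit_off - width
    return (int.from_bytes(buf, "big") >> shift) & ((1 << width) - 1)


mailbox = bytearray(8)
set_field(mailbox, "opcode", 0x0100)
set_field(mailbox, "op_mod", 0x0002)
```

Callers name a field and pass a host-order value; byte order and masking stay inside the helpers, which is the convenience the commit message points at.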
      Signed-off-by: Eli Cohen <eli@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • E
      net/mlx5_core: Update device capabilities handling · c7a08ac7
      Eli Cohen authored
      Rearrange struct mlx5_caps so it has a "gen" field to represent the current
      capabilities configured for the device. Max capabilities can also be queried
      from the device. Also update capabilities struct to contain more fields as per
      the latest revision of the firmware specification.
      Signed-off-by: Eli Cohen <eli@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • E
      qdisc: validate skb without holding lock · 55a93b3e
      Eric Dumazet authored
      Validation of an skb can be pretty expensive: it may involve
      GSO segmentation and/or checksum computations.
      
      We can do this without holding qdisc lock, so that other cpus
      can queue additional packets.
      
      Trick is that requeued packets were already validated, so we carry
      a boolean so that sch_direct_xmit() can validate a fresh skb list,
      or directly use an old one.
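A single-threaded Python model of this flow, with hypothetical names throughout (it is not the kernel code): the transmit helper is entered with the queue lock held, drops it, validates only packets that have not been validated before, and re-takes the lock before returning.

```python
import threading

qdisc_lock = threading.Lock()
lock_held_during_validation = []  # records what validate() observed


def validate(skb):
    """Stand-in for GSO segmentation / checksum work."""
    lock_held_during_validation.append(qdisc_lock.locked())
    return dict(skb, validated=True)


def direct_xmit(skb, needs_validation, sent):
    """Entered with qdisc_lock held; returns with it held again."""
    qdisc_lock.release()          # other CPUs could enqueue from here on
    if needs_validation:          # fresh packet: do the expensive work now
        skb = validate(skb)
    sent.append(skb)              # the driver transmit would happen here
    qdisc_lock.acquire()
    return skb


sent = []
with qdisc_lock:
    direct_xmit({"id": 1}, True, sent)                      # fresh packet
    direct_xmit({"id": 2, "validated": True}, False, sent)  # requeued packet
```

The boolean is what lets a requeued packet skip the second validation pass.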
      
      Tested on 40Gb NIC (8 TX queues) and 200 concurrent flows, 48 threads
      host.
      
      Turning TSO on or off had no effect on throughput, only a few more CPU
      cycles. Lock contention on the qdisc lock disappeared.
      
      Same if disabling TX checksum offload.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • T
      net: ethernet: Remove superfluous ether_setup after alloc_etherdev · 6a05880a
      Tobias Klauser authored
      There is no need to call ether_setup after alloc_etherdev since it was
      already called there.
      
      Follow commits c706471b ("net: axienet: remove unnecessary
      ether_setup after alloc_etherdev") and 3c87dcbf ("net: ll_temac:
      Remove unnecessary ether_setup after alloc_etherdev") and fix the
      pattern in all remaining ethernet drivers.
      Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
      Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      Merge branch 'qdisc_bulk_dequeue' · c2bf5ec2
      David S. Miller authored
      Jesper Dangaard Brouer says:
      
      ====================
      qdisc: bulk dequeue support
      
      This patchset uses DaveM's recent API changes to dev_hard_start_xmit(),
      from the qdisc layer, to implement dequeue bulking.
      
      Patch01: "qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE"
       - Implement basic qdisc dequeue bulking
       - This time, 100% relying on BQL limits, no magic safe-guard constants
      
      Patch02: "qdisc: dequeue bulking also pickup GSO/TSO packets"
       - Extend bulking to bulk several GSO/TSO packets
       - Separate patch, as it introduces a small regression; see the test section.
      
      We do have a patch03, which exports a userspace tunable as a BQL
      tunable, that can byte-cap or disable the bulking/bursting.  But we
      could not agree on it internally, thus not sending it now.  We
      basically strive to avoid adding any new userspace tunable.
      
      Testing patch01:
      ================
      Demonstrating the performance improvement of qdisc dequeue bulking is
      tricky because the effect only "kicks in" once the qdisc system has a
      backlog. Thus, for a backlog to form, we need either 1) to exceed the wirespeed
      of the link or 2) to exceed the capability of the device driver.
      
      For practical use-cases, the measurable effect of this will be a
      reduction in CPU usage.
      
      01-TCP_STREAM:
      --------------
      Testing the effect for TCP involves disabling TSO and GSO, because TCP
      already benefits from bulking, via TSO and especially for GSO segmented
      packets.  This patch views TSO/GSO as a separate kind of bulking, and
      avoids further bulking of these packet types.
      
      The measured perf diff benefit (at 10Gbit/s) for a single netperf
      TCP_STREAM was 9.24% less CPU used on calls to _raw_spin_lock()
      (mostly from sch_direct_xmit).
      
      If my E5-2695v2(ES) CPU is tuned according to:
       http://netoptimizer.blogspot.dk/2014/04/basic-tuning-for-network-overload.html
      Then it is possible that a single netperf TCP_STREAM, with GSO and TSO
      disabled, can utilize all bandwidth on a 10Gbit/s link.  This will
      then cause a standing backlog queue at the qdisc layer.
      
      Trying to pressure the system some more CPU util wise, I'm starting
      24x TCP_STREAMs and monitoring the overall CPU utilization.  This
      confirms bulking saves CPU cycles when it "kicks-in".
      
      Tool mpstat, while stressing the system with netperf 24x TCP_STREAM, shows:
       * Disabled bulking: sys:2.58%  soft:8.50%  idle:88.78%
       * Enabled  bulking: sys:2.43%  soft:7.66%  idle:89.79%
      
      02-UDP_STREAM
      -------------
      The measured perf diff benefit for UDP_STREAM was 6.41% less CPU used
      on calls to _raw_spin_lock(), using 24x UDP_STREAM with packet size -m 1472 (to
      avoid sending UDP/IP fragments).
      
      03-trafgen driver test
      ----------------------
      The performance of the 10Gbit/s ixgbe driver is limited due to
      updating the HW ring-queue tail-pointer on every packet, as
      previously demonstrated with pktgen.
      
      Using trafgen to send RAW frames from userspace (via AF_PACKET), and
      forcing it through qdisc path (with option --qdisc-path and -t0),
      sending with 12 CPUs.
      
      I can demonstrate this driver layer limitation:
       * 12.8 Mpps with no qdisc bulking
       * 14.8 Mpps with qdisc bulking (full 10G-wirespeed)
      
      Testing patch02:
      ================
      Testing bulking of several GSO/TSO packets:
      
      Measuring HoL (Head-of-Line) blocking for TSO and GSO, with
      netperf-wrapper. Bulking several TSO packets shows no performance regressions
      (requeues were in the area of 32 requeues/sec for 10G while transmitting
      approx 813Kpps).
      
      Bulking several GSOs shows a small regression or a very small
      improvement (requeues were in the area of 8000 requeues/sec, for 10G
      while transmitting approx 813Kpps).
      
      Using ixgbe 10Gbit/s with GSO bulking, we can measure some additional
      latency. The base case, which is "normal" GSO bulking, sees a varying
      high-prio queue delay between 0.38ms and 0.47ms.  Bulking several GSOs
      together results in a stable high-prio queue delay of 0.50ms.
      
      Corresponding to:
       (10000*10^6)*((0.50-0.47)/10^3)/8 = 37500 bytes
       (10000*10^6)*((0.50-0.38)/10^3)/8 = 150000 bytes
       37500/1500  = 25 pkts
       150000/1500 = 100 pkts
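The conversion from extra queue delay to bytes and packets on the wire can be reproduced with a small Python helper. The function name is made up for illustration, and delays are passed in whole microseconds so that the arithmetic stays in integers:

```python
def delay_to_bytes(link_bps, delay_us):
    """Bytes a link running at link_bps bits/s carries in delay_us microseconds."""
    return link_bps * delay_us // (8 * 1_000_000)


TEN_GBIT = 10_000 * 10**6   # 10Gbit/s, as in the figures above
MTU = 1500

best_case = delay_to_bytes(TEN_GBIT, 500 - 470)    # 0.50ms vs 0.47ms
worst_case = delay_to_bytes(TEN_GBIT, 500 - 380)   # 0.50ms vs 0.38ms

print(best_case, best_case // MTU)     # 37500 25
print(worst_case, worst_case // MTU)   # 150000 100
```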
      
      Using igb at 100Mbit/s with GSO bulking shows an improvement.
      The base case sees a varying high-prio queue delay between 2.23ms and 2.35ms,
      a diff of 0.12ms corresponding to 1500 bytes at 100Mbit/s. Bulking
      several GSOs together results in a stable high-prio queue delay of
      2.23ms.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      qdisc: dequeue bulking also pickup GSO/TSO packets · 808e7ac0
      Jesper Dangaard Brouer authored
      The TSO and GSO segmented packets already benefit from bulking
      on their own.
      
      The TSO packets have always taken advantage of only updating
      the tailptr once for a large packet.
      
      The GSO segmented packets have recently taken advantage of
      bulking xmit_more API, via merge commit 53fda7f7 ("Merge
      branch 'xmit_list'"), specifically via commit 7f2e870f ("net:
      Move main gso loop out of dev_hard_start_xmit() into helper.")
      allowing qdisc requeue of remaining list.  And via commit
      ce93718f ("net: Don't keep around original SKB when we
      software segment GSO frames.").
      
      This patch allows further bulking of TSO/GSO packets together,
      when dequeueing from the qdisc.
      
      Testing:
      Measuring HoL (Head-of-Line) blocking for TSO and GSO, with
      netperf-wrapper. Bulking several TSO packets shows no performance regressions
      (requeues were in the area of 32 requeues/sec).
      
      Bulking several GSOs shows a small regression or a very small
      improvement (requeues were in the area of 8000 requeues/sec).
      
      Using ixgbe 10Gbit/s with GSO bulking, we can measure some additional
      latency. The base case, which is "normal" GSO bulking, sees a varying
      high-prio queue delay between 0.38ms and 0.47ms.  Bulking several GSOs
      together results in a stable high-prio queue delay of 0.50ms.
      
      Using igb at 100Mbit/s with GSO bulking shows an improvement.
      The base case sees a varying high-prio queue delay between 2.23ms and 2.35ms
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE · 5772e9a3
      Jesper Dangaard Brouer authored
      Based on DaveM's recent API work on dev_hard_start_xmit(), that allows
      sending/processing an entire skb list.
      
      This patch implements qdisc bulk dequeue, by allowing multiple packets
      to be dequeued in dequeue_skb().
      
      The optimization principle for this is twofold: (1) to amortize
      locking cost and (2) to avoid expensive tailptr updates for notifying HW.
       (1) Several packets are dequeued while holding the qdisc root_lock,
      amortizing locking cost over several packets.  The dequeued SKB list is
      processed under the TXQ lock in dev_hard_start_xmit(), thus also
      amortizing the cost of the TXQ lock.
       (2) Furthermore, dev_hard_start_xmit() will utilize the skb->xmit_more
      API to delay HW tailptr update, which also reduces the cost per
      packet.
      
      One restriction of the new API is that every SKB must belong to the
      same TXQ.  This patch takes the easy way out, by restricting bulk
      dequeue to qdiscs with the TCQ_F_ONETXQUEUE flag, which specifies that the
      qdisc has only a single TXQ attached.
      
      Some detail about the flow; dev_hard_start_xmit() will process the skb
      list, and transmit packets individually towards the driver (see
      xmit_one()).  In case the driver stops midway in the list, the
      remaining skb list is returned by dev_hard_start_xmit().  In
      sch_direct_xmit() this returned list is requeued by dev_requeue_skb().
      
      To avoid overshooting the HW limits, which results in requeuing, the
      patch limits the number of bytes dequeued, based on the driver's BQL
      limits.  In-effect bulking will only happen for BQL enabled drivers.
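A simplified Python model of byte-limited bulk dequeue follows. It is a sketch under assumptions, not the kernel implementation: packets are represented only by their length in bytes, `byte_budget` stands in for whatever limit BQL currently reports, and letting the last packet run past the budget is a choice made for this illustration.

```python
from collections import deque


def dequeue_bulk(queue, byte_budget):
    """Dequeue one packet, then keep dequeuing while budget remains.

    The first packet is always taken so the queue makes progress even
    with no budget at all; without a usable limit, no bulking happens.
    """
    if not queue:
        return []
    batch = [queue.popleft()]
    remaining = byte_budget - batch[0]
    while remaining > 0 and queue:
        pkt_len = queue.popleft()
        batch.append(pkt_len)
        remaining -= pkt_len
    return batch


backlog = deque([1500] * 5)
first = dequeue_bulk(backlog, 3000)   # budget covers two full-size frames
second = dequeue_bulk(backlog, 0)     # no budget: falls back to one packet
```

The whole batch would then be handed to the driver in one go, so the lock and tail-pointer costs are paid once per batch instead of once per packet.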
      
      Small amounts of extra HoL blocking (2x MTU/0.24ms) were
      measured at 100Mbit/s, with bulking 8 packets, but the
      oscillating nature of the measurement indicates that something like
      sched latency might be causing this effect. More comparisons
      show that this oscillation goes away occasionally. Thus, we
      disregard this artifact completely and remove any "magic" bulking
      limit.
      
      For now, as a conservative approach, stop bulking when seeing TSO and
      segmented GSO packets.  They already benefit from bulking on their own.
      A followup patch adds this, to allow easier bisect-ability for finding
      regressions.
      
      Joint work with Hannes, Daniel and Florian.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • M
      et131x: Add PCIe gigabit ethernet driver et131x to drivers/net · 38df6492
      Mark Einon authored
      This adds the ethernet driver for Agere et131x devices to
      drivers/net/ethernet.
      
      The driver being added has been in the staging tree for some time, and will be
      removed from there in a separate patch. This one merely disables the staging
      version to prevent two instances being built.
      Signed-off-by: Mark Einon <mark.einon@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 03 Oct, 2014 1 commit
  3. 02 Oct, 2014 26 commits
    • T
      igb: bump version to 5.2.15 · b5d130c4
      Todd Fujinaka authored
      Bump version
      Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • R
      i40e/igb: Convert to dev_consume_skb_any() · a81fb049
      Rick Jones authored
      Convert two more Intel NIC drivers to dev_consume_skb_any() to help
      make dropped packet profiling sane.
      Signed-off-by: Rick Jones <rick.jones2@hp.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Tested-by: Jim Young <jamesx.m.young@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • B
      igb: remove blocking phy read from inside spinlock · 7acf6318
      Bernhard Kaindl authored
      Remove a source of latency spikes (in my case up to 10ms) by not calling
      code that uses mdelay() for feeding a phy statistic (rx errors for idle
      symbols - not data -> idle_errors) while being called with a spinlock held.
      
      As idle_errors isn't read, this patch only removes unused code and data.
      
      Later, more complicated changes may be applied to address the spinlock and
      allow for some PHY diagnostics by harvesting this PHY stats register fully.
      
      This patch is designed to fix the issue and be safe for longterm/stable.
      
      For the Intel e1000e driver, the same change was applied in 2008 with
      commit 23033fad ("e1000e: remove phy read from inside spinlock").
      
      The mdelay is triggered by HW/SW semaphores, thus it depends on the HW.
      
      I have HW that triggers it even when idle. Others may trigger it only e.g.
      when Ethernet ports acquire or lose the link or on ifconfig up / down.
      We've noticed this first from delays in frame rx/tx due to the mdelay().
      
      Example command for checking if the issue is triggered: cyclictest -Smp1
      (Look for occasional "Max:" values > 4000 or use -b 4000 to stop if greater)
      
      It was observed with I350 ports connected to other I350 ports, but not
      if the driver and EEPROM were modified to run the I350 in EEPROM-less mode.
      
      phy_stats.idle_errors and .receive_errors (which isn't touched) occupy 64
      unused bits in the adapter struct; their allocation may be removed as well.
      
      Cc: Carolyn Wyborny <carolyn.wyborny@intel.com>
      Cc: Todd Fujinaka <todd.fujinaka@intel.com>
      Fixes: 12dcd86b ("igb: fix stats handling") (this added the spin_lock)
      Signed-off-by: Bernhard Kaindl <bk-linux@use.startmail.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • E
      ixgbe: delete one duplicate marcro definition of IXGBE_MAX_L2A_QUEUES · 3463de10
      Ethan Zhao authored
      There is a typo in ixgbe.h: two macro definitions of IXGBE_MAX_L2A_QUEUES as 4.
      Delete one to clear the compiler warning.
      Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
      Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • E
      ixgbe: fix setting of TXDCTL.WTRHESH when ITR is set to 0 and no BQL · ffefa9f6
      Emil Tantilov authored
      This patch consolidates the logic behind dynamically setting TXDCTL.WTHRESH
      depending on interrupt throttle rate (ITR) setting regardless of BQL.
      
      Previously TXDCTL.WTHRESH was dynamically being set only with BQL being
      enabled, but we have to set it regardless of BQL when ITR is low to avoid
      Tx stalls/hangs.
      
      CC: John Greene <jogreene@redhat.com>
      Reported by: Masayuki Gouji <gouji.masayuki@jp.fujitsu.com>
      Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
      Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • E
      ixgbe: remove wait loop on autoneg for copper devices · 340c5203
      Emil Tantilov authored
      This patch removes a couple of wait loops on autoneg that are not needed.
      
      During validation we noticed that the loops always time out, so there
      should be no user impact.
      Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
      Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • R
      ixgbe: Convert the normal transmit complete path to dev_consume_skb_any() · fe1f2a97
      Rick Jones authored
      Convert the normal packet completion path to dev_consume_skb_any() so
      packet drop profiling via dropwatch or perf top -G -e skb_kfree_skb
      is not cluttered with false hits.
      
      Compile tested only.  There is a dev_kfree_skb_any() in the routine
      ixgbe_ptp_tx_hwtstamp() in ixgbe_ptp.c that looks like a conversion
      candidate but I wasn't familiar enough with the code to pull the
      trigger.
      Signed-off-by: Rick Jones <rick.jones2@hp.com>
      Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • A
      fm10k: Correctly set the number of Tx queues · c9d49940
      Alexander Duyck authored
      The number of Tx queues was not being updated due to some issues when
      generating the patches.  This change makes sure to add the lines necessary
      to update the number of Tx queues correctly.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • A
      fm10k: Reduce buffer size when pages are larger than 4K · fd333962
      Alexander Duyck authored
      This change reduces the buffer size to 2K for all page sizes.  The basic
      idea is that since most frames only have a 1500 MTU supporting a buffer
      size larger than this is somewhat wasteful.  As such I have reduced the
      size to 2K for all page sizes which will allow for more uses per page.
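The "more uses per page" point is plain division. A tiny Python check, where the page sizes are example values chosen for illustration rather than figures taken from the driver:

```python
BUF_SIZE = 2048  # 2K receive buffer, as described above


def buffers_per_page(page_size):
    """How many fixed-size receive buffers fit into one page."""
    return page_size // BUF_SIZE


for page_size in (4096, 16384, 65536):
    print(page_size, buffers_per_page(page_size))
```

With a fixed 2K buffer, larger pages yield proportionally more buffers each instead of larger buffers that a 1500-byte frame would mostly leave empty.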
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 50dddff3
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Don't halt the firmware in r8152 driver, from Hayes Wang.
      
       2) Handle full sized 802.1ad frames in bnx2 and tg3 drivers properly,
          from Vlad Yasevich.
      
       3) Don't sleep while holding tx_clean_lock in netxen driver, fix from
          Manish Chopra.
      
       4) Certain kinds of ipv6 routes can end up endlessly failing the route
          validation test, causing it to be re-looked up over and over again.
          This particularly kills input route caching in TCP sockets.  Fix
          from Hannes Frederic Sowa.
      
       5) netvsc_start_xmit() has a use-after-free access to skb->len, fix
          from K Y Srinivasan.
      
       6) Fix matching of inverted containers in ematch module, from Ignacy
          Gawędzki.
      
       7) Aggregation of GRO frames via SKB ->frag_list for linear skbs isn't
          handled properly, regression fix from Eric Dumazet.
      
       8) Don't test return value of ipv4_neigh_lookup(), which returns an
          error pointer, against NULL.  From WANG Cong.
      
       9) Fix an old regression where we mistakenly allow a double add of the
          same tunnel.  Fixes from Steffen Klassert.
      
      10) macvtap device delete and open can run in parallel and corrupt lists
          etc., fix from Vlad Yasevich.
      
      11) Fix build error with IPV6=m NETFILTER_XT_TARGET_TPROXY=y, from Pablo
          Neira Ayuso.
      
      12) rhashtable_destroy() triggers lockdep splats, fix also from Pablo.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (32 commits)
        bna: Update Maintainer Email
        r8152: disable power cut for RTL8153
        r8152: remove clearing bp
        bnx2: Correctly receive full sized 802.1ad fragmes
        tg3: Allow for recieve of full-size 8021AD frames
        r8152: fix setting RTL8152_UNPLUG
        netxen: Fix bug in Tx completion path.
        netxen: Fix BUG "sleeping function called from invalid context"
        ipv6: remove rt6i_genid
        hyperv: Fix a bug in netvsc_start_xmit()
        net: stmmac: fix stmmac_pci_probe failed when CONFIG_HAVE_CLK is selected
        ematch: Fix matching of inverted containers.
        gro: fix aggregation for skb using frag_list
        neigh: check error pointer instead of NULL for ipv4_neigh_lookup()
        ip6_gre: Return an error when adding an existing tunnel.
        ip6_vti: Return an error when adding an existing tunnel.
        ip6_tunnel: Return an error when adding an existing tunnel.
        ip6gre: add a rtnl link alias for ip6gretap
        net/mlx4_core: Allow not to specify probe_vf in SRIOV IB mode
        r8152: fix the carrier off when autoresuming
        ...
    • R
      bna: Update Maintainer Email · 439e9575
      Rasesh Mody authored
      Update the maintainer email for BNA driver.
      Signed-off-by: Rasesh Mody <rasesh.mody@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • P
      d068b02c
    • P
      net: bcmgenet: fix bcmgenet_put_tx_csum() · bc23333b
      Petri Gynther authored
      bcmgenet_put_tx_csum() needs to return skb pointer back to the caller
      because it reallocates a new one in case of lack of skb headroom.
      Signed-off-by: Petri Gynther <pgynther@google.com>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • A
      net: pktgen: packet bursting via skb->xmit_more · 38b2cf29
      Alexei Starovoitov authored
      This patch demonstrates the effect of delaying update of HW tailptr.
      (based on earlier patch by Jesper)
      
      burst=1 is the default. It sends one packet with xmit_more=false
      burst=2 sends one packet with xmit_more=true and
              2nd copy of the same packet with xmit_more=false
      burst=3 sends two copies of the same packet with xmit_more=true and
              3rd copy with xmit_more=false
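The pattern in the list above can be written as a short Python helper. The function name is made up for illustration; it only models which xmit_more value each copy in a burst carries:

```python
def xmit_more_flags(burst):
    """xmit_more value for each copy in one burst: True for all but the last."""
    if burst < 1:
        raise ValueError("burst must be at least 1")
    return [True] * (burst - 1) + [False]


for burst in (1, 2, 3):
    print(burst, xmit_more_flags(burst))
```

Only the final copy carries xmit_more=false, so the hardware tail pointer is updated once per burst rather than once per packet.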
      
      Performance with ixgbe (usec 30):
      burst=1  tx:9.2 Mpps
      burst=2  tx:13.5 Mpps
      burst=3  tx:14.5 Mpps full 10G line rate
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • F
      net: bridge: add a br_set_state helper function · 775dd692
      Florian Fainelli authored
      In preparation for being able to propagate port states to e.g: notifiers
      or other kernel parts, do not manipulate the port state directly, but
      instead use a helper function which will allow us to do a bit more than
      just setting the state.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • W
      net_sched: avoid calling tcf_unbind_filter() in call_rcu callback · a0efb80c
      WANG Cong authored
      This fixes the following crash:
      
      [   63.976822] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      [   63.980094] CPU: 1 PID: 15 Comm: ksoftirqd/1 Not tainted 3.17.0-rc6+ #648
      [   63.980094] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [   63.980094] task: ffff880117dea690 ti: ffff880117dfc000 task.ti: ffff880117dfc000
      [   63.980094] RIP: 0010:[<ffffffff817e6d07>]  [<ffffffff817e6d07>] u32_destroy_key+0x27/0x6d
      [   63.980094] RSP: 0018:ffff880117dffcc0  EFLAGS: 00010202
      [   63.980094] RAX: ffff880117dea690 RBX: ffff8800d02e0820 RCX: 0000000000000000
      [   63.980094] RDX: 0000000000000001 RSI: 0000000000000002 RDI: 6b6b6b6b6b6b6b6b
      [   63.980094] RBP: ffff880117dffcd0 R08: 0000000000000000 R09: 0000000000000000
      [   63.980094] R10: 00006c0900006ba8 R11: 00006ba100006b9d R12: 0000000000000001
      [   63.980094] R13: ffff8800d02e0898 R14: ffffffff817e6d4d R15: ffff880117387a30
      [   63.980094] FS:  0000000000000000(0000) GS:ffff88011a800000(0000) knlGS:0000000000000000
      [   63.980094] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [   63.980094] CR2: 00007f07e6732fed CR3: 000000011665b000 CR4: 00000000000006e0
      [   63.980094] Stack:
      [   63.980094]  ffff88011a9cd300 ffffffff82051ac0 ffff880117dffce0 ffffffff817e6d68
      [   63.980094]  ffff880117dffd70 ffffffff810cb4c7 ffffffff810cb3cd ffff880117dfffd8
      [   63.980094]  ffff880117dea690 ffff880117dea690 ffff880117dfffd8 000000000000000a
      [   63.980094] Call Trace:
      [   63.980094]  [<ffffffff817e6d68>] u32_delete_key_freepf_rcu+0x1b/0x1d
      [   63.980094]  [<ffffffff810cb4c7>] rcu_process_callbacks+0x3bb/0x691
      [   63.980094]  [<ffffffff810cb3cd>] ? rcu_process_callbacks+0x2c1/0x691
      [   63.980094]  [<ffffffff817e6d4d>] ? u32_destroy_key+0x6d/0x6d
      [   63.980094]  [<ffffffff810780a4>] __do_softirq+0x142/0x323
      [   63.980094]  [<ffffffff810782a8>] run_ksoftirqd+0x23/0x53
      [   63.980094]  [<ffffffff81092126>] smpboot_thread_fn+0x203/0x221
      [   63.980094]  [<ffffffff81091f23>] ? smpboot_unpark_thread+0x33/0x33
      [   63.980094]  [<ffffffff8108e44d>] kthread+0xc9/0xd1
      [   63.980094]  [<ffffffff819e00ea>] ? do_wait_for_common+0xf8/0x125
      [   63.980094]  [<ffffffff8108e384>] ? __kthread_parkme+0x61/0x61
      [   63.980094]  [<ffffffff819e43ec>] ret_from_fork+0x7c/0xb0
      [   63.980094]  [<ffffffff8108e384>] ? __kthread_parkme+0x61/0x61
      
      tp could be freed in call_rcu callback too, the order is not guaranteed.
      
      John Fastabend says:
      
      ====================
      It's worth noting why this is safe. Any running schedulers will either
      read the valid class field or it will be zeroed.
      
      All schedulers today when the class is 0 do a lookup using the
      same call used by the tcf_exts_bind(). So even if we have a running
      classifier hit the null class pointer it will do a lookup and get
      to the same result. This is particularly fragile at the moment because
      the only way to verify this is to audit the schedulers' call sites.
      ====================
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: NJohn Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a0efb80c
    • W
      net_sched: fix another crash in cls_tcindex · 6e056569
      WANG Cong committed
      This patch fixes the following crash:
      
      [  166.670795] BUG: unable to handle kernel NULL pointer dereference at           (null)
      [  166.674230] IP: [<ffffffff814b739f>] __list_del_entry+0x5c/0x98
      [  166.674230] PGD d0ea5067 PUD ce7fc067 PMD 0
      [  166.674230] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      [  166.674230] CPU: 1 PID: 775 Comm: tc Not tainted 3.17.0-rc6+ #642
      [  166.674230] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [  166.674230] task: ffff8800d03c4d20 ti: ffff8800cae7c000 task.ti: ffff8800cae7c000
      [  166.674230] RIP: 0010:[<ffffffff814b739f>]  [<ffffffff814b739f>] __list_del_entry+0x5c/0x98
      [  166.674230] RSP: 0018:ffff8800cae7f7d0  EFLAGS: 00010207
      [  166.674230] RAX: 0000000000000000 RBX: ffff8800cba8d700 RCX: ffff8800cba8d700
      [  166.674230] RDX: 0000000000000000 RSI: dead000000200200 RDI: ffff8800cba8d700
      [  166.674230] RBP: ffff8800cae7f7d0 R08: 0000000000000001 R09: 0000000000000001
      [  166.674230] R10: 0000000000000000 R11: 000000000000859a R12: ffffffffffffffe8
      [  166.674230] R13: ffff8800cba8c5b8 R14: 0000000000000001 R15: ffff8800cba8d700
      [  166.674230] FS:  00007fdb5f04a740(0000) GS:ffff88011a800000(0000) knlGS:0000000000000000
      [  166.674230] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [  166.674230] CR2: 0000000000000000 CR3: 00000000cf929000 CR4: 00000000000006e0
      [  166.674230] Stack:
      [  166.674230]  ffff8800cae7f7e8 ffffffff814b73e8 ffff8800cba8d6e8 ffff8800cae7f828
      [  166.674230]  ffffffff817caeec 0000000000000046 ffff8800cba8c5b0 ffff8800cba8c5b8
      [  166.674230]  0000000000000000 0000000000000001 ffff8800cf8e33e8 ffff8800cae7f848
      [  166.674230] Call Trace:
      [  166.674230]  [<ffffffff814b73e8>] list_del+0xd/0x2b
      [  166.674230]  [<ffffffff817caeec>] tcf_action_destroy+0x4c/0x71
      [  166.674230]  [<ffffffff817ca0ce>] tcf_exts_destroy+0x20/0x2d
      [  166.674230]  [<ffffffff817ec2b5>] tcindex_delete+0x196/0x1b7
      
      A struct list_head cannot simply be copied; it must always be initialized.
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6e056569
    • D
      Merge branch 'udp_gso' · 25e379c4
      David S. Miller committed
      Tom Herbert says:
      
      ====================
      udp: Generalize GSO for UDP tunnels
      
      This patch set generalizes the UDP tunnel segmentation functions so
      that they can work with various protocol encapsulations. The primary
      change is to set the inner_protocol field in the skbuff when creating
      the encapsulated packet, and then in skb_udp_tunnel_segment this data
      is used to determine the function for segmenting the encapsulated
      packet. The inner_protocol field is overloaded to take either an
      Ethertype or IP protocol.
      
      The inner_protocol is set on transmit using skb_set_inner_ipproto or
      skb_set_inner_protocol functions. VXLAN and IP tunnels (for fou GSO)
      were modified to call these.
      
      Notes:
        - GSO for GRE/UDP where GRE checksum is enabled does not work.
          Handling this will require some special case code.
        - Software GSO now supports many varieties of encapsulation with
          SKB_GSO_UDP_TUNNEL{_CSUM}. We still need a mechanism to query
          for device support of particular combinations (I intend to
          add ndo_gso_check for that).
        - MPLS seems to be the only previous user of inner_protocol. I don't
          believe these patches can affect that. For supporting GSO with
          MPLS over UDP, the inner_protocol should be set using the
          helper functions in this patch.
        - GSO for L2TP/UDP should also be straightforward now.
      
      v2:
        - Respin for Eric's restructuring of skbuff.
      
      Tested GRE, IPIP, and SIT over fou as well as VXLAN. This was
      done using 200 TCP_STREAMs in netperf.
      
       GRE
          IPv4, FOU, UDP checksum enabled
            TCP_STREAM TSO enabled on tun interface
              14.04% TX CPU utilization
              13.17% RX CPU utilization
              9211 Mbps
            TCP_STREAM TSO disabled on tun interface
              27.82% TX CPU utilization
              25.41% RX CPU utilization
              9336 Mbps
          IPv4, FOU, UDP checksum disabled
            TCP_STREAM TSO enabled on tun interface
              13.14% TX CPU utilization
              23.18% RX CPU utilization
              9277 Mbps
            TCP_STREAM TSO disabled on tun interface
              30.00% TX CPU utilization
              31.28% RX CPU utilization
              9327 Mbps
      
        IPIP
          FOU, UDP checksum enabled
            TCP_STREAM TSO enabled on tun interface
              15.28% TX CPU utilization
              13.92% RX CPU utilization
              9342 Mbps
            TCP_STREAM TSO disabled on tun interface
              27.82% TX CPU utilization
              25.41% RX CPU utilization
              9336 Mbps
          FOU, UDP checksum disabled
            TCP_STREAM TSO enabled on tun interface
              15.08% TX CPU utilization
              24.64% RX CPU utilization
              9226 Mbps
            TCP_STREAM TSO disabled on tun interface
              30.00% TX CPU utilization
              31.28% RX CPU utilization
              9327 Mbps
      
        SIT
          FOU, UDP checksum enabled
            TCP_STREAM TSO enabled on tun interface
              14.47% TX CPU utilization
              14.58% RX CPU utilization
              9106 Mbps
            TCP_STREAM TSO disabled on tun interface
              31.82% TX CPU utilization
              30.82% RX CPU utilization
              9204 Mbps
          FOU, UDP checksum disabled
            TCP_STREAM TSO enabled on tun interface
              15.70% TX CPU utilization
              27.93% RX CPU utilization
              9097 Mbps
            TCP_STREAM TSO disabled on tun interface
              33.48% TX CPU utilization
              37.36% RX CPU utilization
              9197 Mbps
      
         VXLAN
            TCP_STREAM TSO enabled on tun interface
              16.42% TX CPU utilization
              23.66% RX CPU utilization
              9081 Mbps
            TCP_STREAM TSO disabled on tun interface
              30.32% TX CPU utilization
              30.55% RX CPU utilization
              9185 Mbps
      
         Baseline (no encap, TSO and LRO enabled)
            TCP_STREAM
              11.85% TX CPU utilization
              15.13% RX CPU utilization
              9452 Mbps
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      25e379c4
    • T
      vxlan: Set inner protocol before transmit · 996c9fd1
      Tom Herbert committed
      Call skb_set_inner_protocol to set inner Ethernet protocol to
      ETH_P_TEB before transmit. This is needed for GSO with UDP tunnels.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      996c9fd1
    • T
      gre: Set inner protocol in v4 and v6 GRE transmit · 54bc9bac
      Tom Herbert committed
      Call skb_set_inner_protocol to set inner Ethernet protocol to
      the protocol being encapsulated by GRE before tunnel_xmit. This is
      needed for GSO if UDP encapsulation (fou) is being done.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      54bc9bac
    • T
      ipip: Set inner IP protocol in ipip · 077c5a09
      Tom Herbert committed
      Call skb_set_inner_ipproto to set inner IP protocol to IPPROTO_IPV4
      before tunnel_xmit. This is needed if UDP encapsulation (fou) is
      being done.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      077c5a09
    • T
      sit: Set inner IP protocol in sit · 469471cd
      Tom Herbert committed
      Call skb_set_inner_ipproto to set inner IP protocol to IPPROTO_IPV6
      before tunnel_xmit. This is needed if UDP encapsulation (fou) is
      being done.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      469471cd
    • T
      udp: Generalize skb_udp_segment · 8bce6d7d
      Tom Herbert committed
      skb_udp_segment is the function called from udp4_ufo_fragment to
      segment a UDP tunnel packet. This function currently assumes
      segmentation is transparent Ethernet bridging (i.e. VXLAN
      encapsulation). This patch generalizes the function to
      operate on either Ethertype or IP protocol.
      
      The inner_protocol field must be set to the protocol of the inner
      header. This can now be either an Ethertype or an IP protocol
      (in a union). A new flag in the skbuff indicates which type is
      effective. skb_set_inner_protocol and skb_set_inner_ipproto
      helper functions were added to set the inner_protocol. These
      functions are called from the point where the tunnel encapsulation
      is occurring.
      
      When skb_udp_tunnel_segment is called, the function to segment the
      inner packet is selected based on the inner IP or Ethertype. In the
      case of an IP protocol encapsulation, the function is derived from
      inet[6]_offloads. In the case of Ethertype, skb->protocol is
      set to the inner_protocol and skb_mac_gso_segment is called. (GRE
      currently does this, but it might be possible to look up the protocol
      in offload_base and call the appropriate segmentation function
      directly).
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8bce6d7d
    • D
      Merge branch 'bpf-next' · f44d61cd
      David S. Miller committed
      Alexei Starovoitov says:
      
      ====================
      bpf: add search pruning optimization and tests
      
      patch #1 commit log explains why eBPF verifier has to examine some
      instructions multiple times and describes the search pruning optimization
      that improves verification speed for branchy programs and allows more
      complex programs to be verified successfully.
      This patch completes the core verifier logic.
      
      patch #2 adds more verifier tests related to branches and search pruning
      
      I'm still working on Andy's 'bitmask for stack slots' suggestion. It will be
      done on top of this patch.
      
      The current verifier algorithm is brute force depth first search with
      state pruning. If anyone can come up with another algorithm that demonstrates
      better results, we'll replace the algorithm without affecting user space.
      
      Note verifier doesn't guarantee that all possible valid programs are accepted.
      Overly complex programs may still be rejected.
      Verifier improvements/optimizations will guarantee that if a program
      was passing verification in the past, it will still be passing.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f44d61cd
    • A
      bpf: add tests to verifier testsuite · fd10c2ef
      Alexei Starovoitov committed
      add 4 extra tests to cover jump verification better
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fd10c2ef
    • A
      bpf: add search pruning optimization to verifier · f1bca824
      Alexei Starovoitov committed
      Consider a C program represented in eBPF:
      int filter(int arg)
      {
          int a, b, c, *ptr;
      
          if (arg == 1)
              ptr = &a;
          else if (arg == 2)
              ptr = &b;
          else
              ptr = &c;
      
          *ptr = 0;
          return 0;
      }
      The eBPF verifier has to follow all possible paths through the program
      to recognize that the '*ptr = 0' instruction would be safe to execute
      in all situations.
      It does so by picking a path towards the end and observing changes
      to registers and stack at every insn until it reaches bpf_exit.
      Then it comes back to one of the previous branches and goes towards
      the end again with potentially different values in registers.
      When a program has a lot of branches, the number of possible combinations
      of branches is huge, so the verifier has a hard limit of walking no more
      than 32k instructions. This limit can be reached, and complex (but valid)
      programs could be rejected. Therefore it's important to recognize equivalent
      verifier states to prune this depth-first search.
      
      The basic idea can be illustrated by the following program (where .. are some eBPF insns):
          1: ..
          2: if (rX == rY) goto 4
          3: ..
          4: ..
          5: ..
          6: bpf_exit
      In the first pass towards bpf_exit the verifier will walk insns 1, 2, 3, 4, 5, 6.
      Since insn#2 is a branch, the verifier will remember its state in the verifier
      stack to come back to it later.
      Since insn#4 is marked as 'branch target', the verifier will remember its state
      in the explored_states[4] linked list.
      Once it reaches insn#6 successfully, it will pop the state recorded at insn#2 and
      continue.
      Without the search pruning optimization the verifier would have to walk 4, 5, 6
      again, effectively simulating execution of insns 1, 2, 4, 5, 6.
      With search pruning it will check whether the state at #4 after jumping from #2
      is equivalent to one recorded in explored_states[4] during the first pass.
      If there is an equivalent state, the verifier can prune the search at #4 and declare
      this path to be safe as well.
      In other words, two states at #4 are equivalent if execution of insns 1, 2, 3, 4
      and of insns 1, 2, 4 produces equivalent registers and stack.
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f1bca824