1. 25 Jan, 2018 1 commit
    • igb: Allow to remove administratively set MAC on VFs · 177132df
      Committed by Corinna Vinschen
      Before libvirt modifies the MAC address and vlan tag for an SRIOV VF
      for use by a virtual machine (either using vfio device assignment or
      macvtap passthru mode), it saves the current MAC address and vlan tag
      so that it can reset them to their original value when the guest is
      done.  Libvirt can't leave the VF MAC set to the value used by the
      now-defunct guest since it may be started again later using a
      different VF, but it certainly shouldn't just pick any random value,
      either. So it saves the state of everything prior to using the VF, and
      resets it to that.
      
      The igb driver initializes the MAC addresses of all VFs to
      00:00:00:00:00:00, and reports that when asked (via an RTM_GETLINK
      netlink message, also visible in the list of VFs in the output of "ip
      link show"). But when libvirt attempts to restore the MAC address back
      to 00:00:00:00:00:00 (using an RTM_SETLINK netlink message) the kernel
      responds with "Invalid argument".
      
      Forbidding a reset back to the original value leaves the VF MAC at the
      value set for the now-defunct virtual machine. Especially on a system
      with NetworkManager enabled, this has very bad consequences, since
      NetworkManager forces all interfaces to be IFF_UP all the time - if
      the same virtual machine is restarted using a different VF (or even on
      a different host), there will be multiple interfaces watching for
      traffic with the same MAC address.
      
      To allow libvirt to revert to the original state, we need a way to
      remove the administratively set MAC on a VF, to allow normal host
      operation again, and to reset/overwrite the VF MAC via VF netdev.
      
      This patch implements the outlined scenario by allowing the VF MAC
      to be set to 00:00:00:00:00:00 via RTM_SETLINK on the PF.
      igb_ndo_set_vf_mac resets the IGB_VF_FLAG_PF_SET_MAC flag to 0,
      so it's possible to reset the VF MAC back to the original value via
      the VF netdev.
      
      Note: Recent patches to libvirt allow for a workaround if the NIC
      isn't capable of resetting the administrative MAC back to all 0, but
      in theory the NIC should allow resetting the MAC in the first place.
      Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
      Tested-by: Aaron Brown <arron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
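The behaviour described in this commit can be sketched in plain C. This is a simplified user-space model, not the driver code: `struct vf_data`, `VF_FLAG_PF_SET_MAC` and `set_vf_mac()` are hypothetical stand-ins for the driver's per-VF state, `IGB_VF_FLAG_PF_SET_MAC` and `igb_ndo_set_vf_mac`.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6
#define VF_FLAG_PF_SET_MAC 0x1u /* hypothetical stand-in for IGB_VF_FLAG_PF_SET_MAC */

/* Hypothetical per-VF state kept by the PF. */
struct vf_data {
    uint8_t mac[ETH_ALEN];
    unsigned int flags;
};

static bool is_zero_mac(const uint8_t *mac)
{
    static const uint8_t zero[ETH_ALEN];
    return memcmp(mac, zero, ETH_ALEN) == 0;
}

/* A unicast address has the lowest bit of the first octet clear. */
static bool is_unicast_mac(const uint8_t *mac)
{
    return (mac[0] & 0x01) == 0;
}

/*
 * Model of the PF-side handler: an all-zero MAC removes the
 * administrative setting; any other unicast MAC establishes it.
 */
static int set_vf_mac(struct vf_data *vf, const uint8_t *mac)
{
    if (is_zero_mac(mac)) {
        vf->flags &= ~VF_FLAG_PF_SET_MAC;
        memset(vf->mac, 0, ETH_ALEN);
        return 0;
    }
    if (!is_unicast_mac(mac))
        return -EINVAL;
    vf->flags |= VF_FLAG_PF_SET_MAC;
    memcpy(vf->mac, mac, ETH_ALEN);
    return 0;
}
```

With the flag cleared, the VF is again free to change its own MAC through the VF netdev, which is the state libvirt wants to restore.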
  2. 22 Nov, 2017 1 commit
  3. 08 Nov, 2017 1 commit
  4. 28 Oct, 2017 1 commit
    • igb: Add support for CBS offload · 05f9d3e1
      Committed by Andre Guedes
      This patch adds support for Credit-Based Shaper (CBS) qdisc offload
      from the Traffic Control system. This support enables us to leverage
      the Forwarding and Queuing for Time-Sensitive Streams (FQTSS) features
      of the Intel i210 Ethernet Controller. FQTSS is the former 802.1Qav
      standard, which was merged into 802.1Q in 2014. It enables traffic
      prioritization and bandwidth reservation via the Credit-Based Shaper,
      which is implemented in hardware by the i210 controller.
      
      The patch introduces the igb_setup_tc() function which implements the
      support for CBS qdisc hardware offload in the IGB driver. CBS offload
      is the only traffic control offload supported by the driver at the
      moment.
      
      The FQTSS transmission mode of the i210 controller is automatically
      enabled by the IGB driver when CBS is enabled for the first hardware
      queue. Likewise, FQTSS mode is automatically disabled when CBS is
      disabled for the last hardware queue. Changing FQTSS mode requires a
      NIC reset.
      
      The FQTSS feature is supported by the i210 controller only.
      Signed-off-by: Andre Guedes <andre.guedes@intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Tested-by: Henrik Austad <henrik@austad.us>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
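For readers unfamiliar with the Credit-Based Shaper, the parameters a CBS qdisc is configured with can be derived roughly as below. This is a sketch following the formulas given in the tc-cbs(8) manual page, not code from the igb driver; `struct cbs_params` and `cbs_compute()` are hypothetical names, rates are in kbit/s and sizes in bytes.

```c
#include <stdint.h>

/* Hypothetical container for the four CBS qdisc parameters. */
struct cbs_params {
    int64_t idleslope_kbps; /* rate at which credits accumulate while waiting */
    int64_t sendslope_kbps; /* rate at which credits drain while sending (negative) */
    int64_t hicredit_bytes; /* upper bound on accumulated credits */
    int64_t locredit_bytes; /* lower bound on credits (negative) */
};

static struct cbs_params cbs_compute(int64_t idleslope_kbps,
                                     int64_t port_rate_kbps,
                                     int64_t max_frame_bytes,
                                     int64_t max_interference_bytes)
{
    struct cbs_params p;

    p.idleslope_kbps = idleslope_kbps;
    /* While a frame is sent, credits drain at the difference to line rate. */
    p.sendslope_kbps = idleslope_kbps - port_rate_kbps;
    /* Credits gained while the largest interfering frame blocks the port. */
    p.hicredit_bytes = max_interference_bytes * idleslope_kbps / port_rate_kbps;
    /* Credits lost while the largest frame of this class is sent. */
    p.locredit_bytes = max_frame_bytes * p.sendslope_kbps / port_rate_kbps;
    return p;
}
```

For example, reserving 20000 kbit/s on a 1 Gbit/s port with 1500-byte frames gives a sendslope of -980000 kbit/s, a hicredit of 30 bytes and a locredit of -1470 bytes.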
  5. 26 Oct, 2017 1 commit
  6. 25 Oct, 2017 1 commit
    • locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns... · 6aa7de05
      Committed by Mark Rutland
      locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()
      
      Please do not apply this to mainline directly, instead please re-run the
      coccinelle script shown below and apply its output.
      
      For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
      preference to ACCESS_ONCE(), and new code is expected to use one of the
      former. So far, there's been no reason to change most existing uses of
      ACCESS_ONCE(), as these aren't harmful, and changing them results in
      churn.
      
      However, for some features, the read/write distinction is critical to
      correct operation. To distinguish these cases, separate read/write
      accessors must be used. This patch migrates (most) remaining
      ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
      coccinelle script:
      
      ----
      // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
      // WRITE_ONCE()
      
      // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
      
      virtual patch
      
      @ depends on patch @
      expression E1, E2;
      @@
      
      - ACCESS_ONCE(E1) = E2
      + WRITE_ONCE(E1, E2)
      
      @ depends on patch @
      expression E;
      @@
      
      - ACCESS_ONCE(E)
      + READ_ONCE(E)
      ----
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: davem@davemloft.net
      Cc: linux-arch@vger.kernel.org
      Cc: mpe@ellerman.id.au
      Cc: shuah@kernel.org
      Cc: snitzer@redhat.com
      Cc: thor.thayer@linux.intel.com
      Cc: tj@kernel.org
      Cc: viro@zeniv.linux.org.uk
      Cc: will.deacon@arm.com
      Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
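The effect of the two coccinelle rules on C code can be illustrated with a small user-space example. The `READ_ONCE()`/`WRITE_ONCE()` macros below are simplified stand-ins written with the GNU `__typeof__` extension (the kernel's real definitions are more elaborate), and `struct ring` with its helpers is a hypothetical example, not code touched by this commit.

```c
#include <stdint.h>

/* Simplified stand-ins for the kernel macros (GCC/Clang only). */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical shared structure. */
struct ring {
    uint32_t head;
    uint32_t tail;
};

/* Before the conversion: ACCESS_ONCE(r->tail) = next; */
static void ring_publish(struct ring *r, uint32_t next)
{
    WRITE_ONCE(r->tail, next);
}

/* Before the conversion: ACCESS_ONCE(r->tail) - ACCESS_ONCE(r->head) */
static uint32_t ring_pending(struct ring *r)
{
    return READ_ONCE(r->tail) - READ_ONCE(r->head);
}
```

The first rule matches only assignments, so every remaining `ACCESS_ONCE(E)` is a read and is covered by the second rule.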
  7. 18 Oct, 2017 1 commit
  8. 11 Oct, 2017 1 commit
  9. 09 Aug, 2017 4 commits
    • igb: do not drop PF mailbox lock after read of VF message · 46b3bb9b
      Committed by Greg Edwards
      When the PF receives a mailbox message from the VF, it grabs the mailbox
      lock, reads the VF message from the mailbox, ACKs the message and drops
      the lock.
      
      While the PF is performing the action for the VF message, nothing
      prevents another VF message from being posted to the mailbox.  The
      current code handles this condition by just dropping any new VF messages
      without processing them.  This results in a mailbox timeout in the VM
      for posted messages waiting for an ACK, and the VF is reset by the
      igbvf_watchdog_task in the VM.
      
      Given the right sequence of VF messages and mailbox timeouts, this
      condition can go on ad infinitum.
      
      Modify the PF mailbox read method to take an 'unlock' argument that
      optionally leaves the mailbox locked by the PF after reading the VF
      message.  This ensures another VF message is not posted to the mailbox
      until after the PF has completed processing the VF message and written
      its reply.
      Signed-off-by: Greg Edwards <gedwards@ddn.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
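The idea behind the 'unlock' argument can be shown with a toy model of the mailbox. Everything here is an assumption made for illustration: `struct mailbox`, `vf_post()`, `pf_read()` and `pf_reply()` are hypothetical names, and the real hardware lock and ACK semantics are more involved.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mailbox shared by one VF and the PF. */
struct mailbox {
    bool pf_locked;   /* PF currently owns the mailbox */
    bool msg_pending; /* a VF message is waiting to be read */
    uint32_t msg;
};

/* VF side: posting fails while the PF holds the lock. */
static bool vf_post(struct mailbox *mbx, uint32_t msg)
{
    if (mbx->pf_locked || mbx->msg_pending)
        return false;
    mbx->msg = msg;
    mbx->msg_pending = true;
    return true;
}

/* PF side: read and ACK the message; keep the lock unless 'unlock' is set. */
static bool pf_read(struct mailbox *mbx, uint32_t *out, bool unlock)
{
    if (!mbx->msg_pending)
        return false;
    mbx->pf_locked = true;
    *out = mbx->msg;
    mbx->msg_pending = false; /* ACK */
    if (unlock)
        mbx->pf_locked = false;
    return true;
}

/* PF side: writing the reply releases the lock. */
static void pf_reply(struct mailbox *mbx)
{
    mbx->pf_locked = false;
}
```

Calling `pf_read()` with `unlock` set to false keeps a second VF message out of the mailbox until `pf_reply()` has run, so nothing is silently dropped while the first message is being processed.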
    • igb: Remove incorrect "unexpected SYS WRAP" log message · 2643e6e9
      Committed by Corinna Vinschen
      TSAUXC.DisableSystime is never set, so SYSTIM runs into a SYS WRAP
      every 1100 secs on 82580/i350/i354 (40 bit SYSTIM) and every 35000
      secs on 82576 (45 bit SYSTIM).
      
      This wrap event sets the TSICR.SysWrap bit unconditionally.
      
      However, checking TSIM at interrupt time shows that this event does not
      actually cause the interrupt.  Rather, it's just bycatch while the
      actual interrupt is caused by, for instance, TSICR.TXTS.
      
      The conclusion is that the SYS WRAP is actually expected, so the
      "unexpected SYS WRAP" message is entirely bogus and just helps to
      confuse users.  Drop it.
      Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
      Acked-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
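The wrap intervals quoted in this commit follow from the counter width, assuming the counter advances once per nanosecond (an assumption that is consistent with the quoted figures). `systim_wrap_seconds()` is a hypothetical helper used only for this arithmetic.

```c
#include <stdint.h>

/* Seconds until a counter of 'bits' bits, counting nanoseconds, wraps. */
static double systim_wrap_seconds(unsigned int bits)
{
    return (double)(UINT64_C(1) << bits) / 1e9;
}
```

A 40-bit counter wraps after 2^40 ns, about 1099.5 seconds, and a 45-bit counter after 2^45 ns, about 35184 seconds, which the commit message rounds to 1100 and 35000.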
    • igb: protect TX timestamping from API misuse · 26bd4e2d
      Committed by Cliff Spradlin
      HW timestamping can only be requested for a packet if the NIC is first
      set up via ioctl(SIOCSHWTSTAMP). If this step was skipped, then the igb
      driver still allowed TX packets to request HW timestamping. In this
      situation, the _IGB_PTP_TX_IN_PROGRESS flag was set and would never
      clear. This prevented any future HW timestamping requests from succeeding.
      
      Fix this by checking that the NIC is configured for HW TX timestamping
      before accepting a HW TX timestamping request.
      Signed-off-by: Cliff Spradlin <cspradlin@google.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
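The added check can be modelled as a guard placed in front of the in-progress flag. This is a sketch under stated assumptions, not the driver's transmit path: `struct adapter_model`, `enum tx_tstamp_mode` and `request_tx_tstamp()` are hypothetical, with `tx_in_progress` standing in for the in-progress state bit.

```c
#include <stdbool.h>

/* Hypothetical TX timestamping mode, as configured via SIOCSHWTSTAMP. */
enum tx_tstamp_mode { TX_TSTAMP_OFF, TX_TSTAMP_ON };

struct adapter_model {
    enum tx_tstamp_mode tx_mode;
    bool tx_in_progress; /* models the TX-in-progress state bit */
};

/* Returns true if the packet will be hardware-timestamped. */
static bool request_tx_tstamp(struct adapter_model *a, bool pkt_wants_tstamp)
{
    if (!pkt_wants_tstamp)
        return false;
    /* The added check: refuse if the NIC was never configured. */
    if (a->tx_mode != TX_TSTAMP_ON)
        return false;
    /* Only one timestamp request can be outstanding at a time. */
    if (a->tx_in_progress)
        return false;
    a->tx_in_progress = true;
    return true;
}
```

Without the mode check, an unconfigured NIC would set `tx_in_progress` for a timestamp that never arrives, and the flag would stay set forever.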
    • igb: Fix error of RX network flow classification · 94221ae7
      Committed by Gangfeng Huang
      After adding an ethertype filter, if the user changes the adapter speed
      several times, the error "ethtool -N: etype filters are all used" is
      reported by the igb driver.
      
      In the older patch, the functions igb_nfc_filter_exit() and
      igb_nfc_filter_restore() are not paired. igb_nfc_filter_restore() is
      called in igb_up(), but igb_nfc_filter_exit() is called in __igb_close().
      During a speed change, only igb_nfc_filter_restore() is called, so each
      change takes up another position in the ethertype bitmap.
      
      Steps to reproduce:
      Step 1: Add an etype filter with ethtool
      $ ethtool -N eth0 flow-type ether proto 0x88F8 action 1
      Step 2: Change the adapter speed to 100M/full duplex
      $ ethtool -s eth0 speed 100 duplex full
      Step 3: Change the adapter speed to 1000M/full duplex
      $ ethtool -s eth0 speed 1000 duplex full
      Repeat steps 2 and 3, then check the system log with dmesg. The error
      message appears, and adding a new ethertype filter also fails.
      
      The fix moves igb_nfc_filter_exit() from __igb_close() to igb_down()
      so that igb_nfc_filter_restore() and igb_nfc_filter_exit() are paired.
      Signed-off-by: Gangfeng Huang <gangfeng.huang@ni.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
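Why unpaired restore/exit calls exhaust the ethertype slots can be seen in a small model. The names and the slot count are assumptions for illustration: `MAX_ETYPE_SLOTS`, `struct nfc_model`, `struct etype_filter`, `filter_restore()` and `filter_exit()` are hypothetical and only mimic the roles of igb_nfc_filter_restore() and igb_nfc_filter_exit().

```c
#include <stdbool.h>

#define MAX_ETYPE_SLOTS 3 /* hypothetical number of hardware slots */

struct nfc_model {
    bool slot_used[MAX_ETYPE_SLOTS]; /* models the ethertype bitmap */
};

struct etype_filter {
    int slot; /* -1 while the filter is not programmed */
};

/* Program the filter into the first free slot; fails when none is left. */
static bool filter_restore(struct nfc_model *m, struct etype_filter *f)
{
    for (int i = 0; i < MAX_ETYPE_SLOTS; i++) {
        if (!m->slot_used[i]) {
            m->slot_used[i] = true;
            f->slot = i;
            return true;
        }
    }
    return false; /* corresponds to "etype filters are all used" */
}

/* Release the slot held by the filter, if any. */
static void filter_exit(struct nfc_model *m, struct etype_filter *f)
{
    if (f->slot >= 0) {
        m->slot_used[f->slot] = false;
        f->slot = -1;
    }
}
```

If every down/up cycle calls `filter_exit()` before `filter_restore()`, the same slot is reused indefinitely. If only `filter_restore()` runs, as happened during speed changes, each cycle leaks one slot until the call fails.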
  10. 08 Jun, 2017 1 commit
  11. 06 Jun, 2017 5 commits
  12. 21 Apr, 2017 3 commits
  13. 18 Mar, 2017 11 commits
  14. 19 Jan, 2017 1 commit
    • net: Remove usage of net_device last_rx member · 4a7c9726
      Committed by Tobias Klauser
      The network stack no longer uses the last_rx member of struct net_device
      since the bonding driver switched to use its own private last_rx in
      commit 9f242738 ("bonding: use last_arp_rx in slave_last_rx()").
      
      However, some drivers still (ab)use the field for their own purposes and
      some drivers just update it without actually using it.
      
      Previously, there was an accompanying comment for the last_rx member
      added in commit 4dc89133 ("net: add a comment on netdev->last_rx")
      which asked drivers not to update it unless really needed. However,
      this comment was removed in commit f8ff080d ("bonding: remove
      useless updating of slave->dev->last_rx"), so some drivers added later
      on still did update last_rx.
      
      Remove all usage of last_rx and switch three drivers (sky2, atp and
      smc91c92_cs) which actually read and write it to use their own private
      copy in netdev_priv.
      
      Compile-tested with allyesconfig and allmodconfig on x86 and arm.
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Cc: Mirko Lindner <mlindner@marvell.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 11 Jan, 2017 1 commit
  16. 09 Jan, 2017 1 commit
  17. 06 Jan, 2017 3 commits
    • igb: close/suspend race in netif_device_detach · 9474933c
      Committed by Todd Fujinaka
      Similar to ixgbe, when an interface is part of a namespace it is
      possible that igb_close() may be called while __igb_shutdown() is
      running which ends up in a double free WARN and/or a BUG in
      free_msi_irqs().
      
      Extend the rtnl_lock() to protect the call to netif_device_detach() and
      igb_clear_interrupt_scheme() in __igb_shutdown() and check for
      netif_device_present() to avoid calling igb_clear_interrupt_scheme() a
      second time in igb_close().
      
      Also extend the rtnl lock in igb_resume() to netif_device_attach().
      Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
      Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • igb: re-assign hw address pointer on reset after PCI error · 69b97cf6
      Committed by Guilherme G Piccoli
      Whenever the igb driver detects that a read operation returned a value
      composed only of F's (like 0xFFFFFFFF), it detaches the net_device,
      clears the hw_addr pointer and warns the user that the adapter's link
      is lost. These steps happen in igb_rd32().
      
      When a PCI error happens on the Power architecture, a recovery
      mechanism called EEH resets the PCI slot and calls the driver's
      handlers to reset the adapter and network functionality as well.
      
      We observed that once hw_addr is set to NULL after the error is detected
      in igb_rd32(), it is never assigned back, so in the process of resetting
      the network functionality we got a NULL pointer dereference in both
      igb_configure_tx_ring() and igb_configure_rx_ring(). To avoid this bug,
      this patch re-assigns the hw_addr value in the slot_reset handler.
      Reported-by: Anthony H Thai <ahthai@us.ibm.com>
      Reported-by: Harsha Thyagaraja <hathyaga@in.ibm.com>
      Signed-off-by: Guilherme G Piccoli <gpiccoli@linux.vnet.ibm.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • igb: use igb_adapter->io_addr instead of e1000_hw->hw_addr · 629823b8
      Committed by Cao jin
      When running as a guest, under certain conditions, the kernel oopses as
      shown below. The writel() in igb_configure_tx_ring() oopses because
      hw->hw_addr is NULL. Other register accesses do not oops the kernel
      because they use wr32/rd32, which guard against a NULL pointer.
      
          [  141.225449] pcieport 0000:00:1c.0: AER: Multiple Uncorrected (Fatal)
          error received: id=0101
          [  141.225523] igb 0000:01:00.1: PCIe Bus Error:
          severity=Uncorrected (Fatal), type=Unaccessible,
          id=0101(Unregistered Agent ID)
          [  141.299442] igb 0000:01:00.1: broadcast error_detected message
          [  141.300539] igb 0000:01:00.0 enp1s0f0: PCIe link lost, device now
          detached
          [  141.351019] igb 0000:01:00.1 enp1s0f1: PCIe link lost, device now
          detached
          [  143.465904] pcieport 0000:00:1c.0: Root Port link has been reset
          [  143.465994] igb 0000:01:00.1: broadcast slot_reset message
          [  143.466039] igb 0000:01:00.0: enabling device (0000 -> 0002)
          [  144.389078] igb 0000:01:00.1: enabling device (0000 -> 0002)
          [  145.312078] igb 0000:01:00.1: broadcast resume message
          [  145.322211] BUG: unable to handle kernel paging request at
          0000000000003818
          [  145.361275] IP: [<ffffffffa02fd38d>]
          igb_configure_tx_ring+0x14d/0x280 [igb]
          [  145.400048] PGD 0
          [  145.438007] Oops: 0002 [#1] SMP
      
      A similar issue and solution can be found at:
          http://patchwork.ozlabs.org/patch/689592/
      Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
      Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
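The difference between the guarded register accessors and a raw write through a stored base pointer can be sketched as follows. This is a user-space model with hypothetical names (`struct hw_model`, `struct nic_model`, `wr32_guarded()`, `ring_tail()`); it only mirrors the roles of hw_addr, io_addr and the wr32 helper mentioned in this commit and in the hw_addr re-assignment commit above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* hw_addr may be cleared to NULL when the PCIe link is lost. */
struct hw_model {
    volatile uint32_t *hw_addr;
};

/* io_addr keeps the mapped base address for the lifetime of the mapping. */
struct nic_model {
    volatile uint32_t *io_addr;
    struct hw_model hw;
};

/* Guarded write in the spirit of wr32: silently skipped when detached. */
static bool wr32_guarded(struct hw_model *hw, size_t reg, uint32_t val)
{
    volatile uint32_t *base = hw->hw_addr;

    if (base == NULL)
        return false;
    base[reg] = val;
    return true;
}

/* Ring register pointer derived from io_addr, so it is never NULL-based. */
static volatile uint32_t *ring_tail(struct nic_model *nic, size_t reg)
{
    return nic->io_addr + reg;
}
```

Deriving ring pointers from `io_addr` avoids computing an address from a NULL `hw_addr`, which is what produced the small faulting address (0x3818) in the oops above.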
  18. 15 Dec, 2016 2 commits
    • igb: update code to better handle incrementing page count · bd4171a5
      Committed by Alexander Duyck
      Update the driver code so that we do bulk updates of the page reference
      count instead of just incrementing it by one reference at a time.  The
      advantage to doing this is that we cut down on atomic operations and
      this in turn should give us a slight improvement in cycles per packet.
      In addition if we eventually move this over to using build_skb the gains
      will be more noticeable.
      
      Link: http://lkml.kernel.org/r/20161110113616.76501.17072.stgit@ahduyck-blue-test.jf.intel.com
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
      Cc: Helge Deller <deller@gmx.de>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Keguang Zhang <keguang.zhang@gmail.com>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Steven Miao <realmz6@gmail.com>
      Cc: Tobias Klauser <tklauser@distanz.ch>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
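One way to do bulk reference updates is to take a batch of references with a single atomic operation and hand them out from a driver-local counter. The commit message does not spell out the mechanism, so the sketch below is an assumption: `struct rx_page_model`, `BULK_REFS`, `page_model_init()` and `hand_to_stack()` are hypothetical, and C11 atomics stand in for the kernel's page reference helpers.

```c
#include <stdatomic.h>

#define BULK_REFS 64u /* hypothetical batch size */

struct rx_page_model {
    atomic_uint refcount;    /* shared reference count, updated atomically */
    unsigned int bias;       /* spare references held locally by the driver */
    unsigned int atomic_ops; /* number of atomic updates, for illustration */
};

static void page_model_init(struct rx_page_model *p)
{
    atomic_init(&p->refcount, 1); /* the driver's own reference */
    p->bias = 0;
    p->atomic_ops = 0;
}

/* Give one reference to the network stack. */
static void hand_to_stack(struct rx_page_model *p)
{
    if (p->bias == 0) {
        /* Refill: one atomic operation pays for BULK_REFS handoffs. */
        atomic_fetch_add(&p->refcount, BULK_REFS);
        p->bias = BULK_REFS;
        p->atomic_ops++;
    }
    p->bias--; /* non-atomic, touched only by the driver */
}
```

In this model, 100 handoffs cost two atomic operations instead of 100, which is the kind of saving in cycles per packet the commit message refers to.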
    • igb: update driver to make use of DMA_ATTR_SKIP_CPU_SYNC · 5be59554
      Committed by Alexander Duyck
      The ARM architecture provides a mechanism for deferring cache line
      invalidation in the case of map/unmap.  This patch makes use of this
      mechanism to avoid unnecessary synchronization.
      
      A secondary effect of this change is that the portion of the page that
      has been synchronized for use by the CPU should be writable and could be
      passed up the stack (at least on ARM).
      
      The last bit that occurred to me is that on architectures where the
      sync_for_cpu call invalidates cache lines we were prefetching and then
      invalidating the first 128 bytes of the packet.  To avoid that I have
      moved the sync up to before we perform the prefetch and allocate the
      skbuff so that we can actually make use of it.
      
      Link: http://lkml.kernel.org/r/20161110113611.76501.98897.stgit@ahduyck-blue-test.jf.intel.com
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
      Cc: Helge Deller <deller@gmx.de>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Keguang Zhang <keguang.zhang@gmail.com>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Steven Miao <realmz6@gmail.com>
      Cc: Tobias Klauser <tklauser@distanz.ch>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>