1. 24 Nov, 2014 1 commit
  2. 30 Oct, 2014 1 commit
    • igb: don't reuse pages with pfmemalloc flag · bc16e47f
      Roman Gushchin authored
      Incoming packet is dropped silently by sk_filter(), if the skb was
      allocated from pfmemalloc reserves and the corresponding socket is
      not marked with the SOCK_MEMALLOC flag.
      
      Igb driver allocates pages for DMA with __skb_alloc_page(), which
      calls alloc_pages_node() with the __GFP_MEMALLOC flag. So, in case
      of OOM condition, igb can get pages with pfmemalloc flag set.
      
      If an incoming packet lands on a pfmemalloc page and is large enough
      (small packets are copied into memory allocated with
      netdev_alloc_skb_ip_align(), so they are not affected), it will be
      dropped.
      
      This behavior is OK under high memory pressure, but the problem is
      that the igb driver reuses these mapped pages. So packets are still
      dropped even after all memory issues are gone and there is plenty
      of free memory.
      
      In my case, some TCP sessions hang on a small percentage (< 0.1%)
      of machines days after OOMs.
      
      Fix this by avoiding reuse of such pages.
      Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
      Tested-by: Aaron Brown "aaron.f.brown@intel.com"
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
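
The reuse rule described in this commit can be sketched in user-space C. This is only an illustration, not the driver's code: `struct mock_page` and `mock_can_reuse_rx_page()` are hypothetical stand-ins, and the refcount condition is an assumption added to make the example complete.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct page; it carries only
 * the two fields this sketch needs. */
struct mock_page {
    bool pfmemalloc;  /* page was taken from the emergency reserves */
    int refcount;     /* number of current users of the page */
};

/* Returns true when the driver may put the page back on its Rx ring. */
static bool mock_can_reuse_rx_page(const struct mock_page *page)
{
    /* A pfmemalloc page must go back to the allocator. If it stayed on
     * the ring, every large packet landing on it would keep being
     * dropped long after the memory pressure is gone. */
    if (page->pfmemalloc)
        return false;

    /* Assumption for this sketch: reuse only if the driver is the sole
     * owner of the page. */
    return page->refcount == 1;
}
```

Refusing reuse costs one fresh allocation per affected buffer, which is cheap compared with silently dropping packets on that buffer indefinitely.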
  3. 11 Oct, 2014 1 commit
  4. 02 Oct, 2014 3 commits
  5. 06 Sep, 2014 2 commits
  6. 28 Aug, 2014 1 commit
  7. 26 Aug, 2014 1 commit
    • net: Remove ndo_xmit_flush netdev operation, use signalling instead. · 0b725a2c
      David S. Miller authored
      As reported by Jesper Dangaard Brouer, for high packet rates the
      overhead of having another indirect call in the TX path is
      non-trivial.
      
      There is the indirect call itself, and then there is all of the
      reloading of the state to refetch the tail pointer value and
      then write the device register.
      
      Move to a more passive scheme, which requires very light modifications
      to the device drivers.
      
      The signal is a new skb->xmit_more value. If it is non-zero, more
      SKBs are pending to be transmitted on the same queue as the current
      SKB, and the driver may therefore elide the tail pointer update.
      
      Right now skb->xmit_more is always zero.
      Signed-off-by: David S. Miller <davem@davemloft.net>
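
A minimal user-space sketch of how a driver might act on this signal. The types `mock_skb` and `mock_tx_ring` and the function `mock_xmit()` are hypothetical; a real driver writes a device register rather than a struct field, and has further cases (such as a stopped queue) that this sketch does not model.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct sk_buff; only the hint is modeled. */
struct mock_skb {
    bool xmit_more;  /* more packets are queued behind this one */
};

/* Hypothetical Tx ring state. */
struct mock_tx_ring {
    unsigned int next_to_use;  /* software producer index */
    unsigned int hw_tail;      /* last value written to the "device" */
    unsigned int tail_writes;  /* number of (expensive) doorbell writes */
};

/* Queue one packet and ring the doorbell only when no more packets
 * are pending on the same queue. */
static void mock_xmit(struct mock_tx_ring *ring, const struct mock_skb *skb)
{
    ring->next_to_use++;  /* descriptor for this packet is now filled */

    if (!skb->xmit_more) {
        /* Last packet of the burst: publish everything queued so far. */
        ring->hw_tail = ring->next_to_use;
        ring->tail_writes++;
    }
}
```

Sending a burst of three packets with `xmit_more` set on the first two results in a single tail write covering all three descriptors.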
  8. 25 Aug, 2014 1 commit
  9. 24 Jul, 2014 2 commits
  10. 11 Jul, 2014 1 commit
  11. 10 Jul, 2014 1 commit
  12. 01 Jul, 2014 1 commit
  13. 11 Jun, 2014 1 commit
  14. 04 Jun, 2014 1 commit
  15. 24 May, 2014 1 commit
    • net-next:v4: Add support to configure SR-IOV VF minimum and maximum Tx rate through ip tool. · ed616689
      Sucheta Chakraborty authored
      o min_tx_rate puts lower limit on the VF bandwidth. VF is guaranteed
        to have a bandwidth of at least this value.
        max_tx_rate puts cap on the VF bandwidth. VF can have a bandwidth
        of up to this value.
      
      o A new handler set_vf_rate for attr IFLA_VF_RATE has been introduced
        which takes 4 arguments:
        netdev, VF number, min_tx_rate, max_tx_rate
      
      o ndo_set_vf_rate replaces ndo_set_vf_tx_rate handler.
      
      o Drivers that currently implement ndo_set_vf_tx_rate should now call
        ndo_set_vf_rate instead and reject attempt to set a minimum bandwidth
        greater than 0 for IFLA_VF_TX_RATE when IFLA_VF_RATE is not yet
        implemented by driver.
      
      o If user enters only one of either min_tx_rate or max_tx_rate, then,
        userland should read back the other value from driver and set both
        for IFLA_VF_RATE.
        Drivers that have not yet implemented IFLA_VF_RATE should always
        return min_tx_rate as 0 when read from ip tool.
      
      o If both IFLA_VF_TX_RATE and IFLA_VF_RATE options are specified, then
        IFLA_VF_RATE should override.
      
      o Idea is to have consistent display of rate values to user.
      
      o Usage example: -
      
        ./ip link set p4p1 vf 0 rate 900
      
        ./ip link show p4p1
        32: p4p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
        DEFAULT qlen 1000
          link/ether 00:0e:1e:08:b0:f0 brd ff:ff:ff:ff:ff:ff
          vf 0 MAC 3e:a0:ca:bd:ae:5a, tx rate 900 (Mbps), max_tx_rate 900Mbps
          vf 1 MAC f6:c6:7c:3f:3d:6c
          vf 2 MAC 56:32:43:98:d7:71
          vf 3 MAC d6:be:c3:b5:85:ff
          vf 4 MAC ee:a9:9a:1e:19:14
          vf 5 MAC 4a:d0:4c:07:52:18
          vf 6 MAC 3a:76:44:93:62:f9
          vf 7 MAC 82:e9:e7:e3:15:1a
      
        ./ip link set p4p1 vf 0 max_tx_rate 300 min_tx_rate 200
      
        ./ip link show p4p1
        32: p4p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
        DEFAULT qlen 1000
          link/ether 00:0e:1e:08:b0:f0 brd ff:ff:ff:ff:ff:ff
          vf 0 MAC 3e:a0:ca:bd:ae:5a, tx rate 300 (Mbps), max_tx_rate 300Mbps,
          min_tx_rate 200Mbps
          vf 1 MAC f6:c6:7c:3f:3d:6c
          vf 2 MAC 56:32:43:98:d7:71
          vf 3 MAC d6:be:c3:b5:85:ff
          vf 4 MAC ee:a9:9a:1e:19:14
          vf 5 MAC 4a:d0:4c:07:52:18
          vf 6 MAC 3a:76:44:93:62:f9
          vf 7 MAC 82:e9:e7:e3:15:1a
      
        ./ip link set p4p1 vf 0 max_tx_rate 600 rate 300
      
        ./ip link show p4p1
        32: p4p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
        DEFAULT qlen 1000
          link/ether 00:0e:1e:08:b0:f brd ff:ff:ff:ff:ff:ff
          vf 0 MAC 3e:a0:ca:bd:ae:5, tx rate 600 (Mbps), max_tx_rate 600Mbps,
          min_tx_rate 200Mbps
          vf 1 MAC f6:c6:7c:3f:3d:6c
          vf 2 MAC 56:32:43:98:d7:71
          vf 3 MAC d6:be:c3:b5:85:ff
          vf 4 MAC ee:a9:9a:1e:19:14
          vf 5 MAC 4a:d0:4c:07:52:18
          vf 6 MAC 3a:76:44:93:62:f9
          vf 7 MAC 82:e9:e7:e3:15:1a
      Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
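
The rules in this commit message can be illustrated with a small user-space sketch. Everything here is hypothetical: `struct mock_vf` and both functions are stand-ins rather than kernel or driver code, and treating a `max_tx_rate` of 0 as "no cap" is an assumption of the sketch.

```c
#include <errno.h>

/* Hypothetical per-VF rate state, in Mbps. */
struct mock_vf {
    int min_tx_rate;  /* guaranteed bandwidth, 0 = no guarantee */
    int max_tx_rate;  /* bandwidth cap, 0 = no cap (assumption) */
};

/* Handler for a driver that supports both limits. */
static int mock_set_vf_rate(struct mock_vf *vf, int min_tx_rate,
                            int max_tx_rate)
{
    if (min_tx_rate < 0 || max_tx_rate < 0)
        return -EINVAL;

    /* A guarantee above the cap can never be honored. */
    if (max_tx_rate != 0 && min_tx_rate > max_tx_rate)
        return -EINVAL;

    vf->min_tx_rate = min_tx_rate;
    vf->max_tx_rate = max_tx_rate;
    return 0;
}

/* Handler for a driver that only supports a maximum rate: it rejects
 * any non-zero minimum and always reports min_tx_rate as 0. */
static int mock_legacy_set_vf_rate(struct mock_vf *vf, int min_tx_rate,
                                   int max_tx_rate)
{
    if (min_tx_rate > 0)
        return -EINVAL;
    if (max_tx_rate < 0)
        return -EINVAL;

    vf->min_tx_rate = 0;
    vf->max_tx_rate = max_tx_rate;
    return 0;
}
```

Both handlers leave the stored rates untouched when they reject a request, so a failed update cannot leave a VF half-configured.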
  16. 23 May, 2014 1 commit
  17. 25 Apr, 2014 10 commits
  18. 23 Apr, 2014 3 commits
  19. 19 Apr, 2014 1 commit
  20. 11 Apr, 2014 2 commits
  21. 01 Apr, 2014 1 commit
  22. 28 Mar, 2014 2 commits
  23. 21 Mar, 2014 1 commit
    • igb: Unset IGB_FLAG_HAS_MSIX-flag when falling back to msi-only · b709323d
      Christoph Paasch authored
      Prior to cd14ef54 (igb: Change to use statically allocated array for
      MSIx entries), having msix_entries different from NULL was an indicator
      that MSIX is enabled.
      In igb_set_interrupt_capability we may fall back to MSI-only. Prior to
      the above patch msix_entries was set to NULL by
      igb_reset_interrupt_capability.
      
      However, now we are checking the IGB_FLAG_HAS_MSIX flag, and so the
      stack gets completely confused:
      
      [   42.659791] ------------[ cut here ]------------
      [   42.715032] WARNING: CPU: 7 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x15c/0x1fb()
      [   42.848263] NETDEV WATCHDOG: eth0 (igb): transmit queue 0 timed out
      [   42.923253] Modules linked in:
      [   42.959875] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 3.14.0-rc2-mptcp #437
      [   43.043184] Hardware name: HP ProLiant DL165 G7, BIOS O37 01/26/2011
      [   43.119215]  0000000000000108 ffff88023fdc3da8 ffffffff81487847 0000000000000108
      [   43.208165]  ffff88023fdc3df8 ffff88023fdc3de8 ffffffff81034e7d ffff88023fdc3dd8
      [   43.297120]  ffffffff813fff10 ffff880236018000 ffff880236b178c0 0000000000000008
      [   43.386071] Call Trace:
      [   43.415303]  <IRQ>  [<ffffffff81487847>] dump_stack+0x49/0x62
      [   43.484174]  [<ffffffff81034e7d>] warn_slowpath_common+0x77/0x91
      [   43.556049]  [<ffffffff813fff10>] ? dev_watchdog+0x15c/0x1fb
      [   43.623759]  [<ffffffff81034f2b>] warn_slowpath_fmt+0x41/0x43
      [   43.692511]  [<ffffffff813fff10>] dev_watchdog+0x15c/0x1fb
      [   43.758141]  [<ffffffff813ffdb4>] ? __netdev_watchdog_up+0x64/0x64
      [   43.832091]  [<ffffffff8103cd04>] call_timer_fn+0x17/0x6f
      [   43.896682]  [<ffffffff8103cebe>] run_timer_softirq+0x162/0x1a2
      [   43.967511]  [<ffffffff81038520>] __do_softirq+0xcd/0x1cc
      [   44.032104]  [<ffffffff81038689>] irq_exit+0x3a/0x48
      [   44.091492]  [<ffffffff81026d43>] smp_apic_timer_interrupt+0x43/0x50
      [   44.167525]  [<ffffffff8148c24a>] apic_timer_interrupt+0x6a/0x70
      [   44.239392]  <EOI>  [<ffffffff8100992c>] ? default_idle+0x6/0x8
      [   44.310343]  [<ffffffff81009b31>] arch_cpu_idle+0x13/0x18
      [   44.374934]  [<ffffffff81066126>] cpu_startup_entry+0xa7/0x101
      [   44.444724]  [<ffffffff81025660>] start_secondary+0x1b2/0x1b7
      [   44.513472] ---[ end trace a5a075fd4e7f854f ]---
      [   44.568753] igb 0000:04:00.0 eth0: Reset adapter
      [   46.206945] random: nonblocking pool is initialized
      [   46.465670] irq 44: nobody cared (try booting with the "irqpoll" option)
      [   46.545862] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G        W    3.14.0-rc2-mptcp #437
      [   46.640610] Hardware name: HP ProLiant DL165 G7, BIOS O37 01/26/2011
      [   46.716641]  ffff8802363f8c84 ffff88023fdc3e38 ffffffff81487847 00000000a03cdb6d
      [   46.805598]  ffff8802363f8c00 ffff88023fdc3e68 ffffffff81068489 0000007f81825400
      [   46.894539]  ffff8802363f8c00 0000000000000000 0000000000000000 ffff88023fdc3ea8
      [   46.983484] Call Trace:
      [   47.012714]  <IRQ>  [<ffffffff81487847>] dump_stack+0x49/0x62
      [   47.081585]  [<ffffffff81068489>] __report_bad_irq+0x35/0xc1
      [   47.149295]  [<ffffffff81068683>] note_interrupt+0x16e/0x1ea
      [   47.217006]  [<ffffffff8106679e>] handle_irq_event_percpu+0x116/0x12e
      [   47.294075]  [<ffffffff810667e9>] handle_irq_event+0x33/0x4f
      [   47.361787]  [<ffffffff81068c95>] handle_fasteoi_irq+0x83/0xd1
      [   47.431577]  [<ffffffff81003d5b>] handle_irq+0x1f/0x28
      [   47.493047]  [<ffffffff81003567>] do_IRQ+0x4e/0xd4
      [   47.550358]  [<ffffffff8148b06a>] common_interrupt+0x6a/0x6a
      [   47.618066]  <EOI>  [<ffffffff8100992c>] ? default_idle+0x6/0x8
      [   47.689016]  [<ffffffff81009b31>] arch_cpu_idle+0x13/0x18
      [   47.753605]  [<ffffffff81066126>] cpu_startup_entry+0xa7/0x101
      [   47.823397]  [<ffffffff81025660>] start_secondary+0x1b2/0x1b7
      [   47.892146] handlers:
      [   47.919301] [<ffffffff812fbd7d>] igb_intr
      
      So, this patch unsets the flag when falling back to MSI, to indicate
      that we are not using MSIX.
      
      Fixes: cd14ef54 (igb: Change to use statically allocated array for MSIx entries)
      Cc: Carolyn Wyborny <carolyn.wyborny@intel.com>
      Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
      Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>