1. 06 12月, 2015 2 次提交
  2. 19 11月, 2015 3 次提交
    • E
      net: provide generic busy polling to all NAPI drivers · 93d05d4a
      Eric Dumazet 提交于
      NAPI drivers no longer need to observe a particular protocol
      to benefit from busy polling (CONFIG_NET_RX_BUSY_POLL=y)
      
      napi_hash_add() and napi_hash_del() are automatically called
      from core networking stack, respectively from
      netif_napi_add() and netif_napi_del()
      
      This patch depends on free_netdev() and netif_napi_del() being
      called from process context, which seems to be the norm.
      
      Drivers might still prefer to call napi_hash_del() on their
      own, since they might combine all the rcu grace periods into
      a single one, knowing their NAPI structures lifetime, while
      core networking stack has no idea of a possible combining.
      
      Once this patch proves to not bring serious regressions,
      we will cleanup drivers to either remove napi_hash_del()
      or provide appropriate rcu grace periods combining.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      93d05d4a
    • E
      net: move skb_mark_napi_id() into core networking stack · 93f93a44
      Eric Dumazet 提交于
      We would like to automatically provide busy polling support
      to all NAPI drivers, without them having to implement anything.
      
      skb_mark_napi_id() can be called from napi_gro_receive() and
      napi_get_frags().
      
      Few drivers are still calling skb_mark_napi_id() because
      they use netif_receive_skb(). They should eventually call
      napi_gro_receive() instead. I will leave this to drivers
      maintainers.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      93f93a44
    • E
      bnx2x: remove bnx2x_low_latency_recv() support · b59768c6
      Eric Dumazet 提交于
      Switch to native NAPI polling, as this reduces overhead and complexity.
      
      Normal path is faster, since one cmpxchg() is not anymore requested,
      and busy polling with the NAPI polling has same performance.
      
      Tested:
      lpk50:~# cat /proc/sys/net/core/busy_read
      70
      lpk50:~# nstat >/dev/null;./netperf -H lpk55 -t TCP_RR;nstat
      MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpk55.prod.google.com () port 0 AF_INET : first burst 0
      Local /Remote
      Socket Size   Request  Resp.   Elapsed  Trans.
      Send   Recv   Size     Size    Time     Rate
      bytes  Bytes  bytes    bytes   secs.    per sec
      
      16384  87380  1        1       10.00    40095.07
      16384  87380
      IpInReceives                    401062             0.0
      IpInDelivers                    401062             0.0
      IpOutRequests                   401079             0.0
      TcpActiveOpens                  7                  0.0
      TcpPassiveOpens                 3                  0.0
      TcpAttemptFails                 3                  0.0
      TcpEstabResets                  5                  0.0
      TcpInSegs                       401036             0.0
      TcpOutSegs                      401052             0.0
      TcpOutRsts                      38                 0.0
      UdpInDatagrams                  26                 0.0
      UdpOutDatagrams                 27                 0.0
      Ip6OutNoRoutes                  1                  0.0
      TcpExtDelayedACKs               1                  0.0
      TcpExtTCPPrequeued              98                 0.0
      TcpExtTCPDirectCopyFromPrequeue 98                 0.0
      TcpExtTCPHPHits                 4                  0.0
      TcpExtTCPHPHitsToUser           98                 0.0
      TcpExtTCPPureAcks               5                  0.0
      TcpExtTCPHPAcks                 101                0.0
      TcpExtTCPAbortOnData            6                  0.0
      TcpExtBusyPollRxPackets         400832             0.0
      TcpExtTCPOrigDataSent           400983             0.0
      IpExtInOctets                   21273867           0.0
      IpExtOutOctets                  21261254           0.0
      IpExtInNoECTPkts                401064             0.0
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b59768c6
  3. 07 11月, 2015 1 次提交
    • M
      mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep... · d0164adc
      Mel Gorman 提交于
      mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd
      
      __GFP_WAIT has been used to identify atomic context in callers that hold
      spinlocks or are in interrupts.  They are expected to be high priority and
      have access one of two watermarks lower than "min" which can be referred
      to as the "atomic reserve".  __GFP_HIGH users get access to the first
      lower watermark and can be called the "high priority reserve".
      
      Over time, callers had a requirement to not block when fallback options
      were available.  Some have abused __GFP_WAIT leading to a situation where
      an optimisitic allocation with a fallback option can access atomic
      reserves.
      
      This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
      cannot sleep and have no alternative.  High priority users continue to use
      __GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
      are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM to identify
      callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
      redefined as a caller that is willing to enter direct reclaim and wake
      kswapd for background reclaim.
      
      This patch then converts a number of sites
      
      o __GFP_ATOMIC is used by callers that are high priority and have memory
        pools for those requests. GFP_ATOMIC uses this flag.
      
      o Callers that have a limited mempool to guarantee forward progress clear
        __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
        into this category where kswapd will still be woken but atomic reserves
        are not used as there is a one-entry mempool to guarantee progress.
      
      o Callers that are checking if they are non-blocking should use the
        helper gfpflags_allow_blocking() where possible. This is because
        checking for __GFP_WAIT as was done historically now can trigger false
        positives. Some exceptions like dm-crypt.c exist where the code intent
        is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
        flag manipulations.
      
      o Callers that built their own GFP flags instead of starting with GFP_KERNEL
        and friends now also need to specify __GFP_KSWAPD_RECLAIM.
      
      The first key hazard to watch out for is callers that removed __GFP_WAIT
      and was depending on access to atomic reserves for inconspicuous reasons.
      In some cases it may be appropriate for them to use __GFP_HIGH.
      
      The second key hazard is callers that assembled their own combination of
      GFP flags instead of starting with something like GFP_KERNEL.  They may
      now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
      if it's missed in most cases as other activity will wake kswapd.
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vitaly Wool <vitalywool@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d0164adc
  4. 18 8月, 2015 1 次提交
    • Y
      bnx2: Fix bandwidth allocation for some MF modes · da3cc2da
      Yuval Mintz 提交于
      Management firmware tells driver in case bandwidth configuration for
      a specific function exists, but [regretably] the same field has different
      meanings depending on the multi-function mode - it can either be
      a percentile value or an actual speed.
      
      For newer multi-function modes current logic is incorrect -
      driver understands values as actual speeds instead of percentages,
      causing the resulting chip configuration to be incorrect.
      Signed-off-by: NYuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      da3cc2da
  5. 11 8月, 2015 1 次提交
  6. 30 7月, 2015 1 次提交
  7. 23 7月, 2015 4 次提交
  8. 29 6月, 2015 1 次提交
    • M
      bnx2x: fix DMA API usage · 8031612d
      Michal Schmidt 提交于
      With CONFIG_DMA_API_DEBUG=y bnx2x triggers the error "DMA-API: device
      driver frees DMA memory with wrong function".
      On archs where PAGE_SIZE > SGE_PAGE_SIZE it also triggers "DMA-API:
      device driver frees DMA memory with different size".
      
      Fix this by making the mapping and unmapping symmetric:
       - Do not map the whole pool page at once. Instead map the
         SGE_PAGE_SIZE-sized pieces individually, so they can be unmapped in
         the same manner.
       - What's mapped using dma_map_page() must be unmapped using
         dma_unmap_page().
      
      Tested on ppc64.
      
      Fixes: 4cace675 ("bnx2x: Alloc 4k fragment for each rx ring buffer element")
      Signed-off-by: NMichal Schmidt <mschmidt@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8031612d
  9. 25 6月, 2015 1 次提交
  10. 02 6月, 2015 1 次提交
    • G
      bnx2x: Alloc 4k fragment for each rx ring buffer element · 4cace675
      Gabriel Krisman Bertazi 提交于
      The driver allocates one page for each buffer on the rx ring, which is
      too much on architectures like ppc64 and can cause unexpected allocation
      failures when the system is under stress.  Now, we keep a memory pool
      per queue, and if the architecture's PAGE_SIZE is greater than 4k, we
      fragment pages and assign each 4k segment to a ring element, which
      reduces the overall memory consumption on such architectures.  This
      helps avoiding errors like the example below:
      
      [bnx2x_alloc_rx_sge:435(eth1)]Can't alloc sge
      [c00000037ffeb900] [d000000075eddeb4] .bnx2x_alloc_rx_sge+0x44/0x200 [bnx2x]
      [c00000037ffeb9b0] [d000000075ee0b34] .bnx2x_fill_frag_skb+0x1ac/0x460 [bnx2x]
      [c00000037ffebac0] [d000000075ee11f0] .bnx2x_tpa_stop+0x160/0x2e8 [bnx2x]
      [c00000037ffebb90] [d000000075ee1560] .bnx2x_rx_int+0x1e8/0xc30 [bnx2x]
      [c00000037ffebcd0] [d000000075ee2084] .bnx2x_poll+0xdc/0x3d8 [bnx2x] (unreliable)
      Signed-off-by: NGabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
      Acked-by: NYuval Mintz <Yuval.Mintz@qlogic.com>
      Reviewed-by: NLino Sanfilippo <LinoSanfilippo@gmx.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4cace675
  11. 12 5月, 2015 1 次提交
  12. 05 5月, 2015 1 次提交
  13. 30 4月, 2015 3 次提交
  14. 28 4月, 2015 1 次提交
    • M
      bnx2x: really disable TPA if 'disable_tpa' option is set · 22a8f237
      Michal Schmidt 提交于
      bnx2x's 'disable_tpa=1' module option is not respected properly and TPA
      (transparent packet aggregation) remains enabled. Even though the
      module option causes LRO to be disabled, TPA is enabled in GRO mode.
      
      Additionally, disabling GRO via ethtool then has no effect. One can
      still observe tpa_* statistics increase and large packets being received
      in tcpdump.
      
      The bug was an unintended consequence of commit aebf6244 "bnx2x: Be
      more forgiving toward SW GRO".
      
      Fix it by following the bp->disable_tpa flag when initializing fp's.
      Signed-off-by: NMichal Schmidt <mschmidt@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      22a8f237
  15. 23 4月, 2015 1 次提交
  16. 16 4月, 2015 1 次提交
    • E
      bnx2x: Fix busy_poll vs netpoll · 074975d0
      Eric Dumazet 提交于
      Commit 9a2620c8 ("bnx2x: prevent WARN during driver unload")
      switched the napi/busy_lock locking mechanism from spin_lock() into
      spin_lock_bh(), breaking inter-operability with netconsole, as netpoll
      disables interrupts prior to calling our napi mechanism.
      
      This switches the driver into using atomic assignments instead of the
      spinlock mechanisms previously employed.
      
      Based on initial patch from Yuval Mintz & Ariel Elior
      
      I basically added softirq starvation avoidance, and mixture
      of atomic operations, plain writes and barriers.
      
      Note this slightly reduces the overhead for this driver when no
      busy_poll sockets are in use.
      
      Fixes: 9a2620c8 ("bnx2x: prevent WARN during driver unload")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      074975d0
  17. 27 1月, 2015 1 次提交
  18. 14 1月, 2015 1 次提交
  19. 11 12月, 2014 1 次提交
  20. 17 11月, 2014 1 次提交
  21. 31 10月, 2014 1 次提交
  22. 20 9月, 2014 1 次提交
  23. 02 9月, 2014 1 次提交
  24. 30 8月, 2014 1 次提交
  25. 26 8月, 2014 3 次提交
  26. 23 8月, 2014 3 次提交
  27. 25 7月, 2014 1 次提交
    • D
      bnx2x: fix crash during TSO tunneling · fe26566d
      Dmitry Kravkov 提交于
      When TSO packet is transmitted additional BD w/o mapping is used
      to describe the packed. The BD needs special handling in tx
      completion.
      
      kernel: Call Trace:
      kernel: <IRQ>  [<ffffffff815e19ba>] dump_stack+0x19/0x1b
      kernel: [<ffffffff8105dee1>] warn_slowpath_common+0x61/0x80
      kernel: [<ffffffff8105df5c>] warn_slowpath_fmt+0x5c/0x80
      kernel: [<ffffffff814a8c0d>] ? find_iova+0x4d/0x90
      kernel: [<ffffffff814ab0e2>] intel_unmap_page.part.36+0x142/0x160
      kernel: [<ffffffff814ad0e6>] intel_unmap_page+0x26/0x30
      kernel: [<ffffffffa01f55d7>] bnx2x_free_tx_pkt+0x157/0x2b0 [bnx2x]
      kernel: [<ffffffffa01f8dac>] bnx2x_tx_int+0xac/0x220 [bnx2x]
      kernel: [<ffffffff8101a0d9>] ? read_tsc+0x9/0x20
      kernel: [<ffffffffa01f8fdb>] bnx2x_poll+0xbb/0x3c0 [bnx2x]
      kernel: [<ffffffff814d041a>] net_rx_action+0x15a/0x250
      kernel: [<ffffffff81067047>] __do_softirq+0xf7/0x290
      kernel: [<ffffffff815f3a5c>] call_softirq+0x1c/0x30
      kernel: [<ffffffff81014d25>] do_softirq+0x55/0x90
      kernel: [<ffffffff810673e5>] irq_exit+0x115/0x120
      kernel: [<ffffffff815f4358>] do_IRQ+0x58/0xf0
      kernel: [<ffffffff815e94ad>] common_interrupt+0x6d/0x6d
      kernel: <EOI>  [<ffffffff810bbff7>] ? clockevents_notify+0x127/0x140
      kernel: [<ffffffff814834df>] ? cpuidle_enter_state+0x4f/0xc0
      kernel: [<ffffffff81483615>] cpuidle_idle_call+0xc5/0x200
      kernel: [<ffffffff8101bc7e>] arch_cpu_idle+0xe/0x30
      kernel: [<ffffffff810b4725>] cpu_startup_entry+0xf5/0x290
      kernel: [<ffffffff815cfee1>] start_secondary+0x265/0x27b
      kernel: ---[ end trace 11aa7726f18d7e80 ]---
      
      Fixes: a848ade4 ("bnx2x: add CSUM and TSO support for encapsulation protocols")
      Reported-by: NYulong Pei <ypei@redhat.com>
      Cc: Michal Schmidt <mschmidt@redhat.com>
      Signed-off-by: NDmitry Kravkov <Dmitry.Kravkov@qlogic.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fe26566d
  28. 02 7月, 2014 1 次提交