1. 20 11月, 2012 1 次提交
  2. 08 11月, 2012 1 次提交
  3. 04 8月, 2012 1 次提交
  4. 20 7月, 2012 1 次提交
    • T
      mlx4_en: map entire pages to increase throughput · 4cce66cd
      Thadeu Lima de Souza Cascardo 提交于
      In its receive path, mlx4_en driver maps each page chunk that it pushes
      to the hardware and unmaps it when pushing it up the stack. This limits
      throughput to about 3Gbps on a Power7 8-core machine.
      
      One solution is to map the entire allocated page at once. However, this
      requires that we keep track of every page fragment we give to a
      descriptor. We also need to work with the discipline that all fragments will
      be released (in the sense that it will not be reused by the driver
      anymore) in the order they are allocated to the driver.
      
      This requires that we don't reuse any fragments, every single one of
      them must be reallocated. We do that by releasing all the fragments that
      are processed and only after finished processing the descriptors, we
      start the refill.
      
      We also must somehow guarantee that we either refill all fragments in a
      descriptor or none at all, without resorting to giving up a page
      fragment that we would have already given. Otherwise, we would break the
      discipline of only releasing the fragments in the order they were
      allocated.
      
      This has passed page allocation fault injections (restricted to the
      driver by using required-start and required-end) and device hotplug
      while 16 TCP streams were able to deliver more than 9Gbps.
      Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4cce66cd
  5. 19 7月, 2012 1 次提交
    • A
      net/mlx4_en: Add accelerated RFS support · 1eb8c695
      Amir Vadai 提交于
      Use RFS infrastructure and flow steering in HW to keep CPU
      affinity of rx interrupts and application per TCP stream.
      
      A flow steering filter is added to the HW whenever the RFS
      ndo callback is invoked by core networking code.
      
      Because the invocation takes place in interrupt context, the
      actual setup of HW is done using workqueue. Whenever new filter
      is added, the driver checks for expiry of existing filters.
      
      Since there's window in time between the point where the core
      RFS code invoked the ndo callback, to the point where the HW
      is configured from the workqueue context, the 2nd, 3rd etc
      packets from that stream will cause the net core to invoke
      the callback again and again.
      
      To prevent inefficient/double configuration of the HW, the filters
      are kept in a database which is indexed using hash function to enable
      fast access.
      Signed-off-by: NAmir Vadai <amirv@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1eb8c695
  6. 08 7月, 2012 4 次提交
  7. 26 6月, 2012 1 次提交
  8. 18 5月, 2012 1 次提交
    • A
      net/mlx4_en: num cores tx rings for every UP · bc6a4744
      Amir Vadai 提交于
      Change the TX ring scheme such that the number of rings for untagged packets
      and for tagged packets (per each of the vlan priorities) is the same, unlike
      the current situation where for tagged traffic there's one ring per priority
      and for untagged rings as the number of core.
      
      Queue selection is done as follows:
      
      If the mqprio qdisc is operates on the interface, such that the core networking
      code invoked the device setup_tc ndo callback, a mapping of skb->priority =>
      queue set is forced - for both, tagged and untagged traffic.
      
      Else, the egress map skb->priority =>  User priority is used for tagged traffic, and
      all untagged traffic is sent through tx rings of UP 0.
      
      The patch follows the convergence of discussing that issue with John Fastabend
      over this thread http://comments.gmane.org/gmane.linux.network/229877
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Cc: Liran Liss <liranl@mellanox.com>
      Signed-off-by: NAmir Vadai <amirv@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bc6a4744
  9. 24 4月, 2012 2 次提交
  10. 05 4月, 2012 4 次提交
    • A
      net/mlx4_en: Set max rate-limit for a TC · 109d2446
      Amir Vadai 提交于
      This patch is using the DCB netlink to set rate limit per ETS TC
      Values are accepted in Kbps and rounded up to the nearest multiply of 100Mbps.
      Signed-off-by: NAmir Vadai <amirv@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      109d2446
    • A
      net/mlx4_en: DCB QoS support · 564c274c
      Amir Vadai 提交于
      Set TSA, promised BW and PFC using IEEE 802.1qaz netlink commands.
      Signed-off-by: NAmir Vadai <amirv@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      564c274c
    • A
      net/mlx4_en: Force user priority by QP attribute · 0e98b523
      Amir Vadai 提交于
      Instead of relying on HW to change schedule queue by UP, schedule
      queue is fixed for a tx_ring, and UP in WQE is ignored in this aspect.  This
      resolves two issues with untagged traffic:
      1. untagged traffic has no UP in packet which is needed for QoS. The change
         above allows setting the schedule queue (and by that the UP) of such a stream.
      2. BlueFlame uses the same field used by vlan tag. So forcing UP from QPC
         allows using BF for untagged but prioritized traffic.
      
      In old firmware that force UP is not supported, untagged traffic will not subject to
      QoS.
      
      Because UP is set by QP, need to always have a tx ring per UP, even if pfcrx
      module paramter is false.
      Signed-off-by: NAmir Vadai <amirv@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0e98b523
    • T
      mlx4: allocate just enough pages instead of always 4 pages · 117980c4
      Thadeu Lima de Souza Cascardo 提交于
      The driver uses a 2-order allocation, which is too much on architectures
      like ppc64, which has a 64KiB page. This particular allocation is used
      for large packet fragments that may have a size of 512, 1024, 4096 or
      fill the whole allocation. So, a minimum size of 16384 is good enough
      and will be the same size that is used in architectures of 4KiB sized
      pages.
      
      This will avoid allocation failures that we see when the system is under
      stress, but still has plenty of memory, like the one below.
      
      This will also allow us to set the interface MTU to higher values like
      9000, which was not possible on ppc64 without this patch.
      
      Node 1 DMA: 737*64kB 37*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 51904kB
      83137 total pagecache pages
      0 pages in swap cache
      Swap cache stats: add 0, delete 0, find 0/0
      Free swap  = 10420096kB
      Total swap = 10420096kB
      107776 pages RAM
      1184 pages reserved
      147343 pages shared
      28152 pages non-shared
      netstat: page allocation failure. order:2, mode:0x4020
      Call Trace:
      [c0000001a4fa3770] [c000000000012f04] .show_stack+0x74/0x1c0 (unreliable)
      [c0000001a4fa3820] [c00000000016af38] .__alloc_pages_nodemask+0x618/0x930
      [c0000001a4fa39a0] [c0000000001a71a0] .alloc_pages_current+0xb0/0x170
      [c0000001a4fa3a40] [d00000000dcc3e00] .mlx4_en_alloc_frag+0x200/0x240 [mlx4_en]
      [c0000001a4fa3b10] [d00000000dcc3f8c] .mlx4_en_complete_rx_desc+0x14c/0x250 [mlx4_en]
      [c0000001a4fa3be0] [d00000000dcc4eec] .mlx4_en_process_rx_cq+0x62c/0x850 [mlx4_en]
      [c0000001a4fa3d20] [d00000000dcc5150] .mlx4_en_poll_rx_cq+0x40/0x90 [mlx4_en]
      [c0000001a4fa3dc0] [c0000000004e2bb8] .net_rx_action+0x178/0x450
      [c0000001a4fa3eb0] [c00000000009c9b8] .__do_softirq+0x118/0x290
      [c0000001a4fa3f90] [c000000000031df8] .call_do_softirq+0x14/0x24
      [c000000184c3b520] [c00000000000e700] .do_softirq+0xf0/0x110
      [c000000184c3b5c0] [c00000000009c6d4] .irq_exit+0xb4/0xc0
      [c000000184c3b640] [c00000000000e964] .do_IRQ+0x144/0x230
      Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
      Signed-off-by: NKleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
      Tested-by: NKleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      117980c4
  11. 07 3月, 2012 3 次提交
  12. 07 2月, 2012 1 次提交
  13. 23 1月, 2012 1 次提交
  14. 19 1月, 2012 1 次提交
  15. 14 12月, 2011 1 次提交
  16. 28 11月, 2011 3 次提交
  17. 15 11月, 2011 1 次提交
  18. 01 11月, 2011 1 次提交
  19. 19 10月, 2011 2 次提交
  20. 10 10月, 2011 3 次提交
  21. 11 8月, 2011 1 次提交
  22. 22 7月, 2011 1 次提交
  23. 16 4月, 2011 1 次提交
  24. 24 3月, 2011 3 次提交