1. 11 Jul, 2012 1 commit
    • mlx4: Use port management change event instead of smp_snoop · 00f5ce99
      Jack Morgenstein committed
      The port management change event can replace smp_snoop.  If the
      capability bit for this event is set in dev-caps, the event is used
      (by the driver setting the PORT_MNG_CHG_EVENT bit in the async event
      mask in the MAP_EQ fw command).  In this case, when the driver passes
      incoming SMP PORT_INFO SET mads to the FW, the FW generates port
      management change events to signal any changes to the driver.
      
      If the FW generates these events, smp_snoop shouldn't be invoked in
      ib_process_mad(), or duplicate events will occur (once from the
      FW-generated event, and once from smp_snoop).
      
      In the case where the FW does not generate port management change
      events, smp_snoop needs to be invoked to create these events.  The flow
      in smp_snoop has been modified to make use of the same procedures as
      in the FW-generated-event case to generate the port management
      events (LID change, Client-rereg, Pkey change, and/or GID change).
      
      Port management change event handling required changing the
      mlx4_ib_event and mlx4_dispatch_event prototypes; the "param" argument
      (last argument) had to be changed to unsigned long in order to
      accommodate passing the EQE pointer.
      
      We also needed to move the definition of struct mlx4_eqe from
      net/mlx4.h to file device.h -- to make it available to the IB driver,
      to handle port management change events.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  2. 09 Jul, 2012 1 commit
  3. 26 Jun, 2012 3 commits
  4. 07 Jun, 2012 1 commit
    • mlx4_core: Fix setting VL_cap in mlx4_SET_PORT wrapper flow · edc4a67e
      Jack Morgenstein committed
      Commit 096335b3 ("mlx4_core: Allow dynamic MTU configuration for
      IB ports") modifies the port VL setting.  This exposes a bug in
      mlx4_common_set_port(), where the VL cap value passed in (inside the
      command mailbox) is incorrectly zeroed-out:
      
      mlx4_SET_PORT modifies the VL_cap field (byte 3 of the mailbox).
      Since the SET_PORT command is paravirtualized on the master as well as
      on the slaves, mlx4_SET_PORT_wrapper() is invoked on the master.  This
      calls mlx4_common_set_port() where mailbox byte 3 gets overwritten by
      code which should only set a single bit in that byte (for the reset
      qkey counter flag) -- but instead overwrites the entire byte.
      
      The result is that when running in SR-IOV mode, the VL_cap will be set
      to zero -- fix this.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
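The bug class described in this commit (overwriting a whole mailbox byte when only one flag bit should change) can be illustrated with a minimal sketch. The bit position, the function names, and the idea that the low bits of byte 3 stand in for VL_cap are assumptions made for illustration; they are not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit position of the "reset qkey counter" flag in byte 3. */
#define RESET_QKEY_BIT 6

/* Buggy variant: plain assignment replaces the whole byte, so any
 * value already stored there (such as VL_cap) is lost. */
void set_reset_qkey_buggy(uint8_t *mailbox, int reset)
{
    assert(mailbox != NULL);
    mailbox[3] = (uint8_t)((reset & 1) << RESET_QKEY_BIT);
}

/* Fixed variant: clear and then set only the flag bit, leaving the
 * other bits of the byte untouched. */
void set_reset_qkey_fixed(uint8_t *mailbox, int reset)
{
    assert(mailbox != NULL);
    mailbox[3] &= (uint8_t)~(1u << RESET_QKEY_BIT);
    mailbox[3] |= (uint8_t)((reset & 1) << RESET_QKEY_BIT);
}
```

With byte 3 preloaded with some value, the buggy variant leaves only the flag bit behind, while the fixed variant preserves the preloaded bits.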
  5. 01 Jun, 2012 6 commits
  6. 18 May, 2012 1 commit
    • net/mlx4_en: num cores tx rings for every UP · bc6a4744
      Amir Vadai committed
      Change the TX ring scheme such that the number of rings for untagged packets
      and for tagged packets (per each of the vlan priorities) is the same, unlike
      the current situation where tagged traffic gets one ring per priority
      and untagged traffic gets as many rings as there are cores.
      
      Queue selection is done as follows:
      
      If the mqprio qdisc operates on the interface, such that the core networking
      code invoked the device setup_tc ndo callback, a mapping of skb->priority =>
      queue set is forced for both tagged and untagged traffic.
      
      Otherwise, the egress map skb->priority => user priority is used for tagged
      traffic, and all untagged traffic is sent through the tx rings of UP 0.
      
      The patch follows the conclusions of a discussion of this issue with John
      Fastabend in this thread: http://comments.gmane.org/gmane.linux.network/229877
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Cc: Liran Liss <liranl@mellanox.com>
      Signed-off-by: Amir Vadai <amirv@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
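The queue selection rules in this commit message can be sketched roughly as follows. The struct layout, the 16-entry priority tables, and the use of a flow hash to pick a ring inside the chosen set are assumptions for illustration only; the commit message specifies just which ring set a packet maps to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_UP 8  /* VLAN user priorities 0..7 */

/* Hypothetical configuration; field names are not the driver's. */
struct tx_cfg {
    unsigned rings_per_up;     /* same count for every UP, e.g. number of cores */
    bool mqprio_active;        /* the setup_tc callback was invoked */
    uint8_t prio_to_set[16];   /* mqprio: skb priority -> queue set */
    uint8_t egress_map[16];    /* vlan egress map: skb priority -> UP */
};

/* Pick a TX ring index. 'hash' spreads flows inside the chosen ring set. */
unsigned select_tx_ring(const struct tx_cfg *cfg, unsigned skb_prio,
                        bool vlan_tagged, unsigned hash)
{
    unsigned set;

    assert(cfg->rings_per_up > 0);
    if (cfg->mqprio_active)
        set = cfg->prio_to_set[skb_prio & 15];  /* tagged and untagged alike */
    else if (vlan_tagged)
        set = cfg->egress_map[skb_prio & 15];   /* tagged: egress map */
    else
        set = 0;                                /* untagged: rings of UP 0 */

    return (set % NUM_UP) * cfg->rings_per_up + hash % cfg->rings_per_up;
}
```

Because every UP owns the same number of rings, the ring index is simply the set's base offset plus a position within the set.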
  7. 16 May, 2012 8 commits
  8. 15 May, 2012 1 commit
    • mlx4_core: Change bitmap allocator to work in round-robin fashion · f4ec9e95
      Jack Morgenstein committed
      Under most circumstances, the bitmap allocator does not allocate the
      same full 24-bit QP number immediately after a QP is destroyed.
      
      This works by using the upper bits of a 24-bit QP number, beyond the
      number of QPs that are actually available in the low level driver.
      For example, say that the HCA is willing to allocate a maximum of 64K
      QPs.  We use the bits 23..16 as a "counter" which is incremented by 1
      at each allocation so that even if the same physical QP is
      re-allocated, it will not receive the same 24-bit QP number.
      
      However, we have seen the following scenario:
      1. Allocate, say, 255 QPs in succession.  This will cause a wrap of the "counter".
      2. Destroy the first QP allocated, then allocate a new QP.  The new QP,
         because of the counter wraparound, will get the same FULL QP number as
         the QP just destroyed!
      
      This is a problem because packets in transit can be erroneously
      delivered to the new QP when they were meant for the old (destroyed)
      QP, because the full QP number of the new QP is identical to the
      destroyed QP.  (The "counter" mechanism is meant to prevent this by
      having the full 24-bit QP numbers differ even if the physical QP on
      the HCA is the same.  As we see above, however, this mechanism does
      not always work).
      
      The best fix for this problem is to allocate QPs in round-robin mode,
      so that the physical QP numbers are not immediately re-used.
      Found-by: Matthew Finlay <matt@mellanox.com>
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
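The round-robin idea from this commit can be shown with a small sketch: the search for a free slot always continues from where the last allocation ended, and freeing a slot does not rewind that position. The type and function names and the tiny pool size are hypothetical, and the upper "counter" bits of the full QP number are left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define POOL_SIZE 8u   /* illustrative; a real HCA exposes far more QPs */

struct rr_bitmap {
    bool used[POOL_SIZE];
    uint32_t last;     /* index where the next search starts */
};

/* Allocate the first free slot at or after 'last', wrapping around once.
 * Returns the slot index, or -1 if the pool is exhausted. */
int rr_alloc(struct rr_bitmap *bm)
{
    for (uint32_t i = 0; i < POOL_SIZE; i++) {
        uint32_t idx = (bm->last + i) % POOL_SIZE;

        if (!bm->used[idx]) {
            bm->used[idx] = true;
            bm->last = (idx + 1) % POOL_SIZE;
            return (int)idx;
        }
    }
    return -1;
}

/* Free a slot without rewinding 'last', so the slot is not handed out
 * again until the search has cycled through the rest of the pool. */
void rr_free(struct rr_bitmap *bm, uint32_t idx)
{
    assert(idx < POOL_SIZE && bm->used[idx]);
    bm->used[idx] = false;
}
```

In this sketch, allocating two slots, freeing the first, and allocating again yields a third, different slot instead of the one just released.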
  9. 09 May, 2012 1 commit
    • mlx4_core: Add second capabilities flags field · b3416f44
      Shlomo Pongratz committed
      This patch adds a 64-bit flags2 features member to struct mlx4_dev to
      export further features of the hardware.  The original flags field
      tracks features whose support bits are advertised by the firmware in
      offsets 0x40 and 0x44 of the query device capabilities command.
      flags2 will track features whose support bits are scattered at various
      offsets.
      
      RSS support is the first feature to be exported through flags2.  RSS
      capabilities are located at offset 0x2e.  The size of the RSS
      indirection table is also given in this offset.
      Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  10. 24 Apr, 2012 3 commits
  11. 16 Apr, 2012 1 commit
  12. 15 Apr, 2012 1 commit
  13. 05 Apr, 2012 6 commits
    • net/mlx4_en: Set max rate-limit for a TC · 109d2446
      Amir Vadai committed
      This patch uses DCB netlink to set a rate limit per ETS TC.
      Values are accepted in Kbps and rounded up to the nearest multiple of 100 Mbps.
      Signed-off-by: Amir Vadai <amirv@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
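The rounding rule in this commit message amounts to a ceiling division. A minimal sketch, assuming decimal units (100 Mbps = 100000 Kbps) and a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

#define KBPS_PER_UNIT 100000u   /* 100 Mbps expressed in Kbps */

/* Round a rate given in Kbps up to a whole number of 100 Mbps units. */
uint32_t rate_kbps_to_units(uint64_t kbps)
{
    uint64_t units = (kbps + KBPS_PER_UNIT - 1) / KBPS_PER_UNIT;

    assert(units <= UINT32_MAX);
    return (uint32_t)units;
}
```

For instance, a request of 250000 Kbps maps to 3 units, that is, an effective limit of 300 Mbps.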
    • net/mlx4_en: sk_prio <=> UP for untagged traffic · 897d7846
      Amir Vadai committed
      Since the vlan egress map only applies to tagged traffic, untagged traffic
      needs another mapping.
      For that, the driver uses the sch_mqprio mapping, which can be set with
      the tc tool from the iproute2 package.
      The mapped UP is used by the HW for QoS purposes, but does not go out on
      the wire.
      Signed-off-by: Amir Vadai <amirv@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_en: DCB QoS support · 564c274c
      Amir Vadai committed
      Set TSA, promised BW and PFC using IEEE 802.1Qaz netlink commands.
      Signed-off-by: Amir Vadai <amirv@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_core: set port QoS attributes · e5395e92
      Amir Vadai committed
      Adding QoS firmware commands:
      - mlx4_en_SET_PORT_PRIO2TC - set UP <=> TC
      - mlx4_en_SET_PORT_SCHEDULER - set promised BW, max BW and PG number
      Signed-off-by: Amir Vadai <amirv@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_en: Force user priority by QP attribute · 0e98b523
      Amir Vadai committed
      Instead of relying on HW to change schedule queue by UP, schedule
      queue is fixed for a tx_ring, and UP in WQE is ignored in this aspect.  This
      resolves two issues with untagged traffic:
      1. Untagged traffic has no UP in the packet, which is needed for QoS. The change
         above allows setting the schedule queue (and by that the UP) of such a stream.
      2. BlueFlame uses the same field used by the vlan tag. So forcing UP from QPC
         allows using BF for untagged but prioritized traffic.
      
      With old firmware that does not support forcing UP, untagged traffic will not
      be subject to QoS.
      
      Because UP is set by QP, a tx ring per UP is always needed, even if the pfcrx
      module parameter is false.
      Signed-off-by: Amir Vadai <amirv@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlx4: allocate just enough pages instead of always 4 pages · 117980c4
      Thadeu Lima de Souza Cascardo committed
      The driver uses an order-2 allocation, which is too much on architectures
      like ppc64, which has a 64KiB page. This particular allocation is used
      for large packet fragments that may have a size of 512, 1024, 4096 or
      fill the whole allocation. So, a minimum size of 16384 is good enough
      and will be the same size that is used on architectures with 4KiB
      pages.
      
      This will avoid allocation failures that we see when the system is under
      stress, but still has plenty of memory, like the one below.
      
      This will also allow us to set the interface MTU to higher values like
      9000, which was not possible on ppc64 without this patch.
      
      Node 1 DMA: 737*64kB 37*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 51904kB
      83137 total pagecache pages
      0 pages in swap cache
      Swap cache stats: add 0, delete 0, find 0/0
      Free swap  = 10420096kB
      Total swap = 10420096kB
      107776 pages RAM
      1184 pages reserved
      147343 pages shared
      28152 pages non-shared
      netstat: page allocation failure. order:2, mode:0x4020
      Call Trace:
      [c0000001a4fa3770] [c000000000012f04] .show_stack+0x74/0x1c0 (unreliable)
      [c0000001a4fa3820] [c00000000016af38] .__alloc_pages_nodemask+0x618/0x930
      [c0000001a4fa39a0] [c0000000001a71a0] .alloc_pages_current+0xb0/0x170
      [c0000001a4fa3a40] [d00000000dcc3e00] .mlx4_en_alloc_frag+0x200/0x240 [mlx4_en]
      [c0000001a4fa3b10] [d00000000dcc3f8c] .mlx4_en_complete_rx_desc+0x14c/0x250 [mlx4_en]
      [c0000001a4fa3be0] [d00000000dcc4eec] .mlx4_en_process_rx_cq+0x62c/0x850 [mlx4_en]
      [c0000001a4fa3d20] [d00000000dcc5150] .mlx4_en_poll_rx_cq+0x40/0x90 [mlx4_en]
      [c0000001a4fa3dc0] [c0000000004e2bb8] .net_rx_action+0x178/0x450
      [c0000001a4fa3eb0] [c00000000009c9b8] .__do_softirq+0x118/0x290
      [c0000001a4fa3f90] [c000000000031df8] .call_do_softirq+0x14/0x24
      [c000000184c3b520] [c00000000000e700] .do_softirq+0xf0/0x110
      [c000000184c3b5c0] [c00000000009c6d4] .irq_exit+0xb4/0xc0
      [c000000184c3b640] [c00000000000e964] .do_IRQ+0x144/0x230
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
      Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
      Tested-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
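The sizing argument in this commit (allocate enough pages for 16384 bytes instead of a fixed order of 2) can be checked with a small calculation. This is a user-space sketch with a hypothetical helper name; in the kernel, get_order() performs this kind of computation.

```c
#include <assert.h>
#include <stddef.h>

#define FRAG_ALLOC_BYTES 16384u  /* minimum size named in the commit message */

/* Smallest order such that (page_size << order) >= bytes. */
unsigned order_for(size_t bytes, size_t page_size)
{
    unsigned order = 0;

    assert(page_size > 0);
    while ((page_size << order) < bytes)
        order++;
    return order;
}
```

With 4KiB pages the result for 16384 bytes is order 2 (four pages, as before), while with 64KiB pages it is order 0, so a single page suffices and the order-2 request seen in the failure log is avoided.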
  14. 20 Mar, 2012 1 commit
  15. 13 Mar, 2012 4 commits
  16. 08 Mar, 2012 1 commit