- 01 6月, 2012 5 次提交
-
-
由 Jack Morgenstein 提交于
The range check was performed after using the port number. Reverse this to prevent a potential array overflow. Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Reviewed-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jack Morgenstein 提交于
- pass the following parameters: - firmware version (added QUERY_FW paravirtualization for that) - disable Blueflame on slaves. KVM disables write combining on guests, and we get better performance without BF in this case. (This requires QUERY_DEV_CAP paravirtualization, also in this commit) - max qp rdma as destination - get rid of a chunk of "if (0)" dead code Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Reviewed-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jack Morgenstein 提交于
Port is used as an array index before we know if that is proper. For example, in the catas event case, port is zero; however, the port index should lie in the range (1..2). Fix this by using 'port' only in the events where it is of interest. Test for port out of range in the default (unhandled event) case, and do not output a message if it is not an ethernet port. Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Reviewed-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Marcel Apfelbaum 提交于
In SRIOV mode, the number of EQs used when computing the total ICM size was incorrect. To fix this, we do the following: 1. We add a new structure to mlx4_dev, mlx4_phys_caps, to contain physical HCA capabilities. The PPF uses the phys capabilities when it computes things like ICM size. The dev_caps structure will then contain the paravirtualized values, making bookkeeping much easier in SRIOV mode. We add a structure rather than a single parameter because there will be other fields in the phys_caps. The first field we add to the mlx4_phys_caps structure is num_phys_eqs. 2. In INIT_HCA, when running in SRIOV mode, the "log_num_eqs" parameter passed to the FW is the number of EQs per VF/PF; each function (PF or VF) has this number of EQs available. However, the total number of EQs which must be allowed for in the ICM is (1 << log_num_eqs) * (#VFs + #PFs). Rather than compute this quantity, we allocate ICM space for 1024 EQs (which is the device maximum number of EQs, and which is the value we place in the mlx4_phys_caps structure). For INIT_HCA, however, we use the per-function number of EQs as described above. Signed-off-by: NMarcel Apfelbaum <marcela@dev.mellanox.co.il> Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Reviewed-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jack Morgenstein 提交于
Ths fixes the comparison in the FLR (Function Level Reset) event case. Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Reviewed-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 5月, 2012 1 次提交
-
-
由 Amir Vadai 提交于
Change the TX ring scheme such that the number of rings for untagged packets and for tagged packets (per each of the vlan priorities) is the same, unlike the current situation where for tagged traffic there's one ring per priority and for untagged rings as the number of core. Queue selection is done as follows: If the mqprio qdisc is operates on the interface, such that the core networking code invoked the device setup_tc ndo callback, a mapping of skb->priority => queue set is forced - for both, tagged and untagged traffic. Else, the egress map skb->priority => User priority is used for tagged traffic, and all untagged traffic is sent through tx rings of UP 0. The patch follows the convergence of discussing that issue with John Fastabend over this thread http://comments.gmane.org/gmane.linux.network/229877 Cc: John Fastabend <john.r.fastabend@intel.com> Cc: Liran Liss <liranl@mellanox.com> Signed-off-by: NAmir Vadai <amirv@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 5月, 2012 8 次提交
-
-
由 Jack Morgenstein 提交于
Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jack Morgenstein 提交于
Add missing resource tracking for XRC domains and complete the tracking for HCA network flow counters. Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jack Morgenstein 提交于
Currently the slave and master resources are deleted after master freed all bitmaps. If any resources were not properly cleaned up during the shutdown process, an Oops would result. Fix so that delete slave (only) resources during cleanup. Master resources are cleaned up during unload process, and need not separately be cleaned. Note that during cleanup, we need to split the resource-tracker freeing functionality. Before removing all the bitmaps, we free any leftover slave resources. However, we can only remove the resource tracker linked list after all bitmap frees, since some of the freeing functions (e.g., mlx4_cleanup_eq_table) use paravirtualized FW commands which expect the resource tracker linked list to be present. Found-by: NAviad Yehezkel <aviadye@mellanox.com> Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jack Morgenstein 提交于
Consider the following scenario: 2 HCAs, where only one of which can run SRIOV. If we reset the module parameter, all the VFs of the SRIOV HCA will be claimed by the PPF host (-- the code relies on num_vfs being non-zero to avoid this claiming, and num_vfs was reset when pci_enable_sriov failed for the non-SRIOV HCA). The solution is not to touch the num_vfs parameter. Also, eliminate the unneeded check of num_vfs when disabling sriov (the dev flag bit is sufficient). Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jack Morgenstein 提交于
Removed unsued *_str helper functions from resource_tracker.c Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jack Morgenstein 提交于
The "wrapped" was incorrect, since no wrapper function was defined. Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jack Morgenstein 提交于
In function mlx4_INIT_PORT_wrapper, the port state mask for the slave is only set if we are invoking the INIT_PORT fw command. However, the reference count for the (initialized) port is incremented anyway. This creates a problem in that when we have multiple slaves, then the CLOSE_PORT command will never be invoked. The reason is that in the CLOSE_PORT wrapper, if the port-state mask is zero for the slave (which it is), the wrapper returns without doing anything. The only slave which will not return immediately in the CLOSE_PORT wrapper is that slave for which INIT_PORT was invoked. The fix is to not have the port-state mask setting depend on the logic for calling the INIT_PORT fw command. Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Or Gerlitz 提交于
Handle the compiler warnings on variables which are set but not used by removing the relevant variable or casting a return value which is ignored on purpose to void. Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 5月, 2012 1 次提交
-
-
由 Jack Morgenstein 提交于
Under most circumstances, the bitmap allocator does not allocate the same full 24-bit QP number immediately after a QP is destroyed. This works by using the upper bits of a 24-bit QP number, beyond the number of QPs that are actually available in the low level driver. For example, say that the HCA is willing to allocate a maximum of 64K qps. We use the bits 23..16 as a "counter" which is incremented by 1 at each allocation so that even if the same physical QP is re-allocated, it will not receive the same 24-bit QP number. However, we have seen the following scenario: 1. Allocate, say, 255 QPs in succession. This will cause a wrap of the "counter". 2. Destroy the first QP allocated, then allocate a new QP. The new QP, because of the counter wraparound, will get the same FULL QP number as the QP just destroyed! This is a problem because packets in transit can be erroneously delivered to the new QP when they were meant for the old (destroyed) QP, because the full QP number of the new QP is identical to the destroyed QP. (The "counter" mechanism is meant to prevent this by having the full 24-bit QP numbers differ even if the physical QP on the HCA is the same. As we see above, however, this mechanism does not always work). The best fix for this problem is to allocate QPs in round-robin mode, so that the physical QP numbers are not immediately re-used. Found-by: NMatthew Finlay <matt@mellanox.com> Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
- 09 5月, 2012 1 次提交
-
-
由 Shlomo Pongratz 提交于
This patch adds a 64-bit flags2 features member to struct mlx4_dev to export further features of the hardware. The original flags field tracks features whose support bits are advertised by the firmware in offsets 0x40 and 0x44 of the query device capabilities command. flags2 will track features whose support bits are scattered at various offsets. RSS support is the first feature to be exported through flags2. RSS capabilities are located at offset 0x2e. The size of the RSS indirection table is also given in this offset. Signed-off-by: NShlomo Pongratz <shlomop@mellanox.com> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
- 24 4月, 2012 3 次提交
-
-
由 Yevgeny Petrilin 提交于
Signed-off-by: NYevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yevgeny Petrilin 提交于
Moving to interrupts instead of polling fpr TX completions Avoiding situations where skb can be held in by the driver for a long time (till timer expires). The change is also necessary for supporting BQL. Removing comp_lock that was required because we could handle TX completions from several contexts: Interrupts, timer, polling. Now there is only interrupts Signed-off-by: NYevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yevgeny Petrilin 提交于
Signed-off-by: NYevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 4月, 2012 1 次提交
-
-
由 Paul Gortmaker 提交于
Commit 109d2446 "net/mlx4_en: Set max rate-limit for a TC" introduced 64 bit math operations into mlx4_en_dcbnl_ieee_setmaxrate() causing the following final link failure on an x86_32 allmodconfig ERROR: "__udivdi3" [drivers/net/ethernet/mellanox/mlx4/mlx4_en.ko] undefined! Convert it to use div_u64() instead. Cc: Amir Vadai <amirv@mellanox.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 4月, 2012 1 次提交
-
-
由 Masanari Iida 提交于
Correct spelling typo within drivers/net. Signed-off-by: NMasanari Iida <standby24x7@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 4月, 2012 6 次提交
-
-
由 Amir Vadai 提交于
This patch is using the DCB netlink to set rate limit per ETS TC Values are accepted in Kbps and rounded up to the nearest multiply of 100Mbps. Signed-off-by: NAmir Vadai <amirv@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Amir Vadai 提交于
Since vlan egress map is only good for tagged traffic, need to have other mapping to be used by untagged traffic. For that, the driver uses sch_mqprio mapping. This mapping could be set by using tc tool from iproute2 package. Mapped UP will be used by the HW for QoS purposes, but won't go out on the wire. Signed-off-by: NAmir Vadai <amirv@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Amir Vadai 提交于
Set TSA, promised BW and PFC using IEEE 802.1qaz netlink commands. Signed-off-by: NAmir Vadai <amirv@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Amir Vadai 提交于
Adding QoS firmware commands: - mlx4_en_SET_PORT_PRIO2TC - set UP <=> TC - mlx4_en_SET_PORT_SCHEDULER - set promised BW, max BW and PG number Signed-off-by: NAmir Vadai <amirv@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Amir Vadai 提交于
Instead of relying on HW to change schedule queue by UP, schedule queue is fixed for a tx_ring, and UP in WQE is ignored in this aspect. This resolves two issues with untagged traffic: 1. untagged traffic has no UP in packet which is needed for QoS. The change above allows setting the schedule queue (and by that the UP) of such a stream. 2. BlueFlame uses the same field used by vlan tag. So forcing UP from QPC allows using BF for untagged but prioritized traffic. In old firmware that force UP is not supported, untagged traffic will not subject to QoS. Because UP is set by QP, need to always have a tx ring per UP, even if pfcrx module paramter is false. Signed-off-by: NAmir Vadai <amirv@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
The driver uses a 2-order allocation, which is too much on architectures like ppc64, which has a 64KiB page. This particular allocation is used for large packet fragments that may have a size of 512, 1024, 4096 or fill the whole allocation. So, a minimum size of 16384 is good enough and will be the same size that is used in architectures of 4KiB sized pages. This will avoid allocation failures that we see when the system is under stress, but still has plenty of memory, like the one below. This will also allow us to set the interface MTU to higher values like 9000, which was not possible on ppc64 without this patch. Node 1 DMA: 737*64kB 37*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 51904kB 83137 total pagecache pages 0 pages in swap cache Swap cache stats: add 0, delete 0, find 0/0 Free swap = 10420096kB Total swap = 10420096kB 107776 pages RAM 1184 pages reserved 147343 pages shared 28152 pages non-shared netstat: page allocation failure. order:2, mode:0x4020 Call Trace: [c0000001a4fa3770] [c000000000012f04] .show_stack+0x74/0x1c0 (unreliable) [c0000001a4fa3820] [c00000000016af38] .__alloc_pages_nodemask+0x618/0x930 [c0000001a4fa39a0] [c0000000001a71a0] .alloc_pages_current+0xb0/0x170 [c0000001a4fa3a40] [d00000000dcc3e00] .mlx4_en_alloc_frag+0x200/0x240 [mlx4_en] [c0000001a4fa3b10] [d00000000dcc3f8c] .mlx4_en_complete_rx_desc+0x14c/0x250 [mlx4_en] [c0000001a4fa3be0] [d00000000dcc4eec] .mlx4_en_process_rx_cq+0x62c/0x850 [mlx4_en] [c0000001a4fa3d20] [d00000000dcc5150] .mlx4_en_poll_rx_cq+0x40/0x90 [mlx4_en] [c0000001a4fa3dc0] [c0000000004e2bb8] .net_rx_action+0x178/0x450 [c0000001a4fa3eb0] [c00000000009c9b8] .__do_softirq+0x118/0x290 [c0000001a4fa3f90] [c000000000031df8] .call_do_softirq+0x14/0x24 [c000000184c3b520] [c00000000000e700] .do_softirq+0xf0/0x110 [c000000184c3b5c0] [c00000000009c6d4] .irq_exit+0xb4/0xc0 [c000000184c3b640] [c00000000000e964] .do_IRQ+0x144/0x230 Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> Signed-off-by: NKleber Sacilotto de Souza <klebers@linux.vnet.ibm.com> Tested-by: NKleber Sacilotto de Souza <klebers@linux.vnet.ibm.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 3月, 2012 1 次提交
-
-
由 Eugenia Emantayev 提交于
Prevent race condition between commands on comm channel. Happened while unloading the driver when switching from event to polling mode. VF got completion on the last command before switching to polling mode, but toggle was not changed. After the fix - VF will not write the next command before toggle is updated. Signed-off-by: NEugenia Emantayev <eugenia@mellanox.co.il> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 3月, 2012 4 次提交
-
-
由 Roland Dreier 提交于
The current driver defaults to 1M MTT segments, where each segment holds 8 MTT entries. This limits the total memory registered to 8M * PAGE_SIZE which is 32GB with 4K pages. Since systems that have much more memory are pretty common now (at least among systems with InfiniBand hardware), this limit ends up getting hit in practice quite a bit. Handle this by having the driver allocate at least enough MTT entries to cover 2 * totalram pages. Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Or Gerlitz 提交于
Set the MTU for IB ports in the driver instead of using the firmware default of 2KB (the driver defaults to 4KB). Allow for dynamic mtu configuration through a new, per-port sysfs entry. Since there's a dependency between the port MTU and the max number of HW VLs the port can support, apply a mim/max approach, using a loop that goes down from the highest possible number of VLs to the lowest, using the firmware return status to know whether the requested number of VLs is possible with a given MTU. For now, as with the dynamic link type change / VPI support, the sysfs entry to change the mtu is exposed only when NOT running in SR-IOV mode. To allow changing the MTU for the master in SR-IOV mode, primary-function-initiated FLR (Function Level Reset) needs to be implemented. Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Jack Morgenstein 提交于
Print an error message when a thermal error async event is reported by the HW. Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NDotan Barak <dotanb@mellanox.com> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Roland Dreier 提交于
Commit 22c8bff6 ("mlx4_core: Exported functions can't be static") fixed most of this up, but forgot about mlx4_is_slave_active(). Fix this one too. Cc: <stable@vger.kernel.org> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
- 08 3月, 2012 1 次提交
-
-
由 Jack Morgenstein 提交于
The actual FW command is called in procedure "handle_resize". Code incorrectly invoked the FW command again (in good flow), in the modify_cq wrapper function. Fix by skipping second FW invocation unconditionally for resize. Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 3月, 2012 7 次提交
-
-
由 Or Gerlitz 提交于
While doing the work for commit a6f7feae ("IB/mlx4: pass SMP vendor-specific attribute MADs to firmware") we realized that the firmware would respond on all sorts of vendor-specific MADs. Therefore commit 97285b78 ("mlx4_core: Add extended port capabilities support") adds redundant code into the driver, since there's no real reaon to maintain the extended capabilities of the port, as they can be queried on demand (e.g the FDR10 capability). This patch reverts commit 97285b78 and removes the check for extended caps from the mlx4_ib driver port query flow. Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Yevgeny Petrilin 提交于
Fixing sparse warnings, the 2 functions are only used in same file. Defining them as static and not exporting them. Signed-off-by: NYevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yevgeny Petrilin 提交于
Removing functions that are no longer in use, but still exist Signed-off-by: NYevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yevgeny Petrilin 提交于
The SET_PORT functions are implemented in port.c, which is part of mlx4_core, these functions are exported. The functions are in use by the mlx4_en module (were originally part of mlx4_en). Their declaration remained in mlx4_en module, moving the declaration to the right location. Signed-off-by: NYevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yevgeny Petrilin 提交于
The mac should be written as __be64 the gid. The warning was because we changed the mac parameter, which is u64, by writing result of cpu_to_be64 into it. Fixing by using new variable of type __be64. Signed-off-by: NYevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yevgeny Petrilin 提交于
The keys used for the hardware RSS topelitz hash are of type __be32 where the values provided by the driver are from array of u32, this triggered sparse warning on incorrect type in assignment as of different base types. Since these values are picked randomly, the fix is to transform the key to __be32 by executing cpu_to_be_32() Signed-off-by: NYevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Or Gerlitz 提交于
The blue flame buffer is defined to be of type void __iomem * but was passed to mlx4_bf_copy which gets unsigned long * . This triggered sparse warning on different address spaces, fix that by changing mlx4_bf_copy first param to be of type void __iomem * . Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-