- 11 Mar, 2017 3 commits
-
-
Committed by Ido Schimmel
When a VLAN device is configured on top of a LAG device (e.g., bond0.10), a vPort is created on top of each of the LAG's slaves and its 'dev' pointer is set to the VLAN device. This is in contrast to the implicit PVID vPort (representing 'bond0'), whose 'dev' pointer keeps pointing to the port netdev itself (e.g., 'sw1p1'). Make both cases consistent by setting their 'dev' pointer to the actual netdev they represent: either the LAG device itself (in the case of the PVID vPort) or the VLAN device on top of it. This will later allow us to more easily understand for which netdev we should create the router interface (RIF) upon enslavement to a VRF master.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Ido Schimmel
When an upper device is configured on top of a vPort we make sure it's a bridge master during PRECHANGEUPPER and fail otherwise. Therefore, when CHANGEUPPER is later received we don't bother checking the upper's type. Make the code more extensible in preparation for VRF uppers, by checking the upper's type.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Ido Schimmel
We're going to allow bridges stacked on top of port netdevs to be enslaved to a VRF, but for now, only VLAN uppers of the VLAN-aware bridge are supported. Sanitize any other bridge upper. This is consistent with the way we sanitize port netdevs' uppers.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 10 Mar, 2017 15 commits
-
-
Committed by Petr Machata
Introduce MLXSW_AFK_ELEMENT_VID, PCP and declare them in the afk_element infos that contain them. Use the elements when VLAN ID or priority are used in the flow. Also add MLXSW_AFK_ELEMENT_VID, PCP to mlxsw_sp_acl_tcam_pattern_ipv4. Both items are included in mlxsw_sp_afk_element_info_l2_dmac and _smac respectively, and both MLXSW_AFK_ELEMENT_SMAC and _DMAC are already in the pattern.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Petr Machata
Add VLAN action offloading. Invoke it from the Spectrum flower handler for "vlan modify" actions.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet
We should keep one way to build skbs, regardless of GRO being on or off. Note that I made sure to defer as much as possible the point where we need to pull data from the frame, so that any prefetch() we might add in the future is more effective.
These skb attributes derive from the CQE or ring: ip_summed, csum, hash, vlan offload, hwtstamps, queue_mapping.
As a bonus, this patch removes the mlx4 dependency on eth_get_headlen(), which is very often broken enough to give us headaches.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet
Testing a boolean in the fast path is not worth duplicating the code that allocates packets, whether GRO is on or off. If this proves to be a problem, we might later use a jump label. The next patch will remove this duplicated code and ease code review.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet
We need to compute the frame virtual address at different points. Do it once. The following patch will use the new va address for validate_loopback().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet
Instead of fetching the dma address from rx_desc->data[0].addr, prefer using frags[0].dma + frags[0].page_offset to avoid a potential cache line miss.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet
This new counter tracks the number of pages that we allocated for one port.
lpaa24:~# ethtool -S eth0 | egrep 'rx_alloc_pages|rx_packets'
rx_packets: 306755183
rx_alloc_pages: 932897
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet
Same technique as some Intel drivers, for arches where PAGE_SIZE = 4096. In most cases, pages are reused because they were consumed before we could loop around the RX ring. This brings back performance, and is even better: a single TCP flow reaches 30Gbit on my hosts.
v2: added a full memset() in mlx4_en_free_frag(), as Tariq found it was needed if we switch to a large MTU, as priv->log_rx_info can dynamically be changed.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
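The page-reuse idea mentioned in the commit above can be modelled roughly as follows. This is a simplified user-space sketch, not the driver's code: `struct rx_slot`, `rx_slot_try_recycle()` and the explicit `refcount` field are hypothetical stand-ins, and the 2048-byte fragment size is taken from the related commits in this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u
#define MODEL_FRAG_SIZE 2048u /* two fragments fit in one page */

/* Hypothetical model of one RX ring slot (assumption, not a driver struct). */
struct rx_slot {
    unsigned int refcount; /* 1 = only the driver holds the page */
    size_t page_offset;    /* half of the page currently posted to HW */
};

/*
 * Called when the fragment at slot->page_offset has just been filled and
 * is about to be handed to the stack.
 * Returns true if the page can stay in the ring (the other half gets
 * posted), false if the ring needs a freshly allocated page.
 */
static bool rx_slot_try_recycle(struct rx_slot *slot)
{
    assert(slot->refcount >= 1);

    /* The half handed out earlier is still in use by the stack. */
    if (slot->refcount != 1)
        return false;

    slot->refcount++;                     /* reference owned by the skb fragment */
    slot->page_offset ^= MODEL_FRAG_SIZE; /* flip to the other half */
    return true;
}
```

In this model the reuse succeeds whenever the stack has released the previously handed-out half before the ring wraps around, which matches the "in most cases" observation in the commit message.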
-
Committed by Eric Dumazet
Use of order-3 pages is problematic in some cases. This patch might add three kinds of regression:
1) A CPU performance regression, but we will add page recycling later and performance should be back.
2) A TCP receiver could grow its receive window slightly slower, because the skb->len/skb->truesize ratio will decrease. This is mostly ok: we prefer being conservative so as not to risk OOM, and can eventually tune TCP better in the future. This is consistent with other drivers using 2048 per ethernet frame.
3) Because we allocate one page per RX slot, we consume more memory for the ring buffers. XDP already had this constraint anyway.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet
We will soon use order-0 pages, and frag truesize will more precisely match real sizes. In the new model, we prefer to use <= 2048 byte fragments, so that we can use the page-recycle technique on PAGE_SIZE=4096 arches. We will still pack as many frames as possible on arches with big pages, like PowerPC.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet
We only need to store the page and dma address.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet
No need to duplicate it per RX queue / frags.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet
Using per-frag storage for frag_prefix_size is really silly. mlx4_en_complete_rx_desc() already has all the needed info.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet
This is really a port attribute, no need to duplicate it per RX queue and per frag.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet
No need to duplicate it for all queues and frags. num_frags & log_rx_info become u8 to save space. u8 accesses are a bit faster than u16 anyway.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 09 Mar, 2017 2 commits
-
-
Committed by Ido Schimmel
The overrun ignore bit isn't supported by the device's firmware and was recently removed from the programmer's reference manual (PRM). Remove it from the driver as well.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Jiri Pirko
Commit dd82364c ("mlxsw: Flip to the new dev walk API") made some small changes in the mlxsw code, but it did not respect the naming conventions. Fix this now.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 02 Mar, 2017 1 commit
-
-
Committed by Ido Schimmel
When the structure of the LPM tree changes (e.g., due to the addition of a new prefix), we unbind the old tree and then bind the new one. This may result in temporary packet loss. Instead, overwrite the old binding with the new one.
Fixes: 6b75c480 ("mlxsw: spectrum_router: Add virtual router management")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 27 Feb, 2017 1 commit
-
-
Committed by Eric Dumazet
The cited commit does a great job of finding optimal shift/multiplier values assuming a 10 second wrap-around, but forgot to change the overflow_period computation. It overflows in cyclecounter_cyc2ns(), and the final result is 804 ms, which is silly. Let's simply use 5 seconds; there is no need to recompute this, given how it is supposed to work. Later, we will use a timer instead of a work queue, since the new RX allocation scheme will no longer need mlx4_en_recover_from_oom() and the service_task firing every 250 ms.
Fixes: 31c128b6 ("net/mlx4_en: Choose time-stamping shift value according to HW frequency")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 23 Feb, 2017 11 commits
-
-
Committed by Eric Dumazet
Or we might miss the fact that a page was allocated from memory reserves.
Fixes: dceeab0e ("mlx4: support __GFP_MEMALLOC for rx")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Jack Morgenstein
When creating EQs to handle CQ completion events for the PF or for VFs, we create enough EQE entries to handle completions for the max number of CQs that can use that EQ. When SRIOV is activated, the max number of CQs a VF (or the PF) can obtain is its CQ quota (determined by the Hypervisor resource tracker). Therefore, when creating an EQ, the number of EQE entries that the VF should request for that EQ is the CQ quota value (and not the total number of CQs available in the FW). Under SRIOV, the PF must also use its CQ quota, because the resource tracker also controls how many CQs the PF can obtain. Using the FW total CQs instead of the CQ quota when creating EQs resulted in wasting MTT entries, due to allocating more EQEs than were needed.
Fixes: 5a0d0a61 ("mlx4: Structures and init/teardown for VF resource quotas")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reported-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Majd Dibbiny
In the VF driver, the module parameter mlx4_log_num_mgm_entry_size was mistakenly overwritten, and in a manner which overrode the device-managed flow steering option encoded in the parameter. log_num_mgm_entry_size is a global module parameter which affects all ConnectX-3 PFs installed on that host. If a VF changes log_num_mgm_entry_size, this will affect all PFs which are probed subsequent to the change (by disabling DMFS for those PFs).
Fixes: 3c439b55 ("mlx4_core: Allow choosing flow steering mode")
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eugenia Emantayev
Spoofcheck can't be enabled if the VF MAC is zero. Vice versa, the MAC can't be zeroed if spoofcheck is on.
Fixes: 8f7ba3ca ('net/mlx4: Add set VF mac address support')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
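The two mutually dependent checks described in the commit above can be sketched as below. This is only an illustration of the rule, not the driver's code: `struct vf_cfg`, `vf_set_spoofchk()`, `vf_set_mac()` and the -1 error return are hypothetical names and conventions chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical per-VF admin state (assumption, not a driver struct). */
struct vf_cfg {
    uint8_t mac[MAC_LEN];
    bool spoofchk;
};

static bool mac_is_zero(const uint8_t mac[MAC_LEN])
{
    static const uint8_t zero[MAC_LEN];

    return memcmp(mac, zero, MAC_LEN) == 0;
}

/* Enabling spoofcheck is rejected while the VF MAC is all zeroes. */
static int vf_set_spoofchk(struct vf_cfg *vf, bool enable)
{
    assert(vf != NULL);
    if (enable && mac_is_zero(vf->mac))
        return -1;
    vf->spoofchk = enable;
    return 0;
}

/* Zeroing the MAC is rejected while spoofcheck is on. */
static int vf_set_mac(struct vf_cfg *vf, const uint8_t mac[MAC_LEN])
{
    assert(vf != NULL);
    if (vf->spoofchk && mac_is_zero(mac))
        return -1;
    memcpy(vf->mac, mac, MAC_LEN);
    return 0;
}
```

Checking both directions keeps the invariant "spoofcheck on implies non-zero MAC" true regardless of the order in which the two settings are changed.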
-
Committed by Or Gerlitz
As ENOTSUPP is specific to NFS, change the returned error value to EOPNOTSUPP in various places in the mlx4 driver.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Suggested-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Tariq Toukan
In CQE compression with striding RQ, the decompression of the CQE field wqe_counter was done with a wrong wrap-around value. This caused handling of CQEs with a wrong pointer to the WQE (rx descriptor) and creation of SKBs with wrong data, pointing to wrong (and already consumed) strides/pages.
In striding RQ, the CQE field wqe_counter holds the stride index instead of the WQE index. Hence, when decompressing a CQE, wqe_counter should have wrapped around at the number of strides in a single multi-packet WQE.
We drop this wrap-around mask altogether in CQE decompression of striding RQ. It is not needed, as in such cases the CQE compression session would break because of a different value of the wqe_id field, starting a new compression session.
Tested:
ethtool -K ethxx lro off/on
ethtool --set-priv-flags ethxx rx_cqe_compress on
super_netperf 16 {ipv4,ipv6} -t TCP_STREAM -m 50 -D
verified no csum errors and no page refcount issues.
Fixes: 7219ab34 ("net/mlx5e: CQE compression")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reported-by: Tom Herbert <tom@herbertland.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Saeed Mahameed
When the admin enables/disables CQE compression, updating the mpwqe stride size is required:
CQE compress ON ==> stride size = 256B
CQE compress OFF ==> stride size = 64B
This is already done on driver load via mlx5e_set_rq_type_params; all we need is to call it on arbitrary admin changes of the CQE compression state via priv flags, or when changing the timestamping state (as it is mutually exclusive with CQE compression).
This bug introduces no functional damage; it only makes CQE compression occur less often, since in ConnectX4-LX CQE compression is performed only on packets smaller than the stride size.
Tested:
ethtool --set-priv-flags ethxx rx_cqe_compress on
pktgen with 64 < pkt size < 256 and netperf TCP_STREAM (IPv4/IPv6)
verify `ethtool -S ethxx | grep compress` are advancing more often (rapidly)
Fixes: 7219ab34 ("net/mlx5e: CQE compression")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: David S. Miller <davem@davemloft.net>
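The relationship between the compression flag, the stride size and which packets get compressed, as described in the commit above, can be illustrated with a small sketch. The function names `mpwqe_stride_size()` and `can_compress()` are hypothetical; only the 256B/64B values and the "smaller than stride size" rule come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stride size the RQ should use for a given CQE compression state. */
static size_t mpwqe_stride_size(bool cqe_compress)
{
    return cqe_compress ? 256 : 64;
}

/*
 * Per the commit message, compression is performed only on packets
 * smaller than the stride size actually programmed into the RQ.
 */
static bool can_compress(size_t pkt_len, size_t stride_size)
{
    assert(stride_size == 64 || stride_size == 256);
    return pkt_len < stride_size;
}
```

With a stale 64B stride left over from the "compression off" state, a 128-byte packet is not eligible, which is why the bug only reduced how often compression occurred rather than breaking traffic.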
-
Committed by Tariq Toukan
Some of the RQ type parameters are derived from the CQE compression state flag, but the CQE compression flag was initialized only after the RQ type parameters were set up. This leads to loading the RQ with a stride size smaller than what we want when CQE compression is on.
This bug introduces no functional damage; it only makes CQE compression occur less often, since in ConnectX4-LX CQE compression is performed only on packets smaller than the stride size.
Fix this by marking the default status of CQE compression in PFLAG prior to calling mlx5e_set_rq_priv_params(), as it inits some fields based on it.
Tested:
load driver on systems where rx CQE compress will be on (MH)
pktgen with 64 < pkt size < 256 and netperf TCP_STREAM (IPv4/IPv6)
verify `ethtool -S ethxx | grep compress` are advancing more often (rapidly)
Fixes: 2fc4bfb7 ("net/mlx5e: Dynamic RQ type infrastructure")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Tariq Toukan
When rq_type is Striding RQ, no room for SKB_RESERVE is needed, as SKB allocation is not done via build_skb.
Fixes: e4b85508 ("net/mlx5e: Slightly reduce hardware LRO size")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Saeed Mahameed
Currently vport representors are added only on driver load and removed on driver unload. Apparently we forgot to handle them when we added the seamless reset flow feature. This left the representor netdevs alive and active with open HW resources on pci shutdown and on error reset flows. To overcome this, we move their handling to interface attach/detach, so they are cleaned up on shutdown and recreated on reset flows.
Fixes: 26e59d80 ("net/mlx5e: Implement mlx5e interface attach/detach callbacks")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Mohamad Haj Yahia
Add the necessary header includes for s390 arch compilation.
Fixes: e586b3b0 ("net/mlx5: Ethernet Datapath files")
Fixes: d605d668 ("net/mlx5e: Add support for ethtool self..")
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 20 Feb, 2017 2 commits
-
-
Committed by Eric Dumazet
Since mlx4 NICs are used on PowerPC with 64K pages, we need to adapt the MLX4_EN_ALLOC_PREFER_ORDER definition. Otherwise, a fragment sitting in an out-of-order TCP queue can hold 0.5 Mbytes, which is a serious OOM risk.
Fixes: 51151a16 ("mlx4: allow order-0 memory allocations in RX path")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
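The 0.5 Mbytes figure in the commit above is consistent with an order-3 allocation on 64K pages (8 x 64 KiB = 512 KiB). The arithmetic can be checked with the sketch below; `alloc_bytes()` and `order_for()` are hypothetical helpers, and the 32 KiB target used in the usage note is an assumption for illustration, not a value stated in the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes covered by one allocation of 2^order contiguous pages. */
static size_t alloc_bytes(size_t page_size, unsigned int order)
{
    return page_size << order;
}

/*
 * Smallest order whose allocation covers at least `bytes`.
 * Hypothetical helper, similar in spirit to a get_order()-style lookup.
 */
static unsigned int order_for(size_t page_size, size_t bytes)
{
    unsigned int order = 0;

    assert(page_size > 0);
    while ((page_size << order) < bytes)
        order++;
    return order;
}
```

For example, `alloc_bytes(4096, 3)` is 32768 bytes while `alloc_bytes(65536, 3)` is 524288 bytes; deriving the order from a byte target, as in `order_for(65536, 32768)`, gives order 0 on 64K pages and order 3 on 4K pages.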
-
Committed by Eric Dumazet
1) In the case where rate == priv->pkt_rate_low == priv->pkt_rate_high, mlx4_en_auto_moderation() does a divide by zero.
2) We want to properly change the moderation parameters if rx_frames was changed (like in ethtool -C eth0 rx-frames 16).
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
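The divide by zero in item 1 above is the classic hazard of interpolating between two bounds that may coincide. The sketch below shows one way to guard such a computation; it is not the driver's code or the actual patch, and `moder_time_for_rate()` together with its parameter names is hypothetical.

```c
#include <assert.h>

/*
 * Linearly interpolate a moderation time between time_low and time_high
 * as the packet rate moves from rate_low to rate_high.
 * The boundary checks run first, so the division is only reached when
 * rate_low < rate < rate_high and the divisor is therefore non-zero.
 */
static unsigned int moder_time_for_rate(unsigned long rate,
                                        unsigned long rate_low,
                                        unsigned long rate_high,
                                        unsigned int time_low,
                                        unsigned int time_high)
{
    assert(rate_low <= rate_high && time_low <= time_high);

    if (rate <= rate_low)
        return time_low;
    if (rate >= rate_high)
        return time_high;

    return (unsigned int)((rate - rate_low) * (time_high - time_low) /
                          (rate_high - rate_low)) + time_low;
}
```

With `rate == rate_low == rate_high` the first check returns early, so the degenerate case from the commit message never reaches the division.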
-
- 17 Feb, 2017 1 commit
-
-
Committed by Eric Dumazet
All rx and tx netdev interrupts are handled respectively by mlx4_en_rx_irq() and mlx4_en_tx_irq(), which simply schedule a NAPI. But mlx4_eq_int() also fires a tasklet to service all items that were queued via mlx4_add_cq_to_tasklet(), even though this handler was not called unless a user cqe was handled. This is very confusing, as "mpstat -I SCPU ..." shows a huge number of tasklet invocations. This patch saves this overhead by carefully firing the tasklet directly from mlx4_add_cq_to_tasklet(), removing four atomic operations per IRQ.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 16 Feb, 2017 2 commits
-
-
Committed by Jiri Pirko
The current behaviour of the "mirred redirect" action (forward) offload is a bit odd. For matched packets the action forwards them to the desired destination, but it also lets duplicates of the packet continue down the original path (bridge, router, etc.). That is more like "mirred mirror". Fix this by using the PBS type, which behaves exactly like "mirred redirect". Note that PBS does not support loopback mode.
Fixes: 4cda7d8d ("mlxsw: core: Introduce flexible actions support")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet
Using a reader-writer lock in the fast path is silly when we can instead use RCU or a seqlock. For the mlx4 hwstamp clock, a seqlock is the way to go, removing two atomic operations and false sharing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
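To illustrate why a seqlock suits a read-mostly clock like the one in the commit above, here is a minimal user-space sketch using C11 atomics. It is not the kernel's seqlock_t API or the driver's code: `struct hwclock`, its fields and both functions are hypothetical, and the sketch assumes a single writer (a real implementation would also serialize writers).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical clock snapshot protected by a sequence counter. */
struct hwclock {
    atomic_uint seq;             /* odd while a write is in progress */
    _Atomic uint64_t cycle_last;
    _Atomic uint64_t nsec;
};

/* Single-writer update: bump seq to odd, write data, bump seq to even. */
static void hwclock_write(struct hwclock *c, uint64_t cycle_last, uint64_t nsec)
{
    unsigned int s = atomic_load_explicit(&c->seq, memory_order_relaxed);

    assert((s & 1u) == 0); /* no concurrent writer in this model */
    atomic_store_explicit(&c->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&c->cycle_last, cycle_last, memory_order_relaxed);
    atomic_store_explicit(&c->nsec, nsec, memory_order_relaxed);
    atomic_store_explicit(&c->seq, s + 2, memory_order_release);
}

/* Reader: never writes shared memory; retries if a write overlapped. */
static void hwclock_read(struct hwclock *c, uint64_t *cycle_last, uint64_t *nsec)
{
    unsigned int s1, s2;

    do {
        s1 = atomic_load_explicit(&c->seq, memory_order_acquire);
        *cycle_last = atomic_load_explicit(&c->cycle_last, memory_order_relaxed);
        *nsec = atomic_load_explicit(&c->nsec, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        s2 = atomic_load_explicit(&c->seq, memory_order_relaxed);
    } while ((s1 & 1u) || s1 != s2);
}
```

Unlike a reader-writer lock, where taking and releasing the read side each modify the lock word, the reader here only performs loads, so concurrent readers do not bounce a shared cache line between CPUs.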
-
- 15 Feb, 2017 2 commits
-
-
Committed by Nogah Frankel
Point the unregistered IPv6 mc table back to the bc table. This is done since IPv6 mcast snooping is not supported for Spectrum yet.
Reported-by: Jiri Pirko <jiri@mellanox.com>
Fixes: 71c365bd ("mlxsw: spectrum: Separate bc and mc floods")
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Tested-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Or Gerlitz
When called by HW offloading drivers, the TC action code (e.g. net/sched/act_mirred.c) uses this_cpu logic, e.g. _bstats_cpu_update(this_cpu_ptr(a->cpu_bstats), bytes, packets). Per the kernel documentation, preemption should be disabled around such calls, so add that.
Before the fix, when running with CONFIG_PREEMPT set, we get a
BUG: using smp_processor_id() in preemptible [00000000] code: tc/3793
assertion from the TC action (mirred) stats_update callback.
Fixes: aad7e08d ('net/mlx5e: Hardware offloaded flower filter statistics support')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-