- 22 11月, 2020 2 次提交
-
-
由 Huazhong Tan 提交于
For device who has device memory accessed through the PCI BAR4, IO descriptor push of NIC and direct WQE(Work Queue Element) of RoCE will use this device memory, so add support for mapping this device memory, and add this info to the RoCE client whose new feature needs. Signed-off-by: NHuazhong Tan <tanhuazhong@huawei.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Yonglong Liu 提交于
For DEVICE_VERSION_V1/2, there are total 1024 queues and queue sets. For DEVICE_VERSION_V3, it increases to 1280, and can be assigned to one pf, so remove the limitation of 1024. To keep compatible with DEVICE_VERSION_V1/2 and old driver version, the queue number is split into two part: tqp_num(range 0~1023) and ext_tqp_num(range 1024~1279). Signed-off-by: NYonglong Liu <liuyonglong@huawei.com> Signed-off-by: NHuazhong Tan <tanhuazhong@huawei.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 21 11月, 2020 38 次提交
-
-
由 Jakub Kicinski 提交于
Thomas Falcon says: ==================== ibmvnic: Performance improvements and other updates The first three patches utilize a hypervisor call allowing multiple TX and RX buffer replenishment descriptors to be sent in one operation, which significantly reduces hypervisor call overhead. The xmit_more and Byte Queue Limit API's are leveraged to provide this support for TX descriptors. The subsequent two patches remove superfluous code and members in TX completion handling function and TX buffer structure, respectively, and remove unused routines. Finally, four patches which ensure that device queue memory is cache-line aligned, resolving slowdowns observed in PCI traces, as well as optimize the driver's NAPI polling function and to RX buffer replenishment are provided by Dwip Banerjee. This series provides significant performance improvements, allowing the driver to fully utilize 100Gb NIC's. ==================== Link: https://lore.kernel.org/r/1605748345-32062-1-git-send-email-tlfalcon@linux.ibm.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Dwip N. Banerjee 提交于
Reduce the amount of time spent replenishing RX buffers by only doing so once available buffers has fallen under a certain threshold, in this case half of the total number of buffers, or if the polling loop exits before the packets processed is less than its budget. Non-exhaustion of NAPI budget implies lower incoming packet pressure, allowing the leeway to refill the buffers in preparation for any impending burst. Signed-off-by: NDwip N. Banerjee <dnbanerg@us.ibm.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Dwip N. Banerjee 提交于
Take advantage of the additional optimizations in netdev_alloc_skb when allocating socket buffers to be used for packet reception. Signed-off-by: NDwip N. Banerjee <dnbanerg@us.ibm.com> Acked-by: NLijun Pan <ljp@linux.ibm.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Dwip N. Banerjee 提交于
If the current NAPI polling loop exits without completing it's budget, only re-enable interrupts if there are no entries remaining in the queue and napi_complete_done is successful. If there are entries remaining on the queue that were missed, restart the polling loop. Signed-off-by: NDwip N. Banerjee <dnbanerg@us.ibm.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Dwip N. Banerjee 提交于
PCI bus slowdowns were observed on IBM VNIC devices as a result of partial cache line writes and non-cache aligned full cache line writes. Ensure that packet data buffers are cache-line aligned to avoid these slowdowns. Signed-off-by: NDwip N. Banerjee <dnbanerg@us.ibm.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Thomas Falcon 提交于
It is not longer used, so remove it. Signed-off-by: NThomas Falcon <tlfalcon@linux.ibm.com> Acked-by: NLijun Pan <ljp@linux.ibm.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Thomas Falcon 提交于
Remove unused and superfluous code and members in existing TX implementation and data structures. Signed-off-by: NThomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Thomas Falcon 提交于
Include support for the xmit_more feature utilizing the H_SEND_SUB_CRQ_INDIRECT hypervisor call which allows the sending of multiple subordinate Command Response Queue descriptors in one hypervisor call via a DMA-mapped buffer. This update reduces hypervisor calls and thus hypervisor call overhead per TX descriptor. Signed-off-by: NThomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Thomas Falcon 提交于
Utilize the H_SEND_SUB_CRQ_INDIRECT hypervisor call to send multiple RX buffer descriptors to the device in one hypervisor call operation. This change will reduce the number of hypervisor calls and thus hypervisor call overhead needed to transmit RX buffer descriptors to the device. Signed-off-by: NThomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Thomas Falcon 提交于
This patch introduces the infrastructure to send batched subordinate Command Response Queue descriptors, which are used by the ibmvnic driver to send TX frame and RX buffer descriptors. Signed-off-by: NThomas Falcon <tlfalcon@linux.ibm.com> Acked-by: NLijun Pan <ljp@linux.ibm.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Jakub Kicinski 提交于
Alex Elder says: ==================== net: ipa: add a driver shutdown callback The final patch in this series adds a driver shutdown callback for the IPA driver. The patches leading up to that address some issues encountered while ensuring that callback worked as expected: - The first just reports a little more information when channels or event rings are in unexpected states - The second patch recognizes a condition where an as-yet-unused channel does not require a reset during teardown - The third patch explicitly ignores a certain error condition, because it can't be avoided, and is harmless if it occurs - The fourth properly handles requests to retry a channel HALT request - The fifth makes a second attempt to stop modem activity during shutdown if it's busy The shutdown callback is implemented by calling the existing remove callback function (reporting if that returns an error). ==================== Link: https://lore.kernel.org/r/20201119224929.23819-1-elder@linaro.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Alex Elder 提交于
A system shutdown can happen at essentially any time, and it's possible that the IPA driver is busy when a shutdown is underway. IPA hardware accesses IMEM and SMEM memory regions using an IOMMU, and at some point during shutdown, needed I/O mappings could become invalid. This could be disastrous for any "in flight" IPA activity. Avoid this by defining a new driver shutdown callback that stops all IPA activity and cleanly shuts down the driver. It merely calls the driver's existing remove callback, reporting the error if it returns one. Signed-off-by: NAlex Elder <elder@linaro.org> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Alex Elder 提交于
The IPA driver remove callback, ipa_remove(), calls ipa_modem_stop() if the setup stage of initialization is complete. If a concurrent call to ipa_modem_start() or ipa_modem_stop() has begin but not completed, ipa_modem_stop() can return an error (-EBUSY). The next patch adds a driver shutdown callback, which will simply call ipa_remove(). We really want our shutdown callback to clean things up. So add a single retry to the ipa_modem_stop() call in ipa_remove() after a short (millisecond) delay. This offers no guarantee the shutdown will complete successfully, but we'll at least try a little harder before giving up. Signed-off-by: NAlex Elder <elder@linaro.org> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Alex Elder 提交于
When stopping an AP RX channel, there can be a transient period while the channel enters STOP_IN_PROC state before reaching the final STOPPED state. In that case we make another attempt to stop the channel. Similarly, when stopping a modem channel (using a GSI generic command issued from the AP), it's possible that multiple attempts will be required before the channel reaches STOPPED state. Add a field to the GSI structure to record an errno representing the result code provided when a generic command completes. If the result learned in gsi_isr_gp_int1() is RETRY, record -EAGAIN in the result code, otherwise record 0 for success, or -EIO for any other result. If we time out nf gsi_generic_command() waiting for the command to complete, return -ETIMEDOUT (as before). Otherwise return the result stashed by gsi_isr_gp_int1(). Add a loop in gsi_modem_channel_halt() to reissue the HALT command if the result code indicates -EAGAIN. Limit this to 10 retries (after the initial attempt). Signed-off-by: NAlex Elder <elder@linaro.org> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Alex Elder 提交于
IPA v4.2 has a hardware quirk that requires the AP to allocate GSI channels for the modem to use. It is recommended that these modem channels get stopped (with a HALT generic command) by the AP when its IPA driver gets removed. The AP has no way of knowing the current state of a modem channel. So when the IPA driver issues a HALT command it's possible the channel is not running, and in that case we get an error indication. This error simply means we didn't need to stop the channel, so we can ignore it. This patch adds an explanation for this situation, and arranges for this condition to *not* report an error message. Signed-off-by: NAlex Elder <elder@linaro.org> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Alex Elder 提交于
If the rmnet_ipa0 network device has not been opened at the time we remove or shut down the IPA driver, its underlying TX and RX GSI channels will not have been started, and they will still be in ALLOCATED state. The RESET command on a channel is meant to return a channel to ALLOCATED state after it's been stopped. But if it was never started, its state will still be ALLOCATED, the RESET command is not required. Quietly skip doing the reset without printing an error message if a channel is already in ALLOCATED state when we request it be reset. Signed-off-by: NAlex Elder <elder@linaro.org> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Alex Elder 提交于
When a GSI command is used to change the state of a channel or event ring we check the state before and after the command to ensure it is as expected. If not, we print an error message, but it does not include the channel or event ring id, and it easily can. Add the channel or event ring id to these error messages. Signed-off-by: NAlex Elder <elder@linaro.org> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Jakub Kicinski 提交于
Alex Elder says: ==================== net: ipa: platform-specific clock and interconnect rates This series changes the way the IPA core clock rate and the bandwidth parameters for interconnects are specified. Previously these were specified with hard-wired constants, with the same values used for the SDM845 and SC7180 platforms. Now these parameters are recorded in platform-specific configuration data. For the SC7180 this means we use an all-new core clock rate and interconnect parameters. Additionally, while developing this I learned that the average bandwidth setting for two of the interconnects is ignored (on both platforms). Zero is now used explicitly as that unused bandwidth value. This means the SDM845 bandwidth settings are also changed by this series. ==================== Link: https://lore.kernel.org/r/20201119224041.16066-1-elder@linaro.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Alex Elder 提交于
Stop assuming a fixed IPA core clock rate and interconnect bandwidths. Use the configuration data defined for these things instead. Get rid of the previously-used constants. Signed-off-by: NAlex Elder <elder@linaro.org> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Alex Elder 提交于
Populate the core clock rate and interconnect average and peak bandwidth data for SDM845 and SC7180 in their configuration data files. At this point we still don't *use* this data. Note that SC7180 actually defines a new core clock rate (100 MHz instead of 75 MHz) and new interconnect bandwidth values. They will be activated in the next commit, which uses the configured values rather than the fixed constants. Signed-off-by: NAlex Elder <elder@linaro.org> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Alex Elder 提交于
Define a new type of configuration data, used to initialize the IPA core clock and interconnects. This is the first of three patches, and defines the data types and interface but doesn't yet use them. Switch the return value if there is no matching configuration data to ENODEV instead of ENOTSUPP (to avoid using the nonstandard errno). Signed-off-by: NAlex Elder <elder@linaro.org> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Grygorii Strashko 提交于
The mdio_bus may have dependencies from GPIO controller and so got deferred. Now it will print error message every time -EPROBE_DEFER is returned which from: __mdiobus_register() |-devm_gpiod_get_optional() without actually identifying error code. "mdio_bus 4a101000.mdio: mii_bus 4a101000.mdio couldn't get reset GPIO" Hence, suppress error message for devm_gpiod_get_optional() returning -EPROBE_DEFER case by using dev_err_probe(). Signed-off-by: NGrygorii Strashko <grygorii.strashko@ti.com> Link: https://lore.kernel.org/r/20201119203446.20857-1-grygorii.strashko@ti.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Heiner Kallweit 提交于
Tiny improvement, let dev_err_probe() deal with EPROBE_DEFER. Signed-off-by: NHeiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/b0c4ebcf-2047-e933-b890-8a20e4bdb19f@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Heiner Kallweit 提交于
Some chip versions have a hw bug resulting in lost door bell rings. To work around this the doorbell is also rung whenever we still have tx descriptors in flight after having cleaned up tx descriptors. These PCI(e) writes come at a cost, therefore let's reduce the number of extra doorbell rings. If skb is NULL then this means: - last cleaned-up descriptor belongs to a skb with at least one fragment and last fragment isn't marked as sent yet - hw is in progress sending the skb, therefore no extra doorbell ring is needed for this skb - once last fragment is marked as transmitted hw will trigger a tx done interrupt and we come here again (with skb != NULL) and ring the doorbell if needed Therefore skip the workaround doorbell ring if skb is NULL. Signed-off-by: NHeiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/0a15a83c-aecf-ab51-8071-b29d9dcd529a@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Jakub Kicinski 提交于
Mat Martineau says: ==================== mptcp: More miscellaneous MPTCP fixes Here's another batch of fixup and enhancement patches that we have collected in the MPTCP tree. Patch 1 removes an unnecessary flag and related code. Patch 2 fixes a bug encountered when closing fallback sockets. Patches 3 and 4 choose a better transmit subflow, with a self test. Patch 5 adjusts tracking of unaccepted subflows Patches 6-8 improve handling of long ADD_ADDR options, with a test. Patch 9 more reliably tracks the MPTCP-level window shared with peers. Patch 10 sends MPTCP-level acknowledgements more aggressively, so the peer can send more data without extra delay. ==================== Link: https://lore.kernel.org/r/20201119194603.103158-1-mathew.j.martineau@linux.intel.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Paolo Abeni 提交于
Send timely MPTCP-level ack is somewhat difficult when the insertion into the msk receive level is performed by the worker. It needs TCP-level dup-ack to notify the MPTCP-level ack_seq increase, as both the TCP-level ack seq and the rcv window are unchanged. We can actually avoid processing incoming data with the worker, and let the subflow or recevmsg() send ack as needed. When recvmsg() moves the skbs inside the msk receive queue, the msk space is still unchanged, so tcp_cleanup_rbuf() could end-up skipping TCP-level ack generation. Anyway, when __mptcp_move_skbs() is invoked, a known amount of bytes is going to be consumed soon: we update rcv wnd computation taking them in account. Additionally we need to explicitly trigger tcp_cleanup_rbuf() when recvmsg() consumes a significant amount of the receive buffer. Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Florian Westphal 提交于
OoO handling attempts to detect when packet is out-of-window by testing current ack sequence and remaining space vs. sequence number. This doesn't work reliably. Store the highest allowed sequence number that we've announced and use it to detect oow packets. Do this when mptcp options get written to the packet (wire format). For this to work we need to move the write_options call until after stack selected a new tcp window. Acked-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Geliang Tang 提交于
This patch added IPv6 support for do_transfer, and the test cases for ADD_ADDR IPv6. Acked-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NGeliang Tang <geliangtang@gmail.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Geliang Tang 提交于
When ADD_ADDR suboption includes an IPv6 address, the size is 28 octets. It will not fit when other MPTCP suboptions are included in this packet, e.g. DSS. So here we send out an ADD_ADDR dedicated packet to carry only ADD_ADDR suboption, no other MPTCP suboptions. In mptcp_pm_announce_addr, we check whether this is an IPv6 ADD_ADDR. If it is, we set the flag MPTCP_ADD_ADDR_IPV6 to true. Then we call mptcp_pm_nl_add_addr_send_ack to sent out a new pure ACK packet. In mptcp_established_options_add_addr, we check whether this is a pure ACK packet for ADD_ADDR. If it is, we drop all other MPTCP suboptions in this packet, only put ADD_ADDR suboption in it. Suggested-by: NPaolo Abeni <pabeni@redhat.com> Acked-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NGeliang Tang <geliangtang@gmail.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Geliang Tang 提交于
This patch changed the 'add_addr_signal' type from bool to char, so that we could encode the addr type there. Suggested-by: NPaolo Abeni <pabeni@redhat.com> Acked-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NGeliang Tang <geliangtang@gmail.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Paolo Abeni 提交于
This will simplify all operation dealing with subflows before accept time (e.g. data fin processing, add_addr). The join list is already flushed by mptcp_stream_accept() before returning the newly created msk to the user space. This also fixes an potential bug present into the old code: conn_list was manipulated without helding the msk lock in mptcp_stream_accept(). Tested-by: NGeliang Tang <geliangtang@gmail.com> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Florian Westphal 提交于
Add a test case where a link fails with multiple subflows. The expectation is that MPTCP will transmit any data that could not be delivered via the failed link on another subflow. Co-developed-by: NGeliang Tang <geliangtang@gmail.com> Signed-off-by: NGeliang Tang <geliangtang@gmail.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Florian Westphal 提交于
In case a subflow path is blocked, MPTCP-level retransmit may not take place anymore because such subflow is likely to have unacked data left in its write queue. Ignore subflows that have experienced loss and test next candidate. Fixes: 3b1d6210 ("mptcp: implement and use MPTCP-level retransmission") Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Paolo Abeni 提交于
We need to cope with some more state transition for fallback sockets, or could still end-up moving to TCP_CLOSE too early and avoid spooling some pending data Fixes: e16163b6 ("mptcp: refactor shutdown and close") Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Paolo Abeni 提交于
Only mptcp_close() can actually cancel the workqueue, no need to add and use this flag. Reviewed-by: NMatthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Jakub Kicinski 提交于
Ido Schimmel says: ==================== mlxsw: Add support for nexthop objects This patch set adds support for nexthop objects in mlxsw. Nexthop objects are treated as another front-end for programming nexthops, in addition to the existing IPv4 and IPv6 front-ends. Patch #1 registers a listener to the nexthop notification chain and parses the nexthop information into the existing mlxsw data structures that are already used by the IPv4 and IPv6 front-ends. Blackhole nexthops are currently rejected. Support will be added in a follow-up patch set. Patch #2 extends mlxsw to resolve its internal nexthop objects from the nexthop identifier encoded in the FIB info of the notified routes. Patch #3 finally removes the limitation of rejecting routes that use nexthop objects. Patch #4 adds a selftest. Patches #5-#8 add generic forwarding selftests that can be used with veth pairs or physical loopbacks. ==================== Link: https://lore.kernel.org/r/20201119130848.407918-1-idosch@idosch.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Ido Schimmel 提交于
Add a nexthop objects version of gre_multipath.sh. Unlike the original test, it also tests IPv6 overlay which is not possible with the legacy nexthop implementation. See commit 9a2ad362 ("selftests: forwarding: gre_multipath: Drop IPv6 tests") for more info. Signed-off-by: NIdo Schimmel <idosch@nvidia.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Ido Schimmel 提交于
In a similar fashion to router_multipath.sh and its nexthop objects version router_mpath_nh.sh, create a nexthop objects version of router.sh. It reuses the same topology, but uses device-only nexthop objects instead of legacy ones. Signed-off-by: NIdo Schimmel <idosch@nvidia.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-